Hi Prateek, I'm not sure how you are testing this. A Kafka Connect
cluster in distributed mode uses the group management protocol to
coordinate, that is, to distribute connectors and tasks across
workers. The protocol (the connect.protocol worker setting) is
"sessioned" by default, which uses incremental cooperative rebalancing
and aims to minimize task movements during rebalancing.
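
For reference, here is a minimal sketch of the worker settings involved.
It is not your actual config: the group.id and topic names are
placeholders I made up, and the two rebalance values are the documented
defaults as far as I know. scheduled.rebalance.max.delay.ms defaults to
5 minutes, which is likely the delay described in the text you quoted:
it is how long the group waits for a departed worker to return before
reassigning its connectors and tasks.

```properties
# Hypothetical connect-distributed.properties fragment (names are placeholders)
group.id=connect-cluster

# Internal topics that hold the cluster state, so workers keep nothing locally
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status

# Rebalance protocol: eager, compatible, or sessioned (the default)
connect.protocol=sessioned

# How long to wait for a departed worker to rejoin before its connectors
# and tasks are reassigned; default is 300000 ms (5 minutes)
scheduled.rebalance.max.delay.ms=300000
```

Lowering scheduled.rebalance.max.delay.ms should shorten the window in
which the tasks of a replaced pod stay unassigned, at the cost of more
task movement when a worker only restarts briefly.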
On Mon, Jun 16, 2025 at 6:04 AM Prateek Kohli
<prateek.kohli@ericsson.com.invalid> wrote:
>
> Thanks a lot @Vignesh & @Raphael Mazelier for your detailed replies.
>
> I thought the same too, but I read this and now I'm a bit confused.
>
> "In a Kafka Connect cluster, each worker node is identified by its advertised address. This identity is crucial because connectors and tasks are assigned to specific workers based on it.
>
> When you use a Kubernetes Deployment, rolling updates result in Pods being recreated with new IPs and hostnames. Kafka Connect interprets these as entirely new worker nodes joining the cluster, while the old ones are seen as having left.
>
> As a result, Kafka Connect takes some time (typically around 5 minutes) to recognize that the old nodes have departed and to reassign their tasks to the remaining active workers. During this delay, some tasks may remain inactive, leading to reduced service availability."
>
> Strimzi also switched to using StrimziPodSet some time ago because of this issue.
>
> https://github.com/strimzi/strimzi-kafka-operator/pull/8090
>
> https://github.com/strimzi/strimzi-kafka-operator/issues/4676
>
> Thanks
>
> -----Original Message-----
> From: Vignesh <davidvikimca@gmail.com>
> Sent: 16 June 2025 01:34
> To: users@kafka.apache.org
> Subject: Re: Kafka Connect on Kubernetes: Statefulset vs Deployment
>
> Kafka Connect is a stateless component by design. It relies on external Kafka topics to persist its state, including connector configurations, offsets, and status updates. In a distributed Kafka Connect cluster, this state is managed through the following configurable topics:
>
> - config.storage.topic – stores connector configurations
> - offset.storage.topic – stores source connector offsets
> - status.storage.topic – stores the status of connectors and tasks
>
> Because Kafka Connect does not maintain any state locally, it is not dependent on a specific IP address or hostname. As a result, it is best to deploy Kafka Connect using a *Kubernetes Deployment* rather than a *StatefulSet*, since Deployments are better suited for stateless applications and provide more flexibility with scaling and rolling updates.
>
> Additionally, it is common practice to expose the Kafka Connect REST API via an *Ingress*, allowing external systems to submit and manage connectors.
> We have deployed several instances this way, as Deployments, for our use case, based on the repo below (for your reference):
> https://github.com/ibm-messaging/kafka-connect-mq-source
>
> Thanks,
> Vignesh
>
> On Sun, Jun 15, 2025 at 12:12 AM Prateek Kohli <prateekkohli2112@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I'm building a custom Docker image for Kafka Connect and planning to
> > run it on Kubernetes. I'm a bit stuck on whether I should use a
> > Deployment or a StatefulSet.
> >
> > From what I understand, the main difference that could affect Kafka
> > Connect is the hostname/IP behaviour. With a Deployment, pod IPs and
> > hostnames can change after restarts. With a StatefulSet, each pod gets
> > a stable hostname (like connect-0, connect-1, etc.)
> >
> > My question is: Does it really matter for Kafka Connect if the pod
> > IPs/hostnames change, considering it's a stateless application?
> >
> > Thanks
> >