Hi Zach,
Any issues with the partitions that broker 2 is the leader of?
Also, have you checked broker 2's server.log?
Cheers,
Liam Clarke-Hutchinson
On Wed, 1 Apr. 2020, 11:02 am Zach Cox, <zcox522@gmail.com> wrote:
> Hi - We have a small Kafka 2.0.0 (Zookeeper 3.4.13) cluster with 3 brokers:
> 0, 1, and 2. Each broker is in a separate rack (Azure zone).
>
> Recently there was an incident in which Kafka brokers and Zookeeper nodes
> restarted, etc. Since then, broker 2 has been consistently out of many
> ISRs. The pattern we've observed is that broker 2 will not be in any ISRs
> of partitions where broker 0 is leader, but will be in ISRs of partitions
> where broker 1 is leader. Then at some point the controller moves to a
> different broker, and broker 2 will not be in any ISRs where 1 is leader,
> but will be in ISRs where 0 is leader. Each time the controller changes,
> this "flip-flopping" of broker 2 in and out of ISRs reverses. No matter
> what, broker 2 never seems to get into all ISRs.
>
> For topics with replicas=3, min.insync.replicas=2, and producers with
> acks=all, we only ever have ISR=(0,1), and occasionally 0 or 1 also briefly
> falls out of ISR, leading to producer retries and sometimes send failures
> for producers that use retries=3.
>
> Any ideas what might be happening here, and how we could fix it? Or
> additional data we could collect to try to diagnose the problem? We are
> planning to upgrade this cluster as soon as we get it working correctly.
>
> Thanks,
> Zach
>
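
For anyone collecting the data discussed in this thread, here is a minimal sketch of one way to summarise it. It assumes the per-partition line format printed by `kafka-topics.sh --describe` (Topic / Partition / Leader / Replicas / Isr). The topic name `events`, the sample values, and the helper names `parse`, `missing_by_leader`, and `below_min_isr` are all hypothetical and only mirror the pattern described above. It groups the partitions where a given broker is a replica but not in the ISR by their leader, and lists partitions whose ISR is smaller than min.insync.replicas, which is the condition under which acks=all writes are rejected.

```python
import re
from collections import defaultdict

# Hypothetical sample in the line format printed by `kafka-topics.sh --describe`.
# The topic name and all values are invented for illustration.
SAMPLE = """\
Topic: events\tPartition: 0\tLeader: 0\tReplicas: 0,1,2\tIsr: 0,1
Topic: events\tPartition: 1\tLeader: 1\tReplicas: 1,2,0\tIsr: 1,2,0
Topic: events\tPartition: 2\tLeader: 2\tReplicas: 2,0,1\tIsr: 2,0,1
Topic: events\tPartition: 3\tLeader: 0\tReplicas: 0,2,1\tIsr: 0
"""

# Matches only per-partition lines; the per-topic summary line is skipped.
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn '0,1,2' into [0, 1, 2]; an empty string becomes []."""
    return [int(x) for x in csv.split(",") if x]


def parse(text):
    """Parse describe output into a list of per-partition dicts."""
    rows = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m is None:
            continue
        rows.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "leader": int(m.group("leader")),
            "replicas": _ids(m.group("replicas")),
            "isr": _ids(m.group("isr")),
        })
    return rows


def missing_by_leader(rows, broker):
    """Map leader id -> partitions where `broker` is a replica but not in the ISR."""
    out = defaultdict(list)
    for r in rows:
        if broker in r["replicas"] and broker not in r["isr"]:
            out[r["leader"]].append((r["topic"], r["partition"]))
    return dict(out)


def below_min_isr(rows, min_insync):
    """Partitions whose ISR is smaller than min.insync.replicas."""
    return [(r["topic"], r["partition"]) for r in rows if len(r["isr"]) < min_insync]


if __name__ == "__main__":
    rows = parse(SAMPLE)
    print(missing_by_leader(rows, broker=2))
    print(below_min_isr(rows, min_insync=2))
```

Running this against describe output captured before and after a controller change would show whether the leader that broker 2 falls behind really does switch, and which partitions are at risk of failed sends.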