Re: Kafka Group coordinator discovery failing for subsequent restarts

Did you see the warning "Error connecting to node" in the consumer log?

Best,
Lisheng
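
The coordinator `id` printed in the consumer log quoted below (`id: 2147483631`) is not the broker id itself. As far as I know, the Java consumer registers its coordinator connection under the id `Integer.MAX_VALUE - brokerId`, so that it gets a connection separate from the regular fetch connection to the same broker. Assuming that convention, a minimal sketch for recovering the broker id, for example to know which broker's logs to check alongside the "Error connecting to node" warning (the helper name is hypothetical):

```python
JAVA_INT_MAX = 2**31 - 1  # Integer.MAX_VALUE in Java


def coordinator_broker_id(log_node_id: int) -> int:
    """Map the coordinator id shown in consumer logs back to a broker id.

    Assumes the Java client convention: coordinator id = Integer.MAX_VALUE - broker id.
    """
    broker_id = JAVA_INT_MAX - log_node_id
    if broker_id < 0:
        raise ValueError(f"{log_node_id} does not look like a coordinator id")
    return broker_id


# From "Discovered group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null)"
print(coordinator_broker_id(2147483631))  # 16
```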

On Thu, Aug 29, 2019 at 2:45 PM Hrishikesh Mishra <sd.hrishi@gmail.com> wrote:

> Please find my replies inline below:
>
>
>
> On Thu, Aug 29, 2019 at 11:32 AM Lisheng Wang <wanglisheng81@gmail.com>
> wrote:
>
> > Hi
> >
> > About question 1: it doesn't matter how many consumers are in the same
> > consumer group.
> >
> > So you mean the broker that is the coordinator did not crash at all
> > before?
> >
>
> We didn't see any shutdown errors on the brokers, and we faced a similar
> problem with multiple coordinators.
>
>
>
> > May I know whether exactly one broker (the coordinator) is unavailable,
> > or many are? If only one, you can try transferring leadership of the
> > __consumer_offsets partitions on that broker to another broker, to see
> > whether the problem goes away.
> >
> >
> It happened with multiple consumer groups.
>
>
>
>
> > I found the following issue, which seems similar to yours, FYR:
> >
> > https://stackoverflow.com/questions/51952398/kafka-connect-distributed-mode-the-group-coordinator-is-not-available
> >
>
> We have gone through this link, but in our case it is not always feasible
> to clean data from the offsets topic and restart (our cluster is huge).
>
>
> > Best,
> > Lisheng
> >
> >
> > On Thu, Aug 29, 2019 at 12:19 PM Hrishikesh Mishra <sd.hrishi@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We are facing the following issues with our Kafka cluster.
> > >
> > > - Kafka version: 2.0.0
> > > - Cluster configuration:
> > >   - Number of brokers: 14
> > >   - Per broker: 37 GB memory and 14 cores
> > >   - Topics: 40 to 50
> > >   - Partitions per topic: 32
> > >   - Replicas: 3
> > >   - Min in-sync replicas: 2
> > >   - __consumer_offsets partitions: 50
> > >   - offsets.topic.replication.factor=3
> > >   - default.replication.factor=3
> > >   - Consumers: ~4000 (will grow to ~7K)
> > >   - Consumer groups: ~4000 (will grow to ~7K)
> > >
> > >
> > > Important: each consumer consumes from one topic, and each consumer
> > > group has only one consumer, due to some architectural constraints.
> > >
> > > We are facing two major problems with consumer groups:
> > >
> > > - The first time we start a consumer with a new group name, it works
> > >   very well. But subsequent restarts (with the previous/older group
> > >   name) cause problems for some consumers. We get the following errors:
> > >
> > >   INFO [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]: [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2, groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Discovered group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null)
> > >   INFO [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]: [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2, groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null) is unavailable or invalid, will attempt rediscovery
> > >   INFO [2019-08-28 19:05:34,582] [main] [AbstractCoordinator]: [Consumer clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2, groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Discovered group coordinator 10.32.197.112:9092 (id: 2147483631 rack: null)
> > >
> > >   These messages keep repeating, and the consumer is not able to start
> > >   or poll. But if we change the group name, it works the first time
> > >   without any issue (and fails on subsequent restarts). So it also
> > >   means there is no issue with the broker. Could this be because there
> > >   is a single consumer in the consumer group? If yes, what would be the
> > >   workaround?
> > >
> > > - The second error occurs while the consumer is up and running: after
> > >   a couple of hours, it starts failing with the following errors:
> > >
> > >   [Consumer clientId=banneXXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX, groupId=bannerXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX] Offset commit failed on partition banneXXXX-7 at offset 13711176: This is not the correct coordinator
> > >   [Consumer clientId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2, groupId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2] Offset commit failed on partition banXXerGrXXMXX-8 at offset 14741: This is not the correct coordinator.
> > >
> > >
> > > I would like to know the following:
> > >
> > > - What is the maximum number of consumer groups in a Kafka cluster? I
> > >   didn't find any documented limit online; everywhere it says the
> > >   number is limited only by the OS.
> > > - Is it a problem if a consumer group has only one consumer?
> > > - Is there some problem with my Kafka configuration?
> > >
> > >
> > >
> > >
> > > Regards
> > > Hrishikesh
> > >
> >
>
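
Lisheng's suggestion to move leadership of the affected __consumer_offsets partition ties into how a group's coordinator is chosen. As far as I know, in Kafka 2.0 the coordinator for a group is the leader of partition `abs(groupId.hashCode()) % offsets.topic.num.partitions` of `__consumer_offsets` (50 partitions in this cluster), so a renamed group usually lands on a different partition and possibly a different coordinator broker, which may be related to why a new group name behaves differently. The sketch below assumes that formula and Java's `String.hashCode()`. It computes the partition for a group id and builds a JSON plan in the `kafka-reassign-partitions.sh` format that keeps the same replicas but lists a chosen broker first; a preferred replica election (`kafka-preferred-replica-election.sh` in 2.0) would then make that broker the leader. The function names, the example group id and the replica list are hypothetical; take the real replica list from `kafka-topics.sh --describe` before relying on anything like this.

```python
import json

INT_MIN = -(2**31)  # Integer.MIN_VALUE in Java


def java_string_hash(s: str) -> int:
    """Reproduce java.lang.String.hashCode(): h = 31*h + c over UTF-16 code units,
    wrapped to a signed 32-bit int."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        code_unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF
    return h - 2**32 if h >= 2**31 else h


def kafka_abs(n: int) -> int:
    """Like Kafka's Utils.abs: Integer.MIN_VALUE maps to 0 instead of staying negative."""
    return 0 if n == INT_MIN else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """__consumer_offsets partition whose leader coordinates this group (assumed formula)."""
    return kafka_abs(java_string_hash(group_id)) % num_partitions


def preferred_leader_plan(partition: int, replicas: list, new_leader: int) -> dict:
    """Reassignment plan with the same replica set but new_leader listed first,
    so a later preferred replica election would move leadership to it."""
    if new_leader not in replicas:
        raise ValueError("new leader must already be a replica of the partition")
    reordered = [new_leader] + [r for r in replicas if r != new_leader]
    return {
        "version": 1,
        "partitions": [
            {"topic": "__consumer_offsets", "partition": partition, "replicas": reordered}
        ],
    }


if __name__ == "__main__":
    group = "my-ingestion-group"   # hypothetical group id
    p = offsets_partition_for(group)
    current_replicas = [16, 3, 7]  # hypothetical; read the real list from kafka-topics.sh --describe
    print(p)
    print(json.dumps(preferred_leader_plan(p, current_replicas, new_leader=3)))
```

Reordering replicas rather than changing the replica set means no data has to move; only the preferred leader changes.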
