
Kafka Group coordinator discovery failing for subsequent restarts

Hi,

We are facing the following issues with our Kafka cluster.

- Kafka Version: 2.0.0
- We have the following cluster configuration:
- Number of brokers: 14
- Per broker: 37 GB memory and 14 cores
- Topics: 40 - 50
- Partitions per topic: 32
- Replicas: 3
- Min In Sync Replica: 2
- __consumer_offsets partitions: 50
- offsets.topic.replication.factor=3
- default.replication.factor=3
- Consumers#: ~4000 (will grow to ~7K)
- Consumer Groups#: ~4000 (will grow to ~7K)


Important: each consumer consumes from exactly one topic, and each consumer
group has only one consumer, due to some architectural constraints.

We are facing two major problems with consumer groups:

- The first time we start a consumer with a new group name, it works
very well. But a subsequent restart (with the previous / older group
name) causes problems for some consumers. We get the following
errors:

INFO [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]: [Consumer
clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2,
groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Discovered
group coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null)
INFO [2019-08-28 19:05:34,481] [main] [AbstractCoordinator]: [Consumer
clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2,
groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Group
coordinator 10.XX.XXX.112:9092 (id: 2147483631 rack: null) is unavailable
or invalid, will attempt rediscovery
INFO [2019-08-28 19:05:34,582] [main] [AbstractCoordinator]: [Consumer
clientId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2,
groupId=djXXX#XXX-XXX-XXX-XX-5-1478729-XX-XXXXX-XX-ingestion-v2] Discovered
group coordinator 10.32.197.112:9092 (id: 2147483631 rack: null)

These messages keep repeating and the consumer is not able to start / poll.
But if we change the group name, it works the first time without any issue
(and fails on a subsequent restart). So it also suggests that there is no
issue with the broker. Could this be caused by having a single consumer in
each consumer group? If yes, what would be the workaround?

- The second error occurs while the consumer is up and running. After
a couple of hours, it starts failing and throws the following error:
[Consumer clientId=banneXXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX,
groupId=bannerXXX#XX-XXX-XXX-XXX-X-1388688-XXX-XXXXX] Offset commit failed
on partition banneXXXX-7 at offset 13711176: This is not the correct
coordinator
[Consumer
clientId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2,
groupId=banXXerGrXXMXX#XX-XX-XXXXX-XXX-5-1478733-XXX-XXXXX-ingestion-v2]
Offset commit failed on partition banXXerGrXXMXX-8 at offset 14741: This is
not the correct coordinator.
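
"This is not the correct coordinator" suggests that the broker receiving the
commit does not currently lead the offsets-topic partition that owns the group.
Below is a rough sketch of how a group id maps to one of those partitions,
assuming the broker uses abs(groupId.hashCode()) modulo
offsets.topic.num.partitions (50 in the configuration listed earlier). The
class name, method name, and sample group id are hypothetical:

```java
public class GroupPartitionSketch {
    // Assumption: mirrors the broker-side mapping of a group id onto a
    // partition of the internal offsets topic; this is not an official API.
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Integer.MIN_VALUE has no positive counterpart, so treat it as 0.
        int positive = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return positive % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        String groupId = "example-group-ingestion-v2"; // hypothetical group id
        System.out.println(groupId + " -> partition " + partitionFor(groupId, 50));
    }
}
```

The leader of the resulting partition would be the group's coordinator, so
comparing it against the broker named in the client log is one way to check
whether the client is talking to a stale coordinator.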


I would like to know the following:

- What is the maximum number of consumer groups in a Kafka cluster? I didn't
find any documented limit online; every source says it is limited only by the OS.
- Is there a problem if a consumer group has only one consumer?
- Is there a problem with my Kafka configuration?




Regards
Hrishikesh
