Hi Brandon,
Which version of Kafka are the consumers running? My understanding is that if they're running a version lower than the brokers, they could be using an older message format, which means the brokers have to down-convert each record before sending it to the consumer.
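If that's the case, the down-conversions should show up in the broker's JMX metrics. A rough sketch below reads the broker-wide fetch down-conversion rate over JMX; it assumes JMX is enabled on the broker (the host "broker-1" and port 9999 are just placeholders), and the metric name is the standard BrokerTopicMetrics one, so it's worth confirming against your broker version:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CheckDownConversions {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port - point this at a broker with JMX enabled.
            String url = "service:jmx:rmi:///jndi/rmi://broker-1:9999/jmxrmi";
            try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Broker-wide rate of message format down-conversions on the fetch path.
                ObjectName downConversions = new ObjectName(
                    "kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec");
                Object oneMinuteRate = mbs.getAttribute(downConversions, "OneMinuteRate");
                System.out.println("FetchMessageConversionsPerSec (1-min rate): " + oneMinuteRate);
            }
        }
    }

A sustained non-zero rate there while the consumers are connected would point to down-conversion as the likely source of the CPU load.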
Thanks,
Jamie
-----Original Message-----
From: Brandon Barron <brandon.barron@live.com>
To: users@kafka.apache.org <users@kafka.apache.org>
Sent: Thu, 30 Jan 2020 16:11
Subject: High CPU in 2.2.0 kafka cluster
Hi,
We had a small cluster (4 brokers) dealing with very low throughput - a couple hundred messages per minute at the very most. In that cluster we had a little under 3300 total consumers (all were Kafka Streams instances). All broker CPUs were maxed out almost constantly for a few weeks.
We eventually switched traffic to a new cluster. After sitting idle for a few days, the old cluster was at ~40% CPU with the consumers still running. When I took down all the consumers, the idle CPU on the brokers dropped to about 4%.
To test, we decided to mirror active traffic from our new cluster to the old cluster (which now has no running consumers). The CPU didn't budge; it's still at ~4%, as expected with the low throughput.
One more thing to add: I ran a thread profiler on a couple brokers when the old cluster was taking active traffic with running consumers and the CPU was maxed out. Each time, I saw the ReplicaFetcherThread eating up around 40% of CPU time.
Can you give any advice on what might be the root cause of this?
Thanks,
Brandon