A consumer built on librdkafka shows intermittent message latency when receiving from Kafka. Most messages are received within about 10 ms, but occasionally the difference between the receive time and the timestamp in the message body exceeds 1 second.
Environment Information
Software Versions
librdkafka version: 2.11.0
Operating System: CentOS 7.6
Kafka version: 3.6.2 (ZooKeeper mode deployment)
Kafka Cluster
Number of nodes: 3 nodes
Server configuration: 64 vCPU, 128GB RAM
Network: Gigabit network, connected to the same switch, low network latency
Disk: HDD RAID1
Topic Configuration
Test Topic (test):
Partitions: 1
Replicas: 2
message.timestamp.type=LogAppendTime
min.insync.replicas=1
Load Topics (testA, testB, testC, testD):
Each topic: 128 partitions, 2 replicas
Total message rate: 80,000 messages/second (20,000 messages/second per topic)
Message size: 500 bytes per message
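For scale, the background load above can be turned into rough throughput figures. This is only a back-of-the-envelope sketch: it counts payload bytes, ignores protocol and record overhead, and assumes (this is not stated in the post) that partition leaders and followers are spread evenly across the 3 brokers.

```python
# Rough payload throughput for the load topics described above.
# Assumptions (not from the original post): leaders and followers are spread
# evenly across the brokers; protocol and record overhead are ignored.

TOPICS = 4
MSGS_PER_SEC_PER_TOPIC = 20_000
MSG_BYTES = 500
REPLICATION_FACTOR = 2
BROKERS = 3


def load_summary():
    total_msgs = TOPICS * MSGS_PER_SEC_PER_TOPIC
    produce_bytes = total_msgs * MSG_BYTES  # bytes/s sent by producers
    # Each extra replica re-fetches the same bytes from the leader.
    replica_bytes = produce_bytes * (REPLICATION_FACTOR - 1)
    per_broker_in = (produce_bytes + replica_bytes) / BROKERS
    return {
        "total_msgs_per_sec": total_msgs,
        "produce_mb_per_sec": produce_bytes / 1e6,
        "produce_mbit_per_sec": produce_bytes * 8 / 1e6,
        "cluster_write_mb_per_sec": (produce_bytes + replica_bytes) / 1e6,
        "per_broker_inbound_mbit_per_sec": per_broker_in * 8 / 1e6,
    }


if __name__ == "__main__":
    for key, value in load_summary().items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions the producers generate about 40 MB/s (320 Mbit/s) of payload, and each broker receives roughly 213 Mbit/s of produce plus replication traffic on its gigabit link.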
Consumer Configuration (librdkafka)
fetch.wait.max.ms: 10 (the issue also occurs with 500, so I changed it to 10)
All other configurations are librdkafka defaults
Reproduction Steps
Create four load topics (testA, testB, testC, testD), each with 128 partitions and 2 replicas
Deploy test programs to send 20,000 messages per second (500 bytes each) to each of the four load topics
Create the test topic (test) with the configuration listed above
Use a test program to send 1 message per second (100 bytes) to the test topic
Create a consumer that subscribes to the test topic
The consumer prints the received message time and the timestamp in the message
Observe that most messages are received within about 10ms, but occasionally messages are delayed by more than 1 second
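The measurement in the last two steps can be sketched as follows. This is a minimal illustration, not the actual test program: the helper names `latency_ms` and `summarize` are hypothetical, and it assumes the receive time is taken in seconds since the epoch and the message timestamp is in milliseconds since the epoch.

```python
import statistics

SLOW_THRESHOLD_MS = 1000  # the post reports outliers above 1 second


def latency_ms(receive_time_s, message_timestamp_ms):
    """Receive time (s since epoch) minus message timestamp (ms since epoch)."""
    return receive_time_s * 1000.0 - message_timestamp_ms


def summarize(latencies_ms, threshold_ms=SLOW_THRESHOLD_MS):
    """Return basic statistics and the number of samples above the threshold."""
    slow = [x for x in latencies_ms if x > threshold_ms]
    return {
        "count": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "slow_count": len(slow),
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    samples = [9.0, 11.0, 10.0, 1250.0, 12.0]
    print(summarize(samples))
```

Recording every sample rather than only printing it makes it easier to see how often the slow deliveries occur and whether they line up with broker-side events.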
Key Observations
The test consumer program runs on the partition leader node of the test topic (eliminating node clock differences)
Intermittent latency occurs under high load (80k msg/s)
The low-throughput topic (1 msg/s) experiences delays while the high-throughput background load is running
Latency is intermittent, not continuous
Using librdkafka version 2.11.0, CentOS 7.6 operating system
I used tcpdump to capture network packets and observed that the consumer sends fetch requests frequently and that Kafka's fetch responses come back quickly. However, it takes several request/response rounds before the message is actually returned, and this is the main source of the delay.
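One way to quantify this from a capture is to count how many fetch rounds complete between the moment the message is appended and the first response that carries it. The sketch below is hypothetical: `FetchRound` and `rounds_until_delivery` are illustrative names, and it assumes the request time, response time, and record count for the test partition have already been extracted from the tcpdump output by other means.

```python
from dataclasses import dataclass


@dataclass
class FetchRound:
    request_ms: float   # when the fetch request left the consumer
    response_ms: float  # when the matching fetch response arrived
    record_count: int   # records for the test partition in that response


def rounds_until_delivery(rounds, append_ms):
    """Return (empty_rounds, delay_ms) for a message appended at append_ms.

    empty_rounds counts responses that arrived after the append without
    records; delay_ms is measured to the first response that has records.
    Returns None if no response in the capture carries records.
    """
    empty = 0
    for r in sorted(rounds, key=lambda r: r.request_ms):
        if r.response_ms < append_ms:
            continue  # this round finished before the message existed
        if r.record_count > 0:
            return empty, r.response_ms - append_ms
        empty += 1
    return None


if __name__ == "__main__":
    # Made-up capture data, for illustration only.
    capture = [
        FetchRound(0, 10, 0),
        FetchRound(10, 20, 0),
        FetchRound(20, 30, 0),
        FetchRound(30, 40, 1),
    ]
    print(rounds_until_delivery(capture, append_ms=12))
```

A large number of empty rounds after the append time would indicate that the broker is not yet exposing the record to the consumer, rather than the consumer being slow to ask for it.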
Perhaps this is not an issue with librdkafka. I set log.cleaner.threads=4 and num.replica.fetchers=4 on the Kafka brokers, but the problem persists. After upgrading Kafka to version 4.0 with a KRaft deployment and repeating the same test steps, the delays still occur, but far less often, and they are around 500 ms.
Can anyone suggest further directions for troubleshooting this issue?
杜杰