Hi everyone,
I recently observed some behavior in a KRaft-based Kafka cluster (v3.7.2)
that I was hoping to get some technical clarification on.
My Setup:
-
*Architecture:* 3 Controllers / 3 Brokers (KRaft mode).
-
*Version:* 3.7.2.
The Scenario:
1.
I brought down *2 out of 3 controllers*, effectively breaking the KRaft
quorum. As expected, describe topics and metadata quorum commands failed.
2.
While the quorum was down, I brought down *1 Broker*.
3.
*Producer Behavior:* Remained fully functional. It automatically
redirected messages to the remaining nodes for the affected partition
leaders without issue.
4.
*Consumer Behavior:* The consumer initially stopped when the broker went
down, but upon restarting the consumer, it resumed processing perfectly
fine. I also waited for 5 mins for any caches to clear, it was able to
consume after restart. Though this wasn't the case across all tests...