
Posts

Showing posts from July, 2024

Data Standardization Best Practices

Evening Team, We're trying to establish a Kafka standardization across multiple organizations. We're pushing more or less the same data sets to our big data platform, but there are output variations across the board. Wondering if you have an assessment team of sorts who can help steer us. V/r, Cameron G. Crumsey MAJ, 26A ARCYBER G36 Data Cell Fort Eisenhower, GA NIPR: cameron.g.crumsey.mil@army.mil SIPR: cameron.g.crumsey.mil@mail.smil.mil NSAnet: cgcrums@nsa.ic.gov Cell: 254-290-2411 Office: 762-206-3809 ATW!

FW: Second attempt of migration of ZK to Kraft after rollback fails

Hi, I've tested the same scenario on versions 3.7.1 and 3.8.0, and it seems the problem still exists. Has anyone managed to migrate a cluster to KRaft after performing a rollback following the current documentation? Any help would be appreciated. From: Zubel, Edgar Sent: Wednesday, July 24, 2024 8:48 AM To: users@kafka.apache.org Subject: Second attempt of migration of ZK to Kraft after rollback fails Hi, Currently I'm testing this workflow: 1. Migrate ZK to KRaft (v3.7.0); 2. Stop the migration process after "Enter Migration Mode on the brokers" or "Migrating brokers to KRaft" completes (this leaves the opportunity to perform a rollback); 3. Perform the rollback; 4. Migrate ZK to KRaft once again. After testing this scenario I was not able to perform a successful migration on the second attempt after the rollback was completed. Logs that I get after migrating the cluster a second time: 1. On the first node the kafka logs are spammin...

Re: kafka-producer-perf-test.sh transaction support?

Hi Anindya, Currently, setting transaction.id in the producer props won't enable transactions. Users need to set --transaction-duration-ms to enable transactions in kafka-producer-perf-test. We recognize this can be confusing, so there's a JIRA ticket for it: https://issues.apache.org/jira/browse/KAFKA-16900 I've submitted a pull request to address this issue. With the update, users can enable transactions by either: - Setting transaction.id=<id> via --producer-props - Setting transaction.id=<id> in the config file via --producer.config - Setting --transaction-id <id> - Setting --transaction-duration-ms=<ms> I believe this will resolve your issue. If you have any thoughts or suggestions, please feel free to share them on the PR or JIRA. On 2019/10/18 19:18:15 Anindya Haldar wrote: > Anyone who might have experienced this, or has a known solution? Would appreciate some insights here. > > Sincerely, > Anindya Halda...
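Until that change lands, the duration flag is the only way to exercise transactions in the released tool. A minimal sketch (topic name and bootstrap address are placeholders, and a running broker is assumed):

```shell
# Requires a running broker; topic and bootstrap.servers are placeholders.
# --transaction-duration-ms enables transactional sends in the released tool;
# setting transaction.id via --producer-props alone does not (KAFKA-16900).
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 10000 \
  --record-size 1000 \
  --throughput -1 \
  --transaction-duration-ms 1000 \
  --producer-props bootstrap.servers=localhost:9092 acks=all
```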

Re: [kafka-clients] [ANNOUNCE] Apache Kafka 3.8.0

+1. Thanks, Josep! Colin On Mon, Jul 29, 2024, at 10:32, Chris Egerton wrote: > Thanks for running the release, Josep! > > > On Mon, Jul 29, 2024, 13:31 'Josep Prat' via kafka-clients < kafka-clients@googlegroups.com > wrote: >> The Apache Kafka community is pleased to announce the release for Apache >> Kafka 3.8.0 >> >> This is a minor release and it includes fixes and improvements from 456 >> JIRAs. >> >> All of the changes in this release can be found in the release notes: >> https://www.apache.org/dist/kafka/3.8.0/RELEASE_NOTES.html >> >> An overview of the release can be found in our announcement blog post: >> https://kafka.apache.org/blog#apache_kafka_380_release_announcement >> >> You can download the source and binary release (Scala 2.12 and Scala >> 2.13) from: >> https://kafka.apache.org/downloads#3.8.0 >> >> -...

Re: [kafka-clients] [ANNOUNCE] Apache Kafka 3.8.0

Thanks for running the release, Josep! On Mon, Jul 29, 2024, 13:31 'Josep Prat' via kafka-clients < kafka-clients@googlegroups.com > wrote: > The Apache Kafka community is pleased to announce the release for Apache > Kafka 3.8.0 > > This is a minor release and it includes fixes and improvements from 456 > JIRAs. > > All of the changes in this release can be found in the release notes: > https://www.apache.org/dist/kafka/3.8.0/RELEASE_NOTES.html > > An overview of the release can be found in our announcement blog post: > https://kafka.apache.org/blog#apache_kafka_380_release_announcement > > You can download the source and binary release (Scala 2.12 and Scala > 2.13) from: > https://kafka.apache.org/downloads#3.8.0 > > > --------------------------------------------------------------------------------------------------- > > > Apache Kafka is a distributed streaming platform with fou...

[ANNOUNCE] Apache Kafka 3.8.0

The Apache Kafka community is pleased to announce the release for Apache Kafka 3.8.0 This is a minor release and it includes fixes and improvements from 456 JIRAs. All of the changes in this release can be found in the release notes: https://www.apache.org/dist/kafka/3.8.0/RELEASE_NOTES.html An overview of the release can be found in our announcement blog post: https://kafka.apache.org/blog#apache_kafka_380_release_announcement You can download the source and binary release (Scala 2.12 and Scala 2.13) from: https://kafka.apache.org/downloads#3.8.0 --------------------------------------------------------------------------------------------------- Apache Kafka is a distributed streaming platform with four core APIs: ** The Producer API allows an application to publish a stream of records to one or more Kafka topics. ** The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them. ...

Second attempt of migration of ZK to Kraft after rollback fails

Hi, Currently I'm testing this workflow: 1. Migrate ZK to KRaft (v3.7.0); 2. Stop the migration process after "Enter Migration Mode on the brokers" or "Migrating brokers to KRaft" completes (this leaves the opportunity to perform a rollback); 3. Perform the rollback; 4. Migrate ZK to KRaft once again. After testing this scenario I was not able to perform a successful migration on the second attempt after the rollback was completed. Logs that I get after migrating the cluster a second time: 1. On the first node the kafka logs are spamming: [2024-07-24 07:43:16,552] lvl=INFO [NodeToControllerChannelManager id=1 name=quorum] Client requested disconnect from node 3002 logger=org.apache.kafka.clients.NetworkClient [2024-07-24 07:43:16,552] lvl=INFO [zk-broker-1-to-controller-quorum-channel-manager]: Recorded new controller, from now on will use node zubeltest2.example.com:9090 (id: 3002 rack: null) logger=kafka.server.NodeToControllerRequestThread ...
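For context, the migration mode described in steps 1–2 is driven by broker-side config along these lines (host names, ports, and node ids are placeholders; the exact rollback procedure, including clearing controller state in ZooKeeper, is in the official ZooKeeper-to-KRaft migration docs):

```properties
# Broker settings while in migration mode (values are placeholders).
# Rollback means removing these again and following the documented
# rollback steps before any second migration attempt.
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1.example.com:2181
controller.quorum.voters=3000@controller1.example.com:9090
controller.listener.names=CONTROLLER
```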

Kafka Consumer Fetch position Out Of Range error

Hi, I'm using a Kafka 2.5.1 broker and the Kafka Connect Confluent image 7.1.1. We are using a sink connector to read from Kafka. We occasionally see a Fetch position out-of-range error like this [2024-07-19 00:54:59,456] INFO [Consumer > clientId=connector-consumer-CPSSectorRouterTestEventSinkConnector-1, > groupId=connect-CPSSectorRouterTestEventSinkConnector] Fetch position > FetchPosition{offset=897705, offsetEpoch=Optional[0], > currentLeader=LeaderAndEpoch{leader=Optional[ 10.26.53.192:9092 (id: 2 > rack: null)], epoch=0}} is out of range for partition > producer-perf-test-v6-1, resetting offset > (org.apache.kafka.clients.consumer.internals.Fetcher) > [2024-07-19 00:54:59,458] INFO [Consumer > clientId=connector-consumer-CPSSectorRouterTestEventSinkConnector-1, > groupId=connect-CPSSectorRouterTestEventSinkConnector] Resetting offset for > partition producer-perf-test-v6-1 to position FetchPosition{offset=598398, > offsetEpoch...
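For what it's worth, what the consumer does when its fetch position is out of range is governed by auto.offset.reset. In Kafka Connect this can be overridden per connector; a sketch (assuming the worker's connector.client.config.override.policy permits consumer overrides):

```properties
# Sink connector config fragment (illustrative, not the poster's setup):
# controls what happens when the committed position no longer exists,
# e.g. after log segments are deleted by retention.
consumer.override.auto.offset.reset=earliest
```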

Kafka 3.5.2 is able to read __consumer_offsets from partitions with metadata version 3 when inter.broker.protocol.version is set to 2.0

Hi all, I am working on upgrading a Kafka cluster from version 2.0.1 to 3.5.2. The main concern in this upgrade is passing version 2.1, which has the following note in the upgrade document: ``` Note that 2.1.x contains a change to the internal schema used to store consumer offsets. Once the upgrade is complete, it will not be possible to downgrade to previous versions. ``` In the rolling upgrade section the documentation says the following: ``` Restart the brokers one by one for the new protocol version to take effect. Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version. ``` So, as soon as inter.broker.protocol.version is set to 2.1+ you should not be able to roll back to the old version. I did reproduce the rollback issue with the following steps: 1. Start the cluster with Kafka 2.0.1 binaries and inter.broker.protocol.version=2.0. Produce some messages in the test topic 2. R...
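The documented rolling upgrade is config-driven; a sketch of the two stages (version values are examples for this particular 2.0.1 starting point):

```properties
# Stage 1: run the new binaries on every broker while pinning the old
# protocol version, so rollback to the old binaries is still possible.
inter.broker.protocol.version=2.0

# Stage 2 (after all brokers run the new binaries): bump the version and
# restart brokers one by one. Per the docs, once this passes 2.1 the
# __consumer_offsets schema change makes downgrade unsupported.
# inter.broker.protocol.version=3.5
```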

Re: Requesting a Confluence account to create a KIP for a PR

It's not in our control. The INFRA team needs to create accounts manually at this point, and we don't know at what cadence they find time to do this. Sorry :( If an account is created, I believe you should get an email. We know that this current workaround is very tedious, but there is unfortunately nothing we can do on our end. Let me follow up on the Jira ticket. -Matthias On 7/12/24 3:39 AM, Franck L. wrote: > Hello, > > I created a PR for my JIRA-16707 (now WIP, as I will have to do > a KIP). > > I asked for a Confluence account in JIRA > https://issues.apache.org/jira/browse/INFRA-25451 , but I haven't seen an > answer yet (or I missed it). > > Can anyone help me? > > Best regards >

Buffer Utilization Metric

Hello, I have set up a Grafana dashboard with the Log Cleaner metrics, added when we upgraded to Apache Kafka 3.5. We didn't have these metrics before, when we were using Kafka 2.7. There is one metric that always shows 0%: the Log Cleaner Max Buffer Utilization Percent metric. Other log cleaner metrics, like Cleaner Recopy Percent and Max Clean Time Seconds, are fine; I see values on those. Does anyone have a dashboard set up with this metric, and are you seeing metric values > 0%? Also, looking at the Kafka logs, there is a log line that shows the buffer utilization percent, and I see values there, not just 0%. The Apache Kafka code is different: /* a metric to track the maximum utilization of any thread's buffer in the last cleaning */ https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogCleaner.scala#L126 /* Buffer utilization stats on logs https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/L...

Re: Error creating a topic with 10k partitions but not altering existing topic to 10k partitions? Why

Kafka 3.6.2 – thanks, Paul From: Ömer Şiar Baysal < osiarbaysal@gmail.com > Date: Thursday, 11 July 2024 at 9:57 PM To: users@kafka.apache.org < users@kafka.apache.org > Subject: Re: Error creating a topic with 10k partitions but not altering existing topic to 10k partitions? Why Hi, I have never seen PolicyViolationException before with vanilla Kafka. What is the flavor? This may be caused by the controller mutation rate checks introduced by KIP-599; they may be triggered for non-existing topics but not for existing resources. Hope this helps you track it down. OSB On Thu, Jul 11, 2024, 08:04 Brebner, Paul <Paul.Brebner@netapp.com.invalid> wrote: > Hi – just curious if anyone can suggest why the following occurs: > ...

Re: Error creating a topic with 10k partitions but not altering existing topic to 10k partitions? Why

Hi, I have never seen PolicyViolationException before with vanilla Kafka. What is the flavor? This may be caused by the controller mutation rate checks introduced by KIP-599; they may be triggered for non-existing topics but not for existing resources. Hope this helps you track it down. OSB On Thu, Jul 11, 2024, 08:04 Brebner, Paul <Paul.Brebner@netapp.com.invalid> wrote: > Hi – just curious if anyone can suggest why the following occurs: > > 1 – try to create a topic with 10,000 partitions with the Kafka CLI > (kafka-topics.sh) > Fails with ERROR org.apache.kafka.common.errors.PolicyViolationException: > Unable to perform excessively large batch operation. > > 2 – create a topic with 1 partition with the Kafka CLI, and alter to 10k > partitions - works fine > > Paul > > >

Error creating a topic with 10k partitions but not altering existing topic to 10k partitions? Why

Hi – just curious if anyone can suggest why the following occurs: 1 – try to create a topic with 10,000 partitions with Kafka CLI (kafka-topics.sh) Fails with ERROR org.apache.kafka.common.errors.PolicyViolationException: Unable to perform excessively large batch operation. 2- create a topic with 1 partition with Kafka CLI, and alter to 10k partitions - works fine Paul
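The two operations, spelled out as CLI commands for anyone wanting to reproduce (topic name, bootstrap address, and replication factor are placeholders; a running cluster is assumed):

```shell
# 1 – create a topic with 10,000 partitions in a single request
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic big-topic --partitions 10000 --replication-factor 3

# 2 – create with 1 partition, then alter up to 10,000
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic big-topic --partitions 1 --replication-factor 3
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic big-topic --partitions 10000
```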

StreamThread shutdown calls completeShutdown only in CREATED state

Hi Kafka Dev/Users, While running tests in `StreamThreadTest.java` in kafka/streams, I noticed the test left many lingering threads. Though the class runs `shutdown` after each test, the shutdown only executes `completeShutdown` if the StreamThread is in CREATED state. See https://github.com/apache/kafka/blob/0b11971f2c94f7aadc3fab2c51d94642065a72e5/streams/src/test/java/org/apache/kafka/streams/processor/internals/StreamThreadTest.java#L231 and https://github.com/apache/kafka/blob/0b11971f2c94f7aadc3fab2c51d94642065a72e5/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L1435 For example, you may run test org.apache.kafka.streams.processor.internals.StreamThreadTest#shouldNotCloseTaskProducerWhenSuspending with commit 0b11971f2c94f7aadc3fab2c51d94642065a72e5. When the test calls `thread.shutdown()`, the thread is in `PARTITIONS_REVOKED` state. Thus, `completeShutdown` is not called. The test creates three lingering threads: 2 `Stat...

Re: Changing consumer group in live

In the end you create a new consumer group. That's totally fine. The old consumer group will still "exist", i.e., the committed offsets for the group are still there, until they are eventually purged by the brokers (I think the default is 7 days to keep the metadata of a consumer group after it becomes empty/stale). How long to keep it is configurable -- in general, it's desirable to keep it for some time, as an app could be offline for a while but come back online later and resume where it left off. HTH -Matthias On 7/6/24 10:56 AM, Mohamed Basid wrote: > Hello, > > Let us consider one application in a consumer group consuming events from > Kafka in live. Just like any code change, I changed the consumer group name > of the application to a new group. I understand my application might start > reading from the beginning offset or the latest based on the setting. Is > there any under-the-hood problem...
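The retention window Matthias describes is set broker-side; a config sketch (the 7-day figure matches the documented default of this setting):

```properties
# Broker config: how long committed offsets of an empty/stale consumer
# group are retained before being purged (default 10080 minutes = 7 days).
offsets.retention.minutes=10080
```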

Re: Kafka 20k topics metadata update taking long time

Following up to check if someone can help. Also, is this the right approach, or should I open a Jira ticket for it? Thanks, Amit On Fri, Jul 5, 2024 at 9:42 AM Amit Chopra < amit.chopra@broadcom.com > wrote: > But 20K topics is also not a realistic assumption to have. > This is our existing scale with AWS Kinesis, 20k shards. And we are moving > over to Kafka and thus testing at equivalent scale. We are looking at an > ingestion rate of 2.5 GBps. > > I don't see an alarming difference in the latency results from the two > scenarios. > The output I shared before was showing the latency introduced per > record sent due to the metadata lookup. It is about 3-5 ms per record. > The main aspect is that it causes only ~300 records to be sent per second > with 20k topics, while it can send ~3000 records per second with 1 topic (20k > partitions). > > Also sharing the throughput difference in the test results. > ...

Re: [EXTERNAL] Changing consumer group in live

@Mohamed Depending on topic retention and offsets, and how much data is in the topic, your app could start reprocessing a lot of data and could end up struggling in terms of performance. Jose Manuel Vega Monroy Software Engineer Team Leader Direct: +350 Mobile: +34(0) 633710634 WHG (International) Ltd | 6/1 Waterport Place | Gibraltar | From: Mohamed Basid <luk4mdbasid@gmail.com> Date: Monday, 8 July 2024 at 12:13 To: users@kafka.apache.org <users@kafka.apache.org> Subject: [EXTERNAL] Changing consumer group in live Hello, Let us consider one application in a consumer group consuming events from Kafka in live. Just like any code change, I changed the consumer group name of the application to a new group. I understand my application might start reading from the beginning offset or the latest based on the setting. Is there any under-the-hood problem...

Re: Installing kafka

Hi Christian, Are you sure you are in the same directory you downloaded the kafka_2.13-3.7.1.tgz file to? You might need to change the command to point to the right directory. Best ------------------ Josep Prat Open Source Engineering Director, Aiven josep.prat@aiven.io | +491715557497 | aiven.io Aiven Deutschland GmbH Alexanderufer 3-7, 10117 Berlin Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen Amtsgericht Charlottenburg, HRB 209739 B On Mon, Jul 8, 2024, 12:12 Christian Scharrer <1christian.scharrer@gmx.de.invalid> wrote: > Hi there, > > when trying to install Kafka using the command tar -xzf > kafka_2.13-3.7.1.tgz > > my Mac says no and produces the following error message: tar: Error opening > archive: Failed to open 'kafka_2.13-3.7.1.tgz' > > Any idea how I can still install Kafka? > > Thanks for helping. > Christian
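Josep's suggestion can be made mechanical; a sketch that only extracts when the archive is actually in the current directory (where the browser saved it, often ~/Downloads, is a guess):

```shell
# Extract only if the archive is in the current directory;
# otherwise say where to look instead of failing with tar's error.
if [ -f kafka_2.13-3.7.1.tgz ]; then
  tar -xzf kafka_2.13-3.7.1.tgz && echo extracted
else
  echo "kafka_2.13-3.7.1.tgz not found in $(pwd); cd to your download directory first"
fi
```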

Installing kafka

Hi there, when trying to install Kafka using the command tar -xzf kafka_2.13-3.7.1.tgz my Mac says no and produces the following error message: tar: Error opening archive: Failed to open 'kafka_2.13-3.7.1.tgz' Any idea how I can still install Kafka? Thanks for helping. Christian

Changing consumer group in live

Hello, Let us consider one application in a consumer group consuming events from Kafka live. Just like any code change, I changed the consumer group name of the application to a new group. I understand my application might start reading from the beginning offset or the latest, based on the setting. Is there any under-the-hood problem for the application or the earlier consumer group in making this change? Any resource to read about it? Thanks Basid

Re: Kafka 20k topics metadata update taking long time

Just to make sure I am following correctly, we do not see issues with 1 topic and 20k partitions. With 25 brokers, the load is fairly distributed and we can achieve an aggregate throughput of ~2.5 GBps. With 20k topics of 1 partition each, we see much reduced throughput. Thanks, Amit On Fri, Jul 5, 2024 at 12:08 AM Brebner, Paul <Paul.Brebner@netapp.com.invalid> wrote: > OK so repeating with the Java Kafka producer there is no problem – it's > specific to the Kafka CLI producer! Paul > > From: Brebner, Paul <Paul.Brebner@netapp.com.INVALID> > Date: Friday, 5 July 2024 at 1:21 PM > To: users@kafka.apache.org < users@kafka.apache.org > > Subject: Re: Kafka 20k topics metadata update taking long time > > Repeating my tests today with a bit more caution I can get up to around > 47,000 partitions for a single topic before the producer fails with ...

Re: Kafka 20k topics metadata update taking long time

But 20K topics is also not a realistic assumption to have. This is our existing scale with AWS Kinesis, 20k shards. And we are moving over to Kafka and thus testing at equivalent scale. We are looking at an ingestion rate of 2.5 GBps. I don't see an alarming difference in the latency results from the two scenarios. The output I shared before was showing the latency introduced per record sent due to the metadata lookup. It is about 3-5 ms per record. The main aspect is that it causes only ~300 records to be sent per second with 20k topics, while it can send ~3000 records per second with 1 topic (20k partitions). Also sharing the throughput difference in the test results. Test - 1 topic with 20k partitions ./kafka-producer-perf-test.sh --topic amit-1 --num-records 100000000 --record-size 3000 --throughput -1 --producer-props acks=all bootstrap.servers=test-cluster-kafka-bootstrap:9092 batch.size=3000 buffer.memory=100000000 linger.ms=50 14897 records sen...

Re: MM2, how to configure offset-syncs topic name

Hello, If it's possible for your setup, could you try something like: * Set 2 partitions on topic A * Run both MM2 instances at the same time with the same consumer group but different client IDs. In this way, the Kafka broker should assign one partition per MM2 client ID, and so messages will be "_alternatively_" mirrored based on the distribution of messages over the partitions of topic A, I guess. Best regards On 7/4/24 09:38, gustavo panizzo wrote: > Hello > > I have a source Kafka cluster and 2 destination Kafka clusters to which > I want to mirror messages _alternatively_; with this I mean that if > topic A was mirrored up to offset=3 to destination cluster 1, I want > MirrorMaker to mirror offset=4 onwards to destination cluster 2 > > I don't care about consumer group offset replication, I only care about > the messages > > > A way I think I can make this work is to have both mm2 instances > configured (but on...

Re: Kafka 20k topics metadata update taking long time

OK so repeating with the Java Kafka producer there is no problem – it's specific to the Kafka CLI producer! Paul From: Brebner, Paul <Paul.Brebner@netapp.com.INVALID> Date: Friday, 5 July 2024 at 1:21 PM To: users@kafka.apache.org < users@kafka.apache.org > Subject: Re: Kafka 20k topics metadata update taking long time Repeating my tests today with a bit more caution I can get up to around 47,000 partitions for a single topic before the producer fails with a bootstrap-broker-disconnected warning (in practice the producer cannot send). Here's a graph of the producer time (to send 1k messages) using the producer CLI with increasing partitions – it blows up near the end. No error logs on the Kafka brokers or controllers, and everything else still works – i.e. I can still increase partitions on the big topic, and can still produce/consume on another topic with 3 partitions. RF=3 and 3x4 core brokers a...