
Showing posts from October, 2021

Re: Kafka Subscription

Hi Sabari, To subscribe, you should send an email to users-subscribe@kafka.apache.org, not users@kafka.apache.org. Thanks. Luke On Sun, Oct 31, 2021 at 11:46 PM Sabari MayaKrishnan < sabarim@teezle.com > wrote: > Hi, > > For Kafka subscription >

Re: Stream to KTable internals

Thank you for your response and the links to the presentations. *However, this seems to be orthogonal to your issue?* Yes. From what I see in the code it looks like you have a single consumer subscribed to multiple topics. Please correct me if I'm wrong. *By default, timestamp synchronization is disabled. Maybe enabling it would help?* We are using a timestamp extractor that returns 0. We did that because we were almost always missing joins on startup, and this seemed to be the only way to bootstrap enough records at startup to avoid the missed joins. We found a post that said doing that would make the KTable act like a GlobalKTable at startup. So far this works great; we never miss a join on startup. If I use timestamp synchronization, do I have to remove the zero timestamp extractor? If I remove the zero timestamp extractor, will timestamp synchronization take care of the missed-join issue on startup? I'm guessing the issue here is that ...
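The trade-off discussed above can be sketched with a toy model. This is not Kafka code (the real mechanism is governed by `max.task.idle.ms` and the stream-time logic inside Kafka Streams); it is a stdlib-only illustration, with invented names, of why forcing every timestamp to 0 defeats timestamp synchronization:

```python
# Toy illustration (NOT Kafka internals) of timestamp synchronization
# in a stream-table join. All names here are invented for the sketch.

def next_partition(buffers):
    """Pick the buffered partition with the smallest head-record
    timestamp, mimicking how timestamp synchronization prefers the
    oldest data across the join's input partitions."""
    candidates = {p: recs[0][0] for p, recs in buffers.items() if recs}
    return min(candidates, key=candidates.get)

# (timestamp, payload) records buffered per input partition
buffers = {
    "ktable-p0":  [(100, "table-update")],
    "kstream-p0": [(200, "stream-event")],
}

# With real timestamps, the older KTable record is picked first, so the
# join sees the table update before the stream event arrives.
assert next_partition(buffers) == "ktable-p0"

# A timestamp extractor that returns 0 makes every record look equally
# old, so the choice is no longer timestamp-driven (tie-breaking only).
zeroed = {p: [(0, r[1]) for r in recs] for p, recs in buffers.items()}
print(next_partition(zeroed))  # arbitrary tie-break, not a guarantee
```

Under this toy model, removing the zero extractor and enabling synchronization would let the table side be consumed first whenever its records genuinely carry older timestamps.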

Producer Timeout issue in kafka streams task

Hi All, I am getting the below issue in a streams application. The Kafka cluster is a 3-broker cluster (v2.5.1), and I could see that 2 of the 3 brokers restarted at the same time the below exception occurred in the streams application, so I can relate the exception to those broker restarts. However, what is worrying me is that the streams application did not process any events after the exception. So the questions are: 1. How can I make the streams application resilient to broker issues? E.g., the producer underneath streams should have connected to another broker instance when the first broker went down, but possibly the 2nd broker went down immediately after, and that's why it timed out. 2. In general, how does streams handle broker issues, and when does it decide to connect to another broker instance in case one instance seems to be in error? {"@timestamp":"2021-10-30T12:19:43.486+00:00","@version":"1","message":"Exception processing proc...
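For question 1, the knobs that decide how long a send survives a broker outage are producer timeouts plus (since Kafka 2.8 / KIP-572) a Streams-level task timeout. A hedged sketch of the relevant settings; the values shown are the documented defaults, and whether raising them helps depends on how long both brokers stayed down:

```properties
# Producer settings, passed through Kafka Streams via the "producer." prefix.
producer.delivery.timeout.ms=120000   # total time a record send may retry
producer.request.timeout.ms=30000     # per-request wait before retrying elsewhere
# Streams-level (2.8+, KIP-572): how long a task tolerates repeated client
# timeouts before Streams surfaces the error.
task.timeout.ms=300000
```

The producer discovers live brokers from metadata automatically, so no explicit "connect to another broker" setting exists; it retries against whichever broker becomes the partition leader, within the timeouts above.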

Re: Stream to KTable internals

Yes, a StreamThread has one consumer. The number of StreamThreads per instance is configurable via `num.stream.threads`. Partitions are assigned to threads similar to how partitions are assigned to consumers in a plain consumer group. It seems you run with the default of one thread per instance. As you spin up 12 instances, this results in 12 threads for the application. As you have 12 partitions, using more threads won't be useful, as no partitions would be left for them to process. For a stream-table join, there will be one task per "partition pair" that computes the join for those partitions. So you get 12 tasks, and each thread processes one task in your setup. I.e., a thread's consumer is reading data for both input topics. Pausing happens on a per-partition basis: for joins, there are two buffers per task (one for each input topic partition). It's possible that one partition is paused while the other is processed. However, this seems to be orthogonal to your issue? For a G...
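The task-to-thread arithmetic above can be sketched with a toy assignor. Kafka's real assignor is stickier and more involved; this stdlib-only round-robin sketch only shows why 12 partitions cap you at 12 useful threads:

```python
# Toy round-robin task assignment (NOT Kafka's actual assignor).
# One join task exists per partition pair; tasks spread across threads.

def assign_tasks(num_partitions, num_threads):
    tasks = [f"task-0_{p}" for p in range(num_partitions)]
    assignment = {t: [] for t in range(num_threads)}
    for i, task in enumerate(tasks):
        assignment[i % num_threads].append(task)
    return assignment

# 12 partitions, 12 single-threaded instances -> exactly one task each
a = assign_tasks(12, 12)
assert all(len(ts) == 1 for ts in a.values())

# More threads than partitions -> idle threads, as the reply notes
b = assign_tasks(12, 16)
assert sum(len(ts) == 0 for ts in b.values()) == 4
```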

Re: gRPC with Kafka

Hi Matthew, If you can give more details about the course objectives, that would help me understand where your questions are coming from, but I think they have to do with a mix-up between message formats and communication mechanisms. REST and gRPC are communication mechanisms, while JSON and Protocol Buffers are message encoding formats. Kafka has its own binary protocol that clients use when interacting with the brokers, but the messages can be encoded in Protocol Buffers, JSON, Avro, CSV, or any binary format of your preference. So when you say gRPC message, I assume you are referring to messages from the producers encoded in Protocol Buffers format. The Kafka Connect framework and certain Kafka clients (producers, consumers, streams) support using Protocol Buffers (alongside the Confluent Schema Registry), so you can use it in your apps in languages where this is supported. The console sc...
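The encoding-agnostic point above can be shown with a stand-in producer. `FakeProducer` below is invented for the sketch (a real client such as kafka-python exposes a similar `send(topic, value)` taking bytes); the broker never inspects the encoding:

```python
# Sketch: Kafka transports opaque bytes, so JSON, Protobuf, Avro, etc.
# are all equally valid payloads. FakeProducer is a stand-in for a real
# Kafka client's send() API, not an actual library class.
import json

class FakeProducer:
    def __init__(self):
        self.sent = []
    def send(self, topic, value: bytes):
        # A real broker stores the value without decoding it.
        self.sent.append((topic, value))

producer = FakeProducer()
# A JSON-encoded location message...
producer.send("locations", json.dumps({"lat": 1.0, "lon": 2.0}).encode())
# ...and a (faked) Protobuf-encoded one are both just bytes on the wire.
producer.send("locations", b"\x0a\x08fake-pb")

assert all(isinstance(v, bytes) for _, v in producer.sent)
```

So "passing gRPC messages through Kafka" usually just means producing Protobuf-serialized bytes and deserializing them in the consumer.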

Re: Stream to KTable internals

Yes, this helped. I have some additional questions. Does a StreamThread have one consumer? (Looks like it, but I just want to confirm.) Is there a separate StreamThread for each topic, including the KTable? If a KTable has its own StreamThread and there is a StreamTask for that KTable, could my buffer be getting filled up, and the mainConsumer for the KTable be getting paused? I see this code in StreamTask#addRecords: // if after adding these records, its partition queue's buffered size has been // increased beyond the threshold, we can then pause the consumption for this partition if (newQueueSize > maxBufferedSize) { mainConsumer.pause(singleton(partition)); } Is there any specific logging that I can set to debug or trace that would help me troubleshoot? I'd prefer not to turn debug and/or trace on for every single class. Thanks, Chad On Sat, Oct 30, 2021 at 5:20 AM Luke Chen < showuon@gmail.com > wrote: ...
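The quoted Java snippet boils down to a per-partition threshold check. A stdlib-only re-creation (invented class names, not Kafka's actual internals; the real threshold is the `buffered.records.per.partition` config, which defaults to 1000):

```python
# Sketch of the pause logic quoted from StreamTask#addRecords above.
# Once a partition's queue exceeds the buffer threshold, the main
# consumer pauses fetching for that partition until the task drains it.
MAX_BUFFERED = 1000  # plays the role of maxBufferedSize in the Java code

class FakeConsumer:
    def __init__(self):
        self.paused = set()
    def pause(self, partitions):
        self.paused |= set(partitions)

def add_records(queue, records, consumer, partition):
    queue.extend(records)
    if len(queue) > MAX_BUFFERED:
        consumer.pause({partition})
    return len(queue)

consumer = FakeConsumer()
queue = []
add_records(queue, ["rec"] * 1001, consumer, "ktable-topic-0")
assert "ktable-topic-0" in consumer.paused  # over threshold -> paused
```

Note the pause is temporary back-pressure, not an error: fetching resumes once the task catches up, so a paused KTable partition alone would only delay, not drop, table updates.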

gRPC with Kafka

I am involved with a course whose project has introduced REST, Kafka, and gRPC in its section on message passing. However, in the project phase of this section, we are tasked with passing gRPC messages through Kafka to handle large amounts of location data. Has anyone used these two together? The course teaches them separately and uses the Kafka console scripts, so it's a bit confusing trying to put the two together. Should I utilize Kafka's Python implementation? Matthew Glassman

Re: Stream to KTable internals

Thank you for the information. This is actually not my issue. In my scenario, the records were written to the KTable topic before the record was written to the KStream topic. We only get a handful of these issues per day, and they seem to happen when many transactions are being run through the system. This stream application was previously running using a GlobalKTable, and we saw one incident in 30 days. Now with the KTable we see them a few times a day. This is not happening during startup; these pods have been running for a few days. When we switched from the GlobalKTable, we went from 3 running instances to 12 running instances of our application (we have 12 partitions on each topic). We also moved from Kafka 2.8.0 client libraries to 3.0.0. When these happen, we usually figure it out about an hour later, and to fix the issue we just put the exact same message back on the KStream topic and everything processes correctly. Does GlobalKTable read records from the topic differen...

Re: Stream to KTable internals

Hi Chad, > I'm wondering if someone can point me to the Kafka streams internal code that reads records for the join? --> You can check StreamThread#pollPhase, where the stream thread (main consumer) periodically polls records. It'll then process each topology node with these polled records in stream tasks (StreamTask#process). Hope that helps. Thanks. Luke On Sat, Oct 30, 2021 at 5:42 PM Gilles Philippart <gilles.philippart@fundingcircle.com.invalid> wrote: > Hi Chad, this talk around 24:00 clearly explains what you're seeing > https://www.confluent.io/events/kafka-summit-europe-2021/failing-to-cross-the-streams-lessons-learned-the-hard-way/ > > Gilles > > > On 30 Oct 2021, at 04:02, Chad Preisler < chad.preisler@gmail.com > wrote: > > > > Hello, > ...

Re: Stream to KTable internals

Hi Chad, this talk around 24:00 clearly explains what you're seeing: https://www.confluent.io/events/kafka-summit-europe-2021/failing-to-cross-the-streams-lessons-learned-the-hard-way/ Gilles > On 30 Oct 2021, at 04:02, Chad Preisler < chad.preisler@gmail.com > wrote: > > Hello, > > I have a stream application that does a KStream to KTable left join. We > seem to be occasionally missing joins (KTable side is null). > > I'm wondering if someone can point me to the Kafka streams internal code > that reads records for the join? I've poked around the Kafka code base, but > there is a lot there. I imagine there is some consumer poll for each side > of the join, and possibly a background thread for reading the KTable topic. > > I figure there are several possible causes of this issue, and s...

Stream to KTable internals

Hello, I have a stream application that does a KStream to KTable left join. We seem to be occasionally missing joins (KTable side is null). I'm wondering if someone can point me to the Kafka streams internal code that reads records for the join? I've poked around the Kafka code base, but there is a lot there. I imagine there is some consumer poll for each side of the join, and possibly a background thread for reading the KTable topic. I figure there are several possible causes of this issue, and since nothing is really jumping out in my code, I was going to start looking at the Kafka code to see if there is something I can do to fix this. Thanks, Chad

Re: Contributors list

Hi Shylaja, You're all set now. Thanks for your interest in Apache Kafka. -Bill On Thu, Oct 28, 2021 at 9:20 PM Kokoori, Shylaja < shylaja.kokoori@intel.com > wrote: > Hi, > > I have created a ticket and would like to work on it. Documentation says > "you need to be in the contributors list of Apache Kafka in order to be > assigned to a JIRA ticket". > Can I be added to the contributors list? > Full name is Shylaja Kokoori > > Thanks, > -Shylaja > - > > >

Re: New Kafka Consumer : unknown member id

Hi, Which version of the Kafka client are you using? I can't find this error message in the source code. When googling this error message, it showed the error is from Kafka v0.9. Could you try v3.0.0 and see if the issue still exists? Thank you. Luke On Thu, Oct 28, 2021 at 11:15 PM Kafka Life < lifekafka999@gmail.com > wrote: > Dear Kafka Experts > > We have set up a group.id (consumer) = XXXXYYY > But when I tried to connect to the Kafka instance, I get this error message. I > am sure this consumer (group id) does not exist in Kafka. We use the plain > text protocol to connect to Kafka 2.8.0. Please suggest how to resolve this > issue. > > DEBUG AbstractCoordinator:557 - [Consumer clientId=XXXXX, groupId=XXXXYYY] > Attempt to join group failed due to unknown member id. >

Contributors list

Hi, I have created a ticket and would like to work on it. Documentation says "you need to be in the contributors list of Apache Kafka in order to be assigned to a JIRA ticket". Can I be added to the contributors list? Full name is Shylaja Kokoori Thanks, -Shylaja -

Re: MirrorMaker 2 - Accessing State Store ChangeLogs

Hi All, Does anybody have inputs on my question below? Regards, Neeraj > On 26 Oct 2021, at 7:12 am, Neeraj Vaidya <neeraj.vaidya@yahoo.co.in.invalid> wrote: > > Hello Experts > Any comments on this ? > > Regards, > Neeraj > > >> On 24 Oct 2021, at 11:55 am, Neeraj Vaidya < neeraj.vaidya@yahoo.co.in > wrote: >> >> Hi All, >> In an Active-Active MM2 (MirrorMaker 2) setup, I have the following Data Centres (DC1, DC2): >> >> DC1: >> Topic T1 >> Kafka Streams which consumes from T1 and updates a local state store (example: "MyStateStore"). The applicationId is "myapp". >> >> DC2: >> Topic T1 >> The MM2 process replicates the records from T1 to DC1.T1 on this data centre. >> MM2 also replicates myapp-MyStateStore-changelog to DC1.myapp-MyStateStore-changelog. >> >> MM2 configuration: >> "sync.g...

Re: Kafka streams - message not materialized in intermediate topics

Hi Tomer, To dump the topology you can do `System.out.println(topology.describe().toString())`. But if you can post just the code that would be fine as well. I understand about the logs, one thing to do is grep out any sensitive information, but I get it if you can't do that. Thanks, Bill On Thu, Oct 28, 2021 at 1:23 PM Tomer Cohen < ilan012@gmail.com > wrote: > Hi Bill, > > Is there an easy way to dump the topology to share? > > The logs contain sensitive information, is there something else that can be > provided? > > Thanks, > > Tomer > > On Thu, Oct 28, 2021 at 12:23 PM Bill Bejeck <bill@confluent.io.invalid> > wrote: > > > Hi Tomer, > > > > Can you share your topology and any log files? > > > > Thanks, > > Bill > > > > On Thu, Oct 28, 2021 at 12:07 PM Tomer Cohen < ilan012@gmail.com > wrote: > > > > > Hi Bill/Matth...

Re: Kafka streams - message not materialized in intermediate topics

Hi Bill, Is there an easy way to dump the topology to share? The logs contain sensitive information, is there something else that can be provided? Thanks, Tomer On Thu, Oct 28, 2021 at 12:23 PM Bill Bejeck <bill@confluent.io.invalid> wrote: > Hi Tomer, > > Can you share your topology and any log files? > > Thanks, > Bill > > On Thu, Oct 28, 2021 at 12:07 PM Tomer Cohen < ilan012@gmail.com > wrote: > > > Hi Bill/Matthias, > > > > Thanks for the replies. > > > > The issue is I never see a result, I have a log that shows the message > > coming in, but the adder/subtractor is never invoked for it even though > it > > should. So no result gets published to the intermediate topic I have. > > > > Thanks, > > > > Tomer > > > > On Thu, Oct 28, 2021 at 11:57 AM Bill Bejeck <bill@confluent.io.invalid> > > wrote: > > ...

Re: Kafka streams - message not materialized in intermediate topics

Hi Tomer, Can you share your topology and any log files? Thanks, Bill On Thu, Oct 28, 2021 at 12:07 PM Tomer Cohen < ilan012@gmail.com > wrote: > Hi Bill/Matthias, > > Thanks for the replies. > > The issue is I never see a result, I have a log that shows the message > coming in, but the adder/subtractor is never invoked for it even though it > should. So no result gets published to the intermediate topic I have. > > Thanks, > > Tomer > > On Thu, Oct 28, 2021 at 11:57 AM Bill Bejeck <bill@confluent.io.invalid> > wrote: > > > Tomer, > > > > As Matthias pointed out for a single, final result you need to use the > > `suppress()` operator. > > > > But back to your original question, > > > > they are processed by the adder/subtractor and are not > > > materialized in the intermediate topics which causes them not to be > > > outputted ...

Re: Kafka streams - message not materialized in intermediate topics

Hi Bill/Matthias, Thanks for the replies. The issue is I never see a result, I have a log that shows the message coming in, but the adder/subtractor is never invoked for it even though it should. So no result gets published to the intermediate topic I have. Thanks, Tomer On Thu, Oct 28, 2021 at 11:57 AM Bill Bejeck <bill@confluent.io.invalid> wrote: > Tomer, > > As Matthias pointed out for a single, final result you need to use the > `suppress()` operator. > > But back to your original question, > > they are processed by the adder/subtractor and are not > > materialized in the intermediate topics which causes them not to be > > outputted in the final topic > > > > Is the issue you never see a result or were you curious about the > intermediate calculations? > HTH, > Bill > > On Thu, Oct 28, 2021 at 1:05 AM Matthias J. Sax < mjsax@apache.org > wrote: > > > For this...

Re: Kafka streams - message not materialized in intermediate topics

Tomer, As Matthias pointed out, for a single final result you need to use the `suppress()` operator. But back to your original question: > they are processed by the adder/subtractor and are not > materialized in the intermediate topics which causes them not to be > outputted in the final topic > Is the issue that you never see a result, or were you curious about the intermediate calculations? HTH, Bill On Thu, Oct 28, 2021 at 1:05 AM Matthias J. Sax < mjsax@apache.org > wrote: > For this...

New Kafka Consumer : unknown member id

Dear Kafka Experts, We have set up a group.id (consumer) = XXXXYYY. But when I tried to connect to the Kafka instance, I get this error message. I am sure this consumer (group id) does not exist in Kafka. We use the plain text protocol to connect to Kafka 2.8.0. Please suggest how to resolve this issue. DEBUG AbstractCoordinator:557 - [Consumer clientId=XXXXX, groupId=XXXXYYY] Attempt to join group failed due to unknown member id.

Re: Kafka streams - message not materialized in intermediate topics

For this case, you can call `aggregate(...).suppress()`. -Matthias On 10/27/21 12:42 PM, Tomer Cohen wrote: > Hi Bill, > > Thanks for the prompt reply. > > Setting to 0 forces a no collection window, so if I get 10 messages to > aggregate for example, it will send 10 updates. But I only want to publish > the final state only. > > Thanks, > > Tomer > > On Wed, Oct 27, 2021 at 2:10 PM Bill Bejeck <bill@confluent.io.invalid> > wrote: > >> Hi Tomer, >> >> From the description you've provided, it sounds to me like you have a >> stateful operation. >> >> The thing to keep in mind with stateful operations in Kafka Streams is that >> every result is not written to the changelog and forwarded downstream. >> Kafka Streams uses a cache for stateful operations and it's only on cache >> flush either when it's full or when Kafka Streams commits (ev...

Re: Apache Kafka : start up scripts

Start here https://github.com/apache/kafka/blob/trunk/bin/kafka-server-start.sh https://github.com/apache/kafka/tree/trunk/core/src/main/scala/kafka/server Also take a look at the logs when the server starts up. That should give you some insights. On Wed, Oct 27, 2021 at 5:03 PM Kafka Life < lifekafka999@gmail.com > wrote: > Dear Kafka experts > > when a broker is started using the start script, could any of you please let > me know the sequence of steps that happens in the background until the node is > UP > > like: when the script is initiated to start, > 1/ is it checking indexes? > 2/ is it checking ISR? > 3/ is URP being brought to zero? > > i tried to look in the server log but could not understand the sequence of > events performed till the node was up .. could someone please help .. > > Thanks >

Apache Kafka : start up scripts

Dear Kafka experts, when a broker is started using the start script, could any of you please let me know the sequence of steps that happens in the background until the node is UP? Like: when the script is initiated to start, 1/ is it checking indexes? 2/ is it checking ISR? 3/ is URP being brought to zero? I tried to look in the server log but could not understand the sequence of events performed until the node was up. Could someone please help? Thanks

Re: Kafka streams - message not materialized in intermediate topics

Hi Bill, Thanks for the prompt reply. Setting it to 0 forces no collection window, so if I get 10 messages to aggregate, for example, it will send 10 updates. But I want to publish only the final state. Thanks, Tomer On Wed, Oct 27, 2021 at 2:10 PM Bill Bejeck <bill@confluent.io.invalid> wrote: > Hi Tomer, > > From the description you've provided, it sounds to me like you have a > stateful operation. > > The thing to keep in mind with stateful operations in Kafka Streams is that > every result is not written to the changelog and forwarded downstream. > Kafka Streams uses a cache for stateful operations and it's only on cache > flush either when it's full or when Kafka Streams commits (every 30 seconds > by default) that Kafka Streams writes the results of the stateful > operations to the changelog and forwards the records downstream to other > processors. > > If you want every Kafka Streams...

Re: Kafka streams - message not materialized in intermediate topics

Hi Tomer, From the description you've provided, it sounds to me like you have a stateful operation. The thing to keep in mind with stateful operations in Kafka Streams is that not every result is written to the changelog and forwarded downstream. Kafka Streams uses a cache for stateful operations, and it's only on cache flush, either when it's full or when Kafka Streams commits (every 30 seconds by default), that Kafka Streams writes the results of the stateful operations to the changelog and forwards the records downstream to other processors. If you want Kafka Streams to forward every record, you'll need to set `StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG` to 0. If I haven't understood your experience accurately, can you provide a few more details? Thanks, Bill On Wed, Oct 27, 2021 at 9:48 AM Tomer Cohen < ilan012@gmail.com > wrote: > Hello Kafka team, > > I am seeing an odd behavior when using kafka streams...
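The cache-flush behavior described above can be modeled with a toy class. This is not Kafka code (the real cache is byte-sized via `cache.max.bytes.buffering` and also flushes on `commit.interval.ms`); the sketch only shows why intermediate updates disappear and why a size of 0 forwards everything:

```python
# Toy model (NOT Kafka internals) of the Streams record cache: results
# of a stateful operation reach the changelog / downstream only on
# cache flush; flushing also happens on commit (default: every 30s).

class AggCache:
    def __init__(self, max_entries):
        self.max_entries = max_entries  # stands in for the byte limit
        self.dirty = {}
        self.forwarded = []
    def put(self, key, value):
        self.dirty[key] = value        # newer value overwrites older one
        if len(self.dirty) > self.max_entries:
            self.flush()
    def flush(self):                   # cache full, or Streams commits
        self.forwarded.extend(self.dirty.items())
        self.dirty.clear()

# With a cache, 10 updates to one key collapse into one forwarded record.
cached = AggCache(max_entries=8)
for i in range(10):
    cached.put("k", i)
cached.flush()
assert cached.forwarded == [("k", 9)]

# With cache size 0 (CACHE_MAX_BYTES_BUFFERING_CONFIG = 0), every
# single update is forwarded downstream.
uncached = AggCache(max_entries=0)
for i in range(3):
    uncached.put("k", i)
assert [v for _, v in uncached.forwarded] == [0, 1, 2]
```

This also shows the flip side raised earlier in the thread: disabling the cache produces an update per input record, which is why `suppress()` is the tool for emitting only a single final result.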

Kafka streams - message not materialized in intermediate topics

Hello Kafka team, I am seeing an odd behavior when using Kafka Streams. During periods of heavier volume, there are messages coming in. However, it does not look like they are processed by the adder/subtractor, and they are not materialized in the intermediate topics, which causes them not to be output to the final topic. Is there any way to debug this, or to log when a message is dropped in the stream and not processed for whatever reason? Thanks, Tomer

Replica leader election stops when zk connection is re-established

Hi all! We have a problem with 6 of our Kafka clusters since we upgraded to 2.8.0 from 2.3.1 a few months back. A seventh cluster is still on 2.3.1 and never had this problem. A cluster runs fine for a random period, days or weeks. Then suddenly, newly created topics never get assigned partitions: they get no ISR, and the leader is "none". When using zkCli to browse the topic, it has no partitions. When this happens, we have been forced to restart the Kafka service on the "controller" host, which causes a new controller to be elected, and that solves the problem. I've found out that after the Zookeeper leader host rebooted, the Kafka "controller" host stopped, with "Processing automatic preferred replica leader election" messages in the log, even though it reconnected fine. This seems related. When trying to run kafka-leader-election.sh (using --bootstrap-server) for all topics, it fails saying that none of the partitions/top...

Re: Apache Kafka mirrormaker2 : sync.group.offsets.enabled and sync.group.offsets.interval.seconds

Hello All, Any inputs on my question below ? Regards, Neeraj Sent from my iPhone > On 26 Oct 2021, at 10:29 am, Neeraj Vaidya <neeraj.vaidya@yahoo.co.in.invalid> wrote: > > Anyone has inputs on my question below ? > > Regards, > Neeraj > > Sent from my iPhone > >> On 24 Oct 2021, at 10:23 pm, Neeraj Vaidya <neeraj.vaidya@yahoo.co.in.invalid> wrote: >> >> When replicating data from one datacentre (DC1) to another (DC2) using MM2, I can see that the consumer offsets are actually getting synchronized from source to target cluster in realtime. And not really at the default sync frequency time of 60 seconds. >> >> I have enabled both the above config options (in the subject of this post) in MM2 config. >> >> The way I tested this is by making consumers listen to the topic , say T1 in source cluster as well as DC2.T1 in target cluster. >> >> If I produce a messa...

Re: [kafka-clients] [VOTE] 2.7.2 RC0

Thanks Bill. That is greatly appreciated :) We need more PMC members with binding votes to participate. You can do it! On Tue, Oct 26, 2021 at 1:25 PM Bill Bejeck < bbejeck@gmail.com > wrote: > Hi Mickael, > > Thanks for running the release. > > Steps taken > > 1. Validated checksums > 2. Validated signatures > 3. Built from source > 4. Ran all the unit tests > 5. Spot checked various JavaDocs > > > +1(binding) > > On Tue, Oct 26, 2021 at 4:43 AM Luke Chen < showuon@gmail.com > wrote: > >> Hi Mickael, >> >> Thanks for the release. I did: >> 1. Verified checksums and signatures >> 2. Run quick start steps >> 3. Verified the CVE-2021-38153 is indeed fixed in kafka-2.7.2-src.tgz >> < https://home.apache.org/~mimaison/kafka-2.7.2-rc0/kafka-2.7.2-src.tgz >. >> >> +1 (non-binding) >> >> Thank you. >...

Re: [kafka-clients] [VOTE] 2.7.2 RC0

Hi Mickael, Thanks for running the release. Steps taken 1. Validated checksums 2. Validated signatures 3. Built from source 4. Ran all the unit tests 5. Spot checked various JavaDocs +1(binding) On Tue, Oct 26, 2021 at 4:43 AM Luke Chen < showuon@gmail.com > wrote: > Hi Mickael, > > Thanks for the release. I did: > 1. Verified checksums and signatures > 2. Run quick start steps > 3. Verified the CVE-2021-38153 is indeed fixed in kafka-2.7.2-src.tgz > < https://home.apache.org/~mimaison/kafka-2.7.2-rc0/kafka-2.7.2-src.tgz >. > > +1 (non-binding) > > Thank you. > Luke > > On Tue, Oct 26, 2021 at 3:41 PM Tom Bentley < tbentley@redhat.com > wrote: > > > Hi Mickael, > > > > As with 2.6.3 RC0, I have: > > > > * Verified checksums and signatures > > * Built jars and docs from the source jar > > * Run the unit and integration tests...