
Posts

Showing posts from December, 2021

Re: [VOTE] 3.1.0 RC0

I have run some initial validations using the steps here: https://github.com/izzyacademy/apache-kafka-release-party
- Validation of Hashes looks good
- Validation of GPG Keys looks good
- Validation of Source Code was skipped since the build for the 3.1 branch already passed
- Validation of Site Documentation looks good for 3.1.0 and the other versions
- Running Multi-Node Brokers in Legacy Mode (with ZooKeeper) looks good with no issues
- Running Brokers in KRaft Mode (without ZooKeeper) had some issues with the node configurations; I am still debugging to see whether it is a user error on my end or an issue with the release candidate
It appears I may have run into some issues from changes checked in for KAFKA-13456 to restrict configuration of nodes in KRaft mode. I will spend some time on this and report back next week. Thanks, David, for running the release. Israel Ekpo Lead Instructor, IzzyAcademy.com https://w...

Re: Kafka Topics

Hi Ola, I would suggest you go with a single topic with multiple partitions. Once the data is received from the topic, you can do a DB update to store it and then use the data for analysis. The site below can also be used for topic sizing: eventsizer.io Thanks C Suresh On Thursday, December 30, 2021, Ola Bissani < ola.bissani@easysoft.com.lb > wrote: > Dears, > > I'm looking for a way to get real-time updates using my service. I believe > Kafka is the way to go, but I still have an issue with how to use it. > > My system gets data from devices using GPRS; I then read this data and > analyze it to check what action I should take afterwards. I need the > analyzing step to be as fast as possible. I was thinking of two options: > > The first option is to gather all the data sent from all the devices into > one huge topic and then get all the data from this topic and analyze > i...

Re: Kafka-Real Time Update

That really was a helpful overview, Israel. Might make a good blog post! šŸ˜€ Ola, C# would make it so that you can't use Kafka Streams, but you may not need it. The Kafka Consumer API, which is available in C#, might be enough for you. For a good explanation of topics, partitions, and pretty much everything else Israel mentioned, I would suggest you go to http://developer.confluent.io There you'll find free video courses, quick-starts, tutorials, and more. Sounds like you are at the beginning of an exciting journey! Enjoy! Dave > On Dec 30, 2021, at 8:29 AM, Ola Bissani < ola.bissani@easysoft.com.lb > wrote: > > Dear Israel, > > Thank you so much for your support, I will check the links you sent in your email to start my service. > > As for your question, yes the events generated by the devices are similar in data structures. I would also like to state that my service will be either done in java or C#. Would using...

RE: Kafka-Real Time Update

Dear Israel, Thank you so much for your support. I will check the links you sent in your email to start my service. As for your question, yes, the events generated by the devices have similar data structures. I would also like to state that my service will be written in either Java or C#. Would using C# be an issue? Also, is there a link you recommend I check before writing my code? I have one more question: in your mail you mentioned using one topic with many partitions. The number of devices I'm using is dynamic; are you suggesting I create a partition for each device, and would that be possible if I don't know the exact number of devices I have? Or should I create multiple partitions for the purpose of multi-processing only? Thank you, Best Regards Ola Bissani Developer Manager Easysoft Mobile Lebanon: +961 3 61 16 90 Office Lebanon: +961 1 33 55 15/17 Email: ola.bissani@easysoft.com.lb web s...

Re: Kafka-Real Time Update

Ola, Let's review the Apache Kafka ecosystem briefly, and then I will attempt to address your concerns. The Kafka ecosystem has the following components:
- Brokers (store events in logical containers called Topics; Topics are analogous to tables in relational databases like MySQL or PostgreSQL)
- Producers (generate events and send them to the brokers for storage)
- Consumers (pick up events from the Topics and process or consume them)
- Streams (at a high level, combines the Consumer and Producer mechanisms to process events in near real time and send them back to the Topics)
- Schema Registry (keeps track of the data structures in the topics; can be used with Avro, JSON, and Protobuf formats)
https://kafka.apache.org/documentation/#api https://github.com/confluentinc/schema-registry There are two main things to consider in your scenario. Each of the devices is a prospective Producer of events that will be sent to the topic. Y...
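To illustrate the "one topic, many partitions, keyed by device" idea discussed in this thread: Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count, so all events carrying the same key land in the same partition and are consumed in order. A minimal Python sketch of that idea, using `zlib.crc32` as a stand-in hash and an assumed partition count of 12 (both are illustrative choices, not Kafka's actual implementation):

```python
import zlib

NUM_PARTITIONS = 12  # assumed partition count for illustration

def partition_for(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a device ID to a partition the way a keyed producer would:
    hash the key, then take it modulo the partition count. Kafka's real
    default partitioner uses murmur2 on the key bytes; crc32 stands in
    here only to show the mechanism."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions

# The same device always maps to the same partition, so per-device
# ordering is preserved without needing one topic (or one partition)
# per device, even when the device fleet is dynamic.
```

This is why the number of partitions can stay fixed while the number of devices grows: partitions are a unit of parallelism for consumers, not a per-device mailbox.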

Kafka-Real Time Update

Dears, I'm looking for a way to get real-time updates using my service. I believe Kafka is the way to go, but I still have an issue with how to use it. My system gets data from devices using GPRS; I then read this data and analyze it to check what action I should take afterwards. I need the analyzing step to be as fast as possible. I was thinking of two options: The first option is to gather all the data sent from all the devices into one huge topic and then get all the data from this topic and analyze it. The downside of this option is that the data analysis step delays my work, since I have to loop through the topic data; on the other hand, the advantage is that I have a manageable number of topics (only 1 topic). The other option is to divide the data I'm gathering into several small topics by allowing each device to have its own topic. Take into consideration that the number of devices is large; I'm talking about more than 5,000 devices. The downside o...

RE: Kafka Topics

Dears, I'm looking for a way to get real-time updates using my service. I believe Kafka is the way to go, but I still have an issue with how to use it. My system gets data from devices using GPRS; I then read this data and analyze it to check what action I should take afterwards. I need the analyzing step to be as fast as possible. I was thinking of two options: The first option is to gather all the data sent from all the devices into one huge topic and then get all the data from this topic and analyze it. The downside of this option is that the data analysis step delays my work, since I have to loop through the topic data; on the other hand, the advantage is that I have a manageable number of topics (only 1 topic). The other option is to divide the data I'm gathering into several small topics by allowing each device to have its own topic. Take into consideration that the number of devices is large; I'm talking about more than 5,000 devices. The downside of t...

Re: Kafka Streams - one topic moves faster the other one

Hi Miguel, Yes, the grace period is the solution to the problem. Alternatively, you can try setting a higher value for the "max.task.idle.ms" configuration, because this is a kind of out-of-order data. Let's say topic A has 1 record per second (fast) and topic B has 1 record per minute (slow). You can set "max.task.idle.ms" to 60 seconds or higher to force the stream to wait up to 1 minute for the empty topic B before processing the records. From the documentation: https://kafka.apache.org/30/documentation/streams/developer-guide/config-streams#max-task-idle-ms *Any config value greater than zero indicates the number of extra milliseconds that Streams will wait if it has a caught-up but empty partition. In other words, this is the amount of time to wait for new data to be produced to the input partitions to ensure in-order processing of data in the event of a slow producer.* Hope it helps. Thank you. Luke On Thu, Dec 30, 202...
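A toy Python model of the behavior described above (Kafka Streams itself is a Java library, so this only illustrates the idea, and the record values are invented): eagerly draining whichever partition has data yields out-of-order event times, while waiting on the caught-up-but-empty partition, which is what a large enough max.task.idle.ms buys you, allows a timestamp-ordered merge.

```python
import heapq

# Records are (timestamp_ms, topic, value). Topic A produces once a second,
# topic B once a minute -- the fast/slow mismatch from the thread.
topic_a = [(1000, "A", "a1"), (2000, "A", "a2"), (61000, "A", "a3")]
topic_b = [(60000, "B", "b1")]

# Eager processing: the task drains whichever partition has data, so B's
# record at t=60000 is only seen after stream time has already advanced
# past 61000 -- it arrives out of event-time order (and can miss joins).
eager = topic_a + topic_b

# With a large enough max.task.idle.ms, the task waits on the empty
# partition and can pick the next record by event time across both inputs,
# which is what keeps the join correct.
in_order = list(heapq.merge(topic_a, topic_b))
```

The simulation is loose: real Streams tasks decide per poll whether an empty partition is "caught up", but the ordering consequence is the same.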

Kafka Streams - one topic moves faster the other one

Hi team, So I ran into a complicated issue, something which I believe Kafka Streams is not prepared for. Basically my app is reading from two topics and joining them. But when testing in my staging environment, I found that one topic moves faster than the other, basically pushing stream time forward. Some partitions are even months apart. I found a question on SO detailing something similar: https://stackoverflow.com/questions/69126351/bulk-processing-data-through-a-join-with-kafka-streams-results-in-skipping-reco The problem for me is that joins are no longer working. Setting a huge grace period has somewhat alleviated the problem for now, but I don't think that's the right approach, and not all events join in the end anyway. Have other users faced something similar, and if so, how can it be resolved? Can we somehow delay the processing to align them? Thanks - Miguel

Re: [External] : Re: Unit Test failing for Kafka 2.8.0

Hi Mohammad, Sorry, I have no idea why it failed in your environment. I can run `./gradlew :spotlessScalaCheck` successfully without error. That is, I can confirm the source code has no errors. You might need to search for a fix for your development environment. Thank you. Luke On Mon, Dec 27, 2021 at 9:34 PM mohammad shadab < mohammad.s.shadab@oracle.com > wrote: > Hi Luke, > > Thanks a lot for your time and assistance. > > Somehow it wasn't attached; attaching the Kafka 3.0 unit test report again. > > For my work I need to generate binaries from the Kafka source code, which I am > doing locally, so there is no PR in the repo. > I am just running the unit tests (I think they should run fine, as they ran for me > earlier). > > If I run the complete test or integration suite, it does not run at all; I am > not considering that as of now. > > Below is the error for Kafka 2.8.0 (the same occurs for Kafka 3.0) > =================================...

Re: purge data in topic

A couple of ways: - Decrease the topic's retention to a very low value - Use the kafka-delete-records command-line tool On Tue, 28 Dec 2021, 4:27 pm Shutter X, <shutter@finmail.com.invalid> wrote: > Hi list, > > What's the way to quickly purge all data in a topic? > The topic has been running for many days and has a huge amount of data stored. > > Thanks. >
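For the second option, kafka-delete-records.sh takes a JSON file listing each partition and the offset up to which records should be deleted; an offset of -1 means "up to the current high watermark", i.e. everything currently in the partition. A Python sketch that builds such a file; the topic name "events", the partition count of 3, and the broker address are placeholders:

```python
import json

# Build the offsets file for kafka-delete-records.sh. offset -1 deletes
# everything up to the partition's current high watermark. The topic name
# and partition count below are placeholders for illustration.
spec = {
    "version": 1,
    "partitions": [
        {"topic": "events", "partition": p, "offset": -1} for p in range(3)
    ],
}
with open("delete-records.json", "w") as f:
    json.dump(spec, f, indent=2)

# Then run (assuming a broker at localhost:9092):
#   kafka-delete-records.sh --bootstrap-server localhost:9092 \
#       --offset-json-file delete-records.json
```

Note that delete-records removes data immediately and irreversibly, while the retention-lowering approach waits for the log cleaner to delete old segments (and the original retention.ms should be restored afterwards).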

Re: [External] : Re: Unit Test failing for Kafka 2.8.0

Hi Luke, Thanks a lot for your time and assistance. Somehow it wasn't attached; attaching the Kafka 3.0 unit test report again. For my work I need to generate binaries from the Kafka source code, which I am doing locally, so there is no PR in the repo. I am just running the unit tests (I think they should run fine, as they ran for me earlier). If I run the complete test or integration suite, it does not run at all; I am not considering that as of now. Below is the error for Kafka 2.8.0 (the same occurs for Kafka 3.0) =========================================== gradle test Starting a Gradle Daemon (subsequent builds will be faster) > Starting Daemon> IDLE<-------------> 0% INITIALIZING [65ms]> Evaluating settings<-------------> 0% INITIALIZING [170ms]<-------------> 0% INITIALIZING [276ms] ...................................................... ...

Re: [External] : Re: Unit Test failing for Kafka 2.8.0

Hi Mohammad, I can't see your report. But usually we don't run all the tests in a local environment. If you want to submit a PR to the Kafka repo, there will be Jenkins builds/tests for you. Also, there are some flaky tests. You can check the Jenkins build results for v3.0 here: https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.0/ Thank you. Luke On Fri, Dec 24, 2021 at 9:26 PM mohammad shadab < mohammad.s.shadab@oracle.com > wrote: > Thanks a lot, Luke, for this piece of information. > > Yesterday I downloaded Kafka 3.0 and it is even failing there. > Do I need to take it from git, as I took it from > https://kafka.apache.org/downloads > . > > Attaching the Kafka 3.0 unit test report. > > > - Gradle 7.3.3 > - > - java version "11....