
Showing posts from June, 2022

[FINAL CALL] - Travel Assistance to ApacheCon New Orleans 2022

To all committers and non-committers: this is a final call to apply for travel/hotel assistance to get to and stay in New Orleans for ApacheCon 2022. Applications have been extended by one week, so the application deadline is now the 8th of July 2022. The rest of this email is a copy of what has been sent out previously. We will be supporting ApacheCon North America in New Orleans, Louisiana, on October 3rd through 6th, 2022. TAC exists to help those who would like to attend ApacheCon events but are unable to do so for financial reasons. This year, we are supporting both committers and non-committers involved with projects at the Apache Software Foundation, or open source projects in general. For more info on this year's applications and qualifying criteria, please visit the TAC website at http://www.apache.org/travel/ Applications have been extended until the 8th of July 2022. Important: Applicants have until the closing date above to submit their ...

Re: a little problem in quickstart

Thanks Mason & Chris, The change to the current docs is in the kafka-site repo: https://github.com/apache/kafka-site I've commented on the PR https://github.com/apache/kafka/pull/12252 to ask for a follow-up PR in the kafka-site repo. Thank you. Luke On Mon, Jun 27, 2022 at 12:28 AM Chris Egerton < fearthecellos@gmail.com > wrote: > Hi Mason, > > You're correct that the quickstart should use 'libs' instead of 'lib'. > This has already been fixed in the docs for the upcoming 3.3.0 release with > https://github.com/apache/kafka/pull/12252 . We might consider backporting > that change; I've CC'd Luke Chen, who merged that fix and might be able to > help with backporting it (I'd take it on myself but I'm not well-versed in > how the site docs work, especially with making changes for already-released > versions). > > Cheers, > > Chris > > On Sun, Jun 26, 2022 at 11:12 AM...

Re: a little problem in quickstart

Hi Mason, You're correct that the quickstart should use 'libs' instead of 'lib'. This has already been fixed in the docs for the upcoming 3.3.0 release with https://github.com/apache/kafka/pull/12252 . We might consider backporting that change; I've CC'd Luke Chen, who merged that fix and might be able to help with backporting it (I'd take it on myself but I'm not well-versed in how the site docs work, especially with making changes for already-released versions). Cheers, Chris On Sun, Jun 26, 2022 at 11:12 AM Men Lim < zulu208@gmail.com > wrote: > You don't need to put in the jar file name in the plug in.path variable. > Something like plugin.path=/kafka/plugin. Then have the jar file in that > plugin folder. Restart the worker and it will pick it up. > > On Sun, Jun 26, 2022 at 8:04 AM mason lee < cjxiuyi@gmail.com > wrote: > > > Hi I'm new to Kafka and i can not pass step 6 in...

Re: a little problem in quickstart

You don't need to put the jar file name in the plugin.path variable. Something like plugin.path=/kafka/plugin. Then have the jar file in that plugin folder. Restart the worker and it will pick it up. On Sun, Jun 26, 2022 at 8:04 AM mason lee < cjxiuyi@gmail.com > wrote: > Hi I'm new to Kafka and i can not pass step 6 in > https://kafka.apache.org/quickstart , finally I found that the word 'lib' > in > 'echo "plugin.path=lib/connect-file-3.2.0.jar' should be 'libs'. > It bothered me for a while, I think a change would be better. >

a little problem in quickstart

Hi, I'm new to Kafka and I could not get past step 6 in https://kafka.apache.org/quickstart . I finally found that the word 'lib' in 'echo "plugin.path=lib/connect-file-3.2.0.jar' should be 'libs'. It bothered me for a while; I think a change would be better.
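For reference, a sketch of the corrected step 6 commands (paths assume a Kafka 3.2.0 tarball unpacked into the current directory, as in the quickstart):

    echo "plugin.path=libs/connect-file-3.2.0.jar" >> config/connect-standalone.properties
    bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties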

Re: Query regarding kafka controller shutdown

Thanks for your reply. Replication factor is 2 and min.insync.replicas is the default (1). I will get back to you on the producer's ack settings. Regards, Dhiraj On Sun, Jun 5, 2022 at 1:20 PM Liam Clarke-Hutchinson < lclarkeh@redhat.com > wrote: > Off the top of my head, it looks like it lost network connectivity to some > extent. > > Question - what settings were used for topics like efGamePlay? What is min > insync replicas, replication factor, and what acks settings is the producer > using? > > Cheers, > > Liam > > On Fri, 3 Jun 2022 at 22:55, dhiraj prajapati < dhirajpraj@gmail.com > > wrote: > > > Hi all, > > Recently we faced an issue with one of our production kafka clusters: > > - It is a 3 node cluster > > - kafka server version is 1.0 > > > > *Issue*: > > One of the brokers had some problem resulting in the following: > > 1. The broker lo...

Re: apply to be a contributor (jira)

Hi, Done. Thanks for your interest in Apache Kafka. -Bill On Fri, Jun 24, 2022 at 10:38 AM Justinwins < hutingwins@163.com > wrote: > Justinwins is my account in jira. I found some bugs of kafka mm2 ,and want > to assign them to myself. > Now i am applying to be the contributor. > > > thanks.

Re: end of life

Hi Jonas, You can find this information here: https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan#TimeBasedReleasePlan-WhatIsOurEOLPolicy%3F= > What Is Our EOL Policy? > > Given 3 releases a year and the fact that no one upgrades three times a > year, we propose making sure (by testing!) that rolling upgrade can be done > from each release in the past year (i.e. last 3 releases) to the latest > version. > > We will also attempt, as a community to do bugfix releases as needed for > the last 3 releases. > So according to this, 2.5.x is definitely EOL, and 2.8.x should have been EOL-ed with the 3.2.0 release. Best, On Wed, Jun 22, 2022 at 8:48 AM Jonas Thaarup LĆøvig <jotl@netcompany.com.invalid> wrote: > Hi > > > > I am writing because I'm having a hard time finding a list or > documentation of you support policy on kafka version, is their anywhere I > can find, wh...

end of life

Hi, I am writing because I'm having a hard time finding a list or documentation of your support policy on Kafka versions. Is there anywhere I can find when the different versions go end of support/life? At my firm we are using 2.5.1 and 2.8.1; I'm pretty sure they are both EOL, but I need some concrete documentation to take to my manager. Hope you can help. Med venlig hilsen / Best regards, Jonas Thaarup Løvig, Operations Engineer, Toldbod Plads 1, DK - 9000 Aalborg, Denmark. Mobile +45 2114 0384, Reception +45 7013 1440, jotl@netcompany.com , www.netcompany.com/da

Re: source topic partitions not assigned evenly to kafka stream threads

@Matthias Can you help with this? I remember having a conversation with you in the past on this topic, wherein it was mentioned that the logic for assigning partitions to stream tasks might change in future releases. On Mon, Jun 13, 2022, 11:05 Pushkar Deole < pdeole2015@gmail.com > wrote: > Hi, > > I have a microservice hosting kafka streams application. I have 3 > instances of microservice hosted in 3 pods, each is having 2 kafka stream > threads, thus total 6 stream threads as part of consumer group. > There are 3 source topics: A, B, C each having 12, 6, 6 partitions > respectively i.e. total 24 source topic partitions. > > Now, the issue I am facing is the distribution of source topic partitions > among stream threads. Considering that 6 streams threads and overall 24 > topic partitions, each stream thread is assigned 4 partitions, so no issue > there. However the main issue arises is that of partitions assigned to > stre...

Re: How it is safe to break message ordering but not idempotency after getting an OutOfOrderSequenceException?

In general, the producer would retry internally. If it sends batch X, and didn't get an ack back for it, and gets a sequence error for X+1 that it sent after X, the producer would resend X again (as it knows X was never received by the broker) and afterwards X+1. Only if the producer did get an ack for X (and thus purged the batch from its internal buffer) would it raise the error to the application, as batch X cannot be re-sent: the broker did ack, but somehow seems to have lost the data. (Reasons could be some misconfiguration or an unclean leader election, as examples.) For the "risks reordering of sent records" part, the issue is as follows: after the exception, the producer would not drop buffered records, but would bump its epoch, including a sequence number reset. Thus, if you call `send()` for the records of X now, they would be put _after_ the already buffered record from X-1 (X-1 can still be retried with the new epoch and ...
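A minimal sketch of the idempotence-only setup this thread is about (broker address, topic name, and the callback handling are assumptions, not from the thread):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.OutOfOrderSequenceException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class IdempotentProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");          // no transactional.id set
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, e) -> {
                    if (e instanceof OutOfOrderSequenceException) {
                        // An acked batch appears lost on the broker. Continuing to send with
                        // this producer keeps idempotence but, as explained above, risks
                        // reordering of records that are still buffered.
                    }
                });
            }
        }
    }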

Kraft - Adding new controllers in production

Hello! I sent this message a couple of months ago, but I did not get a response. I hope it is ok to try again. I'm wondering if there's a right way to add new controllers to an already existing cluster without downtime. I've tried the following: I have three controllers joined in a cluster, then one by one I change the configuration to 4 voters, stop the controller, delete quorum-state, and start the controller. When this is done for all 3 existing controllers, I start the fourth one. In most of my experiments there is a consensus on who is the leader after the 4th controller is up, but there were some cases where a leader couldn't be elected and I had to restart each controller again. Although the leader was elected after the transition, I'm not sure whether during the transition the cluster temporarily lost the ability to elect a leader. I think this problem should potentially go away if there are at least 5 controllers to begin with. Then I do the same thi...
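As a concrete sketch of the voter-set change described above (node ids, host names, and port are hypothetical), the static voter list is the controller.quorum.voters property in each controller's config:

    # before: 3 voters
    controller.quorum.voters=1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093

    # after: 4 voters, rolled out one existing controller at a time
    # (update config, stop, delete quorum-state, restart), then start the new node
    controller.quorum.voters=1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093,4@ctrl4:9093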

Re: Kafka consumer filtering

Hi - depending on the rules for how to filter/drop incoming messages (and depending on the mechanics of the library you use to consume the messages), it might be possible to filter out messages based on message headers, maybe? That way you would not need to deserialize the message key/value before deciding if the message should be dropped or not. /Anders On Mon, Jun 13, 2022 at 5:53 PM abdelali elmantagui < abdelalielmantagui@gmail.com > wrote: > Hi All, > > I started a couple of weeks ago learning Kafka, and my goal is to optimize > an existing architecture that uses Kafka in its components. > The problem is that there many microservices that produce messages/events > to the the kafka topic and in the other hand theres other microservices > that consumes these messages/events and each microservice have to consume > all the messages and then filter which message are interested in and that > create a problem of huge memory usage ...
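A minimal sketch of the header-based filtering Anders suggests (topic name, header key, and event type are assumptions, not from the thread): consume raw bytes and inspect a header before paying for deserialization.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.header.Header;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class HeaderFilteringConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-service");           // assumed
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events")); // hypothetical topic
                while (true) {
                    for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofMillis(500))) {
                        Header type = record.headers().lastHeader("event-type"); // hypothetical header
                        if (type == null || !"order-created".equals(new String(type.value()))) {
                            continue; // drop without deserializing the payload
                        }
                        // Only now deserialize record.value() into a domain object.
                    }
                }
            }
        }
    }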

Re: Help on MM2 -- only copy new data

Thanks, Jamie: I tried, but still not working ☹. Here is the updated properties file:

clusters=CDL,PC
CDL.bootstrap.servers=xxx
PC.bootstrap.servers=xx
PC.security.protocol=SSL
consumer.auto.offset.reset=earliest
group.id=test-mm2-2
CDL->PC.enabled=true
topics=.*
groups=.*
topic.blacklist=.*[\~\.]internal, .*\.replica, __.*, heartbeats, nas\.vantage.*, INTEGRATION.*, ADPRM\..*, ADP_LIFION_INTEGRATION\..*, AES\..*, BENEFITS\..*, DMX\..*, NASBILLING\..*, TAAS\..*
groups.blacklist=console-consumer-.*, connect-.*, __.*
tasks.max=10
#emit.checkpoints.enabled=true
#sync.group.offsets.enabled=true
#offset.lag.max=9223372036854775807
replication.factor=3
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
transaction.state.log.replication.factor=3

On 6/13/22, 4:25 ...

Re: Kafka consumer filtering

Hi abdelali, If you can't get your producers to send the different types of events to different topics (or you don't want to), you could use Kafka Streams to filter the data in the topic into new topics that are subsets of the data. I have also seen Apache Spark used to do something similar. Thanks, Jamie Sent from the all-new AOL app for iOS On Monday, June 13, 2022, 4:53 pm, abdelali elmantagui < abdelalielmantagui@gmail.com > wrote: Hi All, I started a couple of weeks ago learning Kafka, and my goal is to optimize an existing architecture that uses Kafka in its components. The problem is that there many microservices that produce messages/events to the the kafka topic and in the other hand theres other microservices that consumes these messages/events and each microservice have to consume all the messages and then filter which message are interested in and that create a problem of huge memory usage because of the huge anount of objects created...
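A minimal Kafka Streams sketch of that idea (topic names and the type predicate are assumptions): split one firehose topic into per-type topics so each microservice consumes only its subset.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class TopicSplitter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-splitter");    // assumed
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> all = builder.stream("events"); // hypothetical source topic
            // Route each event type to its own topic; consumers then subscribe selectively.
            all.filter((k, v) -> v.contains("\"type\":\"order\"")).to("events-orders");
            all.filter((k, v) -> v.contains("\"type\":\"billing\"")).to("events-billing");
            new KafkaStreams(builder.build(), props).start();
        }
    }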

Re: Help on MM2 -- only copy new data

Hi An, Try setting consumer.auto.offset.reset=earliest. This will mean that if the consumers in the MM2 tasks don't have an offset for a topic partition, they will start at the beginning. If you've already run with the below config, you might have to use a new group.id so that the consumers no longer have an offset that is valid: once you started MM2 with the below config, it will have an offset stored in the offset storage topic for each topic partition being consumed, and therefore the auto.offset.reset config won't be applied. Thanks, Jamie Sent from the all-new AOL app for iOS On Tuesday, June 14, 2022, 12:19 am, An, Hongguo (CORP) <Hongguo.An@ADP.com.INVALID> wrote: Hi: I am trying to use MM2 (3.3.0) to migrate Kafka to a new server, but it looks like it only copies new data; any existing data is not copied. Here are the connection properties: clusters=CDL,PC CDL.bootstrap.servers=xxx PC.bootstrap.servers=xxx PC.security....

Re: Help on MM2 -- only copy new data

Correction, it's 3.2.0 From: "An, Hongguo (CORP)" <Hongguo.An@ADP.com> Date: Monday, June 13, 2022 at 4:12 PM To: " users@kafka.apache.org " < users@kafka.apache.org > Subject: Help on MM2 -- only copy new data Hi: I am trying to use MM2 (3.2.0) to migrate Kafka to a new server, but it looks like it only copies new data; any existing data is not copied. Here are the connection properties: clusters=CDL,PC CDL.bootstrap.servers=xxx PC.bootstrap.servers=xxx PC.security.protocol=SSL CDL->PC.enabled=true topics= .* groups= .* topic.blacklist= .*[\~\.]internal, .*\.replica, __.*, heartbeats,nas\.vantage.*,INTEGRATION.*, ADPRM\..*, ADP_LIFION_INTEGRATION\..*, AES\..*, BENEFITS\..*, DMX\..*, NASBILLING\..*, TAAS\..* groups.blacklist= console-consumer-.*, connect-.*, __.* tasks.max= 10 #emit.checkpoints.enabled= true #sync.group.offsets.enabled= true #offset.lag.max= 9223372036854775807 rep...

Help on MM2 -- only copy new data

Hi: I am trying to use MM2 (3.3.0) to migrate Kafka to a new server, but it looks like it only copies new data; any existing data is not copied. Here are the connection properties:

clusters=CDL,PC
CDL.bootstrap.servers=xxx
PC.bootstrap.servers=xxx
PC.security.protocol=SSL
CDL->PC.enabled=true
topics=.*
groups=.*
topic.blacklist=.*[\~\.]internal, .*\.replica, __.*, heartbeats, nas\.vantage.*, INTEGRATION.*, ADPRM\..*, ADP_LIFION_INTEGRATION\..*, AES\..*, BENEFITS\..*, DMX\..*, NASBILLING\..*, TAAS\..*
groups.blacklist=console-consumer-.*, connect-.*, __.*
tasks.max=10
#emit.checkpoints.enabled=true
#sync.group.offsets.enabled=true
#offset.lag.max=9223372036854775807
replication.factor=3
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.repli...

Kafka consumer filtering

Hi All, I started learning Kafka a couple of weeks ago, and my goal is to optimize an existing architecture that uses Kafka in its components. The problem is that there are many microservices that produce messages/events to the Kafka topic, and on the other hand there are other microservices that consume these messages/events. Each microservice has to consume all the messages and then filter out which messages it is interested in, and that creates a problem of huge memory usage because of the huge amount of objects created in memory after deserialization of these messages. I am asking for any concept or solution that can help in this situation. Kind Regards, Abdelali

+---------------+             +-------------+             +---------------+
| microservices | ----------> | Kafka topic | ----------> | microservices |
+---------------+             +-------------+             +---------------+

source topic partitions not assigned evenly to kafka stream threads

Hi, I have a microservice hosting a Kafka Streams application. I have 3 instances of the microservice hosted in 3 pods, each having 2 Kafka Streams threads, thus 6 stream threads in total as part of the consumer group. There are 3 source topics: A, B, C, having 12, 6, and 6 partitions respectively, i.e. 24 source topic partitions in total. Now, the issue I am facing is the distribution of source topic partitions among stream threads. Given 6 stream threads and 24 topic partitions overall, each stream thread is assigned 4 partitions, so no issue there. However, the main issue is which topics those partitions come from. Sometimes I am getting 4 partitions from topic A assigned to stream thread 1, while another stream thread gets partitions from topics B and C only. What I am expecting is that partitions from each topic get evenly distributed among the 6 threads, so each stream thread should get 2 partitions of topic A, and 1 partition of to...

Re: Random continuous TimeoutException with Topic not present on one KafkaProducer out of many in multithreaded env

Increase the block time to 3 minutes! It is an exception about updating metadata; try it! On Tue, Jun 7, 2022 at 3:15 PM Deepak Jain < deepak.jain@cumulus-systems.com > wrote: > Hi Luke, > > The complete exception is > > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Topic realtimeImport_1 not > present in metadata after 250 ms. > at > org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1316) > at > org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:985) > at > org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:885) > at > org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:773) > > Even though the topic is created and used but it still throws this > exception and fails the operation. > > R...
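The knob this reply refers to is max.block.ms, which bounds how long send() may block waiting for metadata (the trace above shows it set to only 250 ms). A sketch, with the rest of the producer setup assumed:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ProducerBlockTime {
        static Properties producerProps() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed
            // Raise max.block.ms from 250 ms so metadata for a newly created
            // topic has time to propagate before send() times out.
            props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "180000"); // 3 minutes
            return props;
        }
    }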

Re: Broker allows transactions with generation.id -1 and could lead to duplicates

Hi Gabriel, Sounds like a bug to me (although we didn't document anywhere that the generation id will always start from 0). You can file a jira and we can discuss it there. Thank you. Luke On Fri, Jun 10, 2022 at 9:35 PM Gabriel Giussi < gabrielgiussi@gmail.com > wrote: > I did the following test that allowed me to introduce a duplicate message > in the output topic. > > > 1. Client A starts the consumer and the producer and holds a reference to > the current groupMetadata wich has generation.id -1 since the consumer > didn't join the group yet > 2. Client A joins the group and gets assigned partition 0 and 1 > 3. Client A polls a message with offset X from partition 1, produces to > output topic and enters a long gc pause (before calling > sendOffsetsToTransation) > 4. Client B starts the consumer and the producer, also getting a reference > to groupMetadata with generation.id -1 > 5. Client B join...

Broker allows transactions with generation.id -1 and could lead to duplicates

I did the following test that allowed me to introduce a duplicate message in the output topic.

1. Client A starts the consumer and the producer and holds a reference to the current groupMetadata, which has generation.id -1 since the consumer didn't join the group yet.
2. Client A joins the group and gets assigned partitions 0 and 1.
3. Client A polls a message with offset X from partition 1, produces to the output topic, and enters a long GC pause (before calling sendOffsetsToTransaction).
4. Client B starts the consumer and the producer, also getting a reference to groupMetadata with generation.id -1.
5. Client B joins the group and gets assigned partition 1.
6. Client B polls a message with offset X from partition 1, produces to the output topic, sends the offset with generation.id -1, and commits successfully.
7. Client A comes back, sends offsets with generation.id -1, and commits successfully.

I did this test because it wasn't so clear for me at which moment I had to ...
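A simplified sketch of the consume-transform-produce commit under discussion (setup, error handling, and the produce loop are omitted). Note that consumer.groupMetadata() carries generation.id -1 until the consumer has joined the group, which is the window the test above exploits:

    import java.util.Map;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.TopicPartition;

    public class TxnCommitSketch {
        static void commit(KafkaConsumer<String, String> consumer,
                           KafkaProducer<String, String> producer,
                           Map<TopicPartition, OffsetAndMetadata> offsets) {
            producer.beginTransaction();
            // ... produce the transformed records to the output topic here ...
            // The broker currently accepts generation.id -1 here, enabling the duplicate.
            producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
            producer.commitTransaction();
        }
    }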

Mirror Maker 2.0

Good morning, I would like to have more information about acquiring a licence for Mirror Maker 2.0 and the commercial support that is offered. Thank you very much! Josée Brodeur, Advisor, Strategic Sourcing, Business Technologies, First Vice-Presidency, Operations, Montréal. 514 281-7000, ext. 5102108; 1 866 866-7000, ext. 5102108. This email is confidential, may be protected by professional secrecy, and is addressed exclusively to its recipient. It is strictly forbidden for any other person to circulate, distribute, or reproduce this message. If you have received it in error, please destroy it immediately and notify the sender. Thank you.

Kafka use case for my setup

Hi there, I would like to automate some of my tasks using Apache Kafka. Previously I did the same using Apache Airflow, which worked fine, but I want to explore whether Kafka works better for this than Airflow.

1) Kafka runs on Server A.
2) Kafka searches for a file named test.xml on Server B, checking every 10 or 20 minutes whether the file has been created.
3) Once Kafka senses the file has been created, the job starts as follows:
a) Create a jira ticket and update all the executions on jira for each event.
b) Trigger an rsync command.
c) Then unarchive the files using the tar command.
d) Execute some script using the unarchived files.
e) Then archive the files and rsync them to a different location.
f) Send an email once all tasks are finished.

Please advise whether this is a sensible use case for Kafka to begin with. Or if you have any other open source products which can do these actions, please let me know. By the way, I prefer to set these up on docker-com...

Re: How it is safe to break message ordering but not idempotency after getting an OutOfOrderSequenceException?

Thanks for the answer Matthias. I still have doubts about the meaning of "risks reordering of sent records". If I understood correctly, the example you gave is something like this:

1. Producer sends a batch with sequence number X.
2. That request gets lost in the network.
3. Producer sends a batch with sequence number X+1.
4. Broker receives the batch with sequence number X+1 and returns an error, and the producer throws an OutOfOrderSequenceException.

In that situation we could keep retrying the batch with sequence number X+1, but we would keep getting an OutOfOrderSequenceException; ideally we would instead resend the batch with sequence number X, and after it is accepted, send the one with X+1. If what I'm saying is correct, then I can't see how this can reorder the messages. I mean, if both batches include a message being written to topic A, could messages from the batch with sequence number X+1 end up being persisted with a lower offset than the ones from the batch with sequence number X? Does this ...

Re: How it is safe to break message ordering but not idempotency after getting an OutOfOrderSequenceException?

Yes, the broker de-dupes using the sequence number. But for example, if a sequence number is skipped, you could get this exception: the current batch of messages cannot be appended to the log, as one batch is missing, and the producer would need to re-send the previous/missing batch with the lower sequence number before it can move to the "next" (i.e. current) batch. Does this make sense? -Matthias On 5/27/22 10:43 AM, Gabriel Giussi wrote: > The docs say > "This exception indicates that the broker received an unexpected sequence > number from the producer, which means that data may have been lost. If the > producer is configured for idempotence only (i.e. if enable.idempotence is > set and no transactional.id is configured), it is possible to continue > sending with the same producer instance, but doing so risks reordering of > sent record" > > Isn't the broker using the monotonically increasing sequence num...

Re: Newbie how to get key/value pojo out of a stream?

`enable.auto.commit` is a Consumer config and does not apply to Kafka Streams. In Kafka Streams, you basically always have auto commit enabled, and you can control how frequently commits happen via `commit.interval.ms`. Also, on `close()` Kafka Streams would commit offsets. -Matthias On 5/31/22 12:29 PM, Luca wrote: > Hi Andy, > > The defaults are sensible enough that, under normal operational conditions, your app should pick up from where it left. To dig a little more into this, I suggest you look into `auto.offset.reset` and `enable.auto.commit` options. > > In case, you do need to reprocess everything, kafka streams comes with a handy reset tool. You can read about it here: https://kafka.apache.org/32/documentation/streams/developer-guide/app-reset-tool.html > > Luca > > On Tue, May 31, 2022, at 5:17 PM, andrew davidson wrote: >> Thanks Luca >> >> This is exactly what I was looking for. >> >...
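A small sketch of that knob (application id and bootstrap servers are assumptions):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsCommitConfig {
        static Properties streamsProps() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");             // assumed
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed
            // Kafka Streams commits processed offsets automatically at this interval
            // (default 30000 ms; 100 ms under exactly-once processing).
            props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
            return props;
        }
    }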

RE: Random continuous TimeoutException with Topic not present on one KafkaProducer out of many in multithreaded env

Hi Luke, The complete exception is:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Topic realtimeImport_1 not present in metadata after 250 ms.
    at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:1316)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:985)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:885)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:773)

Even though the topic is created and used, it still throws this exception and fails the operation. Regards, Deepak From: Luke Chen < showuon@gmail.com > Sent: 07 June 2022 11:46 To: Deepak Jain < deepak.jain@cumulus-systems.com > Cc: users@kafka.apache.org Subject: Re: Random continuous TimeoutException with Topic not present on one KafkaProducer out of many in m...