
Posts

Showing posts from May, 2019

Re: Transactional markers are not deleted from log segments when policy is compact

Thanks Jonathan! That should help.

On Fri, 31 May 2019, 6:44 pm Jonathan Santilli, <jonathansantilli@gmail.com> wrote:

> Hello Pranavi,
>
> it sounds like this was solved in the release candidate 2.2.1-RC1
> (https://issues.apache.org/jira/browse/KAFKA-8335). Take a look at it.
>
> Hope that helps.
>
> Cheers!
> --
> Jonathan
>
> On Fri, May 31, 2019 at 8:59 AM Pranavi Chandramohan <pranavic@campaignmonitor.com> wrote:
>
> > Hi all,
> >
> > We use Kafka version 2.11-1.1.1. We produce and consume transactional
> > messages, and recently we noticed that 2 partitions of the __consumer_offsets
> > topic have very high disk usage (256GB).
> > When we looked at the log segments for these 2 partitions, there were files
> > that were 6 months old.
> > By dumping the content of an old log segment using the following comma...

Re: Transaction support multiple producer instance

Hello Wenxuan,

One KIP that we are considering so far is KIP-447:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics

It does not directly address your scenario, but I'm wondering if you can adjust your code to group the producers if they are in the consumer -> producer pattern, in which case KIP-447 would help you make sure this group of producers, although each would still have its own txn, can be blocked on completing together.

Guozhang

On Fri, May 31, 2019 at 4:10 AM wenxuan <choose_home@126.com> wrote:

> Hi Sandeep,
>
> Thanks for your reply.
>
> I have split the large message, but the problem is that the split messages
> can't be handled in one JVM or physical machine, even by a multi-threaded
> producer, because of CPU or other resource bottlenecks.
>
> So I need to make multiple producer instances on different physical machines
> as a distributed system. However...

Re: [VOTE] 2.2.1 RC1

+1 non-binding

We've been running it for a few days in a few clusters, so far no issues. I also ran unit tests and checked signatures. Thanks Vahid for running this release.

On Fri, May 31, 2019 at 9:01 AM Andrew Schofield <andrew_schofield@live.com> wrote:

> +1 (non-binding)
>
> Built and ran source and sink connectors.
>
> Andrew Schofield - IBM
>
> On 31/05/2019, 08:55, "Viktor Somogyi-Vass" <viktorsomogyi@gmail.com> wrote:
>
> +1 (non-binding)
>
> 1. Ran unit tests
> 2. Ran some basic automatic end-to-end tests over plaintext and SSL too
> 3. Ran systests sanity checks
>
> Viktor
>
> On Thu, May 23, 2019 at 6:04 PM Harsha <kafka@harsha.io> wrote:
>
> > +1 (binding)
> >
> > 1. Ran unit tests
> > 2. System tests
> > 3. 3 node cluster with few manual tests.
> ...

Re: Transaction support multiple producer instance

Hi Sandeep,

Thanks for your reply.

I have split the large message, but the problem is that the split messages can't be handled in one JVM or physical machine, even by a multi-threaded producer, because of CPU or other resource bottlenecks.

So I need to make multiple producer instances on different physical machines as a distributed system. However, sharing the same txn id is not supported. Is there any way to solve this?

Thanks,
Wenxuan

On 2019/05/31 10:28:53, Sandeep Nemuri <n...@gmail.com> wrote:

> How about splitting the large message and then produce?
>
> On Fri, May 31, 2019 at 3:39 PM wenxuan <ch...@126.com> wrote:
>
> > Hi Jonathan,
> >
> > Thanks for your reply.
> >
> > I have a mass of messages to send, beyond the limit of one JVM or physical
> > machine, so I need to make more than one producer in the same transaction.
> >
> > Since multiple producer c...

Re: Transaction support multiple producer instance

How about splitting the large message and then produce?

On Fri, May 31, 2019 at 3:39 PM wenxuan <choose_home@126.com> wrote:

> Hi Jonathan,
>
> Thanks for your reply.
>
> I have a mass of messages to send, beyond the limit of one JVM or physical
> machine, so I need to make more than one producer in the same transaction.
>
> Since multiple producers can't share the same transaction id, is there a way
> to achieve the multiple-producer transaction described above?
>
> Thanks,
> Wenxuan
>
> On 2019/05/31 08:34:14, Jonathan Santilli <j...@gmail.com> wrote:
>
> > Hello Wenxuan, the reason for the exception is that, by design, the transaction id
> > must be unique per producer instance; this is from the Java docs:
> >
> > https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
> >
> > "The purpose of the transactional.id is to e...
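Sandeep's splitting suggestion can be sketched without any Kafka dependency. This is only an illustration: the chunk format (the `seq`/`total` fields) and the helper names are hypothetical, and a real pipeline would also carry a message id so chunks from different messages don't mix.

```python
def split_message(payload: bytes, chunk_size: int) -> list[dict]:
    """Split a large payload into ordered chunks that could each be sent
    as an individual Kafka record and reassembled on the consumer side."""
    total = (len(payload) + chunk_size - 1) // chunk_size  # ceiling division
    return [
        {"seq": i, "total": total, "data": payload[i * chunk_size:(i + 1) * chunk_size]}
        for i in range(total)
    ]

def reassemble(chunks: list[dict]) -> bytes:
    """Consumer-side helper: order the chunks and verify none are missing."""
    ordered = sorted(chunks, key=lambda c: c["seq"])
    assert len(ordered) == ordered[0]["total"], "missing chunks"
    return b"".join(c["data"] for c in ordered)

payload = b"x" * 2500
chunks = split_message(payload, 1000)
assert len(chunks) == 3
assert reassemble(chunks) == payload
```

Note this only shrinks individual records; as the rest of the thread points out, it does not help when the total volume exceeds what one JVM can produce.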

Re: Transaction support multiple producer instance

Hi Jonathan,

Thanks for your reply.

I have a mass of messages to send, beyond the limit of one JVM or physical machine, so I need to make more than one producer in the same transaction.

Since multiple producers can't share the same transaction id, is there a way to achieve the multiple-producer transaction described above?

Thanks,
Wenxuan

On 2019/05/31 08:34:14, Jonathan Santilli <j...@gmail.com> wrote:

> Hello Wenxuan, the reason for the exception is that, by design, the transaction id
> must be unique per producer instance; this is from the Java docs:
>
> https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
>
> "The purpose of the transactional.id is to enable transaction recovery
> across multiple sessions of a single producer instance. It would typically
> be derived from the shard identifier in a partitioned, stateful
> application. As such, it should be unique to eac...

Re: Transactional markers are not deleted from log segments when policy is compact

Hello Pranavi,

it sounds like this was solved in the release candidate 2.2.1-RC1
(https://issues.apache.org/jira/browse/KAFKA-8335). Take a look at it.

Hope that helps.

Cheers!
--
Jonathan

On Fri, May 31, 2019 at 8:59 AM Pranavi Chandramohan <pranavic@campaignmonitor.com> wrote:

> Hi all,
>
> We use Kafka version 2.11-1.1.1. We produce and consume transactional
> messages, and recently we noticed that 2 partitions of the __consumer_offsets
> topic have very high disk usage (256GB).
> When we looked at the log segments for these 2 partitions, there were files
> that were 6 months old.
> By dumping the content of an old log segment using the following command
>
> kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration
> --print-data-log --files 00000000003949894887.log | less
>
> we found that all the records were COMMIT transaction markers.
>
> offset: 1924582627 position: 183 Cre...

Re: Transaction support multiple producer instance

Hello Wenxuan, the reason for the exception is that, by design, the transaction id must be unique per producer instance; this is from the Java docs:

https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html

"The purpose of the transactional.id is to enable transaction recovery across multiple sessions of a single producer instance. It would typically be derived from the shard identifier in a partitioned, stateful application. As such, it should be unique to each producer instance running within a partitioned application."

You must have a reason to instantiate multiple producers; however, have you tried just instantiating one producer?

"The producer is *thread safe* and sharing a single producer instance across threads will generally be faster than having multiple instances."

Hope that helps.

Cheers,
--
Jonathan

On Fri, May 31, 2019 at 5:47 AM wenxuan <choose_home@126.com> wrote:

> Hey guys, ...
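To make the javadoc quote concrete, here is a minimal client-configuration sketch for a single transactional producer. The property names are standard Kafka producer configs; the `transactional.id` value and broker addresses are placeholders:

```
# Placeholder broker list
bootstrap.servers=broker1:9092,broker2:9092
# Required for transactions; implies acks=all and retries
enable.idempotence=true
acks=all
# Must be unique per producer instance (placeholder value)
transactional.id=order-service-txn-1
```

With this in place, the producer calls initTransactions() once, then brackets sends with beginTransaction()/commitTransaction(), exactly as in the KafkaProducer javadoc linked above.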

Re: [VOTE] 2.2.1 RC1

+1 (non-binding)

Built and ran source and sink connectors.

Andrew Schofield - IBM

On 31/05/2019, 08:55, "Viktor Somogyi-Vass" <viktorsomogyi@gmail.com> wrote:

> +1 (non-binding)
>
> 1. Ran unit tests
> 2. Ran some basic automatic end-to-end tests over plaintext and SSL too
> 3. Ran systests sanity checks
>
> Viktor
>
> On Thu, May 23, 2019 at 6:04 PM Harsha <kafka@harsha.io> wrote:
>
> > +1 (binding)
> >
> > 1. Ran unit tests
> > 2. System tests
> > 3. 3 node cluster with few manual tests.
> >
> > Thanks,
> > Harsha
> >
> > On Wed, May 22, 2019, at 8:09 PM, Vahid Hashemian wrote:
> >
> > > Bumping this thread to get some more votes, especially from committers, so
> > > we can hopefully make a decision on this RC by the end of the week.
> > >
> > > Thanks,
> > > --Vahid
> > >
> > > On Mon,...

Transactional markers are not deleted from log segments when policy is compact

Hi all,

We use Kafka version 2.11-1.1.1. We produce and consume transactional messages, and recently we noticed that 2 partitions of the __consumer_offsets topic have very high disk usage (256GB). When we looked at the log segments for these 2 partitions, there were files that were 6 months old. By dumping the content of an old log segment using the following command

kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files 00000000003949894887.log | less

we found that all the records were COMMIT transaction markers:

offset: 1924582627 position: 183 CreateTime: 1548972578376 isvalid: true keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 126015 producerEpoch: 0 sequence: -1 isTransactional: true headerKeys: [] endTxnMarker: COMMIT coordinatorEpoch: 28

Why are the commit transaction markers not compacted and deleted?

Log cleaner config:
max.message.bytes 10000120
min.cleanable.dirty.ratio 0.1
compression.ty...
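For completeness, these are the topic-level settings that usually control when compacted data becomes eligible for cleaning (standard topic config names; values are illustrative, not recommendations). Note that, per the reply in this thread, the marker-retention problem itself was a bug fixed in 2.2.1 (KAFKA-8335), so no configuration change fully works around it on 1.1.1:

```
cleanup.policy=compact
min.cleanable.dirty.ratio=0.1
# How long tombstones (and associated markers) may linger after compaction
delete.retention.ms=86400000
# Roll segments so older data becomes eligible for cleaning
segment.ms=604800000
```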

Re: [VOTE] 2.2.1 RC1

+1 (non-binding)

1. Ran unit tests
2. Ran some basic automatic end-to-end tests over plaintext and SSL too
3. Ran systests sanity checks

Viktor

On Thu, May 23, 2019 at 6:04 PM Harsha <kafka@harsha.io> wrote:

> +1 (binding)
>
> 1. Ran unit tests
> 2. System tests
> 3. 3 node cluster with few manual tests.
>
> Thanks,
> Harsha
>
> On Wed, May 22, 2019, at 8:09 PM, Vahid Hashemian wrote:
>
> > Bumping this thread to get some more votes, especially from committers, so
> > we can hopefully make a decision on this RC by the end of the week.
> >
> > Thanks,
> > --Vahid
> >
> > On Mon, May 13, 2019 at 8:15 PM Vahid Hashemian <vahid.hashemian@gmail.com> wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the second candidate for release of Apache Kafka 2.2.1.
> > ...

Transaction support multiple producer instance

Hey guys, I have a problem in the scenario below. I want to run multiple producer instances that send messages concurrently in the same transaction, where the transaction is committed when all the producers send their messages successfully. Otherwise, if one producer fails, the transaction is aborted and no message will be consumed. However, when multiple producers share the same txn id, the following exception is thrown:

org.apache.kafka.common.KafkaException: Cannot execute transactional method because we are in an error state
    at org.apache.kafka.clients.producer.internals.TransactionManager.maybeFailWithError(TransactionManager.java:784)
    at org.apache.kafka.clients.producer.internals.TransactionManager.beginTransaction(TransactionManager.java:215)
    at org.apache.kafka.clients.producer.KafkaProducer.beginTransaction(KafkaProducer.java:606)
    at com.matt.test.kafka.producer.ProducerTransactionExample.main(ProducerTransactionExample.java:68)
Caused by: org.apache.kafka.common....
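As the replies above point out, the producer is thread safe, so within one machine the alternative to multiple transactional producers is to fan several worker threads into a single shared producer. A runnable sketch of that fan-in pattern follows; `FakeProducer` is a hypothetical stand-in with no Kafka dependency, used only to show the threading structure:

```python
import queue
import threading

class FakeProducer:
    """Stand-in for a Kafka producer (hypothetical; no Kafka dependency).
    The real KafkaProducer is documented as thread safe, which is what
    lets many worker threads share one instance and one transaction."""

    def __init__(self):
        self._lock = threading.Lock()
        self.sent = []
        self.committed = False

    def begin_transaction(self):
        self.sent = []

    def send(self, record):
        with self._lock:  # the real producer synchronizes internally
            self.sent.append(record)

    def commit_transaction(self):
        self.committed = True

def produce_concurrently(producer, records, n_workers=4):
    """Fan several worker threads into ONE shared producer, instead of
    giving each thread its own transactional producer (which would
    require distinct transactional.id values)."""
    work = queue.Queue()
    for r in records:
        work.put(r)

    def worker():
        while True:
            try:
                r = work.get_nowait()
            except queue.Empty:
                return
            producer.send(r)

    producer.begin_transaction()
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    producer.commit_transaction()

p = FakeProducer()
produce_concurrently(p, [f"msg-{i}" for i in range(100)])
assert sorted(p.sent) == sorted(f"msg-{i}" for i in range(100))
assert p.committed
```

This does not solve the cross-machine case raised in the thread; for that, the KIP-447 pointer in Guozhang's reply is the relevant direction.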

Re: MirrorMaker 2.0 XDCR / KIP-382

Jeremy, thanks for double checking. I think you are right -- this is a regression introduced here [1]. For context, we noticed that heartbeats were only being sent to target clusters, whereas they should be sent to every cluster regardless of replication topology. To get heartbeats running everywhere, the change ended up updating the configuration across all clusters, yielding the behavior you are seeing. Thanks for reporting this. I'll get this fixed within the next few days and let you know. In the meantime, you can use the same configuration in both DCs, and set "tasks.max" to some high number to ensure the replication load is balanced across DCs.

Ryanne

1: https://github.com/apache/kafka/pull/6295/commits/4dde001d5a521188005deb488fec5129a43eac6a#diff-823506b05664108f35046387b5fb43ecR104

On Thu, May 30, 2019 at 11:53 AM Jeremy Ford <jeremy.l.ford@gmail.com> wrote:

> Apologies, copy/paste issue. Config should look like:
>
> In DC1:...

Re: MirrorMaker 2.0 XDCR / KIP-382

Apologies, copy/paste issue. Config should look like:

In DC1:
DC1->DC2.enabled = true
DC2->DC1.enabled = false

In DC2:
DC1->DC2.enabled = false
DC2->DC1.enabled = true

Running 1 MM2 node in each of DC1 and DC2. If I start up the DC1 node first, then DC1 data is replicated to DC2. DC2 data does not replicate. Inverting the start order inverts the cluster that gets replicated. Running the DC2 config locally and debugging it, it seems that the source connector task is not started. I'm wondering if somehow the two DCs are conflicting about what should be running, since they share the same group names/connect name, etc. I tried overriding the group.id and the names of the connectors, which resulted in no replication. Not quite sure what could be going wrong.

On Thu, May 30, 2019 at 11:26 AM Ryanne Dolan <ryannedolan@gmail.com> wrote:

> Hey Jeremy, it looks like you've got a typo or copy-paste artifact in the
> configura...

Re: MirrorMaker 2.0 XDCR / KIP-382

Hey Jeremy, it looks like you've got a typo or copy-paste artifact in the configuration there -- you've got DC1->DC2 listed twice, but not the reverse. That would result in the behavior you are seeing, as DC1 actually has nothing enabled. Assuming this is just a mistake in the email, your approach is otherwise correct.

Ryanne

On Thu, May 30, 2019 at 7:56 AM Jeremy Ford <jeremy.l.ford@gmail.com> wrote:

> I am attempting to set up a simple cross-data-center replication POC using
> the new MirrorMaker branch. The behavior is not quite what I was
> expecting, so it may be that I have made some assumptions in terms of
> deployment that are incorrect or my setup is incorrect (see below). When I
> run the two MMs, it seems like replication will work for one DC but not for
> the other. If I run MM2 on just one node and enable both pairs, then
> replication works as expected. However, that deployment does not match the
> descr...

Re: Restart of kafka-streams with more than one partition on topic is reprocessing the data from beginning of the topic

Hello Kalyani, I'm really happy that it worked for you; I'm also waiting for the release. Thanks a lot Vahid for managing the release.

Cheers!
--
Jonathan

On Thu, May 30, 2019 at 1:30 PM kalyani.yarlagadda1@gmail.com <kalyani.yarlagadda1@gmail.com> wrote:

> Hi Jonathan,
>
> Thanks for the suggestion. It is working fine with the Kafka 2.2.1-rc1
> version.
> Do we have any tentative release date for this version? Please let me know
> if you have any info on this.
> Thanks in advance.
>
> Thanks,
> Kalyani
>
> On 2019/05/24 08:00:33, Jonathan Santilli <jonathansantilli@gmail.com> wrote:
>
> > Hello Kalyani,
> >
> > try testing the RC kafka-2.2.1-rc1; what you describe seems to be a
> > problem that has been solved in version 2.2.1
> > (https://issues.apache.org/jira/browse/KAFKA-7895), which is under voting
> > right now as 2.2.1-RC1:
> > https://ho...

Re: Restart of kafka-streams with more than one partition on topic is reprocessing the data from beginning of the topic

Hi Kalyani,

I'm glad to hear that the issue you reported will be fixed in 2.2.1. The release is currently being voted on, and if no issues are reported on the current release candidate we'd roll it out within the next week or so.

--Vahid

On Thu, May 30, 2019, 05:30 kalyani.yarlagadda1@gmail.com <kalyani.yarlagadda1@gmail.com> wrote:

> Hi Jonathan,
>
> Thanks for the suggestion. It is working fine with the Kafka 2.2.1-rc1
> version.
> Do we have any tentative release date for this version? Please let me know
> if you have any info on this.
> Thanks in advance.
>
> Thanks,
> Kalyani
>
> On 2019/05/24 08:00:33, Jonathan Santilli <jonathansantilli@gmail.com> wrote:
>
> > Hello Kalyani,
> >
> > try testing the RC kafka-2.2.1-rc1; what you describe seems to be a
> > problem that has been solved in version 2.2.1
> > (https://issues.apache.org/jira/browse/KAFKA-78...

MirrorMaker 2.0 XDCR / KIP-382

I am attempting to set up a simple cross-data-center replication POC using the new MirrorMaker branch. The behavior is not quite what I was expecting, so it may be that I have made some assumptions in terms of deployment that are incorrect or my setup is incorrect (see below). When I run the two MMs, it seems like replication will work for one DC but not for the other. If I run MM2 on just one node and enable both pairs, then replication works as expected. However, that deployment does not match the described setup in the KIP-382 documentation. Should I be using the MM driver to deploy in both DCs? Or do I need to use a Connect cluster instead? Is my configuration (included below) possibly incorrect?

Thanks,
Jeremy Ford

Setup: I have two data centers. I have MM2 deployed in both DCs on a single node. I am using the MirrorMaker driver for the deployment. The configuration for both DCs is exactly the same, except the enabled flag.

Config File:
cluster...
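Since the config file above is truncated, here is an illustrative mm2.properties sketch in the KIP-382 style. Cluster aliases and bootstrap addresses are placeholders, and the final line reflects the tasks.max workaround Ryanne suggests elsewhere in this thread:

```
# Placeholder cluster aliases and broker addresses
clusters = DC1, DC2
DC1.bootstrap.servers = dc1-broker1:9092
DC2.bootstrap.servers = dc2-broker1:9092
# Replication flows (enable per direction)
DC1->DC2.enabled = true
DC2->DC1.enabled = true
DC1->DC2.topics = .*
DC2->DC1.topics = .*
# Workaround from this thread: spread replication load across drivers
tasks.max = 10
```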

Re: Restart of kafka-streams with more than one partition on topic is reprocessing the data from beginning of the topic

Hi Jonathan,

Thanks for the suggestion. It is working fine with the Kafka 2.2.1-rc1 version. Do we have any tentative release date for this version? Please let me know if you have any info on this. Thanks in advance.

Thanks,
Kalyani

On 2019/05/24 08:00:33, Jonathan Santilli <jonathansantilli@gmail.com> wrote:

> Hello Kalyani,
>
> try testing the RC kafka-2.2.1-rc1; what you describe seems to be a
> problem that has been solved in version 2.2.1
> (https://issues.apache.org/jira/browse/KAFKA-7895), which is under voting
> right now as 2.2.1-RC1:
> https://home.apache.org/~vahid/kafka-2.2.1-rc1/RELEASE_NOTES.html
> I have tested the App that was suffering from that problem, and now it is solved.
> Of course, you need to test your own App.
>
> Hope that helps.
>
> Cheers!
> --
> Jonathan
>
> On Thu, May 23, 2019 at 5:37 PM kalyani yarlagadda <
> kalyani.yarlagadda1@gmail.com>...

Re: Can kafka brokers have SASL_SSL, SSL and PLAINTEXT listener at same time?

Hi Shrinivas,

Yes, you just have each listener on a different port, and change your clients to point to the one you wish to use.

Thanks,
Steve

On Thu, May 30, 2019, 6:57 AM Shrinivas Kulkarni <shrinivas.n.kulkarni@gmail.com> wrote:

> Hi,
>
> I want to migrate from SSL to SASL_SSL on Kafka. Is there a way to
> enable both SSL and SASL at the same time in a Kafka cluster? Also, I want
> PLAINTEXT to be enabled for internal users. Is this possible in Kafka
> 1.1.1?
>
> Please let me know.
>
> Thanking in anticipation,
> Shrinivas
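Steve's answer can be sketched as a server.properties fragment. Hosts and ports are illustrative; because the listener names here reuse the built-in security protocol names, no extra listener.security.protocol.map entry is needed:

```
# Three listeners, one per security protocol, each on its own port
listeners=PLAINTEXT://:9092,SSL://:9093,SASL_SSL://:9094
advertised.listeners=PLAINTEXT://broker1.internal:9092,SSL://broker1.example.com:9093,SASL_SSL://broker1.example.com:9094
# Example SASL mechanism; SCRAM is another option
sasl.enabled.mechanisms=PLAIN
# Pick whichever protocol suits inter-broker traffic
security.inter.broker.protocol=SSL
```

Clients then select a protocol simply by pointing bootstrap.servers at the matching port.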

Re: Microservices?

Hi,

If you want to know how Kafka is designed and implemented, please see the documentation under https://kafka.apache.org/documentation/ especially the sections "Getting Started", "Design", and "Implementation".

Best,
Bruno

On Mon, May 27, 2019 at 6:03 AM V1 <vyshnavkr@gmail.com> wrote:

> Hi team Kafka,
> I'm looking forward to contributing to Apache Kafka.
> May I know if Kafka is built on a microservices architecture?

Re: Stream application :: State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED continuously

Hi Guozhang,

Yes, indeed. We found that it happened whenever the changelog offsets were very high. For now, we are trying the rc1 version on our staging environment. Will update on this thread whether the fix that Jonathan mentioned above worked for us as well or not (for the downstream being flooded by suppress).

On Fri, May 24, 2019 at 11:18 PM Guozhang Wang <wangguoz@gmail.com> wrote:

> Hello Nayanjyoti,
>
> Regarding KIP-328, the on-disk buffer is indeed being implemented, but it
> has not been completed and unfortunately has to slip to the next release.
>
> Now, about the "PARTITIONS_REVOKED to PARTITIONS_ASSIGNED" issue: it is
> possible that if you are restoring tons of data from the changelog, it
> takes a long time, and since the stream thread does not call
> consumer.poll() in time, it would be kicked out of the group again.
>
> Guozhang
>
> On Tue, May 21, 2019 at 5:50 AM Jonath...