
Posts

Showing posts from April, 2019

Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

Hi Ashok, I think some people may be able to give you advice, but please start a new thread instead of replying to an existing message. This just helps keep all the messages organized. Thanks! -John On Thu, Apr 25, 2019 at 6:12 AM ASHOK MACHERLA < iAshok7@outlook.com > wrote: > Hii, > > what I asking > > I want to know about kafka partitions > > > we have getting data about 200GB+ from sources to kafka for daily . > > I need to know how many partitions are required to pull data from source > without pileup. > > please suggest us to fix this issue. > > is there any mathematical rules to create specific no.of partitions for > Topic.??? > > > please help me.... > > Sent from Outlook< http://aka.ms/weboutlook > > ________________________________ > From: Jose Lopez < joseariaslopez@gmail.com > > Sent: 25 April 2019 16:34 > To: users@kafka.apache.org > Subj...

Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

Hey, Jose, This is an interesting thought that I hadn't considered before. I think (tentatively) that windowsFor should *not* take the grace period into account. What I'm thinking is that the method is supposed to return "all windows that contain the provided timestamp". When we keep window1 open until stream time 7, it's because we're waiting to see if some record with a timestamp in range [0,5) arrives before the overall stream time ticks past 7. But if/when we get that event, its own timestamp is still in the range [0,5). For example, its timestamp is *not* 6 (because then it would belong in window2, not window1). Thus, window1 does not "contain" the timestamp 6, and therefore, windowsFor(6) is not required to return window 1. Does that seem right to you? -John On Thu, Apr 25, 2019 at 6:04 AM Jose Lopez < joseariaslopez@gmail.com > wrote: > Hi all, > > Given that gracePeriodMs is "the time to admit ...
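A minimal sketch of the behaviour John describes, assuming Kafka Streams 2.2 and, for simplicity, tumbling windows of size 5 ms with a 2 ms grace period (the advanceMs from the original example is left out); windowsFor() reflects only the window bounds, not the grace period:

```java
import java.time.Duration;
import java.util.Map;

import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Window;

public class WindowsForExample {
    public static void main(String[] args) {
        // Tumbling windows: [0,5), [5,10), ... each with a 2 ms grace period.
        TimeWindows windows = TimeWindows.of(Duration.ofMillis(5))
                .grace(Duration.ofMillis(2));

        // Returns only the window whose [start, end) range contains 6, i.e. [5,10).
        // Window [0,5) is not returned even though its grace period stays open until 7.
        Map<Long, ? extends Window> result = windows.windowsFor(6L);
        result.forEach((start, w) ->
                System.out.println("window [" + w.start() + "," + w.end() + ")"));
    }
}
```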

My IDE which is sitting outside of the kube cluster creates a producer that attempts to connect to kafka using the cluster dns name of the headless service

Hi! The Kafka chart being used is here: https://github.com/bitnami/charts/tree/master/bitnami/kafka

| chart | application |
| kafka-1.10.1 | 2.2.0 |

$ kubectl get svc -n kafka
NAME                               TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
kafka-janitha                      ClusterIP  10.103.152.8     <none>        9092/TCP                     5d19h
kafka-janitha-headless             ClusterIP  None             <none>        9092/TCP                     5d19h
kafka-janitha-zookeeper            ClusterIP  10.108.191.161   <none>        2181/TCP,2888/TCP,3888/TCP   5d19h
kafka-janitha-zookeeper-headless   ClusterIP  None             <none>        2181/TCP,2888/TCP,3888/TCP   5d19h

I am then port-forwarding like so:
kubectl port-forward --namespace postgresql svc/postgres-janitha-postgresql 5432:5432 --address 0.0.0.0 &
kubectl port-forward --namespace kafka svc/kafka-janitha 9092:9092 ...

Re: KSQL Question

Hi Shalom, This is the Github repo for KSQL: https://github.com/confluentinc/ksql However, in order to get that running you have to download a few libraries that KSQL depends on. And you'll need Kafka. For the sake of experimentation you are probably better off using the all-in-one Confluent Platform. It will save you some time. I hope that helps. --Vahid On Tue, Apr 30, 2019 at 7:38 AM shalom sagges < shalomsagges@gmail.com > wrote: > Hi All, > > I'm new to Kafka and wanted to experience with KSQL. > However, I can't find a place to download it without downloading the entire > Confluent Platform package. > Please correct me if I'm wrong, but I understand that KSQL is free, so > there must be a place where I can download only KSQL. > Can someone please assist me with this? > > Thanks a lot! :) > -- Thanks! --Vahid

KSQL Question

Hi All, I'm new to Kafka and wanted to experiment with KSQL. However, I can't find a place to download it without downloading the entire Confluent Platform package. Please correct me if I'm wrong, but I understand that KSQL is free, so there must be a place where I can download only KSQL. Can someone please assist me with this? Thanks a lot! :)

RE: Required guidelines for kafka upgrade

Dear Team Members, Currently we are using Kafka 0.10.1 and Zookeeper 3.4.5, and we are planning to upgrade to Kafka 2.2.0. We have a 3-node Kafka cluster (3 Zookeeper nodes, 3 Kafka nodes). As suggested by you, can I follow these steps?
1. First, download the new Kafka 2.2.0 tar file and untar it. In the config directory, add the two new parameters below to server.properties:
inter.broker.protocol.version=0.10.1
log.message.format.version=0.10.1
2. After that, stop the old Kafka broker (0.10.1) and start the new Kafka broker (2.2.0) on one node. At this point the first node is running the latest version and the remaining two nodes are running the old version. Then stop the old Kafka on the second node and start the new Kafka broker (2.2.0) there, and do the same on the third node. Now all three Kafka nodes are running the new version 2.2.0.
3. After that, in server.properties we have t...

Mirror Maker tool is not running

Dear Team Please help us, our mirror maker tool is not running properly. Please look into the mirror maker log file exceptions below: ******************************************************************
[2019-03-11 18:34:31,906] ERROR Error when sending message to topic audit-logs with key: null, value: 304134 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
[2019-03-11 18:34:31,909] FATAL [mirrormaker-thread-15] Mirror maker thread failure due to (kafka.tools.MirrorMaker$MirrorMakerThread) java.lang.IllegalStateException: Cannot send after the producer is closed.
at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:185)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:474)
at org.apac...

Re: How to count number of available messages per topic?

Not really. The only exact way would be to consume the topic, but this is rather expensive. Note: Th...

Re: AW: Configuration of log compaction

I received the same error as well. Were any of you able to fix this issue? On 2018/12/18 09:42:58, Claudia Wegmann < c...@kasasi.de > wrote: > Hi Liam, > thanks for the pointer. I found out that the log cleaner on all kafka brokers died with the following error message:
> [2018-12-04 15:33:24,886] INFO Cleaner 0: Caught segment overflow error during cleaning: Detected offset overflow at offset -1 in segment LogSegment(baseOffset=3605669, size=6970326) (kafka.log.LogCleaner)
> [2018-12-04 15:33:24,958] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: Split operation is only permitted for segments with overflow
> at scala.Predef$.require(Predef.scala:277)
> at kafka.log.Log.splitOverflowedSegment(Log.scala:1873)
> at kafka.log.Clea...

Re: How to count number of available messages per topic?

Hey Matthias, Is there a better way than Peter's suggestion? Is there a definite way to count the available messages regarding compaction and transactions? On Sun, Apr 28, 2019 at 7:41 PM Matthias J. Sax < matthias@confluent.io > wrote: > This won't work if your topic is compacted though. Also, if you are > using transactions, it might not be accurate, depending on how many > transaction markers are in the topics. > > -Matthias > > On 4/28/19 2:59 PM, Peter Bukowinski wrote: > > You'll need to do this programmatically with some simple math. There's a > binary included with kafka called kafka-run-class that you can use to > expose earliest and latest offset information. > > > > This will return the earliest unexpired offsets for each partition in a > topic: > > > > kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list > localhost:9092 --topic TOPIC --time -2 > > >...

Re: Slow cluster recover after a restart

Hello again community, any thoughts about this? I would really appreciate any clue here. Now we are facing another problem (after the previous one), even more serious since it does not allow the broker to start:
ERROR There was an error in one of the threads during logs loading: org.apache.kafka.common.errors.TransactionCoordinatorFencedException: Invalid coordinator epoch: 7 (zombie), 8 (current) (kafka.log.LogManager)
...
ERROR [KafkaServer id=3] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
...
ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
This happens after the second restart according to the "Upgrading From Previous Versions" guide ( https://kafka.apache.org/documentation/#upgrade_2_2_0 ). Step four says: 4. Restart the brokers one by one for the new protocol version to take effect. Once the brokers begin using the latest protocol version, it will no longer be possible to downgrade the cluster to an older version. When...

Re: How to count number of available messages per topic?

This won't work if your topic is compacted though. Also, if you are using transactions, it might not be...

Re: How to count number of available messages per topic?

You'll need to do this programmatically with some simple math. There's a binary included with kafka called kafka-run-class that you can use to expose earliest and latest offset information. This will return the earliest unexpired offsets for each partition in a topic:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic TOPIC --time -2
This will return the latest offset:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic TOPIC --time -1
With that info, subtract the earliest from the latest per partition, sum the results, and you'll have the number of messages available in your topic. -- Peter > On Apr 28, 2019, at 1:41 AM, jaaz jozz < jazzlofi2@gmail.com > wrote: > > Hello, > I want to count how many messages available in each topic in my kafka > cluster. > I understand that just looking at the latest offset available is not > correct, because older messages ...
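For reference, a minimal sketch of the same arithmetic done with the plain Java consumer (the topic name and bootstrap address are placeholders); as noted elsewhere in the thread, this counts the offset range, which is only approximate for compacted or transactional topics:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class TopicMessageCount {
    public static void main(String[] args) {
        String topic = "TOPIC"; // placeholder topic name
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Collect all partitions of the topic.
            List<TopicPartition> partitions = new ArrayList<>();
            consumer.partitionsFor(topic).forEach(p ->
                    partitions.add(new TopicPartition(p.topic(), p.partition())));

            // Earliest unexpired offset and latest offset per partition,
            // equivalent to GetOffsetShell with --time -2 and --time -1.
            Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);
            Map<TopicPartition, Long> latest = consumer.endOffsets(partitions);

            long total = partitions.stream()
                    .mapToLong(tp -> latest.get(tp) - earliest.get(tp))
                    .sum();
            System.out.println("Approximate message count for " + topic + ": " + total);
        }
    }
}
```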

Re: duplicate packets in kafka topic

Hi, What are duplicate messages in your use case? 1) different messages with the same content 2) the same message that is sent multiple times to the broker due to retries in the producer 3) something else What do you mean by "identify those duplicates"? What do you want to do with them? For case 1), you could write all messages to a topic and then identify the duplicates with a Kafka Streams application, process them, and write the results again to a topic. Be aware that identifying duplicate messages lets the state in the Kafka Streams application grow to the sum of the sizes of all unique messages, because you have to keep all messages in your state to be able to detect future duplicates. That is not feasible in most cases. To limit your state in the Streams application you can restrict the identification of duplicates to a time window. For example, identify all duplicate messages of the last hour. Within a window of one hour, you would only proces...
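A minimal sketch of the windowed de-duplication idea described above (case 1, assuming duplicates share the same record key; the topic names, store name, and one-hour window are placeholders, and Kafka Streams 2.2 APIs are assumed):

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class DedupTopologySketch {
    static final String STORE = "seen-keys";
    static final Duration WINDOW = Duration.ofHours(1);

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        // Windowed store holding the keys seen recently; this bounds the state
        // to the chosen time window, as described in the message above.
        builder.addStateStore(Stores.windowStoreBuilder(
                Stores.persistentWindowStore(STORE, WINDOW.multipliedBy(2), WINDOW, false),
                Serdes.String(), Serdes.Long()));

        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
                .transform(Dedup::new, STORE)
                .to("deduped-topic", Produced.with(Serdes.String(), Serdes.String()));
        return builder;
    }

    // Forwards a record only if its key has not been seen within the last hour.
    static class Dedup implements Transformer<String, String, KeyValue<String, String>> {
        private ProcessorContext context;
        private WindowStore<String, Long> seen;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            this.context = context;
            this.seen = (WindowStore<String, Long>) context.getStateStore(STORE);
        }

        @Override
        public KeyValue<String, String> transform(String key, String value) {
            long now = context.timestamp();
            boolean duplicate;
            try (WindowStoreIterator<Long> hits = seen.fetch(key, now - WINDOW.toMillis(), now)) {
                duplicate = hits.hasNext();
            }
            seen.put(key, now, now);
            return duplicate ? null : KeyValue.pair(key, value); // returning null drops the duplicate
        }

        @Override
        public void close() {}
    }
}
```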

Segments Not Compacted Due to Empty Transactional Record Batches

We're seeing an issue with some __consumer_offsets partitions that host consumer groups being used in EOS-style read-process-write cycles. Although the brokers' log cleaners are running, we are finding more and more segment files for these partitions, for example:
```
$ ls -lh data/__consumer_offsets-40/
total 68G
-rw-r--r-- 1 root root 2.9K Mar 20 14:49 00000000000000000000.index
-rw-r--r-- 1 root root  90M Mar 20 14:49 00000000000000000000.log
-rw-r--r-- 1 root root 4.3K Mar 20 14:49 00000000000000000000.timeindex
-rw-r--r-- 1 root root 2.8K Mar 20 15:49 00000000001494007647.index
-rw-r--r-- 1 root root  87M Mar 20 15:49 00000000001494007647.log
-rw-r--r-- 1 root root 4.1K Mar 20 15:49 00000000001494007647.timeindex
-rw-r--r-- 1 root root 2.7K Mar 20 17:06 00000000001495428980.index
-rw-r--r-- 1 root root  86M Mar 20 17:06 00000000001495428980.log
-rw-r--r-- 1 root root 4.0K Mar 20 17:06 00000000001495428980.timeindex
-rw-r--r-- 1 root root 2.7K Mar 20 18:18...
```

Re: Required guidelines for kafka upgrade

Answers inline. first what we upgrade zookeeper or kafka?? - I can't really comment on the ZK upgrade. Recently I upgraded a 10-node Kafka cluster in production. Some of the key notes:
1) You need to set inter.broker.protocol.version = 1.1.0 (your current broker version).
2) Start upgrading your brokers one by one and verify the overall functionality of your cluster. Do not forget to verify the clients too.
3) After completing the upgrade, set the version to 2.2.0 if you are upgrading to the latest version, and restart the brokers one by one. Please be aware you can't downgrade after this step.
4) Please keep in mind: the default value for ssl.endpoint.identification.algorithm was changed to https, so you need to set ssl.endpoint.identification.algorithm to an empty string to keep the behaviour of the previous broker version.
Go through the documentation and you will find more information (1.5 Upgrading From Previous Versions) https://kafka.apache.org/documentation/#upgrad...

Re: Required guidelines for kafka upgrade

Ashok, The guideline is well-explained, so please follow that. https://kafka.apache.org/documentation/#upgrade The process works, so try and follow what it recommends. Thanks, On Fri, 26 Apr 2019 at 12:36, ASHOK MACHERLA < iAshok7@outlook.com > wrote: > Dear senthil > > That I know, > Could you please explain in details, > any documents, step by steps required. > > first what we upgrade zookeeper or kafka?? > being upgrade can we pointing old logs directory?? > > Could please explain like that. > > Sent from Outlook< http://aka.ms/weboutlook > > ________________________________ > From: SenthilKumar K < senthilec566@gmail.com > > Sent: 26 April 2019 16:03 > To: users@kafka.apache.org > Subject: Re: Required guidelines for kafka upgrade > > Hi , You can refer official documentation to upgrade Kafka Cluster . > Section: > 1.5 Upgrading From Previous Versions > >...

Re: Required guidelines for kafka upgrade

Dear Senthil, That I know. Could you please explain in detail: are there any documents or step-by-step guides required? First, what do we upgrade, zookeeper or kafka? During the upgrade, can we keep pointing to the old logs directory? Could you please explain that. Sent from Outlook< http://aka.ms/weboutlook > ________________________________ From: SenthilKumar K < senthilec566@gmail.com > Sent: 26 April 2019 16:03 To: users@kafka.apache.org Subject: Re: Required guidelines for kafka upgrade Hi , You can refer official documentation to upgrade Kafka Cluster . Section: 1.5 Upgrading From Previous Versions Last week we did broker upgrade from 1.1.0 to 2.2.0. I think the current stable version is 2.2.0. --Senthil On Fri, Apr 26, 2019, 3:54 PM ASHOK MACHERLA < iAshok7@outlook.com > wrote: > Dear Team > > Right now, we are using Kafka 0.10.1 version, zookeeper 3.4.6 version, we > are planning to upgrade to New Kafka & Zookeeper version. > > Plea...

Re: Required guidelines for kafka upgrade

Hi, You can refer to the official documentation to upgrade a Kafka cluster. Section: 1.5 Upgrading From Previous Versions. Last week we did a broker upgrade from 1.1.0 to 2.2.0. I think the current stable version is 2.2.0. --Senthil On Fri, Apr 26, 2019, 3:54 PM ASHOK MACHERLA < iAshok7@outlook.com > wrote: > Dear Team > > Right now, we are using Kafka 0.10.1 version, zookeeper 3.4.6 version, we > are planning to upgrade to New Kafka & Zookeeper version. > > Please suggest us, what is current stable version of Kafka, Zookeeper??? > > Could you please explain what is steps to upgrade to Kafka, Zookeeper?? > > Is there any documentation for that upgrade guide?? > > Thanks a lot!! > > > Sent from Outlook< http://aka.ms/weboutlook > >

Required guidelines for kafka upgrade

Dear Team Right now we are using Kafka 0.10.1 and Zookeeper 3.4.6, and we are planning to upgrade to new Kafka & Zookeeper versions. Please suggest: what are the current stable versions of Kafka and Zookeeper? Could you please explain the steps to upgrade Kafka and Zookeeper? Is there any documentation or upgrade guide for that? Thanks a lot!! Sent from Outlook< http://aka.ms/weboutlook >

Re: Kafka Topics

Hi Ashok, Check this article of mine, "Real Time Processing of Trade Data with Kafka, Flume, Spark, Hbase and MongoDB" https://www.linkedin.com/pulse/real-time-processing-trade-data-kafka-flume-spark-talebzadeh-ph-d-/ under *Kafka setup* HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Thu, 25 Apr 2019 at 16:18, ASHOK MACHERLA < iAshok7@outlook.com > wrote: > Dear Team > > Could you please ...

Re: [ANNOUNCE] New Kafka PMC member: Matthias J. Sax

Congrats Matthias! On Tue, Apr 23, 2019 at 4:24 AM Becket Qin < becket.qin@gmail.com > wrote: > Congrats, Matthias! > > On Sat, Apr 20, 2019 at 10:28 AM Matthias J. Sax < matthias@confluent.io > > wrote: > > > Thank you all! > > > > -Matthias > > > > > > On 4/19/19 3:58 PM, Lei Chen wrote: > > > Congratulations Matthias! Well deserved! > > > > > > -Lei > > > > > > On Fri, Apr 19, 2019 at 2:55 PM James Cheng < wushujames@gmail.com > > > <mailto: wushujames@gmail.com >> wrote: > > > > > > Congrats!! > > > > > > -James > > > > > > Sent from my iPhone > > > > > > > On Apr 18, 2019, at 2:35 PM, Guozhang Wang < wangguoz@gmail.com > > > <mailto: wangguoz@gmail.com >> wrote: > > > > > > > ...

Slow cluster recover after a restart

Hello, we are updating one of our clusters from version 2.0 to 2.2. The cluster has 4 brokers. After stopping the first broker the cluster was still operating normally as expected, receiving data from producers and sending data to consumers. When starting the first broker again with the new version 2.2, the brokers start showing lots of the following messages:
INFO [Transaction Marker Request Completion Handler 4]: Sending client-x's transaction marker for partition topic-name-1 has failed with error org.apache.kafka.common.errors.NotLeaderForPartitionException, retrying with current coordinator epoch 0 (kafka.coordinator.transaction.TransactionMarkerRequestCompletionHandler)
...and
INFO [ReplicaFetcher replicaId=1, leaderId=4, fetcherId=8] Retrying leaderEpoch request for partition another-topic-1 as the leader reported an error: NOT_LEADER_FOR_PARTITION (kafka.server.ReplicaFetcherThread)
In the end, all Brokers start showing logs for the transaction ...

Re: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

Hi, what I am asking is that I want to know about Kafka partitions. We are getting about 200GB+ of data daily from sources into Kafka. I need to know how many partitions are required to pull data from the source without pileup. Please suggest how we can fix this issue. Is there any mathematical rule to create a specific number of partitions for a topic? Please help me. Sent from Outlook< http://aka.ms/weboutlook > ________________________________ From: Jose Lopez < joseariaslopez@gmail.com > Sent: 25 April 2019 16:34 To: users@kafka.apache.org Subject: [Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp) Hi all, Given that gracePeriodMs is "the time to admit late-arriving events after the end of the window", I'd expect it is taken into account in windowsFor(timestamp). E.g.: sizeMs = 5 gracePeriodMs = 2 advanceMs = 3 timestamp = 6 | window | windowStart | windowEnd | windowsEnd + gracePeriod | | 1 | 0 ...

[Streams] TimeWindows ignores gracePeriodMs in windowsFor(timestamp)

Hi all, Given that gracePeriodMs is "the time to admit late-arriving events after the end of the window", I'd expect it is taken into account in windowsFor(timestamp). E.g.: sizeMs = 5, gracePeriodMs = 2, advanceMs = 3, timestamp = 6
| window | windowStart | windowEnd | windowEnd + gracePeriod |
| 1 | 0 | 5 | 7 |
| 2 | 5 | 10 | 12 |
... Current output: windowsFor(timestamp) returns window 2 only. Expected output: windowsFor(timestamp) returns both window 1 and window 2. Do you agree with the expected output? Am I missing something? Regards, Jose

Kafka Topics

Dear Team Could you please tell me about Kafka topics? We are getting about 200GB of data daily from source to Kafka. How many partitions are required to pull data from the source? Please tell me how to set the number of partitions for a topic. Is there any mathematical rule for partitions? Sent from Outlook< http://aka.ms/weboutlook >

Re: Too many commits

That's the problem, I think. The gist of it is (without going back on my part and looking at the details of the docs): Kafka isn't getting a poll from your consumer in time and thinks it's dead (crashed or hung), and then you try to commit something that is no longer valid. The solution is pausing the consumer. However, pausing the consumer doesn't mean you can stop polling; it just means it's not going to give you anything back when you poll, so you still need the loop to keep running. That means you need to:
1. get the message
2. pause the consumer (call pause on it)
3. run your message processing in another thread
4. allow your consumer to keep polling while it's paused and your message is being processed
5. when the processing thread is done, call resume on the consumer.
I remember seeing a nice example of this by someone online, but I'm sorry I don't have the link to it off hand. Hope this helps. On Thu, Ap...
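A minimal sketch of the loop described above, assuming the plain Java KafkaConsumer (Kafka 2.x); process() and the topic name are placeholders, and production code would also need to handle rebalances, which clear the paused state:

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PausingConsumerLoop {

    static void run(KafkaConsumer<String, String> consumer, String topic) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        consumer.subscribe(Collections.singletonList(topic));

        Future<?> inFlight = null;
        Collection<TopicPartition> paused = Collections.emptySet();

        while (true) {
            // While paused, poll() returns nothing but keeps satisfying max.poll.interval.ms.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));

            if (inFlight == null && !records.isEmpty()) {
                // Steps 1-3: got messages, pause all assigned partitions,
                // and hand the batch to a worker thread.
                paused = consumer.assignment();
                consumer.pause(paused);
                final ConsumerRecords<String, String> batch = records;
                inFlight = worker.submit(() -> {
                    for (ConsumerRecord<String, String> record : batch) {
                        process(record);
                    }
                });
            }

            // Steps 4-5: keep polling while processing runs; once it is done,
            // commit and resume the partitions.
            if (inFlight != null && inFlight.isDone()) {
                consumer.commitSync();
                consumer.resume(paused);
                inFlight = null;
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // placeholder for the slow, long-running work
    }
}
```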

Re: Too many commits

Yeah, it's taking time; that's why I am doing manual commits, to achieve at-least-once semantics. Thanks Yubraj Singh On Thu, Apr 25, 2019, 12:49 PM Dimitry Lvovsky < dlvovsky@gmail.com > wrote: > Are your processes taking a long time between commits — does consuming each > message take a long while? > > On Thu, Apr 25, 2019 at 08:50 yuvraj singh < 19yuvrajsingh90@gmail.com > > wrote: > > > Hi all , > > > > In my application i am committing every offset to kafka one by one and my > > max poll size is 30 . I am facing lot of commit failures so is it because > > of above reasons ? > > > > Thanks > > Yubraj Singh > > ...

Re: Too many commits

Are your processes taking a long time between commits — does consuming each message take a long while? On Thu, Apr 25, 2019 at 08:50 yuvraj singh < 19yuvrajsingh90@gmail.com > wrote: > Hi all , > > In my application i am committing every offset to kafka one by one and my > max poll size is 30 . I am facing lot of commit failures so is it because > of above reasons ? > > Thanks > Yubraj Singh

Too many commits

Hi all, In my application I am committing every offset to Kafka one by one, and my max poll size is 30. I am facing a lot of commit failures; is it because of the above reasons? Thanks Yubraj Singh

RE: kafka consumer metadata expire

Hi, Have you tried setting METADATA_MAX_AGE_CONFIG (default: 300,000 ms) smaller? It seems the consumer won't actually update the metadata info until it's out of date. >-----Original Message----- >From: Shengnan YU [mailto: ysnakie@hotmail.com ] >Sent: Wednesday, April 24, 2019 1:43 PM >To: users@kafka.apache.org >Subject: kafka consumer metadata expire > >Hi everyone >How to update kafka consumer's metadata when some topics are deleted. >The metadata in consumer will not be expired or remove outdated topic, >which leads to UNKNOWN_TOPIC exception when fetching metadata from >the cluster. Thank you very much.
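A minimal sketch of the suggestion above (the bootstrap address is a placeholder); METADATA_MAX_AGE_CONFIG maps to metadata.max.age.ms, which defaults to 300,000 ms:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MetadataAgeExample {
    public static KafkaConsumer<String, String> newConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Refresh metadata at least every 60 s instead of the 5-minute default,
        // so deleted topics fall out of the consumer's cached metadata sooner.
        props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "60000");
        return new KafkaConsumer<>(props);
    }
}
```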

Re: Source Connector Task in a distributed env

Thank you Ryanne & Hans. I will look into it. I explored the spooldir connector too and found that it also suits standalone mode, as you mentioned. 'Venkata On Wed 24 Apr, 2019, 22:34 Hans Jespersen, < hans@confluent.io > wrote: > Your connector sounds a lot like this one > https://github.com/jcustenborder/kafka-connect-spooldir > > I do not think you can run such a connector in distributed mode though. > Typically something like this runs in standalone mode to avoid conflicts. > > -hans > > > On Wed, Apr 24, 2019 at 1:08 AM Venkata S A < asaideep@gmail.com > wrote: > > > Hello Team, > > > > I am developing a custom Source Connector that watches a > > given directory for any new files. My question is in a Distributed > > environment, how will the tasks in different nodes handle the file Queue? > > > > Referring to this sample > > < ...

Re: Kafka question on Stream Processing

Hi Gagan, If you want to read a message, you need to poll the message from the broker. The brokers have only a very limited notion of message content. They only know that a message has a key, a value, and some metadata, but they are not able to interpret the contents of those message components. The clients are responsible for reading and processing the messages. For reading and processing, the clients need to poll the messages from the brokers. For the processing you want to do, you could use Kafka Streams. See https://kafka.apache.org/documentation/streams/ for more information. Have a look at the branch DSL operation there. Best regards, Bruno On Wed, Apr 24, 2019 at 1:54 AM Gagan Sabharwal < gagansab@gmail.com > wrote: > Hi team, > > Say we have a client which has pushed a message to a topic. The message has > a a simple structure > > Task - Time of task > Send an email - 1530 > > Now say that this message is consumed by a con...
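Not part of Bruno's message, but a minimal sketch of the branch DSL operation he points to, applied to the scheduling question (topic names, isDue() and sendEmail() are placeholders; note that re-queueing not-yet-due tasks to the same topic, as the original question suggests, would recirculate them continuously in practice):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class TaskSchedulerTopologySketch {

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> tasks = builder.stream("tasks",
                Consumed.with(Serdes.String(), Serdes.String()));

        // branch() splits the stream by predicate: index 0 = due now, index 1 = everything else.
        @SuppressWarnings("unchecked")
        KStream<String, String>[] branches = tasks.branch(
                (key, value) -> isDue(value),
                (key, value) -> true);

        branches[0].foreach((key, value) -> sendEmail(value));                    // act on due tasks
        branches[1].to("tasks", Produced.with(Serdes.String(), Serdes.String())); // re-queue the rest

        return builder;
    }

    static boolean isDue(String value) {
        // placeholder: compare the task time encoded in the value with the current time
        return true;
    }

    static void sendEmail(String value) {
        // placeholder for the actual action
    }
}
```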

Re: Source Connector Task in a distributed env

Your connector sounds a lot like this one https://github.com/jcustenborder/kafka-connect-spooldir I do not think you can run such a connector in distributed mode though. Typically something like this runs in standalone mode to avoid conflicts. -hans On Wed, Apr 24, 2019 at 1:08 AM Venkata S A < asaideep@gmail.com > wrote: > Hello Team, > > I am developing a custom Source Connector that watches a > given directory for any new files. My question is in a Distributed > environment, how will the tasks in different nodes handle the file Queue? > > Referring to this sample > < > https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory > > > , > poll() in SourceTask is polling the directory at specified interval for a > new files and fetching the files in a Queue as below: > > Queue<File> queue = ((DirWatcher)...

Re: Source Connector Task in a distributed env

Venkata, the example you have linked creates a single task config s.t. there is no parallelism -- a single task runs on the cluster, regardless of the number of nodes. In order to introduce parallelism, your SourceConnector needs to group all known files among N partitions and return N task configs for N tasks. You can use ConnectorUtils.groupPartitions() for this. In each task config, specify the specific group of files for that task, as grouped by groupPartitions(). Then your SourceConnector can watch for new files. Anytime a new file is detected, call context.requestTaskReconfiguration(), which will restart this process. Ryanne On Wed, Apr 24, 2019 at 3:08 AM Venkata S A < asaideep@gmail.com > wrote: > Hello Team, > > I am developing a custom Source Connector that watches a > given directory for any new files. My question is in a Distributed > environment, how will the tasks in different nodes handle the file Queue? > ...
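A minimal sketch of what Ryanne describes, assuming Kafka Connect 2.x; the connector and task class names and the "files" task property are hypothetical, and the directory-watching itself is elided:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;
import org.apache.kafka.connect.util.ConnectorUtils;

public class DirectorySourceConnectorSketch extends SourceConnector {

    private volatile List<String> knownFiles = new ArrayList<>();

    @Override
    public void start(Map<String, String> props) {
        // In a real connector: start a directory watcher here and, whenever a new file
        // appears, add it to knownFiles and call context.requestTaskReconfiguration()
        // so that taskConfigs() runs again and the work is redistributed.
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        int groups = Math.max(1, Math.min(knownFiles.size(), maxTasks));
        List<Map<String, String>> configs = new ArrayList<>();
        // groupPartitions splits the file list into disjoint groups, one per task,
        // so no file is read by more than one node.
        for (List<String> group : ConnectorUtils.groupPartitions(knownFiles, groups)) {
            Map<String, String> config = new HashMap<>();
            config.put("files", String.join(",", group)); // hypothetical task property
            configs.add(config);
        }
        return configs;
    }

    @Override
    public Class<? extends Task> taskClass() {
        return DirectorySourceTask.class;
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void stop() {}

    @Override
    public String version() {
        return "0.1-sketch";
    }

    // Minimal stub so the sketch is self-contained; a real task would read its
    // "files" property in start() and emit SourceRecords from poll().
    public static class DirectorySourceTask extends SourceTask {
        @Override public String version() { return "0.1-sketch"; }
        @Override public void start(Map<String, String> props) {}
        @Override public List<SourceRecord> poll() { return null; }
        @Override public void stop() {}
    }
}
```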

Kafka consumer downgrade issue

Hi all, Recently we upgraded our application from the more primitive Java client APIs (kafka.javaapi.consumer.SimpleConsumer, kafka.api.FetchRequest and friends) to the more friendly poll-based org.apache.kafka.clients.consumer.KafkaConsumer using Kafka Java client libraries version 1.1.0. The upgrade went fine and meant we could remove a LOT of custom code we had previously needed to use. This was also released into a version of the application that went into QA / staging environments of a client of ours. The upgrade (environment was not wiped) in the staging environment went fine as well without any errors. Due to an unrelated serious issue that we discovered a couple of days later, we had to advise the client to roll back from the new application version (it still had not gone into production). We didn't think this would lead to any issues. What DID occur when the application was downgraded in the staging client environment, however, was Kafka clients starting to get a lot ...

Kafka question on Stream Processing

Hi team, Say we have a client which has pushed a message to a topic. The message has a simple structure: Task - Time of task Send an email - 1530 Now say that this message is consumed by a consumer subscribed to this topic. Since the topic already has storage, what I intend to do is just read the message (not poll it) and, if it is before 1530, send it to the tail of the partition of that topic. Does Kafka provide such an API? Next time when the consumer reads the message and sees that the current time is after 1530, it will poll the message and execute the task. Regards Gagan

Source Connector Task in a distributed env

Hello Team, I am developing a custom Source Connector that watches a given directory for any new files. My question is: in a distributed environment, how will the tasks on different nodes handle the file queue? Referring to this sample < https://github.com/DataReply/kafka-connect-directory-source/tree/master/src/main/java/org/apache/kafka/connect/directory > , poll() in SourceTask is polling the directory at a specified interval for new files and fetching the files into a queue as below: Queue<File> queue = ((DirWatcher) task).getFilesQueue(); So, when run in a 3-node cluster, this is run individually by each task. But then, how does synchronization happen between all the tasks on different nodes to avoid duplicate reading of files into Kafka? Thank you, Venkata S

Re: Kafka work stealing question

Thanks Alan. Regards Gagan On Wed 24 Apr, 2019, 2:41 AM Alan Salewski, < asalewski@rdc.com > wrote: > Hi Gagan, > > It is true that a consumer will only poll for messages from the partitions > to > which it is assigned, but any single consumer can be assigned to poll from > multiple partitions. That will happen whenever the number of partitions is > greater than the number of consumers. > > In your example of 5 partitions, if you only had two consumers, then one > consumer would poll from 3 partitions and the other would poll from 2. > > Similarly, if you have 5 consumers and 20 partitions, then each of the > consumers will poll from 4 different partitions. > > It is worth pointing out, too, that if you have more consumers than you > have > partitions, some of your consumers will not receive any work at all because > there are not enough partitions for each consumer to be assigned at least > one....