
Posts

Showing posts from May, 2024

Re: Regarding Kafka connect task to partition relationship for both source and sink connectors

For sink connectors, I believe you can scale up the tasks to match the partitions on the topic. But I don't believe this is the case for source connectors; the number of partitions on the topic you're producing to has nothing to do with the number of connector tasks. It really depends on the individual source connector and whether the source data type could benefit from multiple tasks. For example, the JDBC source connector (a very popular connector) only supports 1 task - even if you're querying multiple tables. Bottom line: you'll need to check the documentation for the connector in question to see if it supports multiple tasks.

Alex

On Thu, May 30, 2024 at 7:51 AM Sébastien Rebecchi <srebecchi@kameleoon.com> wrote:
> Hello
>
> Confirmed. Partition is the minimal granularity level, so having more
> consumers than the number of partitions of a topic for a same consumer
> group is useless, having P partitions means maximu...
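The sink-connector side of this answer boils down to a simple rule of thumb: effective parallelism is capped by the topic's partition count. A minimal toy sketch (the function name is hypothetical, not part of any Kafka API):

```python
# Illustrative sketch only: a *sink* connector's tasks consume partitions
# through a consumer group, so a partition is owned by exactly one task
# and tasks beyond the partition count sit idle.
def effective_sink_parallelism(num_partitions: int, tasks_max: int) -> int:
    """Upper bound on the number of sink tasks that can do work concurrently."""
    return min(num_partitions, tasks_max)

print(effective_sink_parallelism(6, 4))   # 4: all four tasks busy
print(effective_sink_parallelism(6, 10))  # 6: only six can work, four idle
```

For source connectors no such formula exists; as the reply says, the achievable task count is connector-specific.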

Re: Regarding Kafka connect task to partition relationship for both source and sink connectors

Hello

Confirmed. Partition is the minimal granularity level, so having more consumers than the number of partitions of a topic for the same consumer group is useless; having P partitions means maximum parallelism is reached using P consumers.

Regards,
Sébastien.

On Thu, May 30, 2024 at 14:43, Yeikel Santana <email@yeikel.com> wrote:
> Hi everyone,
>
> From my understanding, if a topic has n partitions, we can create up to n
> tasks for both the source and sink connectors to achieve the maximum
> parallelism. Adding more tasks would not be beneficial, as they would
> remain idle and be limited to the number of partitions of the topic
>
> Could you please confirm if this understanding is correct?
>
> If this understanding is incorrect could you please explain the
> relationship if any?
>
> Thank you!

Re: [EXTERNAL] Regarding Kafka connect task to partition relationship for both source and sink connectors

The docs say: "Each task is assigned to a thread. Each task is capable of handling multiple Kafka partitions, but a single partition must be handled by only one task." From what I understand, additional tasks would sit idle.

From: Yeikel Santana <email@yeikel.com>
Date: Thursday, May 30, 2024 at 7:43 AM
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: [EXTERNAL] Regarding Kafka connect task to partition relationship for both source and sink connectors

Hi everyone,

From my understanding, if a topic has n partitions, we can create up to n tasks for both the source and sink connectors to achieve the maximum parallelism. Adding more tasks would not be beneficial, as they would remain idle and be limited to the number of partitions of the topic.

Could you please confirm if this understanding is correct?

If this understanding is incorrect, could you please explain the relationship, if any?

Thank you!
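The quoted docs statement ("a single partition must be handled by only one task") can be illustrated with a toy assignment. Note this round-robin scheme is purely illustrative; Kafka Connect actually delegates sink-task assignment to the consumer group's partition assignor, which may distribute partitions differently:

```python
# Toy round-robin assignment of partitions to sink tasks. The real
# assignment is done by the consumer group's partition assignor; the
# point here is only that each partition maps to exactly one task, so
# when tasks outnumber partitions, some tasks receive nothing.
def assign_partitions(num_partitions: int, num_tasks: int) -> list[list[int]]:
    assignment: list[list[int]] = [[] for _ in range(num_tasks)]
    for p in range(num_partitions):
        assignment[p % num_tasks].append(p)
    return assignment

print(assign_partitions(3, 5))  # [[0], [1], [2], [], []] -- tasks 3 and 4 idle
```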

Regarding Kafka connect task to partition relationship for both source and sink connectors

Hi everyone,

From my understanding, if a topic has n partitions, we can create up to n tasks for both the source and sink connectors to achieve the maximum parallelism. Adding more tasks would not be beneficial, as they would remain idle and be limited to the number of partitions of the topic.

Could you please confirm if this understanding is correct?

If this understanding is incorrect, could you please explain the relationship, if any?

Thank you!

Console consumer crashing

Hi folks,

I was troubleshooting a program trying to receive a kafka message (local kafka, for development), and decided I needed to verify kafka itself was happy. So I created a topic named 'fubar' and started the consumer listening to it. Then I started the console producer in another terminal window and typed in foo<return> and it appeared nicely on the consumer terminal. Then I typed bar<return> and got this:

NS2-MacBook-Pro:kafka_2.13-3.7.0 gus$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic fubar --from-beginning
foo
[2024-05-29 11:56:07,183] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.protocol.types.SchemaException: Buffer underflow while parsing response for request with header RequestHeader(apiKey=FETCH, apiVersion=16, clientId=console-consumer, correlationId=608, headerVersion=2)
    at org.apache.kafka.clients.NetworkClient.parseRespo...

Re: broker crashing when running in raft mode

One more reply to myself. For reasons that are not quite clear we found that there were log directories left on brokers that corresponded to topics or partitions that no longer exist or that no longer reside on a given broker. Those directories don't and won't have a topic ID, which is causing kafka to fail when running as a broker in KRaft mode. I created this script:

> #!/bin/bash
> set -e -u -o pipefail
> unset KAFKA_OPTS; unset JMX_PORT
>
> cd /opt/kafka
>
> rm -f /tmp/topics_and_parts
>
> ./bin/kafka-topics.sh --describe --bootstrap-server broker:9092 | grep 'Topic:' | grep 'Partition:' > /tmp/topics_and_parts
>
> id=${HOSTNAME##kafka-}
> pushd data/topics &>/dev/null
> for logDir in *; do
>   topic=${logDir%-*}
>   partition=${logDir##*-}
>   if [ ! -d "$logDir" ]; then
>     continue
>   fi
>   if [ "$topic" == "__cluster_metadata"...

Re: outerjoin not joining after window

> Can someone confirm that each partition has its own stream time and that
> the stream time for a partition only advances when a record is written to
> the partition after the window closes?

That's correct.

On 5/21/24 10:11 AM, Chad Preisler wrote:
> After reviewing the logs, I think I understand what happens with the
> repartition topics. Looks like they will be assigned to one or more
> instances. In my example I ran three instances of the application (A, B,
> C). Looks like the two repartition topics got assigned to A and B. The six
> partitions from the input topics got split evenly across all three running
> instances A, B, and C. Since the repartitioned streams are what I'm joining
> on, I guess the join will run on two instances, and any input topic
> processing will run across all three. Is that correct?
>
> Still would like clarification regarding some records appea...
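The per-partition stream-time behavior confirmed in this thread can be modeled in a few lines. This is a toy model, not Kafka Streams code: stream time on a partition is simply the maximum record timestamp seen so far, and a window can only be closed (triggering outer-join emission) once a later record pushes stream time past the window end plus grace:

```python
# Toy model of per-partition stream time. Stream time never advances on
# its own; it only moves forward when a new record with a larger timestamp
# arrives on that partition. On a low-volume partition, a window may
# therefore stay open (and outer-join results stay unemitted) for a long
# time after the window's end has passed on the wall clock.
class PartitionStreamTime:
    def __init__(self) -> None:
        self.stream_time = -1  # no records observed yet

    def observe(self, record_timestamp: int) -> None:
        # Out-of-order records never move stream time backwards.
        self.stream_time = max(self.stream_time, record_timestamp)

    def window_closed(self, window_end: int, grace: int) -> bool:
        return self.stream_time >= window_end + grace

p = PartitionStreamTime()
p.observe(100)
print(p.window_closed(window_end=100, grace=10))  # False: stream time is 100
p.observe(111)                                    # a later record arrives
print(p.window_closed(window_end=100, grace=10))  # True: stream time is 111
```

The exact comparison (strictly greater vs. greater-or-equal) is a simplification here; the behavior being illustrated is only that window closure waits on new records.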

Re: Request to be added to kafka contributors list

It's working now. Thank you Matthias!

From: Matthias J. Sax <mjsax@apache.org>
Sent: Wednesday, May 22, 2024 2:58
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Re: Request to be added to kafka contributors list

Ok. Hopefully it's working now. Sorry for the hiccup.

-Matthias

On 5/21/24 1:14 AM, Fan Yang wrote:
> Hi Matthias,
>
> I tried signing out and signing in, but still can't find the "Assign" button. My JIRA ID is fanyan, could you help me set it again?
>
> Best,
> Fan
>
> From: Matthias J. Sax <mjsax@apache.org>
> Sent: Saturday, May 18, 2024 4:06
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: Re: Re: Request to be added to kafka contributors list
>
> Did you sign out and sign in again?
>
> On 5/17/24 9:49 AM, Yang Fan wrote:
>> Thanks Matthias,
>...

Re: Fwd: Request to be added to kafka contributors list

Hello,

1/ for sure
2/ after rebasing my code change, I'll do the full test. To answer the question: I'm on Debian 12, OpenJDK 17.0.10 on my dev machine.

Best regards

On 2024-05-21 16:46, Greg Harris wrote:
> Hi Franck,
>
> Thank you for contributing to Apache Kafka!
>
> 1. Git is generally permissive of this, as long as there are no merge
> conflicts. If you have merge conflicts with `trunk`, you will need to
> resolve them before a committer can merge your changes, so rebasing on
> trunk before opening the PR is a good idea :)
>
> 2. Are you on an M1 mac, with a recent (>11) JDK? I've been experiencing
> some consistent failures recently [1] and haven't figured it out yet.
> You may also be getting a flaky failure: a test which is
> nondeterministic and sometimes fails. We are constantly trying to burn
> down the list of flaky tests [2], but there are still some around.
> A...

Re: Request to be added to kafka contributors list

Ok. Hopefully it's working now. Sorry for the hiccup.

-Matthias

On 5/21/24 1:14 AM, Fan Yang wrote:
> Hi Matthias,
>
> I tried signing out and signing in, but still can't find the "Assign" button. My JIRA ID is fanyan, could you help me set it again?
>
> Best,
> Fan
>
> From: Matthias J. Sax <mjsax@apache.org>
> Sent: Saturday, May 18, 2024 4:06
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: Re: Re: Request to be added to kafka contributors list
>
> Did you sign out and sign in again?
>
> On 5/17/24 9:49 AM, Yang Fan wrote:
>> Thanks Matthias,
>>
>> I still can't find the "Assign to me" button beside Assignee and Reporter. Could you help me set it again?
>>
>> Best regards,
>> Fan
>>
>> From: Matthias J. Sax <mjsax@apache.org>
>...

Re: outerjoin not joining after window

After reviewing the logs, I think I understand what happens with the repartition topics. Looks like they will be assigned to one or more instances. In my example I ran three instances of the application (A, B, C). Looks like the two repartition topics got assigned to A and B. The six partitions from the input topics got split evenly across all three running instances A, B, and C. Since the repartitioned streams are what I'm joining on, I guess the join will run on two instances, and any input topic processing will run across all three. Is that correct?

Still would like clarification regarding some records appearing to not get processed: I think the issue is related to certain partitions not getting records to advance stream time (because of low volume). Can someone confirm that each partition has its own stream time and that the stream time for a partition only advances when a record is written to the partition after the window closes?

On Tue, May 21, 2024 at 10:2...

Re: Fwd: Request to be added to kafka contributors list

Hi Franck,

Thank you for contributing to Apache Kafka!

1. Git is generally permissive of this, as long as there are no merge conflicts. If you have merge conflicts with `trunk`, you will need to resolve them before a committer can merge your changes, so rebasing on trunk before opening the PR is a good idea :)

2. Are you on an M1 mac, with a recent (>11) JDK? I've been experiencing some consistent failures recently [1] and haven't figured it out yet. You may also be getting a flaky failure: a test which is nondeterministic and sometimes fails. We are constantly trying to burn down the list of flaky tests [2], but there are still some around. As far as how this impacts the PR: You should find and resolve all of the deterministic failures that you introduce in the PR, and do your best to check whether you introduced any flakiness. You can look for tickets mentioning those failures, or ask a committer for more information.

Hope this helps,
Greg Harris ...

Re: outerjoin not joining after window

See one small edit below...

On Tue, May 21, 2024 at 10:25 AM Chad Preisler <chad.preisler@gmail.com> wrote:
> Hello,
>
> I think the issue is related to certain partitions not getting records to
> advance stream time (because of low volume). Can someone confirm that each
> partition has its own stream time and that the stream time for a partition
> only advances when a record is written to the partition after the window
> closes?
>
> If I use the repartition method on each input topic to reduce the number
> of partitions for those streams, how many instances of the application will
> process records? For example, if the input topics each have 6 partitions,
> and I use the repartition method to set the number of partitions for the
> streams to 2, how many instances of the application will process records?
>
> Thanks,
> Chad
>
> On Wed, May 1, 2024 at 6:47 PM Matthias J. Sax <mjsax@apache.org ...

Re: outerjoin not joining after window

Hello,

I think the issue is related to certain partitions not getting records to advance stream time (because of low volume). Can someone confirm that each partition has its own stream time and that the stream time for a partition only advances when a record is written to the topic after the window closes?

If I use the repartition method on each input topic to reduce the number of partitions for those streams, how many instances of the application will process records? For example, if the input topics each have 6 partitions, and I use the repartition method to set the number of partitions for the streams to 2, how many instances of the application will process records?

Thanks,
Chad

On Wed, May 1, 2024 at 6:47 PM Matthias J. Sax <mjsax@apache.org> wrote:
> >>> How do you know this?
> >> First thing we do is write a log message in the value joiner. We don't see
> >> the log message for the missed records.
> ...
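The repartition question (6 input partitions, repartitioned down to 2, across 3 instances) can be sketched with a toy tally: Kafka Streams creates one task per topic partition, so the join over the 2-partition repartition topic can occupy at most 2 instances, while input-topic processing spreads over all 3. An even spread is assumed here purely for illustration; the real distribution is decided by the Streams task assignor:

```python
# Toy illustration of task spreading. Streams creates one task per
# partition; this helper just deals them out round-robin, which is an
# assumption for illustration, not the real assignor's algorithm.
def spread_tasks(num_tasks: int, instances: list[str]) -> dict[str, int]:
    counts = {i: 0 for i in instances}
    for t in range(num_tasks):
        counts[instances[t % len(instances)]] += 1
    return counts

print(spread_tasks(6, ["A", "B", "C"]))  # input tasks: {'A': 2, 'B': 2, 'C': 2}
print(spread_tasks(2, ["A", "B", "C"]))  # join tasks:  {'A': 1, 'B': 1, 'C': 0}
```

So with only 2 repartition partitions, at most 2 of the 3 instances can host join work, matching the observation in this thread.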

Re: Request for contributor list

I believe it was brendendeluna; that is what I entered for username on the self serve page. I am unable to find a Jira ID though. Please let me know if brendendeluna works or if I need something else.

On Mon, May 20, 2024 at 8:35 PM Matthias J. Sax <mjsax@apache.org> wrote:
> What is your Jira ID?
>
> -Matthias
>
> On 5/20/24 9:55 AM, Brenden Deluna wrote:
> > Hello, I am requesting to be added to the contributor list to take care of
> > some tickets. Thank you.

Re: Fwd: Request to be added to kafka contributors list

Hello,

It works like a charm. A few questions:

1. Now I'm asking myself: I did the job described in JIRA 16707 in a fork/branch of the 3.7.0 of kafka, but reading the "Contributing Code Change", I feel I should have done it on a branch from trunk of my fork? (If so, I'll just create a new branch on my fork, rebase, and re-run the tests for sure; I just want to know from which point I should start to PR correctly.)

2. When doing a "gradlew clean test" from a clean fork of the 3.7.0 branch, I have a failure, so I'm asking myself how it will be managed when I do the PR, do you know?

Best regards

On 21/05/2024 03:35, Matthias J. Sax wrote:
> Done. You should be all set :)
>
> -Matthias
>
> On 5/20/24 10:10 AM, boulot@ulukai.net wrote:
>>
>> Dear Apache Kafka Team,
>>
>> I hope to post in the right place: my name is Franck LEDAY,
>...

Re: Fwd: Request to be added to kafka contributors list

Done. You should be all set :)

-Matthias

On 5/20/24 10:10 AM, boulot@ulukai.net wrote:
>
> Dear Apache Kafka Team,
>
> I hope to post in the right place: my name is Franck LEDAY, under
> Apache-Jira ID "handfreezer".
>
> I opened an issue as Improvement KAFKA-16707 but I failed to
> assign it to myself.
>
> May I ask to be added to the contributors list for Apache Kafka? As
> I already did the work for the improvement, I would like to be assigned
> to it to finish my contribution.
>
> Thank you for considering my request.
> Best regards, Franck.

Re: Release plan required

Zookeeper is already deprecated (since 3.5): https://kafka.apache.org/documentation/#zk_depr It's planned to be fully removed in the 4.0 release. It's not confirmed yet, but there is a high probability that there won't be a 3.9 release, and that 4.0 will follow 3.8.

-Matthias

On 5/20/24 2:11 AM, Sahil Sharma D wrote:
> Hello,
>
> When is Zookeeper planned to be deprecated from kafka; in which release is this deprecation planned?
>
> Regards,
> Sahil
>
> -----Original Message-----
> From: Sanskar Jhajharia <sjhajharia@confluent.io.INVALID>
> Sent: Monday, May 20, 2024 1:38 PM
> To: users@kafka.apache.org
> Subject: Re: Release plan required
>
> Hey Sahil,
>
> You can find the complete details of the releases and bug fix relea...

Fwd: Request to be added to kafka contributors list

Dear Apache Kafka Team,

I hope to post in the right place: my name is Franck LEDAY, under Apache-Jira ID "handfreezer".

I opened an issue as Improvement KAFKA-16707 but I failed to assign it to myself.

May I ask to be added to the contributors list for Apache Kafka? As I already did the work for the improvement, I would like to be assigned to it to finish my contribution.

Thank you for considering my request.
Best regards, Franck.

Kafka strange behaviors seen in logs

Hello!

Currently I am running a cluster of 3 kafka machines. Two of those are hosted in the same data center and the last one is in a different one. My kafka heap options are the following:

KAFKA_HEAP_OPTS=-Xmx6g -Xms6g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+ExplicitGCInvokesConcurrent

Recently I moved my cluster from zookeeper to KRaft. The cluster is working properly and kafka is accessible 100% of the time, but I am worried about things that can be seen in the logs. It is hard to find any information on whether those are harmful or affect cluster performance in any significant way. I am assuming it is related to some internet connection hiccups between nodes, but I would like to know if this is normal or whether I can strive to minimize or even remove those issues. So the first thing is the quorum leader being set to none.

[2024-05-20 09:06:10,507] ...