
Showing posts from July, 2020

Re: [VOTE] 2.6.0 RC2

Thank you, Randall, for driving this release. +1 (binding) after verifying signatures and hashes, building from sources, running unit/integration tests and some manual tests with the 2.13 build.

Two minor things:
1. There were two sitedoc files, 2.12 and 2.13; we don't really need two sitedocs generated. Not a big deal, but maybe worth tracking and fixing.
2. I got one test failure locally: org.apache.kafka.trogdor.agent.AgentTest.testAgentGetStatus failed, log available in /Users/gwenshap/releases/2.6.0-rc2/kafka-2.6.0-src/tools/build/reports/testOutput/org.apache.kafka.trogdor.agent.AgentTest.testAgentGetStatus.test.stdout

org.apache.kafka.trogdor.agent.AgentTest > testAgentGetStatus FAILED
    java.lang.RuntimeException:
        at org.apache.kafka.trogdor.rest.RestExceptionMapper.toException(RestExceptionMapper.java:69)
        at org.apache.kafka.trogdor.rest.JsonRestServer$HttpResponse.body(JsonRestServer.java:285)
        at org.apache.kafka.trogd...

Re: [VOTE] 2.5.1 RC0

Hi John, +1 (binding)
- Verified signatures, artifacts, Release notes
- Built from sources, ran tests
- Ran core/connect/streams quick start for the Scala 2.13 release, ran a few manual tests

Thanks for driving the release.

Thanks,
Manikumar

On Thu, Jul 30, 2020 at 10:32 PM Ismael Juma < ismael@juma.me.uk > wrote:
> +1 (binding), verified signatures, ran the tests on the source archive with Scala 2.13 and Java 11 and verified the quickstart with the source archive and Scala 2.13 binary archive.
>
> Ismael
>
> On Thu, Jul 23, 2020 at 7:39 PM John Roesler < vvcephei@apache.org > wrote:
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 2.5.1.
> >
> > Apache Kafka 2.5.1 is a bugfix release and fixes 72 issues since the 2.5.0 release. Please see the release notes for more information.
> > ...

Re: [kafka-clients] Re: [VOTE] 2.6.0 RC2

Hi Randall, +1 (non-binding)

Built from source, ran all unit and integration tests (Scala 2.12 and 2.13), and verified all the signatures.

Thanks,
Bill

On Fri, Jul 31, 2020 at 12:42 PM Randall Hauch < rhauch@apache.org > wrote:
> Thanks, Rajini.
>
> Here's an update on the system tests. Unfortunately we've not yet had a fully-green system test run, but each of the system test runs since https://jenkins.confluent.io/job/system-test-kafka/job/2.6/49/ has had just one or two failures -- and no failure has been repeated. This suggests the failing tests are somewhat flaky. I'll keep running more system tests and will reply here if something appears suspicious, but please holler if you think my analysis is incorrect.
>
> Best regards,
>
> Randall
>
> On Fri, Jul 31, 2020 at 11:00 AM Rajini Sivaram < rajinisivaram@gmail.com > wrote:
>> Thanks Randall, +1 (b...

Re: [VOTE] 2.6.0 RC2

Thanks, Rajini.

Here's an update on the system tests. Unfortunately we've not yet had a fully-green system test run, but each of the system test runs since https://jenkins.confluent.io/job/system-test-kafka/job/2.6/49/ has had just one or two failures -- and no failure has been repeated. This suggests the failing tests are somewhat flaky. I'll keep running more system tests and will reply here if something appears suspicious, but please holler if you think my analysis is incorrect.

Best regards,

Randall

On Fri, Jul 31, 2020 at 11:00 AM Rajini Sivaram < rajinisivaram@gmail.com > wrote:
> Thanks Randall, +1 (binding)
>
> Built from source and ran tests, had a quick look through some Javadoc changes, ran quickstart and some tests with Java 11 TLSv1.3 on the binary.
>
> Regards,
>
> Rajini
>
> On Tue, Jul 28, 2020 at 10:50 PM Randall Hauch < rhauch@apache.org > wrote:
> > Hel...

Re: [VOTE] 2.6.0 RC2

Thanks Randall, +1 (binding)

Built from source and ran tests, had a quick look through some Javadoc changes, ran quickstart and some tests with Java 11 TLSv1.3 on the binary.

Regards,
Rajini

On Tue, Jul 28, 2020 at 10:50 PM Randall Hauch < rhauch@apache.org > wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 2.6.0. This is a major release that includes many new features, including:
>
> * TLSv1.3 has been enabled by default for Java 11 or newer.
> * Smooth scaling out of Kafka Streams applications
> * Kafka Streams support for emit on change
> * New metrics for better operational insight
> * Kafka Connect can automatically create topics for source connectors
> * Improved error reporting options for sink connectors in Kafka Connect
> * New Filter and conditional SMTs in Kafka Connect
> * The default value for the `client.dns.lookup` configur...

Re: 409 incompatible schema (after delete subject with permanent=true)

Hello kafka community, I fixed this by deleting (permanent=true) a lot of subjects. Thanks.

On Thu, 30 Jul 2020 at 21:31, Dumitru-Nicolae Marasoui < Nicolae.Marasoiu@kaluza.com > wrote:
> Hello kafka community,
> So the schema that the kafka streams is trying to push into the registry is this:
> {
>   "type" : "record",
>   "name" : "AccountKey",
>   "namespace" : "...",
>   "fields" : [ {
>     "name" : "accountId",
>     "type" : "string",
>   } ]
> }
> I removed it:
> curl -X DELETE 'http://localhost:8081/subjects/global_account_v1-key'
> curl -X DELETE 'http://localhost:8081/subjects/global_account_v1-key?permanent=true'
> Then I started the pipeline. Exception received:
> org.apache.kafka.common.errors.SerializationException: Error registering Avro schem...

Cached zkVersion [237] not equal to that in zookeeper, skip updating ISR

Hello community,

We've been facing trouble on our cluster since yesterday. We suddenly had a massive loss of ISRs on 2 of our 3 nodes. Most partition leaders were moved to node 1, which has ISRs for all partitions. On nodes 2 & 3 we have many logs stating:

Cached zkVersion [xxx] not equal to that in zookeeper, skip updating ISR

I believe restarting those nodes should fix the issue, but we're very concerned about why this happened. When the issue arose, we got the following errors in the controller.log on Kafka nodes 2/3:

[2020-07-30 21:21:07,808] ERROR [ControllerEventThread controllerId=2] Error processing event RegisterBrokerAndReelect (kafka.controller.ControllerEventManager$ControllerEventThread)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:122)
    at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1518)
    at kafka....
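[Editor's note: the NodeExists error on RegisterBrokerAndReelect, combined with the zkVersion mismatch, usually points at a ZooKeeper session that expired and was re-established while the old ephemeral znode still lingered. A quick, read-only way to check what ZooKeeper currently believes, and whether brokers disagree about who the controller is, is sketched below; the ZooKeeper host/port is a placeholder.]

bin/zookeeper-shell.sh zk1:2181 get /controller
bin/zookeeper-shell.sh zk1:2181 get /controller_epoch

Comparing the broker id stored in /controller with the controller id each broker logs shows whether the cluster is split-brained on controller identity.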

Re: mirror whitelist

Hi, I think the "" were the culprit; after removing them it works.

On Sat, 25 Jul 2020 at 10:22, Dumitru-Nicolae Marasoui < Nicolae.Marasoiu@kaluza.com > wrote:
> Hello kafka community,
>
> Doing the following cli command to copy messages from one cluster to another, without any transformation on the binary keys/values of the messages:
>
> kafka-mirror-maker.sh --consumer.config=config.properties --producer.config=producer.properties --whitelist="id_u_v1"
>
> I am getting:
>
> ERROR Invalid expression syntax: identity_users_v1 (kafka.tools.MirrorMaker$)
>
> Do you have any suggestions on how to try for that particular topic?
> Thank you
> Nicolae
>
> --
> Dumitru-Nicolae Marasoui
> Software Engineer
> w kaluza.com < https://www.kaluza.com/ >
> LinkedIn < https://www.linkedin.com/company/kaluza ...
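[Editor's note: for reference, a quoting-safe invocation might look like the sketch below. The flags are MirrorMaker's own; the topic names are the two mentioned in the thread, and --whitelist takes a Java regex, so several topics can be matched with |. Single quotes keep the shell from mangling the pattern before MirrorMaker sees it.]

kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist 'id_u_v1|identity_users_v1'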

Re: 409 incompatible schema (after delete subject with permanent=true)

Hello kafka community,

So the schema that the kafka streams is trying to push into the registry is this:

{
  "type" : "record",
  "name" : "AccountKey",
  "namespace" : "...",
  "fields" : [ {
    "name" : "accountId",
    "type" : "string",
  } ]
}

I removed it:

curl -X DELETE 'http://localhost:8081/subjects/global_account_v1-key'
curl -X DELETE 'http://localhost:8081/subjects/global_account_v1-key?permanent=true'

Then I started the pipeline. Exception received:

org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"AccountKey","namespace":"com...","doc":"A key for an Account.","fields":[{"name":"accountId","type":"string"}]}
Caused by: io.confluent.ka...

409 incompatible schema (after delete subject with permanent=true)

Hello kafka community,

I am getting exception [1] even if I previously remove the culprit subject:

curl -X DELETE 'http://localhost:8081/subjects/global_account_v1-key'
curl -X DELETE 'http://localhost:8081/subjects/global_account_v1-key?permanent=true'

[1] Exception received:

org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"AccountKey","namespace":"com.ovoenergy.global","doc":"A key for an Account.","fields":[{"name":"accountId","type":"string","doc":"A globally unique ID for the account, distinct per source system and source record identifier. This should be deterministically generated from the source id and source system using a UUIDv5.","classification":"OVO Internal"}]}
Caused by: io.confluent.kafka.schema...
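[Editor's note: a permanently deleted subject should accept any new schema, so a 409 afterwards often means the compatibility check is happening against a subject that still exists, or the delete hit a different registry instance. When diagnosing this, it can help to list what the registry still knows and inspect the subject's compatibility level. A sketch using the standard Schema Registry REST endpoints; the registry URL and subject name are the ones from the thread:]

curl http://localhost:8081/subjects?deleted=true
curl http://localhost:8081/config
curl http://localhost:8081/config/global_account_v1-key

# as a last resort, compatibility checking can be relaxed per subject
# (temporarily; this disables the safety net entirely):
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "NONE"}' \
  http://localhost:8081/config/global_account_v1-key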

Re: [VOTE] 2.5.1 RC0

+1 (binding), verified signatures, ran the tests on the source archive with Scala 2.13 and Java 11 and verified the quickstart with the source archive and Scala 2.13 binary archive.

Ismael

On Thu, Jul 23, 2020 at 7:39 PM John Roesler < vvcephei@apache.org > wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 2.5.1.
>
> Apache Kafka 2.5.1 is a bugfix release and fixes 72 issues since the 2.5.0 release. Please see the release notes for more information.
>
> Release notes for the 2.5.1 release:
> https://home.apache.org/~vvcephei/kafka-2.5.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 28 July 2020, 5pm Pacific
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~vv...

Re: Sharing of State Stores

Hi Charles,

Two transformers that share the same state store should end up in the same sub-topology. A sub-topology is executed by as many tasks as the number of partitions of the input topics. Each task processes the records from one input partition group (i.e. the same partition from both input topics in your case). A task is assigned to one single stream thread on a Kafka Streams client. Each stream thread is a member of the consumer group.

1) As far as I understand your setup, the same keys are produced to one partition. A given key will end up in the same partition in both of your input topics. Hence, the key will be processed by the same task that executes the sub-topology that contains both transformers.

2) Since the execution of a task is single-threaded, the transformers will access the state consecutively and see the updated state store. Streams tries to process records from both partitions in time order, but this is best-effort and not guarantee...
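[Editor's note: to make the wiring concrete, here is a minimal Java sketch of two transformers attached to one store. All names (topics, store, class names) are illustrative, not from the thread; the API calls are standard Kafka Streams. Because both transform() nodes name the same store, Streams places them in a single sub-topology, as described above.]

import java.util.Collections;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class SharedStoreExample {

    // A transformer that reads and updates the shared store.
    static class SharedStoreTransformer implements Transformer<String, String, KeyValue<String, String>> {
        private KeyValueStore<String, String> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            store = (KeyValueStore<String, String>) context.getStateStore("shared-store");
        }

        @Override
        public KeyValue<String, String> transform(String key, String value) {
            String previous = store.get(key); // state possibly written by the other pipeline
            store.put(key, value);            // update the shared store
            return KeyValue.pair(key, previous == null ? value : previous);
        }

        @Override
        public void close() {}
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Register the store once...
        StoreBuilder<KeyValueStore<String, String>> storeBuilder =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("shared-store"),
                Serdes.String(), Serdes.String());
        builder.addStateStore(storeBuilder);

        // ...and attach it to both transformers by name.
        builder.<String, String>stream("topic-a")
               .transform(SharedStoreTransformer::new, "shared-store")
               .to("output-a");
        builder.<String, String>stream("topic-b")
               .transform(SharedStoreTransformer::new, "shared-store")
               .to("output-b");

        // Printing the topology confirms both topics feed one sub-topology.
        System.out.println(builder.build().describe());
    }
}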

Re: [VOTE] 2.5.1 RC0

Hello again all,

Just a reminder that the 2.5.1 RC0 is available for verification.

Thanks,
John

On Thu, Jul 23, 2020, at 21:39, John Roesler wrote:
> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 2.5.1.
>
> Apache Kafka 2.5.1 is a bugfix release and fixes 72 issues since the 2.5.0 release. Please see the release notes for more information.
>
> Release notes for the 2.5.1 release:
> https://home.apache.org/~vvcephei/kafka-2.5.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, 28 July 2020, 5pm Pacific
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~vvcephei/kafka-2.5.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/conten...

Follower node receiving records out of order

Hi,

We are expanding a 3-node cluster to a 5-node cluster, and have encountered an issue where a follower node is fetching offsets out of order. We are on 2.4.0. We've used the kafka-reassign-partitions tool. Several partitions are affected. Picking an example partition (11), it was configured to go from replicas [1,2,3] to [3,4,5], without enabling throttling. Below is the current state of that partition:

Topic: some-topic  Partition: 11  Leader: 3  Replicas: 3,4,5,1,2  Isr: 2,3,1

What we are seeing is that follower 4 is getting an exception when fetching offsets from 3:

kafka.common.OffsetsOutOfOrderException: Out of order offsets found in append to some-topic: ArrayBuffer(<snip>, 1091513, 745397, 1110822, 1127988, <snip>)
    at kafka.log.Log.$anonfun$append$2(Log.scala:1096)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
    at kafka.log.Log.append(Log.scala:1032)
    at kafka.log.Log.appendAsFollower(Log.scala:1012)
    at kafka.cluster.Partition....
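[Editor's note: the replica list 3,4,5,1,2 still contains both the old and new replica sets, which is what an in-flight reassignment looks like; the move appears stuck rather than complete. On 2.4 the progress of a reassignment can be checked with the same tool that started it. A sketch; the ZooKeeper connection string and JSON file are placeholders for whatever was used to kick off the move:]

bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassignment.json --verify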

Kafka Active Active with MMK2

Hi All,

Assume I am using the latest version + MM2. Let's say I have 3 data centers with high latency between them (>~10ms). Can I use MM2 between them to maintain active-active, or can MM2 only be used between 2 data centers?

Thanks in advance,
Dor

This email and the information contained herein is proprietary and confidential and subject to the Amdocs Email Terms of Service, which you may review at https://www.amdocs.com/about/email-terms-of-service
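[Editor's note: MM2 is not limited to two clusters; each replication flow is declared pair-wise in the config, so three data centers in active-active is six enabled flows. A sketch of an mm2.properties; the cluster aliases and bootstrap servers are placeholders, and topic filters would need tuning for a real deployment:]

clusters = dc1, dc2, dc3
dc1.bootstrap.servers = dc1-kafka:9092
dc2.bootstrap.servers = dc2-kafka:9092
dc3.bootstrap.servers = dc3-kafka:9092

# enable every directed pair for active-active across all three DCs
dc1->dc2.enabled = true
dc2->dc1.enabled = true
dc1->dc3.enabled = true
dc3->dc1.enabled = true
dc2->dc3.enabled = true
dc3->dc2.enabled = true

topics = .*

This would be started with bin/connect-mirror-maker.sh mm2.properties. MM2's remote-topic naming (e.g. dc1.sometopic on dc2) is what keeps the cycles from re-replicating each other's copies; latency between DCs mainly affects replication lag, not correctness.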

Re: Partition assignment not well distributed over threads

Hey Sophie,

This was indeed the issue. An environment variable got passed through wrong. Thank you for the tip that made me check this.

Giselle

On 2020/07/29 17:41:43, Sophie Blee-Goldman < sophie@confluent.io > wrote:
> Hey Giselle,
>
> How many stream threads is each instance configured with? If the total number of threads across all instances exceeds the total number of tasks, then some threads won't get any assigned tasks. There's a known bug where tasks might not get evenly distributed over all instances in this scenario, as Streams would only attempt to balance the tasks over the threads. See KAFKA-9173 < https://issues.apache.org/jira/browse/KAFKA-9173 >. Luckily, this should be fixed in 2.6, which is just about to be released.
>
> Instances that joined later, or restarted, would be more likely to have these threads with no assigned tasks due to the sti...

Sharing of State Stores

Hello,

I have some rudimentary questions on state stores. My service is planned to have two transformers, each listening to a different topic. Both topics have the same number of partitions, and the upstream producers to those topics are consistent with respect to key schema. My question centers around the fact that both transformers need to consult and update the same persistent state store in order to make decisions with respect to record processing. I am not implementing a custom key partitioner; I'm using the default. Also, there is no re-keying done by either transformer. Given the above scenario, I have the following questions:

1) Will a given key always hash to the same kstream consumer group member for both transformers? You can imagine why this is important given that they share a state store. My concern is that rebalancing may occur, and somehow the key space for one of the transformers is moved to another pod, but not both.

2) If transformer A proce...

Re: Mirrormaker 2 logs - WARN Catching up to assignment's config offset

Hey Ryanne,

Interesting points. I wasn't aware of the relationship between the number of partitions and the number of Connect workers. I will test this out and update. Thanks!

On Wed, Jul 29, 2020, 21:01 Ryanne Dolan < ryannedolan@gmail.com > wrote:
> Iftach, you can try deleting Connect's internal config and status topics. The status topic records, among other things, the offsets within the config topics iirc, so if you delete the configs without deleting the status, you'll get messages such as those. Just don't delete the mm2-offsets topics, as doing so would result in MM2 starting from the beginning of all source partitions and re-replicating everything.
>
> You can also check that there are enough partitions in the config and status topics to account for all the Connect workers. It's possible that you're in a rebalance loop from too many consumers to those internal topics.
>
> Ryanne ...

Re: Partition assignment not well distributed over threads

Hey Giselle,

How many stream threads is each instance configured with? If the total number of threads across all instances exceeds the total number of tasks, then some threads won't get any assigned tasks. There's a known bug where tasks might not get evenly distributed over all instances in this scenario, as Streams would only attempt to balance the tasks over the threads. See KAFKA-9173 < https://issues.apache.org/jira/browse/KAFKA-9173 >. Luckily, this should be fixed in 2.6, which is just about to be released.

Instances that joined later, or restarted, would be more likely to have these threads with no assigned tasks due to the stickiness optimization, as you guessed. If the problem you've run into is due to running more stream threads than tasks, I would recommend just decreasing the number of threads per instance to get a balanced assignment. This won't hurt performance in any way since those extra threads would have just been sittin...
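[Editor's note: the thread count is set per instance via num.stream.threads. A sketch for the setup in this thread, assuming one task per input partition: 20 partitions over 5 instances means 4 threads each gives every thread exactly one task.]

# streams application config (properties form); 5 instances x 4 threads = 20 threads for 20 tasks
num.stream.threads=4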

Re: Mirrormaker 2 logs - WARN Catching up to assignment's config offset

Iftach, you can try deleting Connect's internal config and status topics. The status topic records, among other things, the offsets within the config topics iirc, so if you delete the configs without deleting the status, you'll get messages such as those. Just don't delete the mm2-offsets topics, as doing so would result in MM2 starting from the beginning of all source partitions and re-replicating everything.

You can also check that there are enough partitions in the config and status topics to account for all the Connect workers. It's possible that you're in a rebalance loop from too many consumers to those internal topics.

Ryanne

On Wed, Jul 29, 2020 at 1:03 AM Iftach Ben-Yosef < iben-yosef@outbrain.com > wrote:
> Hello
>
> I'm running a mirrormaker 2 cluster which copies from 3 source clusters into 1 destination. Yesterday I restarted the cluster and it took 1 of the mirrored topics a pretty long time to recov...
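[Editor's note: checking the partition counts Ryanne mentions only needs a describe on the internal topics. A sketch; the broker address is a placeholder, and the topic names follow MM2's default per-source-alias naming (mm2-configs/mm2-status/mm2-offsets), guessing "local-ny" as the alias from the groupId in the logs; the actual names depend on config.storage.topic and status.storage.topic:]

bin/kafka-topics.sh --bootstrap-server broker:9092 --describe --topic mm2-configs.local-ny.internal
bin/kafka-topics.sh --bootstrap-server broker:9092 --describe --topic mm2-status.local-ny.internal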

Consumer cannot move past missing offset

We are having issues with some of our older consumers getting stuck reading a topic. The issue seems to occur at specific offsets. Here's an excerpt from kafka-dump-log on the topic partition around the offset in question:

baseOffset: 13920966 lastOffset: 13920987 count: 6 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 49 isTransactional: false isControl: false position: 98516844 CreateTime: 1595224747691 size: 4407 magic: 2 compresscodec: NONE crc: 1598305187 isvalid: true
| offset: 13920978 CreateTime: 1595224747691 keysize: 36 valuesize: 681 sequence: -1 headerKeys: []
| offset: 13920979 CreateTime: 1595224747691 keysize: 36 valuesize: 677 sequence: -1 headerKeys: []
| offset: 13920980 CreateTime: 1595224747691 keysize: 36 valuesize: 680 sequence: -1 headerKeys: []
| offset: 13920984 CreateTime: 1595224747691 keysize: 36 valuesize: 681 sequence: -1 headerKeys: []
| offset: 13920985 CreateTime: 1595224747691 keysize: 36 valuesize:...
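[Editor's note: the batch above spans offsets 13920966-13920987 but contains only 6 records, with gaps in the offsets; compaction can produce exactly this shape, and some very old clients handle the gaps poorly. If upgrading the client isn't immediately possible, one workaround is to move the group's committed offset past the problem batch. A sketch: the broker, group id, topic, and partition are placeholders; the offset is the batch's lastOffset plus one, and this skips whatever records remain in that batch:]

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SkipStuckOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder
        props.put("group.id", "stuck-group");          // placeholder: the stuck group's id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // Shut the stuck consumers down first, so this tool is the only committer.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("the-topic", 0); // placeholder topic/partition
            consumer.assign(Collections.singletonList(tp));
            // Commit lastOffset of the problem batch (13920987) + 1.
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(13920988L)));
        }
    }
}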

Partition assignment not well distributed over threads

We have a Kafka Streams (2.4) app consisting of 5 instances. It reads from a Kafka topic with 20 partitions (5 brokers). We notice that the partition assignment does not always lead to well-distributed load over the different threads. We notice this at startup as well as after recovery of a failed thread.

1. At startup, some instances get a significantly lower load and sometimes even no load. It seems like instances that come up slightly later get no partitions assigned (because of sticky assignment?).

2. When one thread (container) dies and comes back, it often does not receive any or very few partitions to work on. We assume this has to do with the sticky assignment.

Is there any way we can make this distribution more equal? I was also wondering whether Kafka Streams takes into account colocation of Kafka brokers with stream processing threads when assigning partitions. Do partitions on brokers get assigned to the streams thread that is colocated with it on the same ma...

Re: Re: Kafka compatibility between 2.2.x to 0.10.0.0

Hi Vitalii Stoianov, thanks for the reply. We are going to upgrade both clients and brokers from 0.10.0.0 to 2.2.x. We have multiple clusters now, so we will probably upgrade clients first, then upgrade the Kafka clusters one by one. I know this upgrade can't be rolled back, so if someone has done this before, please tell us what problems to expect.

Best,
Xingxing Di

From: Vitalii Stoianov
Date: 2020-07-25 16:12
To: users
Subject: Re: Kafka compatibility between 2.2.x to 0.10.0.0

Hi Xingxing Di,

Are you going to upgrade clients from 0.10.0.0 to 2.2.x, or clients + brokers?

1. I think so; for more information on what incompatibilities clients may notice when working with an old broker, please check this page: https://cwiki.apache.org/confluence/display/KAFKA/Compatibility+Matrix

2. Message format change: check the upgrade guide; there are some steps after which you won't be able to roll back the server if something goes wrong: https://kafka.apache.org/22/document...
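[Editor's note: the documented rolling upgrade keeps two broker settings pinned during the roll, then bumps them in stages; past the bump, rollback is no longer possible, which is the point Vitalii raises. A sketch of the server.properties sequence for a 0.10.0.0-to-2.2.x path; the exact values and ordering should be taken from the upgrade guide linked above:]

# step 1: while rolling brokers onto the 2.2.x binaries
inter.broker.protocol.version=0.10.0
log.message.format.version=0.10.0

# step 2: once all brokers run 2.2.x, bump the protocol and do another rolling restart
inter.broker.protocol.version=2.2

# step 3: after all clients are upgraded, bump the message format and roll once more
log.message.format.version=2.2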

Mirrormaker 2 logs - WARN Catching up to assignment's config offset

Hello

I'm running a mirrormaker 2 cluster which copies from 3 source clusters into 1 destination. Yesterday I restarted the cluster and it took 1 of the mirrored topics a pretty long time to recover (~2 hours).

Since the restart, the mm2 cluster has been sending a lot of these warning messages from all 3 source clusters:

WARN [Worker clientId=connect-4, groupId=local-ny-mm2] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1020)

Here is a snippet of what the logs look like:

STDOUT: [2020-07-29 05:44:17,143] INFO [Worker clientId=connect-2, groupId=local-chi-mm2] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:222)
STDOUT: [2020-07-29 05:44:17,143] INFO [Worker clientId=connect-2, groupId=local-chi-mm2] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:552)
STDOUT: [2020-07-29 05:44:17,144] INFO [Worker clientId=connect-2, groupId=lo...

[VOTE] 2.6.0 RC2

Hello Kafka users, developers and client-developers,

This is the third candidate for release of Apache Kafka 2.6.0. This is a major release that includes many new features, including:

* TLSv1.3 has been enabled by default for Java 11 or newer.
* Smooth scaling out of Kafka Streams applications
* Kafka Streams support for emit on change
* New metrics for better operational insight
* Kafka Connect can automatically create topics for source connectors
* Improved error reporting options for sink connectors in Kafka Connect
* New Filter and conditional SMTs in Kafka Connect
* The default value for the `client.dns.lookup` configuration is now `use_all_dns_ips`
* Upgrade Zookeeper to 3.5.8

This release also includes a few other features, 74 improvements, 175 bug fixes, plus other fixes.

Release notes for the 2.6.0 release:
https://home.apache.org/~rhauch/kafka-2.6.0-rc2/RELEASE_NOTES.html

*** Please download, test and vote by Monday, August 3, 9am PT

Kafka's...
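[Editor's note: for anyone planning to verify the candidate, the checks the voters above report look roughly like this in shell form. A sketch; the file names follow the rc2 source artifact layout, and the Gradle test tasks assume a bootstrapped build as described in the source README:]

# import the signing keys and verify the source tarball's signature
curl https://kafka.apache.org/KEYS | gpg --import
gpg --verify kafka-2.6.0-src.tgz.asc kafka-2.6.0-src.tgz

# check the hash (compare the output against kafka-2.6.0-src.tgz.sha512)
sha512sum kafka-2.6.0-src.tgz

# build from source and run the tests
tar xzf kafka-2.6.0-src.tgz && cd kafka-2.6.0-src
./gradlew unitTest integrationTest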

Re: RecordTooLargeException with old (0.10.0.0) consumer

Again, we haven't changed the default message size; I believe this exception is a red herring.

On Tue, 2020-07-28 at 17:38 +0000, Manoj.Agrawal2@cognizant.com wrote:
> Hi,
> You also need to make the change on the producer and consumer side as well.
>
> server.properties:
> message.max.bytes=15728640
> replica.fetch.max.bytes=15728640
> max.request.size=15728640
> fetch.message.max.bytes=15728640
>
> producer.properties:
> max.request.size=15728640
>
> consumer:
> max.partition.fetch.bytes
>
> On 7/28/20, 9:51 AM, "Thomas Becker" < Thomas.Becker@xperi.com > wrote:
> > We have some legacy applications using an old (0.10.0.0) version of the consumer that are hitting RecordTooLargeExceptions with the following message: org.apach...
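[Editor's note: for completeness, the consumer-side fetch-size settings differ by client generation. A sketch of the consumer properties; the 15728640 value mirrors the broker-side numbers quoted above and should be tuned to the actual maximum message size:]

# new Java consumer (org.apache.kafka.clients.consumer)
max.partition.fetch.bytes=15728640

# old Scala consumer (kafka.consumer), as used by 0.10-era legacy applications
fetch.message.max.bytes=15728640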