

Showing posts from March, 2020

Re: Broker always out of ISRs

Hi Zach, Any issues with partitions broker 2 is leader of? Also, have you checked b2's server.log? Cheers, Liam Clarke-Hutchinson On Wed, 1 Apr. 2020, 11:02 am Zach Cox, < zcox522@gmail.com > wrote: > Hi - We have a small Kafka 2.0.0 (Zookeeper 3.4.13) cluster with 3 brokers: > 0, 1, and 2. Each broker is in a separate rack (Azure zone). > > Recently there was an incident, where Kafka brokers and Zookeeper nodes > restarted, etc. After that occurred, we've had problems where broker 2 is > consistently out of many ISRs. A pattern we've observed is that broker 2 > will not be in any ISRs of partitions where broker 0 is leader, but will be > in ISRs of partitions where broker 1 is leader. Then at some point the > controller will change to a different broker, then 2 will not be in any > ISRs where 1 is leader, but will be in ISRs where 0 is leader. Each time > controller changes, this "flip flopping" o...

Broker always out of ISRs

Hi - We have a small Kafka 2.0.0 (Zookeeper 3.4.13) cluster with 3 brokers: 0, 1, and 2. Each broker is in a separate rack (Azure zone). Recently there was an incident, where Kafka brokers and Zookeeper nodes restarted, etc. After that occurred, we've had problems where broker 2 is consistently out of many ISRs. A pattern we've observed is that broker 2 will not be in any ISRs of partitions where broker 0 is leader, but will be in ISRs of partitions where broker 1 is leader. Then at some point the controller will change to a different broker, then 2 will not be in any ISRs where 1 is leader, but will be in ISRs where 0 is leader. Each time controller changes, this "flip flopping" of 2 in/out of ISRs changes. No matter what, 2 never seems to get into all ISRs. For topics with replicas=3, min.insync.replicas=2, and producers with acks=all, we only ever have ISR=(0,1), and occasionally 0 or 1 also briefly falls out of ISR, leading to producer retries and...
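
To see exactly which partitions are affected, a minimal sketch (not from the thread) that lists partitions whose ISR is missing broker 2 using the Kafka AdminClient; the bootstrap address and topic name are placeholders:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class IsrCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker0:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the topics of interest; "my-topic" is a placeholder, and a real
            // check would iterate over admin.listTopics() instead.
            Map<String, TopicDescription> topics =
                admin.describeTopics(Collections.singletonList("my-topic")).all().get();
            for (TopicDescription td : topics.values()) {
                for (TopicPartitionInfo p : td.partitions()) {
                    boolean broker2InIsr = p.isr().stream().anyMatch(n -> n.id() == 2);
                    if (!broker2InIsr) {
                        System.out.printf("%s-%d leader=%s isr=%s%n",
                            td.name(), p.partition(), p.leader(), p.isr());
                    }
                }
            }
        }
    }
}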

Re: Statestore restoration - Error while range compacting during restoring

Thanks Nicolas for the report, so are you suggesting that you couldn't turn on compactions for the state store? Is there a workaround? On Tue, Mar 31, 2020 at 9:54 AM Nicolas Carlot < nicolas.carlot@chronopost.fr > wrote: > After some more testing and debugging, it seems that it is caused by the > compaction option I've configured for RocksDB. When removed everything is > fine... > The option is as follows: > > CompactionOptionsFIFO fifoOptions = new CompactionOptionsFIFO(); > fifoOptions.setMaxTableFilesSize(maxSize); > fifoOptions.setAllowCompaction(true); > options.setCompactionOptionsFIFO(fifoOptions); > options.setCompactionStyle(CompactionStyle.FIFO); > > On Tue, 31 Mar 2020 at 16:27, Nicolas Carlot < nicolas.carlot@chronopost.fr > > > wrote: > > > Hello everyone, > > > > I'm currently facing an issue with RocksDB's internal compaction process, > > whic...

Re: Statestore restoration - Error while range compacting during restoring

After some more testing and debugging, it seems that it is caused by the compaction option I've configured for RocksDB. When it is removed, everything is fine... The option is as follows: CompactionOptionsFIFO fifoOptions = new CompactionOptionsFIFO(); fifoOptions.setMaxTableFilesSize(maxSize); fifoOptions.setAllowCompaction(true); options.setCompactionOptionsFIFO(fifoOptions); options.setCompactionStyle(CompactionStyle.FIFO); On Tue, 31 Mar 2020 at 16:27, Nicolas Carlot < nicolas.carlot@chronopost.fr > wrote: > Hello everyone, > > I'm currently facing an issue with RocksDB's internal compaction process, > which occurs when the local state stores of several of my Kafka Streams > applications are being restored. This is sadly a huge concern as it > completely discards resiliency over node failures, as those often lead to a > state store restoration. The only workaround I currently have is to delete > the local store to restore it...
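
For context, options like these are usually applied through a custom RocksDBConfigSetter registered via the rocksdb.config.setter Streams config. A minimal sketch mirroring the snippet above (this is not the poster's actual class; the size constant is a placeholder):

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.CompactionOptionsFIFO;
import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;

public class FifoCompactionConfigSetter implements RocksDBConfigSetter {

    // Placeholder: the poster's maxSize value is not given in the thread.
    private static final long MAX_TABLE_FILES_SIZE = 512L * 1024 * 1024;

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        final CompactionOptionsFIFO fifoOptions = new CompactionOptionsFIFO();
        fifoOptions.setMaxTableFilesSize(MAX_TABLE_FILES_SIZE);
        fifoOptions.setAllowCompaction(true);
        options.setCompactionOptionsFIFO(fifoOptions);
        options.setCompactionStyle(CompactionStyle.FIFO);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing heavyweight to release here; in general, RocksObject instances
        // created in setConfig() should be closed in this callback (added in 2.3).
    }
}

The class would then be registered with props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, FifoCompactionConfigSetter.class).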

Statestore restoration - Error while range compacting during restoring

Hello everyone, I'm currently facing an issue with RocksDB's internal compaction process, which occurs when the local state stores of several of my Kafka Streams applications are being restored. This is sadly a huge concern as it completely discards resiliency over node failures, as those often lead to a state store restoration. The only workaround I currently have is to delete the local store to restore it from scratch. I'm using version 2.4.1 of the Java libraries. The exception thrown by the KStream process is: org.apache.kafka.streams.errors.ProcessorStateException: Error while range compacting during restoring store merge_store at org.apache.kafka.streams.state.internals.RocksDBStore$SingleColumnFamilyAccessor.toggleDbForBulkLoading(RocksDBStore.java:615) ~[kafka-stream-router.jar:?] at org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:398) ~[kafka-stream-router.jar:?] at org.apache.kafka.s...

Re: How to disable OPTIONS method in confluent-control-center

You probably want the Confluent Platform mailing list for this: https://groups.google.com/forum/#!forum/confluent-platform (or the Confluent Platform Slack group: http://cnfl.io/slack with the #control-center channel). Or if you have a Confluent support contract, contact support :) -- Robin Moffatt | Senior Developer Advocate | robin@confluent.io | @rmoff On Tue, 31 Mar 2020 at 07:01, Sunil CHAUDHARI <sunilchaudhari@dbs.com.invalid> wrote: > Hi, > I don't know whether this question is relevant to this group? > Sorry if I posted in the wrong group. > I want to disable the OPTIONS method in Confluent Control Center running on > port 9091. > Can someone guide me on the required configurations? > > Regards, > Sunil. >

Re: Reg : Slowness in Kafka

Hi James, A JIRA would be helpful. It looks like something we should fix. Ismael On Mon, Mar 30, 2020 at 8:17 PM James Olsen < james@inaseq.com > wrote: > Christophe, > > See "Problems when Consuming from multiple Partitions" in the list > archive. I'll forward you the full conversation privately. It includes > debug logs that demonstrate fetches being discarded and refetched, but only > after one or more full expirations of the fetch.max.wait.ms even though > messages are available. It appears to be due to the re-fetch only querying > one of the Partitions allocated to the Consumer, so if that Partition is > empty you get the full delay even though messages are available (and in > fact already fetched) from another Partition. > > The exact behaviour differs depending on the server and client versions. > I tried 2.2.1 and 2.3.1 servers with a 2.4.0 client. Both introduced > delays and/or some par...

Re: Reg : Slowness in Kafka

Christophe, See "Problems when Consuming from multiple Partitions" in the list archive. I'll forward you the full conversation privately. It includes debug logs that demonstrate fetches being discarded and refetched, but only after one or more full expirations of the fetch.max.wait.ms even though messages are available. It appears to be due to the re-fetch only querying one of the Partitions allocated to the Consumer, so if that Partition is empty you get the full delay even though messages are available (and in fact already fetched) from another Partition. The exact behaviour differs depending on the server and client versions. I tried 2.2.1 and 2.3.1 servers with a 2.4.0 client. Both introduced delays and/or some partitions not being processed at all. The problems were only observed where a Consumer subscribed to multiple Partitions of the same Topic. I haven't raised an issue for it as I would have no expectation of it being fixed. I just made my ...
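
For readers unfamiliar with the settings involved, a small sketch showing where the two consumer configs James refers to are set (fetch.max.wait.ms and fetch.min.bytes). This is context for the parameters only, not a fix for the re-fetch behaviour he describes; the broker address, group id, topic and values are placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FetchTuningExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-tuning-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Defaults are fetch.min.bytes=1 and fetch.max.wait.ms=500: the broker answers a
        // fetch as soon as fetch.min.bytes of data is available, or after fetch.max.wait.ms,
        // whichever comes first. The delay described in the thread is one or more full
        // expirations of this wait even though data exists on another assigned partition.
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            // poll loop elided
        }
    }
}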

Re: Problems when Consuming from multiple Partitions

Resolved by downgrading the Client to 2.2.2 and implementing an application-level heartbeat on every Producer to avoid the UNKNOWN_PRODUCER_ID issue. > On 9/03/2020, at 16:08, James Olsen < james@inaseq.com > wrote: > > P.S. I guess the big question is what is the best way to handle or avoid UNKNOWN_PRODUCER_ID when running versions that don't include KAFKA-7190 / KAFKA-8710 ? > > We are using non-transactional idempotent Producers. > >> On 9/03/2020, at 12:59 PM, James Olsen < james@inaseq.com > wrote: >> >> For completeness I have also tested 2.4.0 Broker with 2.4.0 Client. All works correctly. Unfortunately as we are on AWS MSK we don't have the option to use 2.4.0 for the Brokers. >> >> So now I guess the question changes to what combo is best for us and will it avoid UNKNOWN_PRODUCER_ID problems? >> >> We can choose 2.2.1 or 2.3.1 for the Broker (AWS recommend 2.2.1 although don't ...
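
A rough sketch of the kind of application-level heartbeat described above (not James's code): an idempotent, non-transactional producer periodically sends a small record so the brokers keep seeing recent writes from its producer id. The topic name, interval and bootstrap address are assumptions; the heartbeat would need to target whatever topic the producer normally writes to.

import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HeartbeatingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // non-transactional, idempotent
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Periodically send a tiny record from the same producer instance the
        // application uses; "my-topic" and the one-minute interval are placeholders.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> producer.send(new ProducerRecord<>("my-topic", "heartbeat", "ping")),
            0, 1, TimeUnit.MINUTES);

        // ... normal application sends go through the same producer ...
    }
}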

RE: Reg : Slowness in Kafka

> There are serious latency issues when mixing different client and server versions Could you be more specific? Link to any issue? Thanks in advance! Christophe ________________________________ From: James Olsen < james@inaseq.com > Sent: Friday, 27 March 2020 01:48 To: users@kafka.apache.org < users@kafka.apache.org > Subject: Re: Reg : Slowness in Kafka Also check your Kafka Client and Server versions. There are serious latency issues when mixing different client and server versions IF your consumers handle multiple partitions. > On 27/03/2020, at 12:59, Chris Larsen < clarsen@confluent.io > wrote: > > Hi Vidhya, > > How many tasks are you running against the topic? How many partitions are > on the topic? Can you post the connector config anonymized? > > Best, > Chris > > On Thu, Mar 26, 2020 at 17:58 Vidhya Sakar < sakar.black@gmail.com > wrote: > >> Hi Team, >...

Re: [kafka-clients] [VOTE] 2.5.0 RC2

Hi David, Thanks for running this release. Sorry for the delay in bringing this up. I just wanted to draw attention to https://issues.apache.org/jira/browse/KAFKA-9731 that blocked us from upgrading to 2.4. Based on the earlier discussion, the fix may not require a lot of work. Regards, --Vahid On Tue, Mar 17, 2020 at 8:10 AM David Arthur < mumrah@gmail.com > wrote: > Hello Kafka users, developers and client-developers, > > This is the third candidate for release of Apache Kafka 2.5.0. > > * TLS 1.3 support (1.2 is now the default) > * Co-groups for Kafka Streams > * Incremental rebalance for Kafka Consumer > * New metrics for better operational insight > * Upgrade Zookeeper to 3.5.7 > * Deprecate support for Scala 2.11 > > > Release notes for the 2.5.0 release: > https://home.apache.org/~davidarthur/kafka-2.5.0-rc2/RELEASE_NOTES.html > > *** Please download, test and vote by Tuesday March 24, 2...

Re: Are RocksDBWindowStore windows hopping or sliding?

`windowSize` has no impact on used segments: it is only used to compute the window end timestamp when you fetch a window, because the end-time is no...

Brokers session terminated and removed

Hi, I am running into an issue where the Kafka brokers (0.10.2.1) are getting removed from Zookeeper (3.4.14). Here is the setup. We have 3 Zookeeper nodes and 3 Kafka nodes in AWS. We are making use of an auto-scaling group to get replacement nodes on failures. When the Zookeeper and Kafka clusters are running, I can see the brokers registered in Zookeeper under the /brokers/ids path. I then terminate the leader Zookeeper node and wait for the AWS auto-scaling group to provide a replacement Zookeeper node. I then check the /brokers/ids path to confirm that the brokers are still connected. I then terminate the second Zookeeper node and check the path again when a new Zookeeper node comes up. I don't have an issue up to this point. When I terminate the third Zookeeper node in the original list of Zookeeper nodes, I see that all the Kafka brokers' sessions are terminated and the brokers are removed from Zookeeper. The list of ids under /brokers/ids is empty. I can see the below log...
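
The /brokers/ids check described above can also be scripted; a small sketch using the ZooKeeper Java client (the connect string and session timeout are placeholders):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class BrokerIdsCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string for the three ZooKeeper nodes.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 10_000, event -> { });
        try {
            // Each live broker registers an ephemeral znode here; an empty list
            // means no broker currently holds a ZooKeeper session.
            List<String> ids = zk.getChildren("/brokers/ids", false);
            System.out.println("Registered broker ids: " + ids);
        } finally {
            zk.close();
        }
    }
}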

Re: Are RocksDBWindowStore windows hopping or sliding?

Hi, I understood how window stores are implemented using RocksDB. When creating an instance of RocksDBWindowStore we pass two additional arguments: retainDuplicates and windowSize. I have not clearly understood the purpose of these two. Say in my application I just create one windowed store with a window size of 10 minutes and a retention of 30 minutes. Does this mean internally it will create one RocksDB segment for the records within each 10-minute boundary and retain it for 30 minutes? If a new record arrives beyond that 10 minutes, does a new segment get created? How does retainDuplicates come into play here? Thanks Sachin On Mon, Mar 2, 2020 at 12:49 AM Matthias J. Sax < mjsax@apache.org > wrote: > If you want to put a record into multiple windows, you can do a `put()` > for each window. > > The DSL uses the store in the exact same manner for hopping windows > (compare t...
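
For reference, these two parameters surface in the public API through the Stores factory rather than RocksDBWindowStore directly. A minimal sketch (not the poster's topology; the store name and serdes are placeholders), with windowSize only affecting how window end timestamps are computed on reads, as noted in the reply above:

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;

public class WindowStoreExample {
    public static void main(String[] args) {
        // retention = 30 minutes, window size = 10 minutes, retainDuplicates = false.
        // retainDuplicates = true keeps every put() under the same key (the DSL uses
        // this for stream-stream join stores) instead of overwriting the previous value.
        StoreBuilder<WindowStore<String, Long>> storeBuilder = Stores.windowStoreBuilder(
            Stores.persistentWindowStore(
                "my-window-store",
                Duration.ofMinutes(30),   // retention
                Duration.ofMinutes(10),   // window size
                false),                   // retainDuplicates
            Serdes.String(),
            Serdes.Long());

        StreamsBuilder builder = new StreamsBuilder();
        builder.addStateStore(storeBuilder);
        // The store is then attached to a processor/transformer by name ("my-window-store").
    }
}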

Re: Kafka topic partition directory

Kafka doesn't monitor the contents of the log data directories unless it created the file or directory. If it didn't create the directory/file, it will ignore it. -- Peter > On Mar 28, 2020, at 4:17 PM, anila devi <feluda1_99_1999@yahoo.com.invalid> wrote: > > Hi Users, > If I create a directory or a file in the same directory where Kafka creates topic partition directories, the Kafka broker node does not restart. Is that expected? > Thanks, Dhiman >

Re: Kubernetes Pod got restarted Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)

Hi Samir, The SIGTERM is not originating from Kafka, it's being done to Kafka - this is Kubernetes gracefully shutting down or restarting your pod. When a pod is being shutdown, k8s sends a SIGTERM to all containers in that pod, which will then forward it to the main process in each container. After a certain period (default in Kubernetes is 30 seconds), if the pod hasn't shut down, it will then send a SIGKILL. More info here: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods As to why the pod was restarted - that really depends on how your k8s cluster is configured and administered, and really beyond the scope of a Kafka mailing list. But I'll point you at this for some starting points for investigation: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#voluntary-and-involuntary-disruptions Lastly, I strongly recommend you configure "terminationGracePeriodSeconds" for the pods running Kafka much highe...

Re: Newbie Question

Thanks Hans - this makes sense, except that the debug messages give me exactly what I need without having to instrument any clients. It should be noted that for now I am running a single server, so perhaps the messages change when I cluster? I may have caused confusion by mentioning that I want to know where the messages go - that is not quite precise from an individual-message perspective, but it is right enough for what I want to achieve (for now ;-) ). I just want a record of each IP address and which topic (or something that can be traced back to a topic) they are connected to, from a high level, without having to instrument the clients (which can number upwards of 10,000, and which I have no control over or access to). Currently, as I mentioned, the debug messages have exactly what I need for this phase: [2020-03-28 20:32:23,901] DEBUG Principal = User:ANONYMOUS is Allowed Operation = Read from host = x.x.x.x on resource = Topic:LITERAL:xxxx (kafka.authorizer.logger) Just figuring...

Re: Newbie Question

I can tell from the terminology you use that you are familiar with traditional message queue products. Kafka is very different. That's what makes it so interesting and revolutionary in my opinion. Clients do not connect to topics, because Kafka is a distributed and clustered system where topics are sharded into pieces called partitions and the topic partitions are spread out across all the Kafka brokers in the cluster (and also replicated several more times across the cluster for fault tolerance). When a client logically connects to a topic, it's actually making many connections to many nodes in the Kafka cluster, which enables both parallel processing and fault tolerance. Also, when a client consumes a message, the message is not removed from a queue; it remains in Kafka for many days (sometimes months or years). It is not "taken off the queue", it is rather "copied from the commit log". It can be consumed again and again if needed because it is an immutable record...
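
To make the "copied from the commit log" point concrete, a small sketch (placeholder topic and address) of re-reading the same records with the plain Java consumer by seeking back to the beginning of its assigned partitions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(1));            // join the group and get assignments
            consumer.seekToBeginning(consumer.assignment()); // rewind: the records are still in the log
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("%s-%d offset=%d value=%s%n",
                    r.topic(), r.partition(), r.offset(), r.value());
            }
        }
    }
}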

Newbie Question

Hi All - just started to use Kafka. Just one thing driving me nuts. I want to get logs of each time a publisher or subscriber connects. I am trying to just get the IP that they connected from and the topic to which they connected. I have managed to do this through enabling debug in the kafka-authorizer, however the number of logs is overwhelming, as is the update rate (looks like 2 per second per client). What I am actually trying to achieve is to understand where messages go, so I would be more than happy to just see notifications when messages are actually sent and actually taken off the queue. Is there a more efficient way of achieving my goal than turning on debug? Cheers Rossi

Re: Kafka with RAID 5 on. busy cluster.

RAID 5 is typically slower because Kafka is a very write-heavy load, and that creates a bottleneck because writes to any disk require parity writes on the other disks. -hans > On Mar 28, 2020, at 2:55 PM, Vishal Santoshi < vishal.santoshi@gmail.com > wrote: > > Any one? We are doing a series of tests to be confident, but if there is some > data that folks who have had RAID 5 on Kafka have to share, please do. > > Regards. > >> On Mon, Mar 23, 2020 at 11:29 PM Vishal Santoshi < vishal.santoshi@gmail.com > >> wrote: >> >> << In RAID 5 one can lose more than only one disk RAID here will be data >> corruption. >>>> In RAID 5 if one loses more than one disk there will be data >> corruption. >> >> On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi < >> vishal.santoshi@gmail.com > wrote: >> >>> One obvious issue is disk failure tolerati...

Re: Kafka with RAID 5 on. busy cluster.

Any one? We are doing a series of tests to be confident, but if there is some data that folks who have had RAID 5 on Kafka have to share, please do. Regards. On Mon, Mar 23, 2020 at 11:29 PM Vishal Santoshi < vishal.santoshi@gmail.com > wrote: > << In RAID 5 one can lose more than only one disk RAID here will be data > corruption. > >> In RAID 5 if one loses more than one disk there will be data > corruption. > > On Mon, Mar 23, 2020 at 11:27 PM Vishal Santoshi < > vishal.santoshi@gmail.com > wrote: > >> One obvious issue is disk failure toleration. As in, if RF=3 on normal >> JBOD, the disk failure toleration is 2. In RAID 5 one can lose more than only >> one disk RAID here will be data corruption, effectively making the broker >> unusable, thus reducing our drive failure toleration to 2 drives on 2 >> different brokers, with the added caveat that we lose the whole broker as >...

Kubernetes Pod got restarted Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)

Hi, My pod got restarted with the below message: [2020-03-25 08:24:04,905] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler). When would I get into this situation? What should I look for to troubleshoot this? [2020-03-25 08:24:04,905] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler) [2020-03-25 08:24:04,912] INFO [KafkaServer id=1001] shutting down (kafka.server.KafkaServer) [2020-03-25 08:24:04,913] INFO [KafkaServer id=1001] Starting controlled shutdown (kafka.server.KafkaServer) [2020-03-25 08:24:04,984] INFO [ReplicaFetcherManager on broker 1001] Removed fetcher for partitions (kafka.server.ReplicaFetcherManager) [2020-03-25 08:24:04,984] INFO [ReplicaAlterLogDirsManager on broker 1001] Removed fetcher for partitions (kafka.server.ReplicaAlterLogDirsManager) [2020-03-25 08:24:04,987] INFO [ReplicaFetcherManager on broker 1001] Removed fetcher ...