Kafka

Posts

Showing posts from December, 2020

Add REST Endpoint for Health check etc in Kafka Connect app

Hi, I'd like to add a REST endpoint for health checks to our Kafka Connect application. One way to do that is to implement the ConnectRestExtension interface, but the problem is that only a ConnectRestExtensionContext is passed in to the class, and the ConnectClusterState it exposes contains very little information. Is there an easy way to add a REST endpoint? Ideally I'd like to pass in the Herder object, since from there I can get most of the Kafka status. I'd also like to reuse the RestServer from Kafka, as we have applied a security patch to it; I don't want to create a new RestServer/JettyServer, which would be too heavyweight for this task. Thanks.

does Kafka exactly-once guarantee also apply to kafka state stores?

Hi All, We use Kafka Streams and may need to use the exactly-once configuration for some of our use cases. Currently, the application uses either a local or a global state store to store state. So the application will consume events from a source Kafka topic, process the events (using either a local or a global Kafka state store for state), and then produce events onto the destination topic. My question is: with the exactly-once setting, Kafka Streams guarantees that either all actions happen or nothing happens. So, in this case, will any state stored in the local or global state store also be covered by the 'all or nothing' guarantee? For example, if an event is consumed and the state store is updated, but some issue occurs before the event is produced to the destination topic, will the state store be restored to the state it was in before it was updated for this event?

Re: [ANNOUNCE] Apache Kafka 2.7.0

Thanks a lot for running this release, Bill!

From: Kowshik Prakasam <kprakasam@confluent.io>
Sent: Tuesday, December 29, 2020 5:34 AM
To: dev@kafka.apache.org <dev@kafka.apache.org>
Cc: Users <users@kafka.apache.org>
Subject: Re: [ANNOUNCE] Apache Kafka 2.7.0

Thank you for running the release, Bill! Congrats to the community!

Cheers,
Kowshik

On Mon, Dec 28, 2020 at 12:20 PM Michael Chisina <chisinam@gmail.com> wrote:
> Hello,
>
> Is there a way to configure powerdns recursor DNS queries and Apache Kafka to stream to a remote postgresdb/timescaleDB server? Is there any documentation online which might assist?
>
> Your assistance is greatly appreciated.
>
> Regards,
>
> Michael Chisina
>
> On Mon, Dec 28, 2020, 6:55 PM Ismael Juma <ismael@juma.me.uk> wrote:
>
> > Thanks for running the release, Bill. And congratu...

Re: [ANNOUNCE] Apache Kafka 2.7.0

Thank you for running the release, Bill! Congrats to the community!

Cheers,
Kowshik

On Mon, Dec 28, 2020 at 12:20 PM Michael Chisina <chisinam@gmail.com> wrote:
> Hello,
>
> Is there a way to configure powerdns recursor DNS queries and Apache Kafka to stream to a remote postgresdb/timescaleDB server? Is there any documentation online which might assist?
>
> Your assistance is greatly appreciated.
>
> Regards,
>
> Michael Chisina
>
> On Mon, Dec 28, 2020, 6:55 PM Ismael Juma <ismael@juma.me.uk> wrote:
>
> > Thanks for running the release, Bill. And congratulations to the community for another release!
> >
> > Ismael
> >
> > On Mon, Dec 21, 2020, 8:01 AM Bill Bejeck <bbejeck@apache.org> wrote:
> >
> > > The Apache Kafka community is pleased to announce the release for Apache Kafka 2.7.0 ...

Re: [ANNOUNCE] Apache Kafka 2.7.0

Hello,

Is there a way to configure powerdns recursor DNS queries and Apache Kafka to stream to a remote postgresdb/timescaleDB server? Is there any documentation online which might assist?

Your assistance is greatly appreciated.

Regards,
Michael Chisina

On Mon, Dec 28, 2020, 6:55 PM Ismael Juma <ismael@juma.me.uk> wrote:
> Thanks for running the release, Bill. And congratulations to the community for another release!
>
> Ismael
>
> On Mon, Dec 21, 2020, 8:01 AM Bill Bejeck <bbejeck@apache.org> wrote:
>
> > The Apache Kafka community is pleased to announce the release for Apache Kafka 2.7.0
> >
> > * Configurable TCP connection timeout and improve the initial metadata fetch
> > * Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1)
> > * Throttle Create Topic, Create Partition and Delete Topic Operations
> > * Add TRAC...

Re: [ANNOUNCE] Apache Kafka 2.7.0

Thanks for running the release, Bill. And congratulations to the community for another release!

Ismael

On Mon, Dec 21, 2020, 8:01 AM Bill Bejeck <bbejeck@apache.org> wrote:
> The Apache Kafka community is pleased to announce the release for Apache Kafka 2.7.0
>
> * Configurable TCP connection timeout and improve the initial metadata fetch
> * Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1)
> * Throttle Create Topic, Create Partition and Delete Topic Operations
> * Add TRACE-level end-to-end latency metrics to Streams
> * Add Broker-side SCRAM Config API
> * Support PEM format for SSL certificates and private key
> * Add RocksDB Memory Consumption to RocksDB Metrics
> * Add Sliding-Window support for Aggregations
>
> This release also includes a few other features, 53 improvements, and 91 bug fixes.
>
> All of the changes in this release can be found in the re...

Re: Producer batch size

Thanks Steve! And yes, by buffer.size I mean batch.size. Sorry for the typo. Let me restate my question. Let's assume the producer I/O thread responsible for sending messages to the brokers is slow and the app thread calling the send method is very fast. While the producer I/O thread is busy sending messages to the brokers, the app thread, which is writing messages to the buffer, has written messages up to batch.size. What happens in this scenario? Does the app thread continue to write to the same batch, or does it create a new batch? If it continues to write to the same batch, does the I/O thread, when available, send all the messages in the batch or only messages up to batch.size? If it creates a new batch, does the I/O thread, when available, send all the batches or one batch at a time?

Thanks,
Dhirendra.

On Thu, Dec 24, 2020 at 9:28 PM Steve Howard <steve.howard@confluent.io> wrote:
> If by buffer.size you mean batch.size, no it is very relevant. The buffer.memory space is used to ensure th...

Re: Support for Uni-directional data-diode?

It might be best to do a web search for companies that know this stuff and speak to them.

Re. Kafka over UDP: I dunno, but perhaps instead run normal Kafka talking to a proxy machine via TCP and have that proxy forward traffic via UDP. If that works, it would simplify the problem, I guess.

cheers
jan

On 23/12/2020, Danny - Terafence <danny@terafence.com> wrote:
> Thank you Jan,
>
> The aim is to secure the sending side infrastructure and assets. Deny any known and unknown attacks from the "outside" while maintaining real-time data flowing outbound.
> Data integrity may be maintained in various ways if the forwarded protocol has such options.
>
> I wonder if KAFKA can run over UDP... for starters..
>
> Anyone knows?
>
> On Dec 22, 2020 23:25, jan <rtm443x@googlemail.com.INVALID> wrote:
> Dunno if it helps (if in doubt, probably not) but a search for the term gets some useful articles (in...

Re: Producer batch size

If by buffer.size you mean batch.size, no, it is very relevant. The buffer.memory space is used to ensure the application can still produce messages for a period of time until the producer can keep up with the application. The total time the producer has available to catch up is the sum of how long it takes to fill the default of 32MB + max.block.ms.

batch.size controls (to some extent) how often the producer thread publishes messages to Kafka. At a default of only 16K, it is a very small fraction of the size of the local buffer (buffer.memory) used to store messages prior to transmission. This configuration, however, can greatly affect throughput as well as latency. If you increase batch.size to 1MB from the default of 16K, you will see far fewer round trips from the producer to Kafka. This can often greatly increase the throughput of your application. Conversely, if you set it to 0, you effectively disable batching, but may see improvements in latency. The differe...
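
To make the relationship between these settings concrete, here is a minimal sketch that uses only `java.util.Properties`, so it runs without the Kafka client on the classpath. The keys `batch.size`, `buffer.memory`, and `max.block.ms` and their default values are the ones mentioned in the reply above; the class name `ProducerBatchConfigSketch` and its helper methods are made up for illustration and are not part of any Kafka API.

```java
import java.util.Properties;

// Sketch only: builds plain Properties objects; nothing here talks to Kafka.
// The class and method names are hypothetical; only the config keys are real.
class ProducerBatchConfigSketch {

    // Producer defaults as quoted in the thread.
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("batch.size", "16384");        // 16K per batch
        p.setProperty("buffer.memory", "33554432");  // 32MB total local buffer
        p.setProperty("max.block.ms", "60000");      // 60 s wait when the buffer is full
        return p;
    }

    // Same defaults, but with batch.size raised to 1MB as suggested for throughput.
    static Properties throughputTuned() {
        Properties p = defaults();
        p.setProperty("batch.size", String.valueOf(1024 * 1024));
        return p;
    }

    // Rough count of full batches that fit into buffer.memory.
    static long batchesThatFitInBuffer(Properties p) {
        long bufferMemory = Long.parseLong(p.getProperty("buffer.memory"));
        long batchSize = Long.parseLong(p.getProperty("batch.size"));
        return bufferMemory / batchSize;
    }

    public static void main(String[] args) {
        System.out.println("default: " + batchesThatFitInBuffer(defaults()));
        System.out.println("tuned:   " + batchesThatFitInBuffer(throughputTuned()));
    }
}
```

In a real application the same keys would be passed to the producer's constructor; the division here only illustrates why a 16K batch is a small fraction of the 32MB buffer.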

Re: Producer batch size

Thanks Steve! So if I understand correctly, the number of messages buffered can be greater than batch.size, up to buffer.memory, if the app is sending data faster than the producer I/O thread can send to the broker. In this situation buffer.size becomes irrelevant, no?

Thanks,
Dhirendra.

On Wed, Dec 23, 2020 at 11:36 PM Steve Howard <steve.howard@confluent.io> wrote:
> Hi Dhirenda,
>
> As long as buffer.memory (default 32MB) has space, the producer will continue to write here. If that is exhausted, eventually the producer will throw...
>
> org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms
>
> The 60 seconds it is given to dutifully clear some room in buffer.memory by successfully sending messages is controlled by max.block.ms.
>
> Thanks,
>
> Steve
>
> On Wed, Dec 23, 2020 at 12:11 AM Dhirendra Singh <dhirendraks@gm...

Re: RE: RE: RE: Maintaining same offset while migrating from Confluent Replicator to Apache Mirror Maker 2.0

Great, that sounds like a smart way of bridging Replicator and MM2. It seems that even though the consumer group can be the same between Replicator and MM2, the storage formats of the offsets are still a little different, so we need an "adapter" anyway. We just need to monitor that the "adapter" is constantly running and not lagging behind too much :) Happy to hear any more interesting stories along the migration.

On 2020/12/21 05:21:39, <Amit.SRIVASTAV@cognizant.com> wrote:
> Hi Ning and all,
>
> We got a crude way to solve this issue. Below are the high level steps:
>
> Read the message from Replicator's internal topic for storing offsets. [connect-offsets]
> This topic stores the offsets for all topics which are getting replicated as key:value pairs. For e.g.
> Key : ["replicator-group",{"topic":"TEST","partition":0}]
> Value: {"offset":24}
>
> For each topic a...
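
As a rough illustration of the first step only, the sketch below parses one connect-offsets record in the key/value shape quoted above. The remaining steps are cut off in the quoted message and are not reconstructed here. The class name `ConnectOffsetRecordSketch`, the regular expressions, and the assumption that the fields always appear in this order are all illustrative; a real adapter would more likely use a JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts group, topic, partition and offset from the
// key/value strings shown in the thread. Assumes the exact field order quoted.
class ConnectOffsetRecordSketch {
    private static final Pattern KEY = Pattern.compile(
        "\\[\\s*\"([^\"]+)\"\\s*,\\s*\\{\\s*\"topic\"\\s*:\\s*\"([^\"]+)\"\\s*,"
        + "\\s*\"partition\"\\s*:\\s*(\\d+)\\s*\\}\\s*\\]");
    private static final Pattern VALUE =
        Pattern.compile("\\{\\s*\"offset\"\\s*:\\s*(\\d+)\\s*\\}");

    final String group;
    final String topic;
    final int partition;
    final long offset;

    ConnectOffsetRecordSketch(String group, String topic, int partition, long offset) {
        this.group = group;
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
    }

    // Parses one record; rejects anything that does not match the quoted shape.
    static ConnectOffsetRecordSketch parse(String key, String value) {
        Matcher k = KEY.matcher(key.trim());
        Matcher v = VALUE.matcher(value.trim());
        if (!k.matches() || !v.matches()) {
            throw new IllegalArgumentException("Unexpected record format: " + key + " / " + value);
        }
        return new ConnectOffsetRecordSketch(
            k.group(1), k.group(2), Integer.parseInt(k.group(3)), Long.parseLong(v.group(1)));
    }

    public static void main(String[] args) {
        ConnectOffsetRecordSketch r = parse(
            "[\"replicator-group\",{\"topic\":\"TEST\",\"partition\":0}]",
            "{\"offset\":24}");
        System.out.println(r.group + " " + r.topic + "-" + r.partition + " @ " + r.offset);
    }
}
```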

Re: Producer batch size

Hi Dhirenda,

As long as buffer.memory (default 32MB) has space, the producer will continue to write here. If that is exhausted, eventually the producer will throw...

org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms

The 60 seconds it is given to dutifully clear some room in buffer.memory by successfully sending messages is controlled by max.block.ms.

Thanks,
Steve

On Wed, Dec 23, 2020 at 12:11 AM Dhirendra Singh <dhirendraks@gmail.com> wrote:
> Hi,
> I have a question related to batch.size producer configuration.
> What happens when batch.size has reached and the producer app thread sends more data ?
> Does the thread block till space becomes available in the buffer containing the batch ?
>
> Thanks,
> Dhirendra.
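
The blocking behaviour described in this reply can be pictured with a toy model. The sketch below is not Kafka's BufferPool: it is a hypothetical `BoundedBufferSketch` that guards a fixed byte budget with a `java.util.concurrent.Semaphore`, waits up to a configured time for space, and then fails with an unchecked exception, loosely mirroring the TimeoutException quoted above. All names in it are assumptions made for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model only: a byte budget (like buffer.memory) plus a maximum wait
// (like max.block.ms). Not the real producer implementation.
class BoundedBufferSketch {

    // Hypothetical unchecked exception standing in for the producer's timeout.
    static class AllocationTimeout extends RuntimeException {
        AllocationTimeout(String message) {
            super(message);
        }
    }

    private final Semaphore freeBytes;
    private final long maxBlockMs;

    BoundedBufferSketch(int bufferMemoryBytes, long maxBlockMs) {
        this.freeBytes = new Semaphore(bufferMemoryBytes);
        this.maxBlockMs = maxBlockMs;
    }

    // "Application thread" side: reserve space, or give up after maxBlockMs.
    void allocate(int bytes) {
        boolean acquired;
        try {
            acquired = freeBytes.tryAcquire(bytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            throw new AllocationTimeout(
                "Failed to allocate memory within the configured max blocking time "
                + maxBlockMs + " ms");
        }
    }

    // "I/O thread" side: return space once a batch has been sent.
    void release(int bytes) {
        freeBytes.release(bytes);
    }

    int available() {
        return freeBytes.availablePermits();
    }

    public static void main(String[] args) {
        BoundedBufferSketch pool = new BoundedBufferSketch(100, 50);
        pool.allocate(100);                 // buffer is now full
        try {
            pool.allocate(1);               // nobody frees space, so this times out
        } catch (AllocationTimeout e) {
            System.out.println(e.getMessage());
        }
        pool.release(100);                  // the "I/O thread" catches up
        pool.allocate(1);                   // succeeds again
        System.out.println("free bytes: " + pool.available());
    }
}
```

The point of the model is only that the sending thread is not blocked by a full batch; it is blocked when the whole buffer is exhausted, and only for a bounded time.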

Re: Support for Uni-directional data-diode?

Thank you Jan,

The aim is to secure the sending side infrastructure and assets. Deny any known and unknown attacks from the "outside" while maintaining real-time data flowing outbound.
Data integrity may be maintained in various ways if the forwarded protocol has such options.

I wonder if KAFKA can run over UDP... for starters..

Anyone knows?

On Dec 22, 2020 23:25, jan <rtm443x@googlemail.com.INVALID> wrote:
Dunno if it helps (if in doubt, probably not) but a search for the term gets some useful articles (inc. <https://en.wikipedia.org/wiki/Unidirectional_network>) and a company <https://owlcyberdefense.com/blog/what-is-data-diode-technology-how-does-it-work/> who may be worth contacting (I'm not affiliated in any way). The first question I'd ask myself is, would a burn-to-dvd solution work? Failing that, basic stuff like email? In any case, what if the data's corrupted, how can the servers detect and re-request? W...

Producer batch size

Hi, I have a question related to the batch.size producer configuration. What happens when batch.size has been reached and the producer app thread sends more data? Does the thread block until space becomes available in the buffer containing the batch?

Thanks,
Dhirendra.

Re: Support for Uni-directional data-diode?

Dunno if it helps (if in doubt, probably not) but a search for the term gets some useful articles (inc. <https://en.wikipedia.org/wiki/Unidirectional_network>) and a company <https://owlcyberdefense.com/blog/what-is-data-diode-technology-how-does-it-work/> who may be worth contacting (I'm not affiliated in any way). The first question I'd ask myself is, would a burn-to-dvd solution work? Failing that, basic stuff like email? In any case, what if the data's corrupted, how can the servers detect and re-request? What are you protecting against exactly? Stuff like that.

jan

On 22/12/2020, Danny - Terafence <danny@terafence.com> wrote:
> Hello,
>
> Merry Christmas,
>
> My name is Danny Michaeli, I am Terafence's Technical Services Manager.
>
> One of our customers is using KAFKA to gather ICS SEIM data to collect and forward to AI servers.
>
> They have requested us to propose a uni-direc...

Re: Kafka Scaling Ideas

Hm, it's an optimization for "first layer", so if the bottleneck is in "second layer" (i.e. DB write) as you mentioned, it shouldn't make much difference I think.

On Tue, Dec 22, 2020 at 16:02, Yana K <yanak1019@gmail.com> wrote:
> I thought about it but then we don't have much time - will it optimize performance?
>
> On Mon, Dec 21, 2020 at 4:16 PM Haruki Okada <ocadaruma@gmail.com> wrote:
>
> > About "first layer" right?
> > Then it's better to make sure that not get() the result of Producer#send() for each message, because in that way, it spoils the ability of producer-batching.
> > Kafka producer batches messages by default and it's very efficient, so if you produce in async way, it rarely becomes a bottleneck in general.
> > > Also are there any producer optimizations
> >
> > By the way, if "first layer" just f...
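
The quoted advice about not calling get() on every Producer#send() result can be pictured with a toy model. The sketch below does not use the Kafka client: `FakeProducer` is a hypothetical stand-in whose `send` only queues a record and whose `flush` ships everything queued as a single batch. In the synchronous loop, the explicit `flush()` before `join()` stands in for the real producer having to ship a one-record batch before the caller can continue; that mapping is an assumption of the model, not a description of Kafka internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: FakeProducer is hypothetical and is not the Kafka producer.
class AsyncSendSketch {

    static class FakeProducer {
        private final List<CompletableFuture<Integer>> pending = new ArrayList<>();
        int batchesSent = 0;

        // Queue the record and hand back a future that completes on "ack".
        CompletableFuture<Integer> send(String record) {
            CompletableFuture<Integer> ack = new CompletableFuture<>();
            pending.add(ack);
            return ack;
        }

        // Ship everything queued so far as one batch and complete the acks.
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            List<CompletableFuture<Integer>> batch = new ArrayList<>(pending);
            pending.clear();
            batchesSent++;
            for (CompletableFuture<Integer> ack : batch) {
                ack.complete(batchesSent);
            }
        }
    }

    // Pattern to avoid: wait for each ack before sending the next record.
    static int sendSync(FakeProducer producer, List<String> records) {
        for (String record : records) {
            CompletableFuture<Integer> ack = producer.send(record);
            producer.flush();   // the record has to go out alone before join() returns
            ack.join();
        }
        return producer.batchesSent;
    }

    // Preferred pattern: attach a callback and keep sending.
    static int sendAsync(FakeProducer producer, List<String> records) {
        AtomicInteger acked = new AtomicInteger();
        for (String record : records) {
            producer.send(record).thenAccept(batchId -> acked.incrementAndGet());
        }
        producer.flush();       // all queued records leave together
        if (acked.get() != records.size()) {
            throw new IllegalStateException("missing acks");
        }
        return producer.batchesSent;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println("sync batches:  " + sendSync(new FakeProducer(), records));
        System.out.println("async batches: " + sendAsync(new FakeProducer(), records));
    }
}
```

With three records, the synchronous loop produces three single-record batches while the asynchronous loop produces one batch of three, which is the batching benefit the advice refers to.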

Re: --override option for bin/connect-distributed.sh

Hi Tom,

Thanks for your suggestion. I'll follow the KIP process.

regards, aki

On Tue, 22 Dec 2020 at 10:18, Tom Bentley (<tbentley@redhat.com>) wrote:
>
> Hi Aki,
>
> Since this is a change to a public API of the project it would need to be done through the KIP process [1]. Since writing the KIP in this case isn't much work, I suggest you write it up as a proposal and start a KIP discussion thread on the dev@ mailing list, then interested people can comment there.
>
> Kind regards,
>
> Tom
>
> [1]: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>
> On Mon, Dec 21, 2020 at 8:57 PM Aki Yoshida <elakito@gmail.com> wrote:
>
> > Hi Kafka team,
> > I think the --override option of Kafka is very practical in starting Kafka for various situations without changing the properties file. I missed this feature in ...

Re: --override option for bin/connect-distributed.sh

Hi Aki,

Since this is a change to a public API of the project it would need to be done through the KIP process [1]. Since writing the KIP in this case isn't much work, I suggest you write it up as a proposal and start a KIP discussion thread on the dev@ mailing list, then interested people can comment there.

Kind regards,

Tom

[1]: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

On Mon, Dec 21, 2020 at 8:57 PM Aki Yoshida <elakito@gmail.com> wrote:
> Hi Kafka team,
> I think the --override option of Kafka is very practical in starting Kafka for various situations without changing the properties file. I missed this feature in Kafka-Connect and I wanted to have it, so I created a patch in this commit in my forked repo.
>
> https://github.com/elakito/kafka/commit/1e54536598d1ce328d0aee10edb728270cc04af1
>
> Could someone tell me if this is a good idea or a bad idea? If bad, is ...

Support for Uni-directional data-diode?

Hello,

Merry Christmas,

My name is Danny Michaeli, I am Terafence's Technical Services Manager.

One of our customers is using KAFKA to gather ICS SEIM data to collect and forward to AI servers.

They have requested us to propose a uni-directional solution to avoid being exposed from the AI server site.

Can you please advise as to whether and how this can be done?

B. Regards,
Danny Michaeli
Technical Services Manager
Tel.: +972- 73-3791191
Cell: +972-52-882-3108

Producer closed while allocating memory error

I am getting the following error in the Kafka producer.

org.apache.kafka.common.KafkaException: Producer closed while allocating memory
    at org.apache.kafka.clients.producer.internals.BufferPool.allocate(BufferPool.java:151) ~[kafka-clients-2.5.0.jar:?]
    at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:221) ~[kafka-clients-2.5.0.jar:?]
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:941) ~[kafka-clients-2.5.0.jar:?]
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862) ~[kafka-clients-2.5.0.jar:?]

What could be the reason for this error message? The Kafka version I am using is 2.5.0.

Re: Kafka Scaling Ideas

I thought about it but then we don't have much time - will it optimize performance?

On Mon, Dec 21, 2020 at 4:16 PM Haruki Okada <ocadaruma@gmail.com> wrote:
> About "first layer" right?
> Then it's better to make sure that not get() the result of Producer#send() for each message, because in that way, it spoils the ability of producer-batching.
> Kafka producer batches messages by default and it's very efficient, so if you produce in async way, it rarely becomes a bottleneck in general.
> > Also are there any producer optimizations
>
> By the way, if "first layer" just filters then produces messages without interacting with any other external DB, using KafkaStreams should be much easier.
>
> On Tue, Dec 22, 2020 at 3:27, Yana K <yanak1019@gmail.com> wrote:
>
> > Thanks!
> >
> > Also are there any producer optimizations anyone can think of in this sc...