
Re: duplicate packets in kafka topic

Hi,

What are duplicate messages in your use case?
1) different messages with the same content
2) the same message that is sent multiple times to the broker due to
retries in the producer
3) something else

What do you mean by "identify those duplicates"? What do you want to do
with them?

For case 1), you could write all messages to a topic, identify the
duplicates with a Kafka Streams application, process them, and write the
results back to a topic. Be aware that identifying duplicate messages makes
the state of the Kafka Streams application grow to the sum of the sizes of
all unique messages, because you have to keep every message in your state
to be able to recognize future duplicates. That is not feasible in most
cases. To limit the state in the Streams application, you can restrict the
identification of duplicates to a time window, for example, identify all
duplicate messages of the last hour. Within a window of one hour, you would
only process unique messages, but you could still have duplicates across
windows. If you want a fail-safe identification of duplicates, you also
need to switch on exactly-once semantics in the Streams application.
See https://kafka.apache.org/documentation/streams/ and the Streams
configuration `processing.guarantee` under
https://kafka.apache.org/22/documentation/streams/developer-guide/config-streams.html#id6
for more information on Kafka Streams and exactly-once semantics.
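To make the bounded-state idea concrete, here is a plain-Python sketch of
windowed duplicate detection. This only illustrates the concept; in a real
Kafka Streams application you would use a windowed state store inside the
topology, and the hashing and eviction details below are my assumptions:

```python
import hashlib


class WindowedDeduplicator:
    """Windowed duplicate detection: the idea behind keeping a
    time-bounded state store in a Kafka Streams application.
    A plain-Python sketch, not actual Streams code."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = {}  # content hash -> timestamp of last occurrence

    def is_duplicate(self, message, now):
        """Return True if `message` was already seen within the window."""
        self._evict(now)
        key = hashlib.sha256(message).hexdigest()
        duplicate = key in self.seen
        self.seen[key] = now  # refresh the last-seen timestamp
        return duplicate

    def _evict(self, now):
        # Drop entries older than the window so the state stays bounded
        # by the number of unique messages per window, not overall.
        cutoff = now - self.window
        self.seen = {k: t for k, t in self.seen.items() if t >= cutoff}
```

With a one-hour window (3600 seconds), a message repeated ten minutes
later is flagged as a duplicate, while the same content arriving in a
later window is treated as new again, which is exactly the cross-window
limitation described above.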

For case 2), if you want to ensure that the same message is only written
once to the log, you should look into idempotent producers.
See https://kafka.apache.org/documentation/#semantics and the producer
configuration `enable.idempotence` under
https://kafka.apache.org/documentation/#producerconfigs .
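As an illustration, here is a minimal producer configuration sketch as a
Python dict. The keys are from the Kafka producer configuration docs
linked above; the bootstrap address is a placeholder assumption, and the
related settings reflect the constraints that idempotence imposes:

```python
# Producer configuration enabling idempotence, so the broker
# deduplicates messages that the producer re-sends on retry.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "enable.idempotence": True,  # broker-side dedup of producer retries
    # Settings constrained by idempotence:
    "acks": "all",  # required: wait for the full in-sync replica set
    "max.in.flight.requests.per.connection": 5,  # must be <= 5
}
```

Note that with `enable.idempotence=true` the producer guards against
duplicates caused by its own retries (case 2) only; it does not detect
duplicate content sent by different clients (case 1).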

Hope that helps.

Best regards,
Bruno

On Fri, Apr 26, 2019 at 9:02 AM saching1984@gmail.com <saching1984@gmail.com>
wrote:

> I have multiple clients who can send duplicate packets multiple time to
> same kafka topic.. Is there a way to identify those duplicate packets.
>
