Skip to main content

Exactly once transactions

Hi!

I investigate possibilities of "exactly once" Kafka transactions for
consume-transform-produce pattern. As far as I understand, the logic must
be the following (in pseudo-code):

var cons = createKafkaConsumer(MY_CONSUMER_GROUP_ID);
cons.subscribe(TOPIC_A);
for (;;) {
var recs = cons.poll();
for (var part : recs.partitions()) {
var partRecs = recs.records(part);
var prod = getOrCreateProducerForPart(MY_TX_ID_PREFIX + part);
prod.beginTransaction();
sendAllRecs(prod, TOPIC_B, partRecs);
prod.sendOffsetsToTransaction(singletonMap(part,
lastRecOffset(partRecs) + 1),

MY_CONSUMER_GROUP_ID);
prod.commitTransaction();
}
}

Is this right approach?

Because it looks to me there is a possible race here and the same record
from topic A can be committed to topic B more than once: if rebalancing
happens after our thread polled a record and before creating a producer,
then another thread will read and commit the same record, after that our
thread will wake up, create a producer (and fence the other one) and
successfully commit the same record second time.

Can anyone explain please how to do "exactly once" in Kafka right?
Examples would be very helpful.

Sergi

Comments