Re: Question on Kafka Connect worker offset reset config for internal topics

Hi Imcom Jin,

Thanks for your question!

It is expected behavior that Connect's internal topics are read completely
from the beginning each time the worker starts, regardless of the
auto.offset.reset configuration [1].
This is because they are compacted topics, so even the oldest record in the
topic may still be the current value for its key and is needed for
correctness. For example, a worker that reads the status topic only from the
latest offset may never learn the status of long-running, stable tasks whose
last status update was written long ago.
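
To illustrate why the read has to start at the beginning, here is a minimal sketch in Python. It is a toy model, not Connect's implementation: the topic is a plain list, and the task keys and status values are invented for the example.

```python
# Toy model of a compacted status topic: (key, value) records in offset order.
# Keys and values are hypothetical, chosen only for illustration.
status_topic = [
    ("task-a-0", "RUNNING"),   # offset 0: written once, never updated since
    ("task-b-0", "RUNNING"),   # offset 1
    ("task-b-0", "FAILED"),    # offset 2
    ("task-b-0", "RUNNING"),   # offset 3
]


def replay(records, start_offset):
    """Build the latest value per key by reading from start_offset to the end."""
    state = {}
    for key, value in records[start_offset:]:
        state[key] = value
    return state


from_beginning = replay(status_topic, 0)
from_latest = replay(status_topic, len(status_topic))  # "latest" starts at the log end

print(from_beginning)  # {'task-a-0': 'RUNNING', 'task-b-0': 'RUNNING'}
print(from_latest)     # {}
```

In this model, a reader starting at the log end sees nothing, and even a reader starting at offset 2 would miss task-a-0, whose only record sits at offset 0.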

If you want to reduce the startup time, I suggest lowering the segment roll
settings (segment.bytes and segment.ms) [2,3] for the internal topics. Kafka
does not compact the active segment, so rolling segments sooner lets it
compact away superseded status messages sooner, and they will not be read on
a future startup. This was previously reported [4], but we have not yet
changed the defaults.
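
As a rough illustration of the effect of segment size, here is a simplified sketch in Python. It assumes a toy model in which segments roll after a fixed number of records (standing in for segment.bytes or segment.ms), every closed segment has already been compacted, and the active segment is left untouched. The real log cleaner also depends on settings such as min.cleanable.dirty.ratio and on timing, which this ignores. The function name and the task key are invented for the example.

```python
def records_read_at_startup(writes, segment_size):
    """Count records left in the log, given a segment size in records.

    Closed segments are compacted down to the last record per key; the
    active (last, still-open) segment is not compacted at all.
    """
    if not writes:
        return 0
    segments = [writes[i:i + segment_size] for i in range(0, len(writes), segment_size)]
    closed, active = segments[:-1], segments[-1]
    latest = {}
    for segment in closed:
        for key, value in segment:
            latest[key] = value  # later records overwrite earlier ones
    return len(latest) + len(active)


# 1000 status updates for a single hypothetical task key.
writes = [("task-b-0", f"update-{i}") for i in range(1000)]

print(records_read_at_startup(writes, segment_size=1000))  # 1000
print(records_read_at_startup(writes, segment_size=100))   # 101
```

With one large segment, all 1000 updates stay in the active segment and must be read on startup. With smaller segments, nine closed segments compact down to a single record and only the 100 records in the active segment remain uncompacted.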

I hope this helps,
Greg

[1] https://github.com/apache/kafka/blob/c4fb1008c4856c8cf9594269c86323753e6860ce/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L274-L278
[2] https://kafka.apache.org/documentation/#topicconfigs_segment.bytes
[3] https://kafka.apache.org/documentation/#topicconfigs_segment.ms
[4] https://issues.apache.org/jira/browse/KAFKA-15086

On Thu, Aug 14, 2025 at 9:44 AM Imcom JIN <imcom.jin@nexusguard.com> wrote:

> Hi dear Kafka team,
>
> I see that no matter what properties I give to the connector, the offset
> reset config for internal topics, especially the offset storage topic (say,
> my-connect-offsets), always uses "earliest", which leads to very long
> bootstrap times during restart, or to stuck workers.
>
> Log sample and config sample printed in the log:
>
> 2025-08-12 10:10:45,531 INFO [Consumer
> clientId=cbdhk04-data-cluster-offsets, groupId=cbdhk04-data-cluster]
> Seeking to earliest offset of partition
>
> root@cbd:/usr/local/nxg/docker/kafka-connect# docker logs
> connect-replication-8085 | grep "auto.offset.reset = earliest" -C2
> auto.commit.interval.ms = 5000
> auto.include.jmx.reporter = true
> auto.offset.reset = earliest
>
> My connect-distributed.properties contains the following config:
>
> producer.override.auto.offset.reset=latest
> consumer.override.auto.offset.reset=latest
> producer.auto.offset.reset=latest
> consumer.auto.offset.reset=latest
> auto.offset.reset=latest
> connector.client.config.override.policy=All
>
> None of the above changes the behaviour of the consumer that Connect
> initializes to consume the internal topics.
>
> What's the expected behaviour? How can I improve the bootstrap time for a
> heavy Connect cluster?
> What properties should I use to change the consumer config, if that is
> possible at all?
>
> Thanks in advance
>
> --
> *Imcom Jin*
> Software Engineer Manager, SEG
> T : +8613552756336
>
> *NEXUSGUARD*
> www.nexusguard.com
> LinkedIn <https://www.linkedin.com/company/nexusguard> • Twitter
> <https://www.twitter.com/nexusguard> • Facebook
> <https://www.facebook.com/nxg.pr>
