Hi Avishek Das:
To be honest, I don't currently have an environment where I can monitor the
logs to trace the detailed upload logic. Based on my code review, however, it
appears that the RLMCopyTask was created in my case.
The similarities between my case and yours are:
1. Not all servers are affected.
2. The issue occurred after a restart.
You could try the following method to see if it resolves the issue:
1. Upgrade Kafka to 4.2.0, which was just released this week.
2. If you don't want to upgrade Kafka, increase
remote.log.metadata.initialization.retry.max.timeout.ms from 2 minutes to a
larger value, such as 20 minutes, and then restart the broker.
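For option 2, here is a minimal sketch of the broker-side setting. It assumes
the topic-based remote log metadata manager is in use and that broker-level
RLMM properties are passed through the default prefix "rlmm.config." (the
default value of remote.log.metadata.manager.impl.prefix). Please double-check
both assumptions against your own configuration. The helper name
init_timeout_property is hypothetical; it only renders the line you would add
to server.properties, converting the timeout to milliseconds.

```python
from datetime import timedelta

# Assumption: the default RLMM prefix "rlmm.config." is used; adjust if your
# cluster overrides remote.log.metadata.manager.impl.prefix.
RLMM_PREFIX = "rlmm.config."
INIT_RETRY_MAX_TIMEOUT = "remote.log.metadata.initialization.retry.max.timeout.ms"


def init_timeout_property(timeout: timedelta, prefix: str = RLMM_PREFIX) -> str:
    """Hypothetical helper: render a server.properties line for the timeout."""
    millis = int(timeout.total_seconds() * 1000)
    if millis <= 0:
        raise ValueError("timeout must be positive")
    return f"{prefix}{INIT_RETRY_MAX_TIMEOUT}={millis}"


# 20 minutes, as suggested above (the default is 2 minutes, i.e. 120000 ms).
print(init_timeout_property(timedelta(minutes=20)))
# rlmm.config.remote.log.metadata.initialization.retry.max.timeout.ms=1200000
```

The property only takes effect on broker startup, which is why a restart is
part of the suggestion.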
We can also wait and see whether others have any feedback on your issue.
Thanks,
Regards
Jian
On Wed, Feb 18, 2026 at 21:27, Avishek Das via users <users@kafka.apache.org> wrote:
> Hello Jian,
>
> Thanks for sharing the references.
>
> I just wanted to confirm: did you encounter the same issue, where the
> RLMCopyTask was created but did not copy segments to remote storage, or was
> the RLMCopyTask not created at all in your case?
>
> On Tue, Feb 17, 2026 at 8:48 AM jian fu <fujian1115@gmail.com> wrote:
>
> > Hi Avishek Das:
> >
> > I am not sure about your exact issue, but I encountered a similar one
> > and fixed it with:
> >
> > https://github.com/apache/kafka/pull/20007
> > https://github.com/apache/kafka/pull/20203
> > https://cwiki.apache.org/confluence/x/Hg9JFg
> >
> > You can refer to these to see whether it is the same issue. Thanks
> >
> > Regards
> > Jian
> >
> > On Mon, Feb 16, 2026 at 21:32, Avishek Das via users <users@kafka.apache.org> wrote:
> >
> > > Hello Team,
> > >
> > > During tiered storage performance testing on Kafka 3.9.1, we observed cases
> > > where a broker did not trigger scheduled RLMCopy tasks, resulting in no
> > > segment uploads to remote storage. Scenarios observed:
> > >
> > > - During a rolling restart while tiered storage was actively copying
> > >   segments, one broker (out of six) stopped copying segments after
> > >   restart.
> > >
> > > - After multiple rolling restarts before enabling tiered storage at the
> > >   topic level, enabling tiered storage again caused one broker (out of
> > >   six) to stop copying segments.
> > >
> > > In both cases, restarting the affected broker resolved the issue and
> > > copying resumed. DEBUG logging showed that RLMCopy tasks were created for
> > > leader partitions, traffic was active, and rolled segments were available.
> > > However, no debug logs appeared from the actual copy path
> > > <https://github.com/apache/kafka/blob/3.9/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L839>,
> > > suggesting the copy workflow was not being triggered despite tasks being
> > > scheduled.
> > >
> > > Please let me know if this is a known issue or if any configuration might
> > > have been missed during testing. I can share additional logs if helpful.
> > >
> > > Thanks!
> > > --
> > > Avishek Das
> > >
> >
>
>
> --
> Avishek Das
> Member Of Technical Staff | Salesforce
> Mobile: +917008383890
>