Hello Team,
During tiered storage performance testing on Kafka 3.9.1, we observed cases
where a broker did not trigger scheduled RLMCopy tasks, resulting in no
segment uploads to remote storage. Scenarios observed:
- During a rolling restart while tiered storage was actively copying
segments, one broker (out of six) stopped copying segments after the restart.
- After multiple rolling restarts performed before tiered storage was
enabled at the topic level, enabling tiered storage again caused one broker
(out of six) to stop copying segments.
In both cases, restarting the affected broker resolved the issue and
copying resumed. DEBUG logging showed that RLMCopy tasks were created for
leader partitions, traffic was active, and rolled segments were available.
However, no debug logs appeared from the actual copy path
<https://github.com/apache/kafka/blob/3.9/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L839>,
suggesting the copy workflow was not being tri...