
INCONSISTENT_CLUSTER_ID causing migration to not take place.

Hello,
I had Kafka v3.8 running in Kubernetes with 3 replicas each of broker and ZooKeeper.

After upgrading to v3.9, I am attempting to migrate from ZooKeeper to KRaft.

So I set the following values:
      broker.zookeeperMigrationMode: true
      broker.minId: 0
      controller.zookeeperMigrationMode: true
      controller.minId: 100
      kraft.isenabled: true
      zookeeper.isenabled: true

The logs show the error "INCONSISTENT_CLUSTER_ID in FETCH response", and the migration never starts or completes (the Zk_MigrationState value is 2).
I have checked the contents of meta.properties, and the "cluster.id" is the same in all pods. The value matches what is stored in ZooKeeper.
I have also deleted the meta.properties file from all pods and restarted them, and I still see the error.
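
For anyone who wants to repeat the meta.properties comparison mechanically, here is a minimal sketch. It assumes the meta.properties files have already been copied out of each pod to local paths (ideally from the controller pods as well as the brokers) and that they use the usual key=value properties layout with a cluster.id key. The class name ClusterIdAudit and its methods are hypothetical helpers and not part of Kafka.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Kafka): compares cluster.id across
// meta.properties files that were copied out of the pods beforehand.
class ClusterIdAudit {

    // Parses the text of one meta.properties file and returns its cluster.id,
    // or "<missing>" when the key is absent.
    static String clusterIdOf(String metaPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(metaPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("cluster.id", "<missing>");
    }

    // Returns the set of distinct cluster ids; more than one entry means a mismatch.
    static Set<String> distinctClusterIds(Map<String, String> idsByFile) {
        return new TreeSet<>(idsByFile.values());
    }

    // Usage: java ClusterIdAudit.java <copy-of-meta.properties> [more copies...]
    public static void main(String[] args) throws IOException {
        Map<String, String> idsByFile = new LinkedHashMap<>();
        for (String file : args) {
            idsByFile.put(file, clusterIdOf(Files.readString(Path.of(file))));
        }
        idsByFile.forEach((file, id) -> System.out.println(file + " -> " + id));
        Set<String> distinct = distinctClusterIds(idsByFile);
        System.out.println(distinct.size() <= 1
                ? "cluster.id is consistent"
                : "cluster.id differs: " + distinct);
    }
}
```

Listing every file next to its value makes it easier to spot a single node whose cluster.id differs from the rest.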


1. Where is the discrepancy? The cluster ID was never changed.
2. Why does the error message not print the cluster IDs being compared?
https://github.com/apache/kafka/blob/31e1a57c41cf9cb600751669dc71bcd9596b45f9/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1459

    private boolean hasValidClusterId(String requestClusterId) {
        // We don't enforce the cluster id if it is not provided.
        if (requestClusterId == null) {
            return true;
        }
        return clusterId.equals(requestClusterId);
    }

    ...

    private CompletableFuture<FetchResponseData> handleFetchRequest(
        RaftRequest.Inbound requestMetadata,
        long currentTimeMs
    ) {
        FetchRequestData request = (FetchRequestData) requestMetadata.data();

        if (!hasValidClusterId(request.clusterId())) {
            return completedFuture(new FetchResponseData().setErrorCode(Errors.INCONSISTENT_CLUSTER_ID.code()));
        }
        ...