Guidance Needed: Kafka MirrorMaker2 configuration to prevent data loss

Hello,

I'm looking for guidance on how to configure MirrorMaker 2 so that no data is lost during normal replication or during planned maintenance windows.

I recently encountered an issue where not all records were replicated to the target cluster: only 974,345 out of 1 million records were present, as verified with the kafka-get-offsets script. I have only been able to reproduce this once.
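
For reference, a check like this could be scripted roughly as follows. This is a minimal sketch, assuming the kafka-get-offsets output of each cluster has been saved as text in its usual `topic:partition:offset` form. The function names and the sample data are hypothetical. It compares end offsets only, which matches record counts only when the partitions start at offset 0 and contain no transaction markers, compacted records, or expired records.

```python
from collections import defaultdict


def parse_offsets(text):
    """Parse `topic:partition:offset` lines into {topic: {partition: offset}}."""
    offsets = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Partition and offset are the last two colon-separated fields.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[topic][int(partition)] = int(offset)
    return dict(offsets)


def missing_records(source, target):
    """Return {topic: source_total - target_total} for topics whose totals differ."""
    diff = {}
    for topic, partitions in source.items():
        source_total = sum(partitions.values())
        target_total = sum(target.get(topic, {}).values())
        if source_total != target_total:
            diff[topic] = source_total - target_total
    return diff


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    source_text = "orders:0:500000\norders:1:500000\n"
    target_text = "orders:0:500000\norders:1:474345\n"
    print(missing_records(parse_offsets(source_text), parse_offsets(target_text)))
```

Comparing per topic rather than per partition is deliberate in this sketch: it still gives the right total if records end up in different partitions on the target.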

The environment consists of two Kubernetes clusters configured in an active/standby topology, where the Strimzi Operator is used to deploy Kafka with three replicas and MirrorMaker2.

Before performing a switchover, I scale down MirrorMaker 2 and delete the heartbeats topic, as otherwise it gets replicated under different names, such as source.heartbeats, source.source.heartbeats, and so on.

The configuration currently in use is the following:

> apiVersion: kafka.strimzi.io/v1beta2
> kind: KafkaMirrorMaker2
> metadata:
>   name: kafka-main
> spec:
>   clusters:
>   - alias: source
>     bootstrapServers: kafka-external-kafka-mcs-bootstrap.kafka.svc.clusterset.local:9095
>     config:
>       consumer.fetch.max.wait.ms: 500
>       consumer.fetch.min.bytes: 1048576
>   - alias: target
>     bootstrapServers: kafka-main-kafka-mcs-bootstrap.kafka.svc.clusterset.local:9095
>     config:
>       producer.batch.size: 65536
>       producer.compression.type: lz4
>       producer.linger.ms: 10
>       producer.max.request.size: 10485760
>   connectCluster: target
>   jvmOptions:
>     -Xms: 1g
>     -Xmx: 2g
>   mirrors:
>   - checkpointConnector:
>       config:
>         checkpoints.topic.replication.factor: 3
>         emit.checkpoints.interval.seconds: 30
>         replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
>         sync.group.offsets.enabled: "true"
>         sync.group.offsets.interval.seconds: 60
>         sync.topic.configs.enabled: "true"
>       tasksMax: 1
>     groupsPattern: .*
>     heartbeatConnector:
>       config:
>         emit.heartbeats.interval.seconds: 5
>         heartbeats.topic.replication.factor: 3
>       tasksMax: 1
>     sourceCluster: source
>     sourceConnector:
>       config:
>         consumer.auto.offset.reset: latest
>         offset-syncs.topic.location: target
>         offset-syncs.topic.replication.factor: 3
>         offset.lag.max: 100
>         refresh.groups.interval.seconds: 60
>         refresh.topics.interval.seconds: 60
>         replication.factor: 3
>         replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
>         sync.topic.acls.enabled: "true"
>         sync.topic.configs.enabled: "true"
>         topics: .*
>         topics.blacklist: .*[\-\.]internal, __consumer_offsets, __transaction_state,
>           connect-.*, exclude-.*
>       tasksMax: 10
>     targetCluster: target
>     topicsPattern: .*
>   replicas: 3
>   resources:
>     limits:
>       cpu: "2"
>       memory: 4Gi
>     requests:
>       cpu: "1"
>       memory: 2Gi
>   version: 4.0.0

Any suggestions to improve the current configuration are welcome.