Skip to main content

Kafka Streams app process records until certain date

Hello

For my use case I need to work with a chuck of records, let's say per
month... We have over two years of data... and we are testing if we can
deploy it to production, but we need to test in small batches.

I have built a Kafka Streams app that processes two input topics and output
to one topic.

I would like to process the first two months of data. Is that possible?

- I have tried blocking the consumer thread using .map and comparing the
timestamp on the message and a timestamp I get from another system that
would tell me until what time I should process on the two KStreams I have
but I have noticed.I also increased MAX_POLL_INTERVAL_MS_CONFIG but I have
noticed the messages that are in range do not get processed and sent to the
output topic.
- I have also seen a Spring Cloud library apparently offer a
pause-resume feature.
https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.5/reference/html/spring-cloud-stream-binder-kafka.html#_binding_visualization_and_control_in_kafka_streams_binder
- I have also seen that implementing a transformer or processor could
work but in this case the state store would possible less than years of
data. That is something I would like to avoid.


Any help is appreciated.

regards
- Miguel

Comments