Hello,
I have some rudimentary questions on state stores. My service is planned to
have two transformers, each listening to a different topic. Both topics
have the same number of partitions and the upstream producers to those
topics are consistent with respect to key schema. My question centers
around the fact that both transformers need to consult and update the same
persistent state store in order to make decisions with respect to record
processing. I am not implementing a custom key partitioner, I'm using the
default. Also, there is no re-keying done by either transformer.
Given the above scenario, I have the following questions:
1) Will a given key always hash to the same kstream consumer group member
for both transformers? You can imagine why this is important given that
they share a state store. My concern is that rebalancing may occur, and
somehow the key space for one of the transformers is moved to another pod,
but not both.
2) If transformer A processes a record R for a given key K, and the shared
state store is updated at key K as a result of that processing, does the
second transformer B have access to the updated state store value as soon
as transformer A is done processing the record? (Assume the record is
updated with a state store put()).
I have been told that in order to ensure that the partition assignments are
consistent across pods, for both input topics, I have to do some exotic
merging of the kstreams that process the input topics, which feels strange
and wrong.
Are there any other constraints or considerations that involve sharing a
state store across transformers that I should be thinking about in my
architecture for this service, but didn't mention?
Thanks for clarifying.
I have some rudimentary questions on state stores. My service is planned to
have two transformers, each listening to a different topic. Both topics
have the same number of partitions and the upstream producers to those
topics are consistent with respect to key schema. My question centers
around the fact that both transformers need to consult and update the same
persistent state store in order to make decisions with respect to record
processing. I am not implementing a custom key partitioner, I'm using the
default. Also, there is no re-keying done by either transformer.
Given the above scenario, I have the following questions:
1) Will a given key always hash to the same kstream consumer group member
for both transformers? You can imagine why this is important given that
they share a state store. My concern is that rebalancing may occur, and
somehow the key space for one of the transformers is moved to another pod,
but not both.
2) If transformer A processes a record R for a given key K, and the shared
state store is updated at key K as a result of that processing, does the
second transformer B have access to the updated state store value as soon
as transformer A is done processing the record? (Assume the record is
updated with a state store put()).
I have been told that in order to ensure that the partition assignments are
consistent across pods, for both input topics, I have to do some exotic
merging of the kstreams that process the input topics, which feels strange
and wrong.
Are there any other constraints or considerations that involve sharing a
state store across transformers that I should be thinking about in my
architecture for this service, but didn't mention?
Thanks for clarifying.
Comments
Post a Comment