I replied on Stack Overflow.
-Matthias
On 1/24/24 2:18 AM, warrior2031@gmail.com wrote:
> Let's say there's a topic in which chunks of different files are
> interleaved, each represented by a tuple (FileId, Chunk).
>
> Chunks of the same file can also arrive slightly out of order.
>
> The task is to aggregate all files and store them into some store.
>
> The number of files is unbounded.
>
> In pseudo stream DSL that might look like:
>
>     topic('chunks')
>         .groupByKey((fileId, chunk) -> fileId)
>         .sortBy((fileId, chunk) -> chunk.offset)
>         .aggregate((fileId, chunk) -> store.append(fileId, chunk));
>
> I want to understand whether Kafka Streams can solve this efficiently.
> Since the number of files is unbounded, how would Kafka manage the
> intermediate topics for the groupBy operation? How many partitions will
> it use, etc.? I can't find these details in the docs. Also, let's say a
> chunk has a flag that indicates EOF. How do I indicate that a specific
> group will no longer receive any new data?
>
>
> That's a copy of my Stack Overflow question:
> What does kafka streams groupBy does internally?
> <https://stackoverflow.com/questions/77870807/what-does-kafka-streams-groupby-does-internally>
>
>
> —
> Michael
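
For readers of the archive, the per-file reassembly described in the question can be sketched in plain Java. This is not Kafka Streams code and not the answer given on Stack Overflow. It only illustrates the buffering that any stateful step keyed by file ID would need. The class name ChunkAssembler, its methods, and the assumption that chunk.offset is a byte offset are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: reassembles files from chunks that may arrive out of order. */
class ChunkAssembler {

    /** Buffering state kept for one file while it is still incomplete. */
    private static final class FileState {
        // Chunks waiting for an earlier gap to close, sorted by byte offset.
        final TreeMap<Long, byte[]> pending = new TreeMap<>();
        final ByteArrayOutputStream assembled = new ByteArrayOutputStream();
        long nextOffset = 0;  // first byte offset that has not been written yet
        long endOffset = -1;  // total file length, known once the EOF chunk is seen
    }

    private final Map<String, FileState> inFlight = new HashMap<>();
    private final Map<String, byte[]> completed = new HashMap<>();

    /** Accepts one chunk; offset is assumed to be the chunk's byte offset in the file. */
    void accept(String fileId, long offset, byte[] data, boolean eof) {
        FileState s = inFlight.computeIfAbsent(fileId, id -> new FileState());
        s.pending.put(offset, data);
        if (eof) {
            s.endOffset = offset + data.length;
        }
        // Append every buffered chunk that is now contiguous with what was written.
        while (!s.pending.isEmpty() && s.pending.firstKey() == s.nextOffset) {
            byte[] next = s.pending.pollFirstEntry().getValue();
            s.assembled.write(next, 0, next.length);
            s.nextOffset += next.length;
        }
        // The file is done once EOF was seen and no gaps remain; its state is dropped.
        if (s.endOffset >= 0 && s.nextOffset == s.endOffset) {
            completed.put(fileId, s.assembled.toByteArray());
            inFlight.remove(fileId);
        }
    }

    /** Returns the assembled bytes, or null while the file is still incomplete. */
    byte[] completedFile(String fileId) {
        return completed.get(fileId);
    }

    /** Number of files that still hold buffered state. */
    int inFlightCount() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        ChunkAssembler assembler = new ChunkAssembler();
        // The EOF chunk arrives before the first chunk of the same file.
        assembler.accept("f1", 6, "world".getBytes(StandardCharsets.UTF_8), true);
        assembler.accept("f1", 0, "hello ".getBytes(StandardCharsets.UTF_8), false);
        System.out.println(new String(assembler.completedFile("f1"), StandardCharsets.UTF_8));
    }
}
```

Because state is removed as soon as a file completes, memory grows with the number of files in flight rather than with the total number of files, which is the property the EOF flag makes possible.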