
Re: Leverage multiple disks for Kafka Streams stores

Hi Adrian,

Thank you for the additional information!

One reason to have a single folder is that Streams also stores metadata
that refers to all state stores in the state directory. That could be
changed if we have a good reason.
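To make the single-folder layout concrete, here is a minimal Java sketch (stdlib only) that computes where a plain, non-windowed RocksDB key-value store would sit under the state directory. It assumes the default layout <state.dir>/<application.id>/<taskId>/rocksdb/<storeName>, with a per-task .checkpoint offsets file in the task directory. The directory, application id and store names are hypothetical, and the code only builds paths; it does not call Kafka Streams.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: computes where Kafka Streams would place a plain (non-windowed)
// RocksDB key-value store, assuming the default state directory layout.
public class StateDirLayout {

    // <state.dir>/<application.id>/<taskId>, e.g. taskId "0_0" = subtopology 0, partition 0.
    public static Path taskDir(String stateDir, String applicationId, String taskId) {
        return Paths.get(stateDir, applicationId, taskId);
    }

    // <taskDir>/rocksdb/<storeName>: one RocksDB instance per store per task.
    public static Path storeDir(String stateDir, String applicationId, String taskId, String storeName) {
        return taskDir(stateDir, applicationId, taskId).resolve("rocksdb").resolve(storeName);
    }

    // <taskDir>/.checkpoint: changelog offsets for all of this task's stores.
    public static Path checkpointFile(String stateDir, String applicationId, String taskId) {
        return taskDir(stateDir, applicationId, taskId).resolve(".checkpoint");
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration.
        String stateDir = "/var/lib/kafka-streams";
        String appId = "my-app";
        for (String taskId : new String[] {"0_0", "0_1"}) {
            System.out.println(storeDir(stateDir, appId, taskId, "busy-store"));
            System.out.println(storeDir(stateDir, appId, taskId, "other-store"));
            System.out.println(checkpointFile(stateDir, appId, taskId));
        }
    }
}
```

In this layout, the .checkpoint file that records changelog offsets for all of a task's stores lives in the same task directory as the stores themselves.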

If you have a good idea for solving this issue, please feel free to open a
KIP. I would be glad to discuss such a KIP.

Best,
Bruno

On 19.05.22 15:40, Adrian Tubio wrote:
> Hi Bruno,
>
> Thanks a lot for your answer.
>
> I have tried to tune the stores one by one to the best of my ability, and
> indeed I have managed to improve things considerably. We even switched to a
> much faster disk, but it's still not enough.
>
> Yes, we could try dividing the application into sub-applications to make
> use of different disks, but it feels like an artificial solution.
>
> There might be reasons I don't know of for having a single folder for all
> stores, but it feels limiting, especially considering that you can plug in
> other types of stores instead of RocksDB, some of which don't even use the
> local disk.
>
> If CPU and memory are fine and the only limiting factor is the disk, why
> not allow using multiple disks instead?
> This is especially relevant in cloud deployments, where you can attach
> multiple volumes arbitrarily: several cheap volumes in parallel are
> sometimes cheaper than a single very expensive one.
>
> I personally believe that this should be considered for a KIP.
>
> Best regards,
>
> Adrian Tubio
>
>
>
> On Thu, May 19, 2022 at 1:49 PM Bruno Cadonna <cadonna@apache.org> wrote:
>
>> Hi Adrian,
>>
>> I am afraid that you cannot set the state directory of a single state
>> store to a directory different from that of all other stores.
>>
>> Maybe the following blog post can help you debug and solve your issue:
>>
>>
>> https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance
>>
>> Specifically look at the section "High disk I/O and write stalls":
>>
>> https://www.confluent.io/blog/how-to-tune-rocksdb-kafka-streams-state-stores-performance/#write-stalls
>>
>> Best,
>> Bruno
>>
>>
>> On 19.05.22 10:56, Adrian Tubio wrote:
>>> Hi there,
>>>
>>> My Kafka Streams topology has one particularly busy store that, together
>>> with the other stores in the same topology, is exhausting disk I/O, which
>>> leads to write stalls and increased latency.
>>>
>>> This store does about 3 to 4 times more compaction than the others. Since
>>> we have more disks/volumes available, we were wondering: would it be
>>> possible to set a different path for this store so that it lands on a
>>> different disk?
>>>
>>> I can't seem to find any way to do it. Ideally it would be done via
>>> RocksDBConfigSetter, but that doesn't seem to offer that possibility, as
>>> the state store's directory seems to come from StateStoreContext, which is
>>> initialized from the global STATE_DIR_CONFIG setting.
>>>
>>> Has anyone done something similar?
>>>
>>> Best regards,
>>>
>>> Adrian Tubio
>>>
>>
>
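The workaround of splitting the application into sub-applications, mentioned in the thread, can be sketched in Java (stdlib only) as one configuration per sub-application, each with its own application.id and a state.dir on a different mount point. The mount points, application ids and broker address are hypothetical. The keys are the literal string values behind StreamsConfig.APPLICATION_ID_CONFIG, StreamsConfig.BOOTSTRAP_SERVERS_CONFIG and StreamsConfig.STATE_DIR_CONFIG.

```java
import java.util.List;
import java.util.Properties;

// Sketch of the "split into sub-applications" workaround: each sub-application
// has its own application.id and a state.dir on a different volume.
public class SubAppConfigs {

    // Literal values of StreamsConfig.APPLICATION_ID_CONFIG,
    // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG and StreamsConfig.STATE_DIR_CONFIG.
    static final String APPLICATION_ID = "application.id";
    static final String BOOTSTRAP_SERVERS = "bootstrap.servers";
    static final String STATE_DIR = "state.dir";

    public static Properties subAppProps(String applicationId, String stateDir, String bootstrapServers) {
        Properties props = new Properties();
        // A distinct application.id per sub-application: it names the consumer group
        // and prefixes internal topics, so two sub-applications must not share it.
        props.put(APPLICATION_ID, applicationId);
        props.put(BOOTSTRAP_SERVERS, bootstrapServers);
        // All stores of this sub-application land under this directory (this volume).
        props.put(STATE_DIR, stateDir);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical mount points: the busy store's sub-application on disk1, the rest on disk2.
        List<Properties> apps = List.of(
            subAppProps("my-app-busy-store", "/mnt/disk1/kafka-streams", "localhost:9092"),
            subAppProps("my-app-rest", "/mnt/disk2/kafka-streams", "localhost:9092"));
        for (Properties p : apps) {
            System.out.println(p.getProperty(APPLICATION_ID) + " -> " + p.getProperty(STATE_DIR));
        }
    }
}
```

Each Properties object would then be passed to its own new KafkaStreams(topology, props) instance. Any data that crosses from one sub-application to another has to go through an intermediate Kafka topic.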
