
Re: uneven distribution of events across kafka topic partitions for small number of unique keys

I'm sorry. I misread your message. I thought you were asking about increasing the number of partitions on a topic after there were keyed events in it.

> On Nov 22, 2021, at 3:07 AM, Pushkar Deole <pdeole2015@gmail.com> wrote:
>
> Dave,
>
> I am not sure I get your point. It is not about having fewer partitions;
> the issue is that the default partitioner may produce the same hash
> result for two different strings, which might be landing the two
> different keys in the same partition.
>
>> On Sun, Nov 21, 2021 at 9:33 PM Dave Klein <daveklein@usa.net> wrote:
>>
>> Another possibility, if you can pause processing, is to create a new topic
>> with a higher number of partitions, then consume from the beginning of
>> the old topic and produce to the new one. Then continue processing as
>> normal and all events will be in the correct partitions.
>>
>> Regards,
>> Dave
>>
>>>> On Nov 21, 2021, at 7:38 AM, Pushkar Deole <pdeole2015@gmail.com> wrote:
>>>
>>> Thanks Luke. I am sure many others have faced this problem before, so I
>>> would like to know if there are any existing custom algorithms that can
>>> be reused.
>>>
>>> Note that we also have a requirement to maintain key-level ordering, so
>>> the custom partitioner should support that as well.
>>>
>>>> On Sun, Nov 21, 2021, 18:29 Luke Chen <showuon@gmail.com> wrote:
>>>>
>>>> Hello Pushkar,
>>>> The default distribution algorithm is "hash(key) % partition_count", so
>>>> it is possible to get the uneven distribution you saw.
>>>>
>>>> Yes, there is a way to solve your problem: a custom partitioner:
>>>> https://kafka.apache.org/documentation/#producerconfigs_partitioner.class
>>>>
>>>> You can check the Partitioner javadoc here for reference:
>>>> <https://kafka.apache.org/30/javadoc/org/apache/kafka/clients/producer/Partitioner.html>
>>>> You can also see some examples in the built-in partitioners, e.g.
>>>> clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java.
>>>> Basically, you want to focus on the "partition" method and define your own
>>>> algorithm to distribute the events by key, e.g. key-1 -> partition-1,
>>>> key-2 -> partition-2, etc.
>>>>
>>>> Thank you.
>>>> Luke
>>>>
>>>>
>>>> On Sat, Nov 20, 2021 at 2:55 PM Pushkar Deole <pdeole2015@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are experiencing uneven distribution of events across topic
>>>>> partitions for a small set of unique keys. The details are as follows:
>>>>>
>>>>> 1. topic with 6 partitions
>>>>> 2. 8 unique keys used to produce events onto the topic
>>>>>
>>>>> We used key-based partitioning while producing events onto the above
>>>>> topic. Observation: only 3 partitions were utilized for all the events
>>>>> pertaining to those 8 unique keys.
>>>>>
>>>>> Any idea how the load can be made even across partitions while using a
>>>>> key-based partitioning strategy? Any help would be greatly appreciated.
>>>>>
>>>>> Note: we cannot use round robin since key level ordering matters for us
>>>>>
>>>>
>>
>>
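
As a rough illustration of why "hash(key) % partition_count" tends to leave partitions empty when there are only a few keys, the sketch below computes two numbers for the case in this thread (8 keys, 6 partitions). It assumes an idealized hash that places each key uniformly and independently, which a real hash only approximates. The class and method names are made up for this example and are not part of Kafka.

```java
// Hypothetical helper (not part of Kafka). Assumes each key lands on a
// partition uniformly at random and independently of the other keys.
class PartitionOccupancy {

    /** Expected number of partitions that receive at least one key. */
    static double expectedOccupied(int partitions, int keys) {
        return partitions * (1.0 - Math.pow(1.0 - 1.0 / partitions, keys));
    }

    /** Probability that every partition receives at least one key (inclusion-exclusion). */
    static double probabilityAllUsed(int partitions, int keys) {
        double sum = 0.0;
        double binomial = 1.0; // C(partitions, j), starting at j = 0
        for (int j = 0; j <= partitions; j++) {
            // Probability that a fixed set of j partitions stays empty
            double term = binomial * Math.pow((double) (partitions - j) / partitions, keys);
            sum += (j % 2 == 0) ? term : -term;
            binomial = binomial * (partitions - j) / (j + 1); // advance to C(partitions, j + 1)
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.printf("expected partitions used: %.2f%n", expectedOccupied(6, 8));
        System.out.printf("P(all 6 partitions used): %.3f%n", probabilityAllUsed(6, 8));
    }
}
```

Under that assumption, 8 keys occupy about 4.6 of the 6 partitions on average, and all 6 partitions are used only about 11% of the time. Seeing just 3 partitions in use is on the unlucky side, but some empty partitions are to be expected with so few keys.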

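One possible shape for the "key-1 -> partition-1, key-2 -> partition-2" idea, assuming the full set of keys is known in advance, is a fixed lookup table built from the sorted key list. This is only a sketch of the assignment logic in plain Java: FixedKeyAssignment and partitionFor are hypothetical names, and in a real producer this logic would sit inside the "partition" method of a class implementing the Partitioner interface, configured through partitioner.class.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper (not part of Kafka): spreads a known set of keys
// evenly over the partitions and always returns the same partition for a key.
class FixedKeyAssignment {
    private final Map<String, Integer> partitionByKey = new HashMap<>();
    private final int numPartitions;

    FixedKeyAssignment(Collection<String> knownKeys, int numPartitions) {
        this.numPartitions = numPartitions;
        int index = 0;
        // TreeSet sorts and de-duplicates, so every producer that is given
        // the same keys builds the same table regardless of input order.
        for (String key : new TreeSet<>(knownKeys)) {
            partitionByKey.put(key, index % numPartitions);
            index++;
        }
    }

    int partitionFor(String key) {
        Integer assigned = partitionByKey.get(key);
        if (assigned != null) {
            return assigned;
        }
        // Unknown key: fall back to a plain hash (an assumption of this sketch).
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

With 8 known keys and 6 partitions, this puts one key on each of four partitions and two keys on each of the other two. Key-level ordering holds because a key always maps to the same partition, but only while the key list and the partition count stay unchanged and every producer uses the same list. Changing either one remaps keys, just as changing the partition count does with the default partitioner.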