Team,
*Use case:*
*IMAP*. I have an application in which an org has users who use IMAP
to send mail, and the mail contents are produced to Kafka.
The scaling factors are:
1. The number of orgs can grow from 1 to a million.
2. The number of users can grow from 1 to a million.
For this use case, I need to calculate the producer rate and the broker
response rate for a single machine.
So far we have identified the following factors that affect the
producer rate:
1. Message size
2. Request size
3. Request rate overhead
4. Request latency
5. Round Trip Time
6. Number of Sender Threads
7. Number of Processor Threads at Broker
8. Replication factor
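As a starting point, some of the factors above can be combined into a rough first-order bound. The sketch below is only an assumption about how they relate, not a validated model: it treats one producer connection as limited either by request latency (in-flight requests times request size, divided by latency) or by the link bandwidth, whichever is smaller. The function name and all parameter names are hypothetical, and it ignores replication, compression, protocol overhead and broker-side threads.

```python
def producer_rate_upper_bound(
    message_size_bytes: float,
    request_size_bytes: float,
    request_latency_s: float,
    in_flight_requests: int,
    link_bandwidth_bytes_per_s: float,
) -> float:
    """Rough upper bound on messages/second for one producer connection.

    Assumption: throughput is the smaller of a latency-bound term and the
    raw link bandwidth. Everything else (replication, headers, CPU) is ignored.
    """
    positive = (
        message_size_bytes,
        request_size_bytes,
        request_latency_s,
        link_bandwidth_bytes_per_s,
    )
    if min(positive) <= 0 or in_flight_requests < 1:
        raise ValueError("sizes, latency and bandwidth must be > 0; in-flight >= 1")

    # Latency-bound term: with N requests outstanding, each taking
    # request_latency_s to be acknowledged, at most N * request_size bytes
    # complete per latency interval.
    latency_bound = in_flight_requests * request_size_bytes / request_latency_s

    # The connection cannot exceed the link bandwidth either.
    bytes_per_s = min(latency_bound, link_bandwidth_bytes_per_s)

    return bytes_per_s / message_size_bytes


if __name__ == "__main__":
    # Illustrative numbers only: 1 KiB messages, 16 KiB requests,
    # 2 ms request latency, 5 requests in flight, 1 Gbit/s link.
    rate = producer_rate_upper_bound(1024, 16384, 0.002, 5, 125_000_000)
    print(f"upper bound: {rate:.0f} messages/s")
```

Measured numbers will normally be lower than this bound; it is meant only to show which term dominates for a given set of inputs.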
Variables identified at the network layer, kernel, and NIC:
1. sysctl_wmem
2. Tx queues
3. Ring Buffer
4. Driver Queue
5. NAPI Polling
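Of these, the socket send buffer is the one that is easy to observe from user space. The sketch below is a minimal illustration, assuming a plain TCP socket; the helper name `send_buffer_size` is hypothetical. It reads SO_SNDBUF, which is the per-socket value that the kernel's wmem settings govern. On Linux the kernel doubles a requested value for bookkeeping and caps it at net.core.wmem_max, so the value read back can differ from the value requested.

```python
import socket
from typing import Optional


def send_buffer_size(requested_bytes: Optional[int] = None) -> int:
    """Return the SO_SNDBUF size the kernel reports for a fresh TCP socket.

    If requested_bytes is given, ask the kernel for that size first; the
    kernel may adjust or cap the request.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if requested_bytes is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    print("default SO_SNDBUF:", send_buffer_size())
    print("after requesting 128 KiB:", send_buffer_size(128 * 1024))
```

The Kafka producer exposes the same knob through its `send.buffer.bytes` configuration. The Tx queues, ring buffer and driver queue sit below the socket layer and are not visible through this call.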
Observations made so far:
1. SocketChannel is the entry point for sending data at the
application level.
2. The sendfile() system call is used to transfer the data.
*Questions:*
1. How is data transferred from SocketChannel to the NIC? That is, what
is the data flow in terms of the network (protocol) layer, the kernel,
the network device drivers, and the NIC?
2. Since each KafkaProducer instance creates a SocketChannel, what is
the maximum number of producer instances a machine can have while still
utilising the network efficiently?
3. In addition to the variables listed above:
   1. Which variables are involved in sending data in the network
   layer?
   2. Which variables are involved in sending data in the kernel?
   3. Which variables are involved in sending data to the NIC?
4. How can the producer rate be expressed in terms of the variables
identified in each layer?
5. *For a given machine's hardware, how can the producer rate be
expressed precisely in a single formula covering both the hardware and
software levels?*
Could anyone please help me identify the variables and incorporate them
into a single formula that expresses the producer rate for a machine in
terms of producer instances?
Thanks in advance.
PS: I have already come across the following documents:
- https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/
- https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
- https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
Regards,
Girija A.