Re: Newbie Question

Thanks Hans - this makes sense, except that the debug messages give me
exactly what I need without having to instrument any clients. It should be
noted that for now I am running a single server, so perhaps the messages
change when I cluster?
I may have caused confusion by mentioning that I want to know where the
messages go - that is not quite precise from an individual message
perspective, but it is close enough for what I want to achieve (for now ;-)
). I just want a high-level record of each IP address and which topic (or
something that can be traced back to a topic) it is connected to, without
having to instrument the clients (there can be upwards of 10,000 of them,
and I have no control over or access to them).
Currently, as I mentioned, the debug messages have exactly what I need for
this phase:
[2020-03-28 20:32:23,901] DEBUG Principal = User:ANONYMOUS is Allowed
Operation = Read from host = x.x.x.x on resource = Topic:LITERAL:xxxx
(kafka.authorizer.logger)
I just figure there must be a better way of getting this info than
turning on debug.
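
As a stopgap while debug stays on, the authorizer lines above could be boiled down to one record per (host, operation, topic) instead of reading the raw stream. What follows is only a minimal sketch: it assumes each entry sits on a single line in the real log file (the sample above is wrapped by the mail client), that the line layout matches that sample, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the single debug line quoted above; any other
# layout is NOT handled (assumption).
AUTHZ_LINE = re.compile(
    r"Principal = (?P<principal>\S+) is (?P<decision>Allowed|Denied) "
    r"Operation = (?P<operation>\w+) from host = (?P<host>\S+) "
    r"on resource = Topic:(?P<pattern>\w+):(?P<topic>\S+)"
)


def summarize_authorizer_log(lines):
    """Count allowed topic operations per (host, operation, topic).

    `lines` is any iterable of log lines, e.g. an open file object.
    Lines that do not match the pattern, or that were denied, are skipped.
    """
    seen = Counter()
    for line in lines:
        match = AUTHZ_LINE.search(line)
        if match and match["decision"] == "Allowed":
            seen[(match["host"], match["operation"], match["topic"])] += 1
    return seen
```

Called on an open log file, for example `summarize_authorizer_log(open(path))` with `path` pointing at wherever the authorizer log is written, this returns a counter whose keys are the distinct host/operation/topic combinations, which is much smaller than the two-lines-per-second-per-client stream.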

On Sat, Mar 28, 2020 at 4:15 PM Hans Jespersen <hans@confluent.io> wrote:

> I can tell from the terminology you use that you are familiar with
> traditional message queue products. Kafka is very different. That's what
> makes it so interesting and revolutionary in my opinion.
>
> Clients do not connect to topics because Kafka is a distributed and
> clustered system where topics are sharded into pieces called partitions and
> the topic partitions are spread out across all the Kafka brokers in the
> cluster (and also replicated several more times across the cluster for
> fault tolerance). When a client logically connects to a topic, it's actually
> making many connections to many nodes in the Kafka cluster, which enables
> both parallel processing and fault tolerance.
>
> Also, when a client consumes a message, the message is not removed from a
> queue; it remains in Kafka for many days (sometimes months or years). It is
> not "taken off the queue"; rather, it is "copied from the commit log". It
> can be consumed again and again if needed because it is an immutable record
> of an event that happened.
>
> Now getting back to your question of how to see where messages get
> consumed (copied). The reality is that they go many places and can be
> consumed many times. This makes tracing and tracking message delivery more
> difficult but not impossible. There are many tools, both open source and
> commercial, that can track data from producer to Kafka (with replication) to
> multiple consumers. They typically involve taking telemetry from both
> clients (producers and consumers) and brokers (all of them, as they act as a
> cluster) and aggregating all the data to see the full flow of messages in
> the system. That's why the logs may seem overwhelming and you need to look
> at the logs of all the brokers (and perhaps all the clients as well) to get
> the full picture.
>
> -hans
>
> > On Mar 28, 2020, at 4:50 PM, Colin Ross <rossi141@gmail.com> wrote:
> >
> > Hi All - just started to use Kafka. Just one thing driving me nuts. I want
> > to get logs of each time a publisher or subscriber connects. I am trying to
> > just get the IP that they connected from and the topic to which they
> > connected. I have managed to do this through enabling debug in the
> > kafka-authorizer; however, the number of log entries is overwhelming, as
> > is the update rate (looks like 2 per second per client).
> >
> > What I am actually trying to achieve is to understand where messages go, so
> > I would be more than happy to just see notifications when messages are
> > actually sent and actually taken off the queue.
> >
> > Is there a more efficient way of achieving my goal than turning on debug?
> >
> > Cheers
> > Rossi
>
