Distributed systems and microservices are all the rage these days, and Apache Kafka seems to be getting most of that attention.
Here at Server Density we use it as part of our payload processing (see: Tech chat: processing billions of events a day with Kafka, Zookeeper and Storm). Last week we looked at How to Monitor Zookeeper. Today we’ll look at how to monitor Kafka.
Introduction and terminology
For the uninitiated, Kafka is a Scala project—originally developed by LinkedIn—that provides publish-subscribe messaging between distributed nodes.
Kafka is fast: a single node can handle hundreds of reads and writes from thousands of clients in real time. Kafka is distributed and scalable: nodes can be added and removed elastically, without any downtime. Data streams are split into partitions and spread over different brokers for scalability and redundancy.
Before we delve deeper, here’s some useful terminology:
Topic: a feed of messages or packages
Partition: a subdivision of a topic; topics are split into partitions for scalability and redundancy
Producer: process that introduces messages into the queue
Consumer: process that subscribes to various topics and processes the resulting feed of published messages
Broker: a node that is part of the Kafka cluster
Here is a diagram of a Kafka cluster alongside the required Zookeeper ensemble: 3 Kafka brokers plus 3 Zookeeper servers (2n+1 redundancy), with 6 producers writing to 2 partitions for redundancy.
With that in mind, here is our very own checklist of best practices, including key Kafka metrics and alerts we monitor with Server Density.
Monitor Kafka: Metrics and Alerts
Once again, our general rule of thumb is “collect all possible/reasonable metrics that can help when troubleshooting, alert only on those that require an action from you”.
Kafka process is running
| Metric | Comments | Suggested Alert |
| --- | --- | --- |
| Kafka process | Is the right binary daemon process running? | When no process matches the regexp /usr/bin/java.*kafka\.Kafka.*$. |
You can also use:
$INSTALL_PREFIX/bin/kafka-server-start.sh config/server.properties
Or, if you run Kafka via supervisord (recommended), you can alert on the supervisord process instead.
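If you would rather roll your own check than rely on an agent, a minimal sketch in Python (assuming a standard ps is available on the host) could look like this:

```python
#!/usr/bin/env python
"""Minimal process check for a Kafka broker (illustrative sketch)."""
import re
import subprocess

# The broker shows up as a java process running the kafka.Kafka main class,
# mirroring the regexp suggested in the table above.
KAFKA_PATTERN = re.compile(r"java.*kafka\.Kafka")

def kafka_is_running():
    # List the full command line of every process on the host.
    output = subprocess.check_output(["ps", "-eo", "args"]).decode("utf-8", "replace")
    return any(KAFKA_PATTERN.search(line) for line in output.splitlines())

if __name__ == "__main__":
    if kafka_is_running():
        print("OK: Kafka broker process found")
    else:
        print("CRITICAL: no Kafka broker process found")
        raise SystemExit(2)
```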
System Metrics
| Metric | Meaning / Comments | Suggested Alert |
| --- | --- | --- |
| Memory usage | Kafka should run entirely in RAM. The JVM heap size shouldn’t be bigger than your available RAM, to avoid swapping. | None |
| Swap usage | Watch for swap usage, as it degrades Kafka’s performance and leads to operations timing out (set vm.swappiness = 0). | When used swap is > 128MB. |
| Network bandwidth | Kafka servers can incur high network usage. Keep an eye on this, especially if you notice any performance degradation. Also look out for dropped-packet errors. | None |
| Disk usage | Make sure you always have free space for new data, temporary files, snapshots and backups. | When disk usage is > 85%. |
| Disk IO | Kafka partitions are written asynchronously to a sequential write-ahead log. Thus, disk reads and writes in Kafka are sequential, with very few random seeks. | None |
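If your monitoring agent doesn’t already collect these, here is a rough Python sketch using the third-party psutil library. The 128MB swap and 85% disk thresholds mirror the table above, and the log directory path is an assumption you would adjust to your own log.dirs setting:

```python
#!/usr/bin/env python
"""Rough system checks for a Kafka host, using the psutil library (sketch)."""
import psutil

SWAP_LIMIT_BYTES = 128 * 1024 * 1024   # alert when used swap > 128MB
DISK_LIMIT_PERCENT = 85                # alert when the data disk is > 85% used
KAFKA_LOG_DIR = "/var/kafka-logs"      # assumption: point this at your log.dirs

def check():
    alerts = []
    swap = psutil.swap_memory()
    if swap.used > SWAP_LIMIT_BYTES:
        alerts.append("swap in use: %d MB" % (swap.used // (1024 * 1024)))
    disk = psutil.disk_usage(KAFKA_LOG_DIR)
    if disk.percent > DISK_LIMIT_PERCENT:
        alerts.append("data disk %.0f%% full" % disk.percent)
    return alerts

if __name__ == "__main__":
    problems = check()
    if problems:
        print("CRITICAL: " + "; ".join(problems))
    else:
        print("OK")
```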
Here is how Server Density graphs some Kafka metrics:
Kafka Metrics
| Metric | Meaning / Comments | Suggested Alert |
| --- | --- | --- |
| UnderReplicatedPartitions | kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions. Number of under-replicated partitions. | When UnderReplicatedPartitions > 0. |
| OfflinePartitionsCount | kafka.controller:type=KafkaController,name=OfflinePartitionsCount. Number of partitions without an active leader, therefore neither readable nor writable. | When OfflinePartitionsCount > 0. |
| ActiveControllerCount | kafka.controller:type=KafkaController,name=ActiveControllerCount. Number of active controller brokers. | When ActiveControllerCount != 1. |
| MessagesInPerSec | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec. Incoming messages per second. | None |
| BytesInPerSec / BytesOutPerSec | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec and kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec. Incoming/outgoing bytes per second. | None |
| RequestsPerSec | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce\|FetchConsumer\|FetchFollower}. Number of requests per second. | None |
| TotalTimeMs | kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce\|FetchConsumer\|FetchFollower}. Total time taken to process a request. You can also monitor the split times QueueTimeMs, LocalTimeMs and RemoteTimeMs. | None |
| LeaderElectionRateAndTimeMs | kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs. Leader election rate and latency; non-zero when brokers fail or restart. | None |
| LogFlushRateAndTimeMs | kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs. Asynchronous disk log flush rate and time in ms. | None |
| UncleanLeaderElectionsPerSec | kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec. Unclean leader election rate. | When UncleanLeaderElectionsPerSec != 0. |
| PartitionCount | kafka.server:type=ReplicaManager,name=PartitionCount. Number of partitions on the broker. | When PartitionCount != your_num_partitions. |
| ISR shrink/expansion rate | kafka.server:type=ReplicaManager,name=IsrShrinksPerSec and kafka.server:type=ReplicaManager,name=IsrExpandsPerSec. When a broker goes down, the ISR shrinks for some of its partitions; when it comes back up and its replicas are fully caught up, the ISR expands again. | When IsrShrinksPerSec or IsrExpandsPerSec != 0. |
| NetworkProcessorAvgIdlePercent | kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent. Average fraction of time the network processor threads are idle. | When NetworkProcessorAvgIdlePercent < 0.3. |
| RequestHandlerAvgIdlePercent | kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent. Average fraction of time the request handler threads are idle. | When RequestHandlerAvgIdlePercent < 0.3. |
| Heap Memory Usage | Memory allocated dynamically by the Java process, Kafka in this case. | None |
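All of the MBeans above are exposed over JMX. One convenient way to scrape them is to attach a Jolokia agent to the broker JVM, which makes them readable over HTTP. Here is a minimal, illustrative Python sketch along those lines; the Jolokia port and the alert thresholds are assumptions:

```python
#!/usr/bin/env python
"""Read a few broker MBeans via a Jolokia agent attached to the Kafka JVM (sketch)."""
import requests

JOLOKIA = "http://localhost:8778/jolokia/read/"   # assumption: default Jolokia agent port

CHECKS = {
    # MBean name: (attribute to read, predicate that triggers an alert)
    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions":
        ("Value", lambda v: v > 0),
    "kafka.controller:type=KafkaController,name=ActiveControllerCount":
        ("Value", lambda v: v != 1),
    "kafka.controller:type=KafkaController,name=OfflinePartitionsCount":
        ("Value", lambda v: v > 0),
}

def read_mbean(mbean, attribute):
    # Jolokia exposes JMX reads as GET /jolokia/read/<mbean>/<attribute>.
    resp = requests.get(JOLOKIA + mbean + "/" + attribute, timeout=5)
    resp.raise_for_status()
    return resp.json()["value"]

if __name__ == "__main__":
    for mbean, (attribute, is_bad) in CHECKS.items():
        value = read_mbean(mbean, attribute)
        status = "ALERT" if is_bad(value) else "ok"
        print("%s %s = %s" % (status, mbean, value))
```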
Kafka Consumer Metrics
| Metric | Meaning / Comments | Suggested Alert |
| --- | --- | --- |
| MaxLag | kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+). Number of messages the consumer lags behind the producer. | When MaxLag > 50. |
| MinFetchRate | kafka.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=([-.\w]+). Minimum rate at which the consumer sends fetch requests to the broker; drops to 0 if the consumer is stalled or dead. | When MinFetchRate < 0.5. |
| MessagesPerSec | kafka.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=([-.\w]+). Messages consumed per second. | None |
| BytesPerSec | kafka.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=([-.\w]+). Bytes consumed per second. | None |
| KafkaCommitsPerSec | kafka.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=([-.\w]+). Rate at which the consumer commits offsets to Kafka; only applies when offsets.storage=kafka (requires Kafka >= 0.8.2). | None |
| OwnedPartitionsCount | kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=([-.\w]+),groupId=([-.\w]+). Number of partitions owned by this consumer. | When OwnedPartitionsCount != your_count (may need adjusting when you change your cluster). |
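If you would rather alert on lag without going through JMX, one option on 0.8.x is to parse the output of the ConsumerOffsetChecker tool that ships with Kafka. The sketch below is a rough illustration; the Kafka path, Zookeeper address, group name and threshold are placeholders:

```python
#!/usr/bin/env python
"""Rough consumer-lag check that parses the output of Kafka's bundled
ConsumerOffsetChecker tool (0.8.x)."""
import subprocess

KAFKA_HOME = "/opt/kafka"       # assumption: wherever Kafka is installed
ZOOKEEPER = "localhost:2181"    # assumption: your Zookeeper connect string
GROUP = "my-consumer-group"     # placeholder consumer group name
MAX_LAG = 50                    # mirrors the MaxLag alert threshold above

def total_lag():
    cmd = [
        KAFKA_HOME + "/bin/kafka-run-class.sh",
        "kafka.tools.ConsumerOffsetChecker",
        "--zookeeper", ZOOKEEPER,
        "--group", GROUP,
    ]
    lines = subprocess.check_output(cmd).decode("utf-8", "replace").splitlines()
    # Locate the header row and the "Lag" column rather than hard-coding positions.
    header_idx = next(i for i, line in enumerate(lines) if "Lag" in line.split())
    lag_col = lines[header_idx].split().index("Lag")
    rows = [line.split() for line in lines[header_idx + 1:] if line.strip()]
    return sum(int(row[lag_col]) for row in rows)

if __name__ == "__main__":
    lag = total_lag()
    print(("ALERT" if lag > MAX_LAG else "OK") + ": total lag = %d messages" % lag)
```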
Kafka Monitoring Tools
Any monitoring tool with JMX support should be able to monitor a Kafka cluster. Here are three monitoring tools outside that category that we liked:
The first one is check_kafka.pl from Hari Sekhon. It performs a complete end-to-end test, inserting a message into Kafka as a producer and then extracting it as a consumer. This helps when measuring service times.
Another handy tool is KafkaOffsetMonitor. It lets you monitor consumer offsets (their position in the queue). That information should help you understand how your queue grows and which consumer groups are lagging behind.
Last but not least, the LinkedIn folks have developed what we think is the smartest tool out there: Burrow. It analyzes consumer offsets and lag over a window of time and determines the consumer status. You can retrieve this status via its HTTP endpoint and plug it into your favourite monitoring tool (Server Density, for example).
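As an illustration, pulling a consumer group’s status out of Burrow and turning it into an alert could look roughly like this; the endpoint follows Burrow’s v2 HTTP API, and the host, cluster and group names are placeholders:

```python
#!/usr/bin/env python
"""Pull a consumer group's status from Burrow's HTTP endpoint (illustrative sketch)."""
import requests

BURROW = "http://localhost:8000"     # assumption: Burrow's HTTP port from its sample config
CLUSTER = "local"                    # placeholder cluster name from the Burrow config
GROUP = "my-consumer-group"          # placeholder consumer group

def consumer_status():
    url = "%s/v2/kafka/%s/consumer/%s/status" % (BURROW, CLUSTER, GROUP)
    resp = requests.get(url, timeout=5)
    resp.raise_for_status()
    # Burrow reports an overall status such as "OK", "WARN" or "ERR" for the group.
    return resp.json()["status"]["status"]

if __name__ == "__main__":
    status = consumer_status()
    print(("OK" if status == "OK" else "ALERT") + ": Burrow reports %s for %s" % (status, GROUP))
```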
Oh, and we’d be remiss if we didn’t mention Yahoo’s Kafka-Manager. This one includes some basic monitoring but is more of a management tool.
Further reading
Did this article pique your interest in Kafka? Nice, keep reading.
The LinkedIn engineering team has written a lot about how they use Kafka. That said, Kafka might not be for everyone: the Auth0 Webtasks team switched over to a much simpler solution based on ZeroMQ, and they wrote about it here.
So what about you? Do you have a checklist or any best practices for monitoring Kafka? What systems do you have in place and how do you monitor them? Any interesting reads to suggest?