Kafka Quick Start (7) - Kafka Monitoring

I. Kafka Monitoring Indicators

1. Kafka host monitoring indicators

Host monitoring tracks the performance of the machines on which the Kafka cluster Brokers run. Common host monitoring metrics include:
(1) Machine load
(2) CPU utilization
(3) Memory usage, including free memory and used memory
(4) Disk I/O utilization, including read and write utilization
(5) Network I/O utilization
(6) Number of TCP connections
(7) Number of open files
(8) Inode usage
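
Several of the host indicators listed above can be sampled with the Python standard library alone. The following is a minimal sketch, assuming a Linux host; the names `host_snapshot` and `count_open_fds` and the returned keys are hypothetical and chosen only for illustration.

```python
import os
import shutil


def count_open_fds(pid):
    """Return the number of open file descriptors of a process, or None.

    Reads /proc/<pid>/fd, which exists only on Linux and may be unreadable
    for processes owned by another user.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def host_snapshot(path="/", pid=None):
    """Sample load, disk, inode and open-file indicators for one host."""
    load_1m, load_5m, load_15m = os.getloadavg()   # machine load averages
    disk = shutil.disk_usage(path)                  # total/used/free in bytes
    fs = os.statvfs(path)                           # filesystem inode counters
    return {
        "load_1m": load_1m,
        "load_5m": load_5m,
        "load_15m": load_15m,
        "disk_used_ratio": disk.used / disk.total if disk.total else 0.0,
        # Some filesystems report zero inodes; treat that as "no usage".
        "inode_used_ratio": 1 - fs.f_ffree / fs.f_files if fs.f_files else 0.0,
        "open_fds": count_open_fds(pid if pid is not None else os.getpid()),
    }


if __name__ == "__main__":
    for key, value in host_snapshot().items():
        print(f"{key}: {value}")
```

To watch a Broker rather than the script itself, pass the Broker's process ID as `pid`. CPU utilization, network I/O, and TCP connection counts are not covered here and would need additional sources such as /proc or a monitoring agent.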

2. JVM Monitoring Indicators

The Kafka Broker process is an ordinary Java process, so any JVM monitoring method can be used to monitor it.
(1) The frequency and duration of Full GC, which are used to assess the impact of Full GC on the Broker process. Long pauses cause the Broker to throw various timeout exceptions.
(2) The size of live objects, which is an important basis for setting the heap size and helps fine-tune the size of each JVM generation.
(3) The total number of application threads, which shows how the Broker process uses the CPUs.
For example, consider the following line from a Broker GC log:
2019-07-30T09:13:03.809+0800: 552.982: [GC cleanup 827M->645M(1024M), 0.0019078 secs]
The Broker JVM process uses the G1 collector by default. When the cleanup step in this line ends, the live objects on the heap have shrunk from 827 MB to 645 MB. G1 has been the default collector since Kafka 0.9.0.0, and Full GC in G1 is executed by a single thread and is very slow. The Broker GC logs, that is, the files whose names begin with kafkaServer-gc.log, therefore need to be monitored. If the Broker process runs Full GC frequently, turn on the G1 switch -XX:+PrintAdaptiveSizePolicy so that the JVM reports what triggered the Full GC.
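
To track heap occupancy and Full GC frequency over time, GC log lines of this shape can be parsed with a regular expression. The sketch below is illustrative only: it assumes the JDK 8 style log format shown in the sample line, and the function name `parse_gc_line` and the returned keys are hypothetical.

```python
import re

# Matches entries such as "[GC cleanup 827M->645M(1024M), 0.0019078 secs]".
GC_ENTRY = re.compile(
    r"\[(?P<event>[^\[\]]*?)\s+"
    r"(?P<before>\d+)(?P<before_unit>[KMG])->"
    r"(?P<after>\d+)(?P<after_unit>[KMG])"
    r"\((?P<heap>\d+)(?P<heap_unit>[KMG])\),\s*"
    r"(?P<secs>[\d.]+) secs\]"
)

TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_gc_line(line):
    """Return heap sizes (in MB) and pause time for one GC log line, or None."""
    match = GC_ENTRY.search(line)
    if match is None:
        return None
    event = match.group("event")
    return {
        "event": event,
        "is_full_gc": event.startswith("Full GC"),
        "before_mb": float(int(match.group("before")) * TO_MB[match.group("before_unit")]),
        "after_mb": float(int(match.group("after")) * TO_MB[match.group("after_unit")]),
        "heap_mb": float(int(match.group("heap")) * TO_MB[match.group("heap_unit")]),
        "pause_s": float(match.group("secs")),
    }


if __name__ == "__main__":
    sample = ("2019-07-30T09:13:03.809+0800: 552.982: "
              "[GC cleanup 827M->645M(1024M), 0.0019078 secs]")
    print(parse_gc_line(sample))
```

Counting the entries with `is_full_gc` set over a time window gives a rough Full GC frequency. Lines in other formats, such as the unified logging of newer JDKs, return None and would need a different pattern.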

3. Cluster Monitoring Indicators

(1) Check whether the Broker process has started and whether its port is listening. In a containerized Kafka environment where Docker starts the Kafka Broker, the container may start successfully while a misconfigured network leaves the process running without the port actually listening.
(2) Check the critical Broker-side logs: the server log server.log, the controller log controller.log, and the topic partition state change log state-change.log.
(3) Check the running state of the key Broker-side threads. The Kafka Broker process starts dozens of threads. In a real production environment, two kinds deserve attention: the log compaction threads, whose names start with kafka-log-cleaner-thread and which perform log compaction, and the replica fetch threads, whose names usually start with ReplicaFetcherThread and which carry out the logic by which Follower replicas pull messages from the Leader replica.
(4) Check the key Broker-side JMX metrics.
BytesIn/BytesOut: the number of bytes flowing into and out of the Broker per second. If the value approaches the network bandwidth, packet loss becomes likely.
NetworkProcessorAvgIdlePercent: the average idle percentage of the threads in the network thread pool, which should normally stay above 30%. A value below 30% indicates that the network thread pool is very busy, and the load must be reduced by increasing the number of network threads or moving load to other servers.
RequestHandlerAvgIdlePercent: the average idle percentage of the threads in the I/O thread pool. If the value stays below 30% for a long time, increase the number of I/O threads or reduce the load on the Broker.
UnderReplicatedPartitions: the number of under-replicated partitions, that is, partitions in which not every Follower replica is in sync with the Leader replica.
ISRShrink/ISRExpand: how often the ISR shrinks and expands. If replicas frequently enter and leave the ISR in production, these values will be high. The cause must be diagnosed and appropriate action taken.
ActiveControllerCount: the number of currently active controllers. Normally the value is 1 on the Broker that hosts the Controller and 0 on every other Broker. If the value is 1 on more than one Broker, the Kafka cluster has a split brain and must be dealt with as soon as possible, starting with a check of network connectivity. Split brain is a very serious distributed-systems problem. Kafka currently relies on ZooKeeper to prevent it, and once it occurs Kafka cannot guarantee normal operation.
(5) Monitor the Kafka clients. Start with the round-trip time (RTT) between the client machine and the Kafka Broker machine. For producers, the thread whose name starts with kafka-producer-network-thread is responsible for actually sending messages. If it dies, the Producer stops working, but the Producer process itself does not exit automatically. For consumers, the heartbeat thread whose name starts with kafka-coordinator-heartbeat-thread is closely tied to Rebalance.
From the Producer's perspective, the JMX metric to watch is request-latency, the latency of produce requests, which most directly reflects the TPS of the Producer program. From the Consumer's perspective, records-lag and records-lead are the two important JMX metrics. If you use a Consumer Group, also watch the join rate and sync rate metrics, which show how frequently Rebalance occurs.
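
The Broker-side rules above (idle percentages under 30%, under-replicated partitions, and more than one active controller) can be expressed as a small check over already-collected metric values. The following is a minimal sketch: the function name `check_brokers` and the input layout are hypothetical, and it assumes the idle metrics arrive as fractions between 0 and 1.

```python
IDLE_METRICS = ("NetworkProcessorAvgIdlePercent", "RequestHandlerAvgIdlePercent")
IDLE_THRESHOLD = 0.30  # the 30% rule of thumb from the text


def check_brokers(snapshots):
    """Evaluate per-broker metric snapshots and return a list of alert strings.

    `snapshots` maps a broker name to a dict of metric name -> value.
    """
    alerts = []
    for broker, metrics in sorted(snapshots.items()):
        for name in IDLE_METRICS:
            value = metrics.get(name)
            if value is not None and value < IDLE_THRESHOLD:
                alerts.append(f"{broker}: {name} is {value:.0%}, below 30%")
        under_replicated = metrics.get("UnderReplicatedPartitions", 0)
        if under_replicated > 0:
            alerts.append(f"{broker}: {under_replicated} under-replicated partitions")
    # Exactly one broker in the cluster should report ActiveControllerCount = 1.
    controllers = sum(m.get("ActiveControllerCount", 0) for m in snapshots.values())
    if controllers != 1:
        alerts.append(f"cluster: {controllers} active controllers, expected exactly 1")
    return alerts


if __name__ == "__main__":
    sample = {
        "broker-1": {"NetworkProcessorAvgIdlePercent": 0.25,
                     "RequestHandlerAvgIdlePercent": 0.80,
                     "UnderReplicatedPartitions": 0,
                     "ActiveControllerCount": 1},
        "broker-2": {"NetworkProcessorAvgIdlePercent": 0.90,
                     "RequestHandlerAvgIdlePercent": 0.90,
                     "UnderReplicatedPartitions": 3,
                     "ActiveControllerCount": 1},
    }
    for alert in check_brokers(sample):
        print(alert)
```

A single low idle reading is not necessarily a problem, since the text asks about values that stay low for a long time. A real check would therefore look at a window of samples rather than one snapshot.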

II. JMX Monitoring Kafka

1. Introduction to JMX

JMX (Java Management Extensions) is used to manage and monitor running Java programs, including their threads, memory, log levels, service restarts, system environment, and more.

2. Enabling JMX in Kafka

The JMX port can be opened in the following ways:
(1) Set JMX_PORT when Kafka is started
export JMX_PORT=9999
kafka-server-start.sh -daemon config/server.properties
(2) Modify kafka-run-class.sh
Add the following line at the beginning of the kafka-run-class.sh file:
JMX_PORT=9999
Restart the Kafka cluster after modifying the kafka-run-class.sh file.
(3) Enable JMX for a Kafka Docker container service
In the docker-compose.yml file of the Kafka container service, add the KAFKA_JMX_OPTS and JMX_PORT environment variables.

KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=192.168.0.105 -Dcom.sun.management.jmxremote.rmi.port=9999"
JMX_PORT: 9999

Expose the corresponding JMX port to the outside world.

ports:
      - "9999:9999" # Expose Port Number
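
After the port is exposed, a quick TCP connection attempt can confirm that something is listening on it. The sketch below is only a reachability check under that assumption; the function name `jmx_port_open` is hypothetical, and a successful connection does not prove that the JMX/RMI handshake itself will succeed.

```python
import socket


def jmx_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the example configuration in this section.
    print(jmx_port_open("192.168.0.105", 9999))
```

The same check applies to the Broker's own listener port when the process has started but the port may not be listening.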

3. JMX_PORT Occupancy Problem

To monitor Broker and Topic data, Kafka needs JMX_PORT to be set, and the variable is usually defined inside the kafka-run-class.sh script. Once JMX_PORT is defined there, however, running the script tools in the bin directory fails. The reason is that kafka-run-class.sh is the script that the other scripts call, so every tool started through it makes Java bind JMX_PORT again, and the port is reported as already in use.

The solution is to specify JMX_PORT only when starting Kafka.
(1) If supervisor starts Kafka, add environment=JMX_PORT=9999 to the supervisor service startup configuration file.
(2) If the kafka-server-start.sh script starts Kafka, export JMX_PORT=9999 at startup or set it inside the kafka-server-start.sh script.
(3) Modify the kafka-run-class.sh script
Modify the bin/kafka-run-class.sh file under the Kafka installation directory:

III. Kafka Monitoring Tools

1. JmxTool

JmxTool is a tool provided by the Kafka community for viewing Kafka JMX metrics in real time.
kafka-run-class.sh kafka.tools.JmxTool
--attributes: the JMX attribute names to query, as a comma-separated list.
--date-format: the date format used for the time field in the output.
--jmx-url: the JMX URL to connect to. The default format is service:jmx:rmi:///jndi/rmi://:<JMX port>/jmxrmi.
--object-name: the JMX MBean name to query.
--reporting-interval: the interval between queries in milliseconds, defaulting to 2000 (2 s).
The following command queries the Broker's inbound bytes per second (BytesInPerSec), averaged over the past minute, once per second:
kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --attributes OneMinuteRate --reporting-interval 1000
The following command views the ActiveControllerCount JMX metric:
kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.controller:type=KafkaController,name=ActiveControllerCount --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --reporting-interval 1000
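
When the same query has to be run against several Brokers or MBeans, the JmxTool command line can be assembled programmatically. The following is a minimal sketch that uses only the options listed above; the function name `jmxtool_command` is hypothetical, and it assumes kafka-run-class.sh is on the PATH.

```python
import shlex


def jmxtool_command(object_name, host="", port=9999, attributes=None,
                    interval_ms=1000):
    """Build the argument list for a JmxTool query against one MBean."""
    jmx_url = f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"
    command = [
        "kafka-run-class.sh", "kafka.tools.JmxTool",
        "--object-name", object_name,
        "--jmx-url", jmx_url,
        "--reporting-interval", str(interval_ms),
    ]
    if attributes:
        # JmxTool expects attribute names as one comma-separated value.
        command += ["--attributes", ",".join(attributes)]
    return command


if __name__ == "__main__":
    cmd = jmxtool_command(
        "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
        attributes=["OneMinuteRate"],
    )
    # shlex.join (Python 3.8+) quotes the arguments for copy-pasting into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the commas and equals signs in MBean names.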

2. Kafka Manager

Kafka Manager is an open source Kafka monitoring framework that Yahoo released in 2015. It is written in Scala and is used mainly to manage and monitor Kafka clusters.
Kafka Manager has since been renamed CMAK (Cluster Manager for Apache Kafka).
GitHub address:
https://github.com/yahoo/CMAK
Kafka Manager Docker image: kafkamanager/kafka-manager
To enable basic authentication for Kafka Manager, set the following environment variables:

KAFKA_MANAGER_AUTH_ENABLED: "true"
KAFKA_MANAGER_USERNAME: username
KAFKA_MANAGER_PASSWORD: password

The docker-compose.yml file for deploying the Kafka Manager service is as follows:

# Define the kafka-manager service
kafka-manager-test:
  image: kafkamanager/kafka-manager # kafka-manager image
  restart: always
  container_name: kafka-manager-test
  hostname: kafka-manager-test
  ports:
    - "9000:9000"  # Expose the port for web access
  depends_on:
    - kafka-test # depends on the Kafka service
  environment:
    ZK_HOSTS: zookeeper-test:2181 # ZooKeeper host and port
    KAFKA_BROKERS: kafka-test:9090 # Kafka broker address
    KAFKA_MANAGER_AUTH_ENABLED: "true"
    KAFKA_MANAGER_USERNAME: admin
    KAFKA_MANAGER_PASSWORD: password

Start the Kafka Manager service and log in to the Kafka Manager web UI.
Web address: http://127.0.0.1:9000

In Kafka Manager, add the Kafka cluster so that its Broker nodes can be managed:

3. JMXTrans + InfluxDB + Grafana

A common choice of monitoring framework is the JMXTrans + InfluxDB + Grafana combination. Because this stack can collect and display JMX metrics, it is easy to integrate the various JMX metrics that Kafka exposes. Companies that already use JMXTrans + InfluxDB + Grafana can reuse their existing monitoring framework directly, which greatly reduces maintenance costs.

4. Confluent Control Center

Control Center monitors the Kafka cluster in real time and also helps operate and build real-time streaming applications based on Kafka. Control Center is not free and requires the enterprise edition of Confluent Platform.

5. jconsole

jconsole (Java Monitoring and Management Console) is a JMX-based visual monitoring and management tool that provides an overview and monitoring of memory, threads, classes, the VM summary, and MBeans.
Run jconsole in a Linux terminal and, in the Remote Process field of the window that opens, enter service:jmx:rmi:///jndi/rmi://192.168.0.105:9999/jmxrmi or 192.168.0.105:9999.

Select the MBeans tab.

IV. JMXTrans

1. Introduction to JMXTrans

JMXTrans is a collector that gathers data from Java applications via JMX. Data can be collected as long as the Java application opens a JMX port.
JMXTrans runs as a background daemon and collects data once per minute.
GitHub address: https://github.com/jmxtrans/jmxtrans
Download the JMXTrans Docker image:
docker pull jmxtrans/jmxtrans

2. JMXTrans Configuration File

By default, JMXTrans reads all data source configuration files (JSON files) in the /var/lib/jmxtrans directory, retrieves data from the data sources in real time, parses the data, and stores it in InfluxDB.
The JMXTrans configuration JSON file is as follows:

{
   "servers": [{
      "port": "9901",
      "host": "192.168.0.105",
      "queries": [{
         "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
         "attr": ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"],
         "resultAlias": "kafkaServer",
         "outputWriters": [{
            "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
            "url": "http://192.168.0.105:8086/",
            "username": "admin",
            "password": "123456",
            "database": "jmx",
            "tags": {
               "application": "kafka_server"
            }
         }]
      }]
   }]
}
servers: Array, the data source configuration.
port: String, the JMX port from which data is collected.
host: String, the IP address of the JMX host from which data is collected.
queries: Array, the monitoring metrics to collect, listed as JSON objects. The available metrics can be found with the jconsole tool that ships with the JDK.
obj: String, the name of the monitoring metric.
attr: Array, the attributes to store. Each one becomes a field name in the target table.
resultAlias: String, the table name in InfluxDB.
outputWriters: Array, the data destinations.
@class: String, the class of the data destination.
url: String, the URL of the data destination (InfluxDB).
username: String, the InfluxDB login name.
password: String, the InfluxDB login password.
database: String, the InfluxDB database name (must be created in advance).
tags: JSON object, used to avoid name collisions between the corresponding fields in the InfluxDB table.
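
Because a missing key in a hand-edited file is easy to overlook, the structure described above can be checked before the file is handed to JMXTrans. The sketch below is illustrative only: the function name `validate_jmxtrans_config` is hypothetical, and treating exactly these keys as required is an assumption drawn from the example file, not from the JMXTrans documentation.

```python
import json

REQUIRED_SERVER_KEYS = ("host", "port", "queries")
REQUIRED_QUERY_KEYS = ("obj", "attr", "outputWriters")


def validate_jmxtrans_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    servers = config.get("servers")
    if not isinstance(servers, list) or not servers:
        return ["'servers' must be a non-empty array"]
    problems = []
    for s_idx, server in enumerate(servers):
        for key in REQUIRED_SERVER_KEYS:
            if key not in server:
                problems.append(f"servers[{s_idx}]: missing '{key}'")
        for q_idx, query in enumerate(server.get("queries", [])):
            where = f"servers[{s_idx}].queries[{q_idx}]"
            for key in REQUIRED_QUERY_KEYS:
                if key not in query:
                    problems.append(f"{where}: missing '{key}'")
            for writer in query.get("outputWriters", []):
                if "@class" not in writer:
                    problems.append(f"{where}: output writer missing '@class'")
    return problems


if __name__ == "__main__":
    text = '{"servers": [{"host": "192.168.0.105", "queries": [{"obj": "x"}]}]}'
    for problem in validate_jmxtrans_config(json.loads(text)):
        print(problem)
```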

3. Kafka JMX Monitoring Indicators

Kafka's JMX monitoring metrics can be looked up with jconsole.
For the BytesInPerSec metric, find BytesInPerSec on jconsole's MBeans tab.

The value of ObjectName is the value to use for "obj".
The attributes of the ObjectName are the values to use for "attr", and one or more can be selected.
"resultAlias" is the metric name and becomes the measurement name in InfluxDB.
"tags" correspond to InfluxDB tags and are used to distinguish different metrics stored in the same measurement.

{
   "obj": "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
   "attr": ["Count", "EventType", "RateUnit", "OneMinuteRate"],
   "resultAlias": "BytesInPerSec",
   "outputWriters": [{
      "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
      "url": "http://192.168.0.105:8086/",
      "username": "admin",
      "password": "123456",
      "database": "jmx",
      "tags": {
         "application": "BytesInPerSec"
      }
   }]
}
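
The mapping from a jconsole ObjectName to a query entry can also be written down as code. The following is a minimal sketch with hypothetical function names; it assumes simple ObjectNames without quoted values or wildcards, and it uses the MBean's name key as the measurement name, as the example above does.

```python
def parse_object_name(object_name):
    """Split 'domain:key=value,key=value' into (domain, {key: value})."""
    domain, _, properties = object_name.partition(":")
    keys = dict(item.split("=", 1) for item in properties.split(","))
    return domain, keys


def to_query(object_name, attributes):
    """Build a JMXTrans query entry (without outputWriters) for one MBean."""
    _, keys = parse_object_name(object_name)
    return {
        "obj": object_name,            # the ObjectName shown in jconsole
        "attr": list(attributes),      # the MBean attributes to store
        "resultAlias": keys["name"],   # becomes the InfluxDB measurement
    }


if __name__ == "__main__":
    print(to_query(
        "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
        ["Count", "OneMinuteRate"],
    ))
```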

For global monitoring, each metric corresponds to one InfluxDB measurement, and all Kafka nodes write the data for a given metric to the same measurement. For Topic monitoring, all Kafka nodes write the metrics of a given Topic to the same measurement, which is named after the Topic.

{
  "servers" : [ {
    "port" : "9999",
    "host" : "192.168.0.105",
    "queries" : [ {
      "obj" : "java.lang:type=Memory",
      "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
      "resultAlias":"jvmMemory",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
      "attr" : [ "Value" ],
      "resultAlias":"underReplicated",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.controller:type=KafkaController,name=ActiveControllerCount",
      "attr" : [ "Value" ],
      "resultAlias":"activeController",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "java.lang:type=OperatingSystem",
      "attr" : [ "FreePhysicalMemorySize","SystemCpuLoad","ProcessCpuLoad","SystemLoadAverage" ],
      "resultAlias":"jvmMemory",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    } ,{
      "obj" : "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent",
      "attr" : [ "Value" ],
      "resultAlias":"network",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"network",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "java.lang:type=GarbageCollector,name=G1 Young Generation",
      "attr" : [ "CollectionCount","CollectionTime" ],
      "resultAlias":"gc",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    }]
  } ]
}
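
Since every query in the file above repeats the same output writer, a configuration of this shape can be generated from a short list of metrics instead of being edited by hand. The sketch below is one possible way to do that; the function names `influx_writer` and `build_config` are hypothetical, while the connection values are the ones used in the example.

```python
import json

WRITER_CLASS = "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory"


def influx_writer(url, database, username, password, application):
    """Return the InfluxDB output writer block shared by all queries."""
    return {
        "@class": WRITER_CLASS,
        "url": url,
        "username": username,
        "password": password,
        "database": database,
        "tags": {"application": application},
    }


def build_config(host, port, metrics, writer):
    """Build a JMXTrans config from (obj, attr list, resultAlias) tuples."""
    queries = [
        {"obj": obj, "attr": list(attrs), "resultAlias": alias,
         "outputWriters": [writer]}
        for obj, attrs, alias in metrics
    ]
    return {"servers": [{"port": str(port), "host": host, "queries": queries}]}


if __name__ == "__main__":
    rates = ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"]
    metrics = [
        ("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", rates, "kafkaServer"),
        ("kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions", ["Value"], "underReplicated"),
    ]
    writer = influx_writer("http://192.168.0.105:8086/", "jmx", "admin", "123456", "kafka_server")
    print(json.dumps(build_config("192.168.0.105", 9999, metrics, writer), indent=2))
```

Adding a metric then means adding one tuple to the list, and a change to the InfluxDB address is made in a single place.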

4. JMXTrans Deployment

JMX connects over the network, so there are two ways to deploy JMXTrans:
(1) Centralized. Deploy JMXTrans on one server, connect it to all Kafka Broker instances, and write the data to InfluxDB. To reduce network traffic, InfluxDB is usually deployed on the same server as JMXTrans.
(2) Distributed. Deploy one JMXTrans instance alongside each Kafka Broker instance.
JMXTrans configuration files are divided into global metrics (per Kafka node) and Topic metrics. Global metrics use one configuration file per node, named kafka-brokerxx.json. Topic metrics use one configuration file per Topic, named TopicName.json.
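
The file layout described above can be produced by a short script. The following is a sketch under stated assumptions: the "xx" in kafka-brokerxx.json is taken to be a zero-padded Broker id, the function names are hypothetical, and the outputWriters blocks are omitted to keep the example short, so the generated files are not complete JMXTrans configurations.

```python
import json
import tempfile
from pathlib import Path


def broker_filename(broker_id):
    # Assumption: "xx" stands for a two-digit, zero-padded broker id.
    return f"kafka-broker{broker_id:02d}.json"


def topic_query(topic):
    """Per-Topic inbound bytes; the measurement is named after the Topic."""
    return {
        "obj": f"kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={topic}",
        "attr": ["OneMinuteRate"],
        "resultAlias": topic,
    }


def write_configs(out_dir, brokers, topics, jmx_port=9999):
    """Write one global file per Broker and one file per Topic; return names."""
    out = Path(out_dir)
    written = []
    global_query = {
        "obj": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
        "attr": ["Value"],
        "resultAlias": "underReplicated",
    }
    for broker_id, host in sorted(brokers.items()):
        config = {"servers": [{"host": host, "port": str(jmx_port),
                               "queries": [global_query]}]}
        path = out / broker_filename(broker_id)
        path.write_text(json.dumps(config, indent=2))
        written.append(path.name)
    for topic in topics:
        # Every Broker reports into the same Topic-named measurement.
        servers = [{"host": host, "port": str(jmx_port),
                    "queries": [topic_query(topic)]}
                   for _, host in sorted(brokers.items())]
        path = out / f"{topic}.json"
        path.write_text(json.dumps({"servers": servers}, indent=2))
        written.append(path.name)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_configs(tmp, {1: "192.168.0.105"}, ["orders"]))
```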

V. Example Kafka Monitoring Scheme

1. Kafka Monitoring Architecture Scheme Selection

A monitoring system architecture usually has three parts: data collection, analysis and conversion, and data display (visualization).
(1) Data collection
Data collection usually means writing a collection program and then using monitoring software such as Nagios or Zabbix to schedule it and report the collected data. For Java programs, JMXTrans can collect the data.
(2) Analysis and conversion
Kafka is a Java application that exposes comprehensive performance metrics. Histograms, counts, maximums, minimums, and standard deviations are already computed for the metrics, so no further analysis or processing is needed, and the MBeans data can be stored directly in InfluxDB.
(3) Data visualization
Grafana is an open source visualization dashboard that supports Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.

2. InfluxDB deployment

InfluxDB is an open source distributed database for time series, events, and metrics, written in Go. It has no external dependencies and is mainly used to store large volumes of timestamped data, such as DevOps monitoring data, application metrics, IoT sensor data, and real-time analytics data.
docker pull influxdb
The influxdb.yml file:

version: '2'
services:
  influxdb:
    image: influxdb
    container_name: influxdb
    volumes:
      - /data/influxdb/conf:/etc/influxdb
      - /data/influxdb/data:/var/lib/influxdb/data
      - /data/influxdb/meta:/var/lib/influxdb/meta
      - /data/influxdb/wal:/var/lib/influxdb/wal
    ports:
      - "8086:8086"
    restart: always

View the results:
docker exec -it influxdb influx
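
Besides the interactive shell, stored metrics can be read over HTTP. The sketch below assumes an InfluxDB 1.x server, whose `/query` endpoint accepts `db` and `q` parameters (plus `u` and `p` when authentication is enabled); the function names are hypothetical, and the measurement and field names are taken from the JMXTrans examples in this article.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def influx_query_url(base_url, database, influxql, username=None, password=None):
    """Build the URL for an InfluxDB 1.x /query request."""
    params = {"db": database, "q": influxql}
    if username is not None:
        params["u"] = username
        params["p"] = password or ""
    return urljoin(base_url, "query") + "?" + urlencode(params)


def run_query(base_url, database, influxql, timeout=5, **auth):
    """Send the query and return the decoded JSON response."""
    url = influx_query_url(base_url, database, influxql, **auth)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    query = 'SELECT last("OneMinuteRate") FROM "kafkaServer"'
    # Printing the URL needs no running server; run_query() does.
    print(influx_query_url("http://192.168.0.105:8086/", "jmx", query))
```

An empty result usually means that JMXTrans has not written to the measurement yet or that the database name differs from the one in the output writer.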

3. JMXTrans Deployment

JMXTrans is a collector that gathers data from Java applications via JMX. Data can be collected as long as the Java application opens a JMX port.
docker pull jmxtrans/jmxtrans
By default, JMXTrans reads all data source configuration files (JSON files) in the /var/lib/jmxtrans directory, retrieves data from the data sources in real time, parses the data, and stores it in InfluxDB.

version: '2'
services:
  # JMXTrans Service
  jmxtrans:
    image: jmxtrans/jmxtrans
    container_name: jmxtrans
    volumes:
      - ./jmxtrans:/var/lib/jmxtrans

4. Grafana Deployment

Grafana is a visualization dashboard with attractive charts and layouts, a full-featured metrics dashboard, and a graph editor. It supports Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.
The main features of Grafana are as follows:
(1) Visualization: fast, flexible client-side charts. Panel plug-ins offer many different ways to visualize metrics and logs, and the official library has a rich set of dashboard plug-ins, such as heat maps, line charts, and graphs.
(2) Data sources: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB, etc.
(3) Alerting: define alert rules visually for the most important metrics. Grafana evaluates them continuously and sends notifications through Slack, PagerDuty, and other channels when the data reaches a threshold.
(4) Mixed display: mix different data sources in the same chart. A data source can be specified per query, including custom data sources.
(5) Annotations: annotate charts with rich events from different data sources. Hovering over an event shows its full metadata and tags.
(6) Filters: ad hoc filters allow new key/value filters to be created dynamically, and they are applied automatically to all queries that use the data source.
GitHub address: https://github.com/grafana/grafana
Download the Grafana container image:
docker pull grafana/grafana:6.5.0
Start the Grafana container:
docker run -d --name=grafana -p 3000:3000 grafana/grafana:6.5.0
Web login: 192.168.0.105:3000

The first login uses admin/admin by default, after which Grafana prompts for a password change.
Add data sources:
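
As an alternative to the web form, the InfluxDB data source can be registered through Grafana's HTTP API (`POST /api/datasources`). The following is a sketch only: the function name `datasource_request` is hypothetical, the payload fields reflect the Grafana 6.x API as the editor understands it and should be checked against the installed version, and the name KafkaMonitor is an assumption chosen to match the dashboard template.

```python
import base64
import json
from urllib.request import Request


def datasource_request(grafana_url, admin_user, admin_password,
                       influx_url, database, influx_user, influx_password,
                       name="KafkaMonitor"):
    """Build (but do not send) the request that creates an InfluxDB data source."""
    payload = {
        "name": name,
        "type": "influxdb",
        "access": "proxy",          # Grafana's backend talks to InfluxDB
        "url": influx_url,
        "database": database,
        "user": influx_user,
        "secureJsonData": {"password": influx_password},
    }
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    request = datasource_request(
        "http://192.168.0.105:3000", "admin", "admin",
        "http://192.168.0.105:8086", "jmx", "admin", "123456",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode())
```

Passing the returned object to `urllib.request.urlopen` sends it. The script only prints the request so that it can be inspected without a running Grafana instance.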

Import the dashboard template:

The dashboard template JSON file is as follows:

{
  "__inputs": [
    {
      "name": "DS_KAFKAMONITOR",
      "label": "KafkaMonitor",
      "description": "",
      "type": "datasource",
      "pluginId": "influxdb",
      "pluginName": "InfluxDB"
    }
  ],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "6.7.3"
    },
    {
      "type": "panel",
      "id": "graph",
      "name": "Graph",
      "version": ""
    },
    {
      "type": "datasource",
      "id": "influxdb",
      "name": "InfluxDB",
      "version": "1.0.0"
    }
  ],
  "annotations": {
    "list": [
      {
        "$$hashKey": "object:318",
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "panels": [
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "java.lang:type=OperatingSystem",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 12,
        "w": 8,
        "x": 0,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 6,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "jvmMemory",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "ProcessCpuLoad"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "process CPU Usage"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka process CPU Usage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:1134",
          "format": "percentunit",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:1135",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "The server CPU Usage",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 12,
        "w": 8,
        "x": 8,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 2,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "jvmMemory",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "SystemCpuLoad"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "CPU Usage"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "CPU Usage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:369",
          "format": "percentunit",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:370",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "java.lang:type=OperatingSystem\nLinux System Load",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 12,
        "w": 8,
        "x": 16,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 4,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": true,
        "min": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "jvmMemory",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "SystemLoadAverage"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "System Load"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "System Load",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:656",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:657",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "Messages received per second by each Kafka broker, including the __consumer_offsets topic",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 12,
        "w": 8,
        "x": 0,
        "y": 12
      },
      "hiddenSeries": false,
      "id": 34,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            }
          ],
          "hide": false,
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "D",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "OneMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Messages per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=MessagesInPerSec"
            }
          ]
        },
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            }
          ],
          "hide": false,
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "OneMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "sum"
              },
              {
                "params": [
                  "All brokers, messages per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=MessagesInPerSec"
            }
          ]
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka messages in per second",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:2118",
          "format": "none",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:2119",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "java.lang:type=OperatingSystem\nFree physical memory on the server",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 12,
        "w": 8,
        "x": 8,
        "y": 12
      },
      "hiddenSeries": false,
      "id": 32,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "jvmMemory",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "FreePhysicalMemorySize"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Free physical memory"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Free physical memory",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:2324",
          "format": "decbytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:2325",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "cacheTimeout": null,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "kafka.controller:type=KafkaController,name=ActiveControllerCount\n\nNumber of active Kafka controllers. Exactly one broker per cluster should report 1; that broker is the cluster's controller.",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 12,
        "w": 8,
        "x": 16,
        "y": 12
      },
      "hiddenSeries": false,
      "id": 26,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pluginVersion": "6.7.3",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            }
          ],
          "measurement": "activeController",
          "orderByTime": "ASC",
          "policy": "default",
          "query": "SELECT sum(\"Value\") AS \"Get Number of Controllers\" FROM \"activeController\" WHERE $timeFilter GROUP BY time($__interval), \"hostname\"",
          "rawQuery": false,
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "Value"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Active controllers"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [],
          "tz": ""
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka active controller count",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:4446",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:4447",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec metric",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 0,
        "y": 24
      },
      "hiddenSeries": false,
      "id": 16,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "FiveMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              },
              {
                "params": [
                  "Bytes out per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=BytesOutPerSec"
            }
          ]
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka outbound traffic per second",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:77",
          "format": "decbytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:78",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec metric",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 8,
        "y": 24
      },
      "hiddenSeries": false,
      "id": 14,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "F",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "OneMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Bytes in per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=BytesInPerSec"
            }
          ]
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka inbound traffic per second",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:77",
          "format": "decbytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:78",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec and kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec metrics",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 16,
        "y": 24
      },
      "hiddenSeries": false,
      "id": 20,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "OneMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Fetch requests per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec"
            }
          ]
        },
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "D",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "MeanRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Produce requests per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec"
            }
          ]
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka produce and fetch requests per second",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:77",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:78",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "java.lang:type=Memory",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 13,
        "w": 8,
        "x": 0,
        "y": 33
      },
      "hiddenSeries": false,
      "id": 8,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "jvmMemory",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "E",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "HeapMemoryUsage_used"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Heap memory usage"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka heap memory usage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:1850",
          "format": "decbytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:1851",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "java.lang:type=Memory",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 13,
        "w": 8,
        "x": 8,
        "y": 33
      },
      "hiddenSeries": false,
      "id": 30,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "jvmMemory",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "E",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "NonHeapMemoryUsage_used"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Non-heap memory usage"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka non-heap memory usage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:1850",
          "format": "decbytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:1851",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions\nA nonzero value means that some replicas cannot keep up with their leader",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 13,
        "w": 8,
        "x": 16,
        "y": 33
      },
      "hiddenSeries": false,
      "id": 24,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pluginVersion": "6.7.3",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "underReplicated",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "Value"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Under-replicated partitions"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka number of under-replicated partitions",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:11235",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:11236",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "cacheTimeout": null,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 13,
        "w": 8,
        "x": 0,
        "y": 46
      },
      "hiddenSeries": false,
      "id": 12,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pluginVersion": "6.7.3",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "5m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "network",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "Value"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              },
              {
                "params": [
                  "Network Thread Pool Idle Ratio"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": []
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka network thread pool average idle percentage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:13734",
          "format": "percentunit",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:13735",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "cacheTimeout": null,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 13,
        "w": 8,
        "x": 8,
        "y": 46
      },
      "hiddenSeries": false,
      "id": 22,
      "legend": {
        "alignAsTable": true,
        "avg": false,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pluginVersion": "6.7.3",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            }
          ],
          "measurement": "network",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "OneMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "IO Idle ratio"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent"
            }
          ]
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka I/O thread pool average idle percentage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:13517",
          "format": "percentunit",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:13518",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_KAFKAMONITOR}",
      "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec and kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec metrics",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 13,
        "w": 8,
        "x": 16,
        "y": 46
      },
      "hiddenSeries": false,
      "id": 18,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "dataLinks": []
      },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "H",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "OneMinuteRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Failed fetch requests per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec"
            }
          ]
        },
        {
          "alias": "",
          "groupBy": [
            {
              "params": [
                "1m"
              ],
              "type": "time"
            },
            {
              "params": [
                "hostname"
              ],
              "type": "tag"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "kafkaServer",
          "orderByTime": "ASC",
          "policy": "default",
          "refId": "J",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "MeanRate"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "last"
              },
              {
                "params": [
                  "Failed produce requests per second"
                ],
                "type": "alias"
              }
            ]
          ],
          "tags": [
            {
              "key": "typeName",
              "operator": "=",
              "value": "type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec"
            }
          ]
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Kafka failed produce and fetch requests per second",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "$$hashKey": "object:77",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "$$hashKey": "object:78",
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": false,
  "schemaVersion": 22,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ]
  },
  "timezone": "",
  "title": "Kafka Cluster Monitoring Template",
  "uid": "PkULDneZkALL",
  "variables": {
    "list": []
  },
  "version": 27
}
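
Each entry under "targets" in the template above is the query-editor form of an InfluxQL statement. As a rough aid for reading the template, the following Python sketch renders one target as approximately the query text Grafana would send. It only handles the part types that appear in this dashboard (field, last, mean, alias, time, tag, fill), joins multiple tag filters with AND, and ignores the retention policy; the function name target_to_influxql is made up for illustration and is not part of Grafana.

```python
def target_to_influxql(target):
    """Render a Grafana InfluxDB query-editor target as InfluxQL text (approximation)."""
    columns = []
    for parts in target["select"]:
        expr, alias = "", None
        for part in parts:
            kind, params = part["type"], part["params"]
            if kind == "field":
                expr = '"%s"' % params[0]
            elif kind == "alias":
                alias = params[0]
            else:
                # Selectors/aggregations such as last() or mean() wrap the field.
                expr = "%s(%s)" % (kind, expr)
        columns.append(expr + (' AS "%s"' % alias if alias else ""))

    # Tag filters come first; $timeFilter is Grafana's macro for the dashboard time range.
    conditions = ["\"%s\" %s '%s'" % (t["key"], t["operator"], t["value"])
                  for t in target.get("tags", [])]
    conditions.append("$timeFilter")

    group_by, fill = [], ""
    for part in target.get("groupBy", []):
        kind, param = part["type"], part["params"][0]
        if kind == "time":
            group_by.append("time(%s)" % param)
        elif kind == "tag":
            group_by.append('"%s"' % param)
        elif kind == "fill":
            fill = " fill(%s)" % param

    query = 'SELECT %s FROM "%s" WHERE %s' % (
        ", ".join(columns), target["measurement"], " AND ".join(conditions))
    if group_by:
        query += " GROUP BY " + ", ".join(group_by) + fill
    return query


# The target of the I/O thread pool panel, copied from the template above.
IO_IDLE_TARGET = {
    "groupBy": [
        {"params": ["1m"], "type": "time"},
        {"params": ["hostname"], "type": "tag"},
    ],
    "measurement": "network",
    "select": [[
        {"params": ["OneMinuteRate"], "type": "field"},
        {"params": [], "type": "last"},
        {"params": ["IO Idle ratio"], "type": "alias"},
    ]],
    "tags": [{
        "key": "typeName",
        "operator": "=",
        "value": "type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
    }],
}

print(target_to_influxql(IO_IDLE_TARGET))
```

The printed statement selects last("OneMinuteRate") from the "network" measurement, filtered by the typeName tag and grouped by time(1m) and "hostname". Running a statement like this directly against InfluxDB (with $timeFilter replaced by a concrete time condition) is a quick way to check that the measurement and field names written by JMXTrans match what the panel expects.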

5. docker-compose.yml file

Use Docker Compose to deploy InfluxDB, JMXTrans, and Grafana together. Create a KafkaMonitor directory; inside it, create an influxdb directory, a jmxtrans directory, and a docker-compose.yml file, then place the jmxtrans.json file in the jmxtrans directory.
The docker-compose.yml file is as follows:

version: '2'
services:
  # JMXTrans Service
  jmxtrans:
    image: jmxtrans/jmxtrans
    container_name: jmxtrans
    volumes:
      - ./jmxtrans:/var/lib/jmxtrans
  # InfluxDB Service
  influxdb:
    image: influxdb:1.8  # Pinned to 1.x: the volume paths and InfluxQL queries here assume InfluxDB 1.x
    container_name: influxdb
    volumes:
      - ./influxdb/conf:/etc/influxdb
      - ./influxdb/data:/var/lib/influxdb/data
      - ./influxdb/meta:/var/lib/influxdb/meta
      - ./influxdb/wal:/var/lib/influxdb/wal
    ports:
      - "8086:8086" # Expose ports, provide Grafana access
    restart: always
  # Grafana Service
  grafana:
    image: grafana/grafana:6.5.0  # Higher versions may have bugs
    container_name: grafana
    ports:
      - "3000:3000"  # Expose ports, provide web access

Start the monitoring services:
docker-compose -f docker-compose.yml up -d
Then log in to the Grafana web UI and configure the corresponding data source and dashboard template.
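
The data source can also be registered without the web UI, through Grafana's HTTP API (POST /api/datasources). The Python sketch below only builds the request so the payload can be inspected before it is sent. Several values are assumptions rather than settings taken from this article: the data source name KafkaMonitor, the database name kafka (it must match the database that jmxtrans.json writes to), the InfluxDB address http://influxdb:8086 (the Compose service name as seen from the Grafana container), and Grafana's default admin/admin login.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, influx_url, database,
                             name="KafkaMonitor", user="admin", password="admin"):
    """Build (but do not send) a POST /api/datasources request for Grafana."""
    payload = {
        "name": name,          # hypothetical data source name
        "type": "influxdb",
        "access": "proxy",     # the Grafana server, not the browser, queries InfluxDB
        "url": influx_url,
        "database": database,  # assumed; must match the database JMXTrans writes to
    }
    # HTTP basic authentication with the Grafana login.
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
        method="POST",
    )


request = build_datasource_request(
    "http://localhost:3000", "http://influxdb:8086", "kafka")
print(request.get_method(), request.full_url)
print(request.data.decode())
```

Once the stack is running, passing the request to urllib.request.urlopen(request) sends it. When the dashboard template is then imported in the Grafana UI, the DS_KAFKAMONITOR placeholder should be mapped to this data source.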

6. Monitoring and Viewing

Posted on Mon, 25 May 2020 10:37:33 -0700 by meckr