050. Cluster management - Prometheus+Grafana monitoring scheme

I. Prometheus overview

1.1 introduction to Prometheus

Prometheus is an open-source monitoring system originally developed at SoundCloud. It was the second project to graduate from the CNCF, after Kubernetes, and is widely used in the container and microservice fields. The main features of Prometheus are as follows:
  • A multi-dimensional data model in which time series are identified by metric name and key/value pairs.
  • PromQL, a flexible query language.
  • No reliance on distributed storage; single server nodes are autonomous.
  • Monitoring data is pulled over HTTP.
  • Pushing time series is supported through an intermediary gateway.
  • Multiple graphing and dashboard tools are supported, such as Grafana.
The Prometheus ecosystem is composed of several components that extend its functionality:
  • Prometheus Server: responsible for collecting monitoring data, storing it as time series, and providing query functionality.
  • Client SDKs: development kits for instrumenting applications for Prometheus.
  • Push Gateway: a gateway component that metrics can be pushed to.
  • Third-party Exporters: external metric collection systems whose data Prometheus can scrape.
  • AlertManager: the alert manager.
  • Other supporting tools.
The main functions of Prometheus Server, the core component of Prometheus, include:
  • Obtaining the resources or services to be monitored from the Kubernetes master;
  • Scraping (pulling) metric data from the various exporters, then saving it in the time series database (TSDB);
  • Providing an HTTP API for other systems to query (see the example below);
  • Providing data queries based on the PromQL language;
  • Pushing alert data to the alert manager; and so on.
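For example, once a Prometheus server is running, the current value of any expression can be fetched from its HTTP API with a plain GET request (a minimal sketch; http://localhost:9090 assumes a default local installation):
curl 'http://localhost:9090/api/v1/query?query=up'
{"status":"success","data":{"resultType":"vector","result":[...]}}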

1.2 Prometheus component architecture

Prometheus obtains metric data either by scraping jobs directly or, for short-lived jobs, passively through the intermediary Pushgateway. It stores all acquired metric data locally, runs rules over this data to aggregate it or generate alerts, and the resulting data can then be visualized through Grafana or other tools.
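For short-lived batch jobs that cannot be scraped, a metric can be pushed to the Pushgateway with a plain HTTP request, and Prometheus then scrapes the Pushgateway like any other target (a sketch; the gateway address and metric name are illustrative):
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job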
The workflow is as follows:
  1. The Prometheus server periodically scrapes metric data from the configured jobs or exporters, or receives metric data sent by the push gateway.
  2. The Prometheus server stores the collected metric data locally and aggregates it.
  3. The defined alert rules (alert.rules) are evaluated, either recording new time series or pushing alerts to the alert manager (see the example rule file below).
  4. The alert manager processes the received alerts according to its configuration file and sends notifications by email and other means.
  5. Grafana and other graphical tools fetch the monitoring data and display it graphically.
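A minimal example of the rule file mentioned in step 3, alerting when a target stops responding (loaded via rule_files in prometheus.yml; the group name, duration, and labels are illustrative):
groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"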

1.3 Prometheus monitoring granularity

As a monitoring system, Prometheus mainly provides monitoring at the following levels:
  • Infrastructure layer: monitors the resources of all host servers (both Kubernetes nodes and non-Kubernetes nodes), such as CPU, memory, network throughput and bandwidth usage, disk I/O and disk usage.
  • Middleware layer: monitors middleware deployed independently outside the Kubernetes cluster, such as MySQL, Redis, RabbitMQ, ElasticSearch, Nginx, etc.
  • Kubernetes cluster: monitors the key indicators of the Kubernetes cluster itself.
  • Applications on the cluster: monitors the applications deployed on the Kubernetes cluster.

II. Relevant concepts of Prometheus

2.1 data model

Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides the stored time series, Prometheus may generate temporary derived time series as the result of queries.
  • Metric names and labels
Each time series is uniquely identified by its metric name and a set of key/value pairs (also known as labels). The metric name specifies the general feature of the system being measured (for example, http_requests_total - the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.
Labels enable Prometheus's dimensional data model: for the same metric name, any given combination of labels identifies a particular dimensional instantiation of that metric. The query language allows filtering and aggregation based on these dimensions (see the query example at the end of this subsection). Changing any label value, including adding or removing a label, creates a new time series. Label names may contain ASCII letters, digits, and underscores, and must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.
  • Samples
Samples form the actual time series data; each sample consists of a float64 value and a millisecond-precision timestamp.
  • Notation
Given a metric name and a set of labels, a time series is frequently identified using the notation: <metric name>{<label name>=<label value>, ...}
For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" is written as:
api_http_requests_total{method="POST", handler="/messages"}
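Because labels are dimensions, PromQL can filter and aggregate across them. For example, the following query (illustrative) computes the per-handler rate of POST requests over the last five minutes:
sum by (handler) (rate(api_http_requests_total{method="POST"}[5m]))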

2.2 metric types

The Prometheus client libraries provide four main metric types: Counter, Gauge, Histogram, and Summary:
  • Counter
A counter is a cumulative metric whose value can only increase or be reset to zero on restart. For example, counters can represent the number of requests served, tasks completed, or errors. Do not use a counter for a value that can decrease; for example, use a gauge, not a counter, for the number of currently running processes.
  • Gauge
A gauge represents a single numerical value that can arbitrarily go up and down. Gauges are typically used for measured values such as temperature or current memory usage, but also for "counts" that can go up and down, such as the number of running goroutines.
  • Histogram
A histogram samples observations (for example, request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. A histogram with the base metric name <basename> exposes multiple time series during a scrape (see the sample output after this list):
    • cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
    • the total sum of all observed values, exposed as <basename>_sum
    • the count of observed events, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"})
  • Summary: similar to a histogram, a summary samples observations (usually request durations and response sizes). While it also provides a count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. A summary with the base metric name <basename> exposes multiple time series during a scrape:
    • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
    • the total sum of all observed values, exposed as <basename>_sum
    • the count of observed events, exposed as <basename>_count
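As an illustration, a histogram with the base name http_request_duration_seconds might appear in Prometheus's text exposition format as follows (the name, buckets, and values are made up):
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 240
http_request_duration_seconds_bucket{le="0.5"} 310
http_request_duration_seconds_bucket{le="+Inf"} 320
http_request_duration_seconds_sum 52.4
http_request_duration_seconds_count 320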

2.3 jobs and instances

In Prometheus, an endpoint that can be scraped for data is called an instance, usually corresponding to a single process. A collection of instances with the same purpose, such as processes replicated for scalability or reliability, is called a job.
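For example, a statically configured job with two instances could be declared in prometheus.yml like this (the job name and addresses are illustrative):
scrape_configs:
  - job_name: 'api-server'
    static_configs:
      - targets: ['10.0.0.1:8080', '10.0.0.2:8080']    # two instances of the same job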

2.4 labels and time series

When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the target:
  • job: the name of the configured job the target belongs to.
  • instance: the <host>:<port> part of the scraped target's URL.
If either of these labels already exists in the scraped data, the behavior depends on the honor_labels configuration option. For each instance scrape, Prometheus also stores samples in the following time series:
  • up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
  • scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: the duration of the scrape.
  • scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling was applied.
  • scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed.
The up time series is useful for monitoring instance availability.
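For example, the following PromQL expression, run in the expression browser, lists every target that currently fails to be scraped:
up == 0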

III. Prometheus deployment

3.1 create namespace

[root@k8smaster01 study]# vi monitor-namespace.yaml
  1 apiVersion: v1
  2 kind: Namespace
  3 metadata:
  4   name: monitoring
  5 
[root@k8smaster01 study]# kubectl create -f monitor-namespace.yaml

3.2 obtaining deployment files

[root@k8smaster01 study]# git clone https://github.com/prometheus/prometheus

3.3 create RBAC

[root@k8smaster01 ~]# cd prometheus/documentation/examples/
[root@k8smaster01 examples]# vi rbac-setup.yml
  1 apiVersion: rbac.authorization.k8s.io/v1beta1
  2 kind: ClusterRole
  3 metadata:
  4   name: prometheus
  5 rules:
  6 - apiGroups: [""]
  7   resources:
  8   - nodes
  9   - nodes/proxy
 10   - services
 11   - endpoints
 12   - pods
 13   verbs: ["get", "list", "watch"]
 14 - apiGroups:
 15   - extensions
 16   resources:
 17   - ingresses
 18   verbs: ["get", "list", "watch"]
 19 - nonResourceURLs: ["/metrics"]
 20   verbs: ["get"]
 21 ---
 22 apiVersion: v1
 23 kind: ServiceAccount
 24 metadata:
 25   name: prometheus
 26   namespace: monitoring               #Modify namespace
 27 ---
 28 apiVersion: rbac.authorization.k8s.io/v1beta1
 29 kind: ClusterRoleBinding
 30 metadata:
 31   name: prometheus
 32 roleRef:
 33   apiGroup: rbac.authorization.k8s.io
 34   kind: ClusterRole
 35   name: prometheus
 36 subjects:
 37 - kind: ServiceAccount
 38   name: prometheus
 39   namespace: monitoring               #Modify namespace
 40 
[root@k8smaster01 examples]# kubectl create -f rbac-setup.yml

3.4 create Prometheus ConfigMap

[root@k8smaster01 examples]# cat prometheus-kubernetes.yml | grep -v ^$ | grep -v "#" >> prometheus-config.yaml
[root@k8smaster01 examples]# vi prometheus-config.yaml
  1 apiVersion: v1
  2 kind: ConfigMap
  3 metadata:
  4   name: prometheus-server-conf
  5   labels:
  6     name: prometheus-server-conf
  7   namespace: monitoring               #Modify namespace
  8 data:
  9   prometheus.yml: |-
 10     global:
 11       scrape_interval: 10s
 12       evaluation_interval: 10s
 13 
 14     scrape_configs:
 15       - job_name: 'kubernetes-apiservers'
 16         kubernetes_sd_configs:
 17         - role: endpoints
 18         scheme: https
 19         tls_config:
 20           ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
 21         bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
 22         relabel_configs:
 23         - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
 24           action: keep
 25           regex: default;kubernetes;https
 26 
 27       - job_name: 'kubernetes-nodes'
 28         scheme: https
 29         tls_config:
 30           ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
 31         bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
 32         kubernetes_sd_configs:
 33         - role: node
 34         relabel_configs:
 35         - action: labelmap
 36           regex: __meta_kubernetes_node_label_(.+)
 37         - target_label: __address__
 38           replacement: kubernetes.default.svc:443
 39         - source_labels: [__meta_kubernetes_node_name]
 40           regex: (.+)
 41           target_label: __metrics_path__
 42           replacement: /api/v1/nodes/${1}/proxy/metrics
 43 
 44       - job_name: 'kubernetes-cadvisor'
 45         scheme: https
 46         tls_config:
 47           ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
 48         bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
 49         kubernetes_sd_configs:
 50         - role: node
 51         relabel_configs:
 52         - action: labelmap
 53           regex: __meta_kubernetes_node_label_(.+)
 54         - target_label: __address__
 55           replacement: kubernetes.default.svc:443
 56         - source_labels: [__meta_kubernetes_node_name]
 57           regex: (.+)
 58           target_label: __metrics_path__
 59           replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
 60 
 61       - job_name: 'kubernetes-service-endpoints'
 62         kubernetes_sd_configs:
 63         - role: endpoints
 64         relabel_configs:
 65         - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
 66           action: keep
 67           regex: true
 68         - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
 69           action: replace
 70           target_label: __scheme__
 71           regex: (https?)
 72         - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
 73           action: replace
 74           target_label: __metrics_path__
 75           regex: (.+)
 76         - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
 77           action: replace
 78           target_label: __address__
 79           regex: ([^:]+)(?::\d+)?;(\d+)
 80           replacement: $1:$2
 81         - action: labelmap
 82           regex: __meta_kubernetes_service_label_(.+)
 83         - source_labels: [__meta_kubernetes_namespace]
 84           action: replace
 85           target_label: kubernetes_namespace
 86         - source_labels: [__meta_kubernetes_service_name]
 87           action: replace
 88           target_label: kubernetes_name
 89 
 90       - job_name: 'kubernetes-services'
 91         metrics_path: /probe
 92         params:
 93           module: [http_2xx]
 94         kubernetes_sd_configs:
 95         - role: service
 96         relabel_configs:
 97         - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
 98           action: keep
 99           regex: true
100         - source_labels: [__address__]
101           target_label: __param_target
102         - target_label: __address__
103           replacement: blackbox-exporter.example.com:9115
104         - source_labels: [__param_target]
105           target_label: instance
106         - action: labelmap
107           regex: __meta_kubernetes_service_label_(.+)
108         - source_labels: [__meta_kubernetes_namespace]
109           target_label: kubernetes_namespace
110         - source_labels: [__meta_kubernetes_service_name]
111           target_label: kubernetes_name
112 
113       - job_name: 'kubernetes-ingresses'
114         kubernetes_sd_configs:
115         - role: ingress
116         relabel_configs:
117         - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
118           action: keep
119           regex: true
120         - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
121           regex: (.+);(.+);(.+)
122           replacement: ${1}://${2}${3}
123           target_label: __param_target
124         - target_label: __address__
125           replacement: blackbox-exporter.example.com:9115
126         - source_labels: [__param_target]
127           target_label: instance
128         - action: labelmap
129           regex: __meta_kubernetes_ingress_label_(.+)
130         - source_labels: [__meta_kubernetes_namespace]
131           target_label: kubernetes_namespace
132         - source_labels: [__meta_kubernetes_ingress_name]
133           target_label: kubernetes_name
134 
135       - job_name: 'kubernetes-pods'
136         kubernetes_sd_configs:
137         - role: pod
138         relabel_configs:
139         - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
140           action: keep
141           regex: true
142         - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
143           action: replace
144           target_label: __metrics_path__
145           regex: (.+)
146         - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
147           action: replace
148           regex: ([^:]+)(?::\d+)?;(\d+)
149           replacement: $1:$2
150           target_label: __address__
151         - action: labelmap
152           regex: __meta_kubernetes_pod_label_(.+)
153         - source_labels: [__meta_kubernetes_namespace]
154           action: replace
155           target_label: kubernetes_namespace
156         - source_labels: [__meta_kubernetes_pod_name]
157           action: replace
158           target_label: kubernetes_pod_name
159 
[root@k8smaster01 examples]# kubectl create -f prometheus-config.yaml
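The 'kubernetes-service-endpoints' job above only keeps targets whose Service carries the prometheus.io/scrape annotation; the port and path annotations override the defaults. A hypothetical Service annotated so that its endpoints are scraped on port 8080 at /metrics (the 'kubernetes-pods' job honors the same annotations at the Pod level):
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: my-app
  ports:
  - port: 8080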

3.5 create Prometheus Deployment

[root@k8smaster01 examples]# vi prometheus-deployment.yml
  1 apiVersion: apps/v1beta2
  2 kind: Deployment
  3 metadata:
  4   labels:
  5     name: prometheus-deployment
  6   name: prometheus-server
  7   namespace: monitoring
  8 spec:
  9   replicas: 1
 10   selector:
 11     matchLabels:
 12       app: prometheus-server
 13   template:
 14     metadata:
 15       labels:
 16         app: prometheus-server
 17     spec:
 18       containers:
 19         - name: prometheus-server
 20           image: prom/prometheus:v2.14.0
 21           command:
 22           - "/bin/prometheus"
 23           args:
 24             - "--config.file=/etc/prometheus/prometheus.yml"
 25             - "--storage.tsdb.path=/prometheus/"
 26             - "--storage.tsdb.retention=72h"
 27           ports:
 28             - containerPort: 9090
 29               protocol: TCP
 30           volumeMounts:
 31             - name: prometheus-config-volume
 32               mountPath: /etc/prometheus/
 33             - name: prometheus-storage-volume
 34               mountPath: /prometheus/
 35       serviceAccountName: prometheus
 36       imagePullSecrets:
 37         - name: regsecret
 38       volumes:
 39         - name: prometheus-config-volume
 40           configMap:
 41             defaultMode: 420
 42             name: prometheus-server-conf
 43         - name: prometheus-storage-volume
 44           emptyDir: {}
 45 
[root@k8smaster01 examples]# kubectl create -f prometheus-deployment.yml
Tip: if Prometheus needs persistent storage, create the corresponding StorageClass and PVC in advance. For the StorageClass, refer to 044. Cluster storage - StorageClass; the PVC can be created as follows:
[root@k8smaster01 examples]# vi prometheus-pvc.yaml
  1 apiVersion: v1
  2 kind: PersistentVolumeClaim
  3 metadata:
  4   name: prometheus-pvc
  5   namespace: monitoring
  6   annotations:
  7     volume.beta.kubernetes.io/storage-class: ghstorageclass
  8 spec:
  9   accessModes:
 10   - ReadWriteMany
 11   resources:
 12     requests:
 13       storage: 5Gi
[root@k8smaster01 examples]# kubectl create -f prometheus-pvc.yaml
Modify the storage part of prometheus-deployment.yml to:
  1 ......
  2         - name: prometheus-storage-volume
  3           persistentVolumeClaim:
  4             claimName: prometheus-pvc
  5 ......
  6 

3.6 create Prometheus Service

[root@k8smaster01 examples]# vi prometheus-service.yaml
  1 apiVersion: v1
  2 kind: Service
  3 metadata:
  4   labels:
  5     app: prometheus-service
  6   name: prometheus-service
  7   namespace: monitoring
  8 spec:
  9   type: NodePort
 10   selector:
 11     app: prometheus-server
 12   ports:
 13   - port: 9090
 14     targetPort: 9090
 15     nodePort: 30909
 16 
[root@k8smaster01 examples]# kubectl create -f prometheus-service.yaml
[root@k8smaster01 examples]# kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/prometheus-server-fd5479489-q584s 1/1 Running 0 92s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-service NodePort 10.107.69.147 <none> 9090:30909/TCP 29s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-server 1/1 1 1 92s
NAME DESIRED CURRENT READY AGE
replicaset.apps/prometheus-server-fd5479489 1 1 1 92s

3.7 test Prometheus

Access http://172.24.8.71:30909/ directly in a browser.
On the Targets page, check that the Endpoints of the Kubernetes cluster have been discovered and connected automatically by Prometheus through service discovery.
Metrics such as memory usage can then be queried through the graphical interface.
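For example, the 'kubernetes-cadvisor' job configured in 3.4 collects per-container metrics from cAdvisor; entering the following expression and clicking Execute graphs container memory usage (exact label names depend on the Kubernetes version):
container_memory_usage_bytes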
For more configurations of Prometheus, please refer to the official website: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

IV. deployment of Grafana

4.1 obtaining deployment files

[root@uhost ~]# git clone https://github.com/liukuan73/kubernetes-addons

4.2 deploy grafana

[root@uhost ~]# cd /root/kubernetes-addons/monitor/prometheus+grafana
[root@k8smaster01 prometheus+grafana]# vi grafana.yaml
  1 ---
  2 apiVersion: v1
  3 kind: Service
  4 metadata:
  5   name: grafana
  6   namespace: monitoring
  7   labels:
  8     app: grafana
  9 spec:
 10   type: NodePort
 11   ports:
 12   - port: 3000
 13     targetPort: 3000
 14     nodePort: 30007
 15   selector:
 16     app: grafana
 17 ---
 18 apiVersion: extensions/v1beta1
 19 kind: Deployment
 20 metadata:
 21   labels:
 22     app: grafana
 23   name: grafana
 24   namespace: monitoring
 25 spec:
 26   replicas: 1
 27   revisionHistoryLimit: 2
 28   template:
 29     metadata:
 30       labels:
 31         app: grafana
 32     spec:
 33       containers:
 34       - name: grafana
 35         image: grafana/grafana:5.0.0
 36         imagePullPolicy: IfNotPresent
 37         ports:
 38         - containerPort: 3000
 39         volumeMounts:
 40         - mountPath: /var
 41           name: grafana-storage
 42         env:
 43           - name: GF_AUTH_BASIC_ENABLED
 44             value: "false"
 45           - name: GF_AUTH_ANONYMOUS_ENABLED
 46             value: "true"
 47           - name: GF_AUTH_ANONYMOUS_ORG_ROLE
 48             value: Admin
 49           - name: GF_SERVER_ROOT_URL
 50 #              value: /api/v1/proxy/namespaces/default/services/grafana/
 51             value: /
 52         readinessProbe:
 53           httpGet:
 54             path: /login
 55             port: 3000
 56       volumes:
 57       - name: grafana-storage
 58         emptyDir: {}
 59       nodeSelector:
 60         node-role.kubernetes.io/master: "true"
 61 #      tolerations:
 62 #      - key: "node-role.kubernetes.io/master"
 63 #        effect: "NoSchedule"
 64 
[root@k8smaster01 prometheus+grafana]# kubectl label nodes k8smaster01 node-role.kubernetes.io/master=true
[root@k8smaster01 prometheus+grafana]# kubectl label nodes k8smaster02 node-role.kubernetes.io/master=true
[root@k8smaster01 prometheus+grafana]# kubectl label nodes k8smaster03 node-role.kubernetes.io/master=true
[root@k8smaster01 prometheus+grafana]# kubectl taint nodes --all node-role.kubernetes.io/master-               #Allow Pods to be scheduled on the Master nodes
[root@k8smaster01 prometheus+grafana]# kubectl create -f grafana.yaml
[root@k8smaster01 examples]# kubectl get all -n monitoring

4.3 validation

Browser access: http://172.24.8.71:30007, and log in with the default credentials admin/admin.

4.4 configuring data sources

Go to Configuration ----> Data Sources.
Add a new data source.
Add the Prometheus data source as follows. This environment is based on the highly available Kubernetes cluster deployed in Appendix 012. Kubeadm deployment of highly available Kubernetes, so the VIP 172.24.8.100 can be used, or the Prometheus address tested in step 3.7.
Save & Test should report success.
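Since Grafana runs in the same cluster, the data source URL can also point at the in-cluster DNS name of the Service created in 3.6 instead of a node address:
http://prometheus-service.monitoring.svc.cluster.local:9090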

4.5 configuring Grafana

Configure a dashboard. This experiment uses dashboard template 162 to display the monitoring information of the Kubernetes cluster.
Select the Prometheus data source added in 4.4 for the display.

4.6 add users

You can add ordinary users and configure corresponding roles.
Copy the invite link: http://172.24.8.71:30007/invite/hlhkzz5o3dj94olhckiqn8bprzt40
Open the link, set the new user's password, and log in.

4.7 other settings

It is recommended to set the time zone. For more Grafana configuration options, refer to https://grafana.com/docs/grafana/latest/installation/configuration/

4.8 view monitoring

Log in to http://172.24.8.71:30007/ to view the corresponding Kubernetes monitoring.
Reference links for this scheme:
https://www.kubernetes.org.cn/4184.html
https://www.kubernetes.org.cn/3418.html
https://www.jianshu.com/p/c2e549480c50
