Prometheus (version 2.0.0), Part 2

Configuring Prometheus

Prometheus is configured in YAML. This article walks through a sample configuration and explains each part.

The official configuration documentation (in English) describes every option in detail.

# Global settings apply to every section below; a section that sets the same option overrides the global value
global:
  scrape_interval:     15s
  # Scrape interval: how often the Prometheus server pulls metrics from each target (an exporter's host:port)
  evaluation_interval: 30s
  # Evaluation interval: how often the Prometheus server evaluates recording and alerting rules
  # scrape_timeout is not set here, so the scrape timeout is the global default (10s).

  external_labels:
  # External labels: custom key/value pairs added to data sent to external systems (remote storage, Alertmanager, federation); several may be listed
    monitor: codelab
    foo:     bar

rule_files:
# Rule files: several files or paths may be listed, and glob patterns (wildcards) are supported. A sample rule file is shown later
- "first.rules"
- "my/*.rules"

remote_write:
# Remote write: forward the samples collected by this server to another endpoint; write_relabel_configs can relabel or drop series before they are sent
  - url: http://remote1/push
    write_relabel_configs:
    - source_labels: [__name__]
      regex:         expensive.*
      action:        drop
  - url: http://remote2/push

remote_read:
# Remote read: fetch samples from a remote endpoint; required_matchers limits reads to queries that carry the given label matchers
  - url: http://remote1/read
    read_recent: true
  - url: http://remote3/read
    read_recent: false
    required_matchers:
      job: special

scrape_configs:
# Scrape configuration
- job_name: prometheus
  # honor_labels resolves conflicts between labels in the scraped data and labels attached by the server. If true, the labels from the scraped data are kept; if false, conflicting scraped labels are renamed to exported_<label> and the server-side labels are applied
  honor_labels: true
  # scrape_interval is defined by the configured global (15s).
  # scrape_timeout is defined by the global default (10s).

  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.

  file_sd_configs:
  # File-based service discovery: targets are read from the listed files or glob patterns. Changes to these files are detected automatically, and the files are also re-read every refresh_interval, so no reload is needed. The file format is shown later
    - files:
      - foo/*.slow.json
      - foo/*.slow.yml
      - single/file.yml
      refresh_interval: 10m
    - files:
      - bar/*.yaml

  static_configs:
  # Static configuration: a list of target groups, each with a list of host:port endpoints and optional labels
  - targets: ['localhost:9090', 'localhost:9191']
    labels:  # Target labels: custom key/value pairs; several may be listed
      my:   label
      your: label

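  relabel_configs:  # Relabeling rules, applied to each target before it is scraped
  - source_labels: [job, __meta_dns_name]  # Source labels; their values are joined with ';'
    regex:         (.*)some-[regex]  # Regular expression matched against the joined value
    target_label:  job  # Label that receives the result
    replacement:   foo-${1}  # Replacement value; ${1} is the first capture group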
    # action defaults to 'replace'
  - source_labels: [abc]
    target_label:  cde
  - replacement:   static
    target_label:  abc
  - regex:
    replacement:   static
    target_label:  abc

  bearer_token_file: valid_token_file


- job_name: service-x

  basic_auth:  # Credentials for targets that require HTTP basic authentication
    username: admin_name
    password: "multiline\nmysecret\ntest"

  scrape_interval: 50s
  scrape_timeout:  5s

  sample_limit: 1000

  metrics_path: /my_path
  scheme: https

  dns_sd_configs:
  - refresh_interval: 15s
    names:
    - first.dns.address.domain.com
    - second.dns.address.domain.com
  - names:
    - first.dns.address.domain.com
    # refresh_interval defaults to 30s.

  relabel_configs:
  - source_labels: [job]
    regex:         (.*)some-[regex]
    action:        drop
  - source_labels: [__address__]
    modulus:       8
    target_label:  __tmp_hash
    action:        hashmod
  - source_labels: [__tmp_hash]
    regex:         1
    action:        keep
  - action:        labelmap
    regex:         1
  - action:        labeldrop
    regex:         d
  - action:        labelkeep
    regex:         k

  metric_relabel_configs:
  - source_labels: [__name__]
    regex:         expensive_metric.*
    action:        drop

- job_name: service-y

  consul_sd_configs:
  - server: 'localhost:1234'
    token: mysecret
    services: ['nginx', 'cache', 'mysql']
    scheme: https
    tls_config:
      ca_file: valid_ca_file
      cert_file: valid_cert_file
      key_file:  valid_key_file
      insecure_skip_verify: false

  relabel_configs:
  - source_labels: [__meta_sd_consul_tags]
    separator:     ','
    regex:         label:([^=]+)=([^,]+)
    target_label:  ${1}
    replacement:   ${2}

- job_name: service-z

  tls_config:
    cert_file: valid_cert_file
    key_file: valid_key_file

  bearer_token: mysecret

- job_name: service-kubernetes

  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://localhost:1234'

    basic_auth:
      username: 'myusername'
      password: 'mysecret'

- job_name: service-kubernetes-namespaces

  kubernetes_sd_configs:
  - role: endpoints
    api_server: 'https://localhost:1234'
    namespaces:
      names:
        - default

- job_name: service-marathon
  marathon_sd_configs:
  - servers:
    - 'https://marathon.example.com:443'

    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file

- job_name: service-ec2
  ec2_sd_configs:
    - region: us-east-1
      access_key: access
      secret_key: mysecret
      profile: profile

- job_name: service-azure
  azure_sd_configs:
    - subscription_id: 11AAAA11-A11A-111A-A111-1111A1111A11
      tenant_id: BBBB222B-B2B2-2B22-B222-2BB2222BB2B2
      client_id: 333333CC-3C33-3333-CCC3-33C3CCCCC33C
      client_secret: mysecret
      port: 9100

- job_name: service-nerve
  nerve_sd_configs:
    - servers:
      - localhost
      paths:
      - /monitoring

- job_name: 0123service-xxx
  metrics_path: /metrics
  static_configs:
    - targets:
      - localhost:9090

- job_name: Testing
  metrics_path: /metrics
  static_configs:
    - targets:
      - localhost:9090

- job_name: service-triton
  triton_sd_configs:
  - account: 'testAccount'
    dns_suffix: 'triton.example.com'
    endpoint: 'triton.example.com'
    port: 9163
    refresh_interval: 1m
    version: 1
    tls_config:
      cert_file: testdata/valid_cert_file
      key_file: testdata/valid_key_file

alerting:  # Alertmanager instances that receive alerts; several endpoints may be listed
  alertmanagers:
  - scheme: https
    static_configs:
    - targets:
      - "1.2.3.4:9093"
      - "1.2.3.5:9093"
      - "1.2.3.6:9093"
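
The first relabel rule of the prometheus job above (action `replace`) can be illustrated with a small Python sketch. This is not Prometheus code: the function name `apply_replace` is hypothetical, Python's `re` module stands in for the RE2 engine Prometheus uses, and only `${1}`-style references are translated. It assumes the documented behaviour that source label values are joined with `;`, that the regex must match the whole joined value, and that an empty result removes the target label.

```python
import re


def apply_replace(labels, source_labels, regex, target_label,
                  replacement, separator=";"):
    """Return a copy of `labels` after one relabel rule with action 'replace'."""
    # Join the source label values; a missing label counts as an empty string.
    joined = separator.join(labels.get(name, "") for name in source_labels)

    # The regex has to match the whole joined value, not just a part of it.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: the rule leaves the labels untouched

    # Translate ${1}-style references into Python's \g<1> syntax.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    value = match.expand(template)

    result = dict(labels)
    if value:
        result[target_label] = value
    else:
        # Assumption taken from the docs: an empty value removes the label.
        result.pop(target_label, None)
    return result


# Hypothetical target labels, chosen so that the sample regex matches.
labels = {"job": "api", "__meta_dns_name": "awesome-x"}
relabeled = apply_replace(labels, ["job", "__meta_dns_name"],
                          r"(.*)some-[regex]", "job", "foo-${1}")
print(relabeled["job"])
```

Note that `[regex]` in the sample rule is a character class (one of r, e, g, x), so the joined value has to end in `some-` followed by one of those letters for the rule to fire.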

Rule format:
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

Sample rule file:

groups:  # Rule groups
- name: example  # Name of the first group
  rules:  # Rules in this group; a group can hold any number of alerting rules, two are shown here
  - alert: HighErrorRate1
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
  - alert: HighErrorRate2
    expr: job:request_latency_seconds:mean5m{job="yourjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

Sample target file for file_sd_configs, in JSON format. The file is a list of target groups, and the targets inside each group are also a list:

[
{
    "targets": [
        "127.0.0.1:9104"
    ],
    "labels": {
        "job":"job1",
        "service":"service1"
    }
},
{
    "targets": [
        "127.0.0.1:9105",
        "127.0.0.1:9106",
        "127.0.0.1:9107"
    ],
    "labels": {
        "job":"job2",
        "service":"service2"
    }
}
]
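
A target file of this kind can be checked before Prometheus reads it. The following is a minimal sketch using only the Python standard library; the function name `check_target_groups` and the exact checks are assumptions made for illustration, not part of Prometheus.

```python
import json


def check_target_groups(text):
    """Parse file_sd JSON and return a list of problems (empty if none found)."""
    groups = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(groups, list):
        return ["top level must be a list of target groups"]
    problems = []
    for i, group in enumerate(groups):
        if not isinstance(group, dict):
            problems.append(f"group {i}: must be an object")
            continue
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(
                isinstance(t, str) for t in targets):
            problems.append(f"group {i}: 'targets' must be a list of strings")
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
                isinstance(v, str) for v in labels.values()):
            problems.append(f"group {i}: 'labels' must map names to strings")
    return problems


sample = """
[
  {"targets": ["127.0.0.1:9104"],
   "labels": {"job": "job1", "service": "service1"}},
  {"targets": ["127.0.0.1:9105", "127.0.0.1:9106", "127.0.0.1:9107"],
   "labels": {"job": "job2", "service": "service2"}}
]
"""
print(check_target_groups(sample))
```

A missing comma between two targets is reported by `json.loads` as a `json.JSONDecodeError`, which is the most common mistake when such files are edited by hand.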

One more thing to note is how Prometheus reloads its configuration:

Prometheus can reload its configuration at runtime. If the new configuration is not well-formed, the changes will not be applied. A configuration reload is triggered by sending a SIGHUP to the Prometheus process or sending a HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled). This will also reload any configured rule files.

To reload the configuration by sending a request to the endpoint, Prometheus must be started with the --web.enable-lifecycle flag. There are two ways to add it:

Start from the command line:
nohup ./prometheus --web.enable-lifecycle --config.file=prometheus.yml &

Or modify the systemd unit file, /usr/lib/systemd/system/prometheus.service:
# -*- mode: conf -*-

[Unit]
Description=The Prometheus monitoring system and time series database.
Documentation=https://prometheus.io
After=network.target

[Service]
EnvironmentFile=-/etc/default/prometheus
User=prometheus
# Note: the --web.enable-lifecycle line below is not present by default; add it to enable the lifecycle API
ExecStart=/usr/bin/prometheus \
          --web.enable-lifecycle \
          --config.file=/etc/prometheus/prometheus.yml \
          --storage.tsdb.path=/var/lib/prometheus/data \
          --web.console.libraries=/usr/share/prometheus/console_libraries \
          --web.console.templates=/usr/share/prometheus/consoles \
          $PROMETHEUS_OPTS
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target

There are two ways to reload the Prometheus configuration:

1: Send a SIGHUP signal to the Prometheus main process:

kill -1 pid

2: Send a POST request to the reload endpoint:

curl -XPOST http://ip:9090/-/reload
# This method only works if Prometheus was started with the --web.enable-lifecycle flag described above
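
The same POST request can be sent from a script. The sketch below uses only the Python standard library; the helper names `build_reload_request` and `trigger_reload` and the address `http://localhost:9090` are assumptions for illustration, and the server must have been started with --web.enable-lifecycle for the request to succeed.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request for the /-/reload endpoint without sending it."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body is enough; the endpoint only cares about the method.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_reload(base_url, timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses and
    urllib.error.URLError if the server cannot be reached.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call against a locally running server (assumed address):
# trigger_reload("http://localhost:9090")
```

Building the request separately from sending it keeps the URL and method easy to inspect or log before anything touches the running server.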

Posted on Fri, 01 May 2020 06:45:17 -0700 by somenoise