Prometheus service discovery

1, Prometheus deployment

Environment preparation

hostnamectl set-hostname prometheus		 
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
vim /etc/resolv.conf						#Add the following DNS server line to the file
nameserver 114.114.114.114
ntpdate ntp1.aliyun.com 					#Synchronize the clock; Prometheus relies on accurate time, and clock skew causes problems

Unpack and start service

#Upload the installation package and extract it into the target directory
tar zxvf prometheus-2.27.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
cd prometheus-2.27.1.linux-amd64/
./prometheus

Open another terminal and check whether the port has been opened

[root@prometheus ~]#netstat -antp | grep 9090
tcp6       0      0 :::9090                 :::*                    LISTEN      2463/./prometheus   
tcp6       0      0 ::1:9090                ::1:53170               ESTABLISHED 2463/./prometheus   
tcp6       0      0 ::1:53170               ::1:9090                ESTABLISHED 2463/./prometheus

Visit 192.168.74.135:9090 in a browser (expression browser)

Visit 192.168.74.135:9090/metrics to view the internal metrics of Prometheus itself

2, Deploy and monitor other nodes

Host name	Address	Required installation package
prometheus	192.168.74.135	prometheus-2.27.1.linux-amd64.tar.gz
server1	192.168.74.122	node_exporter-1.1.2.linux-amd64.tar.gz
server2	192.168.74.128	node_exporter-1.1.2.linux-amd64.tar.gz
server3	192.168.74.131	node_exporter-1.1.2.linux-amd64.tar.gz

The Prometheus server was already deployed above, so it is not configured again here

1. Main configuration file analysis

cd prometheus-2.27.1.linux-amd64/
vim prometheus.yml
# my global config
global:                                      #Global settings
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
                                             #How often metrics are scraped; 1 minute if not set
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
                                             #How often the alerting rules are evaluated
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration                 #Connection to Alertmanager (the separate alerting component)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:                                  #Alerting rule files, written in YAML
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:                              #Scrape (data collection) settings
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  #Every series scraped by this job carries the job_name as its `job` label
  - job_name: 'prometheus'                   #Usable in PromQL label matchers, for example up{job="prometheus"}

    # metrics_path defaults to '/metrics'    #Path from which metrics are scraped
    # scheme defaults to 'http'.             #Metrics are fetched over http by default

    static_configs:                          #Static list of targets; Prometheus itself listens on port 9090 by default
    - targets: ['localhost:9090']
2. server node configuration

Upload the node_exporter package and extract it
tar zxvf node_exporter-1.1.2.linux-amd64.tar.gz
cd node_exporter-1.1.2.linux-amd64/
cp node_exporter /usr/local/bin/

Start the service

./node_exporter
netstat -antp | grep 9100

./node_exporter --help		 						#View the available command-line options
Managing node_exporter as a service with a systemd unit file
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
    --collector.ntp \
    --collector.mountstats \
    --collector.systemd \
    --collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always
[Install]
WantedBy=multi-user.target

Visit a server node (port 9100, path /metrics) to view the metrics it exposes
Visit the Prometheus server to view the same content there

3. Add the server nodes to monitoring

On the Prometheus server (192.168.74.135), stop prometheus, edit the configuration file and add the server nodes as static targets so that they can be scraped

cd /usr/local/prometheus-2.27.1.linux-amd64/
vim prometheus.yml		 						#Add the following at the end of the configuration file
  - job_name: 'nodes'
    static_configs:
    - targets:
      - 192.168.74.122:9100
      - 192.168.74.128:9100
      - 192.168.74.131:9100

./prometheus									#Start service

4. Verify whether the join is successful

3, Expression browser

1. General use of expression browser

Data filtering can be performed on the Prometheus UI console

View total CPU usage

node_cpu_seconds_total

Calculate the CPU idle rate in the past 5 minutes

irate(node_cpu_seconds_total{mode="idle"}[5m])

Explanation:
irate: rate function based on the last two samples in the range (very sensitive to short-term changes)
node_cpu_seconds_total: cumulative CPU time of the node in seconds (metric)
mode="idle": selects the idle time (label)
{mode="idle"}: the whole expression in braces is called a label matcher
[5m]: range selector covering the samples of the past 5 minutes, from which the rate is calculated for each series

Average CPU usage per host over the past 5 minutes

(1 - avg(irate(node_cpu_seconds_total{mode='idle'}[5m])) by (instance)) * 100

Explanation:
avg: average
avg(irate(node_cpu_seconds_total{mode='idle'}[5m])): can be understood as the fraction of time the CPUs are idle
by (instance): groups the result by instance, giving one value per node
(1 - avg(irate(node_cpu_seconds_total{mode='idle'}[5m])) by (instance)) * 100: average CPU utilization in percent over the 5-minute window
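
To make the arithmetic behind this expression concrete, here is a minimal Python sketch that reproduces it on hand-made numbers. It assumes the idle counter of each CPU is given as two (timestamp, value) samples, which mimics how irate works on the last two samples in the window. The function names and the sample data are hypothetical and not part of Prometheus.

```python
from statistics import mean

def instant_rate(prev, last):
    """Per-second increase between two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = prev, last
    return (v1 - v0) / (t1 - t0)

def cpu_usage_percent(idle_samples_per_cpu):
    """(1 - avg(idle rate over all CPUs of one instance)) * 100."""
    idle_rates = [instant_rate(prev, last) for prev, last in idle_samples_per_cpu]
    return (1 - mean(idle_rates)) * 100

# Hypothetical data: two CPUs on one instance, samples taken 15 s apart.
samples = [
    ((0, 1000.0), (15, 1012.0)),  # cpu0 was idle for 12 of the 15 seconds
    ((0, 2000.0), (15, 2009.0)),  # cpu1 was idle for 9 of the 15 seconds
]
usage = cpu_usage_percent(samples)
print(round(usage, 1))  # 30.0
```

The idle rates are 0.8 and 0.6, their average is 0.7, so the instance was busy about 30% of the time.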

Query the time series whose one-minute load average exceeds twice the number of CPUs of the host

node_load1 > on (instance) 2 * count (node_cpu_seconds_total{mode='idle'}) by (instance)

2. Memory utilization

node_memory_MemTotal_bytes
node_memory_MemFree_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes

#Calculate usage
 Available memory: the sum of the last three metrics above (MemFree + Buffers + Cached)
 Used memory: total memory minus available memory
 Usage: used memory divided by total memory
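
The three steps can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function name and the byte values are made up for illustration.

```python
def memory_usage_percent(total, free, buffers, cached):
    """Memory usage in percent, following the three steps described above."""
    available = free + buffers + cached   # available = MemFree + Buffers + Cached
    used = total - available              # used = MemTotal - available
    return used / total * 100             # usage = used / MemTotal

# Hypothetical values in bytes, as node_exporter would report them.
usage = memory_usage_percent(
    total=8_000_000_000,
    free=2_000_000_000,
    buffers=500_000_000,
    cached=1_500_000_000,
)
print(f"{usage:.1f}%")  # 50.0%
```

In the expression browser the same calculation can be written as
(node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100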

4, Service discovery

1. Prometheus service discovery

① File based service discovery:
A group of "child" target files in YAML (or JSON) format is defined; each file stores only the targets that need to be scraped. Prometheus picks up changes to these files dynamically, without a restart
② DNS based service discovery:
SRV records
③ API based service discovery:
Kubernetes, Consul, Azure
Relabeling:
target relabeling
metric relabeling
④ K8S based service discovery

2. prometheus service discovery mechanism

① Prometheus Server scrapes data using the pull model, so it must know the location of each target in advance before it can scrape the corresponding exporter or instrumentation
② In a small environment this is solved by listing every target in static_configs, which is the simplest configuration method. Each target is identified by a network endpoint (ip:port);
③ In medium and large environments, or in highly dynamic cloud environments, static configuration is hard to maintain. Prometheus therefore provides a set of service discovery mechanisms that use a service registry to automatically discover, detect and classify the targets that can be monitored, and to keep the target list up to date as targets change
④ In every scrape_interval, Prometheus checks the jobs to be executed. Each job first generates a target list from the discovery configuration specified on the job; this is the service discovery process. Service discovery returns a target list in which every target carries a group of metadata labels prefixed with "__meta_";
⑤ Service discovery also sets further labels with a "__" prefix and suffix, including "__scheme__", "__address__" and "__metrics_path__". According to the target configuration, they hold the protocol of the target (http or https, http by default), the address of the target, and the URI path of the metrics (/metrics by default);
⑥ If the URI path carries any parameters, they are stored in labels prefixed with "__param_". The target list and labels are returned to Prometheus, and some of the labels can be overridden in the configuration;
⑦ Configuration labels are reused during the scrape life cycle to generate other labels. For example, the default value of the instance label on a metric comes from the value of the __address__ label;
⑧ Prometheus provides an opportunity to relabel the discovered targets. Relabeling is defined in the relabel_configs section of the job and is commonly used to filter the targets to scrape and to rewrite their labels
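
The way these special labels fit together can be illustrated with a small Python model. It is a deliberately simplified sketch, not Prometheus code: it only shows how the scrape URL follows from __scheme__, __address__ and __metrics_path__, and how the instance label falls back to __address__ once the "__"-prefixed labels are dropped. The function names and the env label are hypothetical.

```python
def scrape_url(labels):
    """Build the scrape URL from the special labels, applying the defaults."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    return f"{scheme}://{labels['__address__']}{path}"

def final_labels(labels, job):
    """Labels attached to scraped series: '__'-prefixed labels are dropped,
    and instance defaults to the value of __address__."""
    result = {k: v for k, v in labels.items() if not k.startswith("__")}
    result.setdefault("instance", labels["__address__"])
    result.setdefault("job", job)
    return result

# A discovered target as service discovery might return it (values are examples).
discovered = {
    "__address__": "192.168.74.122:9100",
    "__meta_consul_service": "node01",
    "env": "test",  # hypothetical extra label
}
url = scrape_url(discovered)
labels_out = final_labels(discovered, job="nodes")
print(url)
print(labels_out)
```

Because labels starting with "__" do not survive, any metadata that should end up on the series has to be copied to a regular label during relabeling.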

3. Static configuration discovery

#On the Prometheus server, list the targets (address:port) under static_configs in the configuration file, as already done above; use the addresses of your own nodes
vim prometheus.yml		 						
  - job_name: 'nodes'
    static_configs:
    - targets:
      - 192.168.8.19:9100
      - 192.168.8.18:9100
      - 192.168.8.17:9100

4. Dynamic discovery

4.1 File based service discovery

File based service discovery is only a small step up from static configuration, but it does not depend on any platform or third-party service, which makes it the simplest and most common implementation.
The Prometheus server periodically loads target information from files (the job_name section of the server configuration points at the files that list the targets to pull). Only JSON and YAML files are supported; each file contains a list of targets and, optionally, labels for them
The configuration used below converts the default static configuration of Prometheus into the configuration required for file based service discovery
(Prometheus periodically re-reads these files, so targets are discovered and updated dynamically)
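
Since the target files are plain JSON or YAML, they can also be generated by a script. The following Python sketch writes one target group in the JSON form of the format (a list of groups, each with "targets" and optional "labels"). The file name nodes_centos.json, the app label and the helper name are assumptions made for this example; the walkthrough itself uses YAML files.

```python
import json
import os
import tempfile

def write_targets_file(path, targets, labels):
    """Write one file_sd target group to `path`, replacing any old file atomically."""
    groups = [{"targets": list(targets), "labels": dict(labels)}]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_path, path)
    return groups

# Demo in a scratch directory; in the setup described here it would be file_sd/targets/.
targets_dir = tempfile.mkdtemp()
targets_path = os.path.join(targets_dir, "nodes_centos.json")  # hypothetical file name
written = write_targets_file(
    targets_path,
    ["192.168.74.122:9100", "192.168.74.128:9100", "192.168.74.131:9100"],
    {"app": "node-exporter"},  # hypothetical label
)
with open(targets_path) as handle:
    print(handle.read())
```

Replacing the file in one step is a design choice: Prometheus may re-read the file at any moment, so it should never find a partially written one.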

① Environment preparation

cd /usr/local/prometheus-2.27.1.linux-amd64/
mkdir file_sd
cd file_sd
mkdir targets
#Upload the modified prometheus.yml to the file_sd directory
cd targets
#Upload nodes_centos.yaml and prometheus_server.yaml to the targets directory

Configuration file analysis
② Start Prometheus with the specified configuration file

./prometheus --config.file=./file_sd/prometheus.yml

③ Start node_exporter on the three server nodes

./node_exporter

④ Open http://192.168.74.135:9090/targets in a browser to view the targets

⑤ Open another terminal, add a node to the targets file, and check on the targets page whether it appears

4.2 Benefit of file based discovery

To add a node or a Prometheus server, you only need to add its address to nodes_centos.yaml or prometheus_server.yaml; the service does not need to be stopped

5. Automatic discovery based on DNS

DNS based service discovery periodically queries a group of DNS domain names to find the targets to be monitored. The DNS servers used for the queries are those specified in /etc/resolv.conf
This discovery mechanism relies on A, AAAA and SRV resource records and supports only these; the advanced DNS service discovery method of RFC 6763 is not supported
Note:

##SRV: an SRV record indicates the services provided under a domain name. Example:
_http._tcp.example.com. SRV 10 5 80 www.example.com.
Meaning of the fields after SRV:
10 - priority, similar to an MX record
5 - weight
80 - port
www.example.com - the host name that actually provides the service. An SRV record therefore also tells which port the service listens on
#Using the SRV records of the DNS service, Prometheus learns the host and port on which the exporter or instrumentation of each target is exposed
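
As an illustration of how the fields of such a record map to a scrape target, the Python sketch below splits the example record into its parts and joins host and port. It parses the textual form only and does not perform a DNS query; the SrvRecord type and parse_srv function are names invented for this example.

```python
from typing import NamedTuple

class SrvRecord(NamedTuple):
    name: str
    priority: int
    weight: int
    port: int
    target: str

def parse_srv(line):
    """Parse '<name> SRV <priority> <weight> <port> <target>' into its fields."""
    name, rtype, priority, weight, port, target = line.split()
    if rtype != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    # The trailing dot marks a fully qualified name and is not needed for the target.
    return SrvRecord(name, int(priority), int(weight), int(port), target.rstrip("."))

record = parse_srv("_http._tcp.example.com. SRV 10 5 80 www.example.com.")
scrape_target = f"{record.target}:{record.port}"
print(record)
print(scrape_target)  # www.example.com:80
```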

6. Discovery based on Consul

6.1 General

Consul is an open source tool written in Go (golang), aimed mainly at distributed and service-oriented systems. It provides service registration and discovery, health checks, key/value storage, multi-data-center support and distributed consistency guarantees
Principle:
Services that expose data to collect are registered in Consul through JSON definition files so that they can be discovered automatically. Prometheus acts as a client that retrieves the services registered in Consul and then scrapes data from them

6.2 deployment and installation

Prometheus automatically discovers the host list through Consul

Idea:
The prometheus-servers.json file holds the host information of the Prometheus server, including the tag "prometheus". Consul loads this file and exposes the registered service on port 8500. The Prometheus yml file defines two jobs, "prometheus" and "nodes", both of which point at Consul at 192.168.74.135:8500. Prometheus periodically queries Consul on port 8500 for services tagged prometheus and obtains their host information there. Once a service is found, Prometheus scrapes it directly at http://192.168.74.135:9090/metrics, and the data can be viewed in the expression browser

① Install Consul 1.9.0

[root@prometheus ~]#wget http://101.34.22.188/consul/consul_1.9.0_linux_amd64.zip &> /dev/null
[root@prometheus ~]#ls
consul_1.9.0_linux_amd64.zip
[root@prometheus ~]#unzip consul_1.9.0_linux_amd64.zip -d /usr/local/bin/
Archive:  consul_1.9.0_linux_amd64.zip
  inflating: /usr/local/bin/consul

② Start developer mode

Consul developer mode quickly starts a fully functional single-node Consul service, which is convenient for development and testing

[root@prometheus ~]#mkdir -pv /consul/data
mkdir: created directory '/consul'
mkdir: created directory '/consul/data'
[root@prometheus ~]#mkdir /etc/consul
[root@prometheus ~]#cd /etc/consul/
[root@prometheus /etc/consul]#consul agent -dev -ui -data-dir=/consul/data/ -config-dir=/etc/consul/ -client=0.0.0.0
......
#Parameter explanation
consul agent 		#Run the Consul agent
-dev    			#Developer mode
-ui 				#Enable the web UI
-data-dir 			#Location of the data files
-config-dir 		#Location of the Consul configuration files
-client 			#Address to bind the client interfaces to; 0.0.0.0 listens on all interfaces

③ Edit the prometheus-servers.json configuration file in the /etc/consul directory

[root@prometheus ~]#vim /etc/consul/prometheus-servers.json

{
  "services": [
    {
      "id": "prometheus-server-node01",
      "name": "prom-server-node01",
      "address": "192.168.74.135",
      "port": 9090,
      "tags": ["prometheus"],
      "checks": [{
        "http": "http://192.168.74.135:9090/metrics",
        "interval": "5s"
      }]
    }
  ]
}

[root@prometheus ~]#consul reload
Configuration reload triggered
[root@prometheus ~]#netstat -antp |grep consul
tcp        0      0 127.0.0.1:8300          0.0.0.0:*               LISTEN      64781/consul        
tcp        0      0 127.0.0.1:8301          0.0.0.0:*               LISTEN      64781/consul        
tcp        0      0 127.0.0.1:8302          0.0.0.0:*               LISTEN      64781/consul        
tcp        0      0 127.0.0.1:45987         127.0.0.1:8300          ESTABLISHED 64781/consul        
tcp        0      0 127.0.0.1:8300          127.0.0.1:45987         ESTABLISHED 64781/consul        
tcp6       0      0 :::8600                 :::*                    LISTEN      64781/consul        
tcp6       0      0 :::8500                 :::*                    LISTEN      64781/consul        
tcp6       0      0 :::8502                 :::*                    LISTEN      64781/consul

④ Terminate the Prometheus service first and modify the configuration file

[root@prometheus ~]#ps aux|grep prometheus
root      63526  0.3  2.4 1114924 94196 pts/1   Sl+  12:10   0:22 ./prometheus --config.file=./file_sd/prometheus.yml
root      64823  0.0  0.0 112728   976 pts/2    S+   14:04   0:00 grep --color=auto prometheus
[root@prometheus ~]#kill -9 63526
[root@prometheus ~]#ps aux|grep prometheus
root      64826  0.0  0.0 112728   976 pts/2    S+   14:04   0:00 grep --color=auto prometheus
[root@prometheus ~]#cd /usr/local/prometheus-2.27.1.linux-amd64/
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#mkdir consul_sd
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#ls
console_libraries  consoles  consul_sd  data  file_sd  LICENSE  nohup.out  NOTICE  prometheus  prometheus.yml  promtool
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#cd consul_sd/
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64/consul_sd]#wget http://101.34.22.188/consul/prometheus/prometheus.yml
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64/consul_sd]#cat prometheus.yml 
# my global config
# Author: MageEdu <mage@magedu.com>
# Repo: http://gitlab.magedu.com/MageEdu/prometheus-configs/
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    consul_sd_configs:
    - server: "192.168.10.20:8500"
      tags:
      - "prometheus"
      refresh_interval: 2m 

  # All nodes
  - job_name: 'nodes'
    consul_sd_configs:
    - server: "192.168.10.20:8500"
      tags:
      - "nodes"
      refresh_interval: 2m 

#Remember to change the server address to that of your own Consul agent (192.168.74.135:8500 in this setup)
#Run Prometheus with this configuration file. It can also be run in the background with nohup and &
[root@prometheus /usr/local/prometheus-2.27.1.linux-amd64]#./prometheus --config.file=./consul_sd/prometheus.yml
......

⑤ Open http://192.168.74.135:8500 in a browser and check whether the node has joined

[root@prometheus /etc/consul]#wget http://101.34.22.188/consul/prometheus/nodes.json
......
[root@prometheus /etc/consul]#cat nodes.json 
{
  "services": [
    {
      "id": "node_exporter-node01",
      "name": "node01",
      "address": "192.168.74.131",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.74.131:9100/metrics",
        "interval": "5s"
      }]
    },
    {
      "id": "node_exporter-node02",
      "name": "node02",
      "address": "192.168.74.122",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.74.122:9100/metrics",
        "interval": "5s"
      }]
    }
  ]
}

[root@prometheus /etc/consul]#consul reload
Configuration reload triggered

7. Grafana deployment and template display

7.1 Grafana overview

Grafana is a general-purpose visualization tool written in Go. It can load and display data from many different data sources. Storage systems that can serve as its data sources include:
① TSDB: Prometheus, InfluxDB, OpenTSDB and Graphite
② Log and document storage: Loki and Elasticsearch
③ Distributed request tracing: Zipkin, Jaeger, and Tempo
④ SQL DB: MySQL, PostgreSQL and Microsoft SQL Server
Grafana listens on TCP port 3000 by default, supports integration with other authentication services, and exposes its built-in metrics at /metrics

Basic concepts:
① Data Source: the storage system that provides the data to display
② Dashboard: a visual panel for organizing and managing data
③ Team and users: management capabilities at the enterprise organization level

7.2 installation

wget http://101.34.22.188/grafana/grafana-7.3.6-1.x86_64.rpm
yum install -y grafana-7.3.6-1.x86_64.rpm
systemctl enable grafana-server	&& systemctl start grafana-server
netstat -nuptl|grep 3000

Open http://IP:3000 in a browser; the default username and password are both admin