Prometheus installation and configuration

I. Prometheus installation and configuration

1. Download and unzip the installation package

cd /usr/local/src/

export VER="2.13.1"
wget https://github.com/prometheus/prometheus/releases/download/v${VER}/prometheus-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus 
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus
 
tar -xvf prometheus-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv prometheus-${VER}.linux-amd64 /data0/prometheus/prometheus_server
 
cd /data0/prometheus/prometheus_server/
mkdir -p {data,config,logs,bin} 
mv prometheus promtool bin/
mv prometheus.yml config/
 
chown -R prometheus.prometheus /data0/prometheus

2. Set environment variables

vim /etc/profile

export PATH=/data0/prometheus/prometheus_server/bin:$PATH:$HOME/bin

source /etc/profile

3. Check the configuration file

promtool check config /data0/prometheus/prometheus_server/config/prometheus.yml

Checking /data0/prometheus/prometheus_server/config/prometheus.yml
  SUCCESS: 0 rule files found

4. Create the systemd unit file of prometheus.service

  • 4.1. Run Prometheus as a regular systemd service
sudo tee /etc/systemd/system/prometheus.service <<-'EOF'
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention=60d
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF

systemctl enable prometheus.service
systemctl stop prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service
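systemctl returns before Prometheus is actually serving requests, so deploy scripts often poll the health endpoint before moving on. A minimal sketch (assuming curl is available and the /-/healthy probe that Prometheus 2.x serves on its listen port):

```shell
# Poll a URL until it responds, or give up after a number of attempts.
wait_for_url() {
  url=$1; tries=${2:-30}; i=0
  while [ "$i" -lt "$tries" ]; do
    # -s silent, -f fail on HTTP errors
    if curl -sf -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1)); sleep 1
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Assumed endpoint: Prometheus 2.x serves a health probe at /-/healthy.
# wait_for_url "http://localhost:9090/-/healthy"
```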
  • 4.2. Use Supervisor to manage the Prometheus server
yum install -y epel-release supervisor

sudo tee /etc/supervisord.d/prometheus.ini<<-"EOF"
[program:prometheus]
# Command used to start the program;
command = /data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention=60d
# Start automatically when supervisord starts;
autostart = true
# Restart automatically if the program exits abnormally;
autorestart = true
# Consider the start successful if the process stays up for 5 seconds;
startsecs = 5
# Number of restart attempts on startup failure; default 3;
startretries = 3
# User to run the program as;
user = prometheus
# Redirect stderr to stdout; false by default;
redirect_stderr = true
# Standard output log file;
stdout_logfile=/data0/prometheus/prometheus_server/logs/out-prometheus.log
# Error log file;
stderr_logfile=/data0/prometheus/prometheus_server/logs/err-prometheus.log
# Maximum size of the stdout log file; default 50MB;
stdout_logfile_maxbytes = 20MB
# Number of stdout log file backups to keep;
stdout_logfile_backups = 20
EOF

systemctl daemon-reload
systemctl enable supervisord
systemctl stop supervisord
systemctl restart supervisord
supervisorctl restart prometheus
supervisorctl status

5. prometheus.yml configuration file

#Create the Alertmanager alert rule files
mkdir -p /data0/prometheus/prometheus_server/rules/
touch /data0/prometheus/prometheus_server/rules/node_down.yml
touch /data0/prometheus/prometheus_server/rules/memory_over.yml
touch /data0/prometheus/prometheus_server/rules/disk_over.yml
touch /data0/prometheus/prometheus_server/rules/cpu_over.yml

#Prometheus main configuration file
cat > /data0/prometheus/prometheus_server/config/prometheus.yml << \EOF
# my global config
global:
  scrape_interval: 15s # Set the pull interval, which is 1m by default
  evaluation_interval: 15s # Set the rule evaluation interval, which is 1m by default
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.56.11:9093 # Change here to the address of alertmanagers

# Load rules and evaluate them periodically at set intervals
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
  - "/data0/prometheus/prometheus_server/rules/node_down.yml"                 # Instance survival alarm rule file
  - "/data0/prometheus/prometheus_server/rules/memory_over.yml"               # Memory alarm rule file
  - "/data0/prometheus/prometheus_server/rules/disk_over.yml"                 # Disk alarm rule file
  - "/data0/prometheus/prometheus_server/rules/cpu_over.yml"                  # cpu alarm rule file

# Scrape (monitoring target) configuration
# By default only Prometheus itself is scraped
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    # Override the global scrape interval for this job (15s -> 10s).
    scrape_interval: 10s

    static_configs:
      - targets: ['localhost:9090', 'localhost:9100']

  - job_name: 'DMC_HOST'
    file_sd_configs:
      - files: ['./hosts.json']  
      # Instead of listing every host under static_configs, targets are loaded from a file via file_sd_configs
      # The file may be JSON or YAML; JSON is used here. "targets" holds the scrape addresses; "labels" are optional
EOF
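After editing prometheus.yml or the rule files, Prometheus can re-read its configuration without a restart. A minimal sketch: SIGHUP triggers a reload; POSTing to /-/reload is an alternative, but only when Prometheus is started with --web.enable-lifecycle, which the unit file above does not pass.

```shell
# Ask a running Prometheus to re-read prometheus.yml and the rule files.
reload_prometheus() {
  pid=$(pgrep -x prometheus) || { echo "prometheus is not running" >&2; return 1; }
  kill -HUP $pid && echo "sent SIGHUP to $pid"
}

# Alternative, only with --web.enable-lifecycle on the command line:
# curl -X POST http://localhost:9090/-/reload
```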

#Host list read via file_sd_configs
cat > /data0/prometheus/prometheus_server/config/hosts.json << \EOF
[
  {
    "targets": [
      "192.168.56.11:9100",
      "192.168.56.12:9100",
      "192.168.56.13:9100"
    ],
    "labels": {
      "service": "db_node"
    }
  },
  {
    "targets": [
      "192.168.56.14:9100",
      "192.168.56.15:9100",
      "192.168.56.16:9100"
    ],
    "labels": {
      "service": "web_node"
    }
  }
]
EOF
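Prometheus watches the files named in file_sd_configs for changes and keeps the previous target set if a new version fails to parse, so it pays to validate the JSON before moving it into place. A sketch using Python's stdlib checker on a sample that mirrors the file above:

```shell
# Validate a file_sd target file before installing it.
tmp=$(mktemp)
cat > "$tmp" <<'JSON'
[
  {"targets": ["192.168.56.11:9100"], "labels": {"service": "db_node"}}
]
JSON

if python3 -m json.tool "$tmp" >/dev/null; then
  echo "targets file OK"
else
  echo "targets file is invalid JSON" >&2
fi
rm -f "$tmp"
```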

# Server survival alarm
cat > /data0/prometheus/prometheus_server/rules/node_down.yml <<\EOF
groups:
- name: Instance survival warning rules
  rules:
  - alert: Instance survival alarm
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
EOF

# mem alarm
cat > /data0/prometheus/prometheus_server/rules/memory_over.yml <<\EOF
groups:
- name: Memory alarm rules
  rules:
  - alert: Memory usage alarm
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes )) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Memory usage is over 80% (current value: {{ $value }}%)"
EOF

# disk alarm
cat > /data0/prometheus/prometheus_server/rules/disk_over.yml <<\EOF
groups:
- name: Disk alarm rules
  rules:
  - alert: Disk usage alarm
    expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Disk usage is over 80% (mount point: {{ $labels.mountpoint }}, current value: {{ $value }}%)"
EOF

# cpu alarm
cat > /data0/prometheus/prometheus_server/rules/cpu_over.yml <<\EOF
groups:
- name: CPU Alarm rules
  rules:
  - alert: CPU Usage alarm
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "CPU usage is over 90% (current value: {{ $value }}%)"
EOF
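Before wiring these files into prometheus.yml it is worth checking them. `promtool check rules` is the authoritative check; the sketch below uses it when it is on the PATH and otherwise falls back to a crude grep for the required keys (RULE_DIR defaults to the directory used in this guide):

```shell
# Check alert rule files: promtool when available, a key grep otherwise.
RULE_DIR=${RULE_DIR:-/data0/prometheus/prometheus_server/rules}

check_rule_file() {
  f=$1
  if command -v promtool >/dev/null 2>&1; then
    promtool check rules "$f"
  elif grep -q 'alert:' "$f" && grep -q 'expr:' "$f"; then
    # Crude fallback: every alerting rule needs a name and an expression.
    echo "$f: looks like a rule file"
  else
    echo "$f: missing alert/expr" >&2
    return 1
  fi
}

for f in "$RULE_DIR"/*.yml; do
  [ -e "$f" ] && check_rule_file "$f"
done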

6. Check the web UI

Prometheus ships with a simple built-in web UI at http://192.168.56.11:9090/

http://192.168.56.11:9090/targets
http://192.168.56.11:9090/graph

II. node_exporter installation and configuration

1. Download and unzip the installation package

cd /usr/local/src/

export VER="0.18.1"
wget https://github.com/prometheus/node_exporter/releases/download/v${VER}/node_exporter-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus 
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus
 
tar -xvf node_exporter-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv node_exporter-${VER}.linux-amd64 /data0/prometheus/node_exporter
 
chown -R prometheus.prometheus /data0/prometheus

2. Create the systemd unit file of node_exporter.service

  • Create the service on CentOS
cat > /usr/lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/node_exporter/node_exporter
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF
  • Create the service on Ubuntu
cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
 
[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/node_exporter/node_exporter
Restart=on-failure
 
[Install]
WantedBy=multi-user.target
EOF

3. Start service

systemctl daemon-reload
systemctl stop node_exporter.service
systemctl enable node_exporter.service
systemctl restart node_exporter.service

4. Check the service status

systemctl status node_exporter.service

5. Verify the exporter metrics

Visit http://192.168.56.11:9100/metrics to see the specific metrics the exporter exposes.
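The /metrics endpoint returns plain text, one `metric_name{labels} value` per line, with # HELP and # TYPE comments. A sketch that extracts a single series with awk; the sample lines below are illustrative, not captured output:

```shell
# node_exporter metrics are plain text lines: name{labels} value.
cat > /tmp/metrics.sample <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemTotal_bytes 8.24e+09
EOF

# Print the value of node_load1, skipping the # comment lines.
awk '$1 == "node_load1" {print $2}' /tmp/metrics.sample
```

Against a live host, pipe the endpoint through the same filter: curl -s http://192.168.56.11:9100/metrics | awk '$1 == "node_load1" {print $2}'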

III. Alertmanager deployment with DingTalk alerting

1. Download and unzip the installation package

cd /usr/local/src/

export VER="0.19.0"
wget https://github.com/prometheus/alertmanager/releases/download/v${VER}/alertmanager-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus 
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus
 
tar -xvf alertmanager-${VER}.linux-amd64.tar.gz
cd /usr/local/src/
mv alertmanager-${VER}.linux-amd64 /data0/prometheus/alertmanager
 
chown -R prometheus.prometheus /data0/prometheus

2. Configure Alertmanager

Alertmanager's webhook integration is used to send alerts to DingTalk. The DingTalk robot API has strict requirements on the message format, so alerts can only be delivered after a format conversion. A ready-made conversion plugin exists, so we use it directly: https://github.com/timonwong/prometheus-webhook-dingtalk
cat >/data0/prometheus/alertmanager/alertmanager.yml<<-"EOF"
# Global configuration item
global:
  resolve_timeout: 5m # Time after which an alert is declared resolved if it is no longer reported; default 5m

# Define routing tree information
route:
  group_by: [alertname]  # Group alerts by these labels
  receiver: ops_notify   # Default receiver
  group_wait: 30s        # How long to wait before sending the first notification for a new group
  group_interval: 60s    # How long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h    # Minimum interval before re-sending a still-firing alert; default 1h
  routes:

  - receiver: ops_notify  # Basic alarm notice
    group_wait: 10s
    match_re:
      alertname: Instance survival alarm|Disk usage alarm   # Regex-match the alert names defined in the rule files

  - receiver: info_notify  # Message alert notification
    group_wait: 10s
    match_re:
      alertname: Memory usage alarm|CPU Usage alarm

# Define basic alarm receiver
receivers:
- name: ops_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/ops_dingding/send 
    send_resolved: true  # Notify when the alert is resolved

# Define message alarm receiver
- name: info_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/info_dingding/send 
    send_resolved: true

# An inhibition rule mutes alerts matching one set of matchers while an alert matching another set of matchers is already firing; the labels listed under 'equal' must have identical values on both alerts.
inhibit_rules: 
  - source_match: 
      severity: 'critical' 
    target_match: 
      severity: 'warning' 
    equal: ['alertname', 'dev', 'instance']
EOF

3. Start alertmanager

cat >/lib/systemd/system/alertmanager.service<<\EOF
[Unit]
Description=Prometheus: the alerting system
Documentation=http://prometheus.io/docs/
After=prometheus.service

[Service]
ExecStart=/data0/prometheus/alertmanager/alertmanager --config.file=/data0/prometheus/alertmanager/alertmanager.yml
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl enable alertmanager.service
systemctl stop alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service

#View port
netstat -anpt | grep 9093
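With the service listening, pushing a synthetic alert at Alertmanager's HTTP API exercises routing and the DingTalk webhook end to end. A sketch (the /api/v1/alerts endpoint and port 9093 are the defaults for this 0.19-era release; the alertname is chosen to match the ops_notify route above):

```shell
# Build a one-alert payload; alertname matches the ops_notify route.
payload='[{
  "labels": {"alertname": "Instance survival alarm", "severity": "warning"},
  "annotations": {"description": "manual test alert"}
}]'

# Validate the JSON locally before sending it.
echo "$payload" | python3 -m json.tool >/dev/null && echo "payload OK"

# Against the live Alertmanager:
# curl -XPOST -H 'Content-Type: application/json' \
#      -d "$payload" http://192.168.56.11:9093/api/v1/alerts
```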

4. Connect DingTalk to the Prometheus Alertmanager webhook

#Test from the command line whether the robot can deliver messages. prometheus-webhook-dingtalk sometimes returns a 422 error because of DingTalk's security settings (with a keyword security policy, a message is only delivered if it contains the configured keyword, here "prometheus").
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9

curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223
  • 4.1. Deploy the plugin from the binary package
cd /usr/local/src/
export VER="0.3.0"
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v${VER}/prometheus-webhook-dingtalk-${VER}.linux-amd64.tar.gz
tar -zxvf prometheus-webhook-dingtalk-${VER}.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-${VER}.linux-amd64 /data0/prometheus/alertmanager/prometheus-webhook-dingtalk

#Usage: prometheus-webhook-dingtalk --ding.profile=<receiver-name>=<webhook-url>

cat > /etc/systemd/system/prometheus-webhook-dingtalk.service<<\EOF
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/data0/prometheus/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk \
          --ding.profile=ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9 \
          --ding.profile=info_dingding=https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223          

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl stop prometheus-webhook-dingtalk
systemctl restart prometheus-webhook-dingtalk
systemctl status prometheus-webhook-dingtalk

netstat -nltup|grep 8060
  • 4.2. Deploy the plugin with Docker
docker pull timonwong/prometheus-webhook-dingtalk:v0.3.0

#docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 --ding.profile="<web-hook-name>=<dingtalk-webhook>"

docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9" --ding.profile="info_dingding=https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223"

Two variables are explained here:

<web-hook-name>: prometheus-webhook-dingtalk supports multiple DingTalk webhooks; each name maps to one URL. To configure several webhooks, pass --ding.profile multiple times, for example: docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=token1" --ding.profile="webhook2=https://oapi.dingtalk.com/robot/send?access_token=token2". The name-to-URL mapping works as follows: for --ding.profile="webhook1=...", the corresponding API endpoint is http://localhost:8060/dingtalk/webhook1/send

<dingtalk-webhook>: the DingTalk robot webhook URL obtained earlier
  • 4.3. Build the plugin from source
#Install the Go environment
cd /usr/local/src/
wget https://dl.google.com/go/go1.13.4.linux-amd64.tar.gz
tar -zxvf go1.13.4.linux-amd64.tar.gz
mv go/ /usr/local/

#vim /etc/profile
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin

#Add the GOPATH environment variable
mkdir -p /opt/path
export GOPATH=/opt/path

#If $GOPATH/bin is not added to $PATH, put the built binary somewhere on $PATH (or in $GOBIN)
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
source /etc/profile

#Download plugins
cd /usr/local/src/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
go get github.com/timonwong/prometheus-webhook-dingtalk/cmd/prometheus-webhook-dingtalk
make   #(on success, a prometheus-webhook-dingtalk binary is generated)

#Copy the DingTalk alert plugin binary to the alertmanager directory
cp prometheus-webhook-dingtalk /data0/prometheus/alertmanager/

#Start the service
nohup /data0/prometheus/alertmanager/prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9" --ding.profile="info_dingding=https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223" >/tmp/dingding.log 2>&1 &

#Check port
netstat -anpt | grep 8060

IV. Grafana installation and configuration

1. Download and install

cd /usr/local/src/

export VER="6.4.3"
wget https://dl.grafana.com/oss/release/grafana-${VER}-1.x86_64.rpm
yum localinstall -y grafana-${VER}-1.x86_64.rpm

2. Start service

systemctl daemon-reload
systemctl enable grafana-server.service
systemctl stop grafana-server.service
systemctl restart grafana-server.service

3. Access the web UI

Default account/password: admin/admin at http://192.168.56.11:3000

4. Grafana adds data source

After logging in, click "Add data source" to open the data source page and configure it as follows:
Name: prometheus
Type: prometheus
URL: http://192.168.56.11:9090
Access: Server
Leave "Default" unchecked, keep the other defaults, and click "Add".
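The data-source step can also be scripted via Grafana's HTTP API (POST /api/datasources, with basic auth against the default admin/admin account). A sketch that mirrors the settings above; in the API, the UI's Access mode "Server" is expressed as "access": "proxy":

```shell
# Create the Prometheus data source through Grafana's HTTP API instead
# of the UI; "access": "proxy" corresponds to "Server" in the form.
ds='{
  "name": "prometheus",
  "type": "prometheus",
  "url": "http://192.168.56.11:9090",
  "access": "proxy",
  "isDefault": false
}'

# Validate the payload locally first.
echo "$ds" | python3 -m json.tool >/dev/null && echo "datasource JSON OK"

# Against a live Grafana (default admin/admin credentials assumed):
# curl -s -u admin:admin -H 'Content-Type: application/json' \
#      -d "$ds" http://192.168.56.11:3000/api/datasources
```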

Install the pie chart plugin:
grafana-cli plugins install grafana-piechart-panel
systemctl restart grafana-server.service

After installation, verify that pie chart panels can be added normally.

Install the Consul data source plugin:
grafana-cli plugins install sbueringer-consul-datasource
systemctl restart grafana-server.service

V. Importing Grafana dashboards

https://grafana.com/dashboards

https://grafana.com/grafana/dashboards/11074 basic host monitoring (newer)

https://grafana.com/dashboards/8919 basic host monitoring

https://grafana.com/dashboards/7362 database monitoring

Reference documents:

https://www.jianshu.com/p/e59cfd15612e CentOS 7: deploying Prometheus, Alertmanager, and Grafana to monitor Linux hosts

https://juejin.im/entry/5c2c4a7f6fb9a049b82a90ee Monitoring Ceph with Prometheus

https://blog.csdn.net/xiegh2014/article/details/84936174 CentOS 7.5: Prometheus 2.5 + Grafana 5.4 monitoring deployment

https://www.cnblogs.com/smallSevens/p/7805842.html Grafana + Prometheus: building a comprehensive monitoring system

https://www.cnblogs.com/sfnz/p/6566951.html Installing Prometheus + Grafana to monitor MySQL, Redis, Kubernetes, etc.

https://blog.csdn.net/hzs33/article/details/86553259 Prometheus + Grafana monitoring for MySQL and Canal servers


Added by corcode on Tue, 12 Nov 2019 13:48:18 +0200