I. Prometheus installation and configuration
1. Download and unzip the installation package
cd /usr/local/src/
export VER="2.13.1"
wget https://github.com/prometheus/prometheus/releases/download/v${VER}/prometheus-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus

tar -xvf prometheus-${VER}.linux-amd64.tar.gz
mv prometheus-${VER}.linux-amd64 /data0/prometheus/prometheus_server
cd /data0/prometheus/prometheus_server/
mkdir -p {data,config,logs,bin}
mv prometheus promtool bin/
mv prometheus.yml config/
chown -R prometheus:prometheus /data0/prometheus
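Before going further, it is worth confirming that the unpacked binaries actually run on this host; a quick sanity check:

/data0/prometheus/prometheus_server/bin/prometheus --version
/data0/prometheus/prometheus_server/bin/promtool --version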
2. Set environment variables
vim /etc/profile
PATH=/data0/prometheus/prometheus_server/bin:$PATH:$HOME/bin
source /etc/profile
3. Check the configuration file
promtool check config /data0/prometheus/prometheus_server/config/prometheus.yml

Checking /data0/prometheus/prometheus_server/config/prometheus.yml
  SUCCESS: 0 rule files found
4. Create the systemd unit file for prometheus.service
- 4.1. Run as a regular systemd service
sudo tee /etc/systemd/system/prometheus.service <<-'EOF'
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/prometheus_server/bin/prometheus \
  --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml \
  --storage.tsdb.path=/data0/prometheus/prometheus_server/data \
  --storage.tsdb.retention=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service
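If the unit fails to start, the service logs are the first place to look; for example:

journalctl -u prometheus.service -f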
- 4.2. Use supervisor to manage the Prometheus server
yum install -y epel-release supervisor

sudo tee /etc/supervisord.d/prometheus.ini <<-"EOF"
[program:prometheus]
# Command used to start the program
command = /data0/prometheus/prometheus_server/bin/prometheus --config.file=/data0/prometheus/prometheus_server/config/prometheus.yml --storage.tsdb.path=/data0/prometheus/prometheus_server/data --storage.tsdb.retention=60d
# Start automatically when supervisord starts
autostart = true
# Restart automatically if the program exits abnormally
autorestart = true
# If the program runs for 5 seconds without an abnormal exit, the start is considered successful
startsecs = 5
# Number of automatic retries on startup failure; default 3
startretries = 3
# User that runs the program
user = prometheus
# Redirect stderr to stdout; default false
redirect_stderr = true
# Standard log output
stdout_logfile=/data0/prometheus/prometheus_server/logs/out-prometheus.log
# Error log output
stderr_logfile=/data0/prometheus/prometheus_server/logs/err-prometheus.log
# Standard log file size before rotation; default 50MB
stdout_logfile_maxbytes = 20MB
# Number of standard log file backups to keep
stdout_logfile_backups = 20
EOF

systemctl daemon-reload
systemctl enable supervisord
systemctl restart supervisord
supervisorctl restart prometheus
supervisorctl status
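To follow the process output under supervisor, tail either the log stream or the log file configured above:

supervisorctl tail -f prometheus
tail -f /data0/prometheus/prometheus_server/logs/out-prometheus.log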
5. prometheus.yml configuration file
# Create the alert rule files for Alertmanager
mkdir -p /data0/prometheus/prometheus_server/rules/
touch /data0/prometheus/prometheus_server/rules/node_down.yml
touch /data0/prometheus/prometheus_server/rules/memory_over.yml
touch /data0/prometheus/prometheus_server/rules/disk_over.yml
touch /data0/prometheus/prometheus_server/rules/cpu_over.yml

# Prometheus configuration file
cat > /data0/prometheus/prometheus_server/config/prometheus.yml << \EOF
# my global config
global:
  scrape_interval: 15s     # Scrape interval; the default is 1m
  evaluation_interval: 15s # Rule evaluation interval; the default is 1m
  # scrape_timeout is set to the global default (10s).

# Alerting configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.56.11:9093   # Change this to the address of your Alertmanager

# Load rules and evaluate them periodically at the set interval
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/data0/prometheus/prometheus_server/rules/node_down.yml"    # Instance survival alert rules
  - "/data0/prometheus/prometheus_server/rules/memory_over.yml"  # Memory alert rules
  - "/data0/prometheus/prometheus_server/rules/disk_over.yml"    # Disk alert rules
  - "/data0/prometheus/prometheus_server/rules/cpu_over.yml"     # CPU alert rules

# Scrape (monitoring target) configuration.
# By default only the Prometheus host itself is monitored.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # Override the global scrape interval from 15s to 10s for this job.
    scrape_interval: 10s
    static_configs:
    - targets: ['localhost:9090', 'localhost:9100']

  - job_name: 'DMC_HOST'
    file_sd_configs:
    - files: ['./hosts.json']
    # Instead of listing every host under static_configs, targets are loaded from a
    # file via file_sd_configs. The file may be JSON or YAML; JSON is used here. Put
    # the address of each monitored machine in "targets"; "labels" are optional.
EOF

# Host list loaded via file_sd_configs
cat > /data0/prometheus/prometheus_server/config/hosts.json << \EOF
[
  {
    "targets": [ "192.168.56.11:9100", "192.168.56.12:9100", "192.168.56.13:9100" ],
    "labels": { "service": "db_node" }
  },
  {
    "targets": [ "192.168.56.14:9100", "192.168.56.15:9100", "192.168.56.16:9100" ],
    "labels": { "service": "web_node" }
  }
]
EOF

# Instance survival alert
cat > /data0/prometheus/prometheus_server/rules/node_down.yml <<\EOF
groups:
- name: Instance survival warning rules
  rules:
  - alert: Instance survival alarm
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
EOF

# Memory alert
cat > /data0/prometheus/prometheus_server/rules/memory_over.yml <<\EOF
groups:
- name: Memory alarm rules
  rules:
  - alert: Memory usage alarm
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "The server: memory usage is over 80%! (Current value: {{ $value }}%)"
EOF

# Disk alert
cat > /data0/prometheus/prometheus_server/rules/disk_over.yml <<\EOF
groups:
- name: Disk alarm rules
  rules:
  - alert: Disk usage alarm
    expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "The server: disk usage is over 80%! (Mount point: {{ $labels.mountpoint }}, current value: {{ $value }}%)"
EOF

# CPU alert
cat > /data0/prometheus/prometheus_server/rules/cpu_over.yml <<\EOF
groups:
- name: CPU alarm rules
  rules:
  - alert: CPU Usage alarm
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "The server: CPU usage is over 90%! (Current value: {{ $value }}%)"
EOF
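After writing the rule files, it is worth validating them before reloading Prometheus; a minimal check with the promtool installed earlier:

promtool check rules /data0/prometheus/prometheus_server/rules/*.yml
promtool check config /data0/prometheus/prometheus_server/config/prometheus.yml
# Reload the running server without a restart:
kill -HUP $(pgrep -f prometheus_server/bin/prometheus)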
6. Check the UI
Prometheus ships with a simple built-in UI: http://192.168.56.11:9090/
http://192.168.56.11:9090/targets
http://192.168.56.11:9090/graph
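Besides the UI, the HTTP API can be queried directly; a quick smoke test that the configured targets report themselves as up (server address as above):

curl 'http://192.168.56.11:9090/api/v1/query?query=up'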
II. node_exporter installation and configuration
1. Download and unzip the installation package
cd /usr/local/src/
export VER="0.18.1"
wget https://github.com/prometheus/node_exporter/releases/download/v${VER}/node_exporter-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus

tar -xvf node_exporter-${VER}.linux-amd64.tar.gz
mv node_exporter-${VER}.linux-amd64 /data0/prometheus/node_exporter
chown -R prometheus:prometheus /data0/prometheus
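As with the server, a quick check that the binary runs:

/data0/prometheus/node_exporter/node_exporter --version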
2. Create the systemd unit file for node_exporter.service
- Create the service on CentOS
cat > /usr/lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
- Create the service on Ubuntu
cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/data0/prometheus/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
3. Start the service
systemctl daemon-reload
systemctl enable node_exporter.service
systemctl restart node_exporter.service
4. Check the service status
systemctl status node_exporter.service
5. Metrics reported by the monitored node
Visit http://192.168.56.11:9100/metrics to view the metrics that can be retrieved from the exporter.
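The same check from the shell, picking one metric that node_exporter exposes by default:

curl -s http://192.168.56.11:9100/metrics | grep '^node_load1'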
III. Deploying Alertmanager with DingTalk alerts
1. Download and unzip the installation package
cd /usr/local/src/
export VER="0.19.0"
wget https://github.com/prometheus/alertmanager/releases/download/v${VER}/alertmanager-${VER}.linux-amd64.tar.gz

mkdir -p /data0/prometheus
groupadd prometheus
useradd -g prometheus prometheus -d /data0/prometheus

tar -xvf alertmanager-${VER}.linux-amd64.tar.gz
mv alertmanager-${VER}.linux-amd64 /data0/prometheus/alertmanager
chown -R prometheus:prometheus /data0/prometheus
2. Configure Alertmanager
Alertmanager integrates with DingTalk through its webhook receiver. The DingTalk robot has strict requirements on the message format, so alerts can be delivered only after being converted into that format. A conversion plug-in already exists, so we use it directly: https://github.com/timonwong/prometheus-webhook-dingtalk.git
cat > /data0/prometheus/alertmanager/alertmanager.yml <<-"EOF"
# Global configuration
global:
  resolve_timeout: 5m        # Time after which an alert is declared resolved; default 5m

# Routing tree
route:
  group_by: [alertname]      # Group alerts by this label
  receiver: ops_notify       # Default receiver
  group_wait: 30s            # How long to wait before sending the first notification for a new group
  group_interval: 60s        # How long to wait before notifying about new alerts added to a group
  repeat_interval: 1h        # How long to wait before re-sending a notification; default 1h
  routes:
  - receiver: ops_notify     # Basic alert notifications
    group_wait: 10s
    match_re:
      alertname: Instance survival alarm|Disk usage alarm   # Match names from the alert rules
  - receiver: info_notify    # Informational alert notifications
    group_wait: 10s
    match_re:
      alertname: Memory usage alarm|CPU Usage alarm

# Basic alert receiver
receivers:
- name: ops_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/ops_dingding/send
    send_resolved: true      # Also notify when the alert is resolved

# Informational alert receiver
- name: info_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/info_dingding/send
    send_resolved: true

# Inhibition rules mute alerts matching target_match while an alert matching
# source_match is firing; both alerts must have equal values for the listed labels.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
EOF
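Before starting the service, the configuration can be validated with amtool, which ships in the same tarball:

/data0/prometheus/alertmanager/amtool check-config /data0/prometheus/alertmanager/alertmanager.yml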
3. Start alertmanager
cat > /lib/systemd/system/alertmanager.service <<\EOF
[Unit]
Description=Prometheus: the alerting system
Documentation=http://prometheus.io/docs/
After=prometheus.service

[Service]
ExecStart=/data0/prometheus/alertmanager/alertmanager --config.file=/data0/prometheus/alertmanager/alertmanager.yml
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service

# Check the port
netstat -anpt | grep 9093
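To exercise the routing tree end to end, you can push a hand-crafted alert into Alertmanager through its v1 API; the alertname below is chosen to match the ops_notify route configured earlier:

curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"Instance survival alarm","instance":"test"}}]' http://192.168.56.11:9093/api/v1/alerts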
4. Connect DingTalk to the Prometheus Alertmanager webhook
# Test from the command line whether the robot can deliver messages. prometheus-webhook-dingtalk
# sometimes reports a 422 error because of DingTalk's security settings: if the robot uses a
# keyword-based security policy (here the keyword is "prometheus"), messages are delivered only
# when they contain that keyword.
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9'
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223'
- 4.1. Deploy the plug-in from the binary package
cd /usr/local/src/
export VER="0.3.0"
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v${VER}/prometheus-webhook-dingtalk-${VER}.linux-amd64.tar.gz
tar -zxvf prometheus-webhook-dingtalk-${VER}.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-${VER}.linux-amd64 /data0/prometheus/alertmanager/prometheus-webhook-dingtalk

# Usage: prometheus-webhook-dingtalk --ding.profile=<receiver name>=<webhook URL>
cat > /etc/systemd/system/prometheus-webhook-dingtalk.service <<\EOF
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/data0/prometheus/alertmanager/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk \
  --ding.profile=ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9 \
  --ding.profile=info_dingding=https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk
systemctl restart prometheus-webhook-dingtalk
systemctl status prometheus-webhook-dingtalk

netstat -nltup | grep 8060
- 4.2. Deploy the plug-in with Docker
docker pull timonwong/prometheus-webhook-dingtalk:v0.3.0

# docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 --ding.profile="<web-hook-name>=<dingtalk-webhook>"
docker run -d --restart always -p 8060:8060 timonwong/prometheus-webhook-dingtalk:v0.3.0 \
  --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9" \
  --ding.profile="info_dingding=https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223"

Two variables need explanation:

<web-hook-name>: prometheus-webhook-dingtalk supports multiple DingTalk webhooks; each name is mapped to one webhook URL. To serve several webhooks, pass multiple --ding.profile parameters, for example: --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=token1" --ding.profile="webhook2=https://oapi.dingtalk.com/robot/send?access_token=token2". The name-to-URL mapping rule: for ding.profile="webhook1=...", the corresponding API URL is http://localhost:8060/dingtalk/webhook1/send

<dingtalk-webhook>: the DingTalk webhook URL obtained earlier
- 4.3. Deploy the plug-in from source
# Install the Go environment
cd /usr/local/src/
wget https://dl.google.com/go/go1.13.4.linux-amd64.tar.gz
tar -zxvf go1.13.4.linux-amd64.tar.gz
mv go/ /usr/local/

# vim /etc/profile
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin

# Add the GOPATH environment variable
# (if $GOPATH/bin is not added to $PATH, move the built executable to $GOBIN manually)
mkdir -p /opt/path
export GOPATH=/opt/path
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
source /etc/profile

# Download and build the plug-in
cd /usr/local/src/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
go get github.com/timonwong/prometheus-webhook-dingtalk/cmd/prometheus-webhook-dingtalk
make
# After make succeeds, a prometheus-webhook-dingtalk binary is generated.

# Copy the DingTalk alert plug-in to the alertmanager directory
cp prometheus-webhook-dingtalk /data0/prometheus/alertmanager/

# Start the service (note: redirect stdout to the log file first, then stderr to stdout)
nohup /data0/prometheus/alertmanager/prometheus-webhook-dingtalk \
  --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=18f977769d50518e9d4f99a0d5dc1376f05615b61ea3639a87f106459f75b5c9" \
  --ding.profile="info_dingding=https://oapi.dingtalk.com/robot/send?access_token=11a0496d0af689d56a5861ae34dc47d9f1607aee6f342747442cc83e36715223" \
  >/tmp/dingding.log 2>&1 &

# Check the port
netstat -anpt | grep 8060
IV. Grafana installation and configuration
1. Download and install
cd /usr/local/src/
export VER="6.4.3"
wget https://dl.grafana.com/oss/release/grafana-${VER}-1.x86_64.rpm
yum localinstall -y grafana-${VER}-1.x86_64.rpm
2. Start the service
systemctl daemon-reload
systemctl enable grafana-server.service
systemctl restart grafana-server.service
3. Access the web interface
Default account / password: admin/admin
http://192.168.56.11:3000
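From the shell, Grafana's health endpoint confirms the server is up before you log in:

curl http://192.168.56.11:3000/api/health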
4. Grafana adds data source
On the landing page, click "Configuration -> Data Sources" to jump to the add-data-source page. Configure it as follows:

Name: prometheus
Type: prometheus
URL: http://192.168.56.11:9090
Access: Server

Uncheck Default, keep the other defaults, and click "Add".

The pie chart panel requires a plug-in:

grafana-cli plugins install grafana-piechart-panel
systemctl restart grafana-server.service

After installation, make sure a pie chart can be added normally.

Install the Consul data source plug-in:

grafana-cli plugins install sbueringer-consul-datasource
systemctl restart grafana-server.service
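The same data source can also be created through Grafana's HTTP API instead of the UI; a minimal sketch, assuming the default admin/admin credentials ("proxy" is the API name for Server access):

curl -u admin:admin -H "Content-Type: application/json" \
  -d '{"name":"prometheus","type":"prometheus","url":"http://192.168.56.11:9090","access":"proxy"}' \
  http://192.168.56.11:3000/api/datasources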
V. Replacing Grafana's dashboards
https://grafana.com/dashboards
https://grafana.com/grafana/dashboards/11074   basic monitoring (new)
https://grafana.com/dashboards/8919   basic monitoring
https://grafana.com/dashboards/7362   database monitoring