Prometheus monitoring deployment + Grafana visualization

1. Introduction to Prometheus

Prometheus, written in Go (golang), is an open-source system that combines monitoring, alerting, and a time series database. It is well suited to monitoring Docker containers, and the popularity of Kubernetes (commonly known as k8s) has driven its adoption.
Official website: https://prometheus.io/

2. Time series data

2.1 What is time series data?

Time series data records changes in the state of systems and devices in chronological order.

Application scenarios:

Longitude, latitude, speed, direction, distance to nearby objects, and similar readings that a driverless vehicle must record and analyze continuously while operating
Track data of vehicles in a given area
Real-time transaction data in the traditional securities industry
Real-time operations and maintenance monitoring data

2.2 Features

Good performance
Relational databases perform poorly on large-scale data. NoSQL databases handle large-scale data better, but still not as well as a time series database.
Low storage cost
Efficient compression algorithms save storage space and reduce I/O.
Prometheus stores time series data very efficiently. Each sample occupies only about 3.5 bytes; according to official figures, millions of time series collected at a 30-second interval and retained for 60 days take a little over 200 GB.
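
The storage figure can be sanity-checked with simple arithmetic: disk space is roughly the number of series, times the samples each series produces over the retention period, times the bytes per sample. The sketch below is only a back-of-the-envelope estimate; the function name estimate_storage_gb is made up for illustration, and real usage depends on compression and on how often series change. For exactly one million series at 3.5 bytes per sample it gives about 605 GB, so the "a little over 200 GB" figure presumably corresponds to fewer series or a smaller average sample size.

```shell
#!/bin/sh
# Rough disk estimate: series * samples_per_series * bytes_per_sample.
# All input values below are illustrative assumptions, not measurements.
estimate_storage_gb() {
    # $1 = number of series, $2 = scrape interval in seconds,
    # $3 = retention in days, $4 = average bytes per sample
    awk -v series="$1" -v interval="$2" -v days="$3" -v bps="$4" \
        'BEGIN { printf "%.1f\n", series * (86400 / interval) * days * bps / 1e9 }'
}

# One million series, 30 s interval, 60 days, 3.5 bytes per sample.
estimate_storage_gb 1000000 30 60 3.5   # prints 604.8 (decimal GB)
```

Changing the first argument shows how strongly the result depends on the series count, which is usually the number to watch when sizing a disk.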

3. Main features of Prometheus

Multi-dimensional data model
Flexible query language
No reliance on distributed storage; each server node is autonomous
Collects time series data over HTTP using a pull model
Also supports a push model through an intermediary gateway
Discovers targets through service discovery or static configuration
Supports a variety of charts and dashboards
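
To make the pull model concrete: a target exposes its metrics as plain text over HTTP, and Prometheus fetches that text at each scrape. The sketch below filters one metric out of such text. The helper name metric_samples and the sample values are invented for illustration; only the general line format (comment lines starting with #, then name, optional labels in braces, and a value) follows the Prometheus text exposition format.

```shell
#!/bin/sh
# Print every sample line of one metric read from stdin.
# Comment lines (# HELP / # TYPE) are skipped.
metric_samples() {
    awk -v name="$1" '
        /^#/ { next }
        {
            metric = $0
            sub(/[{ ].*/, "", metric)   # keep only the metric name
            if (metric == name) print $0
        }'
}

# Hand-written sample in the text exposition format (values are invented).
sample='# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
node_load1 0.42'

printf '%s\n' "$sample" | metric_samples node_load1   # prints: node_load1 0.42
```

Against a live exporter the same filter could be fed from curl, for example: curl -s http://192.168.249.145:9100/metrics | metric_samples node_load1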

4. Prometheus architecture

Prometheus scrapes metrics from instrumented jobs, either directly or through an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data, either to aggregate and record new time series from existing data or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.
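
As an illustration of the two kinds of rules mentioned here, the sketch below writes a small rule file containing one recording rule and one alerting rule. The file name, group name, rule names, and the 5-minute threshold are assumptions chosen for the example, not values from this deployment.

```shell
#!/bin/sh
# Write an example rule file with one recording rule and one alerting rule.
# File name, group name, rule names and thresholds are illustrative assumptions.
write_example_rules() {
    # The quoted 'EOF' keeps the shell from expanding $labels in the template.
    cat > "$1" <<'EOF'
groups:
  - name: example
    rules:
      # Recording rule: store the fraction of targets that are up, per job.
      - record: job:up:avg
        expr: avg by (job) (up)
      # Alerting rule: fire when a target has been unreachable for 5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
}

rules_file="${TMPDIR:-/tmp}/example.rules.yml"
write_example_rules "$rules_file"
```

A file like this can be validated with promtool check rules and takes effect once it is listed under rule_files in prometheus.yml.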

5. Applicable scenarios

5.1 When is it suitable?

Prometheus is suitable for recording any purely numeric time series. It fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. In the world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

Prometheus is designed for reliability: it is the system you turn to during an outage so that you can diagnose problems quickly. Each Prometheus server is standalone and does not depend on network storage or other remote services. You can rely on it when other parts of the infrastructure are broken, and you do not need to set up extensive infrastructure to use it.

5.2 When is it not suitable?

Prometheus values reliability. Even under failure conditions, you can always view whatever statistics about the system are available. If you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice, because the collected data may not be detailed and complete enough. In that case, it is better to use another system to collect and analyze billing data, and to use Prometheus for the rest of your monitoring.

6. Prometheus+Grafana deployment

Environment

System	Hostname	IP	Required services
CentOS 8	server	192.168.249.141	prometheus-2.28.0
CentOS 8	agent	192.168.249.145	node_exporter-1.1.2, grafana

Prometheus and node_exporter can be downloaded from https://prometheus.io/download/

Deploy Prometheus on the server host

// Download the installation package
[root@server ~]# ls
anaconda-ks.cfg prometheus-2.28.0.linux-amd64.tar.gz
// Unpack it
[root@server ~]# tar xf prometheus-2.28.0.linux-amd64.tar.gz
[root@server ~]# ls
anaconda-ks.cfg prometheus-2.28.0.linux-amd64 prometheus-2.28.0.linux-amd64.tar.gz
[root@server ~]# mv prometheus-2.28.0.linux-amd64 /usr/local/prometheus
[root@server ~]# ls /usr/local/
bin etc games include lib lib64 libexec prometheus sbin share src

// View the main program's help text, which shows how to start it
[root@server ~]# cd /usr/local/prometheus/
[root@server prometheus]# ls
console_libraries consoles LICENSE NOTICE prometheus prometheus.yml promtool
[root@server prometheus]# ./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--config.file="prometheus.yml" # This is the flag used at startup
Prometheus configuration file path.
--web.listen-address="0.0.0.0:9090"
Address to listen on for UI, API, and telemetry.
--web.config.file="" [EXPERIMENTAL] Path to configuration file that can enable TLS or
authentication.
--web.read-timeout=5m Maximum duration before timing out read of the request, and closing idle
connections.
--web.max-connections=512 Maximum number of simultaneous connections.
--web.external-url=<URL> The URL under which Prometheus is externally reachable (for example, if
Prometheus is served via a reverse proxy). Used for generating relative and
absolute links back to Prometheus itself. If the URL has a path portion, it
will be used to prefix all HTTP endpoints served by Prometheus. If omitted,
relevant URL components will be derived automatically.
--web.route-prefix=<path> Prefix for the internal routes of web endpoints. Defaults to path of
--web.external-url.
--web.user-assets=<path> Path to static asset directory, available at /user.
--web.enable-lifecycle Enable shutdown and reload via HTTP request.
--web.enable-admin-api Enable API endpoints for admin control actions.
--web.console.templates="consoles"
Path to the console template directory, available at /consoles.
--web.console.libraries="console_libraries"
Path to the console library directory.
--web.page-title="Prometheus Time Series Collection and Processing Server"
Document title of Prometheus instance.
--web.cors.origin=".*" Regex for CORS origin. It is fully anchored. Example:
'https?://(domain1|domain2)\.com'
--storage.tsdb.path="data/"
Base path for metrics storage.
--storage.tsdb.retention=STORAGE.TSDB.RETENTION
[DEPRECATED] How long to retain samples in storage. This flag has been
deprecated, use "storage.tsdb.retention.time" instead.
--storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME
How long to retain samples in storage. When this flag is set it overrides
"storage.tsdb.retention". If neither this flag nor "storage.tsdb.retention" nor
"storage.tsdb.retention.size" is set, the retention time defaults to 15d. Units
Supported: y, w, d, h, m, s, ms.
--storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE
[EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. A unit is
required, supported units: B, KB, MB, GB, TB, PB, EB. Ex: "512MB". This flag is
experimental and can be changed in future releases.
--storage.tsdb.no-lockfile
Do not create lockfile in data directory.
--storage.tsdb.allow-overlapping-blocks
[EXPERIMENTAL] Allow overlapping blocks, which in turn enables vertical
compaction and vertical query merge.
--storage.tsdb.wal-compression
Compress the tsdb WAL.
--storage.remote.flush-deadline=<duration>
How long to wait flushing sample on shutdown or config reload.
--storage.remote.read-sample-limit=5e7
Maximum overall number of samples to return via the remote read interface, in a
single query. 0 means no limit. This limit is ignored for streamed response
types.
--storage.remote.read-concurrent-limit=10
Maximum number of concurrent remote read calls. 0 means no limit.
--storage.remote.read-max-bytes-in-frame=1048576
Maximum number of bytes in a single frame for streaming remote read response
types before marshalling. Note that client might have limit on frame size as
well. 1MB as recommended by protobuf by default.
--storage.exemplars.exemplars-limit=100000
[EXPERIMENTAL] Maximum number of exemplars to store in in-memory exemplar
storage total. 0 disables the exemplar storage. This flag is effective only
with --enable-feature=exemplar-storage.
--rules.alert.for-outage-tolerance=1h
Max time to tolerate prometheus outage for restoring "for" state of alert.
--rules.alert.for-grace-period=10m
Minimum duration between alert and restored "for" state. This is maintained
only for alerts with configured "for" time greater than grace period.
--rules.alert.resend-delay=1m
Minimum amount of time to wait before resending an alert to Alertmanager.
--alertmanager.notification-queue-capacity=10000
The capacity of the queue for pending Alertmanager notifications.
--query.lookback-delta=5m The maximum lookback duration for retrieving metrics during expression
evaluations and federation.
--query.timeout=2m Maximum time a query may take before being aborted.
--query.max-concurrency=20
Maximum number of queries executed concurrently.
--query.max-samples=50000000
Maximum number of samples a single query can load into memory. Note that
queries will fail if they try to load more samples than this into memory, so
this also limits the number of samples a query can return.
--enable-feature= ... Comma separated feature names to enable. Valid options: promql-at-modifier,
promql-negative-offset, remote-write-receiver, exemplar-storage,
expand-external-labels. See
https://prometheus.io/docs/prometheus/latest/disabled_features/ for more
details.
--log.level=info Only log messages with the given severity or above. One of: [debug, info, warn,
error]
--log.format=logfmt Output format of log messages. One of: [logfmt, json]
// Start it directly (the trailing & runs it in the background so the shell stays usable)
[root@server prometheus]# ./prometheus --config.file="prometheus.yml" &
[root@server prometheus]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 *:9090 *:*
LISTEN 0 128 [::]:22 [::]:*
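
A process started this way stops when the machine reboots. One common alternative is to run Prometheus as a systemd service. The sketch below prints a minimal unit file for the install path used above; the function name render_prometheus_unit is hypothetical, and running as root with data under the install directory are simplifying assumptions. The flags come from the help output shown earlier.

```shell
#!/bin/sh
# Print a minimal systemd unit for a Prometheus install under the given directory.
# Assumptions: the service runs as root and keeps its data under <dir>/data.
render_prometheus_unit() {
    dir="$1"
    cat <<EOF
[Unit]
Description=Prometheus monitoring server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=$dir/prometheus --config.file=$dir/prometheus.yml --storage.tsdb.path=$dir/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

render_prometheus_unit /usr/local/prometheus
```

To use it, redirect the output to /etc/systemd/system/prometheus.service, then run systemctl daemon-reload followed by systemctl enable --now prometheus.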

Turn off the firewall and SELinux

[root@server ~]# systemctl stop firewalld
[root@server ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@server ~]# setenforce 0
[root@server ~]# vim /etc/selinux/config
SELINUX=disabled
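
Turning the firewall off completely is convenient in a lab but is not strictly required. A more conservative option, sketched below under the assumption that firewalld stays enabled, is to open only the ports used in this article (9090 for Prometheus, 9100 for node_exporter). The helper name firewall_open_cmds is made up for illustration; it only prints the firewall-cmd calls so they can be reviewed before being run.

```shell
#!/bin/sh
# Print the firewall-cmd calls that would open the given TCP ports.
# Printing instead of executing lets you review the commands first.
firewall_open_cmds() {
    for port in "$@"; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    printf 'firewall-cmd --reload\n'
}

firewall_open_cmds 9090 9100
```

Once the printed commands look right, they can be run as root, for example by piping the output into sh.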

6.1 Log in to the web interface

Open http://192.168.249.141:9090 in a browser and view the target list (Status > Targets). At this point, only the server itself is in the list.

6.2 Monitor other hosts

On the host to be monitored, install the node_exporter component (node_exporter-1.1.2.linux-amd64.tar.gz), which can be downloaded from the official website.

// Unpack it
[root@agent ~]# ls
anaconda-ks.cfg  node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# tar xf node_exporter-1.1.2.linux-amd64.tar.gz 
[root@agent ~]# ls
anaconda-ks.cfg  node_exporter-1.1.2.linux-amd64  node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# mv node_exporter-1.1.2.linux-amd64 /usr/local/node_exporter

// Start it
[root@agent node_exporter]# nohup /usr/local/node_exporter/node_exporter &
[1] 10337
[root@agent node_exporter]# nohup: ignoring input and appending output to 'nohup.out'

[root@agent node_exporter]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port               Peer Address:Port       Process       
LISTEN        0             128                        0.0.0.0:22                      0.0.0.0:*                        
LISTEN        0             128                              *:9100                          *:*                        
LISTEN        0             128                           [::]:22                         [::]:*
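
For the server to collect from this exporter, the agent has to be listed as a scrape target in prometheus.yml on the server. The sketch below appends a static job to a config file. The function name add_scrape_job and the job name agent are hypothetical, and the approach assumes that scrape_configs is the last top-level section of the file with entries indented by two spaces, as in the stock prometheus.yml. The demo at the bottom works on a throwaway file, not the real config.

```shell
#!/bin/sh
# Append a static scrape job to a Prometheus config file.
# Assumes scrape_configs is the LAST top-level section of the file and that
# its entries are indented by two spaces, as in the stock prometheus.yml.
add_scrape_job() {
    # $1 = config file, $2 = job name, $3 = host:port of the exporter
    cat >> "$1" <<EOF
  - job_name: "$2"
    static_configs:
      - targets: ["$3"]
EOF
}

# Demo on a throwaway file that mimics the end of a default config.
demo=$(mktemp)
printf 'scrape_configs:\n  - job_name: "prometheus"\n    static_configs:\n      - targets: ["localhost:9090"]\n' > "$demo"
add_scrape_job "$demo" agent 192.168.249.145:9100
cat "$demo"
```

On the server this would be called as add_scrape_job /usr/local/prometheus/prometheus.yml agent 192.168.249.145:9100. After checking the result with ./promtool check config prometheus.yml, restart Prometheus or send it SIGHUP so that it reloads the configuration; the agent should then show up in the target list.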

View monitoring information

Added by Pudgemeister on Sun, 23 Jan 2022 01:05:34 +0200
