Prometheus monitoring deployment + Grafana visualization

1. Introduction to Prometheus

Prometheus (written in Go) is an open source system that combines monitoring, alerting, and a time series database. It is well suited to monitoring Docker containers, and the popularity of Kubernetes (commonly known as k8s) has driven its adoption.
Prometheus official website: https://prometheus.io

2. Time series data

2.1 What is time series data?

Time series data is data that records changes in the state of systems and equipment in chronological order.

Application scenarios:

  • Driverless vehicles, which must continuously record and analyze longitude, latitude, speed, direction, distance to nearby objects, and so on while in operation
  • Track data for vehicles in a given area
  • Real-time transaction data in the traditional securities industry
  • Real-time operations and maintenance monitoring data

2.2 Features

  1. Good performance
    Relational databases perform poorly when processing large-scale data. NoSQL databases handle large-scale data better, but still not as well as a time series database.
  2. Low storage cost
    Efficient compression algorithms save storage space and effectively reduce I/O.
    Prometheus stores time series data very efficiently. Each sample occupies only about 3.5 bytes; according to official figures, millions of time series sampled at 30-second intervals and retained for 60 days take up a little over 200 GB.
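
As a rough check of these figures, the estimate is simply series × samples per series × bytes per sample. The sketch below assumes exactly one million series and decimal gigabytes (neither is stated in the source), and the function name is only illustrative.

```python
def estimate_storage_bytes(num_series, scrape_interval_s, retention_days, bytes_per_sample):
    """Return the approximate storage size in bytes for the given workload."""
    seconds_retained = retention_days * 24 * 60 * 60
    samples_per_series = seconds_retained // scrape_interval_s
    return num_series * samples_per_series * bytes_per_sample


if __name__ == "__main__":
    size = estimate_storage_bytes(
        num_series=1_000_000,   # assumption: "millions" taken as exactly one million
        scrape_interval_s=30,
        retention_days=60,
        bytes_per_sample=3.5,
    )
    print(f"{size / 1e9:.1f} GB")
```

With these inputs the result is about 605 GB, so a total of roughly 200 GB would correspond to fewer series or fewer bytes per sample than assumed here.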

3. Main features of Prometheus

  • A multi-dimensional data model
  • A flexible query language
  • No reliance on distributed storage; each single server node is autonomous
  • Time series collection through a pull model over HTTP
  • Support for a push model through an intermediary gateway
  • Target discovery through service discovery or static configuration
  • Support for a variety of charts and dashboards
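
To make the multi-dimensional data model concrete: each time series is identified by a metric name plus a set of key/value labels, and queries select series by matching on those labels. The sketch below mimics that idea in plain Python. It is not PromQL or any Prometheus API, and the metric and label names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """One time series: a metric name, its labels, and (timestamp, value) samples."""
    name: str
    labels: dict
    samples: list = field(default_factory=list)


def select(series_list, name, **matchers):
    """Return the series with the given name whose labels equal every matcher."""
    return [
        s for s in series_list
        if s.name == name
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


if __name__ == "__main__":
    # Hypothetical data: the same metric recorded along two label dimensions.
    store = [
        Series("http_requests_total", {"method": "GET", "code": "200"}, [(0, 10.0), (30, 14.0)]),
        Series("http_requests_total", {"method": "POST", "code": "200"}, [(0, 2.0), (30, 3.0)]),
        Series("http_requests_total", {"method": "GET", "code": "500"}, [(0, 0.0), (30, 1.0)]),
    ]
    # Select every GET series, regardless of status code.
    for s in select(store, "http_requests_total", method="GET"):
        print(s.labels, s.samples[-1])
```

Because labels are independent dimensions, the same data can be sliced by method, by status code, or by both without defining a separate metric for each combination.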

4. Schematic diagram of Prometheus


Prometheus scrapes metrics from instrumented jobs, either directly or through an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
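
The scrape step can be illustrated with a small parser. A scraped target returns plain text in which each sample is one line of the form name{label="value",...} value, and lines starting with # carry comments or metadata. The parser below is a deliberately simplified sketch (it ignores optional timestamps and escaped characters inside label values), and the sample payload is invented for illustration.

```python
import re

# Metric name, optional {labels}, then the value: a simplification of the real format.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    # Hand-written payload in the style of a /metrics response.
    payload = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.25
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""
    for sample in parse_metrics(payload):
        print(sample)
```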

5. Applicable scenarios

5.1 When is it suitable?

Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

Prometheus is designed for reliability, to be the system you go to during an outage so that you can quickly diagnose problems. Each Prometheus server is standalone and does not depend on network storage or other remote services. You can rely on it when other parts of your infrastructure are broken, and you do not need to set up extensive infrastructure to use it.

5.2 When is it not suitable?

Prometheus values reliability. You can always view the statistics that are available about your system, even under failure conditions. If you need 100% accuracy, such as for per-request billing, Prometheus is not a good choice, because the collected data will likely not be detailed and complete enough. In that case, it is best to use another system to collect and analyze the billing data, and Prometheus for the rest of your monitoring.

6. Prometheus+Grafana deployment

Environment requirements

System     Host name   IP                Required services
CentOS 8   server      192.168.249.141   prometheus-2.28.0
CentOS 8   agent       192.168.249.145   node_exporter-1.1.2
                                         grafana

The Prometheus and node_exporter packages are available from the Prometheus download page: https://prometheus.io/download/

Deploy Prometheus on the server host

//Download the installation package
[root@server ~]# ls
anaconda-ks.cfg  prometheus-2.28.0.linux-amd64.tar.gz
//Extract the archive
[root@server ~]# tar xf prometheus-2.28.0.linux-amd64.tar.gz 
[root@server ~]# ls
anaconda-ks.cfg  prometheus-2.28.0.linux-amd64  prometheus-2.28.0.linux-amd64.tar.gz
[root@server ~]# mv prometheus-2.28.0.linux-amd64 /usr/local/prometheus
[root@server ~]# ls /usr/local/
bin  etc  games  include  lib  lib64  libexec  prometheus  sbin  share  src

//View the main program's help text to see how to start it
[root@server ~]# cd /usr/local/prometheus/
[root@server prometheus]# ls
console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool
[root@server prometheus]# ./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --config.file="prometheus.yml"    #Use this flag to select the configuration file at startup
                                 Prometheus configuration file path.
      --web.listen-address="0.0.0.0:9090"  
                                 Address to listen on for UI, API, and telemetry.
      --web.config.file=""       [EXPERIMENTAL] Path to configuration file that can enable TLS or
                                 authentication.
      --web.read-timeout=5m      Maximum duration before timing out read of the request, and closing idle
                                 connections.
      --web.max-connections=512  Maximum number of simultaneous connections.
      --web.external-url=<URL>   The URL under which Prometheus is externally reachable (for example, if
                                 Prometheus is served via a reverse proxy). Used for generating relative and
                                 absolute links back to Prometheus itself. If the URL has a path portion, it
                                 will be used to prefix all HTTP endpoints served by Prometheus. If omitted,
                                 relevant URL components will be derived automatically.
      --web.route-prefix=<path>  Prefix for the internal routes of web endpoints. Defaults to path of
                                 --web.external-url.
      --web.user-assets=<path>   Path to static asset directory, available at /user.
      --web.enable-lifecycle     Enable shutdown and reload via HTTP request.
      --web.enable-admin-api     Enable API endpoints for admin control actions.
      --web.console.templates="consoles"  
                                 Path to the console template directory, available at /consoles.
      --web.console.libraries="console_libraries"  
                                 Path to the console library directory.
      --web.page-title="Prometheus Time Series Collection and Processing Server"  
                                 Document title of Prometheus instance.
      --web.cors.origin=".*"     Regex for CORS origin. It is fully anchored. Example:
                                 'https?://(domain1|domain2)\.com'
      --storage.tsdb.path="data/"  
                                 Base path for metrics storage.
      --storage.tsdb.retention=STORAGE.TSDB.RETENTION  
                                 [DEPRECATED] How long to retain samples in storage. This flag has been
                                 deprecated, use "storage.tsdb.retention.time" instead.
      --storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME  
                                 How long to retain samples in storage. When this flag is set it overrides
                                 "storage.tsdb.retention". If neither this flag nor "storage.tsdb.retention" nor
                                 "storage.tsdb.retention.size" is set, the retention time defaults to 15d. Units
                                 Supported: y, w, d, h, m, s, ms.
      --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE  
                                 [EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. A unit is
                                 required, supported units: B, KB, MB, GB, TB, PB, EB. Ex: "512MB". This flag is
                                 experimental and can be changed in future releases.
      --storage.tsdb.no-lockfile  
                                 Do not create lockfile in data directory.
      --storage.tsdb.allow-overlapping-blocks  
                                 [EXPERIMENTAL] Allow overlapping blocks, which in turn enables vertical
                                 compaction and vertical query merge.
      --storage.tsdb.wal-compression  
                                 Compress the tsdb WAL.
      --storage.remote.flush-deadline=<duration>  
                                 How long to wait flushing sample on shutdown or config reload.
      --storage.remote.read-sample-limit=5e7  
                                 Maximum overall number of samples to return via the remote read interface, in a
                                 single query. 0 means no limit. This limit is ignored for streamed response
                                 types.
      --storage.remote.read-concurrent-limit=10  
                                 Maximum number of concurrent remote read calls. 0 means no limit.
      --storage.remote.read-max-bytes-in-frame=1048576  
                                 Maximum number of bytes in a single frame for streaming remote read response
                                 types before marshalling. Note that client might have limit on frame size as
                                 well. 1MB as recommended by protobuf by default.
      --storage.exemplars.exemplars-limit=100000  
                                 [EXPERIMENTAL] Maximum number of exemplars to store in in-memory exemplar
                                 storage total. 0 disables the exemplar storage. This flag is effective only
                                 with --enable-feature=exemplar-storage.
      --rules.alert.for-outage-tolerance=1h  
                                 Max time to tolerate prometheus outage for restoring "for" state of alert.
      --rules.alert.for-grace-period=10m  
                                 Minimum duration between alert and restored "for" state. This is maintained
                                 only for alerts with configured "for" time greater than grace period.
      --rules.alert.resend-delay=1m  
                                 Minimum amount of time to wait before resending an alert to Alertmanager.
      --alertmanager.notification-queue-capacity=10000  
                                 The capacity of the queue for pending Alertmanager notifications.
      --query.lookback-delta=5m  The maximum lookback duration for retrieving metrics during expression
                                 evaluations and federation.
      --query.timeout=2m         Maximum time a query may take before being aborted.
      --query.max-concurrency=20  
                                 Maximum number of queries executed concurrently.
      --query.max-samples=50000000  
                                 Maximum number of samples a single query can load into memory. Note that
                                 queries will fail if they try to load more samples than this into memory, so
                                 this also limits the number of samples a query can return.
      --enable-feature= ...      Comma separated feature names to enable. Valid options: promql-at-modifier,
                                 promql-negative-offset, remote-write-receiver, exemplar-storage,
                                 expand-external-labels. See
                                 https://prometheus.io/docs/prometheus/latest/disabled_features/ for more
                                 details.
      --log.level=info           Only log messages with the given severity or above. One of: [debug, info, warn,
                                 error]
      --log.format=logfmt        Output format of log messages. One of: [logfmt, json]
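//Start directly with the default configuration file (Prometheus runs in the foreground, so check the listening port from a second terminal)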
[root@server prometheus]# ./prometheus --config.file="prometheus.yml" 
[root@server prometheus]# ss -antl
State        Recv-Q       Send-Q             Local Address:Port             Peer Address:Port      Process      
LISTEN       0            128                      0.0.0.0:22                    0.0.0.0:*                      
LISTEN       0            128                            *:9090                        *:*                      
LISTEN       0            128                         [::]:22                       [::]:*                      
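
Besides checking the listening socket with ss, the server can be probed over HTTP. Prometheus serves a /-/healthy endpoint that answers with status 200 while the server is running. The sketch below is one way to call it from Python's standard library. The helper names are illustrative, and it assumes the check runs on the server host itself, so that localhost:9090 is reachable.

```python
import urllib.error
import urllib.request


def health_url(host, port=9090):
    """Build the URL of the Prometheus health endpoint."""
    return f"http://{host}:{port}/-/healthy"


def is_healthy(host, port=9090, timeout=3.0):
    """Return True if the health endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    # Assumption: this runs on the server host itself, so localhost is used.
    print("healthy" if is_healthy("localhost") else "not reachable")
```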

Turn off the firewall and SELinux

[root@server ~]# systemctl stop firewalld
[root@server ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@server ~]# setenforce 0
[root@server ~]# vim /etc/selinux/config 
SELINUX=disabled

6.1 Log in to the web interface

Open http://192.168.249.141:9090 in a browser to reach the Prometheus web interface.


View the list of monitored targets. At present it contains only this host.
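
The same list is available from the HTTP API at /api/v1/targets, which returns JSON with an activeTargets array. The sketch below summarizes such a response. The helper names are illustrative, and the payload in the demo is a hand-written, abridged example of the response shape rather than captured output.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Return (job, scrape URL, health) for each active target in an API response."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", ""),
            t.get("scrapeUrl", ""),
            t.get("health", "unknown"),
        )
        for t in active
    ]


def fetch_targets(base_url):
    """Fetch and decode the targets listing from a running Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hand-written, abridged payload; against a live server one would call
    # fetch_targets("http://192.168.249.141:9090") instead.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {
                    "labels": {"instance": "localhost:9090", "job": "prometheus"},
                    "scrapeUrl": "http://localhost:9090/metrics",
                    "health": "up",
                }
            ]
        },
    }
    for job, url, health in summarize_targets(sample):
        print(job, url, health)
```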


6.2 Monitor other hosts

Install the node_exporter-1.1.2.linux-amd64.tar.gz component on the host to be monitored. It can be downloaded from the official website.

//Extract the archive
[root@agent ~]# ls
anaconda-ks.cfg  node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# tar xf node_exporter-1.1.2.linux-amd64.tar.gz 
[root@agent ~]# ls
anaconda-ks.cfg  node_exporter-1.1.2.linux-amd64  node_exporter-1.1.2.linux-amd64.tar.gz
[root@agent ~]# mv node_exporter-1.1.2.linux-amd64 /usr/local/node_exporter

//Start node_exporter in the background
[root@agent node_exporter]# nohup /usr/local/node_exporter/node_exporter &
[1] 10337
[root@agent node_exporter]# nohup: ignoring input and appending output to 'nohup.out'

[root@agent node_exporter]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port               Peer Address:Port       Process       
LISTEN        0             128                        0.0.0.0:22                      0.0.0.0:*                        
LISTEN        0             128                              *:9100                          *:*                        
LISTEN        0             128                           [::]:22                         [::]:*                        

View monitoring information
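
The exporter's output can also be inspected without a browser, since node_exporter serves its metrics as plain text on port 9100 under /metrics. The sketch below fetches that text and keeps only the sample lines for one metric family. The helper names are illustrative, and the demo uses a short hand-written excerpt instead of a live request.

```python
import urllib.request


def filter_metrics(text, prefix):
    """Return the sample lines (not comments) whose metric name starts with prefix."""
    return [
        line for line in text.splitlines()
        if line.startswith(prefix) and not line.startswith("#")
    ]


def fetch_metrics(host, port=9100, timeout=5.0):
    """Download the raw metrics text from a node_exporter instance."""
    with urllib.request.urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hand-written excerpt; against the agent host one would use
    # fetch_metrics("192.168.249.145") instead.
    excerpt = (
        "# HELP node_load1 1m load average.\n"
        "# TYPE node_load1 gauge\n"
        "node_load1 0.25\n"
        "node_load5 0.1\n"
        "node_memory_MemFree_bytes 1.0\n"
    )
    for line in filter_metrics(excerpt, "node_load"):
        print(line)
```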

Added by Pudgemeister on Sun, 23 Jan 2022 01:05:34 +0200