Building a 100-Million-Log ELK Platform in Practice

This article draws on real production experience to show how to build a log platform that handles 100 million log entries per day, and walks you through setting up such an ELK system step by step. For how the log platform itself evolved, see the previous chapter, "Evolution from ELK to EFK".

Without further ado, buckle up and let's get started.

Overall architecture

The overall architecture is divided into four modules, each providing a different function:

Filebeat: a lightweight data collection engine based on the original logstash-forwarder source code. In other words, Filebeat is the successor to logstash-forwarder and is now the preferred agent in the ELK Stack.

Kafka: a data buffering queue. As a message queue it decouples the processing stages and improves scalability. Its peak-handling capacity lets key components absorb sudden traffic bursts instead of collapsing under overload.

Logstash: a data collection and processing engine. It supports dynamically collecting data from a variety of sources, then filtering, parsing, enriching, and normalizing it before storing it for later use.

Elasticsearch: a distributed search engine. It is highly scalable, highly reliable, and easy to manage, and supports full-text search, structured search, analytics, and any combination of the three. Elasticsearch is built on Lucene and is now one of the most widely used open-source search engines; Wikipedia, Stack Overflow, GitHub, and others build their search on it.

Kibana: visualization platform. It can search and display index data stored in Elasticsearch. Using it, you can easily display and analyze data with charts, tables and maps.

Version Description


Filebeat: 6.2.4
Kafka: 2.11-1.0.0
Logstash: 6.2.4
Elasticsearch: 6.2.4
Kibana: 6.2.4

It is best to download plug-in versions that match these component versions.

Concrete practice

Let's take the common Nginx access log as an example; the log content is in JSON format:


{"@timestamp":"2017-12-27T16:38:17+08:00","host":"192.168.56.11","clientip":"192.168.56.11","size":26,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"192.168.56.11","url":"/nginxweb/index.html","domain":"192.168.56.11","xff":"-","referer":"-","status":"200"}
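For reference, JSON logs like the one above can be emitted directly by Nginx with a custom log_format. The snippet below is a sketch of how such a line might be generated; the variable-to-field mapping is our own assumption, not taken from the original setup:

```nginx
# sketch of an nginx.conf http-block snippet producing JSON access logs
log_format json '{"@timestamp":"$time_iso8601",'
                '"host":"$server_addr",'
                '"clientip":"$remote_addr",'
                '"size":$body_bytes_sent,'
                '"responsetime":$request_time,'
                '"upstreamtime":"$upstream_response_time",'
                '"upstreamhost":"$upstream_addr",'
                '"http_host":"$host",'
                '"url":"$uri",'
                '"domain":"$host",'
                '"xff":"$http_x_forwarded_for",'
                '"referer":"$http_referer",'
                '"status":"$status"}';
access_log /opt/logs/server/nginx.log json;
```

Numeric fields such as size and responsetime are left unquoted so they index as numbers rather than strings.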

Filebeat

Why use Filebeat instead of the original Logstash?

The reason is simple: Logstash's resource consumption is relatively high.

Because Logstash runs on the JVM and consumes a lot of resources, its author later wrote a lightweight agent in Go called logstash-forwarder, with fewer features but far lower resource usage.

Later the author joined Elastic, where development of logstash-forwarder was taken over by the company's internal Go team, and the result was eventually named Filebeat.

Filebeat needs to be deployed on every application server; the configuration can be pushed out and installed via SaltStack.

Download


$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz

Extract


tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz
mv filebeat-6.2.4-darwin-x86_64 filebeat
cd filebeat

Modify configuration

Modify the Filebeat configuration to collect logs from a local directory and ship them to the Kafka cluster:


$ vim filebeat.yml
filebeat.prospectors:
- type: log
  paths:
    -  /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log

output.kafka:   
  hosts: ["192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"]
  topic: 'nginx'

Note that a number of configuration options changed significantly in Filebeat 6.0; for example, document_type is no longer supported and you need to use fields instead.
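As an illustration, where document_type was once used to tag events, a custom field can now be attached instead (the field name log_topic below is our own choice, not a Filebeat built-in):

```yaml
filebeat.prospectors:
- type: log
  paths:
    - /opt/logs/server/nginx.log
  fields:
    log_topic: nginx        # illustrative custom field replacing document_type
  fields_under_root: true   # place custom fields at the top level of the event
```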

Start


$ ./filebeat -e -c filebeat.yml

Kafka

For production, a Kafka cluster of (2N + 1) nodes is recommended; here we take three nodes as an example.

Download

Download Kafka directly from the official website:


$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz

Extract


tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
cd kafka

Modify Zookeeper configuration

Modify the Zookeeper configuration to set up a (2N + 1)-node Zookeeper cluster.

It is recommended to use the Zookeeper bundled with Kafka for the ZK cluster, which reduces interference from network-related factors.


$ vim zookeeper.properties

tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5

server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888

Create a myid file under the Zookeeper data directory on each node; it stores the node's id (1, 2, 3), which must be unique across the cluster.


$ vim /opt/zookeeper/myid
1

Start the Zookeeper node

Start three Zookeeper nodes respectively to ensure high availability of the cluster


$ ./bin/zookeeper-server-start.sh -daemon ./config/zookeeper.properties

Modify Kafka configuration

The Kafka cluster has three brokers; modify the configuration on each one in turn, making sure broker.id (1, 2, 3) is unique.


$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true

Start Kafka cluster

Start three Kafka nodes respectively to ensure the high availability of the cluster


$ ./bin/kafka-server-start.sh -daemon ./config/server.properties

Check whether the topic is created successfully


$ bin/kafka-topics.sh --list --zookeeper localhost:2181

nginx

Monitor Kafka Manager

Kafka Manager is an open-source Kafka cluster management tool from Yahoo.

You can download and install on Github: https://github.com/yahoo/kafka-manager

If Kafka consumers cannot keep up, you can add partitions on the relevant cluster page; Kafka scales concurrent consumption through partitioning.
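If you prefer the command line over Kafka Manager, partitions can also be increased with the kafka-topics.sh script that ships with Kafka (the target partition count below is illustrative; note that Kafka only allows increasing, never decreasing, the partition count):

```shell
$ ./bin/kafka-topics.sh --alter --zookeeper localhost:2181 \
    --topic nginx --partitions 6
```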

Logstash

Logstash provides three stages of functionality:

  • INPUT: data input
  • FILTER: filtering and parsing
  • OUTPUT: data output

If you use the Filter stage, it is highly recommended to use the Grok Debugger to pre-test your log parsing patterns.
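Since the Nginx logs in this article are already JSON, a grok pattern is not actually needed; a minimal filter sketch that parses the JSON payload into top-level fields might look like this (assuming each event arrives as a plain JSON string in the default message field):

```conf
filter {
  # parse the raw JSON log line into top-level event fields
  json {
    source => "message"
  }
}
```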

Download


$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz

Extract and rename


$ tar -zxvf logstash-6.2.4.tar.gz
$ mv logstash-6.2.4 logstash

Modify Logstash configuration

Modify the Logstash configuration to provide the indexer function: consume from Kafka and insert the data into the Elasticsearch cluster.


$ vim nginx.conf

input {
  kafka {
    type => "kafka"
    # note: point at the Kafka brokers (9092), not the Zookeeper ports
    bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
    topics => ["nginx"]
    group_id => "logstash"
    consumer_threads => 2
    codec => "json"
  }
}

output {
  elasticsearch {
    # the elasticsearch output speaks HTTP on 9200 and takes a hosts array
    hosts => ["192.168.0.1:9200","192.168.0.2:9200","192.168.0.3:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}

Start Logstash


$ ./bin/logstash -f nginx.conf

Elasticsearch

Download


$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz

Extract


$ tar -zxvf elasticsearch-6.2.4.tar.gz
$ mv elasticsearch-6.2.4 elasticsearch

Modify configuration


$ vim config/elasticsearch.yml

cluster.name: es 
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1"]
discovery.zen.minimum_master_nodes: 1

Start

Use -d to start in the background:


$ ./bin/elasticsearch -d

Open http://192.168.0.1:9200/ in a browser; if a response like the following appears, the configuration was successful:


{
    "name": "es-node1",
    "cluster_name": "es",
    "cluster_uuid": "XvoyA_NYTSSV8pJg0Xb23A",
    "version": {
        "number": "6.2.4",
        "build_hash": "ccec39f",
        "build_date": "2018-04-12T20:37:28.497551Z",
        "build_snapshot": false,
        "lucene_version": "7.2.1",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

Console

The name cerebro may seem unfamiliar. It used to be kopf! Because Elasticsearch 5.0 dropped support for site plugins, the kopf author abandoned the original project and started cerebro, a standalone single-page application that continues to support Elasticsearch management on newer versions.

Attention

  1. Separate master and data nodes. When there are more than three data nodes, separating the roles is recommended to reduce pressure.
  2. Keep data node heap memory under 32 GB; 31 GB is recommended. See the previous article for the specific reasons.
  3. Set discovery.zen.minimum_master_nodes to (master-eligible nodes / 2 + 1) to avoid split-brain.
  4. Most importantly, do not expose ES to the public network; installing X-Pack is recommended to harden security.
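For point 3, here is a sketch of what the per-node configuration might look like in a three-node cluster (reusing the IPs from the examples above is an assumption on our part):

```yaml
# sketch for node es-node2 (192.168.0.2); adjust node.name and network.host per node
cluster.name: es
node.name: es-node2
network.host: 192.168.0.2
discovery.zen.ping.unicast.hosts: ["192.168.0.1", "192.168.0.2", "192.168.0.3"]
# 3 master-eligible nodes / 2 + 1 = 2, which prevents split-brain
discovery.zen.minimum_master_nodes: 2
```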

Kibana

Download


$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz

Extract


$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz
$ mv kibana-6.2.4-darwin-x86_64 kibana

Modify configuration


$ vim config/kibana.yml

server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"

Start Kibana


$ nohup ./bin/kibana &

Interface display

To create an index pattern, go to Management -> Index Patterns and specify the index by prefix (for example, nginx-*).

Final effect display

Summary

To sum up, the deployment steps above wire together the full set of ELK components, covering the entire pipeline of log collection, filtering, indexing, and visualization. Our log analysis capability is built on this system, and by horizontally scaling the Kafka and Elasticsearch clusters it can process 100 million logs per day in real time.

Keywords: ELK

Added by ciaran on Sat, 19 Feb 2022 19:44:17 +0200