1. ELK Unified Log Management Platform Part 3 - Using the Logstash Grok Plug-in
In this blog I will mainly cover the following points and share some practical experience for your reference:
1. A standard specification for the log content of Java applications
2. How to use the Logstash grok plug-in to split the message field
3. How to delete Elasticsearch indexes on a schedule
1. Standard specification for the log content of Java applications:
Recently the company has been pushing forward the ELK project, and I am its operator, so there will be a lot of hands-on experience to share. Our company's business systems are mainly developed in Java, in particular with frameworks such as Spring Cloud and Spring Boot, so how to standardize the logs of these business systems is a problem the R&D architects need to think about. At present our ELK log specification is defined as follows:
<pattern>[%date{ISO8601}][%level] %logger{80} [%thread] Line:%-3L [%X{TRACE_ID}] ${dev-group-name} ${app-name} - %msg%n</pattern>

The fields, in order, are: Time | Log level | Class file | Thread name | Line of code | Global pipeline number | Development team | System name | Log information

Time: when the log entry was generated;
Log level: ERROR, WARN, INFO or DEBUG;
Class file: the name of the class that printed the log;
Thread name: the name of the thread executing the operation;
Line of code: the location in the code where the log event occurred;
Global pipeline number (TRACE_ID): a global trace number that runs through an entire business process;
Development team: the name of the team that develops the system;
System name: the project name;
Log information: the detailed log message.

For example, a business system log in the standard format looks like this:

[2019-06-24 09:32:14,262] [ERROR] com.bqjr.cmm.aps.job.ApsAlarmJob [scheduling-1] [] tstteam tst Line:157 - ApsAlarmJob class execute method, '[test system early warning] check index abnormal three early warning' early warning error: nested exception is org.apache.ibatis.exceptions.PersistenceException: ### Error querying database. Cause: java.lang.NullPointerException ### Cause: java.lang.NullPointerException org.mybatis.spring.MyBatisSystemException: nested exception is
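For reference, this pattern sits inside the encoder of a logback appender. The following is only a minimal sketch, assuming a console appender; the dev-group-name and app-name property values (tstteam / tst) are taken from the example log line above and would normally be set per project:

<configuration>
    <!-- Minimal sketch: assumed property values, normally set per project -->
    <property name="dev-group-name" value="tstteam"/>
    <property name="app-name" value="tst"/>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>[%date{ISO8601}][%level] %logger{80} [%thread] Line:%-3L [%X{TRACE_ID}] ${dev-group-name} ${app-name} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>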
2. How to use the Logstash grok plug-in to split the message field?
Now our logs are output with standard fields, but in the Kibana interface everything is still a single message field. We need to decompose the message into individual fields so that searches can be done on each field.
Our ELK log platform architecture is: every business system has the Filebeat log collector installed, which ships the logs unchanged to a Kafka cluster; a Logstash cluster consumes from Kafka and writes to an Elasticsearch cluster, and Kibana sits on top of Elasticsearch for display and search. The reason Logstash is used in the middle is that it has powerful text-processing capabilities, such as the grok plug-in, which can turn raw text into structured, formatted output.
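As an illustration of the collection side, a Filebeat configuration that ships logs to the Kafka cluster could look roughly like this. This is only a sketch: the log path and topic name are hypothetical (the topic just has to match the elk-tst-tst-info.* pattern consumed by Logstash below), the broker addresses are the ones used in the Logstash input further down, and Filebeat 6.3+ syntax is assumed:

filebeat.inputs:
- type: log
  paths:
    - /data/app/logs/*.log        # hypothetical log path

output.kafka:
  # the same Kafka brokers that the Logstash input consumes from
  hosts: ["192.168.1.12:9092", "192.168.1.14:9092", "192.168.1.15:9092"]
  # hypothetical topic name matching the elk-tst-tst-info.* pattern
  topic: "elk-tst-tst-info-app1"
  required_acks: 1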
Logstash ships with many built-in regular-expression pattern templates, which can match logs such as nginx, httpd, syslog and so on.
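The basic grok syntax is %{SYNTAX:SEMANTIC}, where SYNTAX is the name of a pattern from these template files and SEMANTIC is the field the match is stored in. A generic illustration (not our production filter), using only built-in patterns:

filter {
  grok {
    # parses lines such as "2019-06-24 09:32:14,262 ERROR something went wrong"
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}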
#The default grok pattern template path in logstash:
/usr/local/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns

#The grok pattern templates that ship with logstash:
[root@SZ1PRDELK00AP005 patterns]# ll
total 116
-rw-r--r-- 1 root root   271 Jun 24 16:05 application
-rw-r--r-- 1 root root  1831 Apr 13  2018 aws
-rw-r--r-- 1 root root  4831 Apr 13  2018 bacula
-rw-r--r-- 1 root root   260 Apr 13  2018 bind
-rw-r--r-- 1 root root  2154 Apr 13  2018 bro
-rw-r--r-- 1 root root   879 Apr 13  2018 exim
-rw-r--r-- 1 root root 10095 Apr 13  2018 firewalls
-rw-r--r-- 1 root root  5338 Apr 13  2018 grok-patterns
-rw-r--r-- 1 root root  3251 Apr 13  2018 haproxy
-rw-r--r-- 1 root root   987 Apr 13  2018 httpd
-rw-r--r-- 1 root root  1265 Apr 13  2018 java
-rw-r--r-- 1 root root  1087 Apr 13  2018 junos
-rw-r--r-- 1 root root  1037 Apr 13  2018 linux-syslog
-rw-r--r-- 1 root root    74 Apr 13  2018 maven
-rw-r--r-- 1 root root    49 Apr 13  2018 mcollective
-rw-r--r-- 1 root root   190 Apr 13  2018 mcollective-patterns
-rw-r--r-- 1 root root   614 Apr 13  2018 mongodb
-rw-r--r-- 1 root root  9597 Apr 13  2018 nagios
-rw-r--r-- 1 root root   142 Apr 13  2018 postgresql
-rw-r--r-- 1 root root   845 Apr 13  2018 rails
-rw-r--r-- 1 root root   224 Apr 13  2018 redis
-rw-r--r-- 1 root root   188 Apr 13  2018 ruby
-rw-r--r-- 1 root root   404 Apr 13  2018 squid

#Among them is a java template with many built-in patterns for Java classes, timestamps and so on:
[root@SZ1PRDELK00AP005 patterns]# cat java
JAVACLASS (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*
#Space is an allowed character to match special cases like 'Native Method' or 'Unknown Source'
JAVAFILE (?:[A-Za-z0-9_. -]+)
#Allow special <init>, <clinit> methods
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
#Line number is optional in special cases 'Native method' or 'Unknown source'
JAVASTACKTRACEPART %{SPACE}at %{JAVACLASS:class}\.%{JAVAMETHOD:method}\(%{JAVAFILE:file}(?::%{NUMBER:line})?\)
# Java Logs
JAVATHREAD (?:[A-Z]{2}-Processor[\d]+)
JAVACLASS (?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+
JAVAFILE (?:[A-Za-z0-9_.-]+)
JAVALOGMESSAGE (.*)
# MMM dd, yyyy HH:mm:ss eg: Jan 9, 2014 7:13:13 AM
CATALINA_DATESTAMP %{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM)
# yyyy-MM-dd HH:mm:ss,SSS ZZZ eg: 2014-01-09 17:32:25,527 -0800
TOMCAT_DATESTAMP 20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) %{ISO8601_TIMEZONE}
CATALINALOG %{CATALINA_DATESTAMP:timestamp} %{JAVACLASS:class} %{JAVALOGMESSAGE:logmessage}
# 2014-01-09 20:03:28,269 -0800 | ERROR | com.example.service.ExampleService - something compeletely unexpected happened...
TOMCATLOG %{TOMCAT_DATESTAMP:timestamp} \| %{LOGLEVEL:level} \| %{JAVACLASS:class} - %{JAVALOGMESSAGE:logmessage}
[root@SZ1PRDELK00AP005 patterns]#

#But the default templates alone cannot match our company's custom log content, so I wrote one myself.
[root@SZ1PRDELK00AP005 patterns]# cat application
APP_DATESTAMP 20%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})
THREADS_NUMBER (?:[a-zA-Z0-9-]+)
GLOBAL_PIPELINE_NUMBER (?:[a-zA-Z0-9-]+)
DEV_TEAM (?:[a-zA-Z0-9-]+)
SYSTEM_NAME (?:[a-zA-Z0-9-]+)
LINE_NUMBER (Line:[0-9]+)
JAVALOGMESSAGE (.*)
APPLOG \[%{APP_DATESTAMP:timestamp}\] \[%{LOGLEVEL:loglevel}\] %{JAVACLASS:class} \[%{THREADS_NUMBER:threads_number}\] \[%{GLOBAL_PIPELINE_NUMBER:global_pipeline_number}\] %{DEV_TEAM:team} %{SYSTEM_NAME:system_name} %{LINE_NUMBER:linenumber} %{JAVALOGMESSAGE:logmessage}

#Then configure logstash:
[root@SZ1PRDELK00AP005 patterns]# cat /usr/local/logstash/config/yunyan.conf
input {
    kafka {
        bootstrap_servers => "192.168.1.12:9092,192.168.1.14:9092,192.168.1.15:9092"
        topics_pattern => "elk-tst-tst-info.*"
        group_id => "test-consumer-group"
        codec => json
        consumer_threads => 3
        decorate_events => true
        auto_offset_reset => "latest"
    }
}
filter {
    grok {
        #APPLOG here is the pattern name I defined above
        match => {"message" => ["%{APPLOG}","%{JAVALOGMESSAGE:message}"]}
        overwrite => ["message"]
    }
}
output {
    elasticsearch {
        hosts => ["192.168.1.19:9200","192.168.1.24:9200"]
        user => "elastic"
        password => "111111"
        index => "%{[@metadata][kafka][topic]}-%{+YYYY-MM-dd}"
        workers => 1
    }
}
#output {
#    stdout {
#        codec => "rubydebug"
#    }
#}
#When debugging it is generally recommended to output to stdout first rather than straight to ES; once the standard output confirms that every field is parsed correctly, switch the output to Elasticsearch.
#For help writing grok regular expressions there is an online grok expression tester: http://grokdebug.herokuapp.com/
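Note that I dropped the application pattern file straight into logstash's built-in patterns directory, so grok finds it automatically. As an alternative sketch, custom patterns can live in their own directory (the path here is hypothetical) and be loaded with the grok plug-in's patterns_dir option:

filter {
  grok {
    # load custom pattern files from our own directory instead of the vendor path
    patterns_dir => ["/usr/local/logstash/patterns"]
    match => {"message" => ["%{APPLOG}","%{JAVALOGMESSAGE:message}"]}
    overwrite => ["message"]
  }
}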
After the log content is output in the standard format and parsed, it can be queried in key:value form. For example, searching loglevel:ERROR in the search bar returns only the log entries whose level is ERROR.
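Since the APPLOG pattern also extracts fields such as team and system_name, conditions can be combined in the same way; for example, in the Kibana search bar (Lucene query syntax):

loglevel:ERROR AND system_name:tst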
3. Delete Elasticsearch indexes regularly:
The index name is defined in the Logstash output plug-in configuration. For a daily index the index name ends with -%{+YYYY-MM-dd}; to roll the index monthly, use -%{+YYYY-MM} instead. Different kinds of content should be indexed differently: operating-system logs that change little from day to day can be indexed by month, while the business systems' own application logs, which produce far more data every day, are better indexed by day. For Elasticsearch, an index that is too large hurts performance, and so does having too many indexes; the main performance bottleneck of Elasticsearch is the CPU.
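In the elasticsearch output of Logstash, only the date suffix of the index option changes between the two schemes; a sketch (the index name prefixes are illustrative):

# daily index for high-volume business application logs
index => "app-log-%{+YYYY-MM-dd}"
# monthly index for low-volume operating-system logs
index => "os-log-%{+YYYY-MM}"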
While operating and maintaining the ELK project I found that the index files had grown too large and there were too many indexes, while the CPU configuration of our ES data nodes was too low, which caused the ES cluster to crash. There are several ways to solve this problem: the first is to delete useless indexes regularly, and the second is to tune the ES index parameters. I have not yet tried the second one and will write it up in a later document; here I first describe how to delete indexes on a schedule and by hand.
#!/bin/bash
#The target date (7 days ago)
DATA=`date -d "1 week ago" +%Y-%m-%d`
#The current date
time=`date`
#Delete the indexes from 7 days ago
curl -u elastic:654321 -XGET "http://192.168.1.19:9200/_cat/indices/?v" | grep $DATA
if [ $? == 0 ];then
    curl -u elastic:654321 -XDELETE "http://127.0.0.1:9200/*-${DATA}"
    echo "At $time cleared the indexes of $DATA!"
fi

#To delete indexes manually, write the index names to a text file and then delete them in a loop
curl -u elastic:654321 -XGET "http://192.168.1.19:9200/_cat/indices/?v" | awk '{print $3}' | grep elk >> /tmp/es.txt
for i in `cat /tmp/es.txt`;do curl -u elastic:654321 -X DELETE "192.168.1.19:9200/$i";done
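To run the cleanup automatically, the script can be scheduled with cron. A sketch, assuming the script is saved as /usr/local/scripts/clean_es_index.sh (the path and schedule are illustrative):

# crontab -e : run every day at 01:00 and append the output to a log file
0 1 * * * /bin/bash /usr/local/scripts/clean_es_index.sh >> /var/log/clean_es_index.log 2>&1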
OK, that's all for now. I've been very busy with work recently, so it's hard to find time to update my technical blog; basically I write either late at night or early in the morning, since there is no time during working hours. Thank you for your continued attention.