1, Introduction to awk
awk is a programming language for processing text and data under Linux/Unix.
The data can come from standard input, one or more files, or the output of other commands.
It supports advanced features such as user-defined functions and dynamic regular expressions, which makes it a powerful programming tool under Linux/Unix.
It can be used directly on the command line, but it is more often used in scripts.
awk processes text and data by scanning the input line by line, from the first line to the last, looking for lines that match a given pattern and performing the specified actions on them. If no action is specified, matching lines are printed to standard output (the screen). If no pattern is specified, the action is applied to every line.
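For example, here is a minimal sketch of the three cases (pattern only, action only, pattern plus action), run against /etc/passwd:

# Pattern only: the default action prints every matching line
awk '/bash$/' /etc/passwd
# Action only: the action runs on every line
awk '{print $1}' /etc/passwd
# Pattern and action: the action runs only on matching lines
awk '/bash$/ {print $1}' /etc/passwd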
awk takes its name from the initials of its three authors' last names: Alfred Aho, Peter Weinberger and Brian Kernighan.
gawk is the GNU version of awk; it adds a number of Bell Labs and GNU extensions.
2, The two syntax forms of awk
awk [options] 'commands' file1 file2
awk [options] -f awk-script-file filenames
options:
-F specify a custom field separator for the input; the default separator is whitespace (space or tab)
commands:
BEGIN{} {} END{}: the BEGIN action runs before any input is processed, the middle action runs on every input line, and the END action runs after all input has been processed
Examples
awk 'BEGIN{print "----Start processing---"} {print "ok"} END{print "----All processed---"}' /etc/hosts
----Start processing---
ok
ok
ok
----All processed---
BEGIN {} is usually used to define variables such as FS and OFS, for example BEGIN{FS=":"; OFS="---"}
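As a small sketch of such a BEGIN block (the output naturally depends on your /etc/passwd):

# Set the input separator to ":" and the output separator to "---" before any line is read
awk 'BEGIN{FS=":"; OFS="---"} {print $1,$7}' /etc/passwd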
3, awk working principle
[root@5e4b448b73e5 ~]# awk -F: '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
adm 3
...snip...
(1) awk processes the file one line at a time. Each line is read in and assigned to the internal variable $0. A line is also called a record and ends with a newline character.
(2) The line is then broken down into fields by the field separator (space or tab by default). Each field is stored in a numbered variable, starting with $1; awk can handle up to 100 fields per line.
(3) How does awk know to separate fields with whitespace? Because the internal variable FS holds the field separator, and FS is initially set to whitespace.
(4) When awk prints fields with the print function, it inserts the output field separator OFS between them, which defaults to a space. The commas in the print statement are mapped to OFS, so the output separator can be controlled by setting OFS.
(5) After printing, awk reads the next line from the file into $0, overwriting the previous contents, splits the new record into fields and processes it. This continues until all lines have been processed.
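To make point (4) concrete, a short sketch showing that OFS is inserted only between comma-separated fields, and that $0 itself is rebuilt with OFS only after a field is assigned:

echo "a b c" | awk 'BEGIN{OFS="-"} {print $1,$2,$3}'   # a-b-c
echo "a b c" | awk 'BEGIN{OFS="-"} {print $0}'         # a b c  ($0 is untouched)
echo "a b c" | awk 'BEGIN{OFS="-"} {$1=$1; print $0}'  # a-b-c  ($0 is rebuilt after the assignment)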
4, Internal variables related to records and fields:
View help:
man awk
$0: holds the contents of the line (record) currently being processed
NR: the number of the record currently being processed, i.e. the total number of lines awk has read so far across all input
FNR: the line number of the current line within its own file
NF: the number of fields in the current record
$NF: the value of the last field of the current record
FS: the input field separator; the default is whitespace (space or tab)
OFS: the output field separator; the default is a space
awk 'BEGIN{FS=":"; OFS="+++"} /^root/{print $1,$2,$3,$4}' /etc/passwd
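Another sketch exercising the record and field variables listed above (the numbers depend on your file):

# For every line print: overall record number, record number within this file, field count, last field
awk -F: '{print NR, FNR, NF, $NF}' /etc/passwd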
ORS: the output record separator; the default is a newline
Examples
Merge each line of the file into one line
By default ORS appends a newline after each output record; here it is replaced with a space
awk 'BEGIN{ORS=" "} {print $0}' /etc/passwd
Output:
root:x:0:0:root:/root:/bin/bash bin:x:1:1:bin:/bin:/sbin/nologin daemon:x:2:2:daemon:/sbin:/sbin/nologin adm:x:3:4:adm:/var/adm:/sbin/nologin lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin sync:x:5:0:sync:/sbin:/bin/sync shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown halt:x:7:0:halt:/sbin:/sbin/halt mail:x:8:12:mail:/var/spool/mail:/sbin/nologin operator:x:11:0:operator:/root:/sbin/nologin games:x:12:100:games:/usr/games:/sbin/nologin ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin nobody:x:99:99:Nobody:/:/sbin/nologin systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin dbus:x:81:81:System message bus:/:/sbin/nologin tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin [root@5e4b448b73e5 ~]#
5, Formatted output:
printf function
awk -F: '{printf "%-15s %-10s %-15s\n", $1,$2,$3}' /etc/passwd
awk -F: '{printf "|%-15s| %-10s| %-15s|\n", $1,$2,$3}' /etc/passwd
- %s string
- %d decimal integer
- %f floating-point number; %.2f keeps 2 digits after the decimal point, e.g. printf "%.2f\n", 10
- %-15s the field occupies 15 characters; the - means left-aligned (the default is right-aligned)
- printf does not append a newline automatically; add \n at the end of the format string yourself
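A small sketch of the numeric specifiers, combined with width and alignment (UIDs vary between systems):

# Left-aligned user name, right-aligned UID, and UID/10 with two decimal places
awk -F: '{printf "%-15s %5d %8.2f\n", $1, $3, $3/10}' /etc/passwd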
6, awk patterns and actions
An awk statement consists of a pattern and an action.
The pattern determines when the action is triggered.
If the pattern is omitted, the action is applied to every line.
The pattern can be:
- a regular expression
- a relational (comparison) expression
- a compound expression combining regular expressions and relational expressions
1 regular expression:
- Regular expression matching the whole line (contains):
Tests whether the line currently being processed contains the specified pattern (a regular expression).
The regular expression is written between slashes: /regex/
! negates the match, i.e. selects lines that do not match the regular expression
The default action of awk is to print the whole line, so when all you want is to print matching lines the print action can be omitted.
awk '/^root/' /etc/passwd
awk '!/^root/' /etc/passwd
- Regular expression matching a single field:
The matching operators ~ and !~ can be used:
field ~ /regex/
awk -F: '$3 ~ /^1/' /etc/passwd
awk -F: '$NF !~ /bash$/' /etc/passwd
- Match lines that start with bin or root
awk -F: '/^(bin|root)/' /etc/passwd
# output
root:x:0:0:root:/root:/bin/zsh
bin:x:1:1:bin:/bin:/sbin/nologin
2 relational expressions
A relational expression compares values; the specified action is executed only when the comparison is true.
Relational expressions use relational operators to compare two values, and they can compare numbers as well as strings.
To test exact string equality, use == and !=.
Strings must be enclosed in double quotation marks.
awk -F: '$NF == "/bin/bash"' /etc/passwd
awk -F: '$1 == "root"' /etc/passwd
Other operators:
The relational operators are:
< less than, e.g. x < y
> greater than, e.g. x > y
<= less than or equal to, e.g. x <= y
== equal to, e.g. x == y
!= not equal to, e.g. x != y
>= greater than or equal to, e.g. x >= y
Examples
awk -F: '$3 == 0' /etc/passwd
awk -F: '$3 < 10' /etc/passwd
df -P | grep '/' | awk '$4 > 25000 {print $0}'
- Arithmetic operators: +, -, *, /, % (modulus: remainder), ^ (power: 2^3)
Calculations can be performed inside a pattern expression; awk carries out arithmetic in floating point.
awk -F: '$3 * 10 > 5000{print $0}' /etc/passwd
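A quick sketch of the remaining arithmetic operators in a BEGIN block, so no input file is needed:

awk 'BEGIN{print 10/3, 10%3, 2^3}'
# 3.33333 1 8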
3 compound patterns
Compound patterns usually combine other patterns with the logical operators:
&& logical AND, equivalent to "and"
|| logical OR, equivalent to "or"
! logical NOT
awk -F: '$1~/root/ && $3<=15' /etc/passwd
awk -F: '$1~/root/ || $3<=15' /etc/passwd
awk -F: '!($1~/root/ || $3<=15)' /etc/passwd
4 range patterns
The two patterns are separated by a comma.
The syntax is: start pattern, end pattern.
The example below means: from the line that starts with bin to the line that contains adm.
That is, every line from the one beginning with bin up to and including the one containing adm matches.
awk -F: '/^bin/,/adm/ {print $0}' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
A range pattern cannot be combined with other patterns.
The following operation is therefore invalid:
[root@shark ~]# echo Yes | awk '/1/,/2/ || /Yes/'
[root@shark ~]#
Comprehensive practice
Organization No.  Organization name  Province  Trunk No.  Trunk status  Receiving teller No.  Receiving teller name  Currency  Balance
11007eee Beijing Dongcheng District Street sub branch 03 Unclaimed 156 19001.68
11007fff Beijing Dongcheng District Street sub branch 03 Unclaimed 840 2672.00
11007aaa Beijing Dongcheng District Street sub branch 04 Unclaimed 156 7261.31
11007ccc Beijing Chaoyang District Road sub branch 02 Unclaimed 156 161490.08
110088ee Beijing Chaoyang District Road sub branch 03 Unclaimed 840 19711.00
34009eff Shanxi Coal Mine Road sub branch 03 Unclaimed 156 282370.23
11007eee Shanxi colliery District Road sub branch 03 Unclaimed 156 282370.23
11007eee Shanxi colliery District Road sub branch 03 Unclaimed 156 282370.23
11007264 Shandong Pingyin County sub branch Shandong 02 Unclaimed 156 304516.23
11007889 Shandong Jiyang County sub branch Beijing 04 Unclaimed 840 24551.00
11007264 Beijing Chaoyang District sub branch Beijing 02 Unclaimed 156 304516.23
11007284 Beijing Chaoyang District sub branch Beijing 02 Requisition 1002 Lich King 156 304516.23
11007194 Beijing Chaoyang District Bank Beijing 02 Unclaimed 156 304516.23
11007264 Henan Zhongyuan District sub branch Henan 02 Unclaimed 156 304516.23
11007284 Henan Erqi sub branch Henan 03 Requisition 1003 Zhong Kui 156 9046.23
- Find the organization number and balance of unclaimed trunks in Henan Province (see the sketch after this list)
- Find all the organization numbers in Beijing
- Find the organizations in Beijing whose trunk is unclaimed
- Find the county-level sub branches
- Find the Pingyin County sub branch
- Find the Beijing branches with a balance greater than 8000
- Find out which tellers have requisitioned a trunk, and print the teller number and teller name
- All output must be formatted
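A minimal sketch for the first two tasks, assuming the data above is saved as gy.txt, that the organization number is the first field and the balance is the last field (field positions are an assumption here, since the translated sample splits some names across several words):

# Unclaimed trunks in Henan Province: organization number and balance
awk '/Henan/ && /Unclaimed/ {print $1, $NF}' gy.txt
# All organization numbers in Beijing
awk '/Beijing/ {print $1}' gy.txt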
7, awk scripting
1 if statement
Format {if (expression) {statement; statement;...}}
awk -F: '{ if ($3==0) {print $1 " is administrator."} }' /etc/passwd
Output: root is administrator.
# Count the number of system-level users
awk -F: '{ if ($3>0 && $3<1000) {count++} } END{print count}' /etc/passwd
Output: 22
2 if ... else statement
Format {if (expression) {statement; statement;...} else {statement; statement;...}}
awk -F: '{ if ($3==0){print $1} else {print $7} }' /etc/passwd
awk -F: '{ if ($3==0){count++} else{i++} } END{print "Number of administrators: "count "Number of system users: "i}' /etc/passwd
Output: Number of administrators: 1 Number of system users: 24
awk -F: '{ if($3==0){count++} else{i++} } END{print "Number of administrators: "count ; print "Number of system users: "i}' /etc/passwd
Output:
Number of administrators: 1
Number of system users: 24
3 if... else if... else statement
format
{if (expression 1) {statement; statement;...} else if (expression 2) {statement; statement;...} else if (expression 3) {statement; statement;...} else {statement; statement;...}}
awk -F: '{if($3==0){i++} else if($3>999){k++} else{j++}} END{print i; print k; print j}' /etc/passwd
Output:
1
2
22
awk -F: '{if($3==0){i++} else if($3>999){k++} else{j++}} END{print "Number of administrators: "i; print "Number of ordinary users: "k; print "Number of system users: "j}' /etc/passwd
Output:
Number of administrators: 1
Number of ordinary users: 2
Number of system users: 22
8, Using external variables in awk:
1 use custom shell variables
Method 1: the awk -v option (recommended and easy to read)
[root@shark ~]# read -p ">>:" user
>>:root
[root@shark ~]# awk -F: -v awk_name=$user '$1==awk_name {print "user exists"}' /etc/passwd
user exists
[root@shark ~]#
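Another sketch of -v, this time passing a numeric threshold from the shell (the variable names min_uid and limit are only illustrative):

min_uid=1000
awk -F: -v limit=$min_uid '$3 >= limit {print $1, $3}' /etc/passwd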
2 use the environment variables of the shell
[root@shark ~]# read -p ">>:" user
>>:root
[root@shark ~]# export user
[root@shark ~]# awk -F: '$1==ENVIRON["user"] {print "user exists"}' /etc/passwd
user exists
[root@shark ~]# unset user
9, Specify multiple delimiters: []
echo "a b|c d| ||||e | |" |awk -F'[ |]' '{print $10}' e echo "a b|c d| ||||e | |" |awk -F'[ |]+' '{print $5}' e
[root@shark ~]# echo "110.183.58.144 - - [10/May/2018:23:49:27 +0800] GET http://app." |awk -F'[][ ]' '{print $5 }' 10/May/2018:23:49:27 [root@shark ~]# echo "110.183.58.144 - - [10/May/2018:23:49:27 +0800] GET http://app." |awk -F'[][ ]+' '{print $4 }' 10/May/2018:23:49:27
Note: characters inside the brackets are treated as ordinary characters; for example, . and * are taken literally.
For example:
$ echo "a.b*c" |awk -F'[.*]' '{print $1, $2,$3}' a b c
Job:
1. Obtain network card IP (all IP except ipv6)
2. Get the memory usage
3. Get disk usage
4. Print the last field of the /etc/hosts file (fields separated by spaces)
5. Print the directory names under a specified directory
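Possible sketches for the first three jobs; output formats differ between distributions and tools, so treat these as starting points rather than exact answers:

# 1. IPv4 addresses of the network cards (the "inet " match excludes IPv6 lines)
ip addr | awk -F'[ /]+' '/inet / {print $3}'
# 2. Memory usage as a percentage
free -m | awk '/^Mem/ {printf "%.2f%%\n", $3/$2*100}'
# 3. Usage of the root filesystem
df -P / | awk 'NR==2 {print $5}'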
10, Production example:
Count and sort the IP accesses in a certain time range in the log
Partial log
110.183.58.144 - - [10/May/2018:23:49:27 +0800] "GET http://app.znds.com/html/20180504/y222sks_2.2.3_dangbei.dangbei HTTP/1.1" 200 14306614 "-" "okhttp/3.4.1"
1.69.17.127 - - [10/May/2018:23:49:31 +0800] "GET http://app.znds.com/down/20180205/ttjs_3.0.0.1_dangbei.apk HTTP/1.1" 200 13819375 "-" "okhttp/3.4.1"
1.69.17.127 - - [10/May/2018:23:49:40 +0800] "GET http://app.znds.com/down/20180416/ttyj_1.1.6.0_dangbei.apk HTTP/1.1" 200 16597231 "-" "okhttp/3.4.1"
1.69.17.127 - - [10/May/2018:23:50:00 +0800] "GET http://app.znds.com/down/20170927/jydp_1.06.00_dangbei.apk HTTP/1.1" 200 36659203 "-" "okhttp/3.4.1"
Concrete implementation
Log file name: app.log
$ start_dt='10/May/2018:23:47:43'
$ end_dt='10/May/2018:23:49:05'
$ awk -v st=${start_dt} -v ent=${end_dt} -F'[][ ]' '$5 == st,$5 == ent {print $1}' app.log | sort | uniq -c | sort -nr | head -n 10
     66 223.13.142.15
      6 110.183.13.212
      4 1.69.17.127
      1 113.25.94.69
      1 110.183.58.144
Time conversion tool
[root@shark ~]# unset month_array
[root@shark ~]# declare -A month_array
[root@shark ~]# month_array=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun" [07]="Jul" [08]="Aug" [09]="Sept" [10]="Oct" [11]="Nov" [12]="Dec")
[root@shark ~]# echo ${month_array[01]}
Jan
[root@shark ~]# m=10
[root@shark ~]# echo ${month_array[$m]}
Oct
String slicing
Syntax: ${var:start_index:length} (start at the given index and take that many characters)
Index numbers start at 0
[root@shark ~]# st="20180510234931"
[root@shark ~]# m=${st:4:2}
[root@shark ~]# echo $m
05
Common log analysis statements
# Top 20 IPs by number of accesses (sample output: 16348 58.16.183.52)
awk '$9==200 {print $1}' 2018-05-10-0000-2330_app.log | sort | uniq -c | sort -r | head -20

# Top 10 IPs with a 20X status code (sample output: 2097 125.70.184.99 / 2000 183.225.69.158)
awk '$9 > 200 && $9 < 300 {print $1}' 2018-05-10-0000-2330_app.log | sort | uniq -c | sort -r | head

# Top 20 URLs by number of accesses (sample output: 250563 http://app.xxx.com/update/2018-04-04/dangbeimarket_4.0.9_162_znds.apk)
awk '$9 == 200 {print $7,$9}' 2018-05-10-0000-2330_app.log | sort | uniq -c | sort -r | head -20

# Top 10 URLs with a 20X status code (sample output: 248786 http://app.znds.com/update/2018-04-04/dangbeimarket_4.0.9_162_znds.apk)
awk '$9 > 200 && $9 < 300 {print $7,$9}' 2018-05-10-0000-2330_app.log | sort | uniq -c | sort -r | head

# IPs with more than 10,000 accesses (sample output: 58.16.184.247 / 58.16.183.52)
awk '{print $1}' 2018-05-10-0000-2330_app.log | sort | uniq -c | sort -r | awk '$1 > 10000 {print $2}'

# Top 10 URLs with a 404 status code (sample output: 1017 http://app.xxx.com/update/fixedaddress/kuaisou_qcast.apk.md5)
awk '$9 == 404 {print $7}' 2018-05-10-0000-2330_app.log | sort | uniq -c | sort -r | head
11, Advanced usage
1 split information into files
Splitting a file with awk is very simple: just use redirection.
The following example splits the file according to the third field (NR!=1 means the header line is not processed).
$ awk 'NR!=1{print > $3}' gy.txt
$ ls
gy.txt Beijing Shanxi Shandong Henan
You can also output the specified columns to a file:
awk 'NR!=1{print $2,$3,$4,$5 > $3}' gy.txt
Something a little more complicated (note the if / else if statement; you can see that awk is really a small script interpreter):
$ awk 'NR!=1 {if($3 ~ /Beijing|Shandong/) print > "1.txt"; else if($3 ~ /Shanxi/) print > "2.txt"; else print > "3.txt" }' gy.txt
$ ls ?.txt
1.txt 2.txt 3.txt
2 AWK array
Syntax: array_name[index]=value
- array_name array name
- Index index
- Value value
Arrays in awk work like associative arrays in the shell: the index can be any string, and a variable can also be used as the index.
[root@shark ~]# awk 'BEGIN{ arr["a"]=1; arr["b"]=2+3; print arr["a"]; print arr["b"] }'
1
5
[root@shark ~]# echo "a b" | awk 'BEGIN{ arr["a"]=1; arr["b"]=2+3; print arr["a"]; print arr["b"] }'
1
5
Sample file: hero (in the original file each hero name is a single word, so it is field $1 in the commands below)
Void Walker  math  68
Void Walker  English  88
Daughter of Darkness  Chinese  98
Daughter of Darkness  math  68
Limitless Swordsman  Chinese  78
Limitless Swordsman  math  48
Harp Fairy  Chinese  90
Harp Fairy  math  68
Harp Fairy  English  61
Master of Shadow Flow  Chinese  68
Master of Shadow Flow  math  88
Master of Shadow Flow  English  98
[root@shark ~]# awk '{arr[$1]++; print $1, arr[$1]}' hero
Void Walker 1
Void Walker 2
Daughter of Darkness 1
Daughter of Darkness 2
Limitless Swordsman 1
Limitless Swordsman 2
Harp Fairy 1
Harp Fairy 2
Harp Fairy 3
Master of Shadow Flow 1
Master of Shadow Flow 2
Master of Shadow Flow 3
[root@shark ~]# awk '{arr[$1]++} END{print $1, arr[$1]}' hero
Master of Shadow Flow 3
[root@shark ~]# awk '{arr[$1]++} END{for (i in arr) print i, arr[i]}' hero
Daughter of Darkness 2
Void Walker 2
Harp Fairy 3
Master of Shadow Flow 3
Limitless Swordsman 2
[root@shark ~]#
Let's look again at counting the number of outlets in each province:
$ awk 'NR!=1 {a[$3]++;} END {for (i in a) print i ", " a[i];}' gy.txt
Beijing, 69
Shanxi, 20
Jiangsu, 10
Shandong, 16
3 examples of built-in functions
# Find lines longer than 80 characters in a file
awk 'length>80' file
# length is an awk built-in function; here it measures the length of each line
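A few more built-in string functions, sketched in a BEGIN block so no input file is needed:

awk 'BEGIN{
    s = "hello,awk"
    print length(s)            # 9
    print toupper(s)           # HELLO,AWK
    print substr(s, 7, 3)      # awk
    n = split(s, parts, ",")   # n=2, parts[1]="hello", parts[2]="awk"
    print n, parts[2]          # 2 awk
}'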
4 just for fun
The following command calculates the total file size of all txt files.
$ ls -l *.txt | awk '{sum+=$5} END {print sum}'
2511401
Other examples
# See how much memory each user's processes occupy (note: summing the RSS column)
$ ps aux | awk 'NR!=1 {a[$1]+=$6;} END {for(i in a) print i ", " a[i] "KB";}'

# Count client IPs by number of connections
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr

# Print the 9x9 multiplication table
seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
Sample data
110.183.58.144 - - [10/May/2018:23:49:27 +0800] "GET http://app.znds.com/html/20180504/y222sks_2.2.3_dangbei.dangbei HTTP/1.1" 200 14306614 "-" "okhttp/3.4.1"
1.69.17.124 - - [10/May/2018:23:49:31 +0800] "GET http://app.znds.com/down/20180205/ttjs_3.0.0.1_dangbei.apk HTTP/1.1" 200 13819375 "-" "okhttp/3.4.1"
1.69.17.125 - - [10/May/2018:23:49:40 +0800] "GET http://app.znds.com/down/20180416/ttyj_1.1.6.0_dangbei.apk HTTP/1.1" 200 16597231 "-" "okhttp/3.4.1"
1.69.17.126 - - [10/May/2018:23:50:00 +0800] "GET http://app.znds.com/down/20170927/jydp_1.06.00_dangbei.apk HTTP/1.1" 200 36659203 "-" "okhttp/3.4.1"
1.69.17.127 - - [10/May/2018:23:50:00 +0800] "GET http://app.znds.com/down/20170927/jydp_1.06.00_dangbei.apk HTTP/1.1" 200 36659203 "-" "okhttp/3.4.1"
1.69.17.128 - - [10/May/2018:23:50:00 +0800] "GET http://app.znds.com/down/20170927/jydp_1.06.00_dangbei.apk HTTP/1.1" 200 36659203 "-" "okhttp/3.4.1"
1.69.17.129 - - [10/May/2018:23:50:00 +0800] "GET http://app.znds.com/down/20170927/jydp_1.06.00_dangbei.apk HTTP/1.1" 200 36659203 "-" "okhttp/3.4.1"
#!/bin/bash
declare -A month_array
month_array=([01]="Jan" [02]="Feb" [03]="Mar" [04]="Apr" [05]="May" [06]="Jun" [07]="Jul" [08]="Aug" [09]="Sept" [10]="Oct" [11]="Nov" [12]="Dec")

st=$1
end=$2
logfile=$3

start_yer=${st:0:4}
start_month=${st:4:2}
start_day=${st:6:2}
start_h=${st:8:2}
start_m=${st:10:2}
start_s=${st:12:2}

# Get mapped month
start_month=${month_array[$start_month]}

# Convert the date and time entered by the user to the format used in the log: 10/May/2018:23:49:31
start_dt="$start_day/$start_month/${start_yer}:${start_h}:${start_m}:$start_s"
echo $start_dt

end_yer=${end:0:4}
end_month=${end:4:2}
end_day=${end:6:2}
end_h=${end:8:2}
end_m=${end:10:2}
end_s=${end:12:2}
end_month=${month_array[$end_month]}
end_dt="$end_day/$end_month/${end_yer}:${end_h}:${end_m}:$end_s"
echo $end_dt
echo $logfile

export start_dt end_dt

awk -F '[][ ]+' '{
    if ($4==ENVIRON["start_dt"]) {flag=1}
    else if ($4==ENVIRON["end_dt"]) {flag=0};
    if (flag || $4==ENVIRON["end_dt"]) {print $1}
}' $logfile

unset start_dt
unset end_dt
[root@shark ~]# sh search-log.sh 20180510234931 20180510235000 access.log
10/May/2018:23:49:31
10/May/2018:23:50:00
access.log
1.69.17.124
1.69.17.125
1.69.17.126
1.69.17.127
1.69.17.128
1.69.17.129
12, References
For built-in variables, see: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din-Variables
For flow control, see: http://www.gnu.org/software/gawk/manual/gawk.html#Statements
For built-in functions, see: http://www.gnu.org/software/gawk/manual/gawk.html#Built_002din
For regular expressions, see: http://www.gnu.org/software/gawk/manual/gawk.html#Regexp