Using the awk command
1. What is awk
awk is a text-processing tool (similar to grep, but more powerful) built around a small data-processing language. It reads the input line by line, checks each line against patterns, and processes and prints the lines that match. It is usually used in shell scripts to extract specific data; used on its own, it can also compute statistics over text data.
2. Format syntax
1. Format
Format 1: preceding command | awk [options] 'condition {action}'
Format 2: awk [options] 'condition {action}' file
When an action contains multiple statements, separate them with semicolons. If no separator is specified, awk splits each line into fields on runs of spaces and tabs by default. print is the most commonly used statement.
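As a minimal sketch of the two invocation forms above, the script below feeds the same text to awk once through a pipe and once from a file. The sample data, the temporary file and the variable names (sample, from_pipe, from_file) are made up for illustration.

```shell
# Hypothetical sample input: three "name score" lines.
sample='alice 90
bob 75
carol 82'

# Format 1: awk reads the output of a preceding command through a pipe.
from_pipe=$(printf '%s\n' "$sample" | awk '{print $1}')

# Format 2: awk reads a file named on the command line.
tmpfile=$(mktemp)
printf '%s\n' "$sample" > "$tmpfile"
# Two statements in one action, separated by a semicolon.
from_file=$(awk '{total = total + $2; print $1, total}' "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$from_pipe" "$from_file"
```

The second command keeps a running total in the variable total, which awk creates on first use with an initial value of 0.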
2. Options
-F fs | Specifies the input field separator. fs is a string or a regular expression, such as -F: |
---|---|
-v | Assign a user-defined variable |
-f | Read awk command from script |
-W | Passes implementation-specific options. With gawk, --traditional (or -W traditional) runs awk in compatibility mode: gawk then behaves exactly like standard awk, and all gawk extensions are ignored |
' ' | Single quotation marks that enclose the awk program |
// | Pattern block; the pattern can be a string or a regular expression |
{} | Command code block containing one or more commands |
; | Multiple commands are separated by semicolons |
BEGIN | Executed at the start of the awk program, before any data is read. The action after BEGIN runs only once |
END | Executed after the awk program has processed all data and is about to exit. The action after END runs only once, at the end of the program |
- BEGIN is mainly the initialization block. It runs before any line is processed and is typically used to initialize global variables and set the FS separator
- END is mainly the finalization block. It runs after all lines have been processed and is typically used for final calculations or to print summary information
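The division of work between BEGIN, the per-line action and END can be sketched as follows. The scores and the variable name report are hypothetical; only the BEGIN/END behaviour comes from the description above.

```shell
# Hypothetical input: one "name score" pair per line.
scores='john 85
andrea 89
jasper 84'

report=$(printf '%s\n' "$scores" | awk '
BEGIN { print "name score" }                    # runs once, before any input is read
      { sum += $2; print $1, $2 }               # runs once for every input line
END   { print "total", sum; print "lines", NR } # runs once, after the last line
')

printf '%s\n' "$report"
```

In the END block, NR still holds the number of the last record read, so it doubles as a line count.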
3. awk built in variables
$0 | Represents the entire current row |
---|---|
$1 ... $n | The first through nth fields of the current record; $n is the nth field |
FS | Input field separator (default is a space) |
RS | Input record separator; the default is a newline (that is, the text is read line by line) |
NF | The number of fields in the current record, that is, the number of columns |
NR | The number of records read so far, that is, the current line number; starts from 1 |
FNR | Like NR, but counted per file: with multiple input files it restarts from 1 for each file |
OFS | Output field separator, default space |
ORS | Output record separator, default line break |
\n | Newline character |
~ | Matching regular expressions |
!~ | Does not match a regular expression |
= += -= *= /= %= ^= **= | Assignment |
&& | Logical AND |
< <= > >= != == | Relational operators |
$ | Field reference |
* / % | Multiplication, division and remainder |
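To illustrate the difference between NR and FNR listed in the table, here is a small sketch that reads two files in one awk run. The file names first.txt and second.txt and their contents are invented for the example.

```shell
# Two hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\nd\ne\n' > "$dir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(awk '{print NR, FNR, $0}' "$dir/first.txt" "$dir/second.txt")
rm -rf "$dir"

printf '%s\n' "$counts"
```

Each output line shows NR, then FNR, then the line itself; from the first line of second.txt onward the two counters differ.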
4. Examples
- BEGIN
    [root@localhost ~]# cat data
    Have you eaten yet?
    [root@localhost ~]# awk 'BEGIN{print "good morning"} {print $0}' data
    good morning
    Have you eaten yet?
- Print the first and fifth fields of each line in the text
    # awk '[pattern] {action}' filenames
    # Enclose the awk program in single quotation marks so that the shell does not expand $1, $5 and so on
    [root@localhost ~]# cat test
    1 This is the header line.
    2 This is the first data line.
    3 This is the second data line.
    8 This is the last line.
    [root@localhost ~]# awk '{print $1,$5}' test
    1 header
    2 first
    3 second
    8 last
- awk -F  # -F is equivalent to setting the built-in variable FS; it specifies the field separator
    # Use multiple separators: '[ ,]' is a bracket expression, so either a space or a comma separates fields
    # awk -F '[ ,]' '{print $1,$3}' test.txt
    [root@localhost ~]# awk -F'[ ,]' '{print $1,$3}' test
    1 is
    2 is
    3 is
    8 is
- awk -v  # set a variable
    # Set a variable a=2 and use the + operator
    [root@localhost ~]# cat test
    1 This is the header line.
    2 This is the first data line.
    3 This is the second data line.
    8 This is the last line.
    [root@localhost ~]# awk -v a=2 '{print $1,$1+a}' test
    1 3
    2 4
    3 5
    8 10
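A common use of -v is to hand a shell variable to the awk program, since shell variables are not expanded inside single quotation marks. The sketch below assumes a shell variable named threshold and an awk variable named limit; both names are illustrative.

```shell
threshold=2   # hypothetical shell variable

lines='1 This is the header line.
2 This is the first data line.
3 This is the second data line.
8 This is the last line.'

# -v copies the shell value into the awk variable "limit" before the program starts.
selected=$(printf '%s\n' "$lines" | awk -v limit="$threshold" '$1 > limit {print $1, $1 + limit}')

printf '%s\n' "$selected"
```

Because the value is assigned before any input is read, limit is also available inside a BEGIN block.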
- Matching mechanism
The power of awk lies in its script, which consists of two parts, a matching rule and the command to execute, as shown below:
'matching rule {command to execute}'
The matching rule restricts the command to specific lines of the text. It can be given as a string (for example, /demo/ selects the lines that contain the string demo) or as a regular expression. Note that the whole script is enclosed in single quotation marks (''), and the command part is enclosed in curly braces ({}).
    # If no command is specified, the matching lines are printed by default; if no matching rule is specified, every line of the text is matched.
    # /^$/ matches empty lines only; the test file used here has an empty line after each data line
    [root@localhost ~]# cat test
    1 This is the header line.

    2 This is the first data line.

    3 This is the second data line.

    8 This is the last line.

    [root@localhost ~]# awk '/^$/ {print "The spring breeze brushed my face"}' test
    The spring breeze brushed my face
    The spring breeze brushed my face
    The spring breeze brushed my face
    The spring breeze brushed my face
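The two defaults (a rule without a command prints the matching line, a command without a rule runs on every line) can be sketched like this. The variable names pattern_only, action_only and both are made up for the example.

```shell
lines='1 This is the header line.
2 This is the first data line.
3 This is the second data line.
8 This is the last line.'

# Matching rule only: matching lines are printed unchanged.
pattern_only=$(printf '%s\n' "$lines" | awk '/data/')

# Command only: the command runs on every line.
action_only=$(printf '%s\n' "$lines" | awk '{print $1}')

# Rule and command together: the command runs only on matching lines.
both=$(printf '%s\n' "$lines" | awk '/last/ {print $1, $5}')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```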
- Records and fields
awk treats each input line as a record, and the words (i.e. columns) separated by spaces or tabs as fields (the characters used to separate fields are called separators).
    [root@localhost ~]# echo 'wsnd hh zz' | awk '{print $1}'
    wsnd
    [root@localhost ~]# echo 'wsnd hh zz' | awk 'BEGIN{a=1;b=2}{print $(a+b)}'
    zz
    # Print the IP address
    [root@localhost ~]# ip a | grep 'inet ' | grep -v '127.0.0.1' | awk -F'[ /]+' '{print $3}'
    192.168.200.145
- Division of fields
awk can split fields in three ways:
  - The first method is to separate fields with white space. Set FS to a single space. In this case, leading and trailing white space (spaces and/or tabs) in the record is ignored, and fields are separated by runs of spaces and/or tabs. Because the default value of FS is a space, this is also the way awk normally divides records into fields.
  - The second method is to use another single character to separate fields. For example, awk programs often use ":" as the separator. When FS is any other single character, a new field starts wherever that character appears. If two separators appear in a row, the field between them is an empty string.
  - The third method: if FS is set to more than one character, it is interpreted as a regular expression.
    [root@localhost ~]# cat /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    [root@localhost ~]# awk 'BEGIN{FS=":"}{print $3}' /etc/passwd
    0
    1
    2
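The three ways of splitting fields can be compared side by side in a short sketch. The input strings and the variable names by_blank, by_colon and by_regex are invented; each command prints the field count followed by two of the fields.

```shell
# 1. Default FS (a single space): runs of spaces and tabs separate fields,
#    and leading or trailing white space is ignored.
by_blank=$(printf '  a \t b  c \n' | awk '{print NF ":" $1 ":" $3}')

# 2. A single other character: every occurrence starts a new field,
#    so "::" produces an empty field between the two colons.
by_colon=$(printf 'a::c\n' | awk -F: '{print NF ":" $2 ":" $3}')

# 3. More than one character: FS is treated as a regular expression,
#    here "a comma or semicolon followed by optional spaces".
by_regex=$(printf 'a, b;c\n' | awk -F'[,;] *' '{print NF ":" $2 ":" $3}')

printf '%s\n' "$by_blank" "$by_colon" "$by_regex"
```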
- Logical operation
    # Filter rows whose first column is greater than 2
    [root@localhost ~]# cat test
    1 This is the header line.
    2 This is the first data line.
    3 This is the second data line.
    8 This is the last line.
    [root@localhost ~]# awk '$1>2' test
    3 This is the second data line.
    8 This is the last line.
    # Filter rows whose first column equals 2 and output the first and third columns
    [root@localhost ~]# awk '$1==2 {print $1,$3}' test
    2 is
    # Filter rows whose first column is greater than 1 and whose fifth column equals "first"
    [root@localhost ~]# awk '$1>1 && $5=="first" {print $1,$2,$3}' test
    2 This is
- Pattern negation
    # Select the rows whose column 6 does not contain "data" and output columns 1 and 5
    [root@localhost ~]# cat test
    1 This is the header line.
    2 This is the first data line.
    3 This is the second data line.
    8 This is the last line.
    [root@localhost ~]# awk '$6 !~ /data/ {print $1,$5}' test
    1 header
    8 last
- OFS output field separator
    [root@localhost ~]# cat passwd
    root:x:0:0 root:/root:/bin/bash
    bin:x:1:1 bin:/bin:/sbin/nologin
    daemon:x:2:2 daemon:/sbin:/sbin/nologin
    [root@localhost ~]# awk 'BEGIN{FS=":"}{print $1,$6}' passwd
    root /bin/bash
    bin /sbin/nologin
    daemon /sbin/nologin
    [root@localhost ~]# awk 'BEGIN{FS=":";OFS="="}{print $1,$6}' passwd
    root=/bin/bash
    bin=/sbin/nologin
    daemon=/sbin/nologin
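One detail worth sketching: OFS is inserted only where print arguments are separated by commas. Without the comma, awk concatenates the values and OFS is not used. The record and the variable names below are hypothetical.

```shell
record='root:x:0:0'   # hypothetical colon-separated record

# Comma between arguments: OFS ("=") is placed between them.
with_comma=$(printf '%s\n' "$record" | awk 'BEGIN{FS=":"; OFS="="} {print $1, $3}')

# No comma: the two fields are concatenated and OFS does not appear.
without_comma=$(printf '%s\n' "$record" | awk 'BEGIN{FS=":"; OFS="="} {print $1 $3}')

printf '%s\n' "$with_comma" "$without_comma"
```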
- NF, the number of fields
    # Show how many columns each row of data has
    [root@localhost ~]# cat data
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    [root@localhost ~]# awk '{print NF}' data
    6
    7
    8
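Because NF holds a number, it can also follow $ to refer to the last field of a record, whatever the record's length. A small sketch using the same rows as the example above; the variable name last_fields is illustrative.

```shell
rows='john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83'

# Print the name, the field count, and the last field ($NF) of each row.
last_fields=$(printf '%s\n' "$rows" | awk '{print $1, NF, $NF}')

printf '%s\n' "$last_fields"
```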
- NR line number
    # Output the line number and content of each line, separated by "." (dot)
    [root@localhost ~]# cat data
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    [root@localhost ~]# awk '{print NR "." $0}' data
    1.john 85 92 78 94 88
    2.andrea 89 90 75 90 86 92
    3.jasper 84 88 80 92 84 94 83
- RS, the input record separator
    # Use the newline character as the field separator, set the record separator to the empty string (records are then separated by blank lines), and set the output record separator to a colon; print the first field of each record
    # The test file used here has an empty line after each data line
    [root@localhost ~]# cat test
    1 This is the header line.

    2 This is the first data line.

    3 This is the second data line.

    8 This is the last line.

    [root@localhost ~]# awk 'BEGIN{FS="\n";RS="";ORS=":"}{print $1}' test
    1 This is the header line.:2 This is the first data line.:3 This is the second data line.:8 This is the last line.:
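Setting RS to the empty string is often combined with FS="\n" to treat each blank-line-separated block as one record and each of its lines as one field. The following sketch assumes a made-up input with two such blocks.

```shell
# Hypothetical input: two blocks separated by one blank line.
paragraphs=$(printf 'name: a\nage: 1\n\nname: b\nage: 2\n' |
  awk 'BEGIN{RS=""; FS="\n"} {print NR ": " $1 " / " $2}')

printf '%s\n' "$paragraphs"
```

Here NR counts blocks rather than lines, and $1 and $2 are the first and second line of each block.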
- END
    [root@localhost ~]# cat data
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    [root@localhost ~]# awk 'BEGIN{print "hi"} {print $0} END{print "byby"}' data
    hi
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    byby