awk
1, Introduction
AWK is a language for processing text files and a powerful text analysis tool.
The name AWK comes from the first letters of the family names of its three creators, Alfred Aho, Peter Weinberger, and Brian Kernighan.
2, Basic syntax
awk [options] 'script' var=value file(s)
or
awk [options] -f scriptfile var=value file(s)
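Here is a minimal sketch of the -f form. The program goes in a script file, and a variable is passed as a var=value operand. The file name demo.awk and the variable name tag are hypothetical. The sample data is the same as the xu.txt file used in the examples below.

```shell
# Work in a fresh temporary directory so the example files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Sample input (same content as xu.txt in the examples below).
printf '1 This is html\n2 How are you\n3 You are beautiful\n7 happy mid-Autumn Festival\n' > xu.txt

# Put the awk program in its own file (hypothetical name demo.awk).
cat > demo.awk <<'EOF'
# Print the line number, the value of the variable "tag", and the first field.
{ print NR, tag, $1 }
EOF

# -f reads the program from demo.awk; tag=row is assigned before xu.txt is read.
awk -f demo.awk tag=row xu.txt
```

A var=value operand takes effect only when awk reaches it in the argument list, so it is not yet set inside BEGIN. Use -v when the value is needed in BEGIN.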
Common options and script elements
parameter | notes |
---|---|
-F fs | Specifies the input field separator. fs is a string or a regular expression, for example -F: |
-v var=value | Assigns a value to a user-defined variable before the program starts |
-f scriptfile | Reads the awk program from a script file |
-W traditional | (gawk) Runs in compatibility mode, so gawk behaves like standard awk and all GNU extensions are ignored |
'...' | Encloses the program text (the single quotes keep the shell from interpreting it) |
/pattern/ | Matching block, which can be a string or a regular expression |
{} | Action block containing one or more commands |
; | Separates multiple commands |
BEGIN | Its action runs once, at the start of the program, before any input is read |
END | Its action runs once, at the end of the program, after all input has been processed |
Example
Source file:
[root@localhost ~]# cat xu.txt
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival
Print the first and fourth fields of each line:
[root@localhost ~]# awk '{print $1,$4}' xu.txt
1 html
2 you
3 beautiful
7 Festival
or
awk -F   # -F is equivalent to setting the built-in variable FS; it specifies the field separator
To use several delimiters, give -F a bracket expression. With '[ ,]', every single space and every comma acts as a field separator:
awk -F '[ ,]' '{print $1,$2,$4}' xu.txt
Result:
[root@localhost ~]# awk -F '[ ,]' '{print $1,$2,$4}' xu.txt
1 This html
2 How you
3 You beautiful
7 happy Festival
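xu.txt contains no commas, so '[ ,]' gives the same result there as the default separator. The following sketch uses hypothetical comma-and-space data (fruit.txt) to show both characters acting as separators. It also shows a single-character separator for colon-separated data.

```shell
# Work in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1

# Hypothetical input where fields are separated by a space or a comma.
printf '1 apple,red\n2 banana,yellow\n' > fruit.txt

# Each single space or comma ends a field, so $3 is the colour.
awk -F '[ ,]' '{print $1, $3}' fruit.txt

# A single-character separator, as in colon-separated files such as /etc/passwd.
printf 'root:x:0:0\nalice:x:1000:1000\n' | awk -F ':' '{print $1, $3}'
```

Unlike the default separator, a bracket expression such as [ ,] treats every single occurrence as a separator. Two spaces in a row therefore produce an empty field between them.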
awk -v   # sets a variable
Example
Source file:
[root@localhost ~]# cat xu.txt
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival
Set the variable a to 1 with -v, then print the first field and the first field plus a:
[root@localhost ~]# awk -v a=1 '{print $1,$1+a}' xu.txt
1 2
2 3
3 4
7 8
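-v is also the usual way to hand a shell variable to awk. The following is a minimal sketch. The shell variable limit and the awk variable min are hypothetical names.

```shell
# Work in a fresh temporary directory with the xu.txt sample.
cd "$(mktemp -d)" || exit 1
printf '1 This is html\n2 How are you\n3 You are beautiful\n7 happy mid-Autumn Festival\n' > xu.txt

# A shell variable (hypothetical name "limit") passed into awk as "min".
limit=2

# Variables set with -v already exist in BEGIN and in every pattern.
awk -v min="$limit" 'BEGIN {print "first field >", min} $1 > min {print $0}' xu.txt
```

Passing the value with -v avoids splicing the shell variable into the single-quoted program text.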
Matching mechanism
The power of awk lies in its script, which consists of two parts: a matching rule and the commands to execute, as shown below:
'matching rule {commands to execute}'
The matching rule restricts the commands to specific lines of the input. It can be a string (for example, /demo/ selects lines that contain the string demo) or a regular expression. Note that the entire script is enclosed in single quotes ('), and the command part must be enclosed in curly braces ({}).
When awk runs, if no command is specified, matching lines are printed by default. If no matching rule is specified, every line of the input matches.
Example
awk '/^$/ {print "running"}' com.txt
In this command, /^$/ is a regular expression that matches blank lines (^ anchors the start of the line and $ the end, with nothing in between). The action uses print, one of the most frequently used commands, which simply outputs the specified text. If com.txt contains N blank lines, the command therefore prints "running" N times.
Source file:
[root@localhost ~]# cat com.txt
This is the computer line.

This is the name data line.

This is the classroom data line.

This is the become line.
Result:
[root@localhost ~]# awk '/^$/ {print "running"}' com.txt
running
running
running
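The following sketch tries two more matching rules on the same com.txt, which is recreated with printf as four text lines separated by blank lines. Both rules rely on the default action, which prints the matching line.

```shell
# Work in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1

# com.txt as shown above: four text lines separated by blank lines.
printf 'This is the computer line.\n\nThis is the name data line.\n\nThis is the classroom data line.\n\nThis is the become line.\n' > com.txt

# String pattern with no action: lines containing "data" are printed.
awk '/data/' com.txt

# Negated pattern: print only the non-blank lines.
awk '!/^$/' com.txt
```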
Common built-in variables and operators in awk
parameter | notes |
---|---|
$0 | The entire current record (line) |
$1 ... $n | The first through nth fields of the current record |
FS | Input field separator (default is a space, which splits on runs of spaces and tabs) |
RS | Input record separator (default is a newline, so input is read line by line) |
NF | Number of fields, i.e. the number of columns, in the current record |
NR | Number of records read so far, i.e. the line number, starting from 1 |
FNR | Like NR, but it restarts at 1 for each input file instead of increasing across files |
OFS | Output field separator (default is a space) |
ORS | Output record separator (default is a newline) |
\n | Newline character |
~ | Matches a regular expression |
!~ | Does not match a regular expression |
= += -= *= /= %= ^= **= | Assignment operators |
&& | Logical AND |
< <= > >= != == | Relational operators |
* / % | Multiplication, division and remainder |
$ | Field reference |
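The difference between NR and FNR only shows when awk reads more than one file. In the sketch below, the files one.txt and two.txt are hypothetical. FILENAME is the built-in variable that holds the name of the current input file.

```shell
# Work in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1

# Two small hypothetical input files.
printf 'a\nb\n' > one.txt
printf 'c\nd\n' > two.txt

# NR keeps counting across files; FNR restarts at 1 for each file.
awk '{print FILENAME, NR, FNR, $0}' one.txt two.txt
```

The condition NR==FNR is true only while the first file is being read. This is the basis of a common idiom for processing two files together.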
Logical operations
Source file:
[root@localhost ~]# cat xu.txt
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival
Select lines whose first field is greater than 2 (with no action, matching lines are printed):
[root@localhost ~]# awk '$1>2' xu.txt
3 You are beautiful
7 happy mid-Autumn Festival
Select lines whose first field equals 2 and print the first and third fields:
[root@localhost ~]# awk '$1==2 {print $1,$3}' xu.txt
2 are
Select lines whose first field is greater than 1 and whose third field equals "are":
[root@localhost ~]# awk '$1>1 && $3=="are" {print $1,$2,$3}' xu.txt
2 How are
3 You are
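The operator table lists only &&, but awk also has || (logical OR) and ! (logical NOT). Here is a short sketch of both on the same xu.txt data.

```shell
# Work in a fresh temporary directory with the xu.txt sample.
cd "$(mktemp -d)" || exit 1
printf '1 This is html\n2 How are you\n3 You are beautiful\n7 happy mid-Autumn Festival\n' > xu.txt

# || (logical OR): first field below 2 or above 3.
awk '$1 < 2 || $1 > 3 {print $1, $2}' xu.txt

# ! (logical NOT) negates the whole parenthesized condition.
awk '!($1 > 2) {print $1, $2}' xu.txt
```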
Pattern negation
Source file:
[root@localhost ~]# cat xu.txt
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival
Print the second and fourth fields of every line whose third field does not match "is":
[root@localhost ~]# awk '$3 !~ /is/ {print $2,$4}' xu.txt
How you
You beautiful
happy Festival
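For comparison, ~ keeps the matching lines instead of dropping them. When no field is named, a bare /regex/ or !/regex/ is tested against the whole line ($0). This is a minimal sketch on the same data.

```shell
# Work in a fresh temporary directory with the xu.txt sample.
cd "$(mktemp -d)" || exit 1
printf '1 This is html\n2 How are you\n3 You are beautiful\n7 happy mid-Autumn Festival\n' > xu.txt

# ~ keeps lines whose third field contains "are".
awk '$3 ~ /are/ {print $2, $4}' xu.txt

# Without a field, !/are/ tests the whole line: print lines that do not contain "are".
awk '!/are/' xu.txt
```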
BEGIN keyword
awk also lets you control when script commands run. By default, awk reads one line of input and then runs the script on that line. Sometimes, however, some commands must run before any data is processed, and that is what the BEGIN keyword is for.
BEGIN forces awk to run the commands that follow it before reading any data, for example:
Source file:
[root@localhost ~]# cat bub.txt
hello how are you
Run the following command:
awk 'BEGIN {print "hi xiaohua"} {print $0}' bub.txt
Result:
[root@localhost ~]# awk 'BEGIN {print "hi xiaohua"} {print $0}' bub.txt
hi xiaohua
hello how are you
As you can see, the script here has two parts. The BEGIN block runs before awk processes any data, and the second block is the one that actually processes the data.
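A common use of BEGIN is to set separators and print a header before the first line is read. In the sketch below, the comma-separated file scores.csv is hypothetical.

```shell
# Work in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1

# Hypothetical comma-separated input.
printf 'john,85\nandrea,89\n' > scores.csv

# BEGIN sets the input/output separators and prints a header line, once.
awk 'BEGIN {FS = ","; OFS = "\t"; print "name", "score"} {print $1, $2}' scores.csv
```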
END keyword
As the counterpart of BEGIN, the END keyword lets you specify commands that awk runs after it has read all the data, for example:
Source file:
[root@localhost ~]# cat bub.txt
hello how are you
Run the following command:
awk 'BEGIN {print "hi xiaohua"} {print $0} END {print "bye"}' bub.txt
Result:
[root@localhost ~]# awk 'BEGIN {print "hi xiaohua"} {print $0} END {print "bye"}' bub.txt
hi xiaohua
hello how are you
bye
As you can see, the END block does not run until awk has finished printing the contents of the file.
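END is the natural place to report totals collected while the lines were read. The following is a minimal sketch that sums the second column of the xm.txt data used in the sections below. The variable name sum is arbitrary.

```shell
# Work in a fresh temporary directory with the xm.txt sample.
cd "$(mktemp -d)" || exit 1
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86 92\njasper 84 88 80 92 84 94 83\n' > xm.txt

# Accumulate the second field on every line, then report once in END.
awk '{sum += $2} END {print "total:", sum, "average:", sum / NR}' xm.txt
```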
NF
NF holds the number of fields (i.e. columns) in the current input record.
Source file:
[root@localhost ~]# cat xm.txt
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
Print the number of fields on each line:
[root@localhost ~]# awk '{print NF}' xm.txt
6
7
8
Prefixing NF with $ gives $NF, the last field of each line:
[root@localhost ~]# awk '{print $NF}' xm.txt
88
92
83
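NF can also be used in arithmetic inside a field reference, or as a pattern on its own. Here is a short sketch on the same xm.txt.

```shell
# Work in a fresh temporary directory with the xm.txt sample.
cd "$(mktemp -d)" || exit 1
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86 92\njasper 84 88 80 92 84 94 83\n' > xm.txt

# $(NF-1) is the second-to-last field; without parentheses, $NF-1 would subtract 1 from the last field.
awk '{print $1, $(NF-1)}' xm.txt

# NF as a pattern: only lines with more than 6 fields.
awk 'NF > 6 {print $1}' xm.txt
```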
NR
NR is the record number of the current line, that is, the line number. When several files are read, it keeps increasing across them.
Source file:
[root@localhost ~]# cat xm.txt
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
Print the line number followed by the first field of each line (placing NR and $1 side by side concatenates them with nothing in between):
[root@localhost ~]# awk '{print NR $1}' xm.txt
1john
2andrea
3jasper
Print the line number and the first field, separated by "." (a dot):
[root@localhost ~]# awk '{print NR "." $1}' xm.txt
1.john
2.andrea
3.jasper
Print the line number and the whole line, separated by a dot:
[root@localhost ~]# awk '{print NR "." $0}' xm.txt
1.john 85 92 78 94 88
2.andrea 89 90 75 90 86 92
3.jasper 84 88 80 92 84 94 83
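Because NR is the line number, it also works as a pattern for selecting lines by position. In END, it holds the total number of records read. This is a minimal sketch on the same xm.txt.

```shell
# Work in a fresh temporary directory with the xm.txt sample.
cd "$(mktemp -d)" || exit 1
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86 92\njasper 84 88 80 92 84 94 83\n' > xm.txt

# Print only the second line.
awk 'NR == 2' xm.txt

# Print a range of lines (here 2 through 3) with their numbers.
awk 'NR >= 2 && NR <= 3 {print NR, $1}' xm.txt

# In END, NR is the total number of lines read.
awk 'END {print NR}' xm.txt
```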
RS
RS is the input record separator.
[root@localhost ~]# cat xm.txt
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
Set the field separator to a newline and the record separator to the empty string, then print the first field:
[root@localhost ~]# awk 'BEGIN{FS="\n";RS=""}{print $1}' xm.txt
john 85 92 78 94 88
An empty RS switches awk to paragraph mode, in which records are separated by blank lines. Because xm.txt has no blank lines, the whole file becomes a single record. Each line becomes one field, so $1 is the first line.
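RS can also be set to a single ordinary character, and paragraph mode is most useful when the input really does contain blank lines. The following sketch shows both. The semicolon-separated input is hypothetical, and com.txt is recreated with the blank-line layout shown earlier.

```shell
# Work in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1

# Records separated by semicolons (printf adds no trailing newline).
printf 'alpha;beta;gamma' | awk 'BEGIN {RS = ";"} {print NR ": " $0}'

# Paragraph mode: each blank-line-separated block of com.txt is one record.
printf 'This is the computer line.\n\nThis is the name data line.\n\nThis is the classroom data line.\n\nThis is the become line.\n' > com.txt
awk 'BEGIN {RS = ""} {print NR ": " $0}' com.txt
```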
OFS
Output field separator
OFS is the output counterpart of FS; its default value is a space.
Source file:
[root@localhost ~]# cat com.txt
This is the computer line.

This is the name data line.

This is the classroom data line.

This is the become line.
The following command sets the field separator to a newline and the record separator to the empty string. It also changes the output record separator (ORS, not OFS) to a colon, then prints the first field. Each blank-line-separated paragraph becomes one record, and every printed record ends with a colon instead of a newline:
[root@localhost ~]# awk 'BEGIN{FS="\n";RS="";ORS=":"}{print $1}' com.txt
This is the computer line.:This is the name data line.:This is the classroom data line.:This is the become line.:
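The command above changes ORS, so here is a minimal sketch of OFS itself on the xm.txt data. OFS is placed between the comma-separated arguments of print. It is also used when $0 is rebuilt after a field is assigned.

```shell
# Work in a fresh temporary directory with the xm.txt sample.
cd "$(mktemp -d)" || exit 1
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86 92\njasper 84 88 80 92 84 94 83\n' > xm.txt

# OFS is inserted between the comma-separated arguments of print.
awk 'BEGIN {OFS = ":"} {print $1, $2}' xm.txt

# Assigning to a field ($1 = $1) rebuilds $0 with OFS between all fields.
awk 'BEGIN {OFS = "-"} NR == 1 {$1 = $1; print}' xm.txt
```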