awk advanced usage


1. Introduction

AWK is a language for processing text files and a powerful text analysis tool.

The name AWK comes from the initial letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

2. Basic syntax

awk [options] 'script' file(s)

awk [options] 'script' var=value file(s)

or

awk [options] -f scriptfile var=value file(s)
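The -f form with a var=value operand can be sketched as follows. This is a minimal example, assuming a POSIX shell and a POSIX-compatible awk (gawk or mawk); the file names scale.awk and nums.txt and the variable name factor are hypothetical.

```shell
# Create a hypothetical script file and input file in a temporary directory
dir=$(mktemp -d)
cat > "$dir/scale.awk" <<'EOF'
# Multiply the first field by "factor", which is assigned on the command line
{ print $1 * factor }
EOF
printf '1\n2\n3\n' > "$dir/nums.txt"

# -f reads the program from scale.awk; factor=10 takes effect
# before nums.txt (the operand that follows it) is read
out=$(awk -f "$dir/scale.awk" factor=10 "$dir/nums.txt")
echo "$out"    # prints 10, 20 and 30 on separate lines
rm -r "$dir"
```

Keeping the program in a file avoids shell-quoting problems once a script grows beyond a one-liner.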

Common option parameters

Parameter   Notes
-F fs       Specifies the input field separator. fs is a string or a regular expression, for example -F: or -F '[ ,]'
-v          Assigns a user-defined variable (-v var=value)
-f          Reads the awk program from a script file
-W compat   (gawk) Runs awk in compatibility mode: gawk behaves like standard awk, and all gawk extensions are ignored
' '         Encloses the program text (the code block)
/ /         Matching block: selects lines; contains a string or a regular expression
{}          Command block containing one or more commands
;           Separates multiple commands
BEGIN       Executed at the start of the program, before any data is read. The actions after BEGIN run only once
END         Executed after awk has processed all data, just before the program ends. The actions after END run only once

Example

source file

[root@localhost ~]# cat xu.txt 
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival 

Print the first and fourth fields (columns) of each line:

[root@localhost ~]# awk '{print $1,$4}' xu.txt 
1 html
2 you
3 beautiful
7 Festival

awk -F  # -F is equivalent to setting the built-in variable FS; it specifies the field separator

Use multiple delimiters. The bracket expression [ ,] makes either a space or a comma act as a field separator (xu.txt contains no commas, so here the result is the same as splitting on spaces):
  awk -F '[ ,]'  '{print $1,$2,$4}'   xu.txt

result

[root@localhost ~]# awk -F '[ ,]'  '{print $1,$2,$4}' xu.txt 
1 This html
2 How you
3 You beautiful
7 happy Festival
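Because xu.txt has no commas, the example above does not really exercise the comma. Here is a minimal sketch with hypothetical comma-and-space data, assuming a POSIX shell and a POSIX-compatible awk; it also shows why adjacent delimiters can produce empty fields.

```shell
# Hypothetical data: a comma separates name and age, a space separates age and job
data='tom,25 engineer
ann,31 teacher'

# [ ,] makes either a space or a comma a field separator
out=$(printf '%s\n' "$data" | awk -F '[ ,]' '{print $1, $3}')
echo "$out"    # tom engineer / ann teacher

# ", " is two adjacent separators, so [ ,] produces an empty field between them
n1=$(echo 'a, b' | awk -F '[ ,]' '{print NF}')
# [ ,]+ treats any run of spaces and commas as a single separator
n2=$(echo 'a, b' | awk -F '[ ,]+' '{print NF}')
echo "$n1 $n2"    # prints: 3 2
```

When the input may contain a comma followed by a space, '[ ,]+' is usually the safer choice.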

awk -v  # assigns a user-defined variable

Example

source file

[root@localhost ~]# cat xu.txt 
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival 

Set the variable a with -v, add it to the first field, and print the first field and the sum:

[root@localhost ~]# awk -v a=1 '{print $1,$1+a}' xu.txt 
1 2
2 3
3 4
7 8
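A variable can also be set with a var=value operand instead of -v, but the two differ in timing. The sketch below, assuming a POSIX shell and a POSIX-compatible awk (x is a hypothetical name), shows that only -v is visible inside BEGIN.

```shell
# -v assigns the variable before the BEGIN block runs
a=$(awk -v x=5 'BEGIN {print x + 0}')
echo "$a"    # 5

# A var=value operand is processed only when awk reaches it in the
# argument list, which is after BEGIN: x is still unset (0) in BEGIN,
# but set to 5 by the time the line from stdin ("-") is processed
b=$(echo hi | awk 'BEGIN {print x + 0} {print x}' x=5 -)
echo "$b"    # 0, then 5
```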

Matching mechanism

The strength of awk lies in its script, which consists of two parts, a matching rule (pattern) and the commands to execute (action):

'matching rule {commands}'

The matching rule restricts the script to specific lines of the input. It can be a string, for example /demo/, which selects lines containing the string demo, or a regular expression. Note that the entire script is enclosed in single quotes ('), and the command part must be enclosed in curly braces ({}).

When awk runs, if no command is specified, each matching line is printed by default; if no matching rule is specified, every line of the input is matched.
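Both defaults can be checked with a minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the sample lines are borrowed from xu.txt.

```shell
data='1 This is html
2 How are you
3 You are beautiful'

# Matching rule without a command: matching lines are printed as-is
p1=$(printf '%s\n' "$data" | awk '/are/')
echo "$p1"    # the "2 How ..." and "3 You ..." lines

# Command without a matching rule: the command runs on every line
p2=$(printf '%s\n' "$data" | awk '{print $2}')
echo "$p2"    # This, How, You
```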

Example

awk '/^$/ {print "running"}' com.txt

In this command, /^$/ is a regular expression that matches blank lines, and print is the command to execute. print is one of the most frequently used commands; it simply outputs the specified text. So if com.txt contains N blank lines, this command prints "running" N times.

source file
[root@localhost ~]# cat com.txt 
This is the computer line.

This is the name data line.

This is the classroom data line.

This is the become line.

result

[root@localhost ~]# awk '/^$/ {print "running"}' com.txt 
running
running
running
[root@localhost ~]# 

Built-in variables and operators commonly used in awk

Name                       Notes
$0                         The entire current record (line)
$1 ... $n                  The 1st ... nth field of the current record
FS                         Input field separator (default: a single space, which splits on runs of spaces and tabs)
RS                         Input record separator (default: newline, so input is read line by line)
NF                         Number of fields in the current record, i.e. the number of columns
NR                         Number of records read so far, i.e. the line number, starting at 1; keeps counting across multiple files
FNR                        Like NR, but restarts at 1 for each input file
OFS                        Output field separator (default: space)
ORS                        Output record separator (default: newline)
\n                         Newline character
~                          Matches a regular expression
!~                         Does not match a regular expression
= += -= *= /= %= ^= **=    Assignment (**= is a gawk extension, not POSIX)
&&                         Logical AND
< <= > >= != ==            Relational operators
* / %                      Multiplication, division, and remainder
$                          Field reference
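The difference between NR and FNR in the table is easiest to see with two input files. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the file names f1.txt and f2.txt are hypothetical and live in a temporary directory.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1.txt"
printf 'c\nd\n' > "$dir/f2.txt"

# NR keeps counting across both files; FNR restarts at 1 for each file
out=$(awk '{print NR, FNR, $0}' "$dir/f1.txt" "$dir/f2.txt")
echo "$out"    # 1 1 a / 2 2 b / 3 1 c / 4 2 d
rm -r "$dir"
```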

Logical operations
source file

[root@localhost ~]# cat xu.txt 
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival 

Filter rows with the first column greater than 2

[root@localhost ~]# awk '$1>2' xu.txt 
3 You are beautiful
7 happy mid-Autumn Festival 
[root@localhost ~]# 

Filter the rows with the first column equal to 2 and output the first and third columns

[root@localhost ~]# awk '$1==2 {print $1,$3}' xu.txt 
2 are
[root@localhost ~]# 

Filter rows whose first column is greater than 1 and whose third column is equal to 'are'

[root@localhost ~]# awk '$1>1 && $3=="are" {print $1,$2,$3}' xu.txt 
2 How are
3 You are
[root@localhost ~]# 
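The table above lists only &&; awk also provides || (logical OR) and ! (logical NOT). A minimal sketch on the same xu.txt lines, assuming a POSIX shell and a POSIX-compatible awk:

```shell
data='1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival'

# || (logical OR): first field is 1 or 7
o1=$(printf '%s\n' "$data" | awk '$1==1 || $1==7 {print $1, $2}')
echo "$o1"    # 1 This / 7 happy

# ! (logical NOT): first field is not greater than 2
o2=$(printf '%s\n' "$data" | awk '!($1>2) {print $1}')
echo "$o2"    # 1 / 2
```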

Negated matching
source file

[root@localhost ~]# cat xu.txt 
1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival 

Print the second and fourth fields of every line whose third field does not match /is/:

[root@localhost ~]# awk '$3 !~ /is/ {print $2,$4}' xu.txt 
How you
You beautiful
happy Festival
[root@localhost ~]# 
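The positive counterpart of !~ is ~, which keeps only lines whose field matches. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the anchored pattern /^[TH]/ is just an illustrative choice.

```shell
data='1 This is html
2 How are you
3 You are beautiful
7 happy mid-Autumn Festival'

# ~ keeps lines whose third field matches /is/
o1=$(printf '%s\n' "$data" | awk '$3 ~ /is/ {print $2, $4}')
echo "$o1"    # This html

# Anchored regex: second field starts with T or H
o2=$(printf '%s\n' "$data" | awk '$2 ~ /^[TH]/ {print $1}')
echo "$o2"    # 1 / 2
```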

BEGIN keyword
awk also lets you control when script commands run. By default, awk reads one line of input and then runs the program on that line. Sometimes, however, you need to run some commands before any data is processed; that is what the BEGIN keyword is for.

BEGIN makes awk execute the commands that follow the keyword before it reads any data, for example:

source file

[root@localhost ~]# cat bub.txt 
hello
how
are
you

Run the following command:

awk 'BEGIN {print "hi xiaohua"} {print $0}' bub.txt 

result

[root@localhost ~]# awk 'BEGIN {print "hi xiaohua"} {print $0}' bub.txt 
hi xiaohua
hello
how
are
you

[root@localhost ~]# 

As you can see, the script here has two parts: the BEGIN block runs before awk processes any data, and the second block is the one that actually processes each line.
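A common use of BEGIN is to set FS and print a header before the first line is read. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the colon-separated data is hypothetical, loosely modeled on /etc/passwd.

```shell
# Hypothetical colon-separated data
data='root:x:0:0
daemon:x:1:1'

# BEGIN runs before the first line is read, so FS is already ":"
# when the first record is split, and the header is printed once
out=$(printf '%s\n' "$data" | awk 'BEGIN {FS=":"; print "user uid"} {print $1, $3}')
echo "$out"    # user uid / root 0 / daemon 1
```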
END keyword
The END keyword is the counterpart of BEGIN: it specifies commands that awk executes after it has read all the data, for example:

source file

[root@localhost ~]# cat bub.txt 
hello
how
are
you

Run the following command:

awk 'BEGIN {print "hi xiaohua"}{print $0} END {print"bye"}' bub.txt

result

[root@localhost ~]# awk 'BEGIN {print "hi xiaohua"}{print $0} END {print"bye"}' bub.txt 
hi xiaohua
hello
how
are
you

bye

As you can see, the END block runs only after awk has finished processing (here, printing) every line of the file.
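END is typically used to report totals gathered line by line. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the name/score data is hypothetical.

```shell
data='john 85
andrea 89
jasper 84'

# Accumulate a total for each line; END prints it once, after the last line.
# NR still holds the number of lines read when END runs.
out=$(printf '%s\n' "$data" | awk '{sum += $2} END {print NR, sum, sum / NR}')
echo "$out"    # 3 258 86
```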

NF
NF holds the number of fields (that is, the number of columns) in the current input record.
source file

[root@localhost ~]# cat xm.txt
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83

Print how many fields each line contains:

[root@localhost ~]# awk '{print NF}' xm.txt
6
7
8

Prefixing NF with $ gives the last field of each line:

[root@localhost ~]# awk '{print $NF}' xm.txt 
88
92
83
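NF can also take part in arithmetic and serve as a matching rule. A minimal sketch on the xm.txt lines, assuming a POSIX shell and a POSIX-compatible awk; note the parentheses in $(NF-1).

```shell
data='john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83'

# $(NF-1) is the second-to-last field; the parentheses matter,
# because $NF-1 means "the last field minus 1"
o1=$(printf '%s\n' "$data" | awk '{print $1, $(NF-1)}')
o2=$(printf '%s\n' "$data" | awk '{print $NF-1}')
# NF as a matching rule: only lines with more than 6 fields
o3=$(printf '%s\n' "$data" | awk 'NF > 6 {print $1}')
echo "$o1"    # john 94 / andrea 86 / jasper 94
echo "$o2"    # 87 / 91 / 82
echo "$o3"    # andrea / jasper
```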

NR
NR is the number of the current record, that is, the line number. When several files are given, NR keeps counting across them.
source file

[root@localhost ~]# cat xm.txt
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83

Print the line number followed by the first field of each line (print concatenates adjacent values without a separator):

[root@localhost ~]# awk '{print NR $1}' xm.txt
1john
2andrea
3jasper

Print the line number and the first field of each line, separated by "." (a dot):

[root@localhost ~]# awk '{print NR "." $1}' xm.txt
1.john
2.andrea
3.jasper

Print the line number and the entire line, separated by a dot:

[root@localhost ~]# awk '{print NR "." $0}' xm.txt 
1.john 85 92 78 94 88
2.andrea 89 90 75 90 86 92
3.jasper 84 88 80 92 84 94 83
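NR is also handy as a matching rule, for example to pick one line or to skip a header. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the header data in the second command is hypothetical.

```shell
data='john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83'

# NR as a matching rule: print only the second line
o1=$(printf '%s\n' "$data" | awk 'NR == 2 {print $1}')
echo "$o1"    # andrea

# NR > 1 skips the first line, a common way to drop a header row
o2=$(printf 'name score\njohn 85\nann 90\n' | awk 'NR > 1 {print $1}')
echo "$o2"    # john / ann
```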

RS
Input record separator

[root@localhost ~]# cat xm.txt 
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83

Use the newline character as the field separator and set the record separator to the empty string, then print the first field:

[root@localhost ~]# awk 'BEGIN{FS="\n";RS=""}{print $1}' xm.txt 
john 85 92 78 94 88

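Setting RS to the empty string puts awk in paragraph mode: records are separated by blank lines instead of newlines. Because xm.txt contains no blank lines, the whole file becomes a single record, and with FS="\n" each line is one field, so $1 is the first line.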
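Paragraph mode is more useful when the input really is split into blank-line-separated blocks. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the two-line address records are hypothetical.

```shell
# Hypothetical data: each record spans two lines,
# and records are separated by a blank line
data='Tom
12 Main St

Ann
34 Oak Ave'

# RS="" turns on paragraph mode; FS="\n" makes each line one field
out=$(printf '%s\n' "$data" | awk 'BEGIN {RS = ""; FS = "\n"} {print NR ": " $1 " / " $2}')
echo "$out"    # 1: Tom / 12 Main St, then 2: Ann / 34 Oak Ave
```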

OFS
Output field separator
OFS is the output counterpart of FS: print places it between its comma-separated arguments. Its default value is a single space.
source file

[root@localhost ~]# cat com.txt 
This is the computer line.

This is the name data line.

This is the classroom data line.

This is the become line.

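Use the newline character as the field separator, set the record separator to the empty string, and set the output record separator ORS to a colon, then print the first field of each record. Note that this example changes ORS, which awk appends after each print, rather than OFS; that is why a colon follows every record: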

[root@localhost ~]# awk 'BEGIN{FS="\n";RS="";ORS=":"}{print $1}' com.txt 
This is the computer line.:This is the name data line.:This is the classroom data line.:This is the become line.:
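OFS itself only shows up between print arguments, or when awk rebuilds $0 after a field is assigned. A minimal sketch, assuming a POSIX shell and a POSIX-compatible awk; the data is a shortened, hypothetical version of xm.txt.

```shell
data='john 85 92
andrea 89 90'

# OFS is placed between the comma-separated arguments of print
o1=$(printf '%s\n' "$data" | awk 'BEGIN {OFS = ":"} {print $1, $2}')
echo "$o1"    # john:85 / andrea:89

# Assigning to any field (here $1 = $1) makes awk rebuild $0 with OFS
o2=$(printf '%s\n' "$data" | awk 'BEGIN {OFS = ","} {$1 = $1; print}')
echo "$o2"    # john,85,92 / andrea,89,90
```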


Added by scriptkiddie on Tue, 21 Sep 2021 11:41:10 +0300