Using the awk command

1. What is awk

AWK is a data filtering tool (similar to grep, but more powerful) and a data processing engine. It matches input text against patterns, then processes and outputs it line by line. It is often used in shell scripts to extract specific data. On its own, it can also compute statistics on text data.

2. Format syntax

1. Format

Format 1: command | awk [options] 'condition {action}'

Format 2: awk [options] 'condition {action}' file

When an action contains multiple statements, separate them with semicolons. If no field separator is specified, awk splits each line on spaces and tabs by default. print is the most commonly used instruction.
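The two formats and the semicolon rule can be tried on a small made-up file. This is a sketch that assumes a POSIX-compatible awk. The file name `sample.txt` and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical input file, created only for this illustration
printf 'alice 30\nbob 25\n' > sample.txt

# Format 1: a previous command pipes its output into awk
cat sample.txt | awk '{ print $1 }'          # alice, bob

# Format 2: awk reads the file named after the program
awk '{ print $2 }' sample.txt                # 30, 25

# Several statements in one action, separated by semicolons
awk '{ name = $1; age = $2; print name " is " age }' sample.txt
# alice is 30
# bob is 25
```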

2. Options

-F fs	Specifies the input field separator. fs is a string or a regular expression, e.g. -F: or -F'[ ,]'
-v var=value	Assigns a user-defined variable
-f file	Reads the awk program from a script file
-W traditional	(gawk) Runs in compatibility mode: gawk behaves exactly like standard awk, and all gawk extensions are ignored
' '	Encloses the awk program (single quotes keep the shell from expanding $1, $2, etc.)
/ /	Encloses a pattern to match, which can be a string or a regular expression, e.g. /demo/
{}	Action block containing one or more commands
;	Separates multiple commands
BEGIN	Runs before any input is read. The actions after BEGIN are executed only once, at the start of the program
END	Runs after all input has been processed. The actions after END are executed only once, at the end of the program

BEGIN is mainly used for initialization before the first line is processed, such as setting global variables and the FS separator.
END is mainly used after the last line has been processed, for final calculations or printing summary information.
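Here is a minimal sketch of BEGIN, the main block, and END working together. It assumes a POSIX awk. The colon-separated input is made up and only mimics /etc/passwd, and the function name `list_users` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: BEGIN sets FS and prints a title once, the main
# block runs once per line, and END prints a summary once at the end
list_users() {
    awk 'BEGIN { FS = ":"; count = 0; print "user list:" }
         { print "  " $1; count++ }
         END { print count " users in total" }'
}

printf 'root:x:0\nbin:x:1\ndaemon:x:2\n' | list_users
# user list:
#   root
#   bin
#   daemon
# 3 users in total
```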

3. awk built-in variables and operators

$0	The entire current record (line)
$1, $2, ..., $n	Fields 1 through n of the current record
FS	Input field separator (default: a space)
RS	Input record separator (default: newline, so the text is read line by line)
NF	Number of fields (columns) in the current record
NR	Number of records read so far, i.e. the current line number, starting from 1
FNR	Like NR, but counted per input file: it restarts at 1 for each file
OFS	Output field separator (default: a space)
ORS	Output record separator (default: newline)
\n	Newline character
~	Matches a regular expression
!~	Does not match a regular expression
= += -= *= /= %= ^=	Assignment operators
&&	Logical AND
< <= > >= != ==	Relational operators
$	Field reference
* / %	Multiplication, division and remainder
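NR, FNR, and NF are easiest to tell apart when you print them side by side. This sketch assumes a POSIX awk and two hypothetical files. It also uses FILENAME, another built-in variable not listed in the table, which holds the name of the current input file.

```shell
#!/bin/sh
# Two small hypothetical files
printf 'a1 x\na2 y\n' > file1.txt
printf 'b1 z\n' > file2.txt

# NR keeps counting across files, FNR restarts at 1 for each file,
# NF is the number of fields, so $NF is the last field
awk '{ print FILENAME, NR, FNR, NF, $NF }' file1.txt file2.txt
# file1.txt 1 1 2 x
# file1.txt 2 2 2 y
# file2.txt 3 1 2 z
```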

4. Examples

BEGIN

[root@localhost ~]# cat data
Have you eaten yet?
[root@localhost ~]# awk 'BEGIN{print "good morning"} {print $0}' data
good morning
Have you eaten yet?

Print the first and fifth columns of each line

// Syntax: awk '[pattern] {action}' filenames  # enclose the program in single quotes so the shell does not expand $1, $5, etc.

[root@localhost ~]# cat test 
1 This is the header line.
2 This is the first data line.
3 This is the second data line.
8 This is the last line.
[root@localhost ~]# awk '{print $1,$5}' test
1 header
2 first
3 second
8 last

awk -F  # -F sets the built-in variable FS, i.e. the field separator

// Use multiple separators: the bracket expression [ ,] splits on either a space or a comma
[root@localhost ~]# awk -F'[ ,]' '{print $1,$3}' test
1 is
2 is
3 is
8 is
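Here are two more -F cases, sketched with a POSIX awk and made-up input. A single comma works for simple CSV data that has no quoted commas. With a bracket expression like [ ,], each separator character ends a field on its own, so a comma followed by a space produces an empty field. Adding + makes a run of separators count as one.

```shell
#!/bin/sh
# -F, splits on every comma (quoted fields containing commas are not handled)
printf 'id,name,score\n1,alice,90\n2,bob,85\n' | awk -F, 'NR > 1 { print $2 }'
# alice
# bob

# [ ,] : the comma and the following space each end a field -> 3 fields
echo 'a, b' | awk -F'[ ,]' '{ print NF }'     # 3
# [ ,]+ : a run of commas and spaces counts as one separator -> 2 fields
echo 'a, b' | awk -F'[ ,]+' '{ print NF }'    # 2
```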

awk -v  # assign a user-defined variable

// Set the variable a=2 and add it to the value in the first column
[root@localhost ~]# cat test 
1 This is the header line.
2 This is the first data line.
3 This is the second data line.
8 This is the last line.
[root@localhost ~]# awk -v a=2 '{print $1,$1+a}' test
1 3
2 4
3 5
8 10
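-v is also the usual way to pass a shell variable to awk without splicing it into the quoted program text. This sketch assumes a POSIX shell and awk. The names `threshold` and `min` are hypothetical.

```shell
#!/bin/sh
threshold=3
# min becomes an ordinary awk variable holding the shell value
printf '1 a\n2 b\n3 c\n8 d\n' | awk -v min="$threshold" '$1 >= min { print $2 }'
# c
# d
```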

Matching mechanism

The power of awk lies in its script commands, which consist of two parts, a matching rule and the commands to execute:

  'matching rule {commands}'

The matching rule restricts the commands to specific lines of the text. It can be a string (for example, /demo/ selects lines containing the string demo) or a regular expression. Note that the whole script is enclosed in single quotes (''), and the command part must be enclosed in curly braces ({}).

// If no command is specified, awk prints the matching lines by default; if no matching rule is specified, every line of the text is matched.
[root@localhost ~]# cat test 
1 This is the header line.

2 This is the first data line.

3 This is the second data line.


8 This is the last line.
// /^$/ matches empty lines, so the message is printed once for each blank line
[root@localhost ~]# awk '/^$/ {print "spring breeze"}' test
spring breeze
spring breeze
spring breeze
spring breeze
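Both defaults show up on a few made-up lines. This is a sketch assuming a POSIX awk:

```shell
#!/bin/sh
# Pattern with no action: matching lines are printed unchanged
printf 'apple pie\nbanana split\napple juice\n' | awk '/apple/'
# apple pie
# apple juice

# Action with no pattern: every line is processed
printf 'apple pie\nbanana split\napple juice\n' | awk '{ print NR ": " $2 }'
# 1: pie
# 2: split
# 3: juice
```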

Records and fields

awk treats each input line as a record, and the words (columns) separated by spaces or tabs as fields. The characters that separate fields are called separators.

[root@localhost ~]# echo 'wsnd hh zz' | awk '{print $1}'
wsnd

[root@localhost ~]# echo 'wsnd hh zz' | awk 'BEGIN{a=1;b=2}{print $(a+b)}'
zz

// Print the IP address (the loopback address 127.0.0.1 is filtered out)
[root@localhost ~]# ip a | grep 'inet ' | grep -v '127.0.0.1' | awk -F'[ /]+' '{print $3}' 
192.168.200.145
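NF holds the number of fields, so $NF and $(NF-1) reach the last fields of a record without knowing how many columns it has. Assigning to a field also rebuilds $0. Here is a short sketch assuming a POSIX awk:

```shell
#!/bin/sh
# $NF is the last field, $(NF-1) the one before it
echo 'wsnd hh zz' | awk '{ print $NF, $(NF-1) }'    # zz hh

# Assigning to a field changes it and rebuilds $0 with OFS
echo 'wsnd hh zz' | awk '{ $2 = "XX"; print }'      # wsnd XX zz
```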

Splitting fields

awk can split a record into fields in three ways:
- The first method separates fields with whitespace, which happens when FS is a single space. Leading and trailing whitespace (spaces and/or tabs) in the record is ignored, and fields are separated by runs of spaces and/or tabs. Because the default value of FS is a single space, this is how awk usually divides records into fields.
- The second method uses some other single character as the separator; for example, awk programs often use ":". When FS is any other single character, each occurrence of that character ends a field. If two separators appear consecutively, the field between them is an empty string.
- The third method applies when FS is set to more than one character. In that case, FS is interpreted as a regular expression.
```
[root@localhost ~]# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
[root@localhost ~]# awk 'BEGIN{FS=":"}{print $3}' /etc/passwd
0
1
2
```
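The three splitting rules can be compared on tiny made-up inputs. This sketch assumes a POSIX awk, and the field counts appear in the comments:

```shell
#!/bin/sh
# Default FS (a single space): leading blanks are ignored and runs of
# spaces count as one separator, so there are 2 fields
printf '  a   b\n' | awk '{ print NF }'                    # 2

# Single-character FS: every ':' ends a field, so '::' yields an empty field
printf 'a::b\n' | awk -F: '{ print NF, "[" $2 "]" }'      # 3 []

# Multi-character FS is a regular expression: one or more ':' or ';'
printf 'a:;b\n' | awk -F'[:;]+' '{ print NF }'            # 2
```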

Logical operations

// Select rows whose first column is greater than 2
[root@localhost ~]# cat test 
1 This is the header line.

2 This is the first data line.

3 This is the second data line.


8 This is the last line.
[root@localhost ~]# awk '$1>2' test 
3 This is the second data line.
8 This is the last line.

// Select rows whose first column equals 2 and print the first and third columns
[root@localhost ~]# awk '$1==2 {print $1,$3}' test 
2 is
// Select rows whose first column is greater than 1 and whose fifth column equals "first"
[root@localhost ~]# awk '$1>1 && $5=="first" {print $1,$2,$3}' test 
2 This is
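Besides &&, awk has || (logical OR) and ! (logical NOT), which the table in section 3 does not list. This sketch assumes a POSIX awk. Its input mirrors the `test` file above, without the blank lines.

```shell
#!/bin/sh
data='1 This is the header line.
2 This is the first data line.
3 This is the second data line.
8 This is the last line.'

# ||: first column equal to 1 or greater than 3
echo "$data" | awk '$1 == 1 || $1 > 3 { print $1 }'    # 1, 8

# !: negates the whole condition
echo "$data" | awk '!($1 == 2) { print $1 }'           # 1, 3, 8
```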

Pattern negation

// Select rows whose sixth column does not match "data" and print columns 1 and 5 (blank lines also qualify, so each prints as a lone space)
[root@localhost ~]# cat test 
1 This is the header line.

2 This is the first data line.

3 This is the second data line.


8 This is the last line.
[root@localhost ~]# awk '$6 !~ /data/ {print $1,$5}' test
1 header
 
 
 
 
8 last
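Negation also works on a whole-line pattern: !/data/ selects lines that do not contain "data" anywhere. This sketch assumes a POSIX awk and uses made-up input. It adds NF > 0 to drop blank lines, which the example above printed as lone spaces.

```shell
#!/bin/sh
printf '1 This is the header line.\n\n2 This is the first data line.\n8 This is the last line.\n' |
awk '!/data/ && NF > 0 { print $1, $5 }'
# 1 header
# 8 last
```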

OFS: output field separator

[root@localhost ~]# cat passwd 
root:x:0:0 root:/root:/bin/bash
bin:x:1:1 bin:/bin:/sbin/nologin
daemon:x:2:2 daemon:/sbin:/sbin/nologin
[root@localhost ~]# awk 'BEGIN{FS=":"}{print $1,$6}' passwd 
root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin
[root@localhost ~]# awk 'BEGIN{FS=":";OFS="="}{print $1,$6}' passwd 
root=/bin/bash
bin=/sbin/nologin
daemon=/sbin/nologin
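Note that OFS is applied when print receives comma-separated items or when $0 is rebuilt, not to an unmodified $0. A common idiom is to assign $1 = $1, which forces the rebuild. Here is a sketch with a POSIX awk and made-up input:

```shell
#!/bin/sh
# Changing OFS alone leaves an unmodified $0 as it was
echo 'a b c' | awk 'BEGIN { OFS = "-" } { print }'           # a b c

# Assigning any field (even $1 = $1) rebuilds $0 using OFS
echo 'a b c' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'  # a-b-c
```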

NF: number of fields

// Count the columns (fields) in each line
[root@localhost ~]# cat data 
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
[root@localhost ~]# awk '{print NF}' data 
6
7
8
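NF also works as a loop bound when rows have different lengths. This sketch assumes a POSIX awk. It averages each student's scores in the data file shown above, treating every field after the name as a score.

```shell
#!/bin/sh
printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86 92\njasper 84 88 80 92 84 94 83\n' |
awk '{
    sum = 0
    for (i = 2; i <= NF; i++)   # fields 2..NF are the scores
        sum += $i
    printf "%s %.2f\n", $1, sum / (NF - 1)
}'
# john 87.40
# andrea 87.00
# jasper 86.43
```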

NR: line number

// Print each line prefixed by its line number and a dot (.)
[root@localhost ~]# cat data 
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
[root@localhost ~]# awk '{print NR "." $0}' data 
1.john 85 92 78 94 88
2.andrea 89 90 75 90 86 92
3.jasper 84 88 80 92 84 94 83
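NR on its own also works as a pattern for selecting lines by position. Here is a sketch with a POSIX awk and a made-up header line:

```shell
#!/bin/sh
scores='name score
john 85
andrea 89'

# Skip the header: print every record after the first
echo "$scores" | awk 'NR > 1'      # john 85, andrea 89

# Print only the second line
echo "$scores" | awk 'NR == 2'     # john 85
```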

RS: input record separator

// Set FS to a newline, set RS to the empty string (paragraph mode: records are separated by blank lines), and set ORS to a colon, then print the first field of each record
[root@localhost ~]# cat test 
1 This is the header line.

2 This is the first data line.

3 This is the second data line.


8 This is the last line.
[root@localhost ~]# awk 'BEGIN{FS="\n";RS="";ORS=":"}{print $1}' test 
1 This is the header line.:2 This is the first data line.:3 This is the second data line.:8 This is the last line.:
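Paragraph mode is most useful when one record spans several lines. In this sketch (POSIX awk assumed, addresses made up), each blank-line-separated block is one record, and each of its lines is one field.

```shell
#!/bin/sh
# RS="" : records are separated by blank lines
# FS="\n": each line inside a record is one field
printf 'Alice\n1 Main St\nSpringfield\n\nBob\n2 Oak Ave\nShelbyville\n' |
awk 'BEGIN { RS = ""; FS = "\n" } { print $1 " lives in " $3 }'
# Alice lives in Springfield
# Bob lives in Shelbyville
```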

END

[root@localhost ~]# cat data 
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
[root@localhost ~]# awk 'BEGIN{print "hi"} {print $0} END{print "byby"}' data 
hi
john 85 92 78 94 88
andrea 89 90 75 90 86 92
jasper 84 88 80 92 84 94 83
byby
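END is the natural place for totals, because every record has been read by then and NR holds the total record count. Here is a sketch assuming a POSIX awk, with made-up scores:

```shell
#!/bin/sh
printf 'john 85\nandrea 89\njasper 84\n' |
awk '{ total += $2 } END { print NR " students, average " total / NR }'
# 3 students, average 86
```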

Keywords: regex perl

Added by WeddingLink on Wed, 22 Sep 2021 04:55:45 +0300

Programming VIP
