awk: one of the "three swordsmen" of text processing (grep, sed, awk)

1. awk working principle and basic usage

awk is named after its authors: Aho, Weinberger, and Kernighan. It is a report generator that produces formatted text output. The AWK shipped with GNU/Linux is developed and maintained by the Free Software Foundation (FSF) and is commonly known as GNU AWK.

There are several versions:

AWK: the original AWK, from AT&T Labs
NAWK: New AWK, an upgraded version of AWK from AT&T Labs
GAWK: GNU AWK. Most GNU/Linux distributions ship GAWK, which is fully compatible with AWK and NAWK

GNU AWK user manual:

https://www.gnu.org/software/gawk/manual/gawk.html

gawk is a pattern scanning and processing language that can:

Process text
Output formatted text reports
Perform arithmetic operations
Perform string operations

Format:

awk [options]   'program' var=value   file...
awk [options]   -f programfile    var=value file...

Notes:

The program is usually enclosed in single quotes and can consist of three parts:

BEGIN statement block
General statement blocks with pattern matching
END statement block

Common options:

-F "separator": the field separator used for input. The default separator is any run of consecutive whitespace characters
-v var=value: variable assignment

Program format:

pattern{action statements;..}

pattern: determines when the action statements are triggered, for example BEGIN, END, or a regular expression

action statements: process the data and are placed inside {}. Common ones: print, printf

awk working process

Step 1: execute the statements in the BEGIN{action;...} statement block

Step 2: read a line from the file or from standard input (stdin) and execute the pattern{action;...} statement block. awk scans the input line by line and repeats this step from the first line to the last, until all input has been read.

Step 3: on reaching the end of the input stream, execute the END{action;...} statement block

The BEGIN statement block is executed before awk reads any lines from the input stream. It is optional; statements such as variable initialization or printing a table header usually go in the BEGIN statement block.

The END statement block is executed after awk has read all lines from the input stream. It is also optional; summary information, such as analysis results over all lines, is printed in the END statement block.

The commands in the pattern statement block are the most important part, and this block is also optional. If no pattern statement block is provided, {print} is executed by default, that is, every line read is printed. awk executes the statement block for each line it reads.
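The three steps can be sketched with a small pipeline. The input (names and scores) is made up for illustration, and the variable name report is only there to capture the output:

```shell
# Hypothetical sample input: a name and a score per line.
report=$(printf 'alice 3\nbob 5\ncarol 7\n' | awk '
BEGIN  { print "name score" }          # runs once, before any input is read
$2 > 4 { print $1, $2; matched++ }     # runs for every line whose 2nd field exceeds 4
END    { print "matched:", matched }   # runs once, after the last line
')
echo "$report"
```

The header comes from BEGIN, the two matching lines from the pattern block, and the count from END.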

Delimiters, fields, and records

Fields (columns) are separated by the delimiter and referred to as $1, $2, ..., $n, which are called field identifiers; $0 is the whole record with all fields. Note: $ in awk does not mean the same thing as $ on a shell variable
Each line of the file is called a record
If the action is omitted, print $0 is executed by default
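As a quick illustration of fields and records, the following sketch feeds awk a single made-up record and prints the field count, the first field, the last field, and the whole record:

```shell
# One record with three whitespace-separated fields (made-up input).
fields=$(echo 'alpha beta gamma' | awk '{ print NF; print $1; print $NF; print $0 }')
echo "$fields"
```

Here $NF means "the field whose number is NF", so it always refers to the last field.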

Common action categories

Output statements: print, printf
Expressions: arithmetic, comparison expressions, etc.
Compound statements: statements grouped with {}
Control statements: if, while, etc.
Input statements

awk control statements

{statements;...} compound statement
if(condition) {statements;...}
if(condition) {statements;...} else {statements;...}
while(condition) {statements;...}
do {statements;...} while(condition)
for(expr1;expr2;expr3) {statements;...}
break
continue
exit

2. Action print

Format:

print item1, item2, ...

Notes:

Items are separated by commas
An output item can be a string, a number, a field of the current record, a variable, or an awk expression
If item is omitted, it is equivalent to print $0
Literal strings must be enclosed in "", while variables and numbers need not be
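A small sketch of these rules, using a made-up input line. Items separated by commas are joined with the output field separator (a space by default), while items written next to each other are concatenated directly:

```shell
# "user:", $1 and the concatenation "uid=" $2 are three comma-separated items;
# the bare print at the end is equivalent to print $0.
printed=$(echo 'root 0' | awk '{ print "user:", $1, "uid=" $2; print }')
echo "$printed"
```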

Example: list the 3 IP addresses with the most website visits

[root@VM_0_10_centos logs]# awk '{print $1}' nginx.access.log-20200428|sort | uniq -c |sort -nr|head -3
   5498 122.51.38.20
   2161 117.157.173.214
    953 211.159.177.120
[root@centos8 ~]#awk '{print $1}' access_log |sort |uniq -c|sort -nr|head 
   4870 172.20.116.228
   3429 172.20.116.208
   2834 172.20.0.222
   2613 172.20.112.14
   2267 172.20.0.227
   2262 172.20.116.179
   2259 172.20.65.65
   1565 172.20.0.76
   1482 172.20.0.200
   1110 172.20.28.145

Example: extract partition usage

[root@centos8 ~]# df | awk -F"[[:space:]]+|%" '{print $5}'
Use
0
0
1
0
3
19
1
0

Example: extract the IP address from the ifconfig output

[root@centos8 ~]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::20c:29ff:fe3d:d1e7  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:3d:d1:e7  txqueuelen 1000  (Ethernet)
        RX packets 24590  bytes 25224965 (24.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12793  bytes 4232673 (4.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@centos8 ~]# ifconfig eth0 | sed -n "2p"
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
[root@centos8 ~]# ifconfig eth0 | sed -n "2p" | awk '{print $2}'
10.0.0.85

[root@centos8 ~]# ifconfig eth0 | awk '/netmask/{print $2}'
10.0.0.85

[root@centos8 ~]# ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.85

3. awk variables

awk variables are divided into built-in variables and user-defined variables

3.1 common built-in variables

FS: input field separator; whitespace by default. It has the same effect as the -F option

example:

[root@centos8 ~]#awk -v FS=":" '{print $1FS$3}' /etc/passwd |head -n3
root:0
bin:1
daemon:2

OFS: output field separator; a single space by default

example:

[root@centos8 ~]#awk -v FS=':'   '{print $1,$3,$7}'   /etc/passwd|head -n1
root 0 /bin/bash
[root@centos8 ~]#awk -v FS=':' -v OFS=':' '{print $1,$3,$7}' /etc/passwd|head -n1
root:0:/bin/bash

RS: input record separator; newline by default. Setting it replaces the newline as the record delimiter on input

example:

awk -v RS=' ' '{print }' /etc/passwd

ORS: output record separator; replaces the newline on output with the specified string

example:

awk -v RS=' ' -v ORS='###'  '{print $0}' /etc/passwd
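The four separators can be seen working together in one short sketch. The input string is made up, and it deliberately has no trailing newline so that the last record is exactly c:d:

```shell
# RS=" " splits the input into the records "a:b" and "c:d",
# FS=":" splits each record into two fields,
# OFS="-" joins the printed fields, ORS="|" ends each printed record.
joined=$(printf 'a:b c:d' | awk -v RS=' ' -v FS=':' -v OFS='-' -v ORS='|' '{ print $1, $2 }')
echo "$joined"
```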

NF: number of fields

example:

# When referencing a variable, do not put $ in front of its name
[root@centos8 ~]#awk -F: '{print NF}' /etc/fstab 
[root@centos8 ~]#awk -F: '{print $(NF-1)}' /etc/passwd
[root@centos8 ~]#ls /misc/cd/BaseOS/Packages/*.rpm |awk -F"." '{print $(NF-1)}'|sort |uniq -c
    389 i686
    208 noarch
   1060 x86_64

NR : record number

example:

[root@centos8 ~]#awk '{print NR,$0}' /etc/issue /etc/centos-release
1 \S
2 Kernel \r on an \m
3 
4 CentOS Linux release 8.1.1911 (Core)

FNR: record number counted separately for each file

example:

awk '{print FNR}' /etc/fstab /etc/inittab
[root@centos8 ~]#awk '{print NR,$0}' /etc/issue /etc/redhat-release 
1 \S
2 Kernel \r on an \m
3 
4 CentOS Linux release 8.0.1905 (Core) 
[root@centos8 script40]#awk '{print FNR,$0}' /etc/issue /etc/redhat-release 
1 \S
2 Kernel \r on an \m
3 
1 CentOS Linux release 8.0.1905 (Core)
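The difference between NR and FNR is easiest to see side by side. This sketch creates two small throwaway files (the names one.txt and two.txt are arbitrary) and prints both counters for every line:

```shell
# Two temporary files with made-up content: two lines and one line.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n'    > "$tmp/two.txt"
# NR keeps counting across files; FNR restarts at 1 for each file.
nr_fnr=$(awk '{ print NR, FNR, $0 }' "$tmp/one.txt" "$tmp/two.txt")
echo "$nr_fnr"
rm -r "$tmp"
```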

FILENAME : current file name

example:

[root@centos8 ~]#awk '{print FILENAME}' /etc/fstab
[root@centos8 ~]#awk '{print FNR,FILENAME,$0}' /etc/issue /etc/redhat-release 
1 /etc/issue \S
2 /etc/issue Kernel \r on an \m
3 /etc/issue 
1 /etc/redhat-release CentOS Linux release 8.0.1905 (Core)

ARGC : number of command line arguments

example:

[root@centos8 ~]#awk '{print ARGC}' /etc/issue /etc/redhat-release 
3
3
3
3
[root@centos8 ~]#awk 'BEGIN{print ARGC}' /etc/issue /etc/redhat-release 
3

ARGV: an array holding the arguments given on the command line, one per element: ARGV[0], ARGV[1], ...

example:

[root@centos8 ~]#awk 'BEGIN{print ARGV[0]}' /etc/issue /etc/redhat-release 
awk
[root@centos8 ~]#awk 'BEGIN{print ARGV[1]}' /etc/issue /etc/redhat-release 
/etc/issue
[root@centos8 ~]#awk 'BEGIN{print ARGV[2]}' /etc/issue /etc/redhat-release 
/etc/redhat-release
[root@centos8 ~]#awk 'BEGIN{print ARGV[3]}' /etc/issue /etc/redhat-release 
[root@centos8 ~]#

3.2 user defined variables

User-defined variables are case sensitive and can be assigned in the following ways:

-v var=value
Defined directly in the program

Example:

[root@centos8 ~]#awk -v test1=test2="hello,gawk" 'BEGIN{print test1,test2}'   
test2=hello,gawk 
[root@centos8 ~]#awk -v test1=test2="hello1,gawk" 'BEGIN{test1=test2="hello2,gawk";print test1,test2}'
hello2,gawk hello2,gawk
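A simpler sketch of the two assignment styles and of case sensitivity. The variable names greeting, name, and Name are chosen only for illustration:

```shell
# greeting is set with -v; name and Name are set inside the program
# and are two distinct variables because names are case sensitive.
vars=$(awk -v greeting="hello" 'BEGIN { name = "gawk"; Name = "GAWK"; print greeting, name, Name }')
echo "$vars"
```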

4. Action printf

printf produces formatted output

Format:

printf "FORMAT", item1, item2, ...

Notes:

FORMAT is required
printf does not add a newline automatically; the newline \n must be given explicitly
FORMAT must contain a format specifier for each item that follows

Format specifiers: correspond one-to-one with the items

%s : display a string

%d, %i : display a decimal integer

%f : display a floating-point number

%e, %E : display a number in scientific notation

%c : display a character; a numeric item is treated as a character code

%g, %G : display a number in scientific notation or floating-point form

%u : unsigned integer

%% : display a literal %

Modifiers

#[.#] The first number controls the display width; the second number is the precision after the decimal point, for example: %3.1f

- Left-align (the default is right-align), for example: %-15s

+ Display the sign of the number, for example: %+d

example:

awk -F:   '{printf "%s",$1}' /etc/passwd
awk -F:   '{printf "%s\n",$1}' /etc/passwd
awk -F:   '{printf "%20s\n",$1}' /etc/passwd
awk -F:   '{printf "%-20s\n",$1}' /etc/passwd
awk -F:   '{printf "%-20s %10d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %s\n",$1}' /etc/passwd
awk -F:   '{printf "Username: %sUID:%d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %25sUID:%d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %-25sUID:%d\n",$1,$3}' /etc/passwd
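To see the width, alignment, and precision modifiers without depending on a particular /etc/passwd, here is a sketch on two made-up passwd-style lines. The third column ($3 / 3) exists only to show a fractional value:

```shell
# %-8s  left-aligns the name in 8 columns,
# %5d   right-aligns the integer in 5 columns,
# %6.2f prints a float in 6 columns with 2 decimals.
table=$(printf 'root:x:0\nbin:x:1\n' | awk -F: '{ printf "%-8s|%5d|%6.2f\n", $1, $3, $3 / 3 }')
echo "$table"
```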

5. Operator

Arithmetic operators:

x+y, x-y, x*y, x/y, x^y, x%y

-x : negation

+x : converts a string to a numeric value

String operator: concatenation has no operator symbol; adjacent strings are joined

Assignment operators:

=, +=, -=, *=, /=, %=, ^=, ++, --

example:

[root@centos8 ~]#awk 'BEGIN{i=0;print i++,i}'
0 1
[root@centos8 ~]#awk 'BEGIN{i=0;print ++i,i}'
1 1
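The arithmetic and string operators can be combined in one short sketch. The variable names n and s are arbitrary:

```shell
# "3" + 4 forces the string to a number; writing "a" "b" n side by side
# concatenates; ^ is exponentiation and % is the remainder.
ops=$(awk 'BEGIN { n = "3" + 4; s = "a" "b" n; print n, s, 2 ^ 10, 7 % 3, -n }')
echo "$ops"
```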

Comparison operator:

==, !=, >, >=, <, <=

Example: print even-numbered and odd-numbered lines

[root@centos8 ~]#seq 10 | awk 'NR%2==0'
2
4
6
8
10
[root@centos8 ~]#seq 10 | awk 'NR%2==1'
1
3
5
7
9

Pattern matching:

~ : true if the left side matches the regular expression on the right, that is, contains a match

!~ : true if it does not match

example:

[root@centos8 ~]#awk -F: '$0 ~ /root/{print $1}' /etc/passwd
[root@centos8 ~]#awk -F: '$0 ~ "^root"{print $1}' /etc/passwd
[root@centos8 ~]#awk '$0 !~ /root/'   /etc/passwd
[root@centos8 ~]#awk '/root/'   /etc/passwd
[root@centos8 ~]#awk -F: '/r/' /etc/passwd
[root@centos8 ~]#awk -F: '$3==0'     /etc/passwd
[root@centos8 ~]#df | awk -F"[[:space:]]+|%" '$0 ~ /^\/dev\/sd/{print $5}'
51
92
[root@centos8 ~]#ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.8
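The examples above mostly match against the whole line ($0). Matching a single field is often more precise, as in this sketch on made-up passwd-style input:

```shell
# $1 ~ /^root/ matches only names starting with "root";
# $1 !~ /root/ matches names that do not contain "root" anywhere.
# "chroot" contains "root" but does not start with it, so neither rule fires.
matches=$(printf 'root:x:0\nbin:x:1\nchroot:x:2\n' | awk -F: '
$1 ~ /^root/ { print "match:", $1 }
$1 !~ /root/ { print "no match:", $1 }
')
echo "$matches"
```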

Logical operators:

And: &&

Or: ||

Not: ! (negation)

example:

[root@centos8 ~]#awk 'BEGIN{print !i}'
1
[root@centos8 ~]#awk -v i=10 'BEGIN{print !i}'
0
[root@centos8 ~]#awk -v i=-3 'BEGIN{print !i}'
0
[root@centos8 ~]#awk -v i=0 'BEGIN{print !i}'
1
[root@centos8 ~]#awk -v i=abc 'BEGIN{print !i}'
0

Conditional expression (ternary expression)

selector?if-true-expression:if-false-expression
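A minimal sketch of the conditional expression combined with a logical operator. The input lines and the 1000 threshold for "regular" accounts are assumptions made for the example:

```shell
# kind becomes "regular" only when the UID is at least 1000 and the
# name is not "nobody"; otherwise it becomes "system".
kinds=$(printf 'root:x:0\nalice:x:1000\n' | awk -F: '{
    kind = ($3 >= 1000 && $1 != "nobody") ? "regular" : "system"
    print $1, kind
}')
echo "$kinds"
```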

6. PATTERN

PATTERN: filters the lines that match the pattern condition; only those lines are processed

Empty pattern: if no pattern is specified, every line matches

Example:

[root@centos8 ~]#awk -F: '{print $1,$3}' /etc/passwd

/regular expression/: only lines that match the regular expression are processed; the expression must be enclosed in / /

Relational expression: a line is processed only when the result of the expression is true

True: the result is a non-zero value or a non-empty string

False: the result is an empty string or 0

Line ranges: /pat1/,/pat2/ selects the lines from one matching pat1 through the next one matching pat2
Line numbers cannot be written directly in this form, but the variable NR can be used to specify line numbers indirectly
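Both ways of selecting a range of lines can be sketched on the output of seq 10 (the numbers 1 to 10, one per line):

```shell
# Select lines 3 to 5 by line number, using NR in a relational expression.
by_number=$(seq 10 | awk 'NR >= 3 && NR <= 5')
# Select from the line matching ^4$ through the line matching ^6$.
by_pattern=$(seq 10 | awk '/^4$/,/^6$/')
echo "$by_number"
echo "$by_pattern"
```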

BEGIN/END patterns

BEGIN{}: executed only once, before awk starts processing the text in the files

END{}: executed only once, after all text has been processed

7. Conditional judgment if else

Syntax:

if(condition){statement;...}[else statement]
if(condition1){statement1}else if(condition2){statement2}else if(condition3){statement3}...... else {statementN}

Usage scenario: conditional checks on the whole line or on a field that awk has read
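A short sketch of an if / else if / else chain applied to a field. The names, scores, and grade thresholds are invented for the example:

```shell
# Classify the second field of each line into one of three grades.
grades=$(printf 'alice 95\nbob 72\ncarol 48\n' | awk '{
    if ($2 >= 90)      grade = "excellent"
    else if ($2 >= 60) grade = "pass"
    else               grade = "fail"
    print $1, grade
}')
echo "$grades"
```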

8. Conditional judgment switch

Syntax:

switch(expression) {case VALUE1 or /REGEXP/: statement1; case VALUE2 or /REGEXP2/: statement2; ...; default: statementn}
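The following sketch shows one way to write a switch with both a constant case and a regular expression case. Note that switch is a gawk extension, so the example calls gawk explicitly and is skipped when gawk is not installed; as in C, a case falls through to the next one unless it ends with break:

```shell
if command -v gawk >/dev/null 2>&1; then
    switch_out=$(seq 3 | gawk '{
        switch ($1) {
        case 1:
            print "one"
            break
        case /^[23]$/:
            print "two or three"
            break
        default:
            print "other"
        }
    }')
else
    # Placeholder used only when gawk is unavailable.
    switch_out="gawk not installed"
fi
echo "$switch_out"
```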

9. Loop while

Syntax:

while (condition) {statement;...}

If the condition is true, the loop is entered; once the condition is false, the loop exits

Usage scenarios:

Processing several fields of a line one by one in a similar way

Processing each element of an array one by one
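The first usage scenario, handling each field of a line in turn, might look like this on a made-up input line:

```shell
# Walk over the fields of the record and print each one with its length.
lengths=$(echo 'one three seven' | awk '{
    i = 1
    while (i <= NF) {
        print $i, length($i)
        i++
    }
}')
echo "$lengths"
```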

10. Loop do while

Syntax:

do {statement;...}while(condition)

Meaning: whether the condition is true or false, the loop body is executed at least once
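Two small sketches of do-while: the first shows the body running once even though the condition is false from the start, the second sums the integers from 1 to 100:

```shell
# The condition i < 3 is already false for i = 5, yet the body runs once.
once=$(awk 'BEGIN {
    i = 5
    do {
        print "body ran with i=" i
        i++
    } while (i < 3)
}')
# Sum 1..100 with a do-while loop.
total=$(awk 'BEGIN {
    i = 1
    do {
        total += i
        i++
    } while (i <= 100)
    print total
}')
echo "$once"
echo "$total"
```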

11. Loop for

Syntax:

for(expr1;expr2;expr3) {statement;...}

Common usage:

for(variable assignment;condition;iteration step) {for-body}

Special usage: traversing the elements of an array

for(var in array) {for-body}
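Both forms of for can be sketched briefly. The array a and its keys are made up; because for (k in a) visits keys in an unspecified order, the output is piped through sort to make it predictable:

```shell
# Counting loop: sum the integers from 1 to 100.
sum=$(awk 'BEGIN { for (i = 1; i <= 100; i++) sum += i; print sum }')
# Array traversal: print every key with its value.
keys=$(awk 'BEGIN { a["x"] = 1; a["y"] = 2; for (k in a) print k, a[k] }' | sort)
echo "$sum"
echo "$keys"
```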

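12. continue and break

continue: ends the current iteration and moves on to the next one

break: ends the entire loop

Format:

continue
break

Unlike the shell builtins of the same name, awk's continue and break take no numeric argument; they always act on the innermost loop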
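A minimal sketch of both statements in one loop: continue skips the even numbers, and break stops the loop once the counter passes 7:

```shell
odd=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers
        if (i > 7) break           # stop the loop at the first odd number above 7
        print i
    }
}')
echo "$odd"
```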

13. next

next ends the processing of the current line early and moves straight on to the next line (awk's own implicit loop over the input)
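A common use of next is to discard lines early so that later rules never see them. This sketch, on made-up input, drops comment lines and empty lines:

```shell
# Lines starting with # and lines without fields are skipped by next;
# only the remaining lines reach the second rule.
kept=$(printf '# comment\nkeep 1\n\nkeep 2\n' | awk '
/^#/ || NF == 0 { next }
{ print NR ": " $0 }
')
echo "$kept"
```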

14. Array

awk arrays are associative arrays

Format:

array_name[index-expression]

index-expression:

Arrays can be used to implement key/value storage
Any string can be used as an index; a literal string must be enclosed in double quotes
If an array element does not exist yet, awk automatically creates it when it is referenced and initializes its value to the empty string
To test whether an element exists in the array, use the form index in array; to traverse the array, use for(index in array)
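The key/value use of arrays and the membership test can be sketched as a small word counter. The input words and the array name count are made up for the example:

```shell
# count[word] is created on first use and incremented for every occurrence.
# The (key in count) test checks membership without creating the element.
word_info=$(printf 'a b a\nc a b\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }
END {
    if ("a" in count)    print "a seen", count["a"]
    if (!("z" in count)) print "z missing"
}')
echo "$word_info"
```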

15. awk functions

awk functions are divided into built-in and user-defined functions

Official documentation

https://www.gnu.org/software/gawk/manual/gawk.html#Functions

15.1 common built-in functions

Numeric processing:

rand(): returns a random number between 0 and 1
srand(): used together with rand() to seed the random number generator
int(): returns the integer part of a number

String manipulation:

length([s]): returns the length of the specified string
sub(r,s,[t]): searches the string t for the pattern r and replaces the first match with s
gsub(r,s,[t]): searches the string t for the pattern r and replaces all matches with s
split(s,array,[r]): splits the string s using r as the delimiter and saves the pieces in the array named by array; the first index is 1, the second index is 2, ...
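The string functions can be tried on a single made-up date string. Because sub and gsub modify their target in place, the sketch works on copies (t and u) so the original string s stays unchanged:

```shell
funcs=$(awk 'BEGIN {
    s = "2024-01-15"
    n = split(s, parts, "-")           # n is the number of pieces
    print n, parts[1], parts[3]
    t = s; gsub(/-/, "/", t); print t  # replace every "-"
    u = s; sub(/-/, ".", u);  print u  # replace only the first "-"
    print length(s), int(3.9)          # int() truncates toward zero
}')
echo "$funcs"
```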

awk can invoke shell commands:

system("cmd")

A space is the string concatenation operator in awk. If the command passed to system() needs awk variables, join them to the rest of the command with spaces; everything except the awk variables is enclosed in double quotes
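A minimal sketch of building a command from a quoted string and an awk variable. The variable names dir and rc are arbitrary, and test -d / is used because it prints nothing and should succeed on any Unix-like system; system() returns the exit status of the command:

```shell
# "test -d " is a literal string; dir is an awk variable joined to it by a space.
status_line=$(awk 'BEGIN {
    dir = "/"
    rc = system("test -d " dir)
    print "exit status:", rc
}')
echo "$status_line"
```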

Time functions

Official documentation: Time functions

https://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions

systime(): the number of seconds elapsed from January 1, 1970 to the current time
strftime(): formats a time according to the specified format
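A short sketch of the two time functions, assuming gawk is available (the example is skipped otherwise). Timestamp 0 is formatted in UTC by passing a non-zero third argument to strftime, so the result does not depend on the local time zone:

```shell
if command -v gawk >/dev/null 2>&1; then
    # Format the epoch itself (timestamp 0) in UTC.
    epoch=$(gawk 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", 0, 1) }')
    # The current time as seconds since the epoch, then formatted as a date.
    gawk 'BEGIN { now = systime(); print now, strftime("%Y-%m-%d", now) }'
else
    # Placeholder used only when gawk is unavailable.
    epoch="gawk not installed"
fi
echo "$epoch"
```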

15.2 user defined functions

Custom function format:

function name ( parameter, parameter, ... ) {
   statements
   return expression
}
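As an illustration of this format, here is a sketch of a hypothetical max function applied to the two fields of each input line. The function name and the sample numbers are made up:

```shell
maxima=$(printf '3 8\n12 9\n' | awk '
# Return the larger of two values.
function max(a, b) {
    return (a > b) ? a : b
}
{ print max($1, $2) }
')
echo "$maxima"
```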

16. awk script

An awk program can be written to a script file and then called or executed directly

Passing parameters to an awk script

Format:

awkfile  var=value  var2=value2... Inputfile

Note: variables passed this way are not available in the BEGIN block; they only become available once awk starts reading the first line of input. To make a variable available before BEGIN runs, pass it with the -v option. Each variable specified this way on the command line needs its own -v option
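The difference between the two ways of passing a variable can be checked with a throwaway script. The file names demo.awk and input.txt and the variable names early and late are invented for this sketch; the script is run with awk -f here, and it could also be made executable and run directly thanks to its first line:

```shell
tmp=$(mktemp -d)
cat > "$tmp/demo.awk" <<'EOF'
#!/usr/bin/awk -f
BEGIN { print "BEGIN sees:", "[" early "]", "[" late "]" }
      { print "main sees:",  "[" early "]", "[" late "]" }
EOF
printf 'x\n' > "$tmp/input.txt"
# early is passed with -v (set before BEGIN); late is passed as an operand
# (set only when awk reaches it, just before reading input.txt).
script_out=$(awk -v early=1 -f "$tmp/demo.awk" late=2 "$tmp/input.txt")
echo "$script_out"
rm -r "$tmp"
```

In the BEGIN line the brackets for late are empty, while the main block sees both values.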

Keywords: Linux Operation & Maintenance bash

Added by robdavies on Tue, 04 Jan 2022 18:03:50 +0200
