awk: one of the "three swordsmen" of text processing

1. How awk works and basic usage

awk: named after its authors Aho, Weinberger, and Kernighan. It is a report generator that produces formatted text output. The AWK distributed with GNU/Linux is currently developed and maintained by the Free Software Foundation (FSF) and is commonly known as GNU AWK.
There are several versions:
  • AWK: the original AWK, from AT&T Labs
  • NAWK: new awk, an upgraded version of AWK from AT&T Labs
  • GAWK: GNU AWK. GNU/Linux distributions commonly ship GAWK, which is fully compatible with AWK and NAWK
GNU AWK user manual:
https://www.gnu.org/software/gawk/manual/gawk.html
gawk: a pattern scanning and processing language that can do the following:
  • text processing
  • Output formatted text report
  • Perform arithmetic operations
  • Perform string operations
Format:
awk [options]   'program' var=value   file...
awk [options]   -f programfile    var=value file...
Notes:
The program is usually enclosed in single quotes and can consist of three parts:
  • BEGIN statement block
  • General statement blocks with pattern matching
  • END statement block

Common options:
  • -F "separator": the field separator used for input. The default separator is a run of consecutive whitespace characters
  • -v var=value: assigns a variable
Program format:
pattern{action statements;..}
pattern: determines when the action statements are triggered, e.g. BEGIN, END, or a regular expression
action statements: process the data and are enclosed in {}. Common ones: print, printf
How awk works
Step 1: execute the statements in the BEGIN{action;...} block.
Step 2: read one line from the file or from standard input (stdin) and execute the pattern{action;...} blocks. awk scans the input line by line, repeating this from the first line to the last until all files have been read.
Step 3: on reaching the end of the input stream, execute the END{action;...} block.
The BEGIN block is executed before awk reads any line from the input stream. It is optional; statements such as variable initialization or printing a table header are usually placed here.
The END block is executed after awk has read all lines from the input stream. It is also optional; summary output, such as the results of analyzing all lines, usually goes here.
The pattern blocks hold the general commands and are the most important part, though they too are optional. If the action is omitted, {print} is executed by default, i.e. each matching line is printed. awk runs these blocks for every line it reads.
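The three steps can be seen in a minimal sketch like the one below. The input is generated with printf instead of being read from a real file, and the variable name sum is purely illustrative.

```shell
# BEGIN runs once before any input, the main block runs once per line,
# and END runs once after the last line has been read.
result=$(printf '10\n20\n30\n' | awk '
BEGIN { print "start" }                # step 1: before reading input
      { sum += $1; print NR ": " $1 }  # step 2: once for every line
END   { print "total=" sum }           # step 3: after the last line
')
echo "$result"
```

The main block has no pattern, so it fires on every line; sum starts out empty and is treated as 0 in the first addition.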

Separators, fields, and records
  • The fields (columns) split by the separator are referred to as $1, ..., $n, called field identifiers; $0 is the whole record. Note: $ in awk does not mean the same thing as $ on a shell variable
  • Each line of the file is called a record
  • If the action is omitted, print $0 is executed by default
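As a small illustration of fields and records, the sketch below splits one passwd-style record on ":". The sample line is made up for the example; NF is the built-in field count described in section 3.1.

```shell
# One colon-separated record; -F: sets the field separator.
result=$(echo 'root:x:0:0:root:/root:/bin/bash' |
  awk -F: '{ print NF; print $1; print $NF; print $0 }')
echo "$result"
```

Because the program sits inside single quotes, the shell does not expand $1 or $0; awk itself interprets them as field references.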
Common action categories
  • Output statements: print, printf
  • Expressions: arithmetic, comparison expressions, etc.
  • Compound statements: groups of statements
  • Control statements: if, while, etc.
  • Input statements
awk control statements
  • {statements;...} compound statement
  • if(condition) {statements;...}
  • if(condition) {statements;...} else {statements;...}
  • while(condition) {statements;...}
  • do {statements;...} while(condition)
  • for(expr1;expr2;expr3) {statements;...}
  • break
  • continue
  • exit

2. Action print

Format:
print item1, item2, ...
Notes:
  • Items are separated by commas
  • An output item can be a string, a number, a field of the current record, a variable, or an awk expression
  • If item is omitted, it is equivalent to print $0
  • Literal text must be enclosed in "", while variables and numbers need no quotes
Example: list the 3 IP addresses with the most website visits
[root@VM_0_10_centos logs]# awk '{print $1}' nginx.access.log-20200428|sort | uniq -c |sort -nr|head -3
   5498 122.51.38.20
   2161 117.157.173.214
    953 211.159.177.120
[root@centos8 ~]#awk '{print $1}' access_log |sort |uniq -c|sort -nr|head 
   4870 172.20.116.228
   3429 172.20.116.208
   2834 172.20.0.222
   2613 172.20.112.14
   2267 172.20.0.227
   2262 172.20.116.179
   2259 172.20.65.65
   1565 172.20.0.76
   1482 172.20.0.200
   1110 172.20.28.145
Example: get partition utilization
[root@centos8 ~]# df | awk -F"[[:space:]]+|%" '{print $5}'
Use
0
0
1
0
3
19
1
0

Example: extract the IP address from the ifconfig output

[root@centos8 ~]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::20c:29ff:fe3d:d1e7  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:3d:d1:e7  txqueuelen 1000  (Ethernet)
        RX packets 24590  bytes 25224965 (24.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12793  bytes 4232673 (4.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@centos8 ~]# ifconfig eth0 | sed -n "2p"
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
[root@centos8 ~]# ifconfig eth0 | sed -n "2p" | awk '{print $2}'
10.0.0.85

[root@centos8 ~]# ifconfig eth0 | awk '/netmask/{print $2}'
10.0.0.85

[root@centos8 ~]# ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.85

3. awk variables

awk variables are divided into built-in and user-defined variables

3.1 Common built-in variables

  • FS: input field separator; whitespace by default. Equivalent to the -F option

example:

[root@centos8 ~]#awk -v FS=":" '{print $1FS$3}' /etc/passwd |head -n3
root:0
bin:1
daemon:2
  • OFS: output field separator; a single space by default

example:

[root@centos8 ~]#awk -v FS=':'   '{print $1,$3,$7}'   /etc/passwd|head -n1
root 0 /bin/bash
[root@centos8 ~]#awk -v FS=':' -v OFS=':' '{print $1,$3,$7}' /etc/passwd|head -n1
root:0:/bin/bash
  • RS: input record separator; specifies what separates records on input (a newline by default)
example:
awk -v RS=' ' '{print }' /etc/passwd
  • ORS: output record separator; replaces the newline on output with the specified string
example:
awk -v RS=' ' -v ORS='###'  '{print $0}' /etc/passwd
  • NF: number of fields
example:
#When referencing a variable, do not put $ in front of its name
[root@centos8 ~]#awk -F: '{print NF}' /etc/fstab 
[root@centos8 ~]#awk -F: '{print $(NF-1)}' /etc/passwd
[root@centos8 ~]#ls /misc/cd/BaseOS/Packages/*.rpm |awk -F"." '{print $(NF-1)}'|sort |uniq -c
    389 i686
    208 noarch
   1060 x86_64
  • NR: record number
example:
[root@centos8 ~]#awk '{print NR,$0}' /etc/issue /etc/centos-release
1 \S
2 Kernel \r on an \m
3 
4 CentOS Linux release 8.1.1911 (Core)
  • FNR: record number, counted separately for each file
example:
awk '{print FNR}' /etc/fstab /etc/inittab
[root@centos8 ~]#awk '{print NR,$0}' /etc/issue /etc/redhat-release 
1 \S
2 Kernel \r on an \m
3 
4 CentOS Linux release 8.0.1905 (Core) 
[root@centos8 script40]#awk '{print FNR,$0}' /etc/issue /etc/redhat-release 
1 \S
2 Kernel \r on an \m
3 
1 CentOS Linux release 8.0.1905 (Core)
  • FILENAME: name of the file currently being read
example:
[root@centos8 ~]#awk '{print FILENAME}' /etc/fstab
[root@centos8 ~]#awk '{print FNR,FILENAME,$0}' /etc/issue /etc/redhat-release 
1 /etc/issue \S
2 /etc/issue Kernel \r on an \m
3 /etc/issue 
1 /etc/redhat-release CentOS Linux release 8.0.1905 (Core)
  • ARGC: number of command-line arguments
example:
[root@centos8 ~]#awk '{print ARGC}' /etc/issue /etc/redhat-release 
3
3
3
3
[root@centos8 ~]#awk 'BEGIN{print ARGC}' /etc/issue /etc/redhat-release 
3
  • ARGV: array holding the arguments given on the command line: ARGV[0], ARGV[1], ...
example:
[root@centos8 ~]#awk 'BEGIN{print ARGV[0]}' /etc/issue /etc/redhat-release 
awk
[root@centos8 ~]#awk 'BEGIN{print ARGV[1]}' /etc/issue /etc/redhat-release 
/etc/issue
[root@centos8 ~]#awk 'BEGIN{print ARGV[2]}' /etc/issue /etc/redhat-release 
/etc/redhat-release
[root@centos8 ~]#awk 'BEGIN{print ARGV[3]}' /etc/issue /etc/redhat-release 
[root@centos8 ~]#

3.2 User-defined variables

Custom variable names are case sensitive. Assign values in either of the following ways:
  • -v var=value
  • Define directly in the program
Example:
[root@centos8 ~]#awk -v test1=test2="hello,gawk" 'BEGIN{print test1,test2}'   
test2=hello,gawk 
[root@centos8 ~]#awk -v test1=test2="hello1,gawk" 'BEGIN{test1=test2="hello2,gawk";print test1,test2}'   
hello2,gawk hello2,gawk

4. Action printf

printf produces formatted output
Format:
printf "FORMAT", item1, item2, ...
Notes:
  • FORMAT must be specified
  • No newline is added automatically; the newline control character \n must be given explicitly
  • FORMAT must contain a format specifier for each item that follows
Format specifiers: correspond one-to-one with the items
%s: display a string
%d, %i: display a decimal integer
%f: display a floating-point number
%e, %E: display a number in scientific notation
%c: display the character for an ASCII code
%g, %G: display a number in scientific notation or floating-point form
%u: unsigned integer
%%: display a literal %
Modifiers
#[.#]: the first number controls the display width; the second number is the precision after the decimal point, e.g. %3.1f
-: left-align (right-aligned by default), e.g. %-15s
+: display the sign of the number, e.g. %+d
example:
awk -F:   '{printf "%s",$1}' /etc/passwd
awk -F:   '{printf "%s\n",$1}' /etc/passwd
awk -F:   '{printf "%20s\n",$1}' /etc/passwd
awk -F:   '{printf "%-20s\n",$1}' /etc/passwd
awk -F:   '{printf "%-20s %10d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %s\n",$1}' /etc/passwd
awk -F:   '{printf "Username: %sUID:%d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %25sUID:%d\n",$1,$3}' /etc/passwd
awk -F:   '{printf "Username: %-25sUID:%d\n",$1,$3}' /etc/passwd
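To make the effect of width, alignment, and precision visible without depending on the local /etc/passwd, here is a hedged sketch on two made-up records. The "|" separators are only there to show where each column ends.

```shell
# %-8s  : string, left-aligned in 8 columns
# %5d   : integer, right-aligned in 5 columns
# %6.2f : float, 6 columns wide with 2 digits after the decimal point
result=$(printf 'root:x:0\nbin:x:1\n' |
  awk -F: '{ printf "%-8s|%5d|%6.2f\n", $1, $3, $3 / 3 }')
echo "$result"
```

This should print "root    |    0|  0.00" and "bin     |    1|  0.33", one per line.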

5. Operators

Arithmetic operators:

x+y, x-y, x*y, x/y, x^y, x%y
-x: negate
+x: convert a string to a number
String operator: no symbol; strings are concatenated by writing them next to each other
Assignment operators:
=, +=, -=, *=, /=, %=, ^=, ++, --
example:
[root@centos8 ~]#awk 'BEGIN{i=0;print i++,i}'
0 1
[root@centos8 ~]#awk 'BEGIN{i=0;print ++i,i}'
1 1
Comparison operators:
==, !=, >, >=, <, <=
Example: even and odd lines
[root@centos8 ~]#seq 10 | awk 'NR%2==0'
2
4
6
8
10
[root@centos8 ~]#seq 10 | awk 'NR%2==1'
1
3
5
7
9
Pattern matching:
~: true if the left side matches the regular expression on the right (i.e. contains a match)
!~: true if it does not match
example:
[root@centos8 ~]#awk -F: '$0 ~ /root/{print $1}' /etc/passwd
[root@centos8 ~]#awk -F: '$0 ~ "^root"{print $1}' /etc/passwd
[root@centos8 ~]#awk '$0 !~ /root/'   /etc/passwd
[root@centos8 ~]#awk '/root/'   /etc/passwd
[root@centos8 ~]#awk -F: '/r/' /etc/passwd
[root@centos8 ~]#awk -F: '$3==0'     /etc/passwd
[root@centos8 ~]#df | awk -F"[[:space:]]+|%" '$0 ~ /^\/dev\/sd/{print $5}'
51
92
[root@centos8 ~]#ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.8
Logical operators:
And: &&
Or: ||
Not: !, negation
example:
[root@centos8 ~]#awk 'BEGIN{print !i}'
1
[root@centos8 ~]#awk -v i=10 'BEGIN{print !i}'
0
[root@centos8 ~]#awk -v i=-3 'BEGIN{print !i}'
0
[root@centos8 ~]#awk -v i=0 'BEGIN{print !i}'
1
[root@centos8 ~]#awk -v i=abc 'BEGIN{print !i}'
0
Conditional expression (ternary expression)
selector?if-true-expression:if-false-expression
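A short sketch of the ternary expression follows. The records are made up, and the UID threshold of 1000 for regular users is an assumption chosen for illustration, not something awk defines.

```shell
# Classify each account by UID with a conditional expression.
result=$(printf 'root:x:0\nbin:x:1\nalice:x:1000\n' |
  awk -F: '{ type = ($3 >= 1000) ? "regular" : "system"; print $1, type }')
echo "$result"
```

Parenthesizing the selector is optional but keeps the expression readable when it sits on the right-hand side of an assignment.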

6. PATTERN

PATTERN: selects the lines that match the pattern condition; only those lines are processed
  • Empty pattern: if no pattern is specified, every line matches
example:
[root@centos8 ~]#awk -F: '{print $1,$3}' /etc/passwd
  • /regular expression/: only lines that match the pattern are processed; the expression must be enclosed in / /
  • Relational expression: a line is processed when the expression evaluates to true
True: the result is a non-zero value or a non-empty string
False: the result is an empty string or 0
  • Line ranges: /pat1/,/pat2/
Line numbers cannot be given directly in this form, but the variable NR can be used to specify line numbers indirectly
  • BEGIN/END patterns
BEGIN{}: executed only once, before any text in the file is processed
END{}: executed only once, after text processing is complete
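The pattern types can be compared side by side on a tiny made-up input. This is only a sketch; the variable names (input, by_regex, by_nr, by_range) are illustrative.

```shell
input=$(printf 'alpha\nbegin\nbeta\nend\ngamma\n')

# Regular expression pattern: lines starting with "b"
by_regex=$(printf '%s\n' "$input" | awk '/^b/')

# Relational expression using NR to select by line number
by_nr=$(printf '%s\n' "$input" | awk 'NR >= 4')

# Range pattern: from the first line matching /begin/ through the next line matching /end/
by_range=$(printf '%s\n' "$input" | awk '/begin/,/end/')

printf '%s\n---\n%s\n---\n%s\n' "$by_regex" "$by_nr" "$by_range"
```

None of the three programs has an action, so each falls back to the default print $0.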

7. Conditional judgment if else

Syntax:
if(condition){statement;...}[else statement]
if(condition1){statement1}else if(condition2){statement2}else if(condition3){statement3}...... else {statementN}
Usage scenario: conditional tests on the whole line or on a field obtained by awk
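As a hedged example of if/else on a field, the sketch below flags partitions above a usage threshold. The two input lines stand in for pre-processed df output, and the 80 percent threshold is an arbitrary assumption.

```shell
# Field 1 is the device, field 2 the usage percentage (sample data).
result=$(printf '/dev/sda1 92\n/dev/sda2 51\n' | awk '{
    if ($2 > 80) {
        print $1, "almost full"
    } else {
        print $1, "ok"
    }
}')
echo "$result"
```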

8. Conditional judgment switch

Syntax (switch is a gawk extension):
switch(expression) {case VALUE1 or /REGEXP/: statement1; case VALUE2 or /REGEXP2/: statement2; ...; default: statementn}

9. Loop while

Syntax:
while (condition) {statement;...}
If the condition is true, the loop is entered; once the condition is false, the loop exits
Usage scenarios:
Processing several fields of a line in the same way, one by one
Processing each element of an array, one by one
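The first scenario, handling every field of a line in turn, might look like the following sketch. The input words are arbitrary.

```shell
# Walk over the fields of each line and print every field with its length.
result=$(echo 'one three eleven' | awk '{
    i = 1
    while (i <= NF) {
        print $i, length($i)
        i++
    }
}')
echo "$result"
```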

10. Loop do while

Syntax:
do {statement;...}while(condition)
Meaning: the loop body is executed at least once, whether the condition is true or false
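A minimal sketch of the "at least once" behavior: the condition below is false from the start, yet the body still runs one time.

```shell
# i starts at 5, so (i < 3) is never true; the body still executes once.
result=$(awk 'BEGIN {
    i = 5
    do {
        print "i=" i
        i++
    } while (i < 3)
}')
echo "$result"
```

With a plain while loop and the same condition, nothing would be printed.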

11. Loop for

Syntax:
for(expr1;expr2;expr3) {statement;...}
Common usage:
for(variable assignment;condition;iteration step) {for-body}

Special usage: iterate over the elements of an array

for(var in array) {for-body}
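Both forms can be sketched briefly. The array name color and its contents are made up for illustration; note that the order in which for (k in array) visits the keys is not defined, so the example only counts them.

```shell
# Counting form: sum the integers from 1 to 100.
total=$(awk 'BEGIN { for (i = 1; i <= 100; i++) sum += i; print sum }')

# Array form: visit every index of an associative array.
count=$(awk 'BEGIN {
    color["apple"] = "red"
    color["lime"]  = "green"
    for (k in color) n++    # traversal order is unspecified
    print n
}')
echo "$total $count"
```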

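12. continue and break

continue: ends the current iteration and moves on to the next one
break: ends the whole loop
Format (unlike the shell builtins, awk's continue and break take no numeric argument):
continue
break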
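The difference between the two can be seen in one loop. This is only a sketch; the bounds 10 and 7 are arbitrary.

```shell
result=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip even numbers, keep looping
        if (i > 7) break           # leave the loop at the first odd number above 7
        print i
    }
}')
echo "$result"
```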

13. next

next ends processing of the current line early and moves directly on to the next line (the next iteration of awk's own implicit loop over the input)
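A common use of next is to discard uninteresting lines early so the later rules never see them. The sketch below assumes a made-up input with a comment line and a blank line.

```shell
result=$(printf '# comment\nfoo\n\nbar\n' | awk '
/^#/ || /^$/ { next }     # skip comments and blank lines
             { print NR, $0 }
')
echo "$result"
```

NR still counts the skipped lines, which is why the surviving lines are numbered 2 and 4.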

14. Arrays

awk arrays are associative arrays
Format:
array_name[index-expression]
index-expression:
  • Arrays can be used as a key/value store
  • Any string can be used as an index; string literals must be enclosed in double quotes
  • If a referenced array element does not yet exist, awk creates it automatically and initializes its value to the empty string
  • To test whether an element exists in the array, use the "index in array" form
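The key/value use and the existence test can be sketched as follows. The input letters and the array name count are illustrative only.

```shell
# Count how often each value appears, then query the array in END.
result=$(printf 'a\nb\na\na\n' | awk '
{ count[$1]++ }
END {
    print "a", count["a"]
    print "b", count["b"]
    # referencing count["c"] would create the element; "in" does not
    if (!("c" in count)) print "c missing"
}')
echo "$result"
```

The same counting pattern can replace a sort | uniq -c pipeline when the per-key totals are needed inside awk itself.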

15. awk functions

awk functions are divided into built-in and user-defined functions

Official documents
https://www.gnu.org/software/gawk/manual/gawk.html#Functions

15.1 common built-in functions

  • Numeric functions:
rand(): returns a random number between 0 and 1
srand(): used together with rand() to seed the random number generator
int(): returns the integer part
  • String functions:
length([s]): returns the length of the specified string
sub(r,s,[t]): searches string t for the pattern r and replaces the first match with s
gsub(r,s,[t]): searches string t for the pattern r and replaces all matches with s
split(s,array,[r]): splits string s using r as the delimiter and stores the pieces in array; the first index is 1, the second is 2, ...
  • Shell commands can be invoked from awk:
system("cmd")
A space acts as the string concatenation operator in awk. If the command passed to system needs awk variables, join the pieces with spaces; everything other than awk variables must be quoted with ""
  • Time functions
Official documentation: Time functions
https://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions
systime(): the number of seconds from January 1, 1970 to the current time
strftime(): formats a time according to the specified format
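The string functions and system() can be tried in a BEGIN block without any input file. This is a minimal sketch; the date string and the variable names (s, t, u, parts, name) are made up for the example.

```shell
result=$(awk 'BEGIN {
    s = "2022-01-04"
    print length(s)                     # 10
    t = s; sub(/-/, "/", t);  print t   # only the first "-" is replaced
    u = s; gsub(/-/, "/", u); print u   # every "-" is replaced
    n = split(s, parts, "-")            # n is the number of pieces
    print n, parts[1], parts[3]
    print int(3.9)                      # truncates toward zero
}')
echo "$result"

# system(): the quoted text and the awk variable are joined by a space.
sys=$(awk 'BEGIN { name = "world"; system("echo hello " name) }')
echo "$sys"
```

sub and gsub modify their third argument in place, which is why the sketch copies s into t and u first.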

15.2 user defined functions

Custom function format:
function name ( parameter, parameter, ... ) {
   statements
   return expression
}
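A small sketch of a user-defined function follows. The function name max is hypothetical, and adding 0 to each field is a cautious way to force a numeric rather than string comparison.

```shell
result=$(printf '3 9\n12 5\n' | awk '
function max(a, b) {
    return (a > b) ? a : b
}
{ print max($1 + 0, $2 + 0) }
')
echo "$result"
```

When calling a user-defined function, leave no space between the function name and the opening parenthesis.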

16. awk scripts

Write the awk program into a script file, then call or execute it directly
Passing parameters to an awk script
Format:
awkfile  var=value  var2=value2... Inputfile
Note: variables passed this way are not available in the BEGIN block; they only become available once awk starts processing the input. To make a value available before BEGIN runs, pass it with the -v option. Each variable specified on the command line needs its own -v option
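The difference between the two ways of passing a variable can be checked with a sketch like this one. It uses an inline program instead of a script file to stay self-contained; the variable name and its values are made up, and "-" tells awk to read standard input.

```shell
prog='BEGIN { print "BEGIN:[" name "]" } { print "main:[" name "]" }'

# With -v the value is already set when BEGIN runs.
with_v=$(echo x | awk -v name=early "$prog")

# As a var=value operand the assignment happens only when awk reaches it
# in the argument list, which is after BEGIN has finished.
as_arg=$(echo x | awk "$prog" name=late -)

printf '%s\n---\n%s\n' "$with_v" "$as_arg"
```

In the second run the BEGIN line shows empty brackets, while the main block sees the value.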

Keywords: Linux Operation & Maintenance bash

Added by robdavies on Tue, 04 Jan 2022 18:03:50 +0200