1. How awk works and basic usage
awk is named after its authors Aho, Weinberger, and Kernighan. It is a report generator that produces formatted text output. The AWK shipped with GNU/Linux is developed and maintained by the Free Software Foundation (FSF) and is commonly known as GNU AWK.
There are several versions:
- AWK: the original AWK from AT&T Bell Laboratories
- NAWK: New awk, an upgraded version of AWK from AT&T Bell Laboratories
- GAWK: GNU AWK. Most GNU/Linux distributions ship GAWK, which is compatible with AWK and NAWK
GNU AWK user manual:
https://www.gnu.org/software/gawk/manual/gawk.html
gawk is a pattern scanning and processing language. It can:
- Process text
- Output formatted text reports
- Perform arithmetic operations
- Perform string operations
Format:
awk [options] 'program' var=value file...
awk [options] -f programfile var=value file...
Explanation:
program is usually enclosed in single quotes and can consist of three parts:
- BEGIN statement block
- General statement block for pattern matching
- END statement block
Common options:
- -F "separator": specifies the input field separator. By default, fields are separated by runs of consecutive whitespace characters
- -v var=value: assigns a value to a variable
Program
Format:
pattern{action statements;..}
- pattern: determines when the action statements are triggered, for example BEGIN, END, regular expressions, etc.
- action statements: process the data and are placed inside {}. The most common are print and printf
How awk works
Step 1: Execute the statements in the BEGIN{action;...} block.
Step 2: Read a line from a file or from standard input (stdin), then execute the pattern{action;...} block. awk scans the input line by line, repeating this from the first line to the last until all input has been read.
Step 3: When the end of the input stream is reached, execute the END{action;...} block.
The BEGIN block is executed before awk reads any lines from the input stream. It is optional and is typically used for tasks such as initializing variables or printing a table header.
The END block is executed after awk has read all lines from the input stream, for example to print results that summarize all lines. It is also optional.
The general pattern block is the most important part, although it is also optional. awk executes it once for every line read. If a pattern is given without an action, the default action {print} is used, that is, every matching line is printed.
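To make the three stages concrete, here is a minimal sketch that uses all three blocks. It feeds awk some made-up input through printf (the names and numbers are illustrative only):

```shell
#!/bin/sh
# BEGIN runs once before any input, the main block once per line,
# and END once after the last line has been read.
printf 'alice 30\nbob 25\ncarol 35\n' |
awk 'BEGIN { print "name age" }            # header, printed before input is read
     { print $1, $2; total += $2 }         # runs for every input line
     END { print "total", total }'         # summary, printed after the last line
```

This prints the header, the three input lines, and finally `total 90`. Because END runs after every line has been processed, it sees the final value of `total`.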
Delimiters, fields, and records
- The pieces of a line separated by the delimiter are called fields (columns). They are referenced by the field identifiers $1, $2, ..., $n, and $0 refers to the entire line. Note: this meaning of $1 etc. differs from that of the positional parameters $1 etc. in the shell
- Each line of the file is called a record
- If the action is omitted, print $0 is executed by default
Common action categories
- Output statements: print, printf
- Expressions: arithmetic expressions, comparison expressions, etc.
- Compound statements: several statements grouped with {}
- Control statements: if, while, etc.
- Input statements (such as getline)
awk control statements
- {statements;...}: compound statement
- if(condition) {statements;...}
- if(condition) {statements;...} else {statements;...}
- while(condition) {statements;...}
- do {statements;...} while(condition)
- for(expr1;expr2;expr3) {statements;...}
- break
- continue
- exit
2. Action print
Format:
print item1, item2, ...
Explanation:
- Items are separated by commas; in the output they are joined with the output field separator (a space by default)
- An output item can be a string or a number, a field of the current record, a variable, or an awk expression
- If the items are omitted, it is equivalent to print $0
- Literal strings must be enclosed in "", while variables and numbers do not need quotes
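The short sketch below applies these rules to a single made-up line. It contrasts a comma, which joins items with OFS, with plain juxtaposition, which concatenates them:

```shell
#!/bin/sh
# Hypothetical sample line; fields are split on whitespace by default.
line='root 0 /bin/bash'

echo "$line" | awk '{ print $1, $2 }'        # comma: joined by OFS -> root 0
echo "$line" | awk '{ print $1 $2 }'         # no comma: concatenated -> root0
echo "$line" | awk '{ print "user:", $1 }'   # literal text needs double quotes
echo "$line" | awk '{ print }'               # no items: same as print $0
```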
Example: find the 3 IP addresses with the most visits to a website
[root@VM_0_10_centos logs]# awk '{print $1}' nginx.access.log-20200428|sort | uniq -c |sort -nr|head -3
5498 122.51.38.20
2161 117.157.173.214
953 211.159.177.120
[root@centos8 ~]#awk '{print $1}' access_log |sort |uniq -c|sort -nr|head
4870 172.20.116.228
3429 172.20.116.208
2834 172.20.0.222
2613 172.20.112.14
2267 172.20.0.227
2262 172.20.116.179
2259 172.20.65.65
1565 172.20.0.76
1482 172.20.0.200
1110 172.20.28.145
Example: get partition utilization
[root@centos8 ~]# df | awk -F"[[:space:]]+|%" '{print $5}'
Use
0
0
1
0
3
19
1
0
Example: extract the IP address from ifconfig output
[root@centos8 ~]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::20c:29ff:fe3d:d1e7  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:3d:d1:e7  txqueuelen 1000  (Ethernet)
        RX packets 24590  bytes 25224965 (24.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12793  bytes 4232673 (4.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@centos8 ~]# ifconfig eth0 | sed -n "2p"
        inet 10.0.0.85  netmask 255.255.255.0  broadcast 10.0.0.255
[root@centos8 ~]# ifconfig eth0 | sed -n "2p" | awk '{print $2}'
10.0.0.85
[root@centos8 ~]# ifconfig eth0 | awk '/netmask/{print $2}'
10.0.0.85
[root@centos8 ~]# ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.85
3. awk variables
Variables in awk are divided into built-in variables and user-defined variables.
3.1 Common built-in variables
- FS: input field separator, whitespace by default. Equivalent to the -F option
Example:
[root@centos8 ~]#awk -v FS=":" '{print $1FS$3}' /etc/passwd |head -n3
root:0
bin:1
daemon:2
- OFS: output field separator, a space by default
Example:
[root@centos8 ~]#awk -v FS=':' '{print $1,$3,$7}' /etc/passwd|head -n1
root 0 /bin/bash
[root@centos8 ~]#awk -v FS=':' -v OFS=':' '{print $1,$3,$7}' /etc/passwd|head -n1
root:0:/bin/bash
- RS: input record separator, the character that marks the end of each record on input (a newline by default)
Example:
awk -v RS=' ' '{print }' /etc/passwd
- ORS: output record separator. The newline written after each output record is replaced with the specified string
Example:
awk -v RS=' ' -v ORS='###' '{print $0}' /etc/passwd
- NF: number of fields in the current record
Example:
# Built-in variables are referenced without a $ prefix
[root@centos8 ~]#awk -F: '{print NF}' /etc/fstab
[root@centos8 ~]#awk -F: '{print $(NF-1)}' /etc/passwd
[root@centos8 ~]#ls /misc/cd/BaseOS/Packages/*.rpm |awk -F"." '{print $(NF-1)}'|sort |uniq -c
389 i686
208 noarch
1060 x86_64
- NR: record number, counted cumulatively across all input files
Example:
[root@centos8 ~]#awk '{print NR,$0}' /etc/issue /etc/centos-release
1 \S
2 Kernel \r on an \m
3
4 CentOS Linux release 8.1.1911 (Core)
- FNR: record number, counted separately for each file
Example:
awk '{print FNR}' /etc/fstab /etc/inittab
[root@centos8 ~]#awk '{print NR,$0}' /etc/issue /etc/redhat-release
1 \S
2 Kernel \r on an \m
3
4 CentOS Linux release 8.0.1905 (Core)
[root@centos8 script40]#awk '{print FNR,$0}' /etc/issue /etc/redhat-release
1 \S
2 Kernel \r on an \m
3
1 CentOS Linux release 8.0.1905 (Core)
- FILENAME: name of the current input file
Example:
[root@centos8 ~]#awk '{print FILENAME}' /etc/fstab
[root@centos8 ~]#awk '{print FNR,FILENAME,$0}' /etc/issue /etc/redhat-release
1 /etc/issue \S
2 /etc/issue Kernel \r on an \m
3 /etc/issue
1 /etc/redhat-release CentOS Linux release 8.0.1905 (Core)
- ARGC: number of command-line arguments (ARGV[0] plus the input files and assignments; options and the program text are not counted)
Example:
[root@centos8 ~]#awk '{print ARGC}' /etc/issue /etc/redhat-release
3
3
3
3
[root@centos8 ~]#awk 'BEGIN{print ARGC}' /etc/issue /etc/redhat-release
3
- ARGV: array holding the command-line arguments: ARGV[0], ARGV[1], ...
Example:
[root@centos8 ~]#awk 'BEGIN{print ARGV[0]}' /etc/issue /etc/redhat-release
awk
[root@centos8 ~]#awk 'BEGIN{print ARGV[1]}' /etc/issue /etc/redhat-release
/etc/issue
[root@centos8 ~]#awk 'BEGIN{print ARGV[2]}' /etc/issue /etc/redhat-release
/etc/redhat-release
[root@centos8 ~]#awk 'BEGIN{print ARGV[3]}' /etc/issue /etc/redhat-release

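The following sketch shows how NR, FNR, NF and FILENAME behave across two inputs without relying on system files. It writes two throwaway files into a temporary directory; the file names and contents are made up for illustration.

```shell
#!/bin/sh
# Two throwaway input files in a temporary directory (file names are arbitrary).
tmp=$(mktemp -d)
(
  cd "$tmp" || exit 1
  printf 'a b c\nd e\n' > one.txt
  printf 'x\n' > two.txt

  # NR keeps counting across files; FNR restarts at 1 in each file.
  # NF is the number of fields on the line; $NF is the last field.
  awk '{ print NR, FNR, NF, $NF }' one.txt two.txt

  # FILENAME is the file currently being read; print it on each file's first line.
  awk 'FNR == 1 { print FILENAME }' one.txt two.txt
)
rm -r "$tmp"
```

The first awk prints `1 1 3 c`, `2 2 2 e`, `3 1 1 x`. The second prints `one.txt` and `two.txt`.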
3.2 User-defined variables
Custom variables are case-sensitive and can be assigned in the following ways:
- -v var=value
- Defined directly in the program
Example:
[root@centos8 ~]#awk -v test1=test2="hello,gawk" 'BEGIN{print test1,test2}'
test2=hello,gawk
[root@centos8 ~]#awk -v test1=test2="hello1,gawk" 'BEGIN{test1=test2="hello2,gawk";print test1,test2}'
hello2,gawk hello2,gawk
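This is a minimal sketch of both assignment styles. It also shows that variable names are case-sensitive. The variable names are arbitrary.

```shell
#!/bin/sh
# -v assigns before BEGIN runs; name and Name are two different variables.
awk -v name=gawk 'BEGIN { Name = "AWK"; print name, Name }'

# A variable can also be defined inside the program; chained assignment
# gives a and b the same value.
awk 'BEGIN { a = b = "hello"; print a, b }'
```

The two commands print `gawk AWK` and `hello hello`.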
4. Action printf
printf produces formatted output.
Format:
printf "FORMAT", item1, item2, ...
Explanation:
- FORMAT must be specified
- No newline is added automatically; the newline must be given explicitly as \n
- FORMAT must contain a format specifier for each subsequent item
Format specifiers correspond one-to-one with the items:
%s: string
%d, %i: decimal integer
%f: floating-point number
%e, %E: number in scientific notation
%c: character (a numeric argument is printed as the character with that ASCII code)
%g, %G: number in either scientific or floating-point notation
%u: unsigned integer
%%: a literal % sign
Modifiers:
#[.#]: the first number controls the display width; the second # is the precision (digits after the decimal point), e.g. %3.1f
-: left-align (the default is right-aligned), e.g. %-15s
+: show the sign of the value, e.g. %+d
Example:
awk -F: '{printf "%s",$1}' /etc/passwd
awk -F: '{printf "%s\n",$1}' /etc/passwd
awk -F: '{printf "%20s\n",$1}' /etc/passwd
awk -F: '{printf "%-20s\n",$1}' /etc/passwd
awk -F: '{printf "%-20s %10d\n",$1,$3}' /etc/passwd
awk -F: '{printf "Username: %s\n",$1}' /etc/passwd
awk -F: '{printf "Username: %sUID:%d\n",$1,$3}' /etc/passwd
awk -F: '{printf "Username: %25sUID:%d\n",$1,$3}' /etc/passwd
awk -F: '{printf "Username: %-25sUID:%d\n",$1,$3}' /etc/passwd
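This sketch shows what the width, alignment, precision and sign modifiers produce. The `|` characters are added only to make the padding visible, and the user:uid lines are made up.

```shell
#!/bin/sh
# Left-aligned name in 10 columns, right-aligned number in 8 columns.
printf 'root:0\nnobody:65534\n' |
awk -F: '{ printf "%-10s|%8d|\n", $1, $2 }'

# Precision, explicit sign, width plus precision, and a literal percent sign.
awk 'BEGIN { printf "%.2f %+d %5.1f%%\n", 3.14159, 42, 12.345 }'
```

The output is `root      |       0|`, `nobody    |   65534|`, and then `3.14 +42  12.3%`.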
5. Operators
Arithmetic operators:
x+y, x-y, x*y, x/y, x^y, x%y
-x: negation
+x: converts a string to a number
String operator: there is no operator symbol; writing values next to each other concatenates them
Assignment operators:
=, +=, -=, *=, /=, %=, ^=, ++, --
Example:
[root@centos8 ~]#awk 'BEGIN{i=0;print i++,i}'
0 1
[root@centos8 ~]#awk 'BEGIN{i=0;print ++i,i}'
1 1
Comparison operators:
==, !=, >, >=, <, <=
Example: odd and even lines
[root@centos8 ~]#seq 10 | awk 'NR%2==0'
2
4
6
8
10
[root@centos8 ~]#seq 10 | awk 'NR%2==1'
1
3
5
7
9
Pattern matching operators:
~: whether the left side matches (contains a match for) the regular expression on the right
!~: does not match
Example:
[root@centos8 ~]#awk -F: '$0 ~ /root/{print $1}' /etc/passwd
[root@centos8 ~]#awk -F: '$0 ~ "^root"{print $1}' /etc/passwd
[root@centos8 ~]#awk '$0 !~ /root/' /etc/passwd
[root@centos8 ~]#awk '/root/' /etc/passwd
[root@centos8 ~]#awk -F: '/r/' /etc/passwd
[root@centos8 ~]#awk -F: '$3==0' /etc/passwd
[root@centos8 ~]#df | awk -F"[[:space:]]+|%" '$0 ~ /^\/dev\/sd/{print $5}'
51
92
[root@centos8 ~]#ifconfig eth0 | awk 'NR==2{print $2}'
10.0.0.8
Logical operators:
AND: &&, logical and
OR: ||, logical or
NOT: !, logical negation
Example:
[root@centos8 ~]#awk 'BEGIN{print !i}'
1
[root@centos8 ~]#awk -v i=10 'BEGIN{print !i}'
0
[root@centos8 ~]#awk -v i=-3 'BEGIN{print !i}'
0
[root@centos8 ~]#awk -v i=0 'BEGIN{print !i}'
1
[root@centos8 ~]#awk -v i=abc 'BEGIN{print !i}'
0
Conditional expression (ternary expression)
selector?if-true-expression:if-false-expression
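A one-line sketch of the conditional expression that labels each input number as even or odd. The input is generated with printf.

```shell
#!/bin/sh
# The parentheses keep the ternary expression clearly grouped inside print.
printf '1\n2\n3\n4\n' | awk '{ print $1, ($1 % 2 == 0 ? "even" : "odd") }'
```

This prints `1 odd`, `2 even`, `3 odd`, `4 even`.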
6. PATTERN
PATTERN: filters the input lines according to the pattern condition, and only the matching lines are processed
- If no pattern is specified (the empty pattern), every line matches
Example:
[root@centos8 ~]#awk -F: '{print $1,$3}' /etc/passwd
- /regular expression/: only lines that match the pattern are processed; the regular expression must be enclosed in / /
- Relational expression: a line is processed only if the expression is true
  True: the result is a non-zero number or a non-empty string
  False: the result is an empty string or the number 0
- Line ranges:
  - Line numbers cannot be used directly (unlike sed's 2,5), but the variable NR can be used to specify them indirectly, e.g. NR==2,NR==5
  - /pat1/,/pat2/: from a line matching pat1 through the next line matching pat2; the direct numeric format is not supported
- BEGIN/END patterns
  BEGIN{}: executed only once, before awk starts processing the text in the file
  END{}: executed only once, after all the text has been processed
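The sketch below exercises each kind of pattern on generated input. The `gen` helper is hypothetical and exists only to produce the lines line1 to line6.

```shell
#!/bin/sh
# Hypothetical helper that prints line1 .. line6 as sample input.
gen() { printf 'line%s\n' 1 2 3 4 5 6; }

gen | awk '/line[24]/'          # regex pattern: line2, line4
gen | awk 'NR >= 3 && NR <= 4'  # relational pattern on NR: line3, line4
gen | awk 'NR == 2, NR == 3'    # line range via NR: line2, line3
gen | awk '/line5/, /line6/'    # range from a pat1 match to a pat2 match: line5, line6
gen | awk '0'                   # always-false pattern: prints nothing
```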
7. Conditional statement: if-else
Syntax:
if(condition){statement;...}[else statement]
if(condition1){statement1}else if(condition2){statement2}else if(condition3){statement3}...else{statementN}
Usage scenario: making a conditional decision based on a whole line or a field obtained by awk
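Here is a small sketch of an if / else if / else chain applied to a field. The scores and grade labels are invented for illustration.

```shell
#!/bin/sh
# Classify made-up scores in field 2.
printf 'alice 92\nbob 75\ncarol 58\n' |
awk '{
    if ($2 >= 90) {
        grade = "A"
    } else if ($2 >= 60) {
        grade = "pass"
    } else {
        grade = "fail"
    }
    print $1, grade
}'
```

This prints `alice A`, `bob pass`, `carol fail`.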
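8. Conditional statement: switch
Syntax:
switch(expression) {case VALUE1: statement1; break; case /REGEXP2/: statement2; break; ...; default: statementN}
Each case label can be a constant value or a /regular expression/. As in C, execution falls through to the following case unless break is used. switch is a gawk extension and is not available in every awk.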
9. Loop: while
Syntax:
while (condition) {statement;...}
When condition is "true", the loop body is entered; when condition is "false", the loop exits.
Usage scenarios:
- Processing several fields of a line one by one in a similar way
- Processing each element of an array one by one
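As an illustration of the first use case, this sketch walks over every field of a line with while and prints each field's length. The sample line is made up.

```shell
#!/bin/sh
# Visit fields 1..NF one at a time.
echo 'root x 0 /bin/bash' |
awk '{
    i = 1
    while (i <= NF) {
        print i, $i, length($i)
        i++
    }
}'
```

This prints `1 root 4`, `2 x 1`, `3 0 1`, `4 /bin/bash 9`.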
10. Loop: do-while
Syntax:
do {statement;...}while(condition)
Meaning: the loop body is executed at least once, whether the condition is true or false
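This sketch shows the "at least once" behavior and a regular do-while sum. The variable names are arbitrary.

```shell
#!/bin/sh
# The body runs once even though the condition (i < 0) is false from the start.
awk 'BEGIN { i = 5; do { print "i =", i; i++ } while (i < 0) }'

# Sum 1..100 with do-while.
awk 'BEGIN { i = 1; sum = 0; do { sum += i; i++ } while (i <= 100); print sum }'
```

The first command prints `i = 5` and the second prints `5050`.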
11. Loop: for
Syntax:
for(expr1;expr2;expr3) {statement;...}
Common usage:
for(variable assignment;condition;iteration process) {for-body}
Special usage: traversing the elements of an array
for(var in array) {for-body}
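The sketch below shows both forms. The array name and contents are made up. The order in which for (k in arr) visits keys is unspecified, so the second command sorts its output.

```shell
#!/bin/sh
# Three-expression for loop: sum 1..100.
awk 'BEGIN { sum = 0; for (i = 1; i <= 100; i++) sum += i; print sum }'

# for-in traversal over an associative array (visiting order is unspecified).
awk 'BEGIN { color["apple"] = "red"; color["banana"] = "yellow"
             for (k in color) print k, color[k] }' | sort
```

The output is `5050`, then `apple red` and `banana yellow`.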
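12. continue and break
continue: ends the current iteration of the loop and starts the next one
break: exits the entire loop
Format:
continue
break
Unlike break n and continue n in the shell, awk's break and continue take no numeric argument; they always act on the innermost enclosing loop.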
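This minimal sketch uses both statements in a single loop. continue skips even numbers, and break stops the loop once i exceeds 7.

```shell
#!/bin/sh
awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue   # skip the rest of this iteration for even i
        if (i > 7) break           # leave the loop entirely once i passes 7
        print i
    }
}'
```

This prints 1, 3, 5 and 7, each on its own line.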
13. next
next ends the processing of the current line early and moves directly on to the next line (awk's own implicit loop over the input)
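In this sketch, next skips comment lines so the second rule never sees them. The input is made up. NR still counts the skipped lines.

```shell
#!/bin/sh
printf '# comment\nalpha\n# another\nbeta\n' |
awk '/^#/ { next }               # stop here for comment lines
     { print NR": "$0 }'         # only reached for non-comment lines
```

This prints `2: alpha` and `4: beta`.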
14. Arrays
Arrays in awk are associative arrays.
Format:
array_name[index-expression]
index-expression:
- Arrays can be used to implement key/value storage
- Any string can be used as an index; string indexes must be enclosed in double quotes
- If an array element does not exist beforehand, awk creates it automatically when it is referenced and initializes its value to the empty string
- To test whether an element exists in an array, use the expression "index in array"; referencing array[index] directly would create the element. To traverse an array, use for(var in array)
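This sketch covers three common array uses on made-up data. The first counts hits per IP in awk alone, similar to the sort | uniq -c pipeline in section 2. The second tests membership with in, and the third removes duplicate lines.

```shell
#!/bin/sh
# Count occurrences of each first field (addresses are made up).
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.1\n' |
awk '{ count[$1]++ }
     END { for (ip in count) print count[ip], ip }' | sort -nr

# "key in array" tests membership without creating the element.
awk 'BEGIN {
    seen["a"] = 1
    x = ("a" in seen); y = ("b" in seen)
    print x, y
}'

# De-duplication idiom: print a line only the first time it appears.
printf 'x\ny\nx\n' | awk '!seen[$0]++'
```

The output is `3 10.0.0.1` and `1 10.0.0.2`, then `1 0`, then `x` and `y`.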
15. awk functions
awk functions are divided into built-in functions and user-defined functions.
Official documentation:
https://www.gnu.org/software/gawk/manual/gawk.html#Functions
15.1 Common built-in functions
- Numeric functions:
  rand(): returns a random number n with 0 <= n < 1
  srand(): used together with rand() to seed the random number generator
  int(): returns the integer part of a number
- String functions:
  length([s]): returns the length of the string s
  sub(r,s,[t]): searches string t (default $0) for the pattern r and replaces the first match with s
  gsub(r,s,[t]): searches string t (default $0) for the pattern r and replaces every match with s
  split(s,array,[r]): splits the string s using r as the delimiter and stores the pieces in array; the first index is 1, the second is 2, ...
- Shell commands can be invoked from awk with system("cmd"). A space is the string concatenation operator in awk, so to use an awk variable in the command, separate it from the literal text with spaces; everything other than awk variables must be enclosed in "" quotes
- Time functions
  Official documentation:
  https://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions
  systime(): returns the number of seconds since January 1, 1970 (the Unix epoch)
  strftime(): formats a time according to a specified format
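This sketch exercises the numeric and string functions listed above on made-up values. Each expected result is given in a comment. The time functions are left out because they are gawk extensions. system() runs in a separate awk invocation so its output is not interleaved with awk's own buffered output.

```shell
#!/bin/sh
awk 'BEGIN {
    s = "hello world hello"
    print length(s)                          # 17
    t = s; sub(/hello/, "HI", t); print t    # HI world hello (first match only)
    u = s; n = gsub(/hello/, "HI", u)        # every match; gsub returns the count
    print n, u                               # 2 HI world HI
    k = split("root:x:0:0", f, ":")          # f[1]="root", f[2]="x", ...
    print k, f[1], f[3]                      # 4 root 0
    print int(3.9), int(-3.9)                # 3 -3 (truncates toward zero)
    srand(); r = rand()                      # seed from the clock, then draw
    print (r >= 0 && r < 1)                  # 1
}'

# The space concatenates the literal "echo hello " and the awk variable name.
awk 'BEGIN { name = "awk"; system("echo hello " name) }'
```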
15.2 User-defined functions
Custom function format:
function name ( parameter, parameter, ... ) {
    statements
    return expression
}
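A minimal sketch of a user-defined function follows. The name `max3` is hypothetical. The extra parameter `m` follows the usual awk idiom for declaring a local variable: it is never passed by the caller.

```shell
#!/bin/sh
printf '3 9 4\n7 2 8\n' |
awk 'function max3(a, b, c,    m) {   # m is a local variable, not an argument
         m = a
         if (b > m) m = b
         if (c > m) m = c
         return m
     }
     { print max3($1, $2, $3) }'
```

This prints `9` and `8`.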
16. awk scripts
Write the awk program as a script file, then call it with awk -f or execute it directly.
Passing parameters to an awk script
Format:
awkfile var=value var2=value2... Inputfile
Note: variables passed this way are not available in the BEGIN block. They are assigned only when awk reaches them in the argument list, so they become available once input is being read. To make a variable available before awk executes BEGIN, pass it with the -v option; each variable specified on the command line needs its own -v option.
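This sketch shows the difference between -v and a plain var=value argument. It writes a tiny script file to a temporary directory; the names demo.awk and input.txt are arbitrary.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# A small awk script file that reports the values of x and y.
cat > "$tmp/demo.awk" <<'EOF'
BEGIN { print "BEGIN: x=[" x "] y=[" y "]" }
{ print "line " NR ": x=[" x "] y=[" y "]" }
EOF
printf 'first\n' > "$tmp/input.txt"

# -v x=1 is applied before BEGIN; y=2 is applied only when awk reaches it
# in the argument list, i.e. after BEGIN and before input.txt is read.
awk -v x=1 -f "$tmp/demo.awk" y=2 "$tmp/input.txt"
rm -r "$tmp"
```

This prints `BEGIN: x=[1] y=[]` followed by `line 1: x=[1] y=[2]`.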