awk, the report generator: one of the three swordsmen of Linux text processing


1, awk

1. General

2. Working principle:

3. awk built-in variables

4. Other built-in variables

2, Examples

awk operation:

Fuzzy matching:

Comparison between numeric value and string:

Logical operators && and ||:

3, Advanced usage of awk

1. Define reference variables

2. Conditional statements

4, Other examples

1, awk

1. General

Origin: awk was created at Bell Labs in the 1970s; CentOS 7 ships gawk. The name AWK comes from the first letters of the family names of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Overview: AWK is a language for processing text files and a powerful text analysis tool. It is a programming language designed specifically for text processing and works line by line. It is typically used for scanning, filtering, and statistical summaries. The data can come from standard input, a pipeline, or a file.

2. Working principle:

awk reads the first line, tests it against the pattern, and executes the specified action if it matches; it then reads the second line and processes it the same way, and so on. Lines are not output automatically: output happens only through an action such as print (a pattern given without any action prints the matching line). If no pattern is defined, every line matches. awk therefore contains an implicit loop, and the action runs once for each line that matches the pattern.

awk reads the text line by line, splits each line into fields on spaces or tabs by default, stores the fields in built-in variables, and executes the commands according to the pattern or condition.

sed commands usually operate on a whole line;
awk prefers to split a line into multiple "fields" and then process them.
awk also reads its input line by line, and results can be displayed with the print function, which prints field data.
When using awk, you can use the logical operators
"&&": and
"||": or
"!": not
It can also carry out simple arithmetic:
+, -, *, /, %, ^ denote addition, subtraction, multiplication, division, remainder, and power respectively.

Command format:

awk [options] 'pattern or condition {action}' file1 file2 ...
awk -f script_file file1 file2 ...

Format: awk keyword, options, command part '{XXX}', file name
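
As a minimal sketch of the 'pattern {action}' form described above, the following feeds three made-up lines (names and ages invented for illustration) through awk with a pattern only, an action only, and both together:

```shell
# Sample input is made up for illustration; any whitespace-separated text works.
data='alice 30
bob 17
carol 45'

# Pattern only: lines whose second field is at least 18 are printed whole.
adults=$(printf '%s\n' "$data" | awk '$2 >= 18')

# Action only: the first field of every line is printed.
names=$(printf '%s\n' "$data" | awk '{print $1}')

# Pattern and action together: first field of the matching lines only.
adult_names=$(printf '%s\n' "$data" | awk '$2 >= 18 {print $1}')

echo "$adults"
echo "$names"
echo "$adult_names"
```

Using printf as the data source keeps the example independent of any file on the system.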

3. awk built-in variables

FS: the field separator for each line of input; defaults to spaces or tabs
NF: number of fields in the line being processed [number of columns]
NR: line number (ordinal) of the line being processed [row number]
$0: the entire content of the line being processed [whole line]
$1: first column
$2: second column
$n: the nth field (column n) of the line being processed
FILENAME: name of the file being processed

RS: the record (line) separator. When awk reads data from a file, it cuts the data into records according to RS and reads only one record at a time for processing. The default is "\n".

In short: RS separates data records and defaults to \n, that is, each line is one record.

[root@localhost ~]# awk '{print}' a    #a is the file name

[root@localhost ~]# awk '{print $1}' a
#If a line contains no spaces or tabs, awk treats the whole line as a single column, because awk separates fields by spaces or tabs by default
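
A small sketch of how NR, NF, $1 and $NF change from line to line, using two invented input lines with different numbers of fields:

```shell
# Two made-up records: the first has three fields, the second has two.
out=$(printf 'a b c\nd e\n' | awk '{print NR, NF, $1, $NF}')
echo "$out"
# 1 3 a c
# 2 2 d e
```

Because $NF means "the field whose number is NF", it always refers to the last column, however many columns the line has.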

4. Other built-in variables

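Usage of the other built-in variables: FS, OFS, NR, FNR, RS, ORS
OFS: the separator placed between fields in the output
RS: the character used as the record (line) separator on input; the specified character must exist in the original text
ORS: the separator written after each output record; it defaults to a newline, so setting it to another character can merge multiple lines into one line of output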

awk 'BEGIN{FS=":"}{print $1}' pass.txt
#Define the field separator as a colon before printing

awk 'BEGIN{FS=":";OFS="---"}{print $1,$2}' pass.txt
#OFS defines how the output fields are separated. $1 and $2 must be separated by a comma,
#because the comma is mapped to the OFS variable, which is a space by default

awk 'BEGIN{RS=":"}{print $0}' /etc/passwd
#RS specifies what to use as the record separator; here the colon is specified.
#The specified character must exist in the original text

awk 'BEGIN{ORS=" "}{print $0}' /etc/passwd
#Combine multiple lines into one line of output: each line is followed by a space instead of
#the default newline
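
The four separators can be tried without touching any real file. The sketch below uses short invented strings in place of pass.txt and /etc/passwd:

```shell
line='root:x:0:0'

# FS splits the input on colons; OFS joins the comma-separated print arguments.
joined=$(echo "$line" | awk 'BEGIN{FS=":"; OFS="---"} {print $1, $3}')

# RS=":" turns every colon-separated piece into its own record.
count=$(printf 'a:b:c' | awk 'BEGIN{RS=":"} END{print NR}')

# ORS replaces the newline normally written after each record.
merged=$(printf '1\n2\n3\n' | awk 'BEGIN{ORS=" "} {print $0}')

echo "$joined"     # root---0
echo "$count"      # 3
echo "[$merged]"   # [1 2 3 ]
```

Note that with ORS=" " the output also ends with a space rather than a newline, which is why the last result is shown in brackets.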

2, Examples

awk -F: '{print $5}' /etc/passwd
#Use a colon as the custom separator and display the fifth column

awk -Fx '{print $1}' /etc/passwd
#Use x as the separator

awk '{print $1 $2}' A

awk -F: '{print $1" "$2}' zz
#A space is displayed. The space needs to be enclosed in double quotation marks: unquoted text is treated as a variable by default, so constants must be enclosed in double quotation marks
awk '{print $1,$2}' zz
#A comma has the same effect as a space

awk -F: '{print $1"\t"$2}' /etc/passwd
#Output with tab as separator

awk -F'[:/]' '{print $9}' zz
#Define multiple separators. Any one of them is treated as a separator

awk -F: '/root/{print $0}' pass.txt
#Print the entire line containing root

awk -F: '/root/{print $1}' pass.txt

awk -F: '/root/{print $1,$6}' pass.txt

awk '/root/' /etc/passwd

awk -F'[:/]' '{print NF}' zz
#Print the number of columns in each line

awk -F'[:/]' '{print NR}' zz
#Print the line number of each line

awk -F: '{print NR}' pass.txt
#Print the line numbers
awk -F: '{print NR,$0}' pass.txt
#Print the line number and the content of each line

awk 'NR==2' /etc/passwd
#Print the second line; without an action, the default is print

awk 'NR==2{print}' /etc/passwd
#Same effect as above
awk -F: 'NR==2{print $1}' /etc/passwd
#Print the first column of the second line
awk -F: '{print $NF}' /etc/passwd
#Print the last column

awk 'END{print NR}' /etc/passwd
#Print the total number of lines
awk 'END{print $0}' /etc/passwd
#Print the last line of the file
awk -F: '{print "The current line has "NF" columns"}' zz
#The current line has 7 columns

awk -F: '{print "Line "NR" has "NF" columns"}' /etc/passwd
#How many columns each line has
#Line 1 has 7 columns
#Line 2 has 7 columns
#Line 3 has 7 columns

Extended usage in production: NIC IP address and traffic
ifconfig ens33 | awk '/netmask/{print "The local IP address is "$2}'
#Prints the IP address of this machine

ifconfig ens33 | awk '/RX p/{print $5" bytes"}'
#Example output: 8341053 bytes (the line is found by string matching)

Available space on the root partition
df -h | awk 'NR==2{print $4}'

awk executes its main action line by line; BEGIN defines what to run before processing starts, and END what to run after it finishes. BEGIN is generally used for initialization and is executed only once, before any record is read. END is generally used for summaries and is executed only once, after all records have been read.
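
To make the order of BEGIN, the per-line action and END concrete, here is a minimal sketch that sums three invented numbers:

```shell
# BEGIN runs once before any input, the middle block once per line, END once at the end.
out=$(printf '10\n20\n30\n' | awk '
    BEGIN { print "start"; sum = 0 }
          { sum += $1 }
    END   { print "lines:", NR; print "sum:", sum }')
echo "$out"
# start
# lines: 3
# sum: 60
```

The explicit sum = 0 in BEGIN is optional, since an unset awk variable already counts as 0 in arithmetic; it is kept here only to show a typical initialization.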

awk operation:

awk 'BEGIN{x=10;print x}'
#Without quotation marks, awk treats x as a variable; no $ is needed
awk 'BEGIN{x=10;print x+1}'
#BEGIN runs before any file is processed, so it works even without a file name
awk 'BEGIN{x=10; x++;print x}'
awk 'BEGIN{print x+1}'
#A variable with no initial value is 0 when used as a number and empty when used as a string
awk 'BEGIN{print 2.5+3.5}'
#Decimals can also be calculated
awk 'BEGIN{print 2-1}'
awk 'BEGIN{print 3*4}'
awk 'BEGIN{print 3**2}'
awk 'BEGIN{print 2^3}'
#Both ^ and ** are power operators
awk 'BEGIN{print 1/2}'

Fuzzy matching:

Fuzzy matching: ~ means "contains" (matches), !~ means "does not contain"
awk -F: '$1~/root/' /etc/passwd
awk -F: '$1~/ro/' /etc/passwd
#Fuzzy matching: any first field containing ro matches
awk -F: '$7!~/nologin$/{print $1,$7}' /etc/passwd
#Print the first and seventh columns of lines whose seventh column does not end with nologin
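
The difference between ~ and !~ is easier to see on a few lines than on a full /etc/passwd. The sketch below uses three invented name:shell pairs in that spirit:

```shell
# Made-up sample in a simplified name:shell layout.
data='root:/bin/bash
proxy:/sbin/nologin
daemon:/bin/sh'

# ~ matches when the regular expression occurs anywhere in the field.
match=$(printf '%s\n' "$data" | awk -F: '$1 ~ /ro/ {print $1}')

# !~ matches when the regular expression does not occur in the field.
login=$(printf '%s\n' "$data" | awk -F: '$2 !~ /nologin$/ {print $1}')

echo "$match"   # root and proxy: both names contain "ro"
echo "$login"   # root and daemon: their shells do not end in nologin
```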

Comparison between numeric value and string:

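Comparison operators: ==   !=   <=   >=   <   >
awk 'NR==5{print}' /etc/passwd
#Print the fifth line
awk 'NR==5' /etc/passwd
awk 'NR<5' /etc/passwd
awk -F: '$3==0' /etc/passwd
#Print the lines whose third column is 0
awk -F: '$1==root' /etc/passwd
#Without quotation marks, root is treated as an (empty) variable, so no user name matches
awk -F: '$1=="root"' /etc/passwd
#A string constant must be enclosed in double quotation marks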
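
The effect of quoting in a comparison can be checked on two invented lines. This sketch contrasts the unquoted form, the quoted form and a numeric comparison:

```shell
# Made-up sample in a simplified name:password:uid layout.
data='root:x:0
bin:x:1'

# Unquoted root is an unset variable (empty string), so nothing matches.
unquoted=$(printf '%s\n' "$data" | awk -F: '$1==root {print $1}')

# A quoted "root" is a string constant and matches the first line.
quoted=$(printf '%s\n' "$data" | awk -F: '$1=="root" {print $1}')

# A field that looks like a number is compared numerically with 0.
zero=$(printf '%s\n' "$data" | awk -F: '$3==0 {print $1}')

echo "unquoted=[$unquoted] quoted=[$quoted] zero=[$zero]"
```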

Logical operators && and ||:

awk -F: '$3<10 || $3>=1000' /etc/passwd
awk -F: '$3>10 && $3<1000' /etc/passwd
awk -F: 'NR>4 && NR<10' /etc/passwd
#Print lines 5-9
#Print all integers between 1 and 200 that are divisible by 7 and contain the digit 7
seq 200 | awk '$1%7==0 && $1~/7/'
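
A compact sketch of all three logical operators on generated numbers; the ranges are chosen only to keep the results short:

```shell
# &&: divisible by 7 AND containing the digit 7 (numbers 1..100).
both=$(seq 100 | awk '$1%7==0 && $1~/7/' | tr '\n' ' ')

# ||: smaller than 3 OR larger than 8 (numbers 1..10).
either=$(seq 10 | awk '$1<3 || $1>8' | tr '\n' ' ')

# !: negates the condition in parentheses (numbers 1..5).
negated=$(seq 5 | awk '!($1<4)' | tr '\n' ' ')

echo "$both"      # 7 70 77
echo "$either"    # 1 2 9 10
echo "$negated"   # 4 5
```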

3, Advanced usage of awk

1. Define reference variables

awk -v b="$a" 'BEGIN{print b}'
#The shell variable a is assigned to the awk variable b, and then b is printed.
awk 'BEGIN{print "'$a'"}'
#To reference the shell variable directly, wrap it in double quotation marks and then single quotation marks
awk -vc=1 'BEGIN{print c}'
#Define a variable directly on the awk command line and reference it
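
A short sketch of passing a shell value into awk. The -v form is the one shown above; the ENVIRON array is an additional standard awk feature that is not part of the original examples and is included only as an alternative:

```shell
a="hello world"

# -v copies the shell value into the awk variable b before BEGIN runs.
via_v=$(awk -v b="$a" 'BEGIN{print b}')

# Alternative (not in the original text): export the value for this one
# command and read it from awk's ENVIRON array.
via_env=$(a="$a" awk 'BEGIN{print ENVIRON["a"]}')

echo "$via_v"
echo "$via_env"
```

Both forms keep the value intact when it contains spaces, which the quote-splicing form "'$a'" does not guarantee.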

Calling getline: it does not return the current line but reads the next line of input
df -h | awk 'BEGIN{getline}/root/{print $0}'
/dev/mapper/centos-root  50G  5.2G  45G  11% /

seq 10 | awk '{getline;print $0}'
#Show even rows
seq 10 | awk '{print $0;getline}'
#Show odd rows

2. Conditional statements

if statement:
awk's if statements come in single-branch, double-branch, and multi-branch forms
Single branch: if(){}
Double branch: if(){}else{}
Multiple branches: if(){}else if(){}else{}

awk -F: '{if($3<10){print $0}}' /etc/passwd
#Print the entire line if the third column is less than 10

awk -F: '{if($3<10){print $3}else{print $1}}' /etc/passwd
#If the third column is less than 10, print the third column, otherwise print the first column
awk also supports for loops, while loops, functions, arrays, and so on.
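
The multi-branch form is not demonstrated above, so here is a minimal sketch. The name:uid sample lines and the labels are invented, and the 1000 boundary for regular users is an assumption borrowed from the $3>=1000 examples in this article:

```shell
# Made-up sample in a simplified name:uid layout.
data='root:0
bin:1
user:1000'

out=$(printf '%s\n' "$data" | awk -F: '{
    if ($2 == 0)        { kind = "superuser" }
    else if ($2 < 1000) { kind = "system" }
    else                { kind = "regular" }   # assumption: UIDs from 1000 up are regular users
    print $1, kind
}')
echo "$out"
# root superuser
# bin system
# user regular
```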

4, Other examples

awk 'BEGIN{x=0};/\/bin\/bash$/{x++;print x,$0};END{print x}' /etc/passwd
#Count the number of lines ending in /bin/bash
#Equivalent to
grep -c "/bin/bash$" /etc/passwd

The BEGIN pattern means that the action specified in BEGIN is executed before the specified text is processed;
awk then processes the text and finally executes the action specified in the END pattern. Statements such as printing results are often placed in the END{} block.
awk -F ":" '!($3<200){print}' /etc/passwd
#Output the lines whose third field is not less than 200

awk 'BEGIN{FS=":"};{if($3>=1000){print}}' /etc/passwd
#First process the content of BEGIN, and then print the matching lines of the text

awk -F ":" '{max=($3>=$4)?$3:$4;{print max}}' /etc/passwd
#($3>=$4)?$3:$4 is the ternary operator: if the value of the third field is greater than or equal to the value of the fourth field,
#the value of the third field is assigned to max; otherwise the value of the fourth field is assigned to max
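
The ternary operator can be tried on a few invented number pairs instead of /etc/passwd; this sketch prints the larger value of each pair:

```shell
# Each made-up line holds two numbers separated by a colon.
out=$(printf '3:5\n9:2\n4:4\n' | awk -F: '{max = ($1 >= $2) ? $1 : $2; print max}')
echo "$out"
# 5
# 9
# 4
```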

awk -F ":" '{print NR,$0}' /etc/passwd
#Output the content and line number of each line. After each record is processed, NR is increased by 1
sed -n '=;p' /etc/passwd

awk -F ":" '$7~"bash"{print $1}' /etc/passwd
#With colon as the separator, output the first field of the lines whose seventh field contains bash
awk -F: '/bash/{print $1}' /etc/passwd

awk -F ":" '($1~"root") && (NF==7) {print $1,$2,$NF}' /etc/passwd
#For lines whose first field contains root and that have 7 fields, output the first, second, and last fields

awk -F ":" '($7!="/bin/bash") && ($7!="/sbin/nologin") {print}' /etc/passwd
#Output the lines whose seventh field is neither /bin/bash nor /sbin/nologin

awk -F: '($NF!="/bin/bash")&&($NF!="/sbin/nologin") {print NR,$0}' passwd

Calling shell commands through pipes and double quotation marks:
echo $PATH | awk 'BEGIN{RS=":"};END{print NR}'
#Count the number of text segments separated by colons. Statements such as printing results are often placed in the END{} block
echo $PATH | awk 'BEGIN{RS=":"};{print NR,$0};END{print NR}'

awk -F: '/bash$/{print | "wc -l"}' /etc/passwd
#Call the wc -l command to count the number of users using bash, which is equivalent to
grep -c "bash$" /etc/passwd
awk -F: '/bash$/{print}' passwd | wc -l

free -m | awk '/Mem:/ {print int($3/($3+$4)*100)"%"}'
#View the current memory usage percentage
free -m | awk '/Mem:/ {print $3/$2}'
free -m | awk '/Mem:/ {print $3/$2*100}'
free -m | awk '/Mem:/ {print int($3/$2*100)}'
free -m | awk '/Mem:/ {print int($3/$2*100)"%"}'
free -m | awk '/Mem:/ {print $3/$2*100}' | awk -F. '{print $1 "%"}'

top -b -n1 | grep Cpu | awk -F ',' '{print $4}' | awk '{print $1}'
#View the current CPU idle rate (-b -n1 means batch mode with a single iteration, so only one output is produced)

date -d "$(awk -F "." '{print $1}' /proc/uptime) second ago" +"%F %H:%M:%S"
#Display the time of the last system boot, comparable to what uptime reports;
#"second ago" means that many seconds before now, and +"%F %H:%M:%S" is equivalent to the
#time format +"%Y-%m-%d %H:%M:%S"


awk 'BEGIN {n=0 ; while ("w" | getline) n++ ;{print n-2}}'
#Call the w command and use it to count the number of online users

awk 'BEGIN { "hostname" | getline ; {print $0}}'
#Call hostname and output the current hostname

When getline has no redirection operator "<" or "|" on either side, it reads from the current input. In the examples below, awk first reads the first line, which is 1; getline then fetches the line after it, which is 2. Because getline updates internal variables such as NF, NR, FNR, and $0, the value of $0 is no longer 1 but 2, and that is what gets printed.

When getline has a redirection operator "<" or "|" on its left or right, it works on the redirected input. Because that input has just been opened and awk has not yet read any line from it, getline returns its first line rather than every other line.
seq 10 | awk ' {getline;print $0}'
seq 10 | awk ' {print $0;getline}'
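
Both behaviours can be checked side by side. In this sketch the command "seq 3" merely stands in for any command whose output awk should read through a pipe:

```shell
# Plain getline advances the main input, so lines are consumed in pairs.
even=$(seq 10 | awk '{getline; print $0}' | tr '\n' ' ')
odd=$(seq 10 | awk '{print $0; getline}' | tr '\n' ' ')

# "cmd" | getline reads from the command instead, starting at its first line.
first=$(awk 'BEGIN{"seq 3" | getline; print $0}')

# Looping until getline returns 0 counts every line the command produces.
lines=$(awk 'BEGIN{n=0; while (("seq 3" | getline line) > 0) n++; print n}')

echo "$even"    # 2 4 6 8 10
echo "$odd"     # 1 3 5 7 9
echo "$first"   # 1
echo "$lines"   # 3
```

Comparing the return value with > 0 in the loop is a cautious habit: getline returns -1 on error, which a bare while ("cmd" | getline) would treat as true.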

CPU utilization:
cpu_us=`top -b -n1 | grep Cpu | awk '{print $2}'`
cpu_sy=`top -b -n1 | grep Cpu | awk -F ',' '{print $2}' | awk '{print $1}'`
cpu_sum=`awk -v us="$cpu_us" -v sy="$cpu_sy" 'BEGIN{print us+sy}'`
echo $cpu_sum

echo "A B C D" | awk '{OFS="|";print $0;$1=$1;print $0}'
#$1=$1 triggers the rebuilding of $0: assigning to any field from $1 through $NF makes awk recompute $0 using OFS. This is usually needed when $0 must be output after OFS has been changed

echo "A B C D" | awk 'BEGIN{OFS="|"};{print $0;$1=$1;print $0}'
echo "A B C D" | awk 'BEGIN{OFS="|"};{print $0;$1=$1;print $1,$2}'
echo "A B C D" | awk 'BEGIN{OFS="|"};{$2=$2;print $1,$2}'
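
A minimal before-and-after sketch of the $1=$1 idiom, using the same sample string:

```shell
# Without a field assignment, $0 keeps its original spacing even though OFS is set.
before=$(echo "A B C D" | awk 'BEGIN{OFS="|"} {print $0}')

# Assigning a field to itself makes awk rebuild $0 with OFS between the fields.
after=$(echo "A B C D" | awk 'BEGIN{OFS="|"} {$1=$1; print $0}')

echo "$before"   # A B C D
echo "$after"    # A|B|C|D
```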

awk 'BEGIN{a[0]=10;a[1]=20;print a[1]}'
awk 'BEGIN{a[0]=10;a[1]=20;print a[0]}'
awk 'BEGIN{a["abc"]=10;a["xyz"]=20;print a["abc"]}'
awk 'BEGIN{a["abc"]=10;a["xyz"]=20;print a["xyz"]}'
awk 'BEGIN{a["abc"]="aabbcc";a["xyz"]="xxyyzz"; print a[ "xyz"]}'
awk 'BEGIN{a[0]=10;a[1]=20;a[2]=30;for(i in a){print i,a[i]}}'
PS1: the commands in BEGIN are executed only once
PS2: besides numbers, awk array subscripts can also be strings; string subscripts must be enclosed in double quotation marks
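
One detail worth keeping in mind with for (k in a) is that awk does not promise any particular traversal order. The sketch below therefore sorts the output and also shows a membership test with the in operator; the keys and values are arbitrary:

```shell
# Mixed numeric and string subscripts; output is sorted to get a stable order.
out=$(awk 'BEGIN{
    a["abc"]=10; a["xyz"]=20; a[0]=30
    for (k in a) print k, a[k]     # traversal order is unspecified
}' | LC_ALL=C sort)

# (key in array) tests for existence without creating the element.
has=$(awk 'BEGIN{a["abc"]=10; if ("abc" in a) {print "yes"} else {print "no"}}')
missing=$(awk 'BEGIN{a["abc"]=10; if ("nope" in a) {print "yes"} else {print "no"}}')

echo "$out"
echo "$has $missing"   # yes no
```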

Count TCP connections by state:
netstat -an | awk '/^tcp/ {print $NF}' | sort | uniq -c
netstat -nat | awk '/^tcp/{arr[$6]++}END{for(state in arr){print arr[state] ": " state}}'

Use awk to count how many times each client IP appears in the httpd access log:
awk '{ip[$1]++}END{for(i in ip){print ip[i],i}}' /var/log/httpd/access_log | sort -r

Remarks: this defines an array named ip whose subscript is column 1 of the log file (that is, the client's IP address). The ++ increments that client's counter by 1 each time its IP appears. The instructions in END are executed after the file has been read, and a for loop over the subscripts of the array ip outputs all the statistics.
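
The counting idiom can be tried without a web server. The three log lines below are invented and only mimic the shape of an access log (client address in column 1); sort -nr is used here so that the counts are ordered numerically:

```shell
# Made-up log lines; only the first column matters for the count.
log='10.0.0.1 - - "GET /"
10.0.0.2 - - "GET /a"
10.0.0.1 - - "GET /b"'

# ip[$1]++ creates the counter on first sight and increments it afterwards.
out=$(printf '%s\n' "$log" | awk '{ip[$1]++} END{for (i in ip) print ip[i], i}' | sort -nr)
echo "$out"
# 2 10.0.0.1
# 1 10.0.0.2
```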

awk '/Failed password/{print $0}' /var/log/secure
awk '/Failed password/{print $11}' /var/log/secure
awk '/Failed/{ip[$11]++}END{for(i in ip){print i", "ip[i]}}' /var/log/secure
awk '/Failed/{ip[$11]++}END{for(i in ip){print i","ip[i]}}' /var/log/secure

awk '/Failed password/{ip[$11]++}END{for(i in ip){print i", "ip[i]}}' /var/log/secure

#Warn about every IP with at least 3 failed logins
x=`awk '/Failed password/{ip[$11]++}END{for(i in ip){print i","ip[i]}}' /var/log/secure`
for j in $x
do
ip=`echo $j | awk -F "," '{print $1}'`
num=`echo $j | awk -F "," '{print $2}'`
if [ $num -ge 3 ];then
echo "Warning! $ip failed to log in to this machine $num times, please deal with it quickly!"
fi
done

Keywords: Linux Operation & Maintenance server

Added by ploppy on Mon, 21 Feb 2022 15:49:31 +0200