Awk from beginner to mastery series - awk quick start

Brief introduction

Awk is an excellent text processing tool and one of the most powerful data processing engines in Linux and Unix environments. It is both a programming language and a data manipulation language, and its name comes from the surname initials of its creators: Alfred Aho, Peter Weinberger and Brian Kernighan. How much you can do with it depends mostly on how well you know it. nawk and gawk are later, improved versions of the original awk, and gawk is the default on many Linux distributions. You can check which implementation your awk command points to with ls -l /bin/awk (on some systems the path is /usr/bin/awk).

A simple example

#Create a file
vim awk.txt
Beth 4.00 0
Dan  3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
 Fields: name, hourly wage, hours worked

#Print the name and pay of each employee who worked
awk '$3>0 {print $1,$2*$3}' awk.txt
$3>0 is the pattern
print $1,$2*$3 is the action
#Find the employees who did not work at all
awk '$3==0 {print $1}' awk.txt
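
For reference, the two commands can be reproduced as follows. This is a minimal sketch that recreates the sample data in a temporary file; the variable names data, paid and idle are chosen here only for illustration, and the outputs in the comments follow from the six sample records.

```shell
# Recreate the sample data in a temporary file (the path is arbitrary).
data=$(mktemp)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$data"

# Pattern $3>0 selects employees who worked; the action prints name and pay.
paid=$(awk '$3>0 {print $1,$2*$3}' "$data")
echo "$paid"
# Kathy 40
# Mark 100
# Mary 121
# Susie 76.5

# Pattern $3==0 selects employees with zero hours.
idle=$(awk '$3==0 {print $1}' "$data")
echo "$idle"
# Beth
# Dan

rm -f "$data"
```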


AWK program structure

  • AWK program execution process
  awk scans its input one line at a time and checks each line against the pattern. When a line matches, awk executes the action. This continues until all input has been read.
  • Pattern-action combinations
(1) Both pattern and action are present: awk '$3==0 {print $1}' awk.txt
(2) Pattern only: awk '$3==0' awk.txt (the default action prints the matching line)
(3) Action only: awk '{print $1}' awk.txt (the action runs for every line)
(4) Neither is present: there is nothing to run
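
The defaults in cases (2) and (3) can be checked with a small sketch like the one below. It assumes the same six sample records, written to a temporary file; the variable names are illustrative only.

```shell
data=$(mktemp)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$data"

# Case (2): pattern only. The default action prints the whole line,
# so this should behave like '$3==0 {print $0}'.
implicit=$(awk '$3==0' "$data")
explicit=$(awk '$3==0 {print $0}' "$data")
[ "$implicit" = "$explicit" ] && echo "pattern-only form prints the matching lines"

# Case (3): action only. With no pattern, the action runs for every line.
names=$(awk '{print $1}' "$data")
echo "$names"    # one name per input line

rm -f "$data"
```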

How to run an AWK command

#Input files given on the command line
awk '$3==0 {print $1}' awk.txt  #one input file
awk '$3==0 {print $1}' awk.txt awk02.txt  #two input files
#No input file: awk reads standard input
awk '$3==0 {print $1}'  #no file operand, so awk waits for input typed at the terminal
#Put the awk program into a file
 cat program
 $3==0 {print $1}
#Run the program file
 awk -f program awk.txt
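
The three ways of supplying input can be tried side by side. The sketch below is only an illustration: it works in a temporary directory, and because awk02.txt is not shown in this article, it passes the same sample file twice to stand in for two input files.

```shell
work=$(mktemp -d)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$work/awk.txt"

# Store the program text in a file and run it with -f.
printf '%s\n' '$3==0 {print $1}' > "$work/program"
from_file=$(awk -f "$work/program" "$work/awk.txt")

# With no file operand awk reads standard input; here a pipe supplies it.
from_stdin=$(cat "$work/awk.txt" | awk '$3==0 {print $1}')

# Several file operands are read one after another as a single input.
from_two=$(awk '$3==0 {print $1}' "$work/awk.txt" "$work/awk.txt")

echo "$from_file"     # Beth, Dan
echo "$from_stdin"    # Beth, Dan
echo "$from_two"      # Beth, Dan, Beth, Dan

rm -rf "$work"
```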

Output format of AWK

  • Data types
  • Numbers and strings
  • Lines and fields
awk reads its input one line at a time and splits each line into fields (by default, a field is a sequence of non-whitespace characters).
The first field of the current input line is called $1, the second $2, and so on. The whole line is $0. The number of fields may differ from line to line.
  • Examples
#Print each line
awk '{print}' awk.txt    #or: awk '{print $0}' awk.txt
#Print some fields
awk '{print $1,$3}' awk.txt
#Print the number of fields per line (built-in variable NF)
awk '{print NF}' awk.txt
#Print the first and last fields
awk '{print $1,$NF}' awk.txt
#Calculation and printing
awk '{print $1,$2 * $3}' awk.txt
#Print line number (NR)
awk '{print NR,$0}' awk.txt
#Combine strings and fields
awk '{print $1,"income today is",$2 * $3}' awk.txt
#Formatted output
awk '{ printf("%s income today is $%.2f\n",$1,$2*$3) }' awk.txt
#Fixed-width output
awk '{ printf("%-8s income today is $%6.2f\n",$1,$2*$3) }' awk.txt
#Sorted output (numeric sort on the first comma-separated field)
awk '{ printf("%6.2f,%-8s income today is $%6.2f\n",$2*$3,$1,$2*$3) }' awk.txt | sort -t, -k1,1n
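
To see the default field splitting and the fixed-width printf conversions in isolation, here is a small sketch fed from printf instead of awk.txt. The input strings and variable names are made up for illustration.

```shell
# Leading, trailing and repeated blanks (spaces or tabs) do not create
# empty fields under the default field separator.
split=$(printf '  alpha   beta\tgamma  \n' | awk '{print NF, $1, $NF}')
echo "$split"     # 3 alpha gamma

# NF can differ from line to line; an empty line has zero fields.
counts=$(printf 'a b c\n\nd e\n' | awk '{print NR ": " NF}')
echo "$counts"
# 1: 3
# 2: 0
# 3: 2

# %-8s left-justifies in 8 columns, %6.2f right-justifies in 6 columns
# with 2 decimals; the | marks make the padding visible.
fixed=$(echo 'Mary 5.50 22' | awk '{ printf("%-8s|%6.2f|\n", $1, $2*$3) }')
echo "$fixed"     # Mary    |121.00|
```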

AWK pattern matching

  • Single pattern
#Records with hourly wage greater than 5
awk '$2>5 {print $0}' awk.txt
#Employees paid more than 50
awk '$2*$3>50 {print $1,$2*$3}' awk.txt
#Find the record whose name is Mark
awk '$1=="Mark" {print $0}' awk.txt
#Use a regular expression to match lines that contain Mar
awk '/Mar/ {print $0}' awk.txt
  • Pattern combination
#Print lines where $2 is at least 4, or $3 is at least 20
awk '$2>=4||$3>=20 {print $0}' awk.txt
#The same selection written with negation (De Morgan's law)
awk '!($2<4&&$3<20) {print $0}' awk.txt
#Print lines where $2 is at least 4 and $3 is at least 20
awk '$2>=4 && $3>=20 {print $0}' awk.txt
  • BEGIN and END
The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last input file has been processed.
#General form: awk 'BEGIN {actions} END {actions}'
awk 'BEGIN {print "NAME  RATE   HOURS"} {print} END{print "END"}' awk.txt
awk 'BEGIN {print "NAME  RATE   HOURS";print "------"} {print} END{print "------";print "END"}' awk.txt
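
The following sketch checks two points from this section against the sample data: that the || form and the negated && form select the same lines, and the order in which BEGIN, the main rule and END run. It assumes the six sample records in a temporary file; the variable names are illustrative.

```shell
data=$(mktemp)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$data"

# !(A && B) is equivalent to (!A || !B), so both patterns pick the same lines.
with_or=$(awk '$2>=4 || $3>=20' "$data")
with_not=$(awk '!($2<4 && $3<20)' "$data")
[ "$with_or" = "$with_not" ] && echo "same selection"

# BEGIN runs once before any input, END once after all input is read.
report=$(awk 'BEGIN {print "NAME"} /Mar/ {print $1} END {print NR, "lines read"}' "$data")
echo "$report"
# NAME
# Mark
# Mary
# 6 lines read

rm -f "$data"
```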

Calculating with AWK

  • Calculate sum

# Count the employees who worked more than 15 hours
awk '$3 > 15 {emp = emp + 1} END {print emp, "employees worked more than 15 hours"}' awk.txt

  • Calculate average

# Calculate the average pay of the employees
awk '{pay=pay+$2*$3} END{print NR, "employees in total"; print "total pay", pay; print "average pay", pay / NR}' awk.txt

  • Find maximum

# Find the employee with the highest hourly wage
awk '$2 > maxrate {maxrate = $2; maxemp = $1} END {print "the employee with the highest hourly wage is:", maxemp, "wage is:", maxrate}' awk.txt

  • Print last line

 awk '{last=$0} END{print last}' awk.txt
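
The count, sum, average and maximum can also be gathered in one pass. This is a sketch that simply combines the rules shown in this section into a single program, assuming the six sample records; the message wording is illustrative.

```shell
data=$(mktemp)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$data"

summary=$(awk '
    $3 > 15      { emp = emp + 1 }               # count long shifts
                 { pay = pay + $2 * $3 }         # running total of pay
    $2 > maxrate { maxrate = $2; maxemp = $1 }   # remember the highest rate
    END {
        print emp, "employees worked more than 15 hours"
        print "total pay", pay
        print "average pay", pay / NR
        print "highest rate:", maxemp, maxrate
    }' "$data")
echo "$summary"
# 3 employees worked more than 15 hours
# total pay 337.5
# average pay 56.25
# highest rate: Mary 5.50

rm -f "$data"
```

Variables such as emp and pay need no initialization: an unset awk variable acts as 0 in arithmetic and as an empty string in string context.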

String concatenation

#Join all the names, separated by spaces
awk '{names=names $1 " "} END{print names}' awk.txt

 

Built-in functions

#length returns the length of a string
#Print each name with its length
awk '{print $1,length($1)}' awk.txt

#Count the lines, words (fields) and characters of the text
awk '{nc=nc+length($0)+1;nw=nw+NF} END{print NR,"lines,",nw,"words,",nc,"characters"}' awk.txt
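
The counting program can be cross-checked against wc. The sketch below assumes plain ASCII data, so that characters and bytes coincide (in a multibyte locale, length may count characters while the third column of wc counts bytes). The sample file here uses single spaces between fields.

```shell
data=$(mktemp)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$data"

# length($0) excludes the newline, hence the +1 per line.
awk_counts=$(awk '{nc=nc+length($0)+1; nw=nw+NF} END{print NR, nw, nc}' "$data")

# wc reads standard input here so that it prints no file name; the
# unquoted command substitution collapses its column padding.
wc_counts=$(echo $(wc < "$data"))

echo "awk: $awk_counts"    # awk: 6 18 77
echo "wc:  $wc_counts"

rm -f "$data"
```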

Control flow statements

  • if-else statement

# find total and average compensation for employees who earn more than $6.00 per hour
awk '$2>6 {n=n+1;pay=pay+$2*$3} END{if(n>0) print n,"employees, total pay is",pay,"average pay is",pay/n;else print "no employees are paid more than $6/hour"}' awk.txt

  • while statement

# Calculate the sum of 1 to 100
awk 'BEGIN{ test=100; total=0; i=1; while(i<=test) { total+=i; i++; } print total; }'
# Output: 5050

#Equivalent shell script
#!/bin/bash
total=0
i=0
while [ $i -le 100 ]
do
let total+=$i
let i++
done
echo $total

  • for statement

# calculate the sum of 1 to 100
awk 'BEGIN{ total=0; for(i=0;i<=100;i++) { total+=i; } print total; }'

  • Array

# Print the lines in reverse order
awk '{line[NR] = $0} END {i=NR; while (i>0){ print line[i];i=i-1}}' awk.txt
awk '{line[NR] = $0} END{for(i=NR;i>0;i--){print line[i]}}' awk.txt
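
With the sample data nobody earns more than $6.00 per hour, so the if-else example always takes the else branch. To exercise both branches, the sketch below passes the threshold in from the shell with the standard -v option. The shell function report and the awk variable limit are names invented for this illustration, not part of the original examples.

```shell
data=$(mktemp)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > "$data"

# report is a helper defined only for this sketch; -v assigns the awk
# variable limit before the program starts.
report() {
    awk -v limit="$1" '
        $2 > limit { n = n + 1; pay = pay + $2 * $3 }
        END {
            if (n > 0)
                print n, "employees, total pay is", pay, "average pay is", pay / n
            else
                print "no employees are paid more than", limit
        }' "$data"
}

high=$(report 6)    # nobody earns more than 6, so the else branch runs
mid=$(report 5)     # only Mary (5.50) qualifies
echo "$high"        # no employees are paid more than 6
echo "$mid"         # 1 employees, total pay is 121 average pay is 121

rm -f "$data"
```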

AWK practical examples

#Print the total number of input lines
awk 'END{print NR}' awk.txt
#Print line 2
awk 'NR==2 {print $0}' awk.txt
#Print the last field of each line
awk '{print $NF}' awk.txt
#Print the last field of the last line
awk '{field=$NF} END{print field}' awk.txt
#Print input lines with more than 2 fields
awk 'NF>2 {print $0}' awk.txt
#Print the input lines whose last field is greater than 4
awk '$NF>4{print $0}' awk.txt

Keywords: Linux Unix shell debian

Added by Grande on Mon, 14 Feb 2022 08:58:59 +0200