# Course objectives
- Become familiar with awk's command-line mode and its basic syntax
- Become familiar with awk's built-in variables
- Become familiar with awk's most common output function, print
- Be able to match regular expressions in awk and print the matching lines
# 1. awk introduction
## 1. awk overview
- awk is a programming language mainly used to process text and data on Linux/UNIX, and it is also a standard command-line tool there. Its input can come from standard input, one or more files, or the output of other commands.
- How awk processes text and data: it scans the input line by line, by default from the first line to the last. It finds the lines that match a given pattern and performs the requested actions on them.
- The name awk is formed from the initials of its authors' last names: Alfred Aho, Peter Weinberger and Brian Kernighan.
- gawk is the GNU implementation of awk. It adds a number of extensions to the original Bell Labs awk.
- The rest of this tutorial uses GNU gawk. On many Linux distributions awk is simply a link to gawk, so the text just says awk throughout.
## 2. What can awk do?
- awk processes files and data. It is not only a UNIX tool but also a programming language
- It can be used for statistics, such as counting website visits or visits per IP address
- It supports conditional statements as well as for and while loops
# 2. awk usage
## 1. Command-line mode
### I. Syntax
```
awk [options] 'commands' filename
```
Special note: the shell does not expand variables inside single quotes. To reference a shell variable, the command part (or that part of it) must be in double quotes.
### II. Common options
- -F defines the field separator. The default separator is whitespace (spaces and tabs)
- -v defines a variable and assigns it a value
### III. The command part
- Regular expressions and address ranges:
  ```
  '/root/{awk statement}'             # sed: '/root/p'
  'NR==1,NR==5{awk statement}'        # sed: '1,5p'
  '/^root/,/^ftp/{awk statement}'     # sed: '/^root/,/^ftp/p'
  ```
- Statement blocks: {awk statement1; awk statement2; ...}
  ```
  '{print $0;print $1}'               # sed: 'p'
  'NR==5{print $0}'                   # sed: '5p'
  ```
  Note: multiple awk statements are separated by semicolons.
- BEGIN...END...
  ```
  'BEGIN{awk statement};{processing};END{awk statement}'
  'BEGIN{awk statement};{processing}'
  '{processing};END{awk statement}'
  ```
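The three kinds of command part can be compared side by side on a small input. The sketch below uses a hypothetical four-line sample built with printf instead of a real file. The helper name `sample` is illustrative only.

```shell
#!/bin/sh
# Hypothetical sample: four /etc/passwd-style lines (name:x:uid).
sample() {
    printf 'root:x:0\nbin:x:1\nftp:x:14\nlp:x:4\n'
}

# 1) Regex pattern: the action runs only on lines containing "root".
sample | awk -F: '/root/{print $1}'

# 2) Address range: from the first line starting with "bin"
#    through the next line starting with "ftp".
sample | awk -F: '/^bin/,/^ftp/{print $1}'

# 3) BEGIN runs once before any input, END once after the last line.
sample | awk -F: 'BEGIN{print "start"};{n++};END{print "lines:", n}'
```

The first command prints `root`, the second prints `bin` and `ftp`, and the third prints `start` followed by `lines: 4`.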
## 2. Script mode
### I. Writing a script
```
#!/bin/awk -f
# The first line is the "magic" shebang line.
# Below it goes the list of awk commands that would otherwise sit inside the quotes.
# Do not wrap the commands in quotes; separate multiple commands with semicolons or newlines.
BEGIN{FS=":"}
NR==1,NR==3{print $1"\t"$NF}
...
```
### II. Running a script
Method 1: awk [options] -f awk_script file_to_process
```
awk -f awk.sh filename
sed -f sed.sh -i filename     # the sed equivalent
```
Method 2: ./awk_script (or its absolute path) file_to_process. The script needs execute permission (chmod +x).
```
./awk.sh filename
./sed.sh filename             # the sed equivalent
```
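Both methods can be tried end to end in a throwaway directory. This is a minimal sketch that assumes a hypothetical four-line `1.txt`. The shebang is written from whatever path `command -v awk` reports rather than a hard-coded `/bin/awk`, because awk lives at `/usr/bin/awk` on some systems.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical input file with /etc/passwd-style lines.
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\nadm:x:3:4:adm:/var/adm:/sbin/nologin\n' > 1.txt

# Build the script: shebang line first, then the bare awk commands.
printf '#!%s -f\n' "$(command -v awk)" > awk.sh
cat >> awk.sh <<'EOF'
BEGIN{FS=":"}
NR==1,NR==3{print $1"\t"$NF}
EOF

# Method 1: hand the script to awk with -f.
awk -f awk.sh 1.txt

# Method 2: make it executable and run it directly.
chmod +x awk.sh
./awk.sh 1.txt
```

Both runs print the user name and login shell of lines 1 to 3, separated by a tab.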
# 3. awk built-in variables
Variable | Description | Example / remarks |
---|---|---|
$0 | The entire current record (the whole line) | |
$1,$2,$3...$n | The fields of the current line, split by the field separator | `awk -F: '{print $1,$3}'` |
NF | Number of fields (columns) in the current record | `awk -F: '{print NF}'` |
$NF | The last field | `$(NF-1)` is the second-to-last field |
FNR/NR | Line (record) number: NR counts across all input files, FNR restarts at 1 for each file | |
FS | Input field separator, default whitespace | `'BEGIN{FS=":"};{print $1,$3}'` |
OFS | Output field separator, default space | `'BEGIN{OFS="\t"};{print $1,$3}'` |
RS | Input record separator, default newline | `'BEGIN{RS="\t"};{print $0}'` |
ORS | Output record separator, default newline | `'BEGIN{ORS="\n\n"};{print $1,$3}'` |
FILENAME | Name of the current input file | |
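The difference between NR and FNR only shows up with more than one input file. The sketch below uses two tiny hypothetical files, `f1` and `f2`, in a temporary directory and prints FILENAME, NR and FNR for every line.

```shell
#!/bin/sh
# Two tiny hypothetical input files in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\n' > f1
printf 'c\n' > f2

# NR keeps counting across files; FNR restarts at 1 for every new file.
awk '{print FILENAME, NR, FNR, $0}' f1 f2
```

This prints `f1 1 1 a`, `f1 2 2 b` and `f2 3 1 c`. A common consequence is that the condition `NR==FNR` is true only while awk is reading the first file.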
## 1. Examples of common built-in variables
```
# awk -F: '{print $1,$(NF-1)}' 1.txt
# awk -F: '{print $1,$(NF-1),$NF,NF}' 1.txt
# awk '/root/{print $0}' 1.txt
# awk '/root/' 1.txt
# awk -F: '/root/{print $1,$NF}' 1.txt
root /bin/bash
# awk -F: '/root/{print $0}' 1.txt
root:x:0:0:root:/root:/bin/bash
# awk 'NR==1,NR==5' 1.txt
# awk 'NR==1,NR==5{print $0}' 1.txt
# awk 'NR==1,NR==5;/^root/{print $0}' 1.txt
root:x:0:0:root:/root:/bin/bash
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
```
In the last command, line 1 matches both patterns (NR==1,NR==5 and /^root/), so it is printed twice.
## 2. Examples of the separator variables
FS and OFS:
```
# awk 'BEGIN{FS=":"};/^root/,/^lp/{print $1,$NF}' 1.txt
# awk -F: 'BEGIN{OFS="\t\t"};/^root/,/^lp/{print $1,$NF}' 1.txt
root            /bin/bash
bin             /sbin/nologin
daemon          /sbin/nologin
adm             /sbin/nologin
lp              /sbin/nologin
# awk -F: 'BEGIN{OFS="@@@"};/^root/,/^lp/{print $1,$NF}' 1.txt
root@@@/bin/bash
bin@@@/sbin/nologin
daemon@@@/sbin/nologin
adm@@@/sbin/nologin
lp@@@/sbin/nologin
```
RS and ORS: first edit the source file and add a tab plus some text to the end of each of the first 2 lines:
```
# vim 1.txt
root:x:0:0:root:/root:/bin/bash hello world
bin:x:1:1:bin:/bin:/sbin/nologin test1 test2
# awk 'BEGIN{RS="\t"};{print $0}' 1.txt
# awk 'BEGIN{ORS="\t"};{print $0}' 1.txt
```
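The effect of RS and ORS is easier to see on input built with printf than on an edited file. This is a minimal sketch with made-up input. The input for the RS example has no trailing newline, so the last record is just `three`.

```shell
#!/bin/sh
# RS="\t": each tab ends a record, so one input line becomes three records.
printf 'one\ttwo\tthree' | awk 'BEGIN{RS="\t"};{print NR": "$0}'

# ORS=",": print ends every record with a comma instead of a newline.
printf 'a\nb\nc\n' | awk 'BEGIN{ORS=","};{print $0}'
echo   # finish the line, since the last ORS is a comma
```

The first command prints `1: one`, `2: two` and `3: three` on separate lines. The second prints `a,b,c,`.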
# 4. How awk works
```
awk -F: '{print $1,$3}' /etc/passwd
```
- awk reads one line of input and assigns it to the built-in variable $0. Each line is also called a record and ends with the record separator RS (a newline by default).
- Each line is split into fields by the separator (here `:`; by default spaces or tabs), and each field is stored in a numbered variable, starting with $1.
  Q: How does awk know to split fields on spaces?
  A: The built-in variable FS determines the field separator. Initially FS is set to a single space, which awk treats as "any run of spaces or tabs".
- awk prints fields with print. The printed fields are separated by a space because there is a comma between $1 and $3. The comma is special: it is replaced by the value of another built-in variable, the output field separator OFS, which defaults to a space.
- After awk finishes one line, it reads the next line from the file into $0, overwriting the previous content. It then splits the new string into fields and processes it. This continues until all lines have been processed.
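Two details from the steps above can be checked directly: how the default FS splits a line, and what the comma in print does. The sketch below uses made-up input lines.

```shell
#!/bin/sh
# Default FS (a single space) means "any run of spaces or tabs", and leading
# blanks are ignored, so this line has exactly three fields.
printf '  alpha   beta\tgamma\n' | awk '{print NF; print $2}'

# With -F:, the comma in print inserts OFS (a space by default);
# writing $1 $3 without a comma concatenates the fields directly.
printf 'root:x:0\n' | awk -F: '{print $1,$3; print $1 $3}'
```

The first command prints `3` and `beta`. The second prints `root 0` and then `root0`.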
# 5. Advanced awk usage
## 1. Formatted output: print and printf
print works like echo "hello world":
```
# date |awk '{print "Month: "$2 "\nYear: "$NF}'
# awk -F: '{print "username is: " $1 "\t uid is: "$3}' /etc/passwd
```
printf works like echo -n (no automatic newline):
```
# awk -F: '{printf "%-15s %-10s %-15s\n", $1,$2,$3}' /etc/passwd
# awk -F: '{printf "|%15s| %10s| %15s|\n", $1,$2,$3}' /etc/passwd
# awk -F: '{printf "|%-15s| %-10s| %-15s|\n", $1,$2,$3}' /etc/passwd
# awk 'BEGIN{FS=":"};{printf "%-15s %-15s %-15s\n",$1,$6,$NF}' a.txt
```
- %s: string, e.g. %-20s
- %d: integer value
- 15: a field width of 15 characters
- -: left-aligned; the default is right-aligned
- printf does not add a newline at the end of the line by default, so append \n
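The width and alignment flags can be seen clearly by wrapping each column in vertical bars. This is a small sketch on made-up passwd-style lines. The bars are only there to make the padding visible.

```shell
#!/bin/sh
# %-10s: left-aligned string in a 10-character column;
# %5s:   right-aligned (the default) in a 5-character column;
# %d:    integer. The trailing \n is needed because printf adds no newline.
printf 'root:x:0\nbin:x:1\n' |
  awk -F: '{printf "|%-10s|%5s|%d|\n", $1, $2, $3}'
```

This prints `|root      |    x|0|` and `|bin       |    x|1|`.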
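## 2. Defining variables in awk
```
# awk -v NUM=3 -F: '{ print $NUM }' /etc/passwd
# awk -v NUM=3 -F: '{ print NUM }' /etc/passwd
# awk -v num=1 'BEGIN{print num}'
1
# awk -v num=1 'BEGIN{print $num}'
```
Be careful: variables defined in awk are referenced without a $. $NUM means "the field whose number is NUM" (here $3), while NUM is the value 3 itself. The last command prints an empty line, because no record has been read yet in BEGIN.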
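Here is a minimal sketch of the $NUM versus NUM difference on one made-up line. It also shows -v used to pass in a shell variable without juggling quotes. The variable names `limit` and `min` are illustrative only.

```shell
#!/bin/sh
# NUM is an awk variable holding 3; $NUM is "field number NUM", i.e. $3.
printf 'root:x:0:0\n' | awk -v NUM=3 -F: '{print NUM; print $NUM}'

# A shell variable can be handed to awk with -v (hypothetical names).
limit=500
printf 'stu1:x:500\nbin:x:1\n' | awk -v min="$limit" -F: '$3 >= min {print $1}'
```

The first command prints `3` and then `0` (the third field). The second prints only `stu1`.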
## 3. BEGIN ... END in awk
- ① BEGIN: executed once before any input is read
- ② END: executed once after all input files have been processed
- ③ Usage: 'BEGIN{before processing};{processing};END{after processing}'
### I. Example 1
Print the last and second-to-last columns (login shell and home directory)
```
awk -F: 'BEGIN{ print "Login_shell\t\tLogin_home\n*******************"};{print $NF"\t\t"$(NF-1)};END{print "************************"}' 1.txt
awk 'BEGIN{ FS=":";print "Login_shell\tLogin_home\n*******************"};{print $NF"\t"$(NF-1)};END{print "************************"}' 1.txt
Login_shell             Login_home
*******************
/bin/bash               /root
/sbin/nologin           /bin
/sbin/nologin           /sbin
/sbin/nologin           /var/adm
/sbin/nologin           /var/spool/lpd
/bin/bash               /home/redhat
/bin/bash               /home/user01
/sbin/nologin           /var/named
/bin/bash               /home/u01
/bin/bash               /home/YUNWEI
************************
```
### II. Example 2
Print the user name, home directory and login shell from /etc/passwd in this layout:
```
u_name          h_dir           shell
***************************
...
***************************
```
```
awk -F: 'BEGIN{OFS="\t\t";print"u_name\t\th_dir\t\tshell\n***************************"};{printf "%-20s %-20s %-20s\n",$1,$(NF-1),$NF};END{print "****************************"}' /etc/passwd
# awk -F: 'BEGIN{print "u_name\t\th_dir\t\tshell" RS "*****************"} {printf "%-15s %-20s %-20s\n",$1,$(NF-1),$NF}END{print "***************************"}' /etc/passwd
```
Formatted output: print corresponds to echo, and printf corresponds to echo -n, e.g. {printf "%-15s %-20s %-20s\n",$1,$(NF-1),$NF}
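The header/rows/footer pattern from these examples can be wrapped in a small reusable function. This is a sketch under stated assumptions: `report` is a hypothetical helper name, the input is a two-line sample standing in for /etc/passwd, and the footer counts rows instead of printing stars.

```shell
#!/bin/sh
# report: header in BEGIN, one formatted row per input line,
# and a summary line in END. Reads passwd-style lines on stdin.
report() {
    awk -F: 'BEGIN{printf "%-8s %-16s %s\n", "u_name", "h_dir", "shell"}
             {printf "%-8s %-16s %s\n", $1, $(NF-1), $NF; n++}
             END{print n " users listed"}'
}

# Hypothetical sample standing in for /etc/passwd.
printf 'root:x:0:0:root:/root:/bin/bash\nlp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\n' | report
```

The output is a header line, one aligned row per user, and `2 users listed`. Running `report < /etc/passwd` would apply the same layout to the real file.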
## 4. Combining awk with regular expressions
Operator | Meaning |
---|---|
== | Equal to |
!= | Not equal to |
> | Greater than |
< | Less than |
>= | Greater than or equal to |
<= | Less than or equal to |
~ | Matches (regular expression) |
!~ | Does not match |
! | Logical NOT |
&& | Logical AND |
\|\| | Logical OR |
### Examples
Match from line 1 to the first line starting with lp:
```
awk -F: 'NR==1,/^lp/{print $0 }' passwd
```
From line 1 to line 5:
```
awk -F: 'NR==1,NR==5{print $0 }' passwd
```
From the first line starting with lp through line 10:
```
awk -F: '/^lp/,NR==10{print $0 }' passwd
```
From the line starting with root to the first line starting with lp:
```
awk -F: '/^root/,/^lp/{print $0}' passwd
```
Print the lines starting with root or with lp:
```
awk -F: '/^root/ || /^lp/{print $0}' passwd
awk -F: '/^root/;/^lp/{print $0}' passwd
```
Display lines 5-10 (the second form excludes both ends and prints lines 6-9):
```
awk -F':' 'NR>=5 && NR<=10 {print $0}' /etc/passwd
awk -F: 'NR<10 && NR>5 {print $0}' passwd
```
Print the lines among lines 30-39 that end with bash:
```
[root@MissHou shell06]# awk 'NR>=30 && NR<=39 && $0 ~ /bash$/{print $0}' passwd
stu1:x:500:500::/home/stu1:/bin/bash
yunwei:x:501:501::/home/yunwei:/bin/bash
user01:x:502:502::/home/user01:/bin/bash
user02:x:503:503::/home/user02:/bin/bash
user03:x:504:504::/home/user03:/bin/bash
[root@MissHou shell06]# awk 'NR>=3 && NR<=8 && /bash$/' 1.txt
stu7:x:1007:1007::/rhome/stu7:/bin/bash
stu8:x:1008:1008::/rhome/stu8:/bin/bash
stu9:x:1009:1009::/rhome/stu9:/bin/bash
```
Print the lines among lines 1-5 that start with root (and, with !~, those that do not):
```
[root@MissHou shell06]# awk 'NR>=1 && NR<=5 && $0 ~ /^root/{print $0}' 1.txt
root:x:0:0:root:/root:/bin/bash
[root@MissHou shell06]# awk 'NR>=1 && NR<=5 && $0 !~ /^root/{print $0}' 1.txt
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
```
Understand the meaning of ; versus ||:
```
[root@MissHou shell06]# awk 'NR>=3 && NR<=8 || /bash$/' 1.txt
[root@MissHou shell06]# awk 'NR>=3 && NR<=8;/bash$/' 1.txt
```
With ||, the two conditions form one pattern, so each line is printed at most once. With ;, they are two separate patterns, so a line that satisfies both is printed twice.

Print the IP address:
```
# ifconfig eth0|awk 'NR>1 {print $2}'|awk -F':' 'NR<2 {print $2}'
# ifconfig eth0|grep Bcast|awk -F':' '{print $2}'|awk '{print $1}'
# ifconfig eth0|grep Bcast|awk '{print $2}'|awk -F: '{print $2}'
# ifconfig eth0|awk NR==2|awk -F '[ :]+' '{print $4RS$6RS$8}'
# ifconfig eth0|awk -F"[ :]+" '/inet addr:/{print $4}'
```
These commands assume the older net-tools ifconfig output format (inet addr:... Bcast:... Mask:...). Newer versions print inet 192.168.x.x netmask ... instead, so the field numbers differ.
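The field-level `~`/`!~` tests and the `||` versus `;` difference can be checked on a three-line sample. The sketch below uses hypothetical passwd-style lines. `sample` is an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical passwd-style sample.
sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\nstu1:x:500:500::/home/stu1:/bin/bash\n'
}

# ~ tests a single field against a regex: 7th field ends in bash.
sample | awk -F: '$7 ~ /bash$/ {print $1}'
# !~ is the negation.
sample | awk -F: '$7 !~ /bash$/ {print $1}'

# "||" builds ONE pattern: each line is printed at most once (2 lines).
sample | awk 'NR==1 || /bash$/' | wc -l
# ";" gives TWO patterns: line 1 matches both and prints twice (3 lines).
sample | awk 'NR==1; /bash$/' | wc -l
```

The first two commands print `root`/`stu1` and `bin`. The line counts are 2 and 3.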
## 5. Classroom exercises
- Display all information about users who can log in to the system: match lines whose 7th column ends with bash, and print the whole line (all columns of the current line)
```
[root@MissHou ~]# awk '/bash$/{print $0}' /etc/passwd
[root@MissHou ~]# awk '/bash$/' /etc/passwd
[root@MissHou ~]# awk -F: '$7 ~ /bash/' /etc/passwd
[root@MissHou ~]# awk -F: '$NF ~ /bash/' /etc/passwd
[root@MissHou ~]# awk -F: '$0 ~ /bash/' /etc/passwd
[root@MissHou ~]# awk -F: '$0 ~ /\/bin\/bash/' /etc/passwd
```
- Display the user names that can log in to the system
```
# awk -F: '$0 ~ /\/bin\/bash/{print $1}' /etc/passwd
```
- Print the UID and user name of the ordinary users in the system. Ordinary users start at UID 500 on RHEL/CentOS 6 and earlier, and at 1000 on RHEL/CentOS 7 and later. UID 65534 is the nfsnobody account, so it is excluded.
```
# awk -F: 'BEGIN{print "UID\tUSERNAME"} {if($3>=500 && $3 !=65534 ) {print $3"\t"$1} }' /etc/passwd
UID     USERNAME
500     stu1
501     yunwei
502     user01
503     user02
504     user03
# awk -F: '{if($3 >= 500 && $3 != 65534) print $1,$3}' a.txt
redhat 508
user01 509
u01 510
YUNWEI 511
```
## 6. Script programming in awk
### I. Flow control statements
#### ① The if structure
The shell if statement, for comparison:
```
if [ xxx ];then
    xxx
fi
```
Format:
```
awk [options] 'pattern/address{awk statements}' filename
{ if(expression){statement1;statement2;...}}
```
```
awk -F: '{if($3>=500 && $3<=60000) {print $1,$3} }' passwd
# awk -F: '{if($3==0) {print $1,"is an administrator"}}' passwd
root is an administrator
# awk 'BEGIN{if('$(id -u)'==0) {print "admin"} }'
admin
```
In the last command, '$(id -u)' closes the single quotes, so the shell substitutes the current user's UID into the awk program.
#### ② The if...else structure
The shell version:
```
if [ xxx ];then
    xxxxx
else
    xxx
fi
```
Format: {if(expression){statement;statement;...}else{statement;statement;...}}
```
awk -F: '{ if($3>=500 && $3 != 65534) {print $1" is an ordinary user"} else {print $1,"is not an ordinary user"}}' passwd
awk 'BEGIN{if( '$(id -u)'>=500 && '$(id -u)' !=65534 ) {print "This is an ordinary user"} else {print "Not an ordinary user"}}'
```
Note: avoid apostrophes (such as "It's") inside a single-quoted awk program, because the apostrophe ends the shell quoting.
#### ③ The if...else if...else structure
The shell version:
```
if [ xxxx ];then
    xxxx
elif [ xxx ];then
    xxx
....
else
    ...
fi
```
Format: { if(expression1){statement;statement;...}else if(expression2){statement;statement;...}else if(expression3){statement;statement;...}else{statement;statement;...}}
```
awk -F: '{ if($3==0) {print $1,":is an administrator"} else if($3>=1 && $3<=499 || $3==65534 ) {print $1,":is a system user"} else {print $1,":is an ordinary user"}}' /etc/passwd
awk -F: '{ if($3==0) {i++} else if($3>=1 && $3<=499 || $3==65534 ) {j++} else {k++}};END{print "The number of administrators is:"i "\n The number of system users is:"j"\n The number of ordinary users is:"k }' /etc/passwd
# awk -F: '{if($3==0) {print $1,"is admin"} else if($3>=1 && $3<=499 || $3==65534) {print $1,"is sys users"} else {print $1,"is general user"} }' a.txt
root is admin
bin is sys users
daemon is sys users
adm is sys users
lp is sys users
redhat is general user
user01 is general user
named is sys users
u01 is general user
YUNWEI is general user
awk -F: '{ if($3==0) {print $1":administrator"} else if($3>=1 && $3<500 || $3==65534 ) {print $1":is a system user"} else {print $1":is an ordinary user"}}' /etc/passwd
awk -F: '{if($3==0) {i++} else if($3>=1 && $3<500 || $3==65534){j++} else {k++}};END{print "The number of administrators is:" i RS "The number of system users is:"j RS "The number of ordinary users is:"k }' /etc/passwd
The number of administrators is:1
The number of system users is:28
The number of ordinary users is:27
# awk -F: '{if($3==0) {print $1": administrator"} else if($3>=500 && $3!=65534) {print $1": ordinary user"} else {print $1": system user"}}' passwd
awk -F: '{if($3==0){i++} else if($3>=500){k++} else{j++}} END{print i; print k; print j}' /etc/passwd
awk -F: '{if($3==0){i++} else if($3>999){k++} else{j++}} END{print "Number of administrators: "i; print "Number of ordinary users: "k; print "Number of system users: "j}' /etc/passwd
```
If the user is an ordinary user, print its default shell; if it is a system user, print the user name:
```
# awk -F: '{if($3>=1 && $3<500 || $3 == 65534) {print $1} else if($3>=500 && $3<=60000 ) {print $NF} }' /etc/passwd
```
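The counting variant of if...else if...else can be tested on a few hypothetical lines instead of a real /etc/passwd. This sketch assumes the RHEL 6-era boundary of UID 500 used in this section. `classify` is an illustrative function name.

```shell
#!/bin/sh
# classify: count admins (UID 0), system users (1-499 or 65534)
# and ordinary users (everything else); reads passwd-style lines on stdin.
classify() {
    awk -F: '{ if ($3 == 0) {admin++} else if ($3 >= 1 && $3 < 500 || $3 == 65534) {sys++} else {normal++} }
             END {printf "admin=%d system=%d ordinary=%d\n", admin, sys, normal}'
}

# Hypothetical sample: one admin, two system users, two ordinary users.
printf 'root:x:0:0\nbin:x:1:1\nnfsnobody:x:65534:65534\nstu1:x:500:500\nstu2:x:501:501\n' | classify
```

This prints `admin=1 system=2 ordinary=2`. Because `&&` binds tighter than `||`, the middle condition reads as "1 to 499, or exactly 65534". Counters that never run stay unset and print as 0 through `%d`.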
### II. Loop statements
#### ① for loop
Print 1-5:
```
# for ((i=1;i<=5;i++));do echo $i;done
# awk 'BEGIN { for(i=1;i<=5;i++) {print i} }'
```
Print the odd numbers from 1 to 10 (the shell version also pipes them into awk to add them up):
```
# for ((i=1;i<=10;i+=2));do echo $i;done|awk '{sum+=$0};END{print sum}'
# awk 'BEGIN{ for(i=1;i<=10;i+=2) {print i} }'
# awk 'BEGIN{ for(i=1;i<=10;i+=2) print i }'
```
Calculate the sum of 1-5:
```
# awk 'BEGIN{sum=0;for(i=1;i<=5;i++) sum+=i;print sum}'
# awk 'BEGIN{for(i=1;i<=5;i++) (sum+=i);{print sum}}'
# awk 'BEGIN{for(i=1;i<=5;i++) (sum+=i);print sum}'
```
#### ② while loop
Print 1-5:
```
# i=1;while (($i<=5));do echo $i;let i++;done
# awk 'BEGIN { i=1;while(i<=5) {print i;i++} }'
```
Print the odd numbers from 1 to 10:
```
# awk 'BEGIN{i=1;while(i<=10) {print i;i+=2} }'
```
Calculate the sum of 1-5 (in the second form, sum starts at 0 because unset awk variables are 0):
```
# awk 'BEGIN{i=1;sum=0;while(i<=5) {sum+=i;i++}; print sum }'
# awk 'BEGIN {i=1;while(i<=5) {sum+=i;i++};print sum }'
```
#### ③ Nested loop
The shell version:
```
#!/bin/bash
for ((y=1;y<=5;y++))
do
    for ((x=1;x<=$y;x++))
    do
        echo -n $x
    done
    echo
done
```
The awk versions:
```
awk 'BEGIN{ for(y=1;y<=5;y++) {for(x=1;x<=y;x++) {printf x} ;print } }'
# awk 'BEGIN { for(y=1;y<=5;y++) { for(x=1;x<=y;x++) {printf x};print} }'
1
12
123
1234
12345
# awk 'BEGIN{ y=1;while(y<=5) { for(x=1;x<=y;x++) {printf x};y++;print}}'
1
12
123
1234
12345
```
Try printing the 9×9 multiplication table in three ways:
```
#awk 'BEGIN{for(y=1;y<=9;y++) { for(x=1;x<=y;x++) {printf x"*"y"="x*y"\t"};print} }'
#awk 'BEGIN{for(y=1;y<=9;y++) { for(x=1;x<=y;x++) printf x"*"y"="x*y"\t";print} }'
#awk 'BEGIN{i=1;while(i<=9){for(j=1;j<=i;j++) {printf j"*"i"="j*i"\t"};print;i++ }}'
#awk 'BEGIN{for(i=1;i<=9;i++){j=1;while(j<=i) {printf j"*"i"="i*j"\t";j++};print}}'
```
Loop control:
- break: exit the loop when the condition is met
- continue: skip the rest of the current iteration when the condition is met
```
# awk 'BEGIN{for(i=1;i<=5;i++) {if(i==3) break;print i} }'
1
2
# awk 'BEGIN{for(i=1;i<=5;i++){if(i==3) continue;print i}}'
1
2
4
5
```
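The loops above all run in BEGIN. In practice a for loop is often run over the fields of each record (1..NF), and break is used inside a while loop to stop early. The sketch below shows both on made-up numeric input.

```shell
#!/bin/sh
# for over fields: add up every field of each line (1..NF).
printf '1 2 3\n10 20\n' |
  awk '{s = 0; for (i = 1; i <= NF; i++) s += $i; print s}'

# while + break: stop at the first field greater than 5 and report its position.
printf '3 9 4 12\n' |
  awk '{i = 1; while (i <= NF) {if ($i > 5) break; i++}; print "first field > 5: $" i}'
```

The first command prints `6` and `30`. The second prints `first field > 5: $2`.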
## 7. Arithmetic in awk
Operators: + - * / % (modulo) ^ (exponentiation, e.g. 2^3). awk performs all arithmetic in floating point.
```
# awk 'BEGIN{print 1+1}'
# awk 'BEGIN{print 1**1}'
# awk 'BEGIN{print 2**3}'
# awk 'BEGIN{print 2/3}'
```
gawk also accepts ** as a synonym for ^; ^ is the form defined by POSIX.
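This sketch runs the operators in a single BEGIN block, using only the portable `^`. `int()` is shown because it truncates the floating-point result of a division.

```shell
#!/bin/sh
# ^ is the POSIX exponent operator, % the remainder; 2/3 is floating point
# and print formats it with OFMT ("%.6g"); int() truncates toward zero.
awk 'BEGIN{print 2^3, 7%3, 2/3, int(7/2)}'
```

This prints `8 1 0.666667 3`.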
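# 6. awk statistics cases
## 1. Count each type of shell in the system
```
# awk -F: '{ shells[$NF]++ };END{for (i in shells) {print i,shells[i]} }' /etc/passwd
/bin/bash 5
/sbin/nologin 6
```
How it works: shells is an associative array indexed by the last field (the login shell). For each line, shells[$NF]++ adds 1 to the counter for that shell:
```
shells["/bin/bash"]++
shells["/sbin/nologin"]++
shells["/sbin/shutdown"]++
```
An element that does not exist yet starts at 0, so the first books["linux"]++ leaves books["linux"] equal to 1, and books["php"]++ creates a second element. In END, for (i in shells) walks through every index of the array.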
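`for (i in shells)` visits the indexes in no guaranteed order, so the result is usually piped through sort. This is a minimal sketch on a hypothetical five-line sample. `LC_ALL=C` is set so the tie-breaking order does not depend on the locale.

```shell
#!/bin/sh
# Hypothetical passwd-style sample.
sample() {
    printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\nstu1:x:500:500::/home/stu1:/bin/bash\nsync:x:5:0:sync:/sbin:/bin/sync\n'
}

# Count shells, then sort by count (field 2, numeric, descending),
# breaking ties by shell name.
sample | awk -F: '{shells[$NF]++} END{for (i in shells) print i, shells[i]}' |
  LC_ALL=C sort -k2,2nr -k1,1
```

This prints `/bin/bash 2`, `/sbin/nologin 2` and `/bin/sync 1`.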
## 2. Count the website's connection states
```
# ss -antp|grep 80|awk '{states[$1]++};END{for(i in states){print i,states[i]}}'
TIME_WAIT 578
ESTABLISHED 1
LISTEN 1
# ss -an |grep :80 |awk '{states[$2]++};END{for(i in states){print i,states[i]}}'
LISTEN 1
ESTAB 5
TIME-WAIT 25
# ss -an |grep :80 |awk '{states[$2]++};END{for(i in states){print i,states[i]}}' |sort -k2 -rn
TIME-WAIT 18
ESTAB 8
LISTEN 1
```
## 3. Count the accesses from each IP address
```
# netstat -ant |grep :80 |awk -F: '{ip_count[$8]++};END{for(i in ip_count){print i,ip_count[i]} }' |sort
# ss -an |grep :80 |awk -F":" '!/LISTEN/{ip_count[$(NF-1)]++};END{for(i in ip_count){print i,ip_count[i]}}' |sort -k2 -rn |head
```
## 4. Count page views (PV) in the website log
Count the PV of one day in an Apache/Nginx log:
```
# grep '27/Jul/2017' mysqladmin.cc-access_log |wc -l
14519
```
Count the visits from each distinct IP on one day in an Apache/Nginx log:
```
# grep '27/Jul/2017' mysqladmin.cc-access_log |awk '{ips[$1]++};END{for(i in ips){print i,ips[i]} }' |sort -k2 -rn |head
# grep '07/Aug/2017' access.log |awk '{ips[$1]++};END{for(i in ips){print i,ips[i]} }' |awk '$2>100' |sort -k2 -rn
```
Explanation of terms:
- Page views (PV), PV = Page View: the number of pages viewed, used to measure how many web pages the site's users visit. One PV is recorded every time a user opens a page, so opening the same page several times adds to the total.
- Visits (VV), VV = Visit View: everything a visitor does from arriving at the website until leaving it counts as one visit. If the visitor does not open or refresh any page for 30 consecutive minutes, or closes the browser, the visit is considered ended.
- Unique visitors (UV), UV = Unique Visitor: a visitor who visits the website several times within one day counts as only 1 UV.
- Independent IPs (IP), IP = number of independent IPs: the number of distinct IP addresses that visited the website within one day. No matter how many pages one IP accesses, it counts as 1.
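PV and independent IPs can both be derived from the access log alone, as in the commands of the previous section. UV and VV typically need a visitor or session identifier (such as a cookie) that a plain access log line does not carry. The sketch below assumes three made-up log lines with the client IP in field 1. A real report might also filter out static files or error responses, which is left out here.

```shell
#!/bin/sh
# Three made-up access-log lines (client IP is field 1).
sample_log() {
    printf '%s\n' \
      '10.0.0.1 - - [27/Jul/2017:10:00:01 +0800] "GET / HTTP/1.1" 200 512' \
      '10.0.0.2 - - [27/Jul/2017:10:00:05 +0800] "GET /a HTTP/1.1" 200 128' \
      '10.0.0.1 - - [27/Jul/2017:10:01:00 +0800] "GET /b HTTP/1.1" 200 256'
}

# PV: every request line counts once. IP: each distinct client IP counts once.
sample_log | grep '27/Jul/2017' |
  awk '{pv++; if (!($1 in seen)) {seen[$1] = 1; ip++}} END{print "PV=" pv, "IP=" ip}'
```

This prints `PV=3 IP=2`, because 10.0.0.1 made two of the three requests.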
# 7. Homework
Assignment 1:
1. Write a script that automatically checks disk usage and sends an email to the relevant staff when usage exceeds 90%
2. Write a script to monitor system memory and swap usage

Assignment 2:
Read an IP address and use a script to check whether it is valid:
it must follow the IP address format. The 1st and 4th octets cannot start with 0, and an octet cannot be greater than 255 or less than 0.
# 8. Enterprise practical case
## 1. Task / background
The web server cluster has 9 machines, all running Apache. Because the business keeps growing, each machine generates a large volume of access logs every day. Each web server must keep only the last three days of Apache access logs, and older logs must be moved to a dedicated log server for later analysis. How can each server be kept to no more than 3 days of logs?
## 2. Specific requirements
- Each web server's logs go into the corresponding directory on the log server, for example: web1 -> web1 log (on the log server)
- Each web server keeps the access logs of the last three days; older logs are dumped to the log server at 5:03 a.m. every day
- If the dump script fails, the operations staff must clean up the logs manually through the menu on the jump host
## 3. Knowledge points involved
- Basic shell syntax
- File synchronization with rsync
- File lookup with find
- Scheduled tasks with crontab
- Apache log rotation
- Other