Using the awk command

1. What is awk

AWK is a data-filtering tool (similar to grep, but more powerful) and a data-processing engine. It reads the input text line by line, checks each line against patterns, and processes and outputs the lines that match. It is often used in shell scripts to extract specific data. Used on its own, it can also compute statistics on text data.
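As a minimal sketch of the statistics use case, the snippet below sums and averages the second column of some inline sample data. The sample names and scores and the helper name `column_stats` are made up for illustration.

```shell
# column_stats: hypothetical helper that sums and averages column 2 of stdin.
column_stats() {
    awk '{ sum += $2; n++ }
         END { if (n > 0) printf "sum=%d avg=%.2f\n", sum, sum / n }'
}

# Hypothetical sample data: name score
printf 'john 85\nandrea 89\njasper 84\n' | column_stats
```

The END block runs only once, after every line has been added to `sum`. That is why the average is computed there.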

2. Format syntax

1. Format

Format 1: preceding command | awk [options] 'condition {edit instruction}'

Format 2: awk [options] 'condition {edit instruction}' file

When the edit instructions contain multiple statements, separate them with semicolons. If no field separator is specified, awk splits each line on runs of spaces and tabs by default. print is the most commonly used instruction.
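Here is a minimal sketch of both formats. It assumes a throwaway file created in a temporary directory, and the helper name `demo_formats` and the file name `sample.txt` are hypothetical. The second command also shows two statements separated by a semicolon.

```shell
# demo_formats: hypothetical helper showing Format 1 and Format 2.
demo_formats() {
    dir=$(mktemp -d)
    printf 'alpha 1\nbeta 2\n' > "$dir/sample.txt"

    # Format 1: awk reads what the preceding command writes to the pipe.
    cat "$dir/sample.txt" | awk '{ print $1 }'

    # Format 2: awk opens the file itself; two statements joined by ";".
    awk '{ n = $2 * 10; print $1, n }' "$dir/sample.txt"

    rm -r "$dir"
}

demo_formats
```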

2. Options

-F fs        Specify the input field separator. fs is a string or a regular expression, e.g. -F:
-v var=val   Assign a user-defined variable
-f file      Read the awk program from a script file
-W           Run awk in compatibility mode, in which gawk behaves exactly like standard awk and all gawk extensions are ignored
' '          Encloses the awk program text (code block)
/pattern/    Matching block: a string or regular expression that selects lines
{ }          Action block containing one or more commands
;            Separates multiple commands
BEGIN        Executed once at the start of the program, before any input is read
END          Executed once at the end of the program, after all input has been processed
  • BEGIN is mainly used for initialization. It runs before the first line is processed and typically initializes global variables and sets the FS separator.
  • END is mainly used for wrap-up. It runs after the last line has been processed and typically performs final calculations or prints summary information.
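The sketch below combines the -f and -v options with BEGIN and END blocks. It writes a small program to a temporary script file. The file name `count.awk`, the variable `label` and the helper `run_script_demo` are hypothetical.

```shell
# run_script_demo: hypothetical helper; writes an awk script, runs it with -f.
run_script_demo() {
    dir=$(mktemp -d)
    cat > "$dir/count.awk" <<'EOF'
BEGIN { print "start:", label }   # runs once, before any input
      { lines++ }                 # runs for every input line
END   { print "lines:", lines }   # runs once, after all input
EOF
    # -v assignments take effect before BEGIN runs, so label is already set.
    printf 'a\nb\nc\n' | awk -v label=demo -f "$dir/count.awk"
    rm -r "$dir"
}

run_script_demo
```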

3. awk built-in variables and operators

$0           The entire current record (line)
$1 ... $n    The 1st through nth fields of the current record
FS           Input field separator (default: a space, i.e. any run of spaces and tabs)
RS           Input record separator (default: newline, so input is read line by line)
NF           Number of fields in the current record, i.e. the number of columns
NR           Number of records read so far, i.e. the line number, starting from 1
FNR          Like NR, but restarts at 1 for each input file instead of counting across all files
OFS          Output field separator (default: a space)
ORS          Output record separator (default: newline)
\n           Newline character
~            Matches a regular expression
!~           Does not match a regular expression
= += -= *= /= %= ^= **=    Assignment
&&           Logical AND
< <= > >= != ==            Relational operators
$            Field reference
* / %        Multiplication, division and remainder
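To see how NR and FNR differ, the sketch below reads two small temporary files in one awk run. The file contents and the helper name `nr_vs_fnr` are hypothetical.

```shell
# nr_vs_fnr: hypothetical helper; NR keeps counting, FNR restarts per file.
nr_vs_fnr() {
    dir=$(mktemp -d)
    printf 'a1\na2\n' > "$dir/f1"
    printf 'b1\n' > "$dir/f2"
    awk '{ print NR, FNR, $0 }' "$dir/f1" "$dir/f2"
    rm -r "$dir"
}

nr_vs_fnr
```

For the first line of the second file (b1), NR is 3 because it counts across both files, while FNR is back at 1.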

4. Examples

  • BEGIN

    [root@localhost ~]# cat data
     Have you eaten yet?
    [root@localhost ~]# awk 'BEGIN{print "good morning"} {print $0}' data
    good morning
     Have you eaten yet?
    
  • Print the first and fifth fields of each line

    // Syntax: awk '[pattern] {action}' filenames   # enclose the program in single quotes so the shell does not expand $1, $5, etc.
    
    [root@localhost ~]# cat test 
    1 This is the header line.
    2 This is the first data line.
    3 This is the second data line.
    8 This is the last line.
    [root@localhost ~]# awk '{print $1,$5}' test
    1 header
    2 first
    3 second
    8 last
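The comma in `print $1,$5` matters. The sketch below (helper name `print_demo` is hypothetical) contrasts printing with and without it.

```shell
# print_demo: hypothetical helper.
# With a comma, print puts OFS (a space by default) between the values;
# without one, the values are simply concatenated.
print_demo() {
    echo '1 This is the header line.' | awk '{ print $1, $5 }'
    echo '1 This is the header line.' | awk '{ print $1 $5 }'
}

print_demo
```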
    
  • awk -F: -F is equivalent to setting the built-in variable FS; it specifies the field separator

    // Use multiple separators: the regular expression [ ,] treats either a space or a comma as a field separator
    [root@localhost ~]# awk -F'[ ,]' '{print $1,$3}' test
    1 is
    2 is
    3 is
    8 is
    
  • awk -v: set a variable

    // Set the variable a=2 and add it to the first field
    [root@localhost ~]# cat test 
    1 This is the header line.
    2 This is the first data line.
    3 This is the second data line.
    8 This is the last line.
    [root@localhost ~]# awk -v a=2 '{print $1,$1+a}' test
    1 3
    2 4
    3 5
    8 10
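A common use of -v is to pass a shell variable into awk instead of splicing it into the quoted program text. In this sketch, the helper `above_threshold` and its inline sample lines are hypothetical.

```shell
# above_threshold: hypothetical helper; prints column 2 of lines whose
# first field is greater than the threshold given as the first argument.
above_threshold() {
    threshold=$1
    printf '1 x\n2 y\n3 z\n8 w\n' | awk -v t="$threshold" '$1 > t { print $2 }'
}

above_threshold 2
```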
    
  • Matching mechanism

    The power of awk lies in its script command, which consists of two parts, a matching rule and the commands to execute:

      'Matching rules{Execute command}'
    

    To make the script command act only on specific lines of the text, specify a string (for example, /demo/ selects lines containing the string demo) or a regular expression. Note that the whole script command must be enclosed in single quotation marks (''), and the command part must be enclosed in curly braces ({}).

    // If no command is given, awk prints the matching lines; if no matching rule is given, every line matches. The pattern /^$/ below matches empty lines.
    [root@localhost ~]# cat test 
    1 This is the header line.
    
    2 This is the first data line.
    
    3 This is the second data line.
    
    
    8 This is the last line.
    [root@localhost ~]# awk '/^$/ {print "The spring breeze brushed my face"}' test
    The spring breeze brushed my face
    The spring breeze brushed my face
    The spring breeze brushed my face
    The spring breeze brushed my face
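The two default behaviours described above can be checked directly. In this sketch, the helper `pattern_demo` and its sample lines are hypothetical.

```shell
# pattern_demo: hypothetical helper.
pattern_demo() {
    # Pattern only, no action: lines matching /data/ are printed unchanged.
    printf '1 header line\n2 first data line\n8 last line\n' | awk '/data/'
    # Action only, no pattern: the action runs for every line.
    printf '1 header line\n2 first data line\n8 last line\n' | awk '{ print $1 }'
}

pattern_demo
```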
    
  • Records and fields

    awk treats each input line as a record, and the words (columns) separated by spaces or tabs as fields. The characters used to separate fields are called separators.

    [root@localhost ~]# echo 'wsnd hh zz' | awk '{print $1}'
    wsnd
    
    [root@localhost ~]# echo 'wsnd hh zz' | awk 'BEGIN{a=1;b=2}{print $(a+b)}'
    zz
    
    // Print the host's IPv4 address, excluding the loopback address
    [root@localhost ~]# ip a | grep 'inet ' | grep -v '127.0.0.1' | awk -F'[ /]+' '{print $3}' 
    192.168.200.145
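Because any expression inside `$( )` selects a field by number, NF itself can be used to reach the last fields of a record. A small sketch follows; the helper name `last_fields` is hypothetical.

```shell
# last_fields: hypothetical helper; $NF is the last field, $(NF-1) the one before.
last_fields() {
    echo 'wsnd hh zz' | awk '{ print NF, $NF, $(NF-1) }'
}

last_fields
```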
    
  • Splitting fields

    awk can split records into fields in three ways:

    • The first method separates fields with whitespace: set FS to a single space. Leading and trailing whitespace (spaces and/or tabs) in the record is ignored, and fields are separated by runs of spaces and/or tabs. Because the default value of FS is a single space, this is the usual way awk divides records into fields.
    • The second method uses some other single character as the separator; awk programs often use ":", for example. When FS is any other single character, a new field starts wherever that character appears. If two separators are adjacent, the field between them is an empty string.
    • The third method: if FS is set to more than one character, it is interpreted as a regular expression.

    [root@localhost ~]# cat /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    [root@localhost ~]# awk 'BEGIN{FS=":"}{print $3}' /etc/passwd
    0
    1
    2
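The difference between the first two methods shows up with extra whitespace and with adjacent separators. The helper name `fs_demo` and the sample input are hypothetical.

```shell
# fs_demo: hypothetical helper.
fs_demo() {
    # Default FS (a single space): leading/trailing blanks are ignored and
    # runs of spaces/tabs count as one separator, so NF is 2 here.
    printf '  a \t b  \n' | awk '{ print NF }'
    # Single-character FS ":": two adjacent colons produce an empty field.
    echo 'a::b' | awk -F: '{ print NF, "[" $2 "]" }'
}

fs_demo
```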
    
  • Logical operations

    // Print lines whose first field is greater than 2
    [root@localhost ~]# cat test 
    1 This is the header line.
    
    2 This is the first data line.
    
    3 This is the second data line.
    
    
    8 This is the last line.
    [root@localhost ~]# awk '$1>2' test 
    3 This is the second data line.
    8 This is the last line.
    
    // Print the first and third fields of lines whose first field equals 2
    [root@localhost ~]# awk '$1==2 {print $1,$3}' test 
    2 is
    // Print fields 1-3 of lines whose first field is greater than 1 and whose fifth field is "first"
    [root@localhost ~]# awk '$1>1 && $5=="first" {print $1,$2,$3}' test 
    2 This is
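Comparisons can also select a numeric range with `&&`, or alternatives with `||`. In this sketch, the helper `logic_demo` and its shortened sample lines are hypothetical.

```shell
# logic_demo: hypothetical helper.
logic_demo() {
    # && : both conditions must hold (first field between 2 and 3)
    printf '1 header\n2 first\n3 second\n8 last\n' | awk '$1 >= 2 && $1 <= 3 { print $1 }'
    # || : either condition may hold (first field is 1 or 8)
    printf '1 header\n2 first\n3 second\n8 last\n' | awk '$1 == 1 || $1 == 8 { print $1 }'
}

logic_demo
```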
    
    
  • Pattern negation

    // Select lines whose sixth field does not match /data/ and print fields 1 and 5 (blank lines also pass the test and print as a single space)
    [root@localhost ~]# cat test 
    1 This is the header line.
    
    2 This is the first data line.
    
    3 This is the second data line.
    
    
    8 This is the last line.
    [root@localhost ~]# awk '$6 !~ /data/ {print $1,$5}' test
    1 header
     
     
     
     
    8 last
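One way to drop the blank-line output above is to add NF to the condition. NF is 0 on empty lines, so it acts as "false" there. This is a sketch with the helper name `negate_demo` made up and the sample file inlined.

```shell
# negate_demo: hypothetical helper; same filter, but empty lines are skipped.
negate_demo() {
    printf '1 This is the header line.\n\n2 This is the first data line.\n\n8 This is the last line.\n' |
        awk 'NF && $6 !~ /data/ { print $1, $5 }'
}

negate_demo
```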
    
  • OFS: output field separator

    [root@localhost ~]# cat passwd 
    root:x:0:0 root:/root:/bin/bash
    bin:x:1:1 bin:/bin:/sbin/nologin
    daemon:x:2:2 daemon:/sbin:/sbin/nologin
    [root@localhost ~]# awk 'BEGIN{FS=":"}{print $1,$6}' passwd 
    root /bin/bash
    bin /sbin/nologin
    daemon /sbin/nologin
    [root@localhost ~]# awk 'BEGIN{FS=":";OFS="="}{print $1,$6}' passwd 
    root=/bin/bash
    bin=/sbin/nologin
    daemon=/sbin/nologin
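OFS is inserted between comma-separated print arguments and whenever awk rebuilds $0. Printing $0 unchanged keeps the original separators. Assigning a field (the common `$1 = $1` idiom) forces the rebuild. A sketch follows, with the helper name `ofs_demo` hypothetical.

```shell
# ofs_demo: hypothetical helper.
ofs_demo() {
    # $0 untouched: original ":" separators are kept.
    echo 'root:x:0:0' | awk 'BEGIN { FS = ":"; OFS = "=" } { print $0 }'
    # $1 = $1 rebuilds $0, re-joining all fields with OFS.
    echo 'root:x:0:0' | awk 'BEGIN { FS = ":"; OFS = "=" } { $1 = $1; print $0 }'
}

ofs_demo
```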
    
  • NF: number of fields

    // Print the number of fields (columns) in each line
    [root@localhost ~]# cat data 
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    [root@localhost ~]# awk '{print NF}' data 
    6
    7
    8
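Because NF varies per line, it can drive a loop over all the score fields. This sketch averages fields 2 through NF of two lines from the data file above. The data is inlined, and the helper name `avg_demo` is hypothetical.

```shell
# avg_demo: hypothetical helper; fields 2..NF are scores, so there are NF-1.
avg_demo() {
    printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86 92\n' |
        awk '{ s = 0; for (i = 2; i <= NF; i++) s += $i; printf "%s %.1f\n", $1, s / (NF - 1) }'
}

avg_demo
```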
    
  • NR: line number

    // Print each line prefixed with its line number, separated by "." (a dot)
    [root@localhost ~]# cat data 
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    [root@localhost ~]# awk '{print NR "." $0}' data 
    1.john 85 92 78 94 88
    2.andrea 89 90 75 90 86 92
    3.jasper 84 88 80 92 84 94 83
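NR can also be used as a pattern to select lines by number. In this sketch, the helper `nr_demo` and the inline names are hypothetical.

```shell
# nr_demo: hypothetical helper.
nr_demo() {
    # Only line 2 (pattern without action prints the line).
    printf 'john\nandrea\njasper\n' | awk 'NR == 2'
    # Lines 2 through 3, numbered as in the example above.
    printf 'john\nandrea\njasper\n' | awk 'NR >= 2 && NR <= 3 { print NR "." $0 }'
}

nr_demo
```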
    
  • RS: input record separator

    // Use newline as the field separator and set the record separator to the empty string (paragraph mode, where blank lines separate records); set the output record separator ORS to a colon and print the first field of each record
    [root@localhost ~]# cat test 
    1 This is the header line.
    
    2 This is the first data line.
    
    3 This is the second data line.
    
    
    8 This is the last line.
    [root@localhost ~]# awk 'BEGIN{FS="\n";RS="";ORS=":"}{print $1}' test 
    1 This is the header line.:2 This is the first data line.:3 This is the second data line.:8 This is the last line.:
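Paragraph mode is most useful when a record spans several lines. In the sketch below, each blank-line-separated block is one record and each of its lines is one field. The names, numbers and helper `para_demo` are hypothetical.

```shell
# para_demo: hypothetical helper; RS="" groups lines into paragraphs,
# FS="\n" makes each line of a paragraph a separate field.
para_demo() {
    printf 'Alice\n555-1234\n\nBob\n555-9876\n' |
        awk 'BEGIN { RS = ""; FS = "\n" } { print $1 ": " $2 }'
}

para_demo
```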
    
  • END

    [root@localhost ~]# cat data 
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    [root@localhost ~]# awk 'BEGIN{print "hi"} {print $0} END{print "byby"}' data 
    hi
    john 85 92 78 94 88
    andrea 89 90 75 90 86 92
    jasper 84 88 80 92 84 94 83
    byby
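Since END runs after the last line, it is a natural place to report values gathered while reading, and NR still holds the total record count there. In this sketch, the helper `end_demo` and the inline scores are hypothetical.

```shell
# end_demo: hypothetical helper; tracks the highest score while reading,
# then prints a summary once in END.
end_demo() {
    printf 'john 85\nandrea 89\njasper 84\n' |
        awk 'NR == 1 || $2 > max { max = $2; who = $1 }
             END { print NR " students, top: " who " (" max ")" }'
}

end_demo
```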
    

Keywords: regex perl

Added by WeddingLink on Wed, 22 Sep 2021 04:55:45 +0300