Skills: a tutorial on the Linux awk command

Preface

AWK is an interpreted programming language designed for text processing. Its name comes from the initials of the surnames of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan.

  • awk program structure
  • Running an awk script file
  • awk basic syntax
  • Built-in variables commonly used in awk programs


Program structure

General form of an awk command:

  • awk 'BEGIN {awk-commands} /pattern/ {awk-commands} END {awk-commands}' fileName
  • A regular-expression pattern must be enclosed in slashes: /pattern/
  • Each block of awk commands must be enclosed in curly braces
  • BEGIN block: BEGIN {awk-commands}. Optional. It runs exactly once, before any input is read, so it is the usual place to initialize variables. BEGIN is an awk keyword and must be upper case
  • Body block: /pattern/ {awk-commands}. By default its commands run once for every input line; supplying a pattern restricts them to the lines that match
  • END block: END {awk-commands}. Optional. It runs once, after all input has been read. END is an awk keyword and must be upper case
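To see the three blocks working together, here is a minimal sketch. The input lines and the variable names (summary, sum, n) are made up for illustration and are not part of the original article.

```shell
# Hypothetical input: a name and a score on each line.
summary=$(printf 'csc 80\nlwl 90\nabc 70\n' | awk '
BEGIN { print "start" }                    # runs once, before any input
/c/   { sum += $2; n++ }                   # runs for each line containing "c"
END   { print "matched", n, "sum", sum }   # runs once, after all input
')
printf '%s\n' "$summary"
# start
# matched 2 sum 150
```

Only "csc 80" and "abc 70" contain the letter c, so the body block runs twice and the END block reports the total of their second fields.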

awk workflow

How the body block is executed

Script form: awk '{[code statement 1] [code statement 2]}' fileName. If no fileName or other input stream is given and the program has a body block, awk reads standard input, so at a terminal it appears to hang while it waits for you to type. Statements end with a semicolon or a newline.

  • 1: Read one line of input into $0 and split it into fields, which are stored in $1, $2, and so on
  • 2: Execute the code statements
  • 3: If more lines remain, repeat steps 1 and 2 until all input has been read
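The read, split, execute cycle can be observed directly. The sketch below feeds two made-up lines through standard input and prints what awk sees on each pass; the variable name trace and the sample text are assumptions for illustration.

```shell
# Two hypothetical input lines, supplied on standard input.
trace=$(printf 'a b c\nd e\n' | awk '{
    # NR is the current line number, NF the number of fields on this line
    printf "line %d: whole=[%s] first=[%s] fields=%d\n", NR, $0, $1, NF
}')
printf '%s\n' "$trace"
# line 1: whole=[a b c] first=[a] fields=3
# line 2: whole=[d e] first=[d] fields=2
```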

Running an awk script file

  • By convention, an awk script file ends with the .awk suffix
  • Use the -f option to run it: awk -f command.awk marks.txt
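As a rough sketch of the whole round trip, the snippet below writes a small script file, runs it with -f, and cleans up. The script body and the two data lines are stand-ins invented for this example; the real contents of command.awk and marks.txt are not given in the article.

```shell
dir=$(mktemp -d)

# A hypothetical script file: one body block and one END block.
cat > "$dir/command.awk" <<'EOF'
# Print the second and fourth fields of every line, then a line count.
{ print $2, $4 }
END { print NR, "lines" }
EOF

# Stand-in data; the layout is assumed.
printf '1) Amit Physics 80\n3) Shyam Biology 87\n' > "$dir/marks.txt"

out=$(awk -f "$dir/command.awk" "$dir/marks.txt")
printf '%s\n' "$out"
# Amit 80
# Shyam 87
# 2 lines

rm -r "$dir"
```

Keeping the program in a file avoids shell quoting problems and makes longer scripts easier to comment.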

awk basic syntax

  • awk variables do not need to be declared in advance, and no type is specified for them
awk 'BEGIN{sum=1;print sum}'
1
  • Flow control
#--------Pseudo code 1---------
if ({condition})
   Code logic...
else if({condition})
   Code logic...
else
   Code logic...
#--------Pseudo code 2---------
for ({initialization}; {condition};{Subsequent logic}){
   Code logic...
}   
#--------Pseudo code 3---------
while ({condition}){
   Code logic...
}
#--------Pseudo code 4---------
do{
   Code logic...
}while ({condition})    
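The four pseudo-code forms can be turned into one small runnable program. This is only a sketch; the variable names (n, total, flow) and the thresholds are arbitrary choices for illustration.

```shell
flow=$(awk 'BEGIN {
    n = 3
    if (n > 5)       print "big"
    else if (n > 2)  print "medium"
    else             print "small"

    for (i = 1; i <= n; i++) total += i    # 1 + 2 + 3
    print "total", total

    while (n > 0) n--                      # count down to 0
    print "n", n

    do { n++ } while (n < 2)               # the body runs at least once
    print "n", n
}')
printf '%s\n' "$flow"
# medium
# total 6
# n 0
# n 2
```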
  • Operators are basically the same as in the Java programming language. Here are a few of them
Symbol                       Explanation               Example
^                            Exponentiation            a = a ^ 2
- / +                        Unary minus / plus        a = -10; a = +a;
condition ? action : action  Ternary operator          max = (a > b) ? a : b;
&& / ||                      Logical AND / OR          if (num >= 0 && num <= 7)
== / !=                      Equal to / not equal to   if (a == b)
awk 'BEGIN{sum=1;sum++; if(sum==2) print sum}'
2
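A short sketch that exercises the operators listed above in one program; the values of a and b are arbitrary and chosen only for illustration.

```shell
ops=$(awk 'BEGIN {
    a = 3; b = 7
    print a ^ 2                        # exponentiation
    max = (a > b) ? a : b              # ternary operator
    print max
    print -a                           # unary minus
    if (a >= 0 && a <= 7) print "in range"
    if (a != b) print "different"
}')
printf '%s\n' "$ops"
# 9
# 7
# -3
# in range
# different
```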
  • Arrays. awk supports associative arrays: an index can be a string as well as a number. To delete an array element, use the delete statement, for example delete arr[0]. The order in which for (i in arr) visits the elements is unspecified, so the output order below may differ on your system
$ awk 'BEGIN {arr["lwl"] = 1; arr["csc"] = 2; for (i in arr) printf "arr[%s] = %d\n", i, arr[i]}'
arr[lwl] = 1
arr[csc] = 2
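Deleting an element and testing for membership can be sketched as follows. The extra arr[0] entry and the variable name left are added here purely for illustration.

```shell
left=$(awk 'BEGIN {
    arr["lwl"] = 1; arr["csc"] = 2; arr[0] = "zero"
    delete arr[0]                           # remove a single element
    if (!(0 in arr)) print "arr[0] deleted"
    if ("csc" in arr) print "csc ->", arr["csc"]
    n = 0
    for (k in arr) n++                      # iteration order is unspecified
    print n, "elements left"
}')
printf '%s\n' "$left"
# arr[0] deleted
# csc -> 2
# 2 elements left
```

The in operator checks for a key without creating it, whereas merely referencing arr[key] would create an empty element.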
  • String operations
---- Strings are concatenated simply by writing them side by side, separated by a space; there is no explicit concatenation operator ----
awk 'BEGIN { str1 = "csc, "; str2 = "lwl"; str3 = str1 str2; print str3 }'
csc, lwl
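  • String-related built-in functions
index(str, sub) #Position of the first occurrence of sub in str (1-based; 0 if not found)
length(str) #Length of str
match(str, regex) #Position where regex first matches in str (0 if no match); also sets RSTART and RLENGTH
split(str, arr, regex) #Split str into array arr using regex as the separator; returns the number of pieces
sub(regex, sub, string) #Replace the first match of regex in string with sub
substr(str, start, l) #Substring of str of length l, starting at position start (1-based)
tolower(str) #Convert str to lower case
toupper(str) #Convert str to upper case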
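Here is a small sketch that calls each of the functions listed above once. The sample string "Hello World" and the names s, pos, parts, and strs are chosen only for illustration.

```shell
strs=$(awk 'BEGIN {
    s = "Hello World"
    print index(s, "World")            # position of "World"
    print length(s)
    pos = match(s, /o W/)              # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = split("a,b,c", parts, ",")     # n is the number of pieces
    print n, parts[1], parts[3]
    sub(/World/, "awk", s)             # modifies s in place
    print s
    print substr(s, 1, 5)
    print tolower(s), toupper(s)
}')
printf '%s\n' "$strs"
# 7
# 11
# 5 5 3
# 3 a c
# Hello awk
# Hello
# hello awk HELLO AWK
```

Note that string positions in awk start at 1, not 0.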

Regular expressions

  • Match operators: ~ and !~ test for a match and a non-match, respectively
$ awk '$0 !~ 9' marks.txt
1) Amit     Physics   80
3) Shyam    Biology   87
  • Match operators combined with regular expressions
# Contents of log.txt
1 csc world
2 lwl hello
---------- Print the lines whose second column contains lwl ----------
$ awk '$2 ~ /lwl/ {print $2,$3}' log.txt
lwl hello
---------- Print the lines that contain csc ----------
$ awk '/csc/ {print $0}' log.txt
1 csc world
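The same two log lines can also be used to try the non-match operator and an anchored pattern. This is a sketch that pipes the data in rather than reading log.txt; the variable names log, not_lwl, and starts_h are assumptions for illustration.

```shell
# The two log.txt lines from the example above, held in a shell variable.
log=$(printf '1 csc world\n2 lwl hello')

# !~ selects lines whose second field does NOT match /lwl/
not_lwl=$(printf '%s\n' "$log" | awk '$2 !~ /lwl/ { print $1 }')

# ^ anchors the match to the start of the third field
starts_h=$(printf '%s\n' "$log" | awk '$3 ~ /^h/ { print $2 }')

printf '%s\n%s\n' "$not_lwl" "$starts_h"
# 1
# lwl
```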

Built-in variables commonly used in awk programs

Variable     Description
$n           The nth field of the current record; fields are separated by FS
$0           The complete input record
ARGC         Number of command-line arguments
ARGV         Array containing the command-line arguments
ENVIRON      Associative array of environment variables
ERRNO        Description of the last system error (gawk extension)
FILENAME     Name of the current input file
FS           Field separator (default: runs of spaces and tabs)
IGNORECASE   When nonzero, matching ignores case (gawk extension)
NF           Number of fields in the current record
NR           Number of records read so far, that is, the current line number, starting from 1
FNR          Like NR, but counted per input file: the line number within the current file
OFS          Output field separator (default: a single space)
ORS          Output record separator (default: a newline character)
RLENGTH      Length of the string matched by the match function
RS           Record separator (default: a newline character)
RSTART       Start position of the string matched by the match function
ARGIND       Index in ARGV of the file currently being processed (gawk extension)
PROCINFO     Associative array containing process information, such as UID and process ID (gawk extension)
  • ARGC and ARGV: the command-line arguments. ARGV[0] holds the command name, so the arguments themselves start at ARGV[1]
$ awk 'BEGIN {
   for (i = 0; i < ARGC; ++i) {
      printf "ARGV[%d] = %s\n", i, ARGV[i]
   }
}' csc lwl
ARGV[0] = awk
ARGV[1] = csc
ARGV[2] = lwl
  • ENVIRON: environment variables
$ awk 'BEGIN { print ENVIRON["USER"] }'
csc
  • FILENAME: the name of the current input file
$ awk 'END {print FILENAME}' test.txt
test.txt
  • RSTART: the start position of the string matched by the match function
$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
9
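The separator and counter variables are easiest to understand by changing them. The sketch below uses made-up input and temporary files (one.txt and two.txt are hypothetical names) to show FS, OFS, NF, NR, and FNR together.

```shell
# FS controls how input is split, OFS how print joins its arguments.
sep=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')
printf '%s\n' "$sep"
# 1-3-a-c
# 2-2-d-e

# NR keeps counting across files, FNR restarts at 1 for each file.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n'    > "$dir/two.txt"
counts=$(awk '{ print NR, FNR, $0 }' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$counts"
# 1 1 a
# 2 2 b
# 3 1 c
rm -r "$dir"
```

$NF is the last field of the line, because NF holds the number of fields.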

Readers are welcome to point out any errors in the text


Keywords: Java Linux Programmer awk

Added by stickynote427 on Mon, 31 Jan 2022 02:36:35 +0200