Preface
AWK is an interpreted programming language designed for text processing. Its name comes from the initials of the surnames of its three authors: Alfred Aho, Peter Weinberger, and Brian Kernighan.
- awk program structure
- Run awk file script
- awk basic syntax
- Built-in variables commonly used in awk programs
Program structure
awk command format:
- awk ' BEGIN {awk-commands} /pattern/ {awk-commands} END {awk-commands}' fileName
- A regular expression pattern, if present, must be enclosed in slashes: /pattern/
- Each awk-commands code block must be enclosed in curly braces
- BEGIN block: BEGIN {awk-commands}, optional. It is executed only once, before any input is read, which makes it the place to initialize variables. BEGIN is an AWK keyword and must be uppercase
- BODY block: /pattern/ {awk-commands}. The commands in the BODY block are executed for each line of input. Providing a pattern restricts execution to the lines that match it
- END block: END {awk-commands}, optional. It is executed once, after all input has been read. END is an AWK keyword and must be uppercase
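As a minimal sketch of the three blocks working together, the command below sums the third column of the lines that match a pattern. The input lines and the names `report` and `total` are made up for illustration.

```shell
# Hypothetical sample input is supplied inline with printf.
report=$(printf 'apple fruit 3\ncarrot vegetable 5\npear fruit 4\n' | awk '
  BEGIN   { total = 0; print "start" }     # runs once, before any input is read
  /fruit/ { total += $3 }                  # runs for every line that matches /fruit/
  END     { print "total = " total }       # runs once, after the last line
')
echo "$report"
```

Only the two lines containing "fruit" reach the BODY block, so the END block reports a total of 7.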
awk workflow
How the BODY block executes
Script command: awk '{[code statement 1]; [code statement 2]}'. If there is a BODY block but no fileName or other input stream, awk waits for data on standard input and appears to hang until it receives end-of-file. Each code statement ends with a semicolon or a newline character
- 1: Read one line of input and store it in $0; the fields of that line are stored in $1, $2, and so on
- 2: Execute the code statements
- 3: If more lines remain, repeat steps 1 and 2 until all input has been read
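The read-and-execute cycle above can be observed with a small sketch. The two input lines and the variable name `fields` are invented for the example.

```shell
# awk repeats the read/execute cycle once per input line.
fields=$(printf 'csc 90 A\nlwl 85 B\n' |
  awk '{ print "line=[" $0 "] first=" $1 " second=" $2 }')
echo "$fields"
```

Each output line shows the whole record ($0) followed by its first two fields.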
Run awk file script
- By convention, awk script files use the .awk suffix
- Option [-f]: awk -f command.awk marks.txt
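A rough sketch of running a script file with -f follows. The file name count.awk and its contents are hypothetical, and the script is written to a temporary directory so the example cleans up after itself.

```shell
# Create a throwaway script file; the name count.awk is hypothetical.
dir=$(mktemp -d)
cat > "$dir/count.awk" <<'EOF'
# count.awk: count input lines and print the total at the end
{ n++ }
END { print "lines: " n }
EOF

# Run the script with -f; input comes from standard input here.
count=$(printf 'a\nb\nc\n' | awk -f "$dir/count.awk")
echo "$count"
rm -r "$dir"
```

Keeping the program in a file avoids shell quoting problems once a script grows beyond a line or two.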
awk basic syntax
- awk variables do not need to be declared in advance, and no type is specified for them
$ awk 'BEGIN{sum=1;print sum}'
1
- Flow control
#--------Pseudo code 1---------
if ({condition})
    Code logic...
else if ({condition})
    Code logic...
else
    Code logic...
#--------Pseudo code 2---------
for ({initialization}; {condition}; {subsequent logic}) {
    Code logic...
}
#--------Pseudo code 3---------
while ({condition}) {
    Code logic...
}
#--------Pseudo code 4---------
do {
    Code logic...
} while ({condition})
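The four control structures can be tried out in a single BEGIN block. This is only an illustrative sketch; the variable names `flow`, `i` and `n` are arbitrary.

```shell
flow=$(awk 'BEGIN {
  # for loop with if / else if / else
  for (i = 1; i <= 3; i++) {
    if (i == 1)      print i, "one"
    else if (i == 2) print i, "two"
    else             print i, "many"
  }
  # while loop: count down to zero
  n = 3
  while (n > 0) n--
  print "while ended with n =", n
  # do-while runs its body at least once, even though n < 0 is false
  do { n++ } while (n < 0)
  print "do-while ended with n =", n
}')
echo "$flow"
```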
- Operators are largely the same as in the Java programming language. A few of them are listed below
Symbol | Description | Example |
---|---|---|
^ | Exponential operator | a = a ^ 2 |
-/+ | unary operator | a = -10; a = +a; |
condition ? value1 : value2 | Ternary operator | max = (a > b) ? a : b; |
&& / \|\| | Logical operators | if (num >= 0 && num <= 7) |
== / != | Equal to or not equal to | if (a == b) |
$ awk 'BEGIN{sum=1;sum++; if(sum==2) print sum}'
2
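The operators from the table can be exercised together in a short sketch. The values of `a`, `b` and `num` are arbitrary and chosen only for illustration.

```shell
ops=$(awk 'BEGIN {
  a = 3; b = 5
  print a ^ 2                                  # exponentiation
  max = (a > b) ? a : b                        # ternary operator
  print max
  num = 4
  if (num >= 0 && num <= 7) print "in range"   # logical AND
  if (a != b) print "different"                # inequality
}')
echo "$ops"
```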
- Arrays. AWK supports associative arrays: array indexes can be strings as well as numbers. To delete an array element, use the delete statement, for example delete arr[0]
$ awk 'BEGIN {arr["lwl"] = 1; arr["csc"] = 2; for (i in arr) printf "arr[%s] = %d\n", i, arr[i]}'
arr[lwl] = 1
arr[csc] = 2
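To see delete in action, here is a small sketch that removes one element and checks for it with the `in` operator. The extra key 0 and the counter `n` are added purely for the example. Because the order of a for (k in arr) loop is unspecified, the sketch counts elements instead of printing them in order.

```shell
arrays=$(awk 'BEGIN {
  arr["lwl"] = 1; arr["csc"] = 2; arr[0] = 3
  delete arr[0]                            # remove one element
  if (!(0 in arr)) print "arr[0] deleted"  # "in" tests for a key without creating it
  n = 0
  for (k in arr) n++                       # iteration order is unspecified
  print "remaining elements:", n
}')
echo "$arrays"
```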
- String operations
---- Strings are concatenated by writing them next to each other, separated by a space; there is no explicit concatenation operator ----
$ awk 'BEGIN { str1 = "csc, "; str2 = "lwl"; str3 = str1 str2; print str3 }'
csc, lwl
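- String-related built-in functions
index(str, sub)          # Return the 1-based start position of sub in str, or 0 if not found
length(str)              # Return the length of str
match(str, regex)        # Return the position where regex matches in str, or 0 if there is no match
split(str, arr, regex)   # Split str into array arr using regex as the separator; return the number of elements
sub(regex, sub, string)  # Replace the first match of regex in string with sub
substr(str, start, l)    # Return the substring of str of length l starting at position start
tolower(str)             # Return str converted to lowercase
toupper(str)             # Return str converted to uppercase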
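A quick sketch of several of these functions applied to one string; the sample text and the variable names `s`, `parts` and `strings` are invented for illustration.

```shell
strings=$(awk 'BEGIN {
  s = "Hello, awk world"
  print index(s, "awk")            # 1-based position of "awk"
  print length(s)                  # number of characters
  print substr(s, 8, 3)            # 3 characters starting at position 8
  print toupper(substr(s, 1, 5))   # upper-case copy of the first 5 characters
  n = split("a:b:c", parts, ":")   # split on ":" into parts[1..n]
  print n, parts[1], parts[3]
  sub(/world/, "user", s)          # replace the first match in s
  print s
}')
echo "$strings"
```

Note that sub modifies its third argument in place, while the other functions shown here return a value and leave their arguments unchanged (split fills the array passed to it).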
Regular expressions
- Match operators: ~ and !~ represent a match and a non-match, respectively
$ awk '$0 !~ 9' marks.txt
1) Amit Physics 80
3) Shyam Biology 87
- Match operators and regular expressions
# Contents of log.txt
1 csc world
2 lwl hello
---------- Print lines whose second column contains lwl ----------
$ awk '$2 ~ /lwl/ {print $2,$3}' log.txt
lwl hello
---------- Print lines that contain csc ----------
$ awk '/csc/ {print $0}' log.txt
1 csc world
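The sketch below recreates the two-line log.txt in a temporary directory and applies both operators, one to a single field and one to the whole line. The patterns /^l/ and /hello/ are chosen only for the example.

```shell
# Recreate the two-line log.txt in a temporary directory.
dir=$(mktemp -d)
printf '1 csc world\n2 lwl hello\n' > "$dir/log.txt"

matched=$(awk '$2 ~ /^l/ { print $1 }' "$dir/log.txt")   # second field starts with "l"
unmatched=$(awk '$0 !~ /hello/' "$dir/log.txt")          # lines that do not contain "hello"
echo "$matched"
echo "$unmatched"
rm -r "$dir"
```

A pattern with no action, as in the second command, prints every matching line in full.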
Built-in variables commonly used in awk programs
Variable | Description |
---|---|
$n | The nth field of the current record, separated by FS |
$0 | Complete input record |
ARGC | Number of command line arguments |
ARGV | Array containing command line arguments |
ENVIRON | Associative array of environment variables |
ERRNO | Description of the last system error |
FILENAME | Current file name |
FS | Field separator (default is a single space, which splits fields on runs of whitespace) |
IGNORECASE | If set to a non-zero value, matching ignores case (gawk extension) |
NF | Number of fields in a record |
NR | Number of records read so far, that is, the current line number, starting from 1 |
FNR | Similar to NR, but with multiple input files FNR is the line number within the current file |
OFS | Output field separator |
ORS | Output record separator (default is a newline character) |
RLENGTH | The length of the string matched by the match function |
RS | Record separator (default is a newline character) |
RSTART | Start position of the substring matched by the match function |
ARGIND | Index in ARGV of the file currently being processed (gawk extension) |
PROCINFO | Associative array containing process information, such as UID, process ID, etc |
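To make the difference between NR and FNR concrete, here is a small sketch that reads two files and also sets FS and OFS. The file names one.txt and two.txt and their contents are hypothetical.

```shell
dir=$(mktemp -d)
printf 'a:1\nb:2\n' > "$dir/one.txt"
printf 'c:3\n'      > "$dir/two.txt"

# FS splits input on ":", OFS joins the printed values with " | ".
vars=$(cd "$dir" && awk 'BEGIN { FS = ":"; OFS = " | " }
  { print FILENAME, NR, FNR, NF, $1 }' one.txt two.txt)
echo "$vars"
rm -r "$dir"
```

NR keeps counting across files (1, 2, 3), while FNR restarts at 1 when two.txt begins.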
- ARGC and ARGV: command-line arguments. ARGV[0] holds the command name, so the arguments themselves start at ARGV[1]
$ awk 'BEGIN { for (i = 0; i < ARGC; ++i) { printf "ARGV[%d] = %s\n", i, ARGV[i] } }' csc lwl
ARGV[0] = awk
ARGV[1] = csc
ARGV[2] = lwl
- ENVIRON: environment variables
$ awk 'BEGIN { print ENVIRON["USER"] }'
csc
- FILENAME: name of the current file
$ awk 'END {print FILENAME}' test.txt
test.txt
- RSTART, the first position of the string matched by the match function
$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
9
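RSTART is usually paired with RLENGTH to extract the matched text with substr. A minimal sketch, using the same sample string with a different, purely illustrative pattern:

```shell
found=$(awk 'BEGIN {
  s = "One Two Three"
  if (match(s, /T[a-z]+/)) {                 # leftmost match is "Two"
    print RSTART, RLENGTH                    # start position and length of the match
    print substr(s, RSTART, RLENGTH)         # extract the matched text
  }
}')
echo "$found"
```

match returns the leftmost match, so "Two" at position 5 is reported even though "Three" also fits the pattern.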