Linux three swordsmen -- grep command

regular expression

preface

The reason why regular expressions are introduced is that the grep command will use regular expressions, so they are explained at the beginning

1. What is a regular expression?

A regular expression is a way of describing a collection of strings. The representation of regular expressions is the combination of some special symbols, and each symbol represents some specific meaning. Matching combination defines a set of rules and methods, whose main function is to match qualified lines from a large number of text.

2. Usage scenarios of regular expressions

In Linux, the main use scenario of regular expressions is the three swordsmen of text processing. grep,sed,awk . In addition, the vi instruction also supports regular expressions.

3. Regular expression character representation

Regular expressions can be divided into basic regular expressions and extended regular expressions. The main differences are:

  • Basic regular expressions only recognize metacharacters, which mainly include: ^ $ [] *, see the table below for specific meanings
  • The extended regular expression adds () {}? +| Equal symbol

The following are the meanings of each metacharacter

Characters supported in extended regular

predefined character classes

4. Differences between them

We mentioned above that regularity includes basic regularity and extended regularity, but what is the difference between them? Where is it used? Next, we will mainly explain the differences among the three Linux swordsmen (grep,sed,awk)

  • Grep: in grep, if you only use the grep command, you can use the regular or predefined character class of the original character. If you want to use the characters contained in the extended regular, you must add the parameter - E after grep.

grep command

effect:

  • Used to print lines that match a given pattern

Syntax:

grep [options] PATTERN [FILE...]

grep [options] [-e PATTERN | -f FILE] [FILE...]

explain:

The grep instruction is used to search the contents of the FILE file of the given PATTERN (PATTERN). If the FILE content of the PATTERN is found from the FILE content, grep will display the matching line. If you do not specify any FILE, or if you give a FILE name of -, grep reads from standard input.

Alternatively, two variants can be used egrep and fgrep .  Egrep And grep -E Same. Fgrep And grep -F Same.

options: options

Note: the following NUM It represents a number and the number of rows
-A NUM perhaps --after-context=NUM
 In addition to the line that meets the criteria, and after the line is displayed NUM Content of line
-a perhaps--text
 Treat a binary file as a text file; It and--binary-files=text Options are equivalent.

-B NUM perhaps--before-context=NUM
 In addition to the line that meets the criteria, and before the line is displayed NUM The contents of the line.

-C NUM perhaps--context=NUM
 In addition to the line that meets the criteria, the lines before and after the line are displayed NUM Content of line

-b perhaps--byte-offset
 The byte offset of the current line in the input file is printed in front of each line of the output at the same time.

--colour[=WHEN] perhaps --color[=WHEN]
In the matched lines, the matched string is shaded. WHEN Can be never,always,or auto. 

-c perhaps--count
 Calculate the number of eligible rows

-d ACTION perhaps --directories=ACTION
 If the input file is a directory, use the action ACTION To deal with it.
By default, actions ACTION yes read,This means that the directory will be read as an ordinary file.
If action ACTION yes skip ,The directory will be skipped without processing.
If action ACTION yes recurse,grep All files in each directory will be read recursively. Doing so and-r Options are equivalent.

-E perhaps --extended-regexp
 take E The latter pattern is used as a regular expression.

-e PATTERN perhaps --regexp=PATTERN
 use PATTERN As a pattern for finding file contents(Support regular),However, multiple commands can be used in a single command-e option

-F perhaps --fixed-strings
 Will mode PATTERN Treat the list as a fixed string with a new line (newlines) Separate, just match one of them.

-f FILE perhaps--file=FILE
 From file FILE Get the mode in, one for each line. The empty file contains 0 patterns, so it doesn't match anything.

-G perhaps--basic-regexp
 Will mode PATTERN As a basic regular expression, this is the default.

-H perhaps --with-filename
 Print the file name for each match.

-h perhaps --no-filename
 When searching for multiple files, it is forbidden to prefix the output with the file name.

-i perhaps --ignore-case
 Ignore case differences

-L perhaps --files-without-match
 The matching file name cannot be found in the print file content

-l perhaps --files-with-matches
 Print out the file name after finding a match in the file content

-m NUM perhaps --max-count=NUM
 Find in NUM After a matching line, the file is no longer read.

-n perhaps --line-number
 Precede each line of output with its line number in the file it is in.

-o perhaps --only-matching
 Show only matching rows with PATTERN Matching parts.

--label=LABEL
 Treat matching output from standard input as if it were from an input file LABEL Value of

--line-buffering
 Using line buffering, it can be a performance penality.

-q, --quiet, --silent
 No information is displayed.

-R, -r, --recursive
 Recursively read all files in each directory. Doing so and -d recurse Options are equivalent.

--include=PATTERN
 Just search for matches PATTERN Search recursively in the directory when searching for files.

--exclude=PATTERN
 Search recursively in the directory, but skip matching PATTERN File.

-s perhaps --no-messages
 Disable the output of error messages about files that do not exist or are unreadable.

-u perhaps --unix-byte-offsets
 report Unix Byte offset of style. This switch makes grep When reporting byte offsets, treat files as Unix
 Style of text file, that is to say, will CR Character removal. This will be produced with in one machine Unix Run on host grep Exactly the same result. Unless used at the same time-b Option, otherwise this option is invalid. This option is in MS-DOS and MS-Windows Invalid in system other than.

-V perhaps --version
 Print to standard error output grep Version number of the.

-v perhaps invert-match
 Displays all rows that do not contain matching patterns.

-w perhaps --word-regexp
 Select only rows that contain matches that make up a complete word. The judgment method is that the matching substring must be the beginning of a line, or in an impossible

-x perhaps --line-regexp
 Exactly.

-Z, --null
 Show all file contents,Different fonts are marked by color

a key

Although we can see from the above that there are many options in grep, most of them are unavailable in work. Let's focus here.

Common parameters

example

The file info is used and filtered through grep. The file contents of info are as follows:

1. Find the contents of ccc in the file info and print the number of lines

grep -n "ccc" info


2. Find and print the characters that contain ggg and ignore case in the file info,

grep -i "ggg" info


3. Filter out rows containing ccc

grep -v "ccc" info


4. Find the row containing DDD, EEE and FFF (Note: the following matching uses regular)

grep -E "ddd|eee|fff" info

5. Find lines starting with c

grep ^c info


6. Find lines that begin and end with ccx

grep ^ccx$ info

7. Find a line that can be any character before the d character

grep .d info

8. Find a line that contains one or more d characters

grep -E d{1} info


9. Display lines containing characters a, B and C; Displays lines that do not contain a,b,c characters

contain: grep -i ^[abc] info
 Excluding: grep -i [^abc] info (Yes, all characters are excluded a or b or c)


10. Display one or more lines containing the a character

grep -E a+ info


11. Find a line that starts with cc and contains c,x,ld characters

grep -E "cc(c|x|ld)" info


12. Find all uppercase characters in the file

grep [[:upper:]] info


13. Match any alphanumeric character

grep [[:alnum:]] info

Keywords: Linux

Added by sandy1028 on Fri, 31 Dec 2021 21:08:51 +0200