Java regular expressions

What is a regular expression?

A regular expression (often abbreviated as regex, regexp or RE in code) is a concept from computer science. Regular expressions are typically used to search for and replace text that matches a certain pattern (rule). (Source: Baidu Encyclopedia)

How to use regular expressions in Java

Using a regular expression in Java takes five steps:

  1. Prepare the regular expression to match
  2. Compile it into a Pattern object (Pattern.compile)
  3. Create a Matcher for the input text (pattern.matcher)
  4. Iterate over the matches (matcher.find)
  5. Get each matching result (matcher.group)

demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegTheory {
    public static void main(String[] args) {
        String content = "On December 8, 1998, the Enterprise Edition of the second-generation Java platform, J2EE, was released. In June 1999, Sun released three versions of the second-generation Java platform (Java 2 for short): J2ME (Java 2 Micro Edition), for mobile, wireless and other resource-constrained environments; J2SE (Java 2 Standard Edition), for desktop environments; and J2EE (Java 2 Enterprise Edition), for Java application servers. The release of the Java 2 platform was the most important milestone in the development of Java and marked the beginning of its widespread adoption.";
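        // \d (written \\d in a string literal) matches any single digit from 0 to 9;
        // the two pairs of parentheses form capturing groups 1 and 2
        String regStr = "(\\d\\d)(\\d\\d)";
        // Compile the regular expression into a Pattern object
        Pattern compile = Pattern.compile(regStr);
        // Create a matcher over the input text
        Matcher matcher = compile.matcher(content);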
        /*
         * matcher.find():
         * 1. Searches for the next substring that matches the pattern (for example, 1998).
         * 2. When a match is found, stores its start index in the int[] field groups: groups[0] = start.
         * 3. Stores the end index + 1 in groups[1]. Capturing group n is recorded the same way in
         *    groups[2n] and groups[2n + 1] (here "19" in groups[2..3] and "98" in groups[4..5]).
         * 4. Also sets oldLast to the end index + 1 (where the next search starts).
         *
         * matcher.group(n), as it appears in the Java 8 JDK source:
         *
         * public String group(int group) {
         *     if (first < 0)
         *         throw new IllegalStateException("No match found");
         *     if (group < 0 || group > groupCount())
         *         throw new IndexOutOfBoundsException("No group " + group);
         *     if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
         *         return null;
         *     return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
         * }
         *
         * group(0) uses the recorded positions groups[0] and groups[1] to cut the matched
         * substring out of content and return it.
         */
        // Iterate over all matches
        while (matcher.find()) {
            // Print the whole match (group 0)
            System.out.println(matcher.group(0));
        }
    }
}

Result:
1998
1999
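The comment in the demo explains that find() records the start index and end index + 1 of each match. Matcher exposes the same positions publicly through start() and end(), and through start(n)/end(n) for group n, so you can observe this bookkeeping without reading JDK internals. The following is a minimal sketch. The class name MatchPositions, the helper describeMatches and the short sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositions {
    // For every match, report group 0 and each capturing group as text[start,end),
    // where end is exclusive (last index + 1), mirroring groups[2n] and groups[2n + 1]
    static List<String> describeMatches(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            StringBuilder sb = new StringBuilder();
            for (int g = 0; g <= matcher.groupCount(); g++) {
                if (g > 0) {
                    sb.append(' ');
                }
                sb.append(matcher.group(g)).append('[')
                  .append(matcher.start(g)).append(',')
                  .append(matcher.end(g)).append(')');
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, shorter than the demo above
        String text = "born 1998, released 1999";
        for (String line : describeMatches("(\\d\\d)(\\d\\d)", text)) {
            System.out.println(line);
        }
        // 1998[5,9) 19[5,7) 98[7,9)
        // 1999[20,24) 19[20,22) 99[22,24)
    }
}
```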

Regular expression syntax

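Escape character: \ (written as two backslashes, \\, inside a Java string literal)

A backslash marks a special character so that it is matched literally instead of being treated as regex syntax.
Escape characters are part of the formal grammar of many programming languages, data formats and communication protocols.
The characters that should be escaped when you want to match them literally are:

. * + ? ( ) [ ] { } ^ $ | \

(/ has no special meaning in Java regular expressions, so it does not need escaping, although \\/ is also accepted.)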
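The following sketch shows the difference that escaping makes. An unescaped "." matches any character, while "\\." matches only a dot. Pattern.quote escapes a whole string at once. The class name EscapeDemo, the helper findAll and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Collects every substring of text that the regex matches, left to right
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "abc.$(abc(123(";
        // Unescaped "." matches every character here (none is a line terminator)
        System.out.println(findAll(".", text).size());          // 14
        // Escaped "\\." matches only the literal dot
        System.out.println(findAll("\\.", text));               // [.]
        // "(" must be escaped, otherwise it opens a group
        System.out.println(findAll("\\(", text));               // [(, (, (]
        // Pattern.quote turns the whole string into a literal pattern
        System.out.println(findAll(Pattern.quote("$("), text)); // [$(]
    }
}
```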

Regular expression metacharacters

Character classes

Symbol | Meaning | Example | Explanation
[ ] | List of acceptable characters | [abcd] | Any one of the characters a, b, c, d
[^ ] | List of unacceptable characters | [^efg] | Any single character except e, f and g
- | Range (inside [ ]) | [A-Z] | Any uppercase letter
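Here is a short sketch that runs the three table examples against one sample string. Each character class matches exactly one character per match. The class name CharClassDemo, the helper findAll and the input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Collects every substring of text that the regex matches, left to right
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a1Bf_Zx"; // hypothetical sample input
        System.out.println(findAll("[abcd]", text)); // [a]  (case-sensitive, so B is not matched)
        System.out.println(findAll("[^efg]", text)); // [a, 1, B, _, Z, x]
        System.out.println(findAll("[A-Z]", text));  // [B, Z]
    }
}
```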

Character matchers

(Patterns are written as they appear in a Java string literal, so every backslash is doubled.)

Symbol | Meaning | Example | Explanation | Sample matches
. | Any character except a line terminator such as \n | a..b | A 4-character string starting with a and ending with b | aaab, abbb, accb, a**b
\\d | A single digit, equivalent to [0-9] | \\d{3}(\\d)? | A string of 3 or 4 digits | 123, 1234
\\D | A single non-digit, equivalent to [^0-9] | \\D(\\d)* | A non-digit followed by any number of digits | a, a123
\\w | A single word character (digit, uppercase or lowercase letter, or underscore), equivalent to [a-zA-Z0-9_] | \\d{3}\\w{4} | A 7-character string: three digits followed by four word characters | 234bacd, 12345b7
\\W | A single non-word character, equivalent to [^a-zA-Z0-9_] | \\W+\\d{2} | At least one non-word character followed by two digits | #22, #@#10
\\s | A single whitespace character (space, tab, newline, etc.) | \\d{2}\\s\\d{2} | Two digits, one whitespace character, two digits | 11 22
\\S | A single non-whitespace character | \\d{2}\\S\\d{2} | Two digits, one non-whitespace character, two digits | 11322
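You can check the table rows mechanically with Pattern.matches, which succeeds only when the whole input matches. The following sketch pairs each pattern with an input and the expected result, including a few deliberate non-matches. The class name CharMatcherDemo, the CASES array and the helper allAsExpected are hypothetical.

```java
import java.util.regex.Pattern;

public class CharMatcherDemo {
    // Each row: pattern (as a Java string literal), input, expected whole-input result
    static final Object[][] CASES = {
        {"a..b", "a**b", true},
        {"a..b", "ab", false},               // each . needs exactly one character
        {"\\d{3}(\\d)?", "1234", true},
        {"\\d{3}(\\d)?", "12345", false},    // at most 4 digits
        {"\\D(\\d)*", "a123", true},
        {"\\D(\\d)*", "1a", false},          // must start with a non-digit
        {"\\d{3}\\w{4}", "12345b7", true},
        {"\\W+\\d{2}", "#@#10", true},
        {"\\d{2}\\s\\d{2}", "11 22", true},
        {"\\d{2}\\S\\d{2}", "11 22", false}, // a space is whitespace
        {"\\d{2}\\S\\d{2}", "11322", true},
    };

    // True if every row produces its expected result
    static boolean allAsExpected() {
        for (Object[] row : CASES) {
            boolean actual = Pattern.matches((String) row[0], (String) row[1]);
            if (actual != (Boolean) row[2]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Object[] row : CASES) {
            boolean actual = Pattern.matches((String) row[0], (String) row[1]);
            System.out.println(row[0] + " vs \"" + row[1] + "\" -> " + actual);
        }
        System.out.println("all as expected: " + allAsExpected());
    }
}
```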

Quantifiers

Symbol | Meaning | Example | Explanation | Sample matches
* | Repeat 0 or more times | (abc)* | Zero or more repetitions of abc | abc, abcabc
+ | Repeat 1 or more times | m+(abc)* | At least one m, followed by zero or more abc | m, mmmabcabc
? | Repeat 0 or 1 time | m+abc? | At least one m, followed by ab or abc | mab, mabc, mmabc
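The sketch below applies each quantifier to matching and non-matching inputs. Note that (abc)* also accepts the empty string, because zero repetitions are allowed. The class name QuantifierDemo and its helpers are hypothetical.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Each row: pattern, input, expected whole-input result
    static final Object[][] CASES = {
        {"(abc)*", "", true},            // zero repetitions
        {"(abc)*", "abcabc", true},
        {"(abc)*", "abcab", false},      // trailing "ab" is not a full repetition
        {"m+(abc)*", "m", true},
        {"m+(abc)*", "mmmabcabc", true},
        {"m+(abc)*", "abc", false},      // needs at least one m
        {"m+abc?", "mab", true},         // the c is optional
        {"m+abc?", "mmabc", true},
        {"m+abc?", "mabcc", false},      // at most one c
    };

    // True if every row produces its expected result
    static boolean allAsExpected() {
        for (Object[] row : CASES) {
            boolean actual = Pattern.matches((String) row[0], (String) row[1]);
            if (actual != (Boolean) row[2]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Object[] row : CASES) {
            boolean actual = Pattern.matches((String) row[0], (String) row[1]);
            System.out.println(row[0] + " vs \"" + row[1] + "\" -> " + actual);
        }
    }
}
```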

Anchors

Symbol | Meaning | Example | Explanation | Sample matches
^ | Start of the input | ^[0-9]+[a-z]* | Starts with at least one digit, followed by any number of lowercase letters | 123, 6ss, 333sd
$ | End of the input | ^[0-9]+[a-z]$ | Starts with at least one digit and ends with exactly one lowercase letter | 1a
\\b | Word boundary | cd\\b | cd at the end of a word (followed by a space, punctuation or the end of the input) | in "abcd efcdg", the cd of abcd
\\B | Not a word boundary | cd\\B | cd that is not at the end of a word | in "abcd efcdg", the cd of efcdg
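Anchors match positions, not characters. The following sketch makes this visible by printing the start index of each match. The class name AnchorDemo, the helper startIndexes and the sample inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Start index of every match of regex in text
    static List<Integer> startIndexes(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(startIndexes("[0-9]+", "12ab34"));      // [0, 4]
        System.out.println(startIndexes("^[0-9]+", "12ab34"));     // [0]  only at the start
        System.out.println(startIndexes("[0-9]+$", "12ab34"));     // [4]  only at the end
        System.out.println(startIndexes("cd\\b", "abcd efcdg"));   // [2]  cd followed by a space
        System.out.println(startIndexes("cd\\B", "abcd efcdg"));   // [7]  cd followed by g
    }
}
```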

Parentheses:

Parentheses group a sequence that is matched as a whole (and, by default, captured as a group). For example, (abc) matches the string "abc", and (a-c) matches only the literal string "a-c". To match any one of the characters "a", "b" or "c", use alternation: (a|b|c).

Brackets:

Brackets define a set of characters, exactly one of which is matched. For example, [abcd] matches any one of the characters a, b, c and d, which is equivalent to [a-d] and (a|b|c|d).

Note that inside [ ], | has no special meaning and matches the literal character "|".
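The following sketch contrasts parentheses and brackets on the same input. It shows that "-" is literal inside ( ) but forms a range inside [ ], and that | is literal inside [ ]. The class name GroupVsClassDemo, the helper findAll and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupVsClassDemo {
    // Collects every substring of text that the regex matches, left to right
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a-c abc b|c"; // hypothetical sample input
        System.out.println(findAll("(a-c)", text));   // [a-c]  the literal string
        System.out.println(findAll("[a-c]", text));   // [a, c, a, b, c, b, c]
        System.out.println(findAll("(a|b|c)", text)); // [a, c, a, b, c, b, c]
        System.out.println(findAll("[a|c]", text));   // [a, c, a, c, |, c]  | is literal here
    }
}
```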

Braces:

Braces specify how many times the preceding element must occur: {n} exactly n times, {n,} at least n times, and {n,m} at least n and at most m times.
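Here is a sketch of the three brace forms applied to the same run of five a's. With {2,3} the engine takes three characters where it can, leaving two for the next match. The class name BraceDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceDemo {
    // Collects every substring of text that the regex matches, left to right
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aaaaa";
        System.out.println(findAll("a{2}", text));   // [aa, aa]  exactly 2 each time
        System.out.println(findAll("a{2,}", text));  // [aaaaa]   2 or more
        System.out.println(findAll("a{2,3}", text)); // [aaa, aa] between 2 and 3
        System.out.println(findAll("a{6}", text));   // []        not enough a's
    }
}
```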

Case-insensitive matching

To ignore case, pass the flag Pattern.CASE_INSENSITIVE when compiling the Pattern:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp3 {
    public static void main(String[] args) {
        String context = "abcABCaBcAbC";
        String regStr = "abc";
        Pattern compile = Pattern.compile(regStr,Pattern.CASE_INSENSITIVE);
        Matcher matcher = compile.matcher(context);
        while (matcher.find()){
            System.out.println(matcher.group(0));
        }
    }
}
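Java also accepts the case-insensitive flag inline as (?i). It applies from the point where it appears to the end of the pattern, so "a(?i)bc" requires a lowercase a but accepts b and c in any case. The class name CaseFlagDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseFlagDemo {
    // Collects every match of regex in text, compiled with the given flags
    static List<String> findAll(String regex, String text, int flags) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "abcABCaBcAbC";
        System.out.println(findAll("abc", text, 0));                        // [abc]
        System.out.println(findAll("abc", text, Pattern.CASE_INSENSITIVE)); // [abc, ABC, aBc, AbC]
        System.out.println(findAll("(?i)abc", text, 0));                    // [abc, ABC, aBc, AbC]
        System.out.println(findAll("a(?i)bc", text, 0));                    // [abc, aBc]
    }
}
```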

Greedy and lazy matching

Greedy matching:

With {m,n}, the engine tries to match n repetitions first
With x+ or x*, it tries to match as many repetitions as possible
This is greedy matching, the default behavior of quantifiers
demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp7 {
    public static void main(String[] args) {
        String content = "1111111";
        String regStr = "\\d{2,3}";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()){
            System.out.println(matcher.group());
        }
    }
}

Output:

111
111

Lazy matching:

Appending ? to a quantifier (*, +, {m,n}) makes it lazy (non-greedy), so it matches as few repetitions as possible:
demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp7 {
    public static void main(String[] args) {
        String content = "1111111";
        String regStr = "\\d{2,3}?";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()){
            System.out.println(matcher.group());
        }
    }
}

Output:

11
11
11
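A common place where the difference matters is matching delimited text such as tags. A greedy .+ runs to the last possible closing character, while the lazy .+? stops at the first one. The following is a sketch with a hypothetical class name GreedyLazyDemo, helper findAll and sample text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyLazyDemo {
    // Collects every substring of text that the regex matches, left to right
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>it</i>";
        // Greedy: one match spanning from the first < to the last >
        System.out.println(findAll("<.+>", html));   // [<b>bold</b> and <i>it</i>]
        // Lazy: each match ends at the nearest >
        System.out.println(findAll("<.+?>", html));  // [<b>, </b>, <i>, </i>]
        // Lazy + takes just one digit per match
        System.out.println(findAll("\\d+?", "2021")); // [2, 0, 2, 1]
    }
}
```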

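Grouping

Each pair of parentheses () forms a capturing group
Groups are numbered from left to right by the position of their opening parenthesis, starting at 1; group 0 is the whole match
Use matcher.group(n) to get the nth group
demo: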

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp4 {
    public static void main(String[] args) {
        String content = "huangshiping s77872 nn2213han";
        String regStr = "(\\d(\\d))(\\d\\d)";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()){
            System.out.println(matcher.group(0));
            System.out.println(matcher.group(1));
            System.out.println(matcher.group(2));
            System.out.println(matcher.group(3));
        }
    }
}

Execution results:

7787
77
7
87
2213
22
2
13
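To see the numbering rule with deeper nesting, you can list every group of a single match using groupCount(). The following is a sketch with a hypothetical class name GroupNumberDemo, helper listGroups and input. Groups are numbered in the order their opening parentheses appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberDemo {
    // Returns "n=text" for every group of the first match, or an empty list if none
    static List<String> listGroups(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (matcher.find()) {
            for (int g = 0; g <= matcher.groupCount(); g++) {
                result.add(g + "=" + matcher.group(g));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Opening parentheses in order: ((a)(b(c))) -> 1 outer, 2 (a), 3 (b(c)), 4 (c)
        System.out.println(listGroups("((a)(b(c)))", "xabcx"));
        // [0=abc, 1=abc, 2=a, 3=bc, 4=c]
    }
}
```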

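Named groups

A group can be named with the syntax (?<name>...) and then retrieved by that name with matcher.group("name"). A named group is still numbered as well, so g1 below is also group 1:
demo: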

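import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp4 {
    public static void main(String[] args) {
        String content = "huangshiping s77872 nn2213han";
        // g1 is a named group; the inner (\\d) and the trailing (\\d\\d) stay unnamed
        String regStr = "(?<g1>\\d(\\d))(\\d\\d)";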
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()){
            System.out.println("g1: "+matcher.group("g1"));
        }
    }
}

Execution results:

g1: 77
g1: 22
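Named groups can also be used in replacement strings, where ${name} refers to the text captured by that group (supported by Matcher.replaceAll since Java 7). The following sketch reorders a date. The class name NamedGroupDemo, the DATE pattern, its group names and the sample text are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Hypothetical date pattern with three named groups
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // ${name} in the replacement inserts the text captured by that named group
    static String toDayMonthYear(String text) {
        return DATE.matcher(text).replaceAll("${day}/${month}/${year}");
    }

    // The named group "year" is also numbered group 1
    static String yearOf(String text) {
        Matcher matcher = DATE.matcher(text);
        return matcher.find() ? matcher.group("year") + "=" + matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Released on 2021-12-30.";
        System.out.println(toDayMonthYear(text)); // Released on 30/12/2021.
        System.out.println(yearOf(text));         // 2021=2021
    }
}
```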

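Non-capturing groups

Non-capturing groups and lookaheads let you match or test a subpattern without creating a numbered group, so the text they cover cannot be retrieved with matcher.group(n).
content = "jerry teacher, jerry classmate, jerry professor";

Symbol | Pattern | Matches
(?:pattern) | jerry (?:teacher|classmate) | "jerry teacher" and "jerry classmate"; the group only bounds the alternation and captures nothing
(?=pattern) | jerry(?= teacher| classmate) | "jerry" in "jerry teacher" and in "jerry classmate"; the lookahead is checked but is not part of the match
(?!pattern) | jerry(?! teacher| classmate) | "jerry" in "jerry professor", the only jerry not followed by " teacher" or " classmate"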
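The following sketch runs the three constructs on a sample string in which the role word follows the name. It prints the start index of each match so you can tell which "jerry" was matched. The class name NonCapturingDemo, the helper findAllWithIndex and the sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Returns "startIndex:matchedText" for every match
    static List<String> findAllWithIndex(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.start() + ":" + matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "jerry teacher, jerry classmate, jerry professor";
        System.out.println(findAllWithIndex("jerry (?:teacher|classmate)", content));
        // [0:jerry teacher, 15:jerry classmate]
        System.out.println(findAllWithIndex("jerry(?= teacher| classmate)", content));
        // [0:jerry, 15:jerry]
        System.out.println(findAllWithIndex("jerry(?! teacher| classmate)", content));
        // [32:jerry]
        // (?:...) creates no group, so only group 0 exists
        System.out.println(Pattern.compile("jerry (?:teacher|classmate)").matcher(content).groupCount()); // 0
    }
}
```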

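Common Pattern methods

matches: whole-input matching

Pattern.matches(regex, input) returns true only if the entire input matches the regular expression; a match on just part of the input is not enough.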

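import java.util.regex.Pattern;

public class PatternTest {
    public static void main(String[] args) {
        String content = "In real product design scenarios";
        // .* lets the pattern cover the rest of the input, so the whole string matches
        String regStr = "In.*";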
        boolean matches = Pattern.matches(regStr,content);
        System.out.println(matches);
    }
}

true

import java.util.regex.Pattern;

public class PatternTest {
    public static void main(String[] args) {
        String content = "In real product design scenarios";
        // "In" covers only the first two characters, not the whole input
        String regStr = "In";
        boolean matches = Pattern.matches(regStr,content);
        System.out.println(matches);
    }
}

false
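Pattern.matches behaves like compiling the pattern and calling matches() on a new Matcher. Matcher offers two looser alternatives: lookingAt(), where the match must start at the beginning but need not reach the end, and find(), where the match may occur anywhere. Pattern.split is another commonly used Pattern method. The following is a sketch. The class name MatchModesDemo and its helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MatchModesDemo {
    // Whole input must match (same behavior as Pattern.matches)
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Match must start at index 0 but may stop before the end
    static boolean matchesPrefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Match may occur anywhere in the input
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String content = "In real product design scenarios";
        System.out.println(matchesWhole("In", content));      // false
        System.out.println(matchesPrefix("In", content));     // true
        System.out.println(matchesAnywhere("real", content)); // true
        System.out.println(matchesPrefix("real", content));   // false
        // Pattern.split cuts the input around every match of the pattern
        System.out.println(Arrays.toString(Pattern.compile("\\s+").split(content)));
        // [In, real, product, design, scenarios]
    }
}
```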

Added by jimbo_head on Thu, 30 Dec 2021 00:06:35 +0200