What is a regular expression
A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept from computer science. Regular expressions are typically used to search for and replace text that matches a given pattern (rule). (Source: Baidu Encyclopedia)
How to use regular expressions in Java
Using a regular expression in Java takes five steps:
- Write the regular expression to match
- Create the regular expression object (Pattern)
- Create the matcher (Matcher)
- Iterate over the matches with find()
- Read each matching result with group()
demo:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegTheory {
        public static void main(String[] args) {
            String content = "On December 8, 1998, the second-generation Java enterprise platform, J2EE, was released. "
                    + "In June 1999, Sun released three editions of the second-generation Java platform (Java2): "
                    + "J2ME (Java2 Micro Edition), for mobile, wireless, and resource-constrained environments; "
                    + "J2SE (Java 2 Standard Edition), for desktop environments; "
                    + "and J2EE (Java 2 Enterprise Edition), for Java application servers. "
                    + "The release of the Java 2 platform was the most important milestone in Java's development "
                    + "and marked the point at which Java applications became widespread.";
            // \d matches any single digit from 0 to 9
            String regStr = "(\\d\\d)(\\d\\d)";
            // Create the regular expression object
            Pattern compile = Pattern.compile(regStr);
            // Create the matcher
            Matcher matcher = compile.matcher(content);
            /*
             * matcher.find()
             * 1. Locates the next substring that satisfies the pattern (for example, 1998).
             * 2. When found, records the start index of the match in groups[0].
             * 3. Records the end index + 1 of the match in groups[1].
             * 4. Also records end index + 1 in oldLast, the position where the next search starts.
             *
             * matcher.group()
             *
             * public String group(int group) {
             *     if (first < 0)
             *         throw new IllegalStateException("No match found");
             *     if (group < 0 || group > groupCount())
             *         throw new IndexOutOfBoundsException("No group " + group);
             *     if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
             *         return null;
             *     return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
             * }
             *
             * group(0) therefore returns the substring of content from groups[0] to groups[1].
             */
            // Iterate over the matches
            while (matcher.find()) {
                // Output each result
                System.out.println(matcher.group(0));
            }
        }
    }
result
1998
1999
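Regular expression syntax
Escape character: \ (written as two backslashes, \\, inside a Java string literal)
It marks a special character so that it is treated as an ordinary literal character.
Escape characters are part of the formal grammar of many programming languages, data formats and communication protocols.
The characters that must be escaped to match them literally are as follows:
. * + ( ) $ \ ? [ ] ^ { } |
The forward slash / has no special meaning in Java regular expressions and does not need to be escaped.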
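As a minimal sketch of the escaping rule, the following compares an unescaped dot with an escaped one. The class name EscapeDemo, the helper count, and the sample input are made up for illustration; Pattern.quote is the standard JDK method for escaping a whole literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Counts how many times regex matches inside input (hypothetical helper).
    static int count(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "a.b(c)";
        // An unescaped "." matches any character, so all six characters count.
        System.out.println(count(".", input));                  // 6
        // "\\." in Java source is the regex \. and matches only the literal dot.
        System.out.println(count("\\.", input));                // 1
        // Pattern.quote escapes an entire literal string in one step.
        System.out.println(count(Pattern.quote("(c)"), input)); // 1
    }
}
```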
Common matching characters
Character sets
Symbol | Meaning | Example | Explanation |
---|---|---|---|
[ ] | Set of accepted characters | [abcd] | Matches any one of the characters a, b, c, d |
[^ ] | Set of excluded characters | [^efg] | Matches any one character except e, f, g |
- | Range (hyphen inside a set) | [A-Z] | Matches any one uppercase letter |
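The three rows above can be tried out with a short sketch. The class name CharSetDemo, the helper findAll, and the sample inputs are illustrative assumptions, not part of the original demo code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSetDemo {
    // Collects every substring of input matched by regex, in order (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only characters listed in the set match; "e" is skipped.
        System.out.println(findAll("[abcd]", "bead"));       // [b, a, d]
        // A leading ^ inverts the set; only "i" is outside e, f, g.
        System.out.println(findAll("[^efg]", "fig"));        // [i]
        // A hyphen inside the set denotes a range.
        System.out.println(findAll("[A-Z]", "Hello World")); // [H, W]
    }
}
```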
Matching characters
The symbols below are shown in regex syntax; in a Java string literal each backslash is doubled (for example "\\d").
Symbol | Meaning | Example | Explanation | Matching results |
---|---|---|---|---|
. | Matches any single character except \n | a..b | Any 4-character string starting with a and ending with b | aaab, abbb, accb, a**b |
\d | Matches a single digit (0-9) | \d{3}(\d)? | A string of 3 or 4 digits | 123, 1234 |
\D | Matches a single non-digit character | \D(\d)* | A non-digit followed by zero or more digits | a, a123 |
\w | Matches a single digit, letter (upper or lower case), or underscore | \d{3}\w{4} | A 7-character string of three digits followed by four word characters | 234bacd, 12345b7 |
\W | Matches a single character that is not a digit, letter, or underscore | \W+\d{2} | At least one non-word character followed by two digits | #22, #@#10 |
\s | Matches a single whitespace character | \d{2}\s\d{2} | Two digits, one whitespace character, two digits | 11 22 |
\S | Matches a single non-whitespace character | \d{2}\S\d{2} | Two digits, one non-whitespace character, two digits | 11322 |
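A small sketch can check these classes against whole strings. The class name PredefinedClassDemo and the helper fullMatch are hypothetical; Pattern.matches is the standard call that requires the entire input to match.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True only when the whole input matches regex (thin wrapper, hypothetical name).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a..b", "a**b"));               // true
        System.out.println(fullMatch("\\d{3}(\\d)?", "1234"));       // true
        System.out.println(fullMatch("\\d{3}(\\d)?", "12"));         // false
        System.out.println(fullMatch("\\D(\\d)*", "a123"));          // true
        // \w also accepts the underscore, not only letters and digits.
        System.out.println(fullMatch("\\w", "_"));                   // true
        System.out.println(fullMatch("\\W+\\d{2}", "#@#10"));        // true
        System.out.println(fullMatch("\\d{2}\\s\\d{2}", "11 22"));   // true
        System.out.println(fullMatch("\\d{2}\\S\\d{2}", "11 22"));   // false
    }
}
```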
Quantifiers
Symbol | Meaning | Example | Explanation | Matching input |
---|---|---|---|---|
* | Repeat zero or more times | (abc)* | Zero or more repetitions of abc | abc, abcabc |
+ | Repeat one or more times | m+(abc)* | At least one m followed by zero or more abc | m, mmmabcabc |
? | Repeat zero or one time | m+abc? | At least one m followed by ab or abc | mab, mmabc |
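To see what each quantifier actually consumes, here is a minimal sketch. The class name QuantifierDemo, the helper findAll, and the sample inputs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every substring of input matched by regex, in order (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // + needs at least one m; the trailing "abc" has none and is skipped.
        System.out.println(findAll("m+(abc)*", "m mmmabcabc abc")); // [m, mmmabcabc]
        // ? makes only the final c optional, so at most one c is consumed.
        System.out.println(findAll("m+abc?", "mab mmabc mabcc"));   // [mab, mmabc, mabc]
        // * allows zero repetitions, so even the empty string matches.
        System.out.println(Pattern.matches("(abc)*", ""));          // true
    }
}
```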
Anchors
Symbol | Meaning | Example | Explanation | Matches |
---|---|---|---|---|
^ | Start of input | ^[0-9]+[a-z]* | Begins with at least one digit, followed by zero or more lowercase letters | 123, 6ss, 333sd |
$ | End of input | ^[0-9]+[a-z]$ | Begins with at least one digit and ends with exactly one lowercase letter | 1a |
\b | Word boundary | cd\b | cd at the end of a word | The cd of abcd in "abcd efcdg" |
\B | Not a word boundary | cd\B | cd that is not at the end of a word | The cd of efcdg in "abcd efcdg" |
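The difference between \b and \B is easiest to see from match positions. The sketch below is illustrative only; the class name AnchorDemo and the helper starts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Returns the start index of every match of regex in input (hypothetical helper).
    static List<Integer> starts(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            positions.add(matcher.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "abcd efcdg";
        // \b requires a word boundary after cd: only the cd in "abcd" qualifies.
        System.out.println(starts("cd\\b", input));      // [2]
        // \B requires that there is no boundary: only the cd in "efcdg" qualifies.
        System.out.println(starts("cd\\B", input));      // [7]
        // ^ pins the match to the start of the input.
        System.out.println(starts("[0-9]+", "123 456")); // [0, 4]
        System.out.println(starts("^[0-9]+", "123 456")); // [0]
    }
}
```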
Parentheses:
Group a sequence that is matched as a whole. For example, (abc) matches the string "abc", and (a-c) matches only the literal string "a-c". To match any one of the characters "a", "b", "c", use (a|b|c).
Brackets:
Define a set of characters, any one of which may match. For example, [abcd] matches any one of the characters a, b, c and d, which is equivalent to [a-d] and (a|b|c|d).
Note that inside [], a | is treated as the literal character "|" rather than as alternation.
Braces:
Specify the number of repetitions: {n} means exactly n times, {n,} means at least n times, and {n,m} means at least n and at most m times.
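The following sketch checks the three kinds of brackets side by side. The class name BracketDemo and the helper fullMatch are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class BracketDemo {
    // True only when the whole input matches regex (thin wrapper, hypothetical name).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside parentheses a hyphen is literal: (a-c) matches only "a-c".
        System.out.println(fullMatch("(a-c)", "a-c"));      // true
        System.out.println(fullMatch("(a-c)", "b"));        // false
        // Alternation and a character range both accept a single "b".
        System.out.println(fullMatch("(a|b|c)", "b"));      // true
        System.out.println(fullMatch("[a-c]", "b"));        // true
        // Inside brackets | is an ordinary character.
        System.out.println(fullMatch("[a|b]", "|"));        // true
        // Braces bound the repetition count.
        System.out.println(fullMatch("\\d{2}", "12"));      // true
        System.out.println(fullMatch("\\d{2,}", "12345"));  // true
        System.out.println(fullMatch("\\d{2,3}", "1234"));  // false
    }
}
```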
Case sensitivity
Matching is case-sensitive by default. To ignore case, pass the flag Pattern.CASE_INSENSITIVE when creating the Pattern:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp3 {
        public static void main(String[] args) {
            String context = "abcABCaBcAbC";
            String regStr = "abc";
            // The second argument makes the match ignore case
            Pattern compile = Pattern.compile(regStr, Pattern.CASE_INSENSITIVE);
            Matcher matcher = compile.matcher(context);
            while (matcher.find()) {
                System.out.println(matcher.group(0));
            }
        }
    }
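As an alternative that the text above does not cover, Java also accepts the embedded flag (?i) inside the pattern itself. The sketch below assumes the same sample string; the class name InlineFlagDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineFlagDemo {
    // Collects every substring of input matched by regex, in order (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String context = "abcABCaBcAbC";
        // Without a flag only the exact lowercase occurrence matches.
        System.out.println(findAll("abc", context));     // [abc]
        // (?i) switches on case-insensitive matching for the rest of the pattern.
        System.out.println(findAll("(?i)abc", context)); // [abc, ABC, aBc, AbC]
    }
}
```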
Greedy and lazy matching
Greedy matching:
With {m,n}, the engine tries to match as many as n repetitions.
With x+ or x*, the engine tries to match as many repetitions as possible.
This default behavior is called greedy matching.
demo:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp7 {
        public static void main(String[] args) {
            String content = "1111111";
            // Greedy: each match takes three digits while three are available
            String regStr = "\\d{2,3}";
            Pattern pattern = Pattern.compile(regStr);
            Matcher matcher = pattern.matcher(content);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
111
111
Lazy matching:
A lazy (non-greedy) match is formed by appending a "?" to a quantifier (*, +, {m,n}); the engine then matches as few repetitions as possible:
demo:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp7 {
        public static void main(String[] args) {
            String content = "1111111";
            // Lazy: the trailing ? makes each match stop at two digits
            String regStr = "\\d{2,3}?";
            Pattern pattern = Pattern.compile(regStr);
            Matcher matcher = pattern.matcher(content);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
Output results:
11
11
11
Grouping
Each pair of parentheses () forms a group.
Groups are numbered from left to right by the position of their opening parenthesis.
Use matcher.group(n) to get the nth group; group(0) is the entire match.
demo:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp4 {
        public static void main(String[] args) {
            String content = "huangshiping s77872 nn2213han";
            // Group 1 is (\d(\d)), group 2 is the inner (\d), group 3 is (\d\d)
            String regStr = "(\\d(\\d))(\\d\\d)";
            Pattern pattern = Pattern.compile(regStr);
            Matcher matcher = pattern.matcher(content);
            while (matcher.find()) {
                System.out.println(matcher.group(0));
                System.out.println(matcher.group(1));
                System.out.println(matcher.group(2));
                System.out.println(matcher.group(3));
            }
        }
    }
Execution results:
7787
77
7
87
2213
22
2
13
Named group
You can get groups by naming them:
demo:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp4 {
        public static void main(String[] args) {
            String content = "huangshiping s77872 nn2213han";
            // Named group: (?<g1>...) gives the first group the name g1
            String regStr = "(?<g1>\\d(\\d))(\\d\\d)";
            Pattern pattern = Pattern.compile(regStr);
            Matcher matcher = pattern.matcher(content);
            while (matcher.find()) {
                System.out.println("g1: " + matcher.group("g1"));
            }
        }
    }
Execution results:
g1: 77
g1: 22
Non-capturing groups
Non-capturing constructs group or test part of a pattern without creating a numbered capture group. In the sample text below, the title follows the name so that the lookahead constructs apply:
content = "jerry teacher, jerry classmate, jerry professor";
- (?:...) non-capturing group: jerry (?:teacher|classmate) matches "jerry teacher" and "jerry classmate"; the parenthesized part is not stored as a group.
- (?=...) positive lookahead: jerry(?= teacher| classmate) matches only the first two occurrences of "jerry", and the title is not part of the match.
- (?!...) negative lookahead: jerry(?! teacher| classmate) matches only the "jerry" in "jerry professor".
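A minimal sketch of the three constructs (?:...), (?=...) and (?!...) follows. It assumes a sample string in which the title comes after the name, so that lookahead has something to inspect; the class name LookaheadDemo and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns each match as "startIndex:text" (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            hits.add(matcher.start() + ":" + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed sample text: name first, title second.
        String content = "jerry teacher, jerry classmate, jerry professor";
        // (?:...) groups the alternatives but creates no capture group.
        System.out.println(findAll("jerry (?:teacher|classmate)", content));
        // prints [0:jerry teacher, 15:jerry classmate]
        // (?=...) requires the title to follow but leaves it out of the match.
        System.out.println(findAll("jerry(?= teacher| classmate)", content));
        // prints [0:jerry, 15:jerry]
        // (?!...) matches only where the listed titles do not follow.
        System.out.println(findAll("jerry(?! teacher| classmate)", content));
        // prints [32:jerry]
    }
}
```

Because (?:...) does not capture, groupCount() for the first pattern is 0, so only group(0) is available.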
Common Pattern methods
matches: whole-input matching (the pattern must match the entire string, not just part of it)
    import java.util.regex.Pattern;

    public class PatternTest {
        public static void main(String[] args) {
            String content = "In real product design scenarios";
            // "In" followed by anything covers the whole input
            String regStr = "In.*";
            boolean matches = Pattern.matches(regStr, content);
            System.out.println(matches);
        }
    }
true
    import java.util.regex.Pattern;

    public class PatternTest {
        public static void main(String[] args) {
            String content = "In real product design scenarios";
            // "In" alone matches only a prefix, so matches returns false
            String regStr = "In";
            boolean matches = Pattern.matches(regStr, content);
            System.out.println(matches);
        }
    }
false
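For contrast, the sketch below sets Pattern.matches against Matcher.find, which succeeds as soon as any substring matches. The class name MatchesVsFindDemo and both helper names are made up for this illustration; the sample sentence is the one from the demo above.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Whole-input match, the same check Pattern.matches performs.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring match: true if regex matches anywhere in input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String content = "In real product design scenarios";
        System.out.println(wholeMatch("In", content));         // false
        System.out.println(containsMatch("In", content));      // true
        System.out.println(wholeMatch("In.*", content));       // true
        System.out.println(containsMatch("design", content));  // true
    }
}
```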