878. Regex quick start
// Pattern object
Pattern pattern = Pattern.compile("[0-9]+");
// Matcher object
Matcher matcher = pattern.matcher(content);
// Loop matching
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
879. Demand issues
880. Regex underlying implementation 1
881. Regex underlying implementation 2
// Matching rules
String regExp = "\\d\\d\\d\\d";
// Pattern object
Pattern pattern = Pattern.compile(regExp);
// Matcher object
Matcher matcher = pattern.matcher(content);
// Loop matching
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
/* Body of Matcher.group(int group): return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString(); */
- find() scans the given string from left to right and locates, one after another, the substrings that match the regular expression
- When a substring is located, the index of its first character and the index of its last character + 1 are recorded in groups[0] and groups[1] of the matcher's int[] groups field
- At the same time, the matcher's oldLast field is set to the value of groups[1], and the next search starts from oldLast
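The recorded positions can be observed through the public start() and end() methods. The following is a minimal sketch, assuming a hypothetical helper class FindPositions and a made-up sample input; it only reads the positions and does not touch the private groups field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositions {
    // Collects {start, end} for every match; end is the index of the last matched character + 1.
    static List<int[]> spans(String regExp, String content) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            result.add(new int[] {matcher.start(), matcher.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only
        for (int[] span : spans("\\d\\d\\d\\d", "ab1998cd2021")) {
            System.out.println(span[0] + " to " + span[1]);
        }
        // prints "2 to 6" and then "8 to 12"
    }
}
```

The second match starts at index 8, after the end of the first one, which is consistent with each search resuming where the previous match stopped.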
882. Regex underlying implementation 3
- Grouping: in a regular expression, each pair of parentheses forms a group, and groups are numbered from 1
String regExp = "(\\d)(\\d)(\\d)(\\d)";
- With grouping, groups[0] and groups[1] still record the index of the first character of the whole match and the index of its last character + 1. From groups[2] onwards, each pair of adjacent elements records, group by group, the index of that group's first character and the index of its last character + 1. For example, if 2020 occupies indices 323 to 326 of the given string and the pattern splits it into two groups, (\\d\\d)(\\d\\d):
groups[0] = 323, groups[1] = 327;
groups[2] = 323, groups[3] = 325;
groups[4] = 325, groups[5] = 327;
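The same layout can be reproduced with the public start(int) and end(int) methods. This is a sketch under assumptions: the class GroupOffsets and the short sample input are hypothetical, so the indices are 5 to 9 rather than 323 to 327.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOffsets {
    // Returns {start(0), end(0), start(1), end(1), ...} for the first match,
    // or an empty array if nothing matches.
    static int[] offsets(String regExp, String content) {
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        if (!matcher.find()) {
            return new int[0];
        }
        int[] result = new int[(matcher.groupCount() + 1) * 2];
        for (int group = 0; group <= matcher.groupCount(); group++) {
            result[group * 2] = matcher.start(group);
            result[group * 2 + 1] = matcher.end(group);
        }
        return result;
    }

    public static void main(String[] args) {
        // "2020" occupies indices 5 to 8 of this sample input
        int[] offsets = offsets("(\\d\\d)(\\d\\d)", "year 2020 ok");
        System.out.println(Arrays.toString(offsets)); // [5, 9, 5, 7, 7, 9]
    }
}
```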
883. Regex escape character
- Regular expression syntax: metacharacters
- Quantifiers
- Alternation
- Grouping, combining, and back-references
- Special characters
- Character-matching symbols
- Anchors
- Escape symbol \: to search for a character that has a special meaning in regular expressions, it must be escaped, otherwise the search does not return the expected result. In a Java string literal, two backslashes (\\) stand for the single \ used in other languages. The characters that need escaping mainly include: . * + ( ) $ / \ ? [ ] ^ { }
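A small sketch of the difference between an escaped and an unescaped metacharacter. The class EscapeDemo and the sample strings are hypothetical; Pattern.quote is the standard library method that escapes a whole literal at once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Counts how many times regExp matches inside content.
    static int count(String regExp, String content) {
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(".", "a.b.c"));      // 5: an unescaped dot matches every character
        System.out.println(count("\\.", "a.b.c"));    // 2: an escaped dot matches only literal dots
        System.out.println(count("\\(", "abc(def(")); // 2: literal opening parentheses
        // Pattern.quote escapes every metacharacter in the literal "1+1"
        System.out.println(count(Pattern.quote("1+1"), "1+1=2")); // 1
    }
}
```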
884. Regex character matching
Symbol | Meaning | Example | Explanation |
---|---|---|---|
[ ] | List of accepted characters | [efgh] | Any one of e, f, g, h |
[^] | List of rejected characters | [^abc] | Any character except a, b, c |
- | Hyphen (range inside brackets) | [A-Z] | Any single uppercase letter |
. | Any character except \n | a..b | A string of length 4 that starts with a and ends with b |
\\d | A digit, same as [0-9] | \\d{3}(\\d)? | A digit string of length 3 or 4 |
\\D | A non-digit, same as [^0-9] | \\D(\\d)* | A non-digit followed by zero or more digits |
\\w | A word character, same as [0-9a-zA-Z_] | \\d{3}\\w{4} | A string of length 7: 3 digits followed by 4 word characters |
\\W | A non-word character, same as [^0-9a-zA-Z_] | \\W+\\d{2} | One or more non-word characters followed by 2 digits |
\\s | Any whitespace character | \\d\\s\\D | A digit, then a whitespace character, then a non-digit |
\\S | Any non-whitespace character | \\S | Any single non-whitespace character |
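A few of the symbols in the table can be tried out with a small helper. This is only a sketch: the class CharClassDemo, the findAll helper, and the sample inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns every substring of content matched by regExp, in order.
    static List<String> findAll(String regExp, String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            result.add(matcher.group(0));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("[^abc]", "abcd"));                  // [d]
        System.out.println(findAll("a..b", "a12b axxb ab"));            // [a12b, axxb]
        System.out.println(findAll("\\d{3}\\w{4}", "id 123abcd, 45 x")); // [123abcd]
        System.out.println(findAll("\\d\\s\\D", "7 x, 8y"));            // [7 x]
    }
}
```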
885. Character matching case 1
- Java regular expressions can be made case-insensitive in two ways:
1. Embedded flag: (?i)abc makes a, b and c case-insensitive; a(?i)bc makes only b and c case-insensitive
2. Compile flag: Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
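Both approaches can be compared side by side. A minimal sketch, assuming a hypothetical class CaseInsensitiveDemo whose fullMatch helper takes the compile flags as a parameter (0 means no flags):

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // True if the whole content matches regExp compiled with the given flags.
    static boolean fullMatch(String regExp, int flags, String content) {
        return Pattern.compile(regExp, flags).matcher(content).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("(?i)abc", 0, "ABC")); // true: the flag covers a, b and c
        System.out.println(fullMatch("a(?i)bc", 0, "aBC")); // true: only b and c ignore case
        System.out.println(fullMatch("a(?i)bc", 0, "ABC")); // false: the leading a is still case-sensitive
        System.out.println(fullMatch("abc", Pattern.CASE_INSENSITIVE, "AbC")); // true
    }
}
```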
886. Character matching case 2
887. Alternation
Symbol | Meaning | Example | Explanation |
---|---|---|---|
| | Matches the expression before or after the bar | ab|cd | ab or cd |
888. Regex quantifiers
Symbol | Meaning | Example | Explanation |
---|---|---|---|
* | The preceding element occurs 0 or more times | (abc)* | Zero or more repetitions of abc |
+ | The preceding element occurs 1 or more times | m+(abc)* | At least one m, followed by zero or more repetitions of abc |
? | The preceding element occurs 0 or 1 time | m+abc? | At least one m, followed by ab or abc |
{n} | Exactly n occurrences | [abcd]{3} | A string of length 3 made up of the characters a, b, c, d |
{n,} | At least n occurrences | [abcd]{3,} | A string of length 3 or more made up of the characters a, b, c, d |
{n,m} | Between n and m occurrences, inclusive | [abcd]{3,5} | A string of length 3 to 5 made up of the characters a, b, c, d |
- Java matching is greedy by default: a quantifier tries to match as long a string as possible. For example, with str = "aaaa" and regExp = "a{3,4}", the result is aaaa
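The greedy default can be checked with a short sketch. The class GreedyDemo and its firstMatch helper are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first substring matched by regExp, or null if there is none.
    static String firstMatch(String regExp, String content) {
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        return matcher.find() ? matcher.group(0) : null;
    }

    public static void main(String[] args) {
        // Greedy: {3,4} takes 4 characters when 4 are available
        System.out.println(firstMatch("a{3,4}", "aaaa"));  // aaaa
        // Adding ? after the quantifier makes it take as few as possible
        System.out.println(firstMatch("a{3,4}?", "aaaa")); // aaa
    }
}
```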
889. Regex anchors
Symbol | Meaning | Example | Explanation |
---|---|---|---|
^ | Start of the input | ^[0-9]+[a-z] | Starts with at least one digit, followed by a lowercase letter |
$ | End of the input | ^[0-9]\\-[a-z]+$ | Starts with one digit, followed by a hyphen, and ends with at least one lowercase letter |
\\b | Word boundary | han\\b | han at the end of the string or followed by a space |
\\B | Not a word boundary | han\\B | han that is neither at the end of the string nor followed by a space |
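The anchors can be exercised as follows. This is a sketch with hypothetical names (AnchorDemo, count, found) and made-up sample strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Counts how many times regExp matches inside content.
    static int count(String regExp, String content) {
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    // True if regExp matches somewhere in content.
    static boolean found(String regExp, String content) {
        return Pattern.compile(regExp).matcher(content).find();
    }

    public static void main(String[] args) {
        String content = "hanshunping sphan nnhan";
        System.out.println(count("han\\b", content)); // 2: the han ending "sphan" and "nnhan"
        System.out.println(count("han\\B", content)); // 1: the han that starts "hanshunping"
        System.out.println(found("^[0-9]\\-[a-z]+$", "1-abc"));  // true
        System.out.println(found("^[0-9]\\-[a-z]+$", "12-abc")); // false: only one digit is allowed before the hyphen
    }
}
```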
890. Capturing groups
String regExp = "(\\d\\d)(\\d)(\\d)";
// matcher.group(0) -> whole match, \\d\\d\\d\\d
// matcher.group(1) -> first group, \\d\\d
// matcher.group(2) -> second group, \\d
// matcher.group(3) -> third group, \\d
- Named groups: (?<name>pattern) captures the matched substring into a group that can be retrieved by name as well as by number. In Java the name may contain only letters and digits, must begin with a letter, and must be written in angle brackets (the single-quote form used by some other regex flavors is not supported)
String regExp = "(?<one>\\d\\d)(?<two>\\d\\d)";
// matcher.group(0) -> whole match, \\d\\d\\d\\d
// matcher.group("one") -> \\d\\d
// matcher.group("two") -> \\d\\d
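A runnable sketch of named groups, assuming a hypothetical class NamedGroupDemo with a halves helper and the group names one and two:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern FOUR_DIGITS = Pattern.compile("(?<one>\\d\\d)(?<two>\\d\\d)");

    // Splits a four-digit string into its two halves using named groups;
    // returns an empty array if the input is not exactly four digits.
    static String[] halves(String content) {
        Matcher matcher = FOUR_DIGITS.matcher(content);
        if (!matcher.matches()) {
            return new String[0];
        }
        return new String[] {matcher.group("one"), matcher.group("two")};
    }

    public static void main(String[] args) {
        String[] parts = halves("1998");
        System.out.println(parts[0] + " " + parts[1]); // 19 98
        // Named groups keep their numbers as well: group(1) is the group named "one"
        Matcher matcher = FOUR_DIGITS.matcher("1998");
        if (matcher.matches()) {
            System.out.println(matcher.group(1)); // 19
        }
    }
}
```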
891. Non-capturing groups
- Non-capturing groups are not stored, so their content cannot be retrieved with matcher.group(1), matcher.group(2), and so on
"industr(?:y|ies)" <=> "industry|industries"
"windows(?=95|98|2000)": matches windows only when it is followed by 95, 98 or 2000 (as in windows95, windows98, windows2000); the version number is not part of the match
"windows(?!95|98|2000)": matches windows only when it is not followed by 95, 98 or 2000
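The three constructs can be verified with a short sketch. The class LookaheadDemo, its starts helper, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the start index of every match of regExp in content.
    static List<Integer> starts(String regExp, String content) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "windows95 windows10 windows2000";
        System.out.println(starts("windows(?=95|98|2000)", content)); // [0, 20]
        System.out.println(starts("windows(?!95|98|2000)", content)); // [10]
        // (?:...) groups without capturing, so the pattern has no numbered groups
        System.out.println(Pattern.compile("industr(?:y|ies)").matcher("industries").groupCount()); // 0
    }
}
```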
892. Non-greedy matching
// Non-greedy matching: match as short a string as possible
String regExp = "1+?";
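To see the effect, the greedy and non-greedy forms can be run against the same input. A minimal sketch; the class LazyDemo, the findAll helper, and the input "hello111" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDemo {
    // Returns every substring of content matched by regExp, in order.
    static List<String> findAll(String regExp, String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            result.add(matcher.group(0));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1+", "hello111"));  // [111]: greedy, one long match
        System.out.println(findAll("1+?", "hello111")); // [1, 1, 1]: non-greedy, three short matches
    }
}
```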
893. Regex application cases
// Match a basic Chinese character (U+4E00 to U+9FA5)
String regExp = "[\u4E00-\u9FA5]";
894. Validating a complex URL with regex
// Check whether the input is a web address
String regExp = "(((https|http)?://)?([a-z0-9]+[.])|(www.))\\w+[.|\\/]([a-z0-9]{0,})?[[.]([a-z0-9]{0,})]+((/[\\S&&[^,;\u4E00-\u9FA5]]+)+)?([.][a-z0-9]{0,}+|/?)";
895. Pattern class
// Whole-string match: checks whether the entire content matches the regular expression regExp
boolean isMatch = Pattern.matches(regExp, content);
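Because Pattern.matches requires the whole input to match, it behaves differently from Matcher.find. A sketch of the difference, using a hypothetical class PatternMatchesDemo and sample strings:

```java
import java.util.regex.Pattern;

class PatternMatchesDemo {
    // True only if the entire content matches regExp.
    static boolean wholeMatch(String regExp, String content) {
        return Pattern.matches(regExp, content);
    }

    // True if regExp matches any substring of content.
    static boolean partialMatch(String regExp, String content) {
        return Pattern.compile(regExp).matcher(content).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("hello", "hello world"));   // false: " world" is left over
        System.out.println(partialMatch("hello", "hello world")); // true: a substring matches
        System.out.println(wholeMatch("hello.*", "hello world")); // true: .* consumes the rest
    }
}
```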
896. Matcher class
Method | Description |
---|---|
int start() | Returns the start index of the current match |
int end() | Returns the index of the last matched character + 1 |
String replaceAll(String replacement) | Replaces every substring matched by the regular expression with the replacement and returns the new string |
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * @author Spring-_-Bear
 * @version 2021-11-11 20:28
 */
public class RegularExpression {
    public static void main(String[] args) {
        String content = "hello hell hello";
        String regExp = "hello";
        Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.print(matcher.start() + "\t");
            System.out.println(matcher.end());
        }
        // The original string content remains unchanged
        String newString = matcher.replaceAll("Li chunxiong");
        System.out.println(newString);
    }
}
897. Back-references
- Grouping: parentheses can be used to build a more complex matching pattern; each parenthesized part is a group (also called a subexpression)
- Capturing: the substring matched by a group is saved in memory under a number or an explicit name for later reference. Groups are numbered from left to right by the position of their opening parenthesis: the first group is 1, the second is 2, and so on. Group 0 represents the entire regular expression
- Back-reference: once the content of a group has been captured, it can be referred to after the closing parenthesis, which allows more practical matching patterns. The reference can be inside or outside the regular expression: inside, write \\ followed by the group number (for example \\1); outside, write $ followed by the group number (for example $1)
898. Back-reference cases
// Match 2 consecutive identical digits
String regExp = "(\\d)\\1";
// Match 5 consecutive identical digits
String regExp = "(\\d)\\1{4}";
// Match a 4-digit palindrome such as 1221: back-reference group 2, then group 1
String regExp = "(\\d)(\\d)\\2\\1";
// Match strings such as 12321-333999111
String regExp = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
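These patterns can be run against sample data. The following sketch assumes a hypothetical class BackrefDemo with a findAll helper; the inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Returns every substring of content matched by regExp, in order.
    static List<String> findAll(String regExp, String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            result.add(matcher.group(0));
        }
        return result;
    }

    public static void main(String[] args) {
        // \\2 must repeat what group 2 captured, \\1 what group 1 captured
        System.out.println(findAll("(\\d)(\\d)\\2\\1", "12321 1221 5775")); // [1221, 5775]
        System.out.println(findAll("(\\d)\\1", "1123 4556"));              // [11, 55]
        String serial = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
        System.out.println(Pattern.matches(serial, "12321-333999111"));    // true
    }
}
```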
899. Stutter de-duplication case
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
 * Stutter de-duplication
 *
 * @author Spring-_-Bear
 * @version 2021-11-11 20:28
 */
public class RegularExpression {
    public static void main(String[] args) {
        String content = "I...I...I...Yes, yes, yes..Yes, yes.learn....Java!";
        // 1. Remove all dots first
        String regExp = "\\.";
        Pattern pattern = Pattern.compile(regExp);
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");
        System.out.println(content);
        // 2. Match any character that is immediately repeated 1 to n times
        regExp = "(.)\\1+";
        // Equivalent one-liner: content = Pattern.compile(regExp).matcher(content).replaceAll("$1");
        pattern = Pattern.compile(regExp);
        matcher = pattern.matcher(content);
        // 3. Back-reference group 1 to replace each matched run with a single character, e.g. "III" -> "I"
        content = matcher.replaceAll("$1");
        System.out.println(content);
    }
}
900. Replace, split, and match
- The String class uses regular expressions in the following methods
- Replace: public String replaceAll(String regex, String replacement)
- Match: public boolean matches(String regex)
- Split: public String[] split(String regex)
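A short sketch of the three String methods in use. The class StringRegexDemo, its helper names, and the patterns (including the illustrative 11-digit number check) are assumptions, not rules taken from these notes.

```java
import java.util.Arrays;

class StringRegexDemo {
    // Replace: every run of digits becomes a single '#'.
    static String maskDigits(String content) {
        return content.replaceAll("\\d+", "#");
    }

    // Match: the whole string must be 11 digits starting with 13 or 18 (illustrative rule).
    static boolean looksLikeNumber(String content) {
        return content.matches("1[38]\\d{9}");
    }

    // Split: cut at '#', '-', '~' or any run of digits.
    static String[] parts(String content) {
        return content.split("#|-|~|\\d+");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22c"));                  // a#b#c
        System.out.println(looksLikeNumber("13812345678"));        // true
        System.out.println(Arrays.toString(parts("a#b-c~d12e")));  // [a, b, c, d, e]
    }
}
```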
901. Exercise 1 of this chapter
902. Exercise 2 of this chapter
903. Exercise 3 of this chapter
904. Regex content review