Step by step learning Java chapter 27 regular expressions

Chapter 27 regular expressions

878. Regular quick start

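// Text to search (sample input, assumed here for illustration)
String content = "Java 8 was released in 2014";
// Pattern object
Pattern pattern = Pattern.compile("[0-9]+");
// Matcher object
Matcher matcher = pattern.matcher(content);
// Loop matching: each call to find() moves on to the next match
while (matcher.find()) {
    System.out.println(matcher.group(0));
}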

879. Demand issues

880. Regular underlying implementation 1

881. Regular underlying implementation 2

// Matching rule: four consecutive digits
String regExp = "\\d\\d\\d\\d";
// Pattern object
Pattern pattern = Pattern.compile(regExp);
// Matcher object
Matcher matcher = pattern.matcher(content);
// Loop matching
while (matcher.find()) {
    // Internally, group(int group) executes:
    // return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString()
    System.out.println(matcher.group(0));
}
  • matcher.find() scans the given string from left to right and locates, one at a time, the substrings that match the specified regular expression
  • When a substring is located, the index of its first character and the index of its last character + 1 are recorded in groups[0] and groups[1] of the matcher object's int[] groups field
  • At the same time, the matcher object's oldLast field is set to the value of groups[1], and the next search starts from oldLast
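
The recorded indexes are visible through the public start() and end() methods. The following is a minimal sketch of that behaviour; the class name MatchSpans, the helper spans and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Illustrative helper: returns "text@start-end" for every match in the input.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // start() is the index of the first matched character,
            // end() is the index just past the last matched character
            out.add(matcher.group(0) + "@" + matcher.start() + "-" + matcher.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1998@3-7, 2020@12-16]
        System.out.println(spans("\\d\\d\\d\\d", "In 1998 and 2020"));
    }
}
```

Note that each end value is one past the last matched character, which is why the second search can simply resume from it.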

882. Regular underlying implementation 3

  • Grouping: in a regular expression, each pair of parentheses defines a group; groups are numbered from 1, in the order of their opening parentheses
String regExp = "(\\d\\d)(\\d\\d)";
  • After grouping, elements 0 and 1 of the matcher object's groups array still record the index of the first character of the whole match and the index of its last character + 1. Starting from element 2, each pair of adjacent elements records the same two values for one group, in group order. For example, if 2020 occupies positions 323 ~ 326 of the given string and the pattern splits it into two groups of two digits:

groups[0] = 323, groups[1] = 327;
groups[2] = 323, groups[3] = 325;
groups[4] = 325, groups[5] = 327;
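
The per-group indexes can be read back with start(int) and end(int). This is a small sketch using a two-group pattern; the class GroupSpans, the helper describe and the input "year 2020" are assumptions for illustration, so the positions differ from the 323 ~ 326 example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSpans {
    // Illustrative helper: reports the whole match and each group
    // as text[start,end) for the first match found in the input.
    static String describe(String input) {
        Matcher m = Pattern.compile("(\\d\\d)(\\d\\d)").matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        // Group 0 is the whole match; groups 1..groupCount() are the parenthesized parts
        for (int g = 0; g <= m.groupCount(); g++) {
            if (g > 0) {
                sb.append(' ');
            }
            sb.append(m.group(g)).append('[').append(m.start(g))
              .append(',').append(m.end(g)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 2020[5,9) 20[5,7) 20[7,9)
        System.out.println(describe("year 2020"));
    }
}
```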

883. Regular escape character

  • Regular expression syntax - metacharacters
  1. Qualifiers
  2. Select match (alternation)
  3. Grouping, combining, and backreferencing
  4. Special characters
  5. Character matching
  6. Locators
  • Escape symbol \: to search for a character that has a special meaning in regular expressions, it must be escaped, otherwise the search does not return the expected result. In a Java string literal, two backslashes \\ stand for the single \ used in other languages. The characters that need escaping mainly include: . * + ( ) $ / \ ? [ ] ^ { }
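
A short sketch of the difference escaping makes. The class EscapeDemo and the helper count are made up for illustration; Pattern.quote is the standard library method that escapes a whole literal at once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Illustrative helper: counts how many times the regex matches in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Unescaped dot matches every character: prints 5
        System.out.println(count(".", "a.b.c"));
        // Escaped dot matches only literal dots: prints 2
        System.out.println(count("\\.", "a.b.c"));
        // Pattern.quote treats the whole argument as literal text: prints 1
        System.out.println(count(Pattern.quote("(1+1)"), "(1+1)=2"));
    }
}
```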

884. Regular character matching

Symbol   Meaning                         Example        Description
[ ]      List of accepted characters     [efgh]         Any one of the characters e, f, g, h
[^ ]     List of rejected characters     [^abc]         Any character except a, b, c
-        Hyphen (range)                  A-Z            Any single uppercase letter
.        Any character except \n         a..b           A string of length 4 that starts with a and ends with b
\\d      [0-9]                           \\d{3}(\\d)?   A digit string of length 3 or 4
\\D      [^0-9]                          \\D(\\d)*      A non-digit followed by any number of digits
\\w      [0-9a-zA-Z_]                    \\d{3}\\w{4}   A string of length 7: 3 digits followed by 4 letters, digits or underscores
\\W      [^0-9a-zA-Z_]                   \\W+\\d{2}     At least one non-word character followed by two digits
\\s      Any whitespace character        \\d\\s\\D      A digit, then a whitespace character, then a non-digit
\\S      Any non-whitespace character    \\S            Any single non-whitespace character
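
A few of the character classes above can be tried out as follows. This is only a sketch; the class CharClassDemo, the helper findAll and the sample inputs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Illustrative helper: collects the text of every match.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Digit strings of length 3 or 4: prints [345, 6789]
        System.out.println(findAll("\\d{3}(\\d)?", "12 345 6789"));
        // Any character except a, b, c: prints [d]
        System.out.println(findAll("[^abc]", "abcd"));
        // Runs of letters, digits and underscores: prints [hi_there, 42]
        System.out.println(findAll("\\w+", "hi_there 42!"));
    }
}
```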

885. Character matching case 1

  • Java regular expressions can be made case insensitive in two ways:
   1. Embedded flag: (?i)abc makes all of abc case insensitive; a(?i)bc makes only bc case insensitive
   2. Compile flag: Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
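
Both approaches can be compared side by side. A minimal sketch, with the class CaseDemo and its two helpers invented for illustration:

```java
import java.util.regex.Pattern;

class CaseDemo {
    // Illustrative helper: true if the whole input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Same check, but with the CASE_INSENSITIVE compile flag.
    static boolean fullIgnoreCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(full("(?i)abc", "ABC"));        // true: flag covers abc
        System.out.println(full("a(?i)bc", "aBC"));        // true: flag covers bc only
        System.out.println(full("a(?i)bc", "ABC"));        // false: a must be lowercase
        System.out.println(fullIgnoreCase("abc", "AbC"));  // true: compile flag
    }
}
```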

886. Character matching case 2

887. Select Match

Symbol   Meaning                                       Example   Description
|        Matches the expression before or after the |  ab|cd     ab or cd
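
A small sketch of alternation, both at the top level and inside a group. The class AlternationDemo, the helper findAll and the inputs are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Illustrative helper: collects the text of every match.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Top-level alternation: either ab or cd, prints [ab, cd, cd]
        System.out.println(findAll("ab|cd", "abcd xcd"));
        // Alternation limited to a group: a, then b or c, then d; prints [abd, acd]
        System.out.println(findAll("a(b|c)d", "abd acd ad"));
    }
}
```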

888. Regular qualifier

Symbol   Meaning                                   Example       Description
*        Preceding item appears 0 or more times    (abc)*        Any number of repetitions of abc, including none
+        Preceding item appears 1 or more times    m+(abc)*      At least one m followed by any number of abc
?        Preceding item appears 0 or 1 time        m+abc?        At least one m followed by ab or abc
{n}      Exactly n times                           [abcd]{3}     A string of length 3 made up of the characters a, b, c, d
{n,}     At least n times                          [abcd]{3,}    A string of length 3 or more made up of the characters a, b, c, d
{n,m}    At least n and at most m times            [abcd]{3,5}   A string of length 3 to 5 made up of the characters a, b, c, d
  • Java matching is greedy by default: it tries to match as long a string as possible. For example, with str = "aaaa" and regExp = "a{3,4}", the result is aaaa
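
Greedy matching can be observed directly. A minimal sketch, assuming a helper findAll in a class named QuantifierDemo (both made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Illustrative helper: collects the text of every match.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Seven a's: the first match greedily takes 4, the second takes the remaining 3
        // prints [aaaa, aaa]
        System.out.println(findAll("a{3,4}", "aaaaaaa"));
        // The final c is optional: prints [mmab, mabc]
        System.out.println(findAll("m+abc?", "mmab mabc"));
    }
}
```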

889. Regular locator

Symbol   Meaning                     Example              Description
^        Start of the string         ^[0-9]+[a-z]         Starts with at least one digit followed by a lowercase letter
$        End of the string           ^[0-9]+\\-[a-z]+$    Starts with at least one digit, then a hyphen, and ends with at least one lowercase letter
\\b      Word boundary               han\\b               han at the end of the string or followed by a space
\\B      Not a word boundary         han\\B               han that is not at the end of the string and not followed by a space
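
The locators can be checked with a short sketch like the one below. The class AnchorDemo, its helpers and the sample strings are assumptions for illustration; the anchored pattern is written here with a + after the digit class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Illustrative helper: true if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Illustrative helper: start index of every match.
    static List<Integer> starts(String regex, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(found("^[0-9]+\\-[a-z]+$", "123-abc"));   // true
        System.out.println(found("^[0-9]+\\-[a-z]+$", "x123-abc"));  // false
        String text = "hanshunping sphan nnhan";
        // han at a word boundary (before a space or at the end): prints [14, 20]
        System.out.println(starts("han\\b", text));
        // han followed by another word character: prints [0]
        System.out.println(starts("han\\B", text));
    }
}
```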

890. Capture groups

  • Unnamed group:
String regExp = "(\\d\\d)(\\d)(\\d)";
//[0] = \\d\\d\\d\\d
//[1] = \\d\\d
//[2] = \\d
//[3] = \\d
  • Named group: captures the matched substring into a group that can be accessed by name as well as by number. The name may contain only letters and digits and cannot begin with a digit. Java accepts only the angle-bracket form (?<name>...); the single-quote form (?'name'...) found in some other regex engines is not supported. Each name can be used only once in a pattern
String regExp = "(?<one>\\d\\d)(?<two>\\d\\d)";
//[0] = \\d\\d\\d\\d
//["one"] = \\d\\d
//["two"] = \\d\\d
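
A named group can be read back either by name or by number. A minimal sketch, using the group names one and two; the class NamedGroupDemo, the helper parts and the input are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Illustrative helper: returns group "one", group "two", and group 1
    // of the first match, or an empty array if nothing matches.
    static String[] parts(String input) {
        Matcher m = Pattern.compile("(?<one>\\d\\d)(?<two>\\d\\d)").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // A named group still has a number, so group("one") equals group(1)
        return new String[] { m.group("one"), m.group("two"), m.group(1) };
    }

    public static void main(String[] args) {
        // prints [19, 98, 19]
        System.out.println(Arrays.toString(parts("abc 1998")));
    }
}
```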

891. Non-capturing groups

  • Non-capturing groups are not stored, so their contents cannot be retrieved with matcher.group(1), matcher.group(2), ...
"industr(?:y|ies)" <=> "industry|industries"
"windows(?=95|98|2000)": matches windows only when it is followed by 95, 98 or 2000 (as in windows95, windows98, windows2000); the version number is not part of the match
"windows(?!95|98|2000)": matches windows only when it is not followed by 95, 98 or 2000

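The three non-capturing constructs can be checked with a sketch like this one. The class LookaheadDemo, its helpers and the sample text (including windowsXP) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Illustrative helper: number of capturing groups declared by the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Illustrative helper: start index of every match.
    static List<Integer> starts(String regex, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("industr(?:y|ies)"));  // 0: nothing is captured
        System.out.println(groupCount("industr(y|ies)"));    // 1
        String text = "windows95 windowsXP windows2000";
        // Followed by 95, 98 or 2000: prints [0, 20]
        System.out.println(starts("windows(?=95|98|2000)", text));
        // Not followed by 95, 98 or 2000: prints [10]
        System.out.println(starts("windows(?!95|98|2000)", text));
    }
}
```
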
892. Non greedy matching

// Non greedy matching, matching the string as short as possible
String regExp = "1+?";
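
Placing ? after a qualifier switches it from greedy to non-greedy. A minimal sketch comparing the two on the same input; the class LazyDemo and the helper findAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDemo {
    // Illustrative helper: collects the text of every match.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: one match covering all four characters, prints [1111]
        System.out.println(findAll("1+", "1111"));
        // Non-greedy: each match stops after one character, prints [1, 1, 1, 1]
        System.out.println(findAll("1+?", "1111"));
    }
}
```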

893. Regular application cases

// Matching basic Chinese characters
String regExp = "[\u4E00-\u9FA5]";

894. Regular validation complex URL

// Is the match a web address
String regExp = "(((https|http)?://)?([a-z0-9]+[.])|(www.))\\w+[.|\\/]([a-z0-9]{0,})?[[.]([a-z0-9]{0,})]+((/[\\S&&[^,;\u4E00-\u9FA5]]+)+)?([.][a-z0-9]{0,}+|/?)";

895. Pattern class

// Whole-string matching: checks whether the entire content matches the regExp regular expression
boolean isMatch = Pattern.matches(regExp, content);
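
The difference between whole-string matching and searching with find() is easy to miss. A short sketch, with the class WholeMatchDemo and its helpers invented for illustration:

```java
import java.util.regex.Pattern;

class WholeMatchDemo {
    // Whole-string check: the regex must cover the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring check: the regex only has to match somewhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("hello", "hello world"));     // false
        System.out.println(anywhere("hello", "hello world"));  // true
        System.out.println(whole("hello.*", "hello world"));   // true
    }
}
```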

896. Matcher class

Method name                 Function
int start()                 Returns the start index of the successfully matched string
int end()                   Returns the index of the last matched character + 1
String replaceAll(String)   Replaces every substring matched by the regular expression with the argument and returns a new string
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Spring-_-Bear
 * @version 2021-11-11 20:28
 */
public class RegularExpression {
    public static void main(String[] args) {
        String content = "hello hell hello";
        String regExp = "hello";

        Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            // Start index of each match
            System.out.print(matcher.start() + "\t");
        }

        // replaceAll returns a new string; the original string content remains unchanged
        String newString = matcher.replaceAll("Li chunxiong");
        System.out.println(newString);
    }
}

897. Back reference

  • Grouping: parentheses can be used to build a more complex matching pattern; the part inside each pair of parentheses is a group (also called a subexpression)
  • Capture: the content matched by a group is saved in memory under its group number or explicit name for later reference. Groups are numbered from left to right by the position of their opening parenthesis: the first group is 1, the second is 2, and so on. Group 0 represents the entire regular expression
  • Back reference: once the content of a group has been captured, it can be referred to later, which makes more practical matching patterns possible. The reference can be inside or outside the regular expression: inside the expression use \\ followed by the group number (for example \\1); outside it, such as in a replacement string, use $ followed by the group number (for example $1)

898. Back reference cases

// Match 2 consecutive identical digits, such as 11
String regExp = "(\\d)\\1";
// Match 5 consecutive identical digits, such as 77777
String regExp = "(\\d)\\1{4}";
// Match 4-digit palindromes such as 1221: refer back to group 2, then group 1
String regExp = "(\\d)(\\d)\\2\\1";
// Match strings like 12321-333999111: 5 digits, a hyphen, then three runs of 3 identical digits
String regExp = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
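
The back reference patterns above can be run as follows. This is a sketch only; the class BackrefDemo, the helper findAll and the sample inputs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Illustrative helper: collects the text of every match.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two identical digits in a row: prints [11, 44, 55]
        System.out.println(findAll("(\\d)\\1", "1123 4455"));
        // Four-digit palindromes: prints [1221, 5775]
        System.out.println(findAll("(\\d)(\\d)\\2\\1", "1221 5775 1234"));
        // Whole-string check of the serial number format: prints true
        System.out.println(Pattern.matches(
                "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}", "12321-333999111"));
    }
}
```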

899. Stuttering and de duplication cases

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Stuttering removal (de-duplication)
 * @author Spring-_-Bear
 * @version 2021-11-11 20:28
 */
public class RegularExpression {
    public static void main(String[] args) {
        String content = "I...I...I...Yes, yes, yes..Yes, yes.learn....Java!";

        // 1. Remove all dots first
        String regExp = "\\.";
        Pattern pattern = Pattern.compile(regExp);
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");

        // 2. Match any character that is immediately repeated 1 to n times
        regExp = "(.)\\1+";
        // One-line equivalent of steps 2 and 3:
        // content = Pattern.compile(regExp).matcher(content).replaceAll("$1");
        pattern = Pattern.compile(regExp);
        matcher = pattern.matcher(content);

        // 3. Replace each matched run with a back reference to group 1, e.g. "III" -> "I"
        content = matcher.replaceAll("$1");
        System.out.println(content);
    }
}

900. Replace split matching

  • Regular expressions are used in the String class
  1. Replacement function: public String replaceAll(String regex,String replacement)
  2. Judgment function: public boolean matches(String regex)
  3. Split function: public String[] split(String regex)
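
The three String methods listed above can be sketched as follows. The class StringRegexDemo, its helper names, the phone-number pattern and the sample inputs are all made up for illustration.

```java
import java.util.Arrays;

class StringRegexDemo {
    // Replacement: every run of digits becomes a single '#'.
    static String maskDigits(String input) {
        return input.replaceAll("\\d+", "#");
    }

    // Judgment: the whole string must be 1, then 3 or 8, then nine more digits
    // (an assumed format, used only as an example).
    static boolean looksLikePhone(String input) {
        return input.matches("1[38]\\d{9}");
    }

    // Split: cut the string at commas, semicolons or spaces.
    static String[] tokens(String input) {
        return input.split("[,; ]");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22c333"));             // a#b#c#
        System.out.println(looksLikePhone("13812345678"));       // true
        System.out.println(Arrays.toString(tokens("a,b;c d")));  // [a, b, c, d]
    }
}
```

Like Pattern.matches, String.matches checks the whole string rather than searching for a substring.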

901. Exercise 1 of this chapter

902. Exercise 2 of this chapter

903. Exercise 3 of this chapter

904. Regular content sorting

Keywords: Java Back-end regex

Added by whatever on Tue, 28 Dec 2021 23:06:17 +0200