Learning Java Step by Step: Chapter 27, Regular Expressions

Chapter 27: Regular Expressions

878. Regular expressions: quick start

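import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sample input text (illustrative)
String content = "Java 1.0 was released in 1996 and Java 17 in 2021";
// Compile the regular expression into a Pattern object; [0-9]+ means one or more digits
Pattern pattern = Pattern.compile("[0-9]+");
// Create a Matcher object that scans content
Matcher matcher = pattern.matcher(content);
// Loop: each call to find() locates the next matching substring
while (matcher.find()) {
    // group(0) is the whole substring matched this time
    System.out.println(matcher.group(0));
}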

879. Motivating problems

880. How regular expressions work underneath, part 1

881. How regular expressions work underneath, part 2

// Matching rule: four consecutive digits
String regExp = "\\d\\d\\d\\d";
// Pattern object
Pattern pattern = Pattern.compile(regExp);
// Matcher object
Matcher matcher = pattern.matcher(content);
// Loop matching
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
/* Inside the JDK, Matcher.group(int group) essentially does:
   return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString(); */

matcher.find() works as follows:
1. Following the given regular expression, it locates, one after another, the substrings of the input that satisfy the pattern.
2. When it locates a substring, it records the index of the substring's first character in groups[0] and the index of its last character + 1 in groups[1], where groups is the int[] field of the Matcher object.
3. At the same time, it sets the Matcher's oldLast field to groups[1], so the next search starts from oldLast.
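The groups field is private, but the public methods start() and end() return the same two indices for the current match, so the behavior described above can be observed directly. A minimal sketch, assuming a sample input string; the class FindPositionsDemo and the method findAll are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class FindPositionsDemo {

    // Returns "text@start-end" for every match, in the order find() reports them.
    static List<String> findAll(String regExp, String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        // Each successful find() resumes after the previous match
        while (matcher.find()) {
            // start() corresponds to groups[0], end() to groups[1] (last index + 1)
            result.add(matcher.group(0) + "@" + matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1998@3-7, 2021@12-16]
        System.out.println(findAll("\\d\\d\\d\\d", "In 1998 and 2021"));
    }
}
```

Note that end() is exclusive: "1998" occupies indices 3 to 6, and end() reports 7.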

882. How regular expressions work underneath, part 3

Grouping: in a regular expression, each pair of parentheses forms a group; groups are numbered from left to right, starting at 1.

String regExp = "(\\d\\d)(\\d\\d)";

After grouping, elements 0 and 1 of the Matcher's groups array still record the index of the first character of the whole match and the index of its last character + 1. From index 2 onward, each pair of adjacent elements records the same two indices for one group, in order: groups[2] and groups[3] for group 1, groups[4] and groups[5] for group 2, and so on. For example, if 2020 occupies positions 323 to 326 of the input, the pattern above splits it into two groups:

groups[0] = 323, groups[1] = 327;
groups[2] = 323, groups[3] = 325;
groups[4] = 325, groups[5] = 327
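The per-group index pairs can be read through start(int) and end(int), which return groups[g * 2] and groups[g * 2 + 1] for group g. A minimal sketch, assuming the two-group pattern (\\d\\d)(\\d\\d) and a short sample string; GroupIndexDemo and groupBounds are hypothetical names:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class GroupIndexDemo {

    // Returns {start(0), end(0), start(1), end(1), start(2), end(2)} for the
    // first match, i.e. the values that would sit in groups[0..5].
    static int[] groupBounds(String content) {
        Matcher matcher = Pattern.compile("(\\d\\d)(\\d\\d)").matcher(content);
        if (!matcher.find()) {
            return new int[0];
        }
        int[] bounds = new int[(matcher.groupCount() + 1) * 2];
        for (int g = 0; g <= matcher.groupCount(); g++) {
            bounds[g * 2] = matcher.start(g);   // first character of group g
            bounds[g * 2 + 1] = matcher.end(g); // last character of group g, + 1
        }
        return bounds;
    }

    public static void main(String[] args) {
        // prints [3, 7, 3, 5, 5, 7]
        System.out.println(Arrays.toString(groupBounds("abc2020def")));
    }
}
```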

883. Escape characters

Regular expression syntax: metacharacters

Quantifiers
Alternation (selection matching)
Grouping, capturing, and backreferences
Special characters
Character-matching characters
Anchors (locators)

Escape character \: to search for a character that has a special meaning in regular expressions, precede it with the escape character; otherwise the search will not find it. In Java, a regex backslash is written as two backslashes (\\) inside a string literal, whereas in many other languages a single \ is enough. Characters that need escaping mainly include: . * + ( ) $ / \ ? [ ] ^ { }
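To see the difference escaping makes, the sketch below counts matches of an escaped and an unescaped pattern in a sample string. It also uses Pattern.quote, a standard JDK method that escapes an entire literal string. EscapeDemo and countMatches are hypothetical names, and the input text is illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class EscapeDemo {

    // Counts how many times regExp matches inside content.
    static int countMatches(String regExp, String content) {
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "abc$(a.bc(123(";
        // "\\(" in Java source is the regex \( : a literal '('
        System.out.println(countMatches("\\(", content)); // 3
        // "\\." matches only a literal dot
        System.out.println(countMatches("\\.", content)); // 1
        // An unescaped "." matches any character
        System.out.println(countMatches(".", content));   // 14
        // Pattern.quote escapes a whole literal string
        System.out.println(countMatches(Pattern.quote("a.b"), "a.b axb")); // 1
    }
}
```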

884. Character-matching characters

Symbol	Meaning	Example	Explanation
[ ]	List of characters that may match	[efgh]	Any one of the characters e, f, g, h
[^ ]	List of characters that may not match	[^abc]	Any one character except a, b, c
-	Range (hyphen)	A-Z	Any single uppercase letter
.	Any character except a line terminator such as \n	a..b	A string of length 4 that starts with a and ends with b
\\d	A digit, same as [0-9]	\\d{3}(\\d)?	A string of 3 or 4 digits
\\D	A non-digit, same as [^0-9]	\\D(\\d)*	One non-digit followed by any number of digits
\\w	A word character, same as [0-9a-zA-Z_]	\\d{3}\\w{4}	A string of length 7: 3 digits followed by 4 letters, digits, or underscores
\\W	A non-word character, same as [^0-9a-zA-Z_]	\\W+\\d{2}	At least one character that is not a letter, digit, or underscore, followed by 2 digits
\\s	Any whitespace character	\\d\\s\\D	A digit, then a whitespace character, then a non-digit
\\S	Any non-whitespace character	\\S	Any single non-whitespace character
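Several rows of the table can be checked quickly with String.matches, which succeeds only when the whole string matches the pattern. The sample strings below are illustrative, and CharClassDemo and whole are hypothetical names:

```java
// Hypothetical demo class (illustrative name)
final class CharClassDemo {

    // True when the entire string s matches regExp.
    static boolean whole(String regExp, String s) {
        return s.matches(regExp);
    }

    public static void main(String[] args) {
        System.out.println(whole("[efgh]", "f"));             // true
        System.out.println(whole("[^abc]", "a"));             // false
        System.out.println(whole("[A-Z]", "Q"));              // true
        System.out.println(whole("a..b", "axyb"));            // true
        System.out.println(whole("\\d{3}(\\d)?", "1234"));    // true
        System.out.println(whole("\\d{3}(\\d)?", "12"));      // false
        System.out.println(whole("\\D(\\d)*", "x2021"));      // true
        System.out.println(whole("\\d{3}\\w{4}", "123ab_9")); // true
        System.out.println(whole("\\W+\\d{2}", "#@42"));      // true
        System.out.println(whole("\\d\\s\\D", "1 a"));        // true
    }
}
```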

885. Character matching, example 1

Java offers two ways to make matching case-insensitive:

   1. Embedded flag (?i): in (?i)abc, all of abc is matched case-insensitively; in a(?i)bc, only bc is.
   2. Compile-time flag: Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
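Both approaches in one place, as a minimal sketch with illustrative inputs; CaseInsensitiveDemo and its two methods are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class CaseInsensitiveDemo {

    // Way 1: the embedded flag (?i) turns off case sensitivity from that point on
    static boolean embedded(String regExp, String s) {
        return Pattern.matches(regExp, s);
    }

    // Way 2: pass Pattern.CASE_INSENSITIVE when compiling
    static boolean withFlag(String regExp, String s) {
        return Pattern.compile(regExp, Pattern.CASE_INSENSITIVE).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(embedded("(?i)abc", "ABC")); // true
        System.out.println(embedded("a(?i)bc", "aBC")); // true: only bc ignores case
        System.out.println(embedded("a(?i)bc", "ABC")); // false: a is still case-sensitive
        System.out.println(withFlag("abc", "AbC"));     // true
    }
}
```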

886. Character matching, example 2

887. Alternation (selection matching)

Symbol	Meaning	Example	Explanation
|	Matches the expression before or after the |	ab|cd	ab or cd
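Because | has the lowest precedence, ab|cd means "ab" or "cd", not a(b|c)d; parentheses are needed to limit the alternation. A minimal sketch with an illustrative input; AlternationDemo and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class AlternationDemo {

    // Collects every substring that regExp matches in content.
    static List<String> findAll(String regExp, String content) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String content = "abd acd abcd";
        // Whole alternatives: "ab" or "cd"
        System.out.println(findAll("ab|cd", content));   // [ab, cd, ab, cd]
        // Alternation limited by parentheses: a, then b or c, then d
        System.out.println(findAll("a(b|c)d", content)); // [abd, acd]
    }
}
```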

888. Quantifiers

Symbol	Meaning	Example	Explanation
*	Preceding element occurs 0 or more times	(abc)*	Zero or more repetitions of abc
+	Preceding element occurs 1 or more times	m+(abc)*	At least one m, followed by zero or more abc
?	Preceding element occurs 0 or 1 times	m+abc?	At least one m, followed by ab or abc
{n}	Exactly n times	[abcd]{3}	A string of length 3 made of the characters a, b, c, d
{n,}	At least n times	[abcd]{3,}	A string of length 3 or more made of a, b, c, d
{n,m}	At least n and at most m times	[abcd]{3,5}	A string of length 3 to 5 made of a, b, c, d

Java quantifiers are greedy by default: they try to match as long a string as possible. For example, with str = "aaaa" and regExp = "a{3,4}", the match is aaaa rather than aaa.
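Greedy behavior is easiest to see by listing every match. A minimal sketch with illustrative inputs; QuantifierDemo and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class QuantifierDemo {

    // Collects every substring that regExp matches in content.
    static List<String> findAll(String regExp, String content) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy: a{3,4} takes 4 a's whenever it can
        System.out.println(findAll("a{3,4}", "aaaa"));       // [aaaa]
        System.out.println(findAll("a{3,4}", "aaaaaaa"));    // [aaaa, aaa]
        System.out.println(findAll("[abcd]{3}", "abcdx"));   // [abc]
        System.out.println(findAll("m+abc?", "mmab mabc"));  // [mmab, mabc]
    }
}
```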

889. Anchors (locators)

Symbol	Meaning	Example	Explanation
^	Start of the input	^[0-9]+[a-z]	Starts with at least one digit, followed by one lowercase letter
$	End of the input	^[0-9]\\-[a-z]+$	Starts with one digit, then a hyphen -, and ends with at least one lowercase letter
\\b	Word boundary	han\\b	han at the end of a word, i.e. followed by a space, another non-word character, or the end of the string
\\B	Non-word boundary	han\\B	han followed directly by another word character, i.e. not at the end of a word
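The anchors can be checked by counting how many matches find() reports. A minimal sketch with illustrative inputs; AnchorDemo and count are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class AnchorDemo {

    // Counts the matches that find() reports for regExp in content.
    static int count(String regExp, String content) {
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // ^ pins the match to the start of the input
        System.out.println(count("^[0-9]+[a-z]", "123abc"));     // 1
        System.out.println(count("^[0-9]+[a-z]", "x123abc"));    // 0
        // ^ and $ together require the whole input to fit the pattern
        System.out.println(count("^[0-9]\\-[a-z]+$", "1-abc"));  // 1
        System.out.println(count("^[0-9]\\-[a-z]+$", "1-abc9")); // 0
        String content = "hanshunping sphan nnhan";
        // \\b: han at a word boundary ("sphan", "nnhan")
        System.out.println(count("han\\b", content)); // 2
        // \\B: han followed by another word character ("hanshunping")
        System.out.println(count("han\\B", content)); // 1
    }
}
```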

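890. Capturing groups

Unnamed groups:

String regExp = "(\\d\\d)(\\d)(\\d)";
// matcher.group(0): the whole match of \\d\\d\\d\\d
// matcher.group(1): the part matched by (\\d\\d)
// matcher.group(2): the part matched by the first (\\d)
// matcher.group(3): the part matched by the second (\\d)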

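Named groups: (?<name>pattern) captures the matched substring into a group that can be retrieved by name (the group is still numbered as well). In Java, the name must start with a letter and may contain only letters and digits, so no punctuation and no leading digit. Some other regex flavors also accept single quotes instead of angle brackets, (?'name'pattern), but Java does not.

String regExp = "(?<one>\\d\\d)(?<two>\\d\\d)";
// matcher.group(0):     the whole match of \\d\\d\\d\\d
// matcher.group("one"): the part matched by the first \\d\\d
// matcher.group("two"): the part matched by the second \\d\\d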
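A runnable version of the named-group pattern, assuming an illustrative input string; NamedGroupDemo and firstMatch are hypothetical names. It also shows that a named group can still be read by its number:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class NamedGroupDemo {

    // Returns {whole match, group "one", group "two", group 1} for the first match,
    // or an empty array when nothing matches.
    static String[] firstMatch(String content) {
        Matcher matcher = Pattern.compile("(?<one>\\d\\d)(?<two>\\d\\d)").matcher(content);
        if (!matcher.find()) {
            return new String[0];
        }
        return new String[] {
            matcher.group(0),     // whole match
            matcher.group("one"), // by name
            matcher.group("two"),
            matcher.group(1)      // named groups are still numbered: 1 is "one"
        };
    }

    public static void main(String[] args) {
        // prints [7788, 77, 88, 77]
        System.out.println(Arrays.toString(firstMatch("abc7788 and 1234")));
    }
}
```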

891. Non-capturing groups

A non-capturing group matches but does not store what it matched, so matcher.group(1), matcher.group(2), ... cannot be used to retrieve it.

"industr(?:y|ies)" is equivalent to "industry|industries"
"windows(?=95|98|2000)": matches windows only where it is followed by 95, 98, or 2000 (the digits are not part of the match)
"windows(?!95|98|2000)": matches windows only where it is not followed by 95, 98, or 2000

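The non-capturing group and the two lookaheads above can be tried on a sample string. The sketch records each match with its start index so the selected "windows" occurrences are distinguishable; NonCapturingDemo and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class NonCapturingDemo {

    // Returns "text@start" for every match of regExp in content.
    static List<String> findAll(String regExp, String content) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            result.add(matcher.group() + "@" + matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // (?:...) groups without capturing, so the pattern has no numbered groups
        System.out.println(Pattern.compile("industr(?:y|ies)").matcher("").groupCount()); // 0
        System.out.println(findAll("industr(?:y|ies)", "industry, industries"));
        // -> [industry@0, industries@10]

        String content = "windows95 windowsXP windows2000";
        // Positive lookahead: the version is checked but not included in the match
        System.out.println(findAll("windows(?=95|98|2000)", content)); // [windows@0, windows@20]
        // Negative lookahead: only the windows not followed by 95, 98 or 2000
        System.out.println(findAll("windows(?!95|98|2000)", content)); // [windows@10]
    }
}
```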
892. Non-greedy matching

// Appending ? to a quantifier makes it non-greedy: it matches as short a string as possible
String regExp = "1+?";
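Comparing greedy and non-greedy versions of the same pattern makes the difference concrete. A minimal sketch with illustrative inputs; LazyDemo and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class LazyDemo {

    // Collects every substring that regExp matches in content.
    static List<String> findAll(String regExp, String content) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regExp).matcher(content);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1+", "111111"));         // [111111]
        System.out.println(findAll("1+?", "111111"));        // [1, 1, 1, 1, 1, 1]
        // Greedy .+ runs to the last '>', lazy .+? stops at the first one
        System.out.println(findAll("<.+>", "<b>bold</b>"));  // [<b>bold</b>]
        System.out.println(findAll("<.+?>", "<b>bold</b>")); // [<b>, </b>]
    }
}
```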

893. Application examples

// Match one basic Chinese character (code points U+4E00 to U+9FA5)
String regExp = "[\u4E00-\u9FA5]";
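Anchoring the same character range with ^ and $ and adding + turns it into a check that a string consists only of basic Chinese characters. A minimal sketch; ChineseDemo and isAllChinese are hypothetical names, and the sample strings use Unicode escapes (\u4F60\u597D is "你好"):

```java
// Hypothetical demo class (illustrative name)
final class ChineseDemo {

    // True when s is non-empty and every character lies in U+4E00..U+9FA5.
    static boolean isAllChinese(String s) {
        return s.matches("^[\\u4E00-\\u9FA5]+$");
    }

    public static void main(String[] args) {
        System.out.println(isAllChinese("\u4F60\u597D"));    // true
        System.out.println(isAllChinese("\u4F60\u597Dabc")); // false
        System.out.println(isAllChinese(""));                // false
    }
}
```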

894. Validating a complex URL

// Check whether the content is a web address (URL)
String regExp = "(((https|http)?://)?([a-z0-9]+[.])|(www.))\\w+[.|\\/]([a-z0-9]{0,})?[[.]([a-z0-9]{0,})]+((/[\\S&&[^,;\u4E00-\u9FA5]]+)+)?([.][a-z0-9]{0,}+|/?)";
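The pattern above is hard to read and test. As a point of comparison only, here is a deliberately simpler URL check. It is a swapped-in, assumed rule, not the article's pattern: an optional http:// or https://, one or more "label." parts, a final label, and an optional path of common URL characters. SimpleUrlDemo and looksLikeUrl are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name); simplified rule, not a full URL grammar
final class SimpleUrlDemo {

    // Assumed rule: optional scheme, dotted host, optional path
    private static final Pattern URL = Pattern.compile(
            "^(https?://)?([\\w-]+\\.)+[\\w-]+(/[\\w\\-./?%&=#]*)?$");

    static boolean looksLikeUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("https://www.example.com/docs/index.html?id=1")); // true
        System.out.println(looksLikeUrl("www.example.com")); // true
        System.out.println(looksLikeUrl("http://"));         // false
        System.out.println(looksLikeUrl("not a url"));       // false
    }
}
```

A rule this simple accepts some strings that are not real URLs and rejects some that are; it is meant to show the structure of such a check, not to replace a proper URL parser.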

895. The Pattern class

// Whole-string matching: returns true only if the entire content matches regExp
boolean isMatch = Pattern.matches(regExp,content);
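The difference between whole-string matching and searching for a substring is a common source of surprises. A minimal sketch with an illustrative input; WholeMatchDemo and its methods are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class WholeMatchDemo {

    // Pattern.matches: the ENTIRE content must match regExp
    static boolean wholeMatch(String regExp, String content) {
        return Pattern.matches(regExp, content);
    }

    // Matcher.find: true if regExp matches ANY substring of content
    static boolean containsMatch(String regExp, String content) {
        return Pattern.compile(regExp).matcher(content).find();
    }

    public static void main(String[] args) {
        String content = "hello abc hello";
        System.out.println(wholeMatch("hello", content));    // false
        System.out.println(containsMatch("hello", content)); // true
        System.out.println(wholeMatch("hello.*", content));  // true
    }
}
```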

896. The Matcher class

Method	Description
int start()	Returns the start index of the current match
int end()	Returns the end index of the current match + 1 (the index just past its last character)
String replaceAll(String)	Replaces every substring matched by the regular expression with the argument and returns the new string

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
* @author Spring-_-Bear
* @version 2021-11-11 20:28
*/
public class RegularExpression {
    public static void main(String[] args) {
        String content = "hello hell hello";
        String regExp = "hello";

        // CASE_INSENSITIVE: letter case is ignored when matching
        Pattern pattern = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            // Start index and end index + 1 of each match: 0 5, then 11 16
            System.out.print(matcher.start() + "\t");
            System.out.println(matcher.end());
        }

        // replaceAll returns a new string; the original content remains unchanged
        String newString = matcher.replaceAll("Li chunxiong");
        System.out.println(newString);
    }
}

897. Backreferences

Grouping: parentheses let us build more complex patterns; the part inside each pair of parentheses is treated as a group (also called a subexpression).
Capturing: the content matched by a group is saved in memory under a group number or an explicit name for later reference. Groups are numbered from left to right by their opening parenthesis: the first group is 1, the second is 2, and so on. Group 0 is the match of the entire regular expression.
Backreference: once a group has captured content, it can be referenced later, which makes more practical patterns possible. The reference can appear inside the regular expression, written \\group number (for example \\1), or outside it, such as in a replacement string, written $group number (for example $1).
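One example of each kind of backreference, assuming illustrative inputs. Inside the pattern, (\\w)\\1 finds doubled letters; outside it, $1, $2, $3 in a replacement string reorder the parts of a date. BackReferenceUsage, doubledLetters, and toUsDate are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative name)
final class BackReferenceUsage {

    // Inside the regex: \\1 must repeat exactly what group 1 matched
    static List<String> doubledLetters(String word) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("(\\w)\\1").matcher(word);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Outside the regex: $1, $2, $3 in the replacement refer to the captured groups
    static String toUsDate(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$2/$3/$1");
    }

    public static void main(String[] args) {
        System.out.println(doubledLetters("bookkeeper")); // [oo, kk, ee]
        System.out.println(toUsDate("2021-12-28"));       // 12/28/2021
    }
}
```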

898. Backreference examples

// Match 2 consecutive identical digits
String regExp = "(\\d)\\1";
// Match 5 consecutive identical digits
regExp = "(\\d)\\1{4}";
// Match a 4-digit palindrome such as 1221: \\2 refers back to group 2 and \\1 to group 1
regExp = "(\\d)(\\d)\\2\\1";
// Match strings like 12321-333999111: 5 digits, a hyphen, then three runs of 3 identical digits
regExp = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
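The four patterns can be checked against sample strings with String.matches, which requires the whole string to match. The inputs are illustrative; BackReferenceDemo and whole are hypothetical names:

```java
// Hypothetical demo class (illustrative name)
final class BackReferenceDemo {

    // True when the entire string s matches regExp.
    static boolean whole(String regExp, String s) {
        return s.matches(regExp);
    }

    public static void main(String[] args) {
        System.out.println(whole("(\\d)\\1", "77"));           // true
        System.out.println(whole("(\\d)\\1", "78"));           // false
        System.out.println(whole("(\\d)\\1{4}", "99999"));     // true
        System.out.println(whole("(\\d)(\\d)\\2\\1", "1221")); // true
        System.out.println(whole("(\\d)(\\d)\\2\\1", "1234")); // false
        System.out.println(whole("\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}", "12321-333999111")); // true
    }
}
```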

899. Example: removing stutter (de-duplication)

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Remove stuttering (de-duplication)
 *
 * @author Spring-_-Bear
 * @version 2021-11-11 20:28
 */
public class RegularExpression {
    public static void main(String[] args) {
        String content = "I...I...I...Yes, yes, yes..Yes, yes.learn....Java!";

        // 1. First remove all the dots
        String regExp = "\\.";
        Pattern pattern = Pattern.compile(regExp);
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");
        System.out.println(content);

        // 2. Match any character followed by one or more copies of itself, e.g. "III"
        regExp = "(.)\\1+";
        // Equivalent one-liner: content = Pattern.compile(regExp).matcher(content).replaceAll("$1");
        pattern = Pattern.compile(regExp);
        matcher = pattern.matcher(content);

        // 3. Use the backreference $1 to replace each whole run with the single character captured in group 1: "III" -> "I"
        content = matcher.replaceAll("$1");
        System.out.println(content);
    }
}

900. Replacing, matching, and splitting with String methods

The String class also accepts regular expressions:

Replace: public String replaceAll(String regex, String replacement)
Match: public boolean matches(String regex)
Split: public String[] split(String regex)
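All three String methods in one short sketch, with illustrative inputs. The helper names unifyJdkNames, isShortCode, and splitFields are hypothetical wrappers around the real String methods listed above:

```java
import java.util.Arrays;

// Hypothetical demo class (illustrative name)
final class StringRegexDemo {

    // replaceAll: every substring matching the regex is replaced
    static String unifyJdkNames(String text) {
        return text.replaceAll("JDK1\\.3|JDK1\\.4", "JDK");
    }

    // matches: true only if the WHOLE string fits the regex
    static boolean isShortCode(String s) {
        return s.matches("\\d{3}-\\d{4}");
    }

    // split: the regex describes the separators between fields
    static String[] splitFields(String s) {
        return s.split("#|-|~|\\d+");
    }

    public static void main(String[] args) {
        System.out.println(unifyJdkNames("JDK1.3 and JDK1.4 were released")); // JDK and JDK were released
        System.out.println(isShortCode("138-1234"));  // true
        System.out.println(isShortCode("138-12345")); // false
        // prints [hello, abc, jack, smith, beijing]
        System.out.println(Arrays.toString(splitFields("hello#abc-jack12smith~beijing")));
    }
}
```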

901. Exercise 1 of this chapter

902. Exercise 2 of this chapter

903. Exercise 3 of this chapter

904. Summary of regular expression topics

Keywords: Java Back-end regex

Added by whatever on Tue, 28 Dec 2021 23:06:17 +0200
