Summary of Java regular expression basics

Regular expressions

Quick start

How do we find all the English letters in the following paragraph?

When the Oak language was being developed, there was no hardware platform that could run bytecode, so in order to carry out experimental research on the language during development, the developers used software to build a runtime platform on top of the existing hardware and software platforms, following specifications they defined themselves. Apart from being simpler than C++, the whole system was not very different from it.
//Checking each character against the ASCII code range of the letters is too cumbersome
char a = 'a';
int aint = (int)a;
//Using regular expressions
//Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) {
    String content = "When the Oak language was being developed, there was no hardware platform that could run bytecode, " +
            "so in order to carry out experimental research on the language during development, the developers used software " +
            "to build a runtime platform on top of the existing hardware and software platforms, following specifications they defined themselves. " +
            "Apart from being simpler than C++, the whole system was not very different from it.";

    Pattern pattern = Pattern.compile("[a-zA-Z]+");//Pattern object: the compiled regular expression, here matching runs of letters
    //Pattern pattern = Pattern.compile("[0-9]+"); // Match numbers
    //Pattern pattern = Pattern.compile("([a-zA-Z]+)|([0-9]+)");// Match numbers or letters
    Matcher matcher = pattern.matcher(content);//Matcher object: applies the pattern to the content
    while(matcher.find()){
        System.out.println("Found:"+matcher.group(0));
    }
}

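Output (for the original Chinese paragraph, in which Oak and C are the only runs of English letters):
Found:Oak
Found:C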

What if you want to extract the titles from the Baidu hot-search list?

<a target="_blank" title="Gansu cross country race accident set up a joint investigation team" href="/s?rsv_idx
//Capture the title attribute in group 1; [^\"]* also matches titles that contain spaces, which \\S* would not
Pattern pattern = Pattern.compile("<a target=\"_blank\" title=\"([^\"]*)\"");
Matcher matcher = pattern.matcher(content);
while(matcher.find()){
    System.out.println("Found:"+matcher.group(1));
}

What about IP addresses?

Class A: 10.0.0.0 -- 10.255.255.255
Class B: 172.16.0.0 -- 172.31.255.255
Class C: 192.168.0.0 -- 192.168.255.255
Pattern pattern = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");//Four runs of digits separated by literal dots
Matcher matcher = pattern.matcher(content);//Matcher
while(matcher.find()){
    System.out.println("Found:"+matcher.group(0));
}

Found:10.0.0.0
Found:10.255.255.255
Found:172.16.0.0
Found:172.31.255.255
Found:192.168.0.0
Found:192.168.255.255
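The pattern above accepts any run of digits in each position, so it would also match something like 999.1.1.1. Below is a minimal sketch of a stricter check, assuming each of the four parts should be limited to 0-255 without leading zeros. The class and method names are illustrative and not part of the original example.

```java
import java.util.regex.Pattern;

class Ipv4Check {
    // One part of the address: 250-255, 200-249, 100-199, or 0-99.
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    private static final Pattern IPV4 =
            Pattern.compile("^" + OCTET + "(\\." + OCTET + "){3}$");

    // True only if the whole string is a dotted address with four parts in 0-255.
    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.255.255")); // true
        System.out.println(isIpv4("999.1.1.1"));       // false
    }
}
```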

Analyzing the underlying logic of matcher.find() and matcher.group(0)


//matcher.find() locates the next substring that satisfies the pattern and records its start index and its end index + 1 in the int[] groups field:
//groups[0] = start index, groups[1] = end index + 1. It also records end index + 1 in oldLast, which is where the next search starts.
//matcher.group(0) then extracts the corresponding substring according to the values in the groups array. Its source code:
public String group(int group) {
    if (first < 0)
        throw new IllegalStateException("No match found");
    if (group < 0 || group > groupCount())
        throw new IndexOutOfBoundsException("No group " + group);
    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
        return null;
    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
}
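The indices described above can be observed through the public start() and end() methods of Matcher. The following is a small sketch of that; the class name, helper name, and sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchIndices {
    // Returns "start-end:text" for the first match, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // start() is the index of the first matched character;
        // end() is the index just past the last matched character.
        return m.start() + "-" + m.end() + ":" + input.substring(m.start(), m.end());
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "ab1998cd")); // 2-6:1998
    }
}
```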

Let's look at the difference between group(0), group(1), and group(2)

These apply to groups written with (): each pair of parentheses defines one group

String contents = "On December 8, 1998, J2EE, the enterprise edition of the second-generation Java platform, was released. In June 1999, Sun released 3 versions of the second-generation Java platform (Java2 for short)";
Pattern pattern = Pattern.compile("(\\d\\d)(\\d\\d)");
Matcher matcher = pattern.matcher(contents);//Matcher
while(matcher.find()){
    System.out.println("Found:"+matcher.group(0));//Whole match: start index in groups[0], end index + 1 in groups[1]
    System.out.println("Find the first group(): "+matcher.group(1));//First group: indices stored in groups[2] and groups[3]
    System.out.println("Find the second group(): "+matcher.group(2));//Second group: indices stored in groups[4] and groups[5]
    //There is no third group, so matcher.group(3) would throw an IndexOutOfBoundsException
}

Found:1998
Find the first group(): 19
Find the second group(): 98
Found:1999
Find the first group(): 19
Find the second group(): 99
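Because group(int) throws when the index is larger than groupCount(), it can be worth checking the bound first. Here is a minimal sketch of that check, with a hypothetical helper name and sample input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBounds {
    // Returns group(n) of the first match, or a short message when there is
    // no match or n is outside 0..groupCount().
    static String groupOrMessage(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        if (n < 0 || n > m.groupCount()) {
            return "no group " + n;
        }
        return m.group(n);
    }

    public static void main(String[] args) {
        String regex = "(\\d\\d)(\\d\\d)";
        System.out.println(groupOrMessage(regex, "In 1998", 0)); // 1998
        System.out.println(groupOrMessage(regex, "In 1998", 2)); // 98
        System.out.println(groupOrMessage(regex, "In 1998", 3)); // no group 3
    }
}
```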


Basic syntax

Three common classes

The java.util.regex package provides three classes: Pattern (a compiled regular expression), Matcher (the engine that applies a Pattern to an input string), and PatternSyntaxException (thrown when a regular expression has a syntax error).

Full match: Pattern.matches

boolean matches = Pattern.matches(regstr, content);//Full match: the whole content must match regstr
System.out.println(matches);

//Underlying source code of Pattern.matches
public static boolean matches(String regex, CharSequence input) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}
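The difference between a full match and the partial search done by find() is easy to see side by side. This is a small sketch with made-up helper names and sample strings:

```java
import java.util.regex.Pattern;

class FullVersusPartial {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if any substring of the input matches the regex.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d+", "2021"));         // true
        System.out.println(fullMatch("\\d+", "year 2021"));    // false
        System.out.println(partialMatch("\\d+", "year 2021")); // true
    }
}
```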

Basic methods of Matcher class


Grouping, capturing, backreferencing


Grouping


//The two patterns match the same strings; the second is shorter. Note that (?:...) is a non-capturing group, so its content cannot be retrieved with group(1)
regstr = "abc1|abc2|abc3";
regstr = "abc(?:1|2|3)";
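One way to confirm that (?:...) does not capture is to compare groupCount() for a capturing and a non-capturing version of the same pattern. The sketch below does this; the helper name is hypothetical.

```java
import java.util.regex.Pattern;

class NonCapturing {
    // Number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("abc(1|2|3)"));              // 1
        System.out.println(groupCount("abc(?:1|2|3)"));            // 0
        System.out.println(Pattern.matches("abc(?:1|2|3)", "abc2")); // true
    }
}
```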

Capture

Backreference

regstr = "(\\d)\\1";//Two identical consecutive digits, e.g. 11 or 22
regstr = "(\\d)(\\d)\\2\\1";//Four digits that read the same in both directions, e.g. 1221 or 2112
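To see the two backreference patterns at work, the sketch below collects every match in a sample string. The helper name and the sample inputs are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Collects every non-overlapping match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \\1 must repeat whatever digit group 1 captured.
        System.out.println(findAll("(\\d)\\1", "11 23 455"));                  // [11, 55]
        // \\2 repeats group 2, then \\1 repeats group 1.
        System.out.println(findAll("(\\d)(\\d)\\2\\1", "1221 2112 1234 5555")); // [1221, 2112, 5555]
    }
}
```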

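Metacharacters

Some characters have a special meaning in a regular expression and must be escaped to be matched literally: . * + ( ) $ \ ? [ ] ^ { } |. In a Java string literal the escape is written with two backslashes, for example \\. or \\+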
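The effect of escaping can be checked directly. The sketch below compares escaped and unescaped patterns and also uses Pattern.quote, a standard method that turns a whole string into a literal pattern; the helper name and sample strings are made up.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Treats every character of `literal` as an ordinary character.
    static boolean matchesLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("a.b", "axb"));    // true: unescaped . matches any character
        System.out.println(Pattern.matches("a\\.b", "axb"));  // false: escaped . matches only a dot
        System.out.println(Pattern.matches("a\\.b", "a.b"));  // true
        System.out.println(Pattern.matches("1+1=2", "1+1=2")); // false: + is a quantifier here
        System.out.println(matchesLiteral("1+1=2", "1+1=2"));  // true
    }
}
```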

Quantifiers

Quantifiers specify how many times the character or group that precedes them must occur consecutively


Selection match (alternation)

When a match may be one of several alternatives, use the selection character |. For example, "ab|cd" matches ab or cd


String regstr = "a{3,4}";//Matches aaa or aaaa; Java matching is greedy by default, so aaaa is preferred when available
regstr = "1+";//Matches one or more 1s; also greedy, so 11111 is matched as a whole
regstr = "a1?";//Matches a or a1; greedy, so a1 is preferred
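Appending ? to a quantifier makes it reluctant (non-greedy), so it takes as few characters as possible. The following sketch contrasts the two behaviours; the helper name and sample inputs are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the text of the first match, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a{3,4}", "aaaaa"));  // aaaa (greedy)
        System.out.println(firstMatch("a{3,4}?", "aaaaa")); // aaa  (reluctant)
        System.out.println(firstMatch("a1?", "a1"));        // a1   (greedy)
        System.out.println(firstMatch("a1??", "a1"));       // a    (reluctant)
    }
}
```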


Grouping, combining and backreferencing

Special characters


Character-matching symbols


\\w matches a single letter, digit, or underscore
\\s matches a whitespace character such as a space or a tab
\\S matches any non-whitespace character
. matches any character except \n; to match a literal . it must be escaped as \\.
String contents = "fsldfjsld09sdfABC";
String regstr1 = "[a-z]";//Any single character from a to z
String regstr2 = "[A-Z]";//Any single character from A to Z
String regstr3 = "abc";//Matches abc; case-sensitive by default
String regstr4 = "(?i)abc";//Matches abc, case-insensitive
String regstr5 = "[0-9]";//Any single digit from 0 to 9
Pattern pattern = Pattern.compile(regstr3,Pattern.CASE_INSENSITIVE);//Case-insensitive through a flag instead of (?i)
Matcher matcher = pattern.matcher(contents);//Matcher
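Counting the matches of each pattern against the same sample string shows what the character classes and the two ways of ignoring case actually do. This is a minimal sketch; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseDemo {
    // Counts the non-overlapping matches of the pattern in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String contents = "fsldfjsld09sdfABC";
        System.out.println(count(Pattern.compile("[a-z]"), contents));   // 12
        System.out.println(count(Pattern.compile("[A-Z]"), contents));   // 3
        System.out.println(count(Pattern.compile("[0-9]"), contents));   // 2
        System.out.println(count(Pattern.compile("abc"), contents));     // 0
        System.out.println(count(Pattern.compile("(?i)abc"), contents)); // 1
        System.out.println(count(Pattern.compile("abc", Pattern.CASE_INSENSITIVE), contents)); // 1
    }
}
```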

Locators (anchors)

Anchors specify where in the string the match must occur, such as at the beginning or at the end


content = "1a11111aaaa";
regstr = "^[0-9]+[a-z]+$";//Starts with at least one digit and ends with at least one lowercase letter. The content above does not match:
//the string must consist of digits followed only by lowercase letters, e.g. 123abc
regstr = "^[0-9]+\\-[a-z]+$";//The same, with a hyphen between the two parts, e.g. 123-abc
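The effect of ^ and $ can be checked by running the same pattern with and without them. A short sketch follows, with a hypothetical helper name and sample strings chosen to illustrate each case:

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True if find() locates the regex somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^[0-9]+[a-z]+$", "1a11111aaaa"));  // false: digits follow the first letter
        System.out.println(found("[0-9]+[a-z]+", "1a11111aaaa"));    // true: without anchors "1a" is enough
        System.out.println(found("^[0-9]+[a-z]+$", "123abc"));       // true
        System.out.println(found("^[0-9]+\\-[a-z]+$", "123-abc"));   // true
    }
}
```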

Application examples

  1. Matching Chinese characters
content = "我爱学习";//"I love learning" in Chinese
regstr = "^[\u0391-\uffe5]+$";//A broad Unicode range that includes the Chinese characters
Pattern pattern = Pattern.compile(regstr);
Matcher matcher = pattern.matcher(content);
while (matcher.find()){
    System.out.println("find:"+matcher.group());
}
find:我爱学习
  2. Postal code

A six-digit number; the first digit may be any of 0-9

content = "123456";
regstr = "^[0-9]\\d{5}$";//Keep the trailing $: without it, find() would also accept input with more than six digits
Pattern pattern = Pattern.compile(regstr);
Matcher matcher = pattern.matcher(content);
while (matcher.find()){
    System.out.println("find:"+matcher.group());
}
  3. QQ number

A number of 5 to 10 digits whose first digit is 1-9

regstr = "^[1-9]\\d{4,9}$";
  4. Phone number

Must be 11 digits starting with 13, 14, 15, or 18

regstr = "^1[3458]\\d{9}$";//Inside [] the | character is literal, so it must not be used as a separator
  5. URL
content = "https://dtu.s.e/y/bi/my/0-9/index.html#39?fd=3e&sdf=23";
//Note: inside [], characters such as . and ? lose their special meaning and match themselves
regstr = "^((https|http)://)([\\w-]+\\.)+([\\w-])+(\\/[?.\\w-&#/=]*)?";
  1. Classic stutter program

I... I want to... Learn... Program java!

I want to learn programming java!

content = "I...i want...Learn to learn...programming java!";
Pattern pattern = Pattern.compile("\\.");//pattern. compile
Matcher matcher = pattern.matcher(content);
content = matcher.replaceAll("");//Remove all
content = Pattern.compile("(.)\\1+").matcher(content).replaceAll("$1");//Back reference replaces the contents of () with the following repeated words
System.out.println(content);
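The QQ-number and phone-number rules from the examples above can be wrapped in small validation helpers. The sketch below assumes the rules exactly as worded in the text (first digit 1-9 and 5 to 10 digits for QQ; 11 digits with prefix 13, 14, 15, or 18 for phones); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class Validators {
    // 5 to 10 digits, first digit 1-9.
    private static final Pattern QQ = Pattern.compile("^[1-9]\\d{4,9}$");
    // 11 digits starting with 13, 14, 15, or 18.
    private static final Pattern PHONE = Pattern.compile("^1[3458]\\d{9}$");

    static boolean isQq(String s) {
        return QQ.matcher(s).matches();
    }

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQq("12345"));          // true
        System.out.println(isQq("01234"));          // false: starts with 0
        System.out.println(isPhone("13888887777")); // true
        System.out.println(isPhone("16888887777")); // false: prefix 16 is not allowed
    }
}
```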

Using regular expressions in the String class

  1. Replace jdk1.3 and jdk1.4 in the text with jdk
content = "jdk1.3 jdk1.4";
String s = content.replaceAll("jdk1\\.3|jdk1\\.4", "jdk");
  2. Verify a mobile phone number, which must start with 138 or 139
content = "13888887777";
boolean matches = content.matches("1(38|39)\\d{8}");
System.out.println(matches);
  3. Split the string on #, -, ~, or digits
content = "java-jkd#sm~fd1212fd";
String[] split = content.split("#|-|~|\\d+");
for (String a : split){
    System.out.println(a);
}
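The three String methods above can be run together as follows. This is a sketch that reuses the sample strings from the text; the class name and the splitTokens helper are made up, and the replace pattern is written in the shorter form jdk1\\.[34].

```java
import java.util.Arrays;

class StringRegexDemo {
    // Splits on '#', '-', '~', or a run of digits, as in the example above.
    static String[] splitTokens(String s) {
        return s.split("#|-|~|\\d+");
    }

    public static void main(String[] args) {
        System.out.println("jdk1.3 jdk1.4".replaceAll("jdk1\\.[34]", "jdk"));      // jdk jdk
        System.out.println("13888887777".matches("1(38|39)\\d{8}"));               // true
        System.out.println(Arrays.toString(splitTokens("java-jkd#sm~fd1212fd")));  // [java, jkd, sm, fd, fd]
    }
}
```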

Exercises in this chapter

  1. Match an email address

There must be exactly one @. The part before @ is the user name, which may contain a-z, A-Z, 0-9, - and _. The part after @ is the domain name, which may contain only English letters, such as sohu.com or tsingsf.org.cn

content = "sou@sou.com";
String regexx = "[\\w-]+@([a-zA-Z]+\\.)+[a-zA-Z]+";
System.out.println(content.matches(regexx));

content.matches() performs a full match: the whole string must match the pattern

Internally it ends up calling the matches method of the Matcher class

  2. Verify whether a string is an integer or a decimal

Slightly complicated...

content = "+34.232";
regstr = "^[-+]?([1-9]\\d*|0)(\\.\\d+)?$";

Starts with an optional - or + sign, then either a digit 1-9 followed by any number of digits, or a single 0; ends with an optional decimal point followed by at least one digit
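Trying the pattern on a few inputs makes each part of the rule visible. The sketch below does that; the class name, helper name, and all inputs other than +34.232 are invented for illustration.

```java
import java.util.regex.Pattern;

class NumberCheck {
    private static final Pattern NUMBER =
            Pattern.compile("^[-+]?([1-9]\\d*|0)(\\.\\d+)?$");

    // True if the whole string is an integer or decimal in the form described above.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("+34.232")); // true
        System.out.println(isNumber("0.5"));     // true
        System.out.println(isNumber("-12"));     // true
        System.out.println(isNumber("007"));     // false: leading zeros are rejected
        System.out.println(isNumber("1."));      // false: a digit must follow the point
    }
}
```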

  3. Parse a URL

Extract the protocol, domain name, port, and file name

content = "http://www.sohu.com:8080/abc/index.html";
regstr = "^(http|https)://([\\w.]+):(\\d+)[\\w/]+/([\\w.]+)$";
Pattern pattern = Pattern.compile(regstr);
Matcher matcher = pattern.matcher(content);
while (matcher.find()){
    System.out.println("find:"+matcher.group());
    System.out.println("Protocol found:"+matcher.group(1));
    System.out.println("Domain name found:"+matcher.group(2));
    System.out.println("Port found:"+matcher.group(3));
    System.out.println("File name found:"+matcher.group(4));
}

find:http://www.sohu.com:8080/abc/index.html
Protocol found:http
Domain name found:www.sohu.com
Port found:8080
File name found:index.html
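When a pattern has several groups, numbered access can become hard to read. Java also supports named groups, written (?<name>...), which are not used in the original example. The sketch below rewrites the same URL pattern with names; the group names, class name, and helper are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlParts {
    // Same structure as the numbered-group pattern, with a name for each group.
    private static final Pattern URL = Pattern.compile(
            "^(?<protocol>http|https)://(?<host>[\\w.]+):(?<port>\\d+)[\\w/]+/(?<file>[\\w.]+)$");

    // Returns the named part of the URL, or null if the URL does not match.
    // `name` must be one of: protocol, host, port, file.
    static String part(String url, String name) {
        Matcher m = URL.matcher(url);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.sohu.com:8080/abc/index.html";
        System.out.println(part(url, "protocol")); // http
        System.out.println(part(url, "host"));     // www.sohu.com
        System.out.println(part(url, "port"));     // 8080
        System.out.println(part(url, "file"));     // index.html
    }
}
```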

Collection of common Java regular expressions

Numbers


Chinese characters


Special characters


Keywords: Java regex

Added by Impact on Tue, 08 Feb 2022 02:40:10 +0200