Regular expressions
Quick start
How do we find all the English words in the following paragraph?
When the Oak language was being developed, no hardware platform existed that could run bytecode. To experiment with the language during development, its designers therefore built a runtime platform in software, on top of the existing hardware and software platforms and according to a specification they defined themselves. Apart from being simpler than C++, the whole system was not very different. (The paragraph was originally written in Chinese, in which "Oak" and "C++" were the only words in Latin letters.)
// Checking each character against the ASCII range of the letters works, but is cumbersome:
char a = 'a';
int aint = (int) a;

// Using a regular expression instead (the class name RegexQuickStart is only a placeholder):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexQuickStart {
    public static void main(String[] args) {
        String content = "When the Oak language was being developed, no hardware platform existed that could run bytecode. "
                + "To experiment with the language during development, its designers therefore built a runtime platform in software, "
                + "on top of the existing hardware and software platforms and according to a specification they defined themselves. "
                + "Apart from being simpler than C++, the whole system was not very different.";
        Pattern pattern = Pattern.compile("[a-zA-Z]+"); // Pattern object: a compiled regular expression that matches letters
        // Pattern pattern = Pattern.compile("[0-9]+"); // Matches numbers
        // Pattern pattern = Pattern.compile("([a-zA-Z]+)|([0-9]+)"); // Matches letters or numbers
        Matcher matcher = pattern.matcher(content); // Matcher object bound to the input text
        while (matcher.find()) {
            System.out.println("Found:" + matcher.group(0));
        }
    }
}

Output for the original Chinese paragraph (on the English translation above, every word would be reported):
Found:Oak
Found:C
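The find loop above can be wrapped in a small helper so that the three patterns are easy to compare. This is a minimal sketch: the `findAll` helper and the sample text are made up for illustration and do not come from the original notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: collect every substring of text that matches regex.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group(0));
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Oak 1991, Java 1995, J2EE 1998"; // made-up sample text
        System.out.println(findAll("[a-zA-Z]+", text));            // [Oak, Java, J, EE]
        System.out.println(findAll("[0-9]+", text));               // [1991, 1995, 2, 1998]
        System.out.println(findAll("([a-zA-Z]+)|([0-9]+)", text)); // [Oak, 1991, Java, 1995, J, 2, EE, 1998]
    }
}
```

Note that J2EE is split into J, 2 and EE, because each pattern only accepts one kind of character per match.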
What if we want to extract the headlines from Baidu's trending searches?
<a target="_blank" title="Gansu cross country race accident set up a joint investigation team" href="/s?rsv_idx
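// Group 1 captures the title. \\S* only matches titles without whitespace, as in the original Chinese headlines;
// use [^\"]* instead for titles that contain spaces.
Pattern pattern = Pattern.compile("<a target=\"_blank\" title=\"(\\S*)\"");
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
    System.out.println("Found:" + matcher.group(1));
}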
What about IP addresses?
Class A: 10.0.0.0 -- 10.255.255.255
Class B: 172.16.0.0 -- 172.31.255.255
Class C: 192.168.0.0 -- 192.168.255.255
Pattern pattern = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+"); // Four runs of digits separated by dots
Matcher matcher = pattern.matcher(content); // Matcher
while (matcher.find()) {
    System.out.println("Found:" + matcher.group(0));
}
Found:10.0.0.0
Found:10.255.255.255
Found:172.16.0.0
Found:172.31.255.255
Found:192.168.0.0
Found:192.168.255.255
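The pattern above only checks the shape "digits.digits.digits.digits", so it would also accept 999.1.1.1. As a sketch of one possible tightening (an addition, not part of the original notes), each octet can be restricted to 0-255. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

class Ipv4Demo {
    // One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    private static final Pattern STRICT = Pattern.compile(OCTET + "(?:\\." + OCTET + "){3}");
    private static final Pattern LOOSE = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");

    static boolean isStrictIpv4(String s) { return STRICT.matcher(s).matches(); }
    static boolean isLooseIpv4(String s) { return LOOSE.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"10.0.0.0", "172.31.255.255", "192.168.256.1", "999.1.1.1"}) {
            System.out.println(s + " loose=" + isLooseIpv4(s) + " strict=" + isStrictIpv4(s));
        }
    }
}
```

The loose pattern accepts all four sample strings, while the strict one rejects the last two.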
Analyzing the underlying logic of matcher.find() and matcher.group(0)
// find() locates the next substring that satisfies the pattern and records its start index
// and its end index + 1 in the int[] groups field: groups[0] = start, groups[1] = end + 1.
// It also records oldLast as end index + 1, which is where the next search begins.

// The source code of group(int) then cuts the matching substring out of the input according to the values in groups:
public String group(int group) {
    if (first < 0)
        throw new IllegalStateException("No match found");
    if (group < 0 || group > groupCount())
        throw new IndexOutOfBoundsException("No group " + group);
    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
        return null;
    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
}
Let's look at the difference between group(0), group(1), and group(2)
These apply to parenthesized groups: each pair of () in the pattern is one group.
String contents = "On December 8, 1998, J2EE, the enterprise edition of the second-generation Java platform, was released. "
        + "In June 1999, Sun released three versions of the second-generation Java platform (Java 2 for short).";
Pattern pattern = Pattern.compile("(\\d\\d)(\\d\\d)");
Matcher matcher = pattern.matcher(contents); // Matcher
while (matcher.find()) {
    // The whole match: groups[0] holds its start index and groups[1] its end index + 1
    System.out.println("Found:" + matcher.group(0));
    // The first group is stored in groups[2] and groups[3]
    System.out.println("Find the first group(): " + matcher.group(1));
    // The second group is stored in groups[4] and groups[5]
    System.out.println("Find the second group(): " + matcher.group(2));
    // There is no third group, so matcher.group(3) would throw an IndexOutOfBoundsException
}
Found:1998
Find the first group(): 19
Find the second group(): 98
Found:1999
Find the first group(): 19
Find the second group(): 99
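The index bookkeeping described above can be observed through the public API: `start(n)` and `end(n)` report the same start and end + 1 positions that the notes attribute to the groups array. The following is a small sketch; the `span` helper and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSpanDemo {
    // Hypothetical helper: {start, end} of the given group in the first match, or null if nothing matches.
    static int[] span(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String text = "J2EE 1998"; // made-up sample text
        String regex = "(\\d\\d)(\\d\\d)";
        for (int g = 0; g <= 2; g++) {
            int[] s = span(regex, text, g);
            // end is exclusive, so substring(start, end) is exactly group(g)
            System.out.println("group " + g + ": [" + s[0] + ", " + s[1] + ") -> " + text.substring(s[0], s[1]));
        }
        try {
            span(regex, text, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("group 3 does not exist");
        }
    }
}
```

For this input, group 0 spans [5, 9), group 1 spans [5, 7) and group 2 spans [7, 9).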
Basic syntax
Three common classes
The java.util.regex package provides three commonly used classes: Pattern, Matcher, and PatternSyntaxException.
Whole-string matching with Pattern.matches
boolean matches = Pattern.matches(regstr, content); // true only if the entire content matches regstr
System.out.println(matches);

// Underlying source code
public static boolean matches(String regex, CharSequence input) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}
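To make the difference between whole-string matching and searching concrete, here is a short sketch comparing `matches()`, `find()` and `lookingAt()` on the same inputs. The helper names are made up for this example.

```java
import java.util.regex.Pattern;

class MatchKindsDemo {
    // The entire input must match.
    static boolean whole(String regex, String s) { return Pattern.matches(regex, s); }
    // Some substring must match.
    static boolean anywhere(String regex, String s) { return Pattern.compile(regex).matcher(s).find(); }
    // A prefix of the input must match.
    static boolean prefix(String regex, String s) { return Pattern.compile(regex).matcher(s).lookingAt(); }

    public static void main(String[] args) {
        String regex = "\\d+";
        for (String s : new String[] {"123", "123abc", "abc123"}) {
            System.out.println(s + ": whole=" + whole(regex, s)
                    + " anywhere=" + anywhere(regex, s)
                    + " prefix=" + prefix(regex, s));
        }
    }
}
```

Only "123" passes all three checks; "123abc" fails the whole-string check, and "abc123" passes only the search.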
Basic methods of the Matcher class
Grouping, capturing, backreferencing
Grouping
// The two patterns are equivalent; the second is shorter.
// Note that (?:...) is a non-capturing group, so its content cannot be retrieved with group(1).
regstr = "abc1|abc2|abc3";
regstr = "abc(?:1|2|3)";
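A quick way to see that `(?:...)` does not capture is to check `groupCount()`. The sketch below also shows named groups, `(?<name>...)`, which Java supports alongside numbered ones. The date pattern and sample text are made up for illustration and are not from the original notes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // year and day are captured by name; the month group is non-capturing.
    static final Pattern DATE = Pattern.compile("(?<year>\\d{4})-(?:0[1-9]|1[0-2])-(?<day>\\d{2})");

    // Returns {year, day} from the first date found in text, or null if there is none.
    static String[] yearAndDay(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("year"), m.group("day") };
    }

    public static void main(String[] args) {
        System.out.println(Pattern.compile("abc(1|2|3)").matcher("abc2").groupCount());   // 1
        System.out.println(Pattern.compile("abc(?:1|2|3)").matcher("abc2").groupCount()); // 0
        String[] parts = yearAndDay("J2EE released 1998-12-08");
        System.out.println(parts[0] + " / " + parts[1]);                                  // 1998 / 08
        System.out.println(DATE.matcher("").groupCount());                                // 2
    }
}
```

Because the month group does not capture, the day is group 2 rather than group 3.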
Capturing
Backreferences
regstr = "(\\d)\\1"; // Two identical consecutive digits, such as 11 or 55
regstr = "(\\d)(\\d)\\2\\1"; // Four-digit palindromes, such as 1221 or 2112
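As a minimal sketch of the two backreference patterns in action, the following searches made-up sample strings for them. The `findAll` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Hypothetical helper: collect every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // \1 must repeat exactly what group 1 captured
        System.out.println(findAll("(\\d)\\1", "1123 4556 789"));                // [11, 55]
        // \2 then \1 mirror the first two digits
        System.out.println(findAll("(\\d)(\\d)\\2\\1", "1221 5775 1234 9009")); // [1221, 5775, 9009]
    }
}
```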
Metacharacters
Some characters must be escaped to be matched literally: . * + ( ) $ / \ ? [ ] ^ { }. In a Java string literal, put \\ in front of the character. (Escaping / is harmless but not required in Java.)
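The effect of escaping can be checked directly. The sketch below compares an unescaped pattern, a manually escaped one, and `Pattern.quote`, a standard library method that escapes a whole literal at once. The sample strings and the `matchesLiterally` helper are made up for illustration.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: treat literal as plain text, not as a pattern.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println("1+1=2".matches("1+1=2"));         // false: + means "one or more 1s"
        System.out.println("11=2".matches("1+1=2"));          // true: what the unescaped pattern really matches
        System.out.println("1+1=2".matches("1\\+1=2"));       // true: + escaped by hand
        System.out.println(matchesLiterally("1+1=2", "1+1=2")); // true: escaped by Pattern.quote
    }
}
```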
Quantifiers
Quantifiers specify how many times the preceding character or group must occur consecutively.
Alternation
When a match may take one of several forms, use the alternation operator |. For example, "ab|cd" matches ab or cd.
String regstr = "a{3,4}"; // Matches aaa or aaaa; Java matching is greedy by default, so aaaa is preferred
regstr = "1+"; // One or more 1s; greedy as well, so 11111 is matched as a whole
regstr = "a1?"; // Matches a1 or a; a1 is preferred
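Greedy matching can be switched off per quantifier by appending `?`, which makes it reluctant. The sketch below contrasts the two behaviors on the patterns above. The `firstMatch` helper and the input strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a{3,4}", "aaaaa"));  // aaaa  (greedy)
        System.out.println(firstMatch("a{3,4}?", "aaaaa")); // aaa   (reluctant)
        System.out.println(firstMatch("1+", "111112"));     // 11111
        System.out.println(firstMatch("1+?", "111112"));    // 1
        System.out.println(firstMatch("a1?", "a1"));        // a1
        System.out.println(firstMatch("a1??", "a1"));       // a
    }
}
```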
Grouping, combining and backreferencing
Special characters
Character-matching symbols
\\w matches a single letter, digit, or underscore (the figure originally shown above had this entry wrong)
\\s matches a whitespace character such as a space or a tab
\\S matches any non-whitespace character
. matches any character except \n; to match a literal dot, escape it as \\.
String contents = "fsldfjsld09sdfABC";
String regstr1 = "[a-z]"; // Any character between a and z
String regstr2 = "[A-Z]"; // Any character between A and Z
String regstr3 = "abc"; // Matches abc; case-sensitive by default
String regstr4 = "(?i)abc"; // Matches abc, ignoring case
String regstr5 = "[0-9]"; // Any character between 0 and 9
Pattern pattern = Pattern.compile(regstr3, Pattern.CASE_INSENSITIVE); // Case-insensitive via a flag
Matcher matcher = pattern.matcher(contents); // Matcher
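The embedded `(?i)` flag applies from the point where it appears to the end of the enclosing group, so its position matters. The sketch below illustrates this with a few made-up inputs. The `whole` helper is hypothetical.

```java
import java.util.regex.Pattern;

class CaseFlagDemo {
    // Hypothetical helper: true if the entire string matches the pattern.
    static boolean whole(String regex, String s) { return Pattern.matches(regex, s); }

    public static void main(String[] args) {
        System.out.println(whole("(?i)abc", "ABC"));   // true: the flag covers the whole pattern
        System.out.println(whole("a(?i)bc", "aBC"));   // true: only b and c ignore case
        System.out.println(whole("a(?i)bc", "ABC"));   // false: a is still case-sensitive
        System.out.println(whole("a((?i)b)c", "aBc")); // true: only b ignores case
        System.out.println(whole("a((?i)b)c", "aBC")); // false: c is case-sensitive again
        // Same effect as (?i)abc, using the compile flag instead
        System.out.println(Pattern.compile("abc", Pattern.CASE_INSENSITIVE).matcher("AbC").matches()); // true
    }
}
```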
Anchors
Anchors specify where in the string the match must occur, such as at the beginning or at the end.
content = "1a11111aaaa";
regstr = "^[0-9]+[a-z]+$"; // Starts with at least one digit and ends with at least one lowercase letter
// The content above does not match: the leading digits must be followed only by lowercase letters up to the end, as in 123abc
regstr = "^[0-9]+\\-[a-z]+$"; // Requires a hyphen between the digits and the letters, as in 123-abc
Application examples
- Matching Chinese characters
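content = "我爱学习"; // "I love learning" in Chinese
// Note: \u0391-\uffe5 is a very wide range that also covers Greek, Cyrillic, kana and more;
// [\u4e00-\u9fa5] is the range commonly used for Chinese characters only
regstr = "^[\u0391-\uffe5]+$";
Pattern pattern = Pattern.compile(regstr);
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
    System.out.println("find:" + matcher.group());
}
find:我爱学习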
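As an alternative sketch (an addition, not from the original notes), Java's Unicode script classes can express "Chinese characters" without hard-coding a code-point range: `\p{IsHan}` matches characters of the Han script. The class and method names below are made up, and the sample strings are written with Unicode escapes so the file does not depend on the source encoding.

```java
class HanDemo {
    // True if s consists only of Han-script characters.
    static boolean isAllHan(String s) { return s.matches("\\p{IsHan}+"); }

    // The wide range used in the notes, for comparison.
    static boolean inWideRange(String s) { return s.matches("[\\u0391-\\uffe5]+"); }

    public static void main(String[] args) {
        String chinese = "\u6211\u7231\u5b66\u4e60"; // "I love learning" in Chinese
        String greek = "\u03b1\u03b2\u03b3";         // alpha, beta, gamma
        System.out.println(isAllHan(chinese));       // true
        System.out.println(inWideRange(chinese));    // true
        System.out.println(isAllHan(greek));         // false
        System.out.println(inWideRange(greek));      // true: the wide range also accepts Greek
        System.out.println(isAllHan("I love learning")); // false
    }
}
```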
- Postal Code
A six-digit number in which every digit, including the first, is 0-9
content = "123456";
// With find(), keep the trailing $: without it the pattern would also match the first six digits of a longer number
regstr = "^[0-9]\\d{5}$";
Pattern pattern = Pattern.compile(regstr);
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
    System.out.println("find:" + matcher.group());
}
- QQ number
A number of 5 to 10 digits whose first digit is 1-9
regstr = "^[1-9]\\d{4,9}$";
- Phone number
Must be 11 digits and start with 13, 14, 15, or 18
regstr = "^1[3458]\\d{9}$"; // Inside [], | is a literal character, so [3|4|5|8] would also accept |
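Inside a character class, `|` is an ordinary character rather than the alternation operator. The following sketch compares a class written with and without `|`. The class name, constant names, and sample numbers are made up for illustration.

```java
import java.util.regex.Pattern;

class PhoneDemo {
    static final Pattern WITHOUT_BAR = Pattern.compile("^1[3458]\\d{9}$");
    static final Pattern WITH_BAR = Pattern.compile("^1[3|4|5|8]\\d{9}$");

    static boolean withoutBar(String s) { return WITHOUT_BAR.matcher(s).matches(); }
    static boolean withBar(String s) { return WITH_BAR.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"13812345678", "12812345678", "1|812345678"}) {
            System.out.println(s + ": [3458]=" + withoutBar(s) + " [3|4|5|8]=" + withBar(s));
        }
    }
}
```

Both patterns accept the first number and reject the second, but only the class containing `|` accepts the third string.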
- URL
content = "https://dtu.s.e/y/bi/my/0-9/index.html#39?fd=3e&sdf=23";
// Note: inside [], characters such as . and ? only stand for themselves
regstr = "^((https|http)://)([\\w-]+\\.)+([\\w-])+(\\/[?.\\w-&#/=]*)?";
- Classic stutter program
I... I want to... Learn... Program java!
I want to learn programming java!
content = "I...i want...Learn to learn...programming java!"; Pattern pattern = Pattern.compile("\\.");//pattern. compile Matcher matcher = pattern.matcher(content); content = matcher.replaceAll("");//Remove all content = Pattern.compile("(.)\\1+").matcher(content).replaceAll("$1");//Back reference replaces the contents of () with the following repeated words System.out.println(content);
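The two replacement steps can be packaged as one method. Here is a minimal sketch with a made-up English input. It also shows a limitation of the approach: collapsing every repeated character changes words that legitimately contain double letters. The `destutter` name is hypothetical.

```java
class StutterDemo {
    // Hypothetical helper: remove dots, then collapse each run of a repeated character to one copy.
    static String destutter(String s) {
        String noDots = s.replaceAll("\\.", "");
        return noDots.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(destutter("I....I wwwant Java!")); // I want Java!
        System.out.println(destutter("programming"));         // programing (the double m is collapsed too)
    }
}
```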
Using regular expressions in the String class
- Replace jdk1.3 and jdk1.4 in the text with jdk
content = "jdk1.3 jdk1.4";
String s = content.replaceAll("jdk1\\.3|jdk1\\.4", "jdk");
- Verify a mobile phone number, which must start with 138 or 139
content = "13888887777";
boolean matches = content.matches("1(38|39)\\d{8}");
System.out.println(matches);
- Split a string on #, -, ~, or a run of digits
content = "java-jkd#sm~fd1212fd";
String[] split = content.split("#|-|~|\\d+");
for (String a : split) {
    System.out.println(a);
}
Exercises in this chapter
- Match email address
There must be exactly one @. The part before @ is the user name, which may contain a-z, A-Z, 0-9, - and _. The part after @ is the domain name, which may contain only English letters, such as sohu.com or tsingsf.org.cn
content = "sou@sou.com";
String regexx = "[\\w-]+@([a-zA-Z]+\\.)+[a-zA-Z]+";
System.out.println(content.matches(regexx));
content.matches() matches against the entire string, so ^ and $ are not needed.
Internally it ends up calling the matches() method of Matcher.
- Verify whether a string is an integer or a decimal
Slightly more complicated...
content = "+34.232";
regstr = "^[-+]?([1-9]\\d*|0)(\\.\\d+)?$";
The pattern starts with an optional - or + sign, followed by either a digit 1-9 and any number of further digits, or a single 0. It ends with an optional fractional part: a dot followed by at least one digit.
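A pattern like this is easiest to trust after trying it on edge cases. The sketch below runs it against a handful of made-up inputs. The `isNumber` helper is hypothetical.

```java
import java.util.regex.Pattern;

class NumberDemo {
    private static final Pattern NUMBER = Pattern.compile("^[-+]?([1-9]\\d*|0)(\\.\\d+)?$");

    // Hypothetical helper: true if s is an integer or decimal in the format described above.
    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"+34.232", "12", "0.5", "-0", "007", "1.", ".5"}) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

The first four inputs are accepted. "007" is rejected because of its leading zeros, "1." because the dot has no digits after it, and ".5" because the integer part is missing.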
- Parse a URL
Extract the protocol, domain name, port, and file name
content = "http://www.sohu.com:8080/abc/index.html";
regstr = "^(http|https)://([\\w.]+):(\\d+)[\\w/]+/([\\w.]+)$";
Pattern pattern = Pattern.compile(regstr);
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
    System.out.println("find:" + matcher.group());
    System.out.println("Protocol found:" + matcher.group(1));
    System.out.println("Domain name found:" + matcher.group(2));
    System.out.println("Port found:" + matcher.group(3));
    System.out.println("File name found:" + matcher.group(4));
}
find:http://www.sohu.com:8080/abc/index.html
Protocol found:http
Domain name found:www.sohu.com
Port found:8080
File name found:index.html
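For comparison, the standard library's `java.net.URI` class can extract the same parts without a hand-written pattern. The sketch below (an addition, not part of the original exercise) runs both approaches on the sample URL. The class and method names are made up.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPartsDemo {
    private static final Pattern URL =
            Pattern.compile("^(http|https)://([\\w.]+):(\\d+)[\\w/]+/([\\w.]+)$");

    // {protocol, domain, port, file name} via the regex from the exercise, or null if it does not match.
    static String[] viaRegex(String url) {
        Matcher m = URL.matcher(url);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    // The same four parts via java.net.URI.
    static String[] viaUri(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        String file = path.substring(path.lastIndexOf('/') + 1);
        return new String[] { uri.getScheme(), uri.getHost(), String.valueOf(uri.getPort()), file };
    }

    public static void main(String[] args) {
        String url = "http://www.sohu.com:8080/abc/index.html";
        System.out.println(String.join(" | ", viaRegex(url))); // http | www.sohu.com | 8080 | index.html
        System.out.println(String.join(" | ", viaUri(url)));   // http | www.sohu.com | 8080 | index.html
    }
}
```

The regex version requires an explicit port and at least one directory level, so it returns null for URLs such as http://www.sohu.com/index.html, which URI would still parse.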
Common Java regular expressions
Numbers
Chinese characters
Special characters