4. Greedy and non-greedy matching

1. Greedy matching

Greedy matching: when a regular expression contains a quantifier that allows repetition, the default behavior is to match as many characters as possible, provided the expression as a whole can still match. This is called greedy matching.
Features: the quantifier first consumes as much of the string as it can. If the rest of the expression then fails to match, the engine gives back the rightmost consumed character and tries again. This consume-and-give-back cycle (known as backtracking) repeats until the match succeeds or nothing more can be given back. The result is therefore the longest match available at that position.
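
The backtracking described above can be observed by capturing what each part of an expression ends up with. The following is a minimal sketch, not taken from the original article: the class name GreedyBacktrackDemo, the pattern (\d+)(\d{2}) and the input "12345" are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample input are illustrative.
final class GreedyBacktrackDemo {

    // Returns {group 1, group 2} for (\d+)(\d{2}), or null if nothing matches.
    static String[] split(String text) {
        Matcher m = Pattern.compile("(\\d+)(\\d{2})").matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // \d+ first consumes "12345", then gives back "5" and "4"
        // so that \d{2} can match "45".
        String[] parts = split("12345");
        System.out.println(parts[0] + " | " + parts[1]); // prints: 123 | 45
    }
}
```

If \d+ had kept all five digits, \d{2} would have had nothing left to match and the whole expression would have failed, which is why the greedy part settles for "123".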

The repetition quantifiers discussed earlier are all greedy by default. Take this expression:

    \d{3,6}

It matches 3 to 6 digits. Because the quantifier is greedy, if the string offers 6 consecutive digits, all 6 are matched.
For example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String reg = "\\d{3,6}";
    String test = "61762828 176 2991 44 871";
    System.out.println("Text: " + test);
    System.out.println("Greedy mode: " + reg);
    Pattern p1 = Pattern.compile(reg);
    Matcher m1 = p1.matcher(test);
    // find() scans from left to right; group(0) is the whole match
    while (m1.find()) {
        System.out.println("Matching result: " + m1.group(0));
    }

Output result:

    Text: 61762828 176 2991 44 871
    Greedy mode: \d{3,6}
    Matching result: 617628
    Matching result: 176
    Matching result: 2991
    Matching result: 871

The results show that the "61762828" segment needs only 3 digits (617) to match, yet the quantifier is not content with that and takes the maximum it is allowed, 6 digits. The remaining "28" is shorter than 3 digits and is skipped.
So a single quantifier is greedy. What happens when several greedy quantifiers are placed together? How do they divide the matching between them?

When several greedy quantifiers appear together and the string is long enough to give each of them its maximum, they do not interfere with each other. When it is not, they are served depth first, from left to right: each quantifier takes as much as it can while still leaving the later quantifiers at least their minimum, and whatever remains goes to the next quantifier.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String reg = "(\\d{1,2})(\\d{3,4})";
    String test = "61762828 176 2991 87321";
    System.out.println("Text: " + test);
    System.out.println("Greedy mode: " + reg);
    Pattern p1 = Pattern.compile(reg);
    Matcher m1 = p1.matcher(test);
    // group(0) is the whole match, covering both capture groups
    while (m1.find()) {
        System.out.println("Matching result: " + m1.group(0));
    }

Output result:

    Text: 61762828 176 2991 87321
    Greedy mode: (\d{1,2})(\d{3,4})
    Matching result: 617628
    Matching result: 2991
    Matching result: 87321

  1. "617628": the first quantifier \d{1,2} matches 61 and the second matches 7628
  2. "2991": the first quantifier \d{1,2} matches only 2, because taking 29 would leave just 91, fewer than the 3 digits that \d{3,4} requires; the second matches 991
  3. "87321": the first quantifier \d{1,2} matches 87 and the second matches 321

"176" does not appear in the results because the expression needs at least 4 digits (1 + 3).
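
How the digits are divided between the two quantifiers can be checked by printing the capture groups separately. This is a small sketch that reuses the pattern and text from the example above; the class name GreedyGroupsDemo and the helper groups are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern and text come from the article.
final class GreedyGroupsDemo {

    // Returns every match as "group1|group2".
    static List<String> groups(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("(\\d{1,2})(\\d{3,4})", "61762828 176 2991 87321"));
        // prints: [61|7628, 2|991, 87|321]
    }
}
```

For "2991" the first group holds a single digit: the left quantifier would prefer two digits, but it has to leave at least three for \d{3,4}.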

Author: Lao Liu
Link: https://www.zhihu.com/question/48219401/answer/742444326
Source: Zhihu
The copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source.
 

2. Lazy (non-greedy) matching

Lazy matching: when a regular expression contains a quantifier that allows repetition, the quantifier can instead be made to match as few characters as possible, provided the expression as a whole can still match. This is called lazy matching.
Features: matching proceeds from left to right, starting at the leftmost position of the string. The quantifier first takes only the minimum it is allowed and the engine tries to complete the match. If that succeeds, matching is done; otherwise the quantifier takes one more character and the engine tries again. This cycle (take a character, then try) repeats until the match succeeds or the quantifier reaches its upper limit or the end of the string.

A lazy quantifier is written by adding "?" after a greedy quantifier, for example \d{1,2}? instead of \d{1,2}.
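
To see the effect of the "?" suffix in isolation, the sketch below runs \d{3,6} and its lazy form \d{3,6}? over the text "61762828 176 2991 44 871" from the first example's output. The class name LazyQuantifierDemo and the helper findAll are hypothetical. In Java the same suffix also applies to the other quantifiers (*?, +?, ??, {n,}?).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing a greedy quantifier with its lazy form.
final class LazyQuantifierDemo {

    // Collects every whole match of regex in text, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "61762828 176 2991 44 871";
        System.out.println(findAll("\\d{3,6}", text));  // prints: [617628, 176, 2991, 871]
        System.out.println(findAll("\\d{3,6}?", text)); // prints: [617, 628, 176, 299, 871]
    }
}
```

The lazy form stops at the minimum of 3 digits every time, so "61762828" now yields two matches (617 and 628) and "2991" yields 299.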

 

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // \d{1,2}? is lazy, \d{3,4} is still greedy
    String reg = "(\\d{1,2}?)(\\d{3,4})";
    String test = "61762828 176 2991 87321";
    System.out.println("Text: " + test);
    System.out.println("Lazy mode: " + reg);
    Pattern p1 = Pattern.compile(reg);
    Matcher m1 = p1.matcher(test);
    while (m1.find()) {
        System.out.println("Matching result: " + m1.group(0));
    }

Output result:

    Text: 61762828 176 2991 87321
    Lazy mode: (\d{1,2}?)(\d{3,4})
    Matching result: 61762
    Matching result: 2991
    Matching result: 87321

Explanation:

"61762": the lazy quantifier on the left matches 6 and the greedy quantifier on the right matches 1762
"2991": the lazy quantifier on the left matches 2 and the greedy quantifier on the right matches 991
"87321": the lazy quantifier on the left matches 8 and the greedy quantifier on the right matches 7321
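
A lazy quantifier takes more than its minimum only when the rest of the expression cannot match otherwise. The sketch below illustrates this under assumptions of its own: the class name LazyExpansionDemo, the fixed-width pattern (\d{1,2}?)(\d{3}) and the sample inputs are not from the original article, and Matcher.matches() is used so that the whole input has to be consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing when a lazy quantifier is forced to expand.
final class LazyExpansionDemo {

    // Returns group 1 if the WHOLE input matches (\d{1,2}?)(\d{3}), else null.
    static String firstGroup(String text) {
        Matcher m = Pattern.compile("(\\d{1,2}?)(\\d{3})").matcher(text);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // One digit is enough: "1" + "234" covers the whole input.
        System.out.println(firstGroup("1234"));   // prints: 1
        // "1" + "234" leaves "5" unmatched, so the lazy part grows to "12".
        System.out.println(firstGroup("12345"));  // prints: 12
        // Even two digits are not enough here, so there is no match.
        System.out.println(firstGroup("123456")); // prints: null
    }
}
```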


 

Keywords: regex

Added by cdrees on Sun, 20 Feb 2022 03:18:12 +0200