Custom regular expression matching algorithm for Java

preface

Customize the regular expression rules, and then complete the implementation of the matching algorithm.

1, Title

Please implement a function to match regular expressions containing '.' and '*'. The character '.' in the mode represents any character, and '/ *' represents that the character before it can appear any time (including 0 times). In this question, matching means that all characters of the string match the whole pattern. For example, the string "aaa" matches the patterns "a.a" and "ab*ac*a", but not "aa.a" and "ab*a".

Example 1:
Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" cannot match "aa" the entire string.

Example 2:
Input:
s = "aa"
p = "a*"
Output: true
Explanation: because '*' means that zero or more preceding elements can be matched, the preceding element here is' a '. Therefore, the string "aa" can be regarded as' a 'repeated once.

Example 3:
Input:
s = "ab"
p = ".*"
Output: true
Explanation: ". *" means that it can match zero or more ('*') arbitrary characters ('.').

Example 4:
Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: because '*' means zero or more, here 'c' is 0, and 'a' is repeated once. Therefore, you can match the string "aab".

Example 5:
Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

2, Matching algorithm

Match the four cases. Write each case well.
1) Case 1: recursive Exit 1, rule end, expression not end, return false.
2) Case 2: recursive exit 2, end of rule, end of expression, return true.
3) Case 3: recursive exit 3. The rule is not at the end of the expression. At this time, the rule must be marked with *.
4) Case 4: formal recursion 1. The rule is not at the end and the expression is not at the end. At this time, the for loop recursion should be based on whether there is' * 'or simple recursion without' * '.

package com.xhu.offer;

public class DFSForRE {
    private String s;
    private String p;

    public boolean isMatch(String s, String p) {
        if (".*".equals(p) || "".equals(s) && "".equals(p)) return true;
        this.s = s;
        this.p = p;

        /**
         * Idea: recursion
         * Compare whether the corresponding positions match. When encountering s *, you need for loop recursive matching, which is equivalent to dfs
         */
        return dfs(0, 0);
    }

    private boolean dfs(int i, int j) {
        //Case 1: recursive Exit 1, rule end, expression not end, return false
        if (j >= p.length() && i < s.length()) {
            return false;
        }
        //Case 2: recursive exit 2, end of rule, end of expression, return true
        if (j >= p.length() && i >= s.length()) {
            return true;
        }
        //Case 3: recursive exit 3. The rule is not at the end of the expression. At this time, the rule must be marked with *
        if (j < p.length() && i >= s.length()) {
            for (int n = j; n < p.length(); ) {
                if (n + 1 >= p.length() || p.charAt(n + 1) != '*')
                    return false;
                n+=2;
            }
            return true;
        }
        //Case 4: formal recursion. The rule is not at the end and the expression is not at the end. At this time, the for loop recursion should be based on whether there is' * 'or simple recursion without' * '
        if (j + 1 < p.length()) {
            char ch = p.charAt(j + 1);
            if (ch != '*') {
                //Point or character matching
                char ch1 = s.charAt(i);
                char ch2 = p.charAt(j);
                if (ch2 == '.' || ch1 == ch2)
                    return dfs(i + 1, j + 1);
                else return false;
            } else {
                for (int n = -1; n < s.length() - i; n++) {
                    boolean flag = false;
                    //Don't compare
                    if (n == -1) {
                        flag = dfs(i, j + 2);
                    } else if (p.charAt(j) == '.' || s.charAt(i + n) == p.charAt(j)) {
                        //Relatively successful, carry at the same time
                        flag = dfs(i + n + 1, j + 2);
                    }else{
                        return false;
                    }
                    //If you succeed, return. There's no need to cycle.
                    if (flag)
                        return true;
                }
                //Those with '*' can match everything, that is, the following characters in s are the same. However, the character rule still needs characters to match. According to the exit, false is returned.
                return false;
            }
        }
        //j+1 is out of bounds. Just see if i and j are equal
        else if (p.charAt(j) == '.' || s.charAt(i) == p.charAt(j))
            return dfs(i + 1, j + 1);
        //j is the last bit and i does not match, so false is returned
        else return false;
    }

}

summary

1) Calm down and carefully analyze the problems, and complete the disassembly and classification of the problems.
2) Recursion and DFS ideas are important.

reference

[1] Original title of Leetcode

Keywords: Java Algorithm regex recursion dfs

Added by mbaroz on Wed, 01 Dec 2021 10:55:44 +0200