Basic concepts
- Edit distance issue:
- It is difficult to edit the distance problem, but the solution is very beautiful, and it is also a rare practical algorithm
- Edit the distance usage scenario:
- For revising the misplaced content of the article The article can only be modified by 20 words, and supports adding, deleting and replacing operations to find the best scheme for modification
- For measuring the similarity of DNA DNA sequence is a sequence composed of a, G, C and T, which can be compared into a string. The similarity of two DNA sequences can be measured by editing distance. The smaller the editing distance, the more similar the two DNA sequences are
Train of thought analysis
- Edit distance issue:
- Given two strings s1 and s2, only three operations can be used to change s1 into s2 and find the minimum operand
- We need to ensure that the result is the same whether we change s1 into s2 or s2 into s1
- In the longest common subsequence, to solve the dynamic programming problem of two strings, we usually use two pointers I and j to point to the end of the two strings respectively, and then move forward step by step to reduce the scale of the problem
- Calculate edit distance:
- base case of algorithm:
- When s1 and s2 characters are the same In order to minimize the editing distance, no task operation is required. You can move it directly
- If j has finished s2,i has not finished s1 In order to minimize the editing distance, you can only use the delete operation to shorten s1 to s2
- If i has finished s1 and J has not finished s2, in order to minimize the editing distance, you can only use the insertion operation to increase s1 to s2
- base case of algorithm:
Problem solution
- Base case of the algorithm: when I finish s1 or j finishes s2, you can directly return the remaining length of another string
- For each pair s1[i] and s2[j], there are four operations:
if s1[i] == s2[j]: No operation,Skip directly skip i,j Move forward at the same time else: 1 out of 3 operation: insert-insert delete-delete replace-replace Copy code
- For the operation of 1 out of 3, you can use the recursive technique to try all three operations, and then select the result of which operation according to which operation has the smallest editing distance
def minDistance(s1, s2) -> int: # Returns the minimum edit distance of s1[0,..., i] and S2 [0,..., J] def dp(i, j): # base case if i == -1: return j + 1 if j == -1: return i + 1 if s1[i] == s2[j]: # No task operation is required # The minimum editing distance of s1[0,..., i] and S2 [0,..., J] is equal to the minimum editing distance of s1[0,..., i - 1] and S2 [0,..., J - 1] # That is, dp(i, j) is equal to dp(i - 1, j - 1) return dp(i - 1, j - 1) else: return min( # insert # Insert a character like s2[j] in s1[i] to match # Move the character of s2 forward by 1 bit and continue the comparison # Insert operation, operand plus 1 dp(i, j - 1) + 1 # delete # Delete this character from s1[i] to match # Move the character of s2 forward by 1 bit and continue the comparison # Delete operation, operand plus 1 dp(i - 1, j) + 1 # replace # Replace s1[i] with s2[j] to continue matching # At the same time, I and j move forward by 1 bit to continue the comparison # Replace operation, operand plus 1 dp(i - 1, j - 1) + 1 ) return dp(len(s1) - 1, len(s2) - 1) Copy code
- The solution has overlapping subproblems and needs to be optimized by dynamic programming algorithm
- How to determine whether there are overlapping subproblems?
- Abstract the algorithm framework of the solution
- Judge the different paths from the subproblem to the original problem
- Once there are repeated paths, there are a large number of repeated paths, that is, there are overlapping subproblems
Problem optimization
- There are two ways to use dynamic programming optimization for overlapping subproblems:
- memorandum
- DP Table
memorandum
def minDistance(s1, s2) -> int: # memorandum memo = dict() def dp(i, j): if (i, j) in memo: return memo[(i, j)] ... if s[i] == s[j]: memo[(i, j)] = ... else: memo[(i, j)] = ... return memo[(i, j)] return dp(len(s1) - 1, len(s2) - 1) Copy code
DP Table
- First, we need to understand the meaning of DP array. DP array is a two-dimensional array
- base case: dp[...][0] and dp[0] [...]
- Meaning of dp[i][j]:
- Def DP (I, J) - > int: returns the minimum editing distance of s1[0,...,i] and s2[0,...,j]
- dp[i - 1][j - 1]: store the minimum editing distance of s1[0,...,i] and S2 [0,..., J] Because the values of I and j of DP function are - 1, and the minimum index value of array is 0 So DP array will be offset by one bit
int minDistance(String s1, String s2) { int m = s1.length(), n = s2.length(); int[][] dp = new int[m][n]; // base case for (int i = 1; i <=m; i++) { dp[i][0] = i; } for (int j = 1; j <= n; j++) { dp[0][j] = j; } // Calculate DP Table from bottom to top for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (s1.isCharAt(i) == s2.isCharAt(j)) { dp[i][j] = dp[i - 1][j - 1]; } else { dp = min( // delete dp[i - 1][j] + 1; // replace dp[i - 1][j - 1] + 1; // insert dp[i][j - 1] + 1; ); } } } // Store the minimum editing distance of s1 and s2 return dp[m][n]; } int min(int a, int b, int c) { return Math.min(a, Math.min(b, c)); } Copy code
summary
- Handle the dynamic programming problem of two strings:
- Solve according to the algorithm of editing distance
- Create DP Table This makes it easier to find out the state transition relationship
- Because each dp[i][j] is only related to three nearby states, the spatial complexity can be compressed into O(min(m,n)), where m and N are the lengths of two strings, O(min(m,n)), where m and N are the lengths of two strings
- Additional information can be added to DP Table:
Node[][] dp; class Node { // Value of previous dp array int val; // Attribute operation int choice; } Copy code
- When making the best choice, record the operation, and then deduce the specific operation from the result
- The final result is dp[m][n], where val has the minimum editing distance and choice has the last operation
- By repeating this process, you can go back to the starting point dp[0][0] step by step to form a path, and edit the string according to this path, which is the minimum editing distance
- Java implementation of minimum editing distance
- Minimum editing distance C + + implementation:
class DynamicProgramming { public: int minDistance(String s1, String s2) { int m = s1.size(), n = s2.size(); vector<vector<int>> dp(m + 1, vector<int>(n + 1)); // basecase // When the string s2 is empty, the string s1 needs to delete all characters to be the same as s2 for (int i = 1; i <= m; i++) { dp[i][0] = i; } // When the string s1 is empty, the string s2 needs to delete all characters to be the same as s1 for (int j = 1; j <= n; j++) { dp[0][j] = j; } // Bottom up solution for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (s1[i - 1] == s2[j - 1]) { // The current characters of the two strings are equal dp[i][j] = dp[i - 1][j - 1]; } else { // The current characters of the two strings are different dp[i][j] = min({ // Delete s1[i] this character dp[i - 1][j] + 1, // Add the same character as s2[j] after s1[i] dp[i][j - 1] + 1, // Replace the character of s1[i] with the character of s2[j] dp[i - 1][j - 1] + 1 }); } } } // Store the minimum editing distance of the whole s1 and s2 return dp[m][n]; } };