Detailed analysis of classical dynamic programming problem: Principle Analysis and solution implementation of shortest editing distance algorithm

Basic concepts

  • Edit distance issue:
    • It is difficult to edit the distance problem, but the solution is very beautiful, and it is also a rare practical algorithm
  • Edit the distance usage scenario:
    • For revising the misplaced content of the article The article can only be modified by 20 words, and supports adding, deleting and replacing operations to find the best scheme for modification
    • For measuring the similarity of DNA DNA sequence is a sequence composed of a, G, C and T, which can be compared into a string. The similarity of two DNA sequences can be measured by editing distance. The smaller the editing distance, the more similar the two DNA sequences are

Train of thought analysis

  • Edit distance issue:
    • Given two strings s1 and s2, only three operations can be used to change s1 into s2 and find the minimum operand
    • We need to ensure that the result is the same whether we change s1 into s2 or s2 into s1
  • In the longest common subsequence, to solve the dynamic programming problem of two strings, we usually use two pointers I and j to point to the end of the two strings respectively, and then move forward step by step to reduce the scale of the problem
  • Calculate edit distance:
    • base case of algorithm:
      • When s1 and s2 characters are the same In order to minimize the editing distance, no task operation is required. You can move it directly
      • If j has finished s2,i has not finished s1 In order to minimize the editing distance, you can only use the delete operation to shorten s1 to s2
      • If i has finished s1 and J has not finished s2, in order to minimize the editing distance, you can only use the insertion operation to increase s1 to s2

Problem solution

  • Base case of the algorithm: when I finish s1 or j finishes s2, you can directly return the remaining length of another string
  • For each pair s1[i] and s2[j], there are four operations:
if s1[i] == s2[j]:
	No operation,Skip directly skip
	i,j Move forward at the same time
else:
	1 out of 3 operation:
		insert-insert
		delete-delete
		replace-replace
 Copy code
  • For the operation of 1 out of 3, you can use the recursive technique to try all three operations, and then select the result of which operation according to which operation has the smallest editing distance
def minDistance(s1, s2) -> int:

	# Returns the minimum edit distance of s1[0,..., i] and S2 [0,..., J]
	def dp(i, j):
		# base case
		if i == -1: return j + 1 
		if j == -1: return i + 1

		if s1[i] == s2[j]:
			# No task operation is required
				# The minimum editing distance of s1[0,..., i] and S2 [0,..., J] is equal to the minimum editing distance of s1[0,..., i - 1] and S2 [0,..., J - 1]
				# That is, dp(i, j) is equal to dp(i - 1, j - 1) 
			return dp(i - 1, j - 1)
		else:
			return min(
				# insert
					# Insert a character like s2[j] in s1[i] to match
					# Move the character of s2 forward by 1 bit and continue the comparison
					# Insert operation, operand plus 1
				dp(i, j - 1) + 1
				# delete
					# Delete this character from s1[i] to match
					# Move the character of s2 forward by 1 bit and continue the comparison
					# Delete operation, operand plus 1
				dp(i - 1, j) + 1
				# replace
					# Replace s1[i] with s2[j] to continue matching
					# At the same time, I and j move forward by 1 bit to continue the comparison
					# Replace operation, operand plus 1
				dp(i - 1, j - 1) + 1
			)
	return dp(len(s1) - 1, len(s2) - 1)
Copy code
  • The solution has overlapping subproblems and needs to be optimized by dynamic programming algorithm
  • How to determine whether there are overlapping subproblems?
    • Abstract the algorithm framework of the solution
    • Judge the different paths from the subproblem to the original problem
    • Once there are repeated paths, there are a large number of repeated paths, that is, there are overlapping subproblems

Problem optimization

  • There are two ways to use dynamic programming optimization for overlapping subproblems:
    • memorandum
    • DP Table

memorandum

def minDistance(s1, s2) -> int:

# memorandum
memo = dict()
def dp(i, j):
	if (i, j) in memo:
		return memo[(i, j)]
	...
	if s[i] == s[j]:
		memo[(i, j)] = ...
	else:
		memo[(i, j)] = ...
	return memo[(i, j)]
return dp(len(s1) - 1, len(s2) - 1)
Copy code

DP Table

  • First, we need to understand the meaning of DP array. DP array is a two-dimensional array
    • base case: dp[...][0] and dp[0] [...]
    • Meaning of dp[i][j]:
      • Def DP (I, J) - > int: returns the minimum editing distance of s1[0,...,i] and s2[0,...,j]
      • dp[i - 1][j - 1]: store the minimum editing distance of s1[0,...,i] and S2 [0,..., J] Because the values of I and j of DP function are - 1, and the minimum index value of array is 0 So DP array will be offset by one bit
int minDistance(String s1, String s2) {
	int m = s1.length(), n = s2.length();
	int[][] dp = new int[m][n];
	// base case
	for (int i = 1; i <=m; i++) {
		dp[i][0] = i;
	}
	for (int j = 1; j <= n; j++) {
		dp[0][j] = j;
	}
	// Calculate DP Table from bottom to top
	for (int i = 1; i <= m; i++) {
		for (int j = 1; j <= n; j++) {
			if (s1.isCharAt(i) == s2.isCharAt(j)) {
				dp[i][j] = dp[i - 1][j - 1];
			} else {
				dp = min(
					// delete
					dp[i - 1][j] + 1;
					// replace
					dp[i - 1][j - 1] + 1;
					// insert
					dp[i][j - 1] + 1;
				);
			}
		}
	}
	// Store the minimum editing distance of s1 and s2
	return dp[m][n];
}

int min(int a, int b, int c) {
	return Math.min(a, Math.min(b, c));
}
Copy code

summary

  • Handle the dynamic programming problem of two strings:
    • Solve according to the algorithm of editing distance
    • Create DP Table This makes it easier to find out the state transition relationship
  • Because each dp[i][j] is only related to three nearby states, the spatial complexity can be compressed into O(min(m,n)), where m and N are the lengths of two strings, O(min(m,n)), where m and N are the lengths of two strings
  • Additional information can be added to DP Table:
Node[][] dp;

class Node {
	// Value of previous dp array
	int val;
	// Attribute operation
	int choice;
}
Copy code
  • When making the best choice, record the operation, and then deduce the specific operation from the result
  • The final result is dp[m][n], where val has the minimum editing distance and choice has the last operation
  • By repeating this process, you can go back to the starting point dp[0][0] step by step to form a path, and edit the string according to this path, which is the minimum editing distance
  • Java implementation of minimum editing distance
  • Minimum editing distance C + + implementation:
class DynamicProgramming {
	public:
		int minDistance(String s1, String s2) {
			int m = s1.size(), n = s2.size();
			vector<vector<int>> dp(m + 1, vector<int>(n + 1));
			// basecase
			// When the string s2 is empty, the string s1 needs to delete all characters to be the same as s2
			for (int i = 1; i <= m; i++) {
				dp[i][0] = i;
			}
			// When the string s1 is empty, the string s2 needs to delete all characters to be the same as s1
			for (int j = 1; j <= n; j++) {
				dp[0][j] = j;
			}
			// Bottom up solution
			for (int i = 1; i <= m; i++) {
				for (int j = 1; j <= n; j++) {
					if (s1[i - 1] == s2[j - 1]) {
						// The current characters of the two strings are equal
						dp[i][j] = dp[i - 1][j - 1];
					} else {
						// The current characters of the two strings are different
						dp[i][j] = min({
							// Delete s1[i] this character
							dp[i - 1][j] + 1,
							// Add the same character as s2[j] after s1[i]
							dp[i][j - 1] + 1,
							// Replace the character of s1[i] with the character of s2[j]
							dp[i - 1][j - 1] + 1
							});
					}
				}
			}
			// Store the minimum editing distance of the whole s1 and s2
			return dp[m][n];
		}
};

 

Keywords: Algorithm data structure Back-end Programmer Dynamic Programming

Added by sw9 on Thu, 20 Jan 2022 20:18:55 +0200