After reflecting on the problems in the first two articles, I found that the knowledge points should be concise and concise
Common nouns in strings
Empty string: string without any characters, string length = 0
Space string: a string consisting of only one or more spaces
Substring: a substring consisting of any consecutive characters in a string.
Main string: a string containing substrings. For example, a = 'Shenzhen University' B = 'University' A is the main string and B is the sub string
Position: the sequence number of the character in the main string. The position of the substring in the main string is represented by the position of the first character of the substring in the main string.
String equality condition: two strings are equal only when their lengths are equal and the characters in each corresponding position are equal.
Pattern matching: the operation to determine the position of the substring in the main string for the first time
In the basic operation of string, the "whole string" is usually used as the operation object, such as finding a substring in the string, inserting a substring at a certain position of the string, etc. Rarely use characters as operating units (otherwise it is no different from linear tables)
Minimum operation set:
These operations cannot be implemented by other string operations. On the contrary, other string operations (except string clear string and string destroy DestroyString) can be implemented on this minimum subset of operations.
How to understand this minimum operation set?
For example, the function to judge whether the string is an empty string and the function to judge the number (length) of characters of the string can be implemented with a slight modification of the same function. For example, I only need the string length. When the length = 0, it is empty. In essence, these two functions calculate the string length, and calculating the string length is the minimum operation of these two operations.
The five functions on the right are #include < CString >. The five functions on the left are our own functions. With these ten functions, we can complete the string operation.
This is a static representation of a string
On the dynamic representation of strings
The first is the concept of heap.
About finding substrings
int Index(SString S, SString T, int pos) { // Returns the position of the substring T after the pos character in the main string S. If not, // The function value is - 1. Where, T is not empty, 0 ≤ pos ≤ S.length-1). i = pos; j = 0; while (i <= s.length-1 && j <= t.length-1) { if (S.ch[i] == T.ch[j]) { ++i; ++j; } // Continue comparing subsequent characters else { i = i-j+1; j = 0; } // The pointer moves back to start matching again } if (j ==t.length) return i-t.length; else return -1; } // Index`
The simple algorithm has the advantages of simplicity and directness, but has the disadvantages of high time complexity [min m+n, Max m*n]
Improved algorithm: KMP algorithm -- solving the redundancy comparison of naive algorithm
Preceding: next array
What is the next array? How? Let's talk about it in another article. We'll go straight to the program to review the computer test
In essence, the next array has two versions, one is the pattern string starting from j==1, and the other is the pattern string starting from j==0. The above starts from 1, and starting from zero, you only need to set all the above results to - 1.
After calculating the next array, we can really write the KMP algorithm. Before the KMP algorithm, please look at the simple algorithm. Compare the two algorithms:
#include<iostream> #include<string> using namespace std; //Starting from the pos position, returns the position of the substring in the main string int BF(string M,string T,int pos) { int i=pos; int j=0; int Mlen=M.length();//Range of main string [0,Slen) int Tlen=T.length();//Range of substring [0,Tlen) if(pos<0 && pos>Mlen) return -1; while(i<Mlen && j<Tlen) { if(M[i]==T[j]) { ++i; ++j; } else { i=i-j+1; j=0; } } if(j>=Tlen) return i-Tlen; else return -1; } int main() { string M="abcdefabcdx"; string T="abcdx"; cout<<BF(M,T,1)<<endl; return 0; }
KMP algorithm is improved from naive algorithm:
#include<iostream> #include<string> using namespace std; //Returns the next array of substring T. Note that arrays are passed pointers as formal parameters int getnext(string T,int *next,int Tlong); int KMP(string M,string T,int pos) { int i=pos; int j=0; int Mlen=M.size();//Range of main string [0,Slen) int Tlen=T.length();//Range of substring [0,Tlen) int next[255];//Define next array new getnext(T,next,Tlen);//new if(pos<0 && pos>Mlen) return -1; while(i<Mlen && j<Tlen) { if(M[i]==T[j]) { ++i; ++j; } else { j = next[j];//new } } if(j>=Tlen) return i-Tlen; else return -1; } int main() { string M="abcdefabcdx"; string T="abcdx"; cout<<KMP(M,T,1)<<endl; return 0; } int getnext(string T,int *next,int Tlong){ int i,k; i=1; k=0; next[1]=0; while(i<Tlong){//T[0] is the length of string t if(k==0||T[i]==T[k]) { ++i; ++k; next[i] = k; } else k=next[k]; } }
But there's a problem.
The following is the verified version:
#include<iostream> #include<string> using namespace std; int *getnext(string T); int KMP(string M,string T) { int i,j; int *next = getnext(T); for(i=0,j=0;i<M.size() && j<(int)T.size();) { if(j==-1||M[i]==T[j]) { ++i; ++j; } else { j = next[j];//new } } delete []next; if(j==(int)T.size()) return i-j; else return -1; } int main() { string M="abcdefabcdx"; string T="abcdx"; cout<<KMP(M,T)<<endl; return 0; } int *getnext(string T){ int j,k; int *next = new int [T.size()]; j=0,k=-1; next[j] = k; while(j<T.size()){ if(k==-1||T[j]==T[k]) next[++j] = ++k; else k = next[k]; } return next; }
KMP algorithm is improved first.