2021-12-21 data structure final review computer test 4 series

After reflecting on the problems in the first two articles, I found that the knowledge points should be concise and concise

Common nouns in strings

Empty string: string without any characters, string length = 0

Space string: a string consisting of only one or more spaces

Substring: a substring consisting of any consecutive characters in a string.

Main string: a string containing substrings. For example, a = 'Shenzhen University' B = 'University' A is the main string and B is the sub string

Position: the sequence number of the character in the main string. The position of the substring in the main string is represented by the position of the first character of the substring in the main string.

String equality condition: two strings are equal only when their lengths are equal and the characters in each corresponding position are equal.

Pattern matching: the operation to determine the position of the substring in the main string for the first time

In the basic operation of string, the "whole string" is usually used as the operation object, such as finding a substring in the string, inserting a substring at a certain position of the string, etc. Rarely use characters as operating units (otherwise it is no different from linear tables)

Minimum operation set:

These operations cannot be implemented by other string operations. On the contrary, other string operations (except string clear string and string destroy DestroyString) can be implemented on this minimum subset of operations.

How to understand this minimum operation set?

For example, the function to judge whether the string is an empty string and the function to judge the number (length) of characters of the string can be implemented with a slight modification of the same function. For example, I only need the string length. When the length = 0, it is empty. In essence, these two functions calculate the string length, and calculating the string length is the minimum operation of these two operations.

 

The five functions on the right are #include < CString >. The five functions on the left are our own functions. With these ten functions, we can complete the string operation.

This is a static representation of a string

On the dynamic representation of strings

The first is the concept of heap.

About finding substrings

int Index(SString S, SString T, int pos) {
   // Returns the position of the substring T after the pos character in the main string S. If not,
    // The function value is - 1. Where, T is not empty, 0 ≤ pos ≤ S.length-1).
    i = pos;   j = 0;
    while (i <= s.length-1 && j <= t.length-1) {
      if (S.ch[i] == T.ch[j]) { ++i;  ++j; }   // Continue comparing subsequent characters
      else { i = i-j+1;   j = 0; }     // The pointer moves back to start matching again
    }
   if (j ==t.length)  return  i-t.length;
   else return -1;
} // Index`                 

The simple algorithm has the advantages of simplicity and directness, but has the disadvantages of high time complexity [min m+n, Max m*n]

Improved algorithm: KMP algorithm -- solving the redundancy comparison of naive algorithm

Preceding: next array

What is the next array? How? Let's talk about it in another article. We'll go straight to the program to review the computer test

In essence, the next array has two versions, one is the pattern string starting from j==1, and the other is the pattern string starting from j==0. The above starts from 1, and starting from zero, you only need to set all the above results to - 1.

After calculating the next array, we can really write the KMP algorithm. Before the KMP algorithm, please look at the simple algorithm. Compare the two algorithms:

#include<iostream>
#include<string>
using namespace std;
//Starting from the pos position, returns the position of the substring in the main string
int BF(string M,string T,int pos)
{
	int i=pos;
	int j=0;
	int Mlen=M.length();//Range of main string [0,Slen)
	int Tlen=T.length();//Range of substring [0,Tlen)
 
	if(pos<0 && pos>Mlen)
		return -1;
 
	while(i<Mlen && j<Tlen)
	{
		if(M[i]==T[j])
		{
			++i;
			++j;
		}
		else
		{
			i=i-j+1;
			j=0;
		}
	}
	if(j>=Tlen)
		return i-Tlen;
	else 
		return -1;
}
int main()
{
	string M="abcdefabcdx";
	string T="abcdx";
	cout<<BF(M,T,1)<<endl;
	return 0;
}

KMP algorithm is improved from naive algorithm:

#include<iostream>
#include<string>
using namespace std;
//Returns the next array of substring T. Note that arrays are passed pointers as formal parameters
int getnext(string T,int *next,int Tlong);

int KMP(string M,string T,int pos)
{
    int i=pos;
    int j=0;
    int Mlen=M.size();//Range of main string [0,Slen)
    int Tlen=T.length();//Range of substring [0,Tlen)


    int next[255];//Define next array new
    getnext(T,next,Tlen);//new



    if(pos<0 && pos>Mlen)
        return -1;

    while(i<Mlen && j<Tlen)
    {
        if(M[i]==T[j])
        {
            ++i;
            ++j;
        }
        else
        {
            j = next[j];//new
        }
    }
    if(j>=Tlen)
        return i-Tlen;
    else
        return -1;
}
int main()
{
    string M="abcdefabcdx";
    string T="abcdx";
    cout<<KMP(M,T,1)<<endl;
    return 0;
}

int getnext(string T,int *next,int Tlong){
    int i,k;
    i=1;
    k=0;
    next[1]=0;
    while(i<Tlong){//T[0] is the length of string t
        if(k==0||T[i]==T[k])
        {
            ++i;
            ++k;
            next[i] = k;
        }
        else
            k=next[k];
    }
}

But there's a problem.

The following is the verified version:

#include<iostream>
#include<string>
using namespace std;

int *getnext(string T);

int KMP(string M,string T)
{
   int i,j;
    int *next = getnext(T);



    for(i=0,j=0;i<M.size() && j<(int)T.size();)
    {
        if(j==-1||M[i]==T[j])
        {
            ++i;
            ++j;
        }
        else
        {
            j = next[j];//new
        }
    }
    delete []next;
    if(j==(int)T.size())
        return i-j;
    else
        return -1;
}
int main()
{
    string M="abcdefabcdx";
    string T="abcdx";

    cout<<KMP(M,T)<<endl;

    return 0;
}

int *getnext(string T){
    int j,k;
    int *next = new int [T.size()];

    j=0,k=-1;
    next[j] = k;
    while(j<T.size()){
        if(k==-1||T[j]==T[k])
            next[++j] = ++k;
        else
            k = next[k];
    }
    return next;

}

KMP algorithm is improved first.  

Keywords: data structure

Added by edwin_h on Sun, 02 Jan 2022 16:50:39 +0200