For any information, human beings always have an impulse to find its most essential composition. For example, for all numbers, we will study prime numbers because prime numbers can not be decomposed, so any integer can be written in the form of prime factor multiplication. For strings, it seems irregular, but due to grammatical reasons, in fact, many strings use few types of characters, that is, the frequency of 26 letters in the alphabet is different. So humans began to study the minimum cyclic section, that is, whether a string is spliced by a cyclic section string. Let's look at the following example:
Pku2406 Power Strings
Find out how many repeated substrings a string consists of. For example, ababab is connected by three ABS, so the answer is 3. For example, abcd is connected by one abcd, so the answer is 1
Format
Input
Multiple groups of data, with "." representing the end of the test, the length of the string given by each group of data is < = 1e6
Output
Such as title
sample input
abcd
aaaa
ababab
.
sample output
1
4
3
Solution 1:
For this problem, let the read string exist in the character array s, and let its length be len
So you can enumerate the length of the loop section to be i, that is, the first i characters of the character array form the loop section, and then you can verify it. Since character comparison is involved here, hash is used.
#include<iostream> #include<cstdio> #include<string> #include<cstring> using namespace std; typedef unsigned long long LL; const LL base=131; const int N=1000010; int n; LL power[N],sum[N]; bool check(LL v,int k) //Judge whether s[1]~s[k] is a circular section { for(register int i=1;i+k-1<=n;i+=k){ if(v!=sum[i+k-1]-sum[i-1]*power[k]) return 0; } return 1; } int main() { power[0]=1; for(register int i=1;i<=N-10;++i) //hash preparation power[i]=power[i-1]*base; char s[N]; while(scanf("%s",s+1)){ if(s[1]=='.')break; n=strlen(s+1); sum[0]=0; for(register int i=1;i<=n;++i) sum[i]=sum[i-1]*base+LL(s[i]); for(register int i=1;i<=n;++i){ if(n%i)continue; LL expect=sum[i]; if(check(expect,i)){ printf("%d\n",n/i); break; } } } return 0; }
Solution 2:
In the above method, we set the length of the loop section as i, of course, i must be the divisor of len. So the whole string is divided into len/i. Then compare the past one by one. Dare to guess, can we not compare so many times?
Let's draw a diagram. The string s is divided as follows:
In order to distinguish, these paragraphs are marked with different colors.
If the first paragraph is the cyclic section we want, we will copy s once and shift i bit to the right
If the paragraph a2-a5 is equal to the following paragraph a1-a4, we can see that
A2=a1,a3=a2,a4=a3,a5=a4.
So the cycle section is A1
After analyzing this property, we only need a comparison between characters to know whether a prefix of the string is a cyclic section of the whole string.
#include<bits/stdc++.h> using namespace std; typedef unsigned long long ull; char a[2000000]; int len=0; ull sum[2000000],power[2000000]; ull get(int l,int r) { return sum[r]-sum[l-1]*power[r-l+1]; } int main() { while(true) { scanf("%s",a+1); len=strlen(a+1); if(a[1]=='.'&&len==1)break; memset(sum,0,sizeof(sum)); memset(power,0,sizeof(power)); for(int i=1;i<=len;i++) sum[i]=sum[i-1]*193+ull(a[i])+1; power[0]=1; for(int i=1;i<=len;i++) power[i]=power[i-1]*193; for(int i=1;i<=len;i++) //Specifies the length of the loop section { if(len%i!=0)continue; else { ull a1=get(1,len-i),a2=get(i+1,len); //Note: take the prefix with length len-i to see if it is equal to the suffix with length len-i if(a1==a2) { printf("%d\n",len/i);//Get the number of cyclic sections break; } } } } return 0; }
Sol3:
In problem solution 2, the number of comparisons is reduced, and it seems that there is no optimization. We turn our attention to the element of the length of the circular joint. In the previous methods, we only require the length i of the loop section to be the divisor of the total length len, so the number of segments divided is k=len/i, without considering the composition of the read string. Obviously, we can count the number of occurrences of each letter in the string. We might as well set it as sum1... Sum26. When we divide the whole string into k segments according to the loop section, we "divide" these letters into k segments, so K is gcd(len,sum1,sum2.... sum26) at most. If the detection is unsuccessful, it should also be the divisor of gcd(len,sum1,sum2.... sum26), So far, we have more accurately constrained the K range, and the code is omitted.