String hash method
character string h a s h hash hash is to map a string into a nonnegative integer.
(the probability of simultaneous collision is very low)
Design a large prime number p p p. Think of strings as p p p-ary number. (e.g 31 31 31, 131 131 131, 1331 1331 1331)
(think about why prime numbers are used)
For known strings s s s. We have h a s h hash The hash value is H ( s ) \text{H}(s) H(s), so where are we s s Add a character after s c c c forms a new string s + c s+c s+c, of the string h a s h hash The hash value is H ( s ) ∗ p + ( c − ′ a ′ + 1 ) \text{H}(s)*p+(c-'a'+1) H(s)∗p+(c−′a′+1).
If we take p = 131 p=131 p=131 for Strings ′ a b a c ′ 'abac' ′abac′, p p The p-ary number is ( 1 2 1 3 ) p (1~2~1~3)_p (1 # 2 # 1 # 3)p, i.e h a s h hash The hash value is 1 ∗ p 3 + 2 ∗ p 2 + 1 ∗ p + 3 1*p^3+2*p^2+1*p+3 1∗p3+2∗p2+1∗p+3. At this point, our string is updated to ′ a b a c d ′ 'abacd' 'abacd', then our ′ a b a c ′ 'abac' 'abac' needs to be shifted one bit to the left, p p p-ary number becomes ( 1 2 1 3 0 ) p (1~2~1~3~0)_p (1 # 2 # 1 # 3 # 0)p, plus characters ' d ' `d` Value of'd ' ( ′ d ′ − ′ a ′ + 1 ) ('d'-'a'+1) (′d′−′a′+1), p p Last p-ary number is ( 1 2 1 3 4 ) p (1~2~1~3~4)_p (1 2 1 3 4)p, h a s h hash The hash value is H ( ′ a b a c ′ ) ∗ p + ( ′ d ′ − ′ a ′ + 1 ) \text{H}('abac')*p+('d'-'a'+1) H(′abac′)∗p+(′d′−′a′+1)
- double h a s h hash hash: use two prime numbers as p 1 p_1 p1, p 2 p_2 p2, calculate two h a s h hash hash value, when two h a s h hash The hash values are equal respectively, which means that the two strings are equal. This method has less collision probability
Get substring hash value
So for a string, when we need to take its substring, for example, for a string s s s. We need to take its substring s l , r s_{l,r} sl,r + h a s h hash hash value, we can use H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p r − l + 1 \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^{r-l+1} H(s0,r)−H(s0,l−1)∗pr−l+1.
as s 0 , r = ′ a b a c ′ s_{0,r}='abac' s0,r=′abac′, s 0 , l − 1 = ′ a b ′ s_{0,l-1}='ab' s0,l − 1 = 'ab', so H ( s 0 , r ) = ( 1 2 1 3 ) p \text{H}(s_{0,r})=(1~2~1~3)_p H(s0,r)=(1 2 1 3)p, H ( s 0 , l − 1 ) = ( 1 2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p, so for s l , r = ′ a c ′ s_{l,r}='ac' sl,r = 'ac', first H ( s 0 , l − 1 ) = ( 1 2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p move left l e n g t h ( s l , r ) length(s_{l,r}) The length(sl,r) bit becomes H ( s 0 , l − 1 ) ∗ p 2 = ( 1 2 0 0 ) p \text{H}(s_{0,l-1})*p^2=(1~2~0~0)_p H(s0,l − 1) * p2=(1 − 2 − 0)p, last H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p 2 = ( 1 3 ) p = H ( ′ a c ′ ) = H ( s l , r ) \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^2=(1~3)_p=\text{H}('ac')=\text{H}(s_{l,r}) H(s0,r)−H(s0,l−1)∗p2=(1 3)p=H(′ac′)=H(sl,r)
Overflow and mold taking
When p p p is very large and the string is very long h a s h hash The hash value will also be very large. At this time, it is generally necessary to design a value m m m to string h a s h hash The hash value is used to take the module, with 2 2 Two treatment methods:
- use 2 64 2^{64} 264, i.e. direct use u n s i g n e d l o n g l o n g unsigned~long~long unsigned # long # long storage h a s h hash hash value, which is equivalent to automatic overflow 2 64 2^{64} 264 take the mold
- Use a large prime number for modulus
In general, strings h a s h hash hash is difficult to construct card data. Relatively speaking, the first and double h a s h hash hash collision probability is smaller, and the first method constant will be smaller because there is no need to take a module.
code
// acwing rabbit and rabbit #include <bits/stdc++.h> // using namespace std; #define ull unsigned long long #define p 131 int n; char s[1000005]; ull b[1000005]; ull hash[1000005]; int main() { b[0] = 1; // Power 0 of p hash[0] = 0; scanf("%s", s); int len = strlen(s); for (int i = 1; i <= len + 3; i++) { b[i] = b[i - 1] * p; hash[i] = hash[i - 1] * p + (s[i - 1] - 'a' + 1); } scanf("%d", &n); for (int i = 0, l1, l2, r1, r2; i < n; i++) { scanf("%d%d%d%d", &l1, &r1, &l2, &r2); ull s1 = hash[r1] - hash[l1 - 1] * b[r1 - l1 + 1]; ull s2 = hash[r2] - hash[l2 - 1] * b[r2 - l2 + 1]; if (s1 == s2) { puts("YES"); } else puts("NO"); } return 0; }
Two dimensional hash
It can be compared with two-dimensional prefix and. Pay attention to the horizontal h a s h hash hash and portrait h a s h hash hash used p p The p value should not be the same.
code
// UVa 11019 #include <bits/stdc++.h> using namespace std; #define ull unsigned long long #define p1 131 #define p2 1331 int n, m, x, y; char s1[1005][1005], s2[105][105]; ull h[1005][1005], h2[105][105], b1[1005], b2[1005]; void solve() { int ans = 0; scanf("%d%d", &n, &m); getchar(); for (int i = 1; i <= n; i++) scanf("%s", s1[i]); scanf("%d%d", &x, &y); getchar(); for (int i = 1; i <= x; i++) scanf("%s", s2[i]); for (int i = 1; i <= n; i++) for (int j = 1; j <= m; j++) h[i][j] = h[i][j - 1] * p1 + (int)(s1[i][j - 1] - 'a' + 1); for (int j = 1; j <= m; j++) for (int i = 1; i <= n; i++) h[i][j] += h[i - 1][j] * p2; for (int i = 1; i <= x; i++) for (int j = 1; j <= y; j++) h2[i][j] = h2[i][j - 1] * p1 + (int)(s2[i][j - 1] - 'a' + 1); for (int j = 1; j <= y; j++) for (int i = 1; i <= x; i++) h2[i][j] += h2[i - 1][j] * p2; ull val = h2[x][y]; // printf("# %llu #\n", val); for (int i = x; i <= n; i++) { for (int j = y; j <= m; j++) { ull tmp = h[i][j] - h[i - x][j] * b2[x] - h[i][j - y] * b1[y] + h[i - x][j - y] * b1[y] * b2[x]; if (tmp == val) ++ans; } } printf("%d\n", ans); } int main() { b1[0] = b2[0] = 1; for (int i = 1; i < 1000 + 3; i++) { b1[i] = b1[i - 1] * p1; b2[i] = b2[i - 1] * p2; } int t; scanf("%d", &t); while (t--) solve(); return 0; }
Refer to the advanced guide to algorithm competition
String hash
hdu 1711
poj 1200
poj 3461
Two dimensional hash
UVa 11019