Detailed explanation of string Hash 2D Hash

String hash method

character string h a s h hash hash is to map a string into a nonnegative integer.

(the probability of simultaneous collision is very low)

Design a large prime number p p p. Think of strings as p p p-ary number. (e.g 31 31 31， 131 131 131， 1331 1331 1331)

(think about why prime numbers are used)

For known strings s s s. We have h a s h hash The hash value is H ( s ) \text{H}(s) H(s), so where are we s s Add a character after s c c c forms a new string s + c s+c s+c, of the string h a s h hash The hash value is H ( s ) ∗ p + ( c − ′ a ′ + 1 ) \text{H}(s)*p+(c-'a'+1) H(s)∗p+(c−′a′+1).

If we take p = 131 p=131 p=131 for Strings ′ a b a c ′ 'abac' ′abac′， p p The p-ary number is ( 1   2   1   3 ) p (1~2~1~3)_p (1 # 2 # 1 # 3)p, i.e h a s h hash The hash value is 1 ∗ p 3 + 2 ∗ p 2 + 1 ∗ p + 3 1*p^3+2*p^2+1*p+3 1∗p3+2∗p2+1∗p+3. At this point, our string is updated to ′ a b a c d ′ 'abacd' 'abacd', then our ′ a b a c ′ 'abac' 'abac' needs to be shifted one bit to the left, p p p-ary number becomes ( 1   2   1   3   0 ) p (1~2~1~3~0)_p (1 # 2 # 1 # 3 # 0)p, plus characters ' d ' d Value of'd ' ( ′ d ′ − ′ a ′ + 1 ) ('d'-'a'+1) (′d′−′a′+1)， p p Last p-ary number is ( 1   2   1   3   4 ) p (1~2~1~3~4)_p (1 2 1 3 4)p​， h a s h hash The hash value is H ( ′ a b a c ′ ) ∗ p + ( ′ d ′ − ′ a ′ + 1 ) \text{H}('abac')*p+('d'-'a'+1) H(′abac′)∗p+(′d′−′a′+1)

• double h a s h hash hash: use two prime numbers as p 1 p_1 p1​， p 2 p_2 p2, calculate two h a s h hash hash value, when two h a s h hash The hash values are equal respectively, which means that the two strings are equal. This method has less collision probability
Get substring hash value

So for a string, when we need to take its substring, for example, for a string s s s. We need to take its substring s l , r s_{l,r} sl,r + h a s h hash hash value, we can use H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p r − l + 1 \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^{r-l+1} H(s0,r​)−H(s0,l−1​)∗pr−l+1.

as s 0 , r = ′ a b a c ′ s_{0,r}='abac' s0,r​=′abac′， s 0 , l − 1 = ′ a b ′ s_{0,l-1}='ab' s0,l − 1 = 'ab', so H ( s 0 , r ) = ( 1   2   1   3 ) p \text{H}(s_{0,r})=(1~2~1~3)_p H(s0,r​)=(1 2 1 3)p​， H ( s 0 , l − 1 ) = ( 1   2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p, so for s l , r = ′ a c ′ s_{l,r}='ac' sl,r = 'ac', first H ( s 0 , l − 1 ) = ( 1   2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p move left l e n g t h ( s l , r ) length(s_{l,r}) The length(sl,r) bit becomes H ( s 0 , l − 1 ) ∗ p 2 = ( 1   2   0   0 ) p \text{H}(s_{0,l-1})*p^2=(1~2~0~0)_p H(s0,l − 1) * p2=(1 − 2 − 0)p, last H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p 2 = ( 1   3 ) p = H ( ′ a c ′ ) = H ( s l , r ) \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^2=(1~3)_p=\text{H}('ac')=\text{H}(s_{l,r}) H(s0,r​)−H(s0,l−1​)∗p2=(1 3)p​=H(′ac′)=H(sl,r​)

Overflow and mold taking

When p p p is very large and the string is very long h a s h hash The hash value will also be very large. At this time, it is generally necessary to design a value m m m to string h a s h hash The hash value is used to take the module, with 2 2 Two treatment methods:

• use 2 64 2^{64} 264, i.e. direct use u n s i g n e d   l o n g   l o n g unsigned~long~long unsigned # long # long storage h a s h hash hash value, which is equivalent to automatic overflow 2 64 2^{64} 264 take the mold
• Use a large prime number for modulus

In general, strings h a s h hash hash is difficult to construct card data. Relatively speaking, the first and double h a s h hash hash collision probability is smaller, and the first method constant will be smaller because there is no need to take a module.

code

// acwing rabbit and rabbit
#include <bits/stdc++.h>
// using namespace std;

#define ull unsigned long long
#define p 131

int n;
char s;
ull b;
ull hash;

int main() {
b = 1; // Power 0 of p
hash = 0;
scanf("%s", s);
int len = strlen(s);
for (int i = 1; i <= len + 3; i++) {
b[i] = b[i - 1] * p;
hash[i] = hash[i - 1] * p + (s[i - 1] - 'a' + 1);
}
scanf("%d", &n);
for (int i = 0, l1, l2, r1, r2; i < n; i++) {
scanf("%d%d%d%d", &l1, &r1, &l2, &r2);
ull s1 = hash[r1] - hash[l1 - 1] * b[r1 - l1 + 1];
ull s2 = hash[r2] - hash[l2 - 1] * b[r2 - l2 + 1];
if (s1 == s2) {
puts("YES");
} else
puts("NO");
}
return 0;
}

Two dimensional hash

It can be compared with two-dimensional prefix and. Pay attention to the horizontal h a s h hash hash and portrait h a s h hash hash used p p The p value should not be the same.

code

// UVa 11019
#include <bits/stdc++.h>
using namespace std;

#define ull unsigned long long
#define p1 131
#define p2 1331

int n, m, x, y;
char s1, s2;
ull h, h2, b1, b2;

void solve() {
int ans = 0;
scanf("%d%d", &n, &m);
getchar();
for (int i = 1; i <= n; i++) scanf("%s", s1[i]);
scanf("%d%d", &x, &y);
getchar();
for (int i = 1; i <= x; i++) scanf("%s", s2[i]);
for (int i = 1; i <= n; i++)
for (int j = 1; j <= m; j++)
h[i][j] = h[i][j - 1] * p1 + (int)(s1[i][j - 1] - 'a' + 1);
for (int j = 1; j <= m; j++)
for (int i = 1; i <= n; i++) h[i][j] += h[i - 1][j] * p2;
for (int i = 1; i <= x; i++)
for (int j = 1; j <= y; j++)
h2[i][j] = h2[i][j - 1] * p1 + (int)(s2[i][j - 1] - 'a' + 1);
for (int j = 1; j <= y; j++)
for (int i = 1; i <= x; i++) h2[i][j] += h2[i - 1][j] * p2;
ull val = h2[x][y];
// printf("# %llu #\n", val);
for (int i = x; i <= n; i++) {
for (int j = y; j <= m; j++) {
ull tmp = h[i][j] - h[i - x][j] * b2[x] - h[i][j - y] * b1[y] +
h[i - x][j - y] * b1[y] * b2[x];
if (tmp == val) ++ans;
}
}
printf("%d\n", ans);
}

int main() {
b1 = b2 = 1;
for (int i = 1; i < 1000 + 3; i++) {
b1[i] = b1[i - 1] * p1;
b2[i] = b2[i - 1] * p2;
}
int t;
scanf("%d", &t);
while (t--) solve();
return 0;
}

Refer to the advanced guide to algorithm competition

hdu 1711

poj 1200

poj 3461

Two dimensional hash

UVa 11019

Keywords: C++ Algorithm

Added by dominant on Thu, 06 Jan 2022 10:29:05 +0200