Detailed explanation of string Hash 2D Hash

String hash method

character string h a s h hash hash is to map a string into a nonnegative integer.

(the probability of simultaneous collision is very low)

Design a large prime number p p p. Think of strings as p p p-ary number. (e.g 31 31 31, 131 131 131, 1331 1331 1331)

(think about why prime numbers are used)

For known strings s s s. We have h a s h hash The hash value is H ( s ) \text{H}(s) H(s), so where are we s s Add a character after s c c c forms a new string s + c s+c s+c, of the string h a s h hash The hash value is H ( s ) ∗ p + ( c − ′ a ′ + 1 ) \text{H}(s)*p+(c-'a'+1) H(s)∗p+(c−′a′+1).

If we take p = 131 p=131 p=131 for Strings ′ a b a c ′ 'abac' ′abac′, p p The p-ary number is ( 1   2   1   3 ) p (1~2~1~3)_p (1 # 2 # 1 # 3)p, i.e h a s h hash The hash value is 1 ∗ p 3 + 2 ∗ p 2 + 1 ∗ p + 3 1*p^3+2*p^2+1*p+3 1∗p3+2∗p2+1∗p+3. At this point, our string is updated to ′ a b a c d ′ 'abacd' 'abacd', then our ′ a b a c ′ 'abac' 'abac' needs to be shifted one bit to the left, p p p-ary number becomes ( 1   2   1   3   0 ) p (1~2~1~3~0)_p (1 # 2 # 1 # 3 # 0)p, plus characters ' d ' `d` Value of'd ' ( ′ d ′ − ′ a ′ + 1 ) ('d'-'a'+1) (′d′−′a′+1), p p Last p-ary number is ( 1   2   1   3   4 ) p (1~2~1~3~4)_p (1 2 1 3 4)p​, h a s h hash The hash value is H ( ′ a b a c ′ ) ∗ p + ( ′ d ′ − ′ a ′ + 1 ) \text{H}('abac')*p+('d'-'a'+1) H(′abac′)∗p+(′d′−′a′+1)

  • double h a s h hash hash: use two prime numbers as p 1 p_1 p1​, p 2 p_2 p2, calculate two h a s h hash hash value, when two h a s h hash The hash values are equal respectively, which means that the two strings are equal. This method has less collision probability
Get substring hash value

So for a string, when we need to take its substring, for example, for a string s s s. We need to take its substring s l , r s_{l,r} sl,r + h a s h hash hash value, we can use H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p r − l + 1 \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^{r-l+1} H(s0,r​)−H(s0,l−1​)∗pr−l+1.

as s 0 , r = ′ a b a c ′ s_{0,r}='abac' s0,r​=′abac′, s 0 , l − 1 = ′ a b ′ s_{0,l-1}='ab' s0,l − 1 = 'ab', so H ( s 0 , r ) = ( 1   2   1   3 ) p \text{H}(s_{0,r})=(1~2~1~3)_p H(s0,r​)=(1 2 1 3)p​, H ( s 0 , l − 1 ) = ( 1   2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p, so for s l , r = ′ a c ′ s_{l,r}='ac' sl,r = 'ac', first H ( s 0 , l − 1 ) = ( 1   2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p move left l e n g t h ( s l , r ) length(s_{l,r}) The length(sl,r) bit becomes H ( s 0 , l − 1 ) ∗ p 2 = ( 1   2   0   0 ) p \text{H}(s_{0,l-1})*p^2=(1~2~0~0)_p H(s0,l − 1) * p2=(1 − 2 − 0)p, last H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p 2 = ( 1   3 ) p = H ( ′ a c ′ ) = H ( s l , r ) \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^2=(1~3)_p=\text{H}('ac')=\text{H}(s_{l,r}) H(s0,r​)−H(s0,l−1​)∗p2=(1 3)p​=H(′ac′)=H(sl,r​)

Overflow and mold taking

When p p p is very large and the string is very long h a s h hash The hash value will also be very large. At this time, it is generally necessary to design a value m m m to string h a s h hash The hash value is used to take the module, with 2 2 Two treatment methods:

  • use 2 64 2^{64} 264, i.e. direct use u n s i g n e d   l o n g   l o n g unsigned~long~long unsigned # long # long storage h a s h hash hash value, which is equivalent to automatic overflow 2 64 2^{64} 264 take the mold
  • Use a large prime number for modulus

In general, strings h a s h hash hash is difficult to construct card data. Relatively speaking, the first and double h a s h hash hash collision probability is smaller, and the first method constant will be smaller because there is no need to take a module.

code

// acwing rabbit and rabbit
#include <bits/stdc++.h>
// using namespace std;

#define ull unsigned long long
#define p 131

int n;
char s[1000005];
ull b[1000005];
ull hash[1000005];

int main() {
    b[0] = 1; // Power 0 of p
    hash[0] = 0;
    scanf("%s", s);
    int len = strlen(s);
    for (int i = 1; i <= len + 3; i++) {
        b[i] = b[i - 1] * p;
        hash[i] = hash[i - 1] * p + (s[i - 1] - 'a' + 1);
    }
    scanf("%d", &n);
    for (int i = 0, l1, l2, r1, r2; i < n; i++) {
        scanf("%d%d%d%d", &l1, &r1, &l2, &r2);
        ull s1 = hash[r1] - hash[l1 - 1] * b[r1 - l1 + 1];
        ull s2 = hash[r2] - hash[l2 - 1] * b[r2 - l2 + 1];
        if (s1 == s2) {
            puts("YES");
        } else
            puts("NO");
    }
    return 0;
}

Two dimensional hash

It can be compared with two-dimensional prefix and. Pay attention to the horizontal h a s h hash hash and portrait h a s h hash hash used p p The p value should not be the same.

code

// UVa 11019
#include <bits/stdc++.h>
using namespace std;

#define ull unsigned long long
#define p1 131
#define p2 1331

int n, m, x, y;
char s1[1005][1005], s2[105][105];
ull h[1005][1005], h2[105][105], b1[1005], b2[1005];

void solve() {
    int ans = 0;
    scanf("%d%d", &n, &m);
    getchar();
    for (int i = 1; i <= n; i++) scanf("%s", s1[i]);
    scanf("%d%d", &x, &y);
    getchar();
    for (int i = 1; i <= x; i++) scanf("%s", s2[i]);
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            h[i][j] = h[i][j - 1] * p1 + (int)(s1[i][j - 1] - 'a' + 1);
    for (int j = 1; j <= m; j++)
        for (int i = 1; i <= n; i++) h[i][j] += h[i - 1][j] * p2;
    for (int i = 1; i <= x; i++)
        for (int j = 1; j <= y; j++)
            h2[i][j] = h2[i][j - 1] * p1 + (int)(s2[i][j - 1] - 'a' + 1);
    for (int j = 1; j <= y; j++)
        for (int i = 1; i <= x; i++) h2[i][j] += h2[i - 1][j] * p2;
    ull val = h2[x][y];
    // printf("# %llu #\n", val);
    for (int i = x; i <= n; i++) {
        for (int j = y; j <= m; j++) {
            ull tmp = h[i][j] - h[i - x][j] * b2[x] - h[i][j - y] * b1[y] +
                      h[i - x][j - y] * b1[y] * b2[x];
            if (tmp == val) ++ans;
        }
    }
    printf("%d\n", ans);
}

int main() {
    b1[0] = b2[0] = 1;
    for (int i = 1; i < 1000 + 3; i++) {
        b1[i] = b1[i - 1] * p1;
        b2[i] = b2[i - 1] * p2;
    }
    int t;
    scanf("%d", &t);
    while (t--) solve();
    return 0;
}

Refer to the advanced guide to algorithm competition

String hash

hdu 1711

poj 1200

poj 3461

Two dimensional hash

UVa 11019

Keywords: C++ Algorithm

Added by dominant on Thu, 06 Jan 2022 10:29:05 +0200