Detailed explanation of string Hash 2D Hash

String hash method

character string h a s h hash hash is to map a string into a nonnegative integer.

(the probability of simultaneous collision is very low)

Design a large prime number p p p. Think of strings as p p p-ary number. (e.g 31 31 31， 131 131 131， 1331 1331 1331)

(think about why prime numbers are used)

For known strings s s s. We have h a s h hash The hash value is H ( s ) \text{H}(s) H(s), so where are we s s Add a character after s c c c forms a new string s + c s+c s+c, of the string h a s h hash The hash value is H ( s ) ∗ p + ( c − ′ a ′ + 1 ) \text{H}(s)*p+(c-'a'+1) H(s)∗p+(c−′a′+1).

If we take p = 131 p=131 p=131 for Strings ′ a b a c ′ 'abac' ′abac′， p p The p-ary number is ( 1 2 1 3 ) p (1~2~1~3)_p (1 # 2 # 1 # 3)p, i.e h a s h hash The hash value is 1 ∗ p 3 + 2 ∗ p 2 + 1 ∗ p + 3 1*p^3+2*p^2+1*p+3 1∗p3+2∗p2+1∗p+3. At this point, our string is updated to ′ a b a c d ′ 'abacd' 'abacd', then our ′ a b a c ′ 'abac' 'abac' needs to be shifted one bit to the left, p p p-ary number becomes ( 1 2 1 3 0 ) p (1~2~1~3~0)_p (1 # 2 # 1 # 3 # 0)p, plus characters ' d ' `d` Value of'd ' ( ′ d ′ − ′ a ′ + 1 ) ('d'-'a'+1) (′d′−′a′+1)， p p Last p-ary number is ( 1 2 1 3 4 ) p (1~2~1~3~4)_p (1 2 1 3 4)p， h a s h hash The hash value is H ( ′ a b a c ′ ) ∗ p + ( ′ d ′ − ′ a ′ + 1 ) \text{H}('abac')*p+('d'-'a'+1) H(′abac′)∗p+(′d′−′a′+1)

double h a s h hash hash: use two prime numbers as p 1 p_1 p1， p 2 p_2 p2, calculate two h a s h hash hash value, when two h a s h hash The hash values are equal respectively, which means that the two strings are equal. This method has less collision probability

Get substring hash value

So for a string, when we need to take its substring, for example, for a string s s s. We need to take its substring s l , r s_{l,r} sl,r + h a s h hash hash value, we can use H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p r − l + 1 \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^{r-l+1} H(s0,r)−H(s0,l−1)∗pr−l+1.

as s 0 , r = ′ a b a c ′ s_{0,r}='abac' s0,r=′abac′， s 0 , l − 1 = ′ a b ′ s_{0,l-1}='ab' s0,l − 1 = 'ab', so H ( s 0 , r ) = ( 1 2 1 3 ) p \text{H}(s_{0,r})=(1~2~1~3)_p H(s0,r)=(1 2 1 3)p， H ( s 0 , l − 1 ) = ( 1 2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p, so for s l , r = ′ a c ′ s_{l,r}='ac' sl,r = 'ac', first H ( s 0 , l − 1 ) = ( 1 2 ) p \text{H}(s_{0,l-1})=(1~2)_p H(s0,l − 1) = (1 − 2)p move left l e n g t h ( s l , r ) length(s_{l,r}) The length(sl,r) bit becomes H ( s 0 , l − 1 ) ∗ p 2 = ( 1 2 0 0 ) p \text{H}(s_{0,l-1})*p^2=(1~2~0~0)_p H(s0,l − 1) * p2=(1 − 2 − 0)p, last H ( s 0 , r ) − H ( s 0 , l − 1 ) ∗ p 2 = ( 1 3 ) p = H ( ′ a c ′ ) = H ( s l , r ) \text{H}(s_{0,r})-\text{H}(s_{0,l-1})*p^2=(1~3)_p=\text{H}('ac')=\text{H}(s_{l,r}) H(s0,r)−H(s0,l−1)∗p2=(1 3)p=H(′ac′)=H(sl,r)

Overflow and mold taking

When p p p is very large and the string is very long h a s h hash The hash value will also be very large. At this time, it is generally necessary to design a value m m m to string h a s h hash The hash value is used to take the module, with 2 2 Two treatment methods:

use 2 64 2^{64} 264, i.e. direct use u n s i g n e d l o n g l o n g unsigned~long~long unsigned # long # long storage h a s h hash hash value, which is equivalent to automatic overflow 2 64 2^{64} 264 take the mold
Use a large prime number for modulus

In general, strings h a s h hash hash is difficult to construct card data. Relatively speaking, the first and double h a s h hash hash collision probability is smaller, and the first method constant will be smaller because there is no need to take a module.

code

// acwing rabbit and rabbit
#include <bits/stdc++.h>
// using namespace std;

#define ull unsigned long long
#define p 131

int n;
char s[1000005];
ull b[1000005];
ull hash[1000005];

int main() {
    b[0] = 1; // Power 0 of p
    hash[0] = 0;
    scanf("%s", s);
    int len = strlen(s);
    for (int i = 1; i <= len + 3; i++) {
        b[i] = b[i - 1] * p;
        hash[i] = hash[i - 1] * p + (s[i - 1] - 'a' + 1);
    }
    scanf("%d", &n);
    for (int i = 0, l1, l2, r1, r2; i < n; i++) {
        scanf("%d%d%d%d", &l1, &r1, &l2, &r2);
        ull s1 = hash[r1] - hash[l1 - 1] * b[r1 - l1 + 1];
        ull s2 = hash[r2] - hash[l2 - 1] * b[r2 - l2 + 1];
        if (s1 == s2) {
            puts("YES");
        } else
            puts("NO");
    }
    return 0;
}

Two dimensional hash

It can be compared with two-dimensional prefix and. Pay attention to the horizontal h a s h hash hash and portrait h a s h hash hash used p p The p value should not be the same.

code

// UVa 11019
#include <bits/stdc++.h>
using namespace std;

#define ull unsigned long long
#define p1 131
#define p2 1331

int n, m, x, y;
char s1[1005][1005], s2[105][105];
ull h[1005][1005], h2[105][105], b1[1005], b2[1005];

void solve() {
    int ans = 0;
    scanf("%d%d", &n, &m);
    getchar();
    for (int i = 1; i <= n; i++) scanf("%s", s1[i]);
    scanf("%d%d", &x, &y);
    getchar();
    for (int i = 1; i <= x; i++) scanf("%s", s2[i]);
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            h[i][j] = h[i][j - 1] * p1 + (int)(s1[i][j - 1] - 'a' + 1);
    for (int j = 1; j <= m; j++)
        for (int i = 1; i <= n; i++) h[i][j] += h[i - 1][j] * p2;
    for (int i = 1; i <= x; i++)
        for (int j = 1; j <= y; j++)
            h2[i][j] = h2[i][j - 1] * p1 + (int)(s2[i][j - 1] - 'a' + 1);
    for (int j = 1; j <= y; j++)
        for (int i = 1; i <= x; i++) h2[i][j] += h2[i - 1][j] * p2;
    ull val = h2[x][y];
    // printf("# %llu #\n", val);
    for (int i = x; i <= n; i++) {
        for (int j = y; j <= m; j++) {
            ull tmp = h[i][j] - h[i - x][j] * b2[x] - h[i][j - y] * b1[y] +
                      h[i - x][j - y] * b1[y] * b2[x];
            if (tmp == val) ++ans;
        }
    }
    printf("%d\n", ans);
}

int main() {
    b1[0] = b2[0] = 1;
    for (int i = 1; i < 1000 + 3; i++) {
        b1[i] = b1[i - 1] * p1;
        b2[i] = b2[i - 1] * p2;
    }
    int t;
    scanf("%d", &t);
    while (t--) solve();
    return 0;
}

Refer to the advanced guide to algorithm competition

String hash

hdu 1711

poj 1200

poj 3461

Two dimensional hash

UVa 11019

Keywords: C++ Algorithm

Added by dominant on Thu, 06 Jan 2022 10:29:05 +0200

Programming VIP

Detailed explanation of string Hash 2D Hash

String hash method

Get substring hash value

Overflow and mold taking

code

Two dimensional hash

code

String hash

Two dimensional hash

Popular Keywords