Solution - "JSOI 2019" and "Luogu P5334" festival celebrations

\(\mathscr{Description}\)

  Link.

   given the string \ (S \), find the starting subscript of the minimum representation of each prefix of \ (S \) (if there are multiple, take the smallest).

  \(|S|\le3\times10^6\).

\(\mathscr{Solution}\)

   notice the obvious fact that for a prefix \ (S[:i] \) and two starting subscripts \ (p,q \), if there is \ (s [P: I] < s [Q: I] \), there is \ (s [P: J] < s [Q: J] \) in all \ (J > I \). In other words, the final answer \ (r_i \) must meet \ (r_i\in B_i=\arg\min_{p\le i}\{S[p:i] \} \) and \ (B_{i+1}\subseteq B_i\cup\{i+1 \} \), so we can get \ (B_{i+1} \) by removing the bad starting position in \ (B_i\cup\{i+1 \} \).

Unfortunately, \ (|B_m|=\mathcal O(m) \) needs to be further streamlined. This prompts us to reflect on whether "taking the minimum of \ (S[p:i] \) is a necessary and sufficient condition for \ (p \) to become \ (r_i \)". The answer is No. Consider the suffixes \ (S[p:i] \) and \ (S[q:i] ~ (p < q) \). If \ (p,q\in B_i \), there is \ (S[p:p+i-q+1]=S[q:i] \), that is, \ (S[q:i] \) is a border of \ (S[p:i] \). From the relevant properties of border, it is natural to think of the case of studying \ (|S[q:i] | > \ frac{|S[p:i] |}{2} \), when \ (S[p:i] \) has a period \ (T=|S[p:i]|-|S[q:i] | \). Take \ (S[r:i]=S[i-T+1:i] \), which can explain that \ (S[p:i] \) is not greater than \ (S[q:i] \) and \ (S[r:i] \) at the same time. As shown in the figure:

    if we want to make \ (S[p:i] < s [Q: I] \), we need to add a character \ (s {I + 1} = x \) (red dot) to the suffix to make \ (x > y \) (blue dot). However, once there is \ (y < x \), we can make the last cycle \ (S[r:i] \) take the red dot all the way to the beginning of \ (S[p:i] \) and get \ (S[r:i + 1] < S[p:i + 1] \), so the conclusion is tenable.  \(\square\)

   therefore, it is maintained that there is no new set \ (B_i'\subseteq B_i \) satisfying the above \ (p,q \) relationship. Obviously, \ (| B_m'|=\mathcal O(\log m) \), so the total maintenance process is \ (\ mathcal O(n\log n) \).

   in addition, when finding the answer \ (r_i \), it is still necessary to enumerate each subscript in \ (B_i '\). Using the property of prefix equality, it is found that we only need to complete the quick comparison between \ (S \) substring and \ (S \) prefix, so we can achieve \ (\ mathcal O(1) \) with Z-function. To sum up, the final complexity is \ (\ mathcal O(n\log n) \).

  at this time, God wants to ask, ha, why don't you write \ (\ mathcal O(n) \)?

   looking at this problem again, we naturally associate it with Lyndon Word, which describes "the minimum representation is equal to itself", and Duval algorithm provides a \ (\ mathcal O(n) \) idea to investigate Lyndon Word, so we can try to transform the problem to Lyndon. Let the Lyndon of string \ (s=S[:i] \) be decomposed into

\[s=w_1^{k_1}w_2^{k_2}\cdots w_m^{k_m}, \]

Where \ (w_1 > w_2 > \ cdots > w_m \). Let's first re describe the key conclusion in the practice of \ (\ mathcal O(n\log n) \): obviously, the suffix of the minimum representation is at the beginning of a Lyndon Word. If \ (w_i^kw_{i+1}^{k_{i+1}}\cdots w_m^{k_m} \) is taken out, we can assert: \ (k=k_i \).

   now put on Duval and set \ (s=s'u^ku '\), where \ (u^k \) is Lyndon Word and \ (U' \) is the prefix of \ (U \) that has not been extended. The previous \ (s' \) is not important. Let's study it by remembering \ (t=u^ku '\), and consider adding the character \ (C = s {| s | + 1} \):

  • \(t_{|u '| + 1} < C \), the answer \ (p_{|s|+1} \) is obviously \ (t_1 \) in the subscript corresponding to the original string;
  • \(t_{|u '| + 1} > C \), considering Duval's process, we must first classify a prefix cycle as Lyndon, and then update \ (p_{|s|+1} \);
  • \(t_{|u'|+1}=c \), continue to coincide with the circular section, and two possible optimal solutions: \ (p=i \) and \ (p \) take the optimal starting position of \ (U \) at the corresponding position in \ (U' \). In the second case, although \ (U \) is Lyndon Word, the prefix of \ (S \) may be less than the suffix of \ (U \). Use the above Z-function method to support \ (\ mathcal O(1) \) comparison.

    finally, almost like Duval, we solved the problem in \ (\ mathcal O(n) \) time.

\(\mathscr{Code}\)

  • \(\mathcal O(n\log n)\):
/*+Rainybunny+*/

#include <bits/stdc++.h>

#define rep(i, l, r) for (int i = l, rep##i = r; i <= rep##i; ++i)
#define per(i, r, l) for (int i = r, per##i = l; i >= per##i; --i)

inline void wint(const int x) {
    if (9 < x) wint(x / 10);
    putchar(x % 10 ^ '0');
}

const int MAXN = 3e6, MAXG = 100;
int n, z[MAXN + 5];
char s[MAXN + 5];
int gcnt, good[MAXG + 5]; // indices that may be the answer.

inline void calcZ() {
    z[1] = n;
    for (int i = 2, l = 0, r = 0; i <= n; ++i) {
        if (i <= r) z[i] = std::min(z[i - l + 1], r - i + 1);
        while (i + z[i] <= n && s[i + z[i]] == s[1 + z[i]]) ++z[i];
        if (i + z[i] - 1 > r) r = i + z[l = i] - 1;
    }
}

inline void adapt(const int k) {
    auto compare = [&](const int u, const int v)->int {
        return u < v ?
          (s[u + k - v] == s[k] ? 0 : s[u + k - v] < s[k] ? -1 : 1)
          : (s[k] == s[v + k - u] ? 0 : s[k] < s[v + k - u] ? -1 : 1);
    };

    static int tmp[MAXG + 5]; rep (i, 1, gcnt) tmp[i] = good[i];
    int ocnt = gcnt, pst = good[gcnt]; good[gcnt = 1] = pst;
    per (i, ocnt - 1, 1) {
        if (int t = compare(pst, tmp[i]); t > 0) {
            good[gcnt = 1] = pst = tmp[i];
        } else if (!t) {
            pst = tmp[i];
            while (gcnt && k - good[gcnt] + 1 << 1 > k - pst + 1) --gcnt;
            good[++gcnt] = pst;
        }
    }
    std::reverse(good + 1, good + gcnt + 1);
}

inline int best(const int k) {
    // compare S[l:r] with S[:r-l+1].
    auto compare = [](const int l, const int r)->int {
        if (z[l] >= r - l + 1) return 0;
        return s[l + z[l]] < s[z[l] + 1] ? -1 : 1;
    };

    int ans = good[1];
    rep (i, 2, gcnt) {
        int cur = good[i];
        if (int f1 = compare(ans + k - cur + 1, k); f1 > 0) ans = cur;
        else if (!f1) {
            int f2 = compare(cur - ans + 1, cur - 1); // cur <> ans.
            if (f2 < 0) ans = cur;
        }
    }
    return ans;
}

int main() {
    scanf("%s", s + 1), n = strlen(s + 1), calcZ();

    rep (i, 1, n) {
        good[++gcnt] = i, adapt(i);
        wint(best(i)), putchar(i < n ? ' ' : '\n');
    }
    return 0;
}

  • \(\ mathcal O(n) \) (currently the optimal solution of the valley):
/*+Rainybunny+*/

#include <bits/stdc++.h>

#define rep(i, l, r) for (int i = l, rep##i = r; i <= rep##i; ++i)
#define per(i, r, l) for (int i = r, per##i = l; i >= per##i; --i)

inline void wint(const int x) {
    if (9 < x) wint(x / 10);
    putchar(x % 10 ^ '0');
}

const int MAXN = 3e6;
int n, z[MAXN + 5], ans[MAXN + 5];
char s[MAXN + 5];

inline void calcZ() {
    z[1] = n;
    for (int i = 2, l = 0, r = 0; i <= n; ++i) {
        if (i <= r) z[i] = std::min(z[i - l + 1], r - i + 1);
        while (i + z[i] <= n && s[i + z[i]] == s[1 + z[i]]) ++z[i];
        if (i + z[i] - 1 > r) r = i + z[l = i] - 1;
    }
}

inline void duval() {
    auto compare = [](const int u, const int v, const int k)->int {
        int p = u + k - v + 1, q = k - p + 2;
        if (z[p] < k - p + 1) {
            return s[p + z[p]] < s[1 + z[p]] ? -1 : 1;
        } else {
            return z[q] < u ? s[q + z[q]] < s[1 + z[q]] ? 1 : -1 : 0;
        }
    };
    
    for (int i = 1; i <= n;) {
        int j = i + 1, k = i;
        if (!ans[i]) ans[i] = i;
        while (j <= n && s[k] <= s[j]) {
            if (s[k] < s[j]) {
                if (!ans[j]) ans[j] = i;
                k = i;
            } else {
                if (!ans[j]) {
                    if (ans[k] < i) ans[j] = i;
                    else {
                        ans[j] = compare(i, j - k + ans[k], j) <= 0 ?
                          i : j - k + ans[k];
                    }
                }
                ++k;
            }
            ++j;
        }
        i += (j - i) / (j - k) * (j - k);
    }
}

int main() {
    n = fread(s + 1, 1, MAXN + 3, stdin);
    while (s[n] < 'a' || 'z' < s[n]) --n;
    
    calcZ(), duval();

    rep (i, 1, n) wint(ans[i]), putchar(i < n ? ' ' : '\n');
    return 0;
}

Added by meshi on Tue, 25 Jan 2022 15:17:50 +0200