# Modular combination - Lucas/exLucas - LibreOJ #181 Binomial coefficient

Proof and code are separated, and you can jump according to your own needs.

Go to my Blog, maybe the reading effect is better

## Lucas theorem

( n m ) m o d    p = ( ⌊ n p ⌋ ⌊ m p ⌋ ) ( n % p m % p ) m o d    p \begin{aligned} {n \choose m} \mod p = {\lfloor \frac{n}{p} \rfloor \choose \lfloor \frac{m}{p} \rfloor} {{n \% p} \choose {m \% p}} \mod p \end{aligned} (mn​)modp=(⌊pm​⌋⌊pn​⌋​)(m%pn%p​)modp​
Shangshi Dang p p It holds when p is prime.

At this point, if p p p is a smaller prime, and n , m n,m n. M is a large number (refers to a large number that cannot directly pass through the preprocessing factorial and its inverse element, such as 1 e 18 1e18 1e18), then we can solve the result recursively.

inline ll func(ll n, ll m, ll p, ll res = 1LL) { // n<=m
if (n < m) return 0;
while (m) res = res * C(n % p, m % p) % p, n /= p, m /= p;
return res;
}


According to the above code, we just preprocess p p Factorial and its inverse element within p, time complexity O ( p + log ⁡ m ) O(p + \log m) O(p+logm).

## prove

In fact, the recursive process above has prompted one thing: p p p-ary decomposition.

consider n , m n,m n. M in p p p-ary expression n = ∑ i = 0 k a k p k , m = ∑ i = 0 k b k p k n={\sum_{i=0}^{k}{a_kp^k}},m={\sum_{i=0}^{k}{b_kp^k}} n=∑i=0k​ak​pk,m=∑i=0k​bk​pk，

So Lucas theorem is actually:
( n m ) ≡ ( a 0 b 0 ) ( a 1 b 1 ) ... ( a k b k ) ( m o d p ) {n \choose m}\equiv{a_0 \choose b_0}{a_1 \choose b_1}\dots{a_k \choose b_k}\pmod{p} (mn​)≡(b0​a0​​)(b1​a1​​)...(bk​ak​​)(modp)

Now let's look at a simple formula:

( a + b ) p = ∑ i = 0 p ( p i ) a i b p − i ≡ a p + b p ( m o d p ) (a+b)^p={\sum_{i=0}^{p}{p \choose i}{a^ib^{p-i}}}\equiv{a^p+b^p}\pmod{p} (a+b)p=∑i=0p​(ip​)aibp−i≡ap+bp(modp)
Because the rest has factors p p p

Then we can get:
( 1 + x ) n ≡ ( 1 + x ) ∑ a i p i ≡ ∏ ( 1 + x p i ) a i ( m o d p ) {(1+x)^n}\equiv{(1+x)^{\sum{a_ip^i}}}\equiv{\prod(1+x^{p^i})^{a_i}}\pmod{p} (1+x)n≡(1+x)∑ai​pi≡∏(1+xpi)ai​(modp)
Now consider x m x^m The coefficient of xm is on the left ( n m ) {n \choose m} (mn), on the right is ∏ ( a i b i ) \prod{a_i \choose b_i} Π (bi Π ai) is already Lucas's expression.

## Expansion problem

If the above formula is p p p is a composite number, and n , m n,m n. M is still very large. How to solve it?

From the Chinese remainder theorem,

If combined p p The unique decomposition of p is ∏ i = 1 s p i α i \prod_{i=1}^{s}{p_i}^{\alpha_i} ∏i=1s​pi​ α I, then:
x ≡ ( n m ) ( m o d p ) ⇒ { x ≡ ( n m ) ( m o d p 1 α 1 ) x ≡ ( n m ) ( m o d p 2 α 2 ) ⋮ x ≡ ( n m ) ( m o d p s α s ) x \equiv {n \choose m} \pmod{p} \Rightarrow \begin{cases} x & \equiv & {n \choose m} \pmod{{p_1}^{\alpha_1}} \\ x & \equiv & {n \choose m} \pmod{{p_2}^{\alpha_2}} \\ & \vdots & \\ x & \equiv & {n \choose m} \pmod{{p_s}^{\alpha_s}} \\ \end{cases} x≡(mn​)(modp)⇒⎩⎪⎪⎪⎪⎨⎪⎪⎪⎪⎧​xxx​≡≡⋮≡​(mn​)(modp1​α1​)(mn​)(modp2​α2​)(mn​)(modps​αs​)​

So just find out p p The unique decomposition of p, then solve each congruence separately, and combine the answers through CRT to complete all the answers.

Now the question is how to find out ( n m ) m o d    p i α i {n \choose m} \mod{{p_i}^{\alpha_i}} (mn​)modpi​αi​.

We know ( n m ) = n ! m ! ( n − m ) ! {n \choose m}=\frac{n!}{m!(n-m)!} (mn​)=m!(n−m)!n! , if m ! m! m! And p α p^{\alpha} p α Mutual prime, then the result can be calculated directly as long as the factorial inverse element is obtained by expanding Euclidean. But, first n , m n,m n. Whether the size of M supports us to find the factorial directly. Are they mutually prime? Obviously negative.

However, if we can separate the non coprime part and the coprime part, we can get the answer by solving the inverse element.

be aware, p α p^{\alpha} p α Only one germplasm factor p p p. Then the non prime part of the factorial is obviously and p p p is not mutually prime. The number of such non mutually prime numbers can also be easily calculated, that is p o t p ( n ! ) = ∑ i = 1 ∞ ⌊ n p i ⌋ {pot_p(n!)}={\sum_{i=1}^{\infty}{\lfloor \frac{n}{p^i} \rfloor}} potp​(n!)= ∑i=1∞​⌊pin​⌋. Then in the mold p α p^{\alpha} p α All numbers in the sense Z p α \Z_{p^{\alpha}} Zp α Medium, (note) p α p^{\alpha} p α Relatively small) that is, greater than p α p^{\alpha} p α The number of must be in [ 0 , p − 1 ] [0,p-1] [0,p − 1], and it is easy to know that such numbers appear in cycles.

So now the idea is, preprocessing p α p^{\alpha} p α All and p p The product of p coprime numbers (let be b a s e base base), then n ! n! n! There must be ⌊ n p α ⌋ \lfloor \frac{n}{p^{\alpha}} \rfloor ⌊p α n ⌋ such products, the rest is n % p α n\%p^{\alpha} n%p α Inner and p p The product of the number of p coprime.

f a c ( n ) = b a s e ⌊ n p α ⌋ f a c ( n % p α ) fac(n)=base^{\lfloor \frac{n}{p^{\alpha}} \rfloor}fac(n\%p^{\alpha}) fac(n)=base⌊pαn​⌋fac(n%pα)

To sum up, now let's put factorials in O ( p α + log ⁡ p n ) O({p^{\alpha}}+\log_{p}{n}) O(p α+ In terms of time, logp (n) is divided into two parts: reciprocal prime product fac(n) and non reciprocal prime p^pot.

So, the code.

inline ll pot(ll n, ll p, ll res = 0) {
while (n)
res += n / p, n /= p;
return res;
}

inline ll fac(ll n, ll p, ll mod, ll res = 1LL) {
if (n == 0) return 1LL;
ll base = 1LL;
for (ll i = 2; i < mod; i++)
if (i % p) base = mul(base, i, mod);
while (n) {
ll tmp = qpow(base, n / mod, mod), sz = n % mod;
for (ll i = 2; i <= sz; i++)
if (i % p) tmp = mul(tmp, i, mod);
res = mul(res, tmp, mod), n /= p;
}
return res;
}


So now we can get:
( n m ) ≡ f a c ( n ) × i n v ( f a c ( m ) ) × i n v ( f a c ( n − m ) ) × p p o t p ( n ) − p o t p ( m ) − p o t p ( n − m ) ( m o d p α ) {n \choose m} \equiv fac(n) \times inv(fac(m)) \times inv(fac(n-m)) \times p^{pot_p(n)-pot_p(m)-pot_p(n-m)} \pmod{p^{\alpha}} (mn​)≡fac(n)×inv(fac(m))×inv(fac(n−m))×ppotp​(n)−potp​(m)−potp​(n−m)(modpα)
Then there is a simple CRT, which is not coded separately.

Logue P4720 [template] extended Lucas theorem / exLucas

Just give the whole code:

Because I've been addicted to pressing recently, it may seem a little uncomfortable, but I'm comfortable, but I can copy it locally to format, so I won't change it.

// #pragma GCC optimize("Ofast")
// #pragma GCC target("avx,avx2,fma")

#include <bits/stdc++.h>
using namespace std;
#define fast ios_base::sync_with_stdio(false), cin.tie(0), cout.tie(0)
#define fin freopen("input.txt", "r", stdin), freopen("output.txt", "w", stdout)
#define PII pair<int, int>
#define PLL pair<ll, ll>
#define pb push_back
#define mp make_pair
#define fi first
#define se second
#define ld nd << 1
#define rd nd << 1 | 1

typedef long double lld;
typedef long long ll;
typedef unsigned long long ull;
// typedef __int128_t i128;
typedef vector<ll> vec;

// const int mod = 1e9 + 7;
// const int mod = 998244353;
const int maxn = 1e6 + 50;
const int inf = 0x3f3f3f3f;			// 1e9;
const ll INF = 0x3f3f3f3f3f3f3f3f;	// 1e18;

int T;

/* inline ll mul(ll a, ll b, ll mod, ll res = 0) {
for ( ; b; b >>= 1, a = (a + a) % mod) if (b & 1) res = (res + a) % mod;
return res;
} */

inline ll mul(ll a, ll b, ll p) { return (a * b - (ll)((lld)a / p * b) * p + p) % p; }

inline ll qpow(ll x, ll m, ll mod, ll res = 1) {
for ( ; m; m >>= 1, x = mul(x, x, mod)) if (m & 1) res = mul(res, x, mod);
return res;
}

inline void exgcd(ll a, ll b, ll &g, ll &x, ll &y) {
if (b) exgcd(b, a % b, g, y, x), y -= a / b * x;
else x = 1, y = 0, g = a;
}

inline ll inv(ll x, ll m) {
ll a, b, g;
exgcd(x, m, g, a, b);
return (a % m + m) % m;
}

ll r[maxn], m[maxn], tot = 1;
inline ll CRT() {
ll M = 1LL, ans = 0LL;
for (int i = 1; i < tot; i++) M *= m[i];
for (int i = 1; i < tot; i++) { // combine
ll Mi = M / m[i], Mi_inv = inv(Mi, m[i]);
ans = (ans + mul(mul(r[i], Mi, M), Mi_inv, M)) % M;
} return ans;
}

inline ll pot(ll n, ll p, ll res = 0) {
while (n) res += n / p, n /= p;
return res;
}

inline ll fac(ll n, ll p, ll mod, ll res = 1LL) {
if (n == 0) return 1LL;
ll base = 1LL;
for (ll i = 2; i < mod; i++) if (i % p) base = mul(base, i, mod);
while (n) {
ll tmp = qpow(base, n / mod, mod), sz = n % mod;
for (ll i = 2; i <= sz; i++) if (i % p) tmp = mul(tmp, i, mod);
res = mul(res, tmp, mod), n /= p;
} return res;
}

inline ll exLucas(ll a, ll b, ll p) {
for (ll i = 2; i * i <= p; i ++) {
if (p % i == 0) {
ll t = 1LL;
while (p % i == 0) p /= i, t *= i;
ll fa = fac(a, i, t), fb = fac(b, i, t), fc = fac(a - b, i, t);
ll pk = qpow(i, pot(a, i) - pot(b, i) - pot(a - b, i), t);
m[tot] = t, r[tot ++] = mul(pk, mul(fa, mul(inv(fb, t), inv(fc, t), t), t), t);
}
}
if (p ^ 1LL) {
ll fa = fac(a, p, p), fb = fac(b, p, p), fc = fac(a - b, p, p);
ll pk = qpow(p, pot(a, p) - pot(b, p) - pot(a - b, p), p);
m[tot] = p, r[tot ++] = mul(pk, mul(fa, mul(inv(fb, p), inv(fc, p), p), p), p);
}
return CRT();
}

ll a, b, p, ans;

int main() {
scanf("%lld%lld%lld", &a, &b, &p);
ans = exLucas(a, b, p);
printf("%lld", ans);
return 0;
}


## Some optimizations (LibreOJ #181. Binomial coefficients)

The code given above can only pass some single group test samples with water data comparison. Because it is a board problem, it is not optimized.

Some other questions, such as LibreOJ #181. Binomial coefficient , this kind of data is definitely not good for malignant tumors.

The following optimization is based on modulus determination and multi group input.

### Pretreatment part

When we find the separation factorial, we can actually find the difference between the first k numbers and p α p^{\alpha} p α Product of Coprime numbers s u m [ k ] sum[k] sum[k], and b a s e base base λ \lambda λ pow b a s e _ s u m [ λ ] base\_sum[\lambda] base_sum[λ].

because b a s e base base and p α p^{\alpha} p α Mutual prime is known from Euler's theorem b a s e φ ( p α ) ≡ 1 ( m o d p α ) base^{\varphi(p^{\alpha})}\equiv 1\pmod{p^{\alpha}} baseφ(pα)≡1(modpα).

φ ( p k ) = p k − 1 φ ( p ) = p k − 1 ( p − 1 ) \varphi(p^k)=p^{k-1}\varphi(p)=p^{k-1}(p-1) φ(pk)=pk−1φ(p)=pk−1(p−1)

So when preprocessing, s u m sum sum preprocessing to p α p^{\alpha} pα， b a s e _ s u m base\_sum base_sum preprocessing to p k − 1 ( p − 1 ) p^{k-1}(p-1) pk − 1(p − 1) is sufficient.

Another is the CRT function.

set up m 1 , m 2 , ... , m k m_1,m_2,\dots,m_k m1, m2,..., mk are k k k positive integers that are mutually prime, m = ∏ m i m=\prod{m_i} m=∏mi​
order m = m i M i m=m_iM_i m=mi ﹤ Mi, then Congruence Equations { x ≡ b 1 ( m o d m 1 ) x ≡ b 2 ( m o d m 2 ) ⋮ x ≡ b k ( m o d m k ) \begin{cases}x &\equiv & b_1 \pmod{m_1} \\ x &\equiv & b_2 \pmod{m_2} \\ & \vdots & \\ x &\equiv & b_k \pmod{m_k} \end{cases} ⎩⎪⎪⎪⎪⎨⎪⎪⎪⎪⎧​xxx​≡≡⋮≡​b1​(modm1​)b2​(modm2​)bk​(modmk​)​
There is a unique solution x ≡ ∑ i = 1 k b i M i M i ′ ( m o d m ) x\equiv {\sum_{i=1}^{k}{b_iM_iM_i^{'}}}\pmod{m} x≡∑i=1k​bi​Mi​Mi′​(modm)， M i M i ′ ≡ 1 ( m o d m i ) M_iM_i^{'}\equiv1\pmod{m_i} Mi​Mi′​≡1(modmi​)

Since CRT needs to be called many times, we can preprocess directly M i M i ′ M_iM_i^{'} Mi ， Mi '.

After preprocessing, let's rewrite the fac function to find the pot at the same time.

inline int fac(long long x, long long & pot) {
int res = pot = 0;
while (x) {
res = res * base_sum[x / pk % phi] * sum[x % pk];
pot += x / p, x /= p;
}
return res;
}


Then the exLucas function

inline int exLucas(long long n, long long m) {
long long pa, pb, pc;
int fa = fac(n, pa);
int fb = fac(m, pb);
int fc = fac(n - m, pc);
return fa * inv(fb) * inv(fc) * qpow(p, pa - pb - pc);
}


### Complexity analysis

Let's calculate the complexity

• Unique decomposition plus preprocessing O ( log ⁡ P ∑ ( φ ( p k ) + p k + log ⁡ p k ) ) O(\log{P}\sum(\varphi(p^k)+p^k+\log{p^k})) O(logP∑(φ(pk)+pk+logpk))
• query O ( T log ⁡ P ( log ⁡ p ( n ) + log ⁡ p ( m ) + log ⁡ p ( n − m ) + log ⁡ ( p o t ) ) ) O(T\log{P}(\log_p{(n)}+\log_p{(m)}+\log_p{(n-m)}+\log(pot))) O(TlogP(logp​(n)+logp​(m)+logp​(n−m)+log(pot)))

About in total 1 e 8 1e8 1e8 and a (possibly) small constant, and then run locally for 3.6 seconds until TLE...

I felt too metaphysical, so I went to see the code submitted by the boss, and then I saw a very metaphysical optimization (modeling).

### Metaphysical optimization of big man (taking mold)

From ID: CCF_ Submission of Noi

struct FastMod {
unsigned long long b, m;
void in(unsigned long long x) {
b = x, m = (unsigned long long)((__uint128_t(1) << 64) / x);
}
int operator()(unsigned long long a) {
unsigned long long q = (unsigned long long)((__uint128_t(m) * a) >> 64), r = a - q * b;
return r >= b ? r - b : r;
}
};


Personal understanding: the efficiency of division is very low. The optimization of using bit operation instead of division to improve efficiency (?)

I've been using this myself. It's more conventional.

return (a * b - (ll)((ld)a / p * b) * p + p) % p;


I searched the efficiency of various operations in c + +

This is in What you see in this blog , I don't guarantee right or wrong, but at least I know that division is slow.

Shift > assignment > size comparison > addition > subtraction > multiplication > modulo > division.

Since the modulus is determined, we can deal with it and use bit operation to shift right instead of division, which will be faster than normal modulus.

### AC code

LibreOJ submit link Plus fast reading and pressing line, so it's a little ugly, but it shouldn't affect reading.

#include <cstdio>
#include <algorithm>
#include <vector>
#define gc() (p1 == p2 ? (p2 = buf + fread(p1 = buf, 1, 1 << 20, stdin), p1 == p2 ? EOF : *p1++) : *p1++)
#define read() ({ register long long x = 0, f = 1; register char c = gc(); while(c < '0' || c > '9') { if (c == '-') f = -1; c = gc();} while(c >= '0' && c <= '9') x = x * 10 + (c & 15), c = gc(); f * x; })
#define pc(x) (p - puf == 1 << 20 ? (fwrite(puf, 1, 1 << 20, stdout), p = puf) : 0, *p++ = x)
#define print(x, b) ({ pt(x), pc(b); })
char buf[1 << 20], *p1, *p2, puf[1 << 20], *p = puf;
int pt(int x) { return x <= 9 ? pc(x + '0') : (pt(x / 10), pc(x % 10 + '0')); }

int P;

struct FastMod {
unsigned long long b, m;
void in(unsigned long long x) {
b = x, m = (unsigned long long)((__uint128_t(1) << 64) / x);
}
int operator()(unsigned long long a) {
unsigned long long q = (unsigned long long)((__uint128_t(m) * a) >> 64), r = a - q * b;
return r >= b ? r - b : r;
}
} M;

struct node {
int p, pk, o, phi;
FastMod f;
std::vector<int> sum, base_sum;
void exgcd(int a, int b, int & x, int & y) { b ? (exgcd(b, a % b, y, x), y -= a / b * x) : (x = 1, y = 0); }
int inv(int a) { int x, y; exgcd(a, pk, x, y); return f(x + pk); }
int qpow(int x, int n, int res = 1) { for ( ; n; n >>= 1, x = f(1ll * x * x)) if (n & 1) res = f(1ll * res * x); return res; }
void set(int _p, int _pk) {
f.in(_pk), p = _p, pk = _pk, phi = _pk - _pk / _p, o = M(P / _pk * 1ll * inv(P / _pk));
sum.resize(pk), base_sum.resize(phi), sum[0] = base_sum[0] = 1;
for (register int i = 1; i < pk; i++) sum[i] = i % p ? f(1ll * sum[i-1] * i) : sum[i-1];
for (register int i = 1; i < phi; i++) base_sum[i] = f(1ll * base_sum[i-1] * sum[pk-1]);
}
int fac(long long x, long long & pot, int res = 1) {
pot = 0;
while (x) res = f(f(base_sum[x / pk % phi] * 1ll * res) * 1ll * sum[x % pk]), pot += x / p, x /= p;
return res;
}
int get(long long a, long long b) {
long long pa, pb, pc; int fa = fac(a, pa), fb = fac(b, pb), fc = fac(a - b, pc);
return f(f(f(1ll * qpow(p, pa - pb - pc) * fa) * 1ll * inv(fb)) * 1ll * inv(fc));
}
} pr[11];

long long n, m;

int main() {
register int T = read(), t = P = read(), tot = 0, ans; M.in(P);
for (register int i = 2; i <= t / i; i++) // Factorization
if (t % i == 0) {
for (ans = 1; t % i == 0; ans *= i, t /= i) continue;
pr[++ tot].set(i, ans);
}
if (t ^ 1) pr[++ tot].set(t, t);

while (T -- ) {