Binomial Distribution

definition

Consider the coin toss in a cricket match. Suppose you win the toss today: that is a successful event. You toss again, but this time you lose. Winning the toss today does not mean you will win it tomorrow. Let the random variable X be the number of tosses you win. What values can X take? Any whole number from 0 up to the number of times you toss the coin.

Each toss has only two possible outcomes, success and failure. For a fair coin the probability of success is p = 0.5, so the probability of failure is q = 1 - p = 0.5.

A binomial distribution describes repeated trials that each have only two possible outcomes, such as success or failure, gain or loss, win or lose. The probability of success is the same in every trial.

The two outcomes do not have to be equally likely. If the probability of success in each trial is 0.2, the probability of failure is q = 1 - 0.2 = 0.8.
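A single success/failure trial is called a Bernoulli trial, and SciPy exposes it as `scipy.stats.bernoulli`. A minimal sketch, assuming SciPy is installed, that checks q = 1 - p for the p = 0.2 example:

```python
from scipy.stats import bernoulli

p = 0.2      # probability of success in one trial
q = 1 - p    # probability of failure

# bernoulli.pmf(1, p) is P(success); bernoulli.pmf(0, p) is P(failure)
p_success = bernoulli.pmf(1, p)
p_failure = bernoulli.pmf(0, p)

print(p_success, p_failure, q)
```

A binomial experiment is simply n of these Bernoulli trials repeated independently.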

The trials are independent: the result of the previous toss cannot determine or affect the result of the current one. An experiment with only two possible outcomes, repeated n times, is called a binomial experiment. The binomial distribution has two parameters, n and p, where n is the total number of trials (written N in the plots below) and p is the probability of success in each trial.

Based on the description above, a binomial distribution has these properties:

  • The trials are independent.
  • Each trial has only two possible outcomes: success or failure.
  • A total of n identical trials are performed.
  • Every trial has the same probability of success p (and of failure, 1 - p).
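These properties can be checked by simulation: run n independent trials with the same success probability, count the successes, and compare how often each count occurs with the binomial PMF. A sketch assuming NumPy and SciPy; n = 10, p = 0.3 and the number of simulated experiments are illustrative choices, not values from the text:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)  # seeded generator so the run is reproducible
n, p = 10, 0.3                  # illustrative: 10 trials, success probability 0.3
n_experiments = 100_000

# Each row is one experiment: n independent trials, each a success with probability p
trials = rng.random((n_experiments, n)) < p
# The number of successes per experiment is a binomial random variable
successes = trials.sum(axis=1)

# Compare the simulated frequency of each count with the binomial PMF
for k in range(n + 1):
    simulated = np.mean(successes == k)
    exact = binom.pmf(k, n, p)
    print(f"k={k:2d}  simulated={simulated:.4f}  binom.pmf={exact:.4f}")

print("mean number of successes:", successes.mean(), "vs n * p =", n * p)
```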

formula

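$$P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n$$

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$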

The mean of the distribution is $n \cdot p$.
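The formula can be evaluated directly with the standard library and compared with SciPy. A sketch in which `binomial_pmf` is a hypothetical helper written for this example, and n = 10, p = 0.3 are illustrative values; it also confirms that the probabilities sum to 1 and that the mean equals n · p:

```python
from math import comb
from scipy.stats import binom

def binomial_pmf(k, n, p):
    """Hypothetical helper: P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters

# The hand-written formula agrees with SciPy's binom.pmf
for k in range(n + 1):
    assert abs(binomial_pmf(k, n, p) - binom.pmf(k, n, p)) < 1e-12

# The probabilities of all possible outcomes sum to 1
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# The mean sum(k * P(X = k)) equals n * p
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(total, mean, n * p, binom.mean(n, p))
```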

  • PMF (probability mass function): defined for discrete random variables, it gives the probability that the variable takes each specific value. For a discrete distribution such as the binomial, the PMF gives the probability of each possible number of successes.

  • PDF (probability density function): defined for continuous random variables. Unlike the PMF, the value of a PDF at a specific point is not the probability of that point. For a continuous random variable, probabilities can only be computed for intervals: the probability that the variable falls between a and b is the integral of the PDF from a to b.
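The difference between the two can be shown numerically. The binomial PMF values are probabilities and sum to 1. For the PDF side, the text does not name a continuous distribution, so this sketch assumes a normal distribution with mean 0 and standard deviation 0.1 (an illustrative choice): its density at 0 exceeds 1, so it cannot be a probability, while integrating it over an interval does give one.

```python
from scipy.integrate import quad
from scipy.stats import binom, norm

# PMF: each value of a discrete variable has a probability, and they sum to 1
n, p = 10, 0.3
pmf_total = sum(binom.pmf(k, n, p) for k in range(n + 1))

# PDF (assumed normal, mean 0, std 0.1): the density at a point can exceed 1
density_at_0 = norm.pdf(0, loc=0, scale=0.1)

# Probability of an interval = integral of the PDF over that interval
prob_interval, _ = quad(norm.pdf, -0.1, 0.1, args=(0, 0.1))
# The same probability from the CDF: F(b) - F(a)
prob_via_cdf = norm.cdf(0.1, 0, 0.1) - norm.cdf(-0.1, 0, 0.1)

print(pmf_total, density_at_0, prob_interval, prob_via_cdf)
```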

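If you pass the bar positions as left= (as older matplotlib examples did), current matplotlib raises:

TypeError: bar() missing 1 required positional argument: 'x'

The first parameter of plt.bar is now named x, not left. Change the old call:

plt.bar(left=np.arange(21),
        height=stats.binom.pmf(np.arange(21), p=.5, n=20),
        width=.75,
        alpha=0.75
       )

to use x instead:

plt.bar(x=np.arange(21),
        height=stats.binom.pmf(np.arange(21), p=.5, n=20),
        width=.75,
        alpha=0.75
       )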

# IMPORTS
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import matplotlib.style as style

# PLOTTING CONFIG
%matplotlib inline
# (Jupyter magic above; in a plain .py script remove it and call plt.show() at the end)
style.use('fivethirtyeight')
plt.rcParams["figure.figsize"] = (14, 7)

plt.figure(dpi=100)

# PMF (bars); k = 0..20 covers every possible outcome for n = 20
plt.bar(x=np.arange(21),
        height=stats.binom.pmf(np.arange(21), p=.5, n=20),
        width=.75,
        alpha=0.75
       )
# CDF (line)
plt.plot(np.arange(21),
         stats.binom.cdf(np.arange(21), p=.5, n=20),
         color="#fc4f30",
        )

# LEGEND
plt.text(x=4.5, y=.7, s="pmf (normed)", alpha=.75, weight="bold", color="#008fd5")
plt.text(x=14.5, y=.9, s="cdf", alpha=.75, weight="bold", color="#fc4f30")

# TICKS
plt.xticks(range(21)[::2])
plt.tick_params(axis = 'both', which = 'major', labelsize = 18)
plt.axhline(y = 0.005, color = 'black', linewidth = 1.3, alpha = .7)

# TITLE, SUBTITLE & FOOTER
plt.text(x = -2.5, y = 1.25, s = "Binomial Distribution - Overview",
               fontsize = 26, weight = 'bold', alpha = .75)
plt.text(x = -2.5, y = 1.1, 
         s = 'Depicted below are the normed probability mass function (pmf) and the cumulative distribution\nfunction (cdf) of a binomially distributed random variable $ y \\sim Binom(N, p) $, given $ N = 20$ and $p = 0.5$.',
         fontsize = 19, alpha = .85)
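The CDF line in this figure is the running total of the PMF bars. A short check of that relationship, assuming the same N = 20 and p = 0.5 as the plot:

```python
import numpy as np
from scipy.stats import binom

n, p = 20, 0.5
k = np.arange(n + 1)  # every possible number of successes, 0..20

pmf = binom.pmf(k, n, p)
cdf = binom.cdf(k, n, p)

# The CDF at k is the cumulative sum of the PMF up to and including k
running_total = np.cumsum(pmf)
print("largest difference:", np.max(np.abs(running_total - cdf)))
print("cdf at k = n:", cdf[-1])
```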

The following figure shows how changing p affects the distribution:

plt.figure(dpi=100)

# PMF p = 0.2
plt.scatter(np.arange(21),
            (stats.binom.pmf(np.arange(21), p=.2, n=20)),
            alpha=0.75,
            s=100
       )
plt.plot(np.arange(21),
         (stats.binom.pmf(np.arange(21), p=.2, n=20)),
         alpha=0.75,
        )

# PMF p = 0.5
plt.scatter(np.arange(21),
            (stats.binom.pmf(np.arange(21), p=.5, n=20)),
            alpha=0.75,
            s=100
       )
plt.plot(np.arange(21),
         (stats.binom.pmf(np.arange(21), p=.5, n=20)),
         alpha=0.75,
        )

# PMF p = 0.9
plt.scatter(np.arange(21),
            (stats.binom.pmf(np.arange(21), p=.9, n=20)),
            alpha=0.75,
            s=100
       )
plt.plot(np.arange(21),
         (stats.binom.pmf(np.arange(21), p=.9, n=20)),
         alpha=0.75,
        )

# LEGEND
plt.text(x=3.5, y=.075, s="$p = 0.2$", alpha=.75, weight="bold", color="#008fd5")
plt.text(x=9.5, y=.075, s="$p = 0.5$", alpha=.75, weight="bold", color="#fc4f30")
plt.text(x=17.5, y=.075, s="$p = 0.9$", alpha=.75, weight="bold", color="#e5ae38")

# TICKS
plt.xticks(range(21)[::2])
plt.tick_params(axis = 'both', which = 'major', labelsize = 18)
plt.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)

# TITLE, SUBTITLE & FOOTER
plt.text(x = -2.5, y = .37, s = "Binomial Distribution - $p$",
               fontsize = 26, weight = 'bold', alpha = .75)
plt.text(x = -2.5, y = .32, 
         s = 'Depicted below are three binomially distributed random variables with varying $p$. As one can see\nthe parameter $p$ shifts and skews the distribution.',
         fontsize = 19, alpha = .85)
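"Shifts and skews" can be made precise with the mean (n · p) and the skewness, which is positive (long right tail) for p < 0.5, zero at p = 0.5 and negative for p > 0.5. A sketch using `binom.stats` with N = 20 as in the figure:

```python
from scipy.stats import binom

n = 20
summary = {}
for p in (0.2, 0.5, 0.9):
    # moments="ms" asks binom.stats for the mean and the skewness
    mean, skew = binom.stats(n, p, moments="ms")
    summary[p] = (float(mean), float(skew))
    print(f"p={p}: mean={float(mean):.1f}, skewness={float(skew):+.3f}")
```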

The following figure shows how changing N, the number of trials, affects the distribution:

plt.figure(dpi=100)

# PMF N = 10
plt.scatter(np.arange(11),
            (stats.binom.pmf(np.arange(11), p=.5, n=10)),
            alpha=0.75,
            s=100
       )
plt.plot(np.arange(11),
         (stats.binom.pmf(np.arange(11), p=.5, n=10)),
         alpha=0.75,
        )

# PMF N = 15
plt.scatter(np.arange(16),
            (stats.binom.pmf(np.arange(16), p=.5, n=15)),
            alpha=0.75,
            s=100
       )
plt.plot(np.arange(16),
         (stats.binom.pmf(np.arange(16), p=.5, n=15)),
         alpha=0.75,
        )

# PMF N = 20
plt.scatter(np.arange(21),
            (stats.binom.pmf(np.arange(21), p=.5, n=20)),
            alpha=0.75,
            s=100
       )
plt.plot(np.arange(21),
         (stats.binom.pmf(np.arange(21), p=.5, n=20)),
         alpha=0.75,
        )

# LEGEND
plt.text(x=6, y=.225, s="$N = 10$", alpha=.75, weight="bold", color="#008fd5")
plt.text(x=8.5, y=.2, s="$N = 15$", alpha=.75, weight="bold", color="#fc4f30")
plt.text(x=11, y=.175, s="$N = 20$", alpha=.75, weight="bold", color="#e5ae38")

# TICKS
plt.xticks(range(21)[::2])
plt.tick_params(axis = 'both', which = 'major', labelsize = 18)
plt.axhline(y = 0, color = 'black', linewidth = 1.3, alpha = .7)

# TITLE, SUBTITLE & FOOTER
plt.text(x = -2.5, y = .31, s = "Binomial Distribution - $N$",
               fontsize = 26, weight = 'bold', alpha = .75)
plt.text(x = -2.5, y = .27, 
         s = 'Depicted below are three binomially distributed random variables with varying $N$. As one can see\nthe parameter $N$ stretches the distribution (the larger $N$, the flatter the distribution).',
         fontsize = 19, alpha = .85)
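The "flatter" effect follows from the standard deviation sqrt(N · p · (1 - p)), which grows with N, while the total probability stays 1, so the tallest bar gets lower. A sketch with p = 0.5 as in the figure:

```python
import numpy as np
from scipy.stats import binom

p = 0.5
stds, peaks = [], []
for n in (10, 15, 20):
    k = np.arange(n + 1)
    peak = binom.pmf(k, n, p).max()  # height of the most likely value
    std = binom.std(n, p)            # spread: sqrt(n * p * (1 - p))
    stds.append(std)
    peaks.append(peak)
    print(f"N={n}: std={std:.3f}, peak pmf={peak:.3f}")
```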

You can also draw random samples from a binomial distribution with binom.rvs:

import numpy as np
from scipy.stats import binom

# draw a single sample
np.random.seed(42)
print(binom.rvs(p=0.3, n=10), end="\n\n")

# draw 10 samples
print(binom.rvs(p=0.3, n=10, size=10), end="\n\n")

Output:

2

[5 4 3 2 2 1 5 3 4 0]
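Instead of seeding the global NumPy state, `binom.rvs` also accepts a dedicated generator through `random_state`. A sketch, assuming a reasonably recent SciPy that accepts `numpy.random.Generator` here; the sample size of 50,000 is illustrative. With many samples, the sample mean should land close to n · p = 3:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(42)  # dedicated generator instead of np.random.seed
samples = binom.rvs(n=10, p=0.3, size=50_000, random_state=rng)

# Every sample is a count of successes between 0 and n
print(samples.min(), samples.max())
# The sample mean should be close to n * p
print(samples.mean())
```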

Use binom.pmf to compute the probability of an exact number of successes:

from scipy.stats import binom

# additional imports for plotting purpose
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (14,7)

# probability of exactly x = 1 and exactly y = 7 successes
x = 1
y = 7
print("pmf(X=1) = {}\npmf(X=7) = {}".format(binom.pmf(k=x, p=0.3, n=10), binom.pmf(k=y, p=0.3, n=10)))

# PMF at every possible k (0..10) for the plot
x_s = np.arange(11)
y_s = binom.pmf(k=x_s, p=0.3, n=10)
plt.scatter(x_s, y_s, s=100);

Output:

pmf(X=1) = 0.12106082099999989
pmf(X=7) = 0.009001691999999992
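`binom.pmf` is vectorized, so the whole distribution for n = 10, p = 0.3 can be tabulated in one call, which also shows the most likely number of successes (the mode, floor((n + 1) · p) = 3 here). A short sketch:

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
k = np.arange(n + 1)
probs = binom.pmf(k, n, p)  # one probability per possible k

for ki in range(n + 1):
    print(f"P(X = {ki:2d}) = {probs[ki]:.4f}")

# The most likely number of successes
mode = int(k[np.argmax(probs)])
print("most likely k:", mode)
```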

Use binom.cdf (the cumulative distribution function) to compute the probability that X is at most some value, or falls within a range:

from scipy.stats import binom

# probability that X is at most 3
print("P(X <=3) = {}".format(binom.cdf(k=3, p=0.3, n=10)))

# probability that X is greater than 2 and at most 8
print("P(2 < X <= 8) = {}".format(binom.cdf(k=8, p=0.3, n=10) - binom.cdf(k=2, p=0.3, n=10)))

Output:

P(X <=3) = 0.6496107183999998
P(2 < X <= 8) = 0.6170735276999999
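Because X only takes whole values, P(2 < X <= 8) is also the sum of the PMF over k = 3, ..., 8, and the survival function `binom.sf(k, n, p)` gives P(X > k) = 1 - cdf(k) directly. A sketch cross-checking the two results above:

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3

# P(2 < X <= 8) two ways: CDF difference, and summing the PMF over k = 3..8
via_cdf = binom.cdf(8, n, p) - binom.cdf(2, n, p)
via_pmf = binom.pmf(np.arange(3, 9), n, p).sum()

# Survival function: sf(k) = 1 - cdf(k) = P(X > k)
p_more_than_3 = binom.sf(3, n, p)

print(via_cdf, via_pmf)
print(p_more_than_3, 1 - binom.cdf(3, n, p))
```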


Added by cosmicsea on Sat, 12 Feb 2022 00:28:18 +0200