Statistical Distributions with NumPy (binomial, multinomial, poisson, gamma)

Why distributions matter

Sampling from the right probability distribution is essential for bootstrapping, A/B testing, Monte Carlo simulation, synthetic data, and stress testing. NumPy’s modern Generator API provides fast, reliable draws with clear parameterization and reproducible streams.

Setup & seeding (modern API)

import numpy as np
rng = np.random.default_rng(42)   # PCG64 by default; reproducible results

Binomial: counts of successes

When to use: Number of successes in n independent Bernoulli trials with success probability p (e.g., email opens). Parameters: (n, p).

# 1000 users, each sees 10 trials (e.g., 10 offers), p=0.2 success
samples = rng.binomial(n=10, p=0.2, size=1000)
print(samples[:10], samples.mean(), samples.var())

# Single-trial Bernoulli is binomial with n=1
bernoulli = rng.binomial(1, 0.3, size=20)

Multinomial: categorical outcomes

When to use: Counts across multiple categories for n trials with class probabilities p=[p1,...,pk] (e.g., traffic share across channels).

# 100 trials split across 3 classes with probs 0.2, 0.5, 0.3
counts = rng.multinomial(n=100, pvals=[0.2, 0.5, 0.3], size=5)
print(counts)
print(counts.sum(axis=1))  # each row sums to 100

# One-hot samples: draw 1 trial at a time and stack
onehots = rng.multinomial(1, [0.7, 0.2, 0.1], size=10)

Poisson: event counts over time/space

When to use: Number of events occurring in a fixed interval with average rate λ (e.g., calls per minute, defects per batch).

# Average 3.5 events per interval; sample 1000 intervals
counts = rng.poisson(lam=3.5, size=1000)
print(counts.mean(), counts.var())  # mean≈var≈λ for Poisson

Gamma: waiting times / positive continuous

When to use: Positive, continuous variables like waiting times; sum of exponentials. Common parameterizations: shape=k and scale=θ (mean = k·θ).

# Service times with shape=2.0 and scale=1.5
times = rng.gamma(shape=2.0, scale=1.5, size=10000)
print(times.mean(), times.std())

Sampling shapes & broadcasting

Most distribution methods accept size as tuple. Use broadcasting to simulate arrays of experiments efficiently.

# 100 experiments × 30 trials each from binomial
exp_binom = rng.binomial(n=30, p=0.4, size=(100,))

# 100 experiments × 7 days (Poisson daily counts)
weekly = rng.poisson(lam=2.2, size=(100,7))
print(weekly.shape)

Estimating parameters from samples

Quick sanity-checks by method-of-moments:

# Poisson: E[X]=Var[X]=λ
x = rng.poisson(4.0, size=5000)
print(x.mean(), x.var())

# Gamma(k, θ): mean = kθ, var = kθ^2
g = rng.gamma(2.5, 1.2, size=5000)
print(g.mean(), g.var())

Practical mini-cases

A/B test clicks: Binomial draws for control vs variant; compare conversion means and CIs.
Support load: Poisson process for calls per hour; plan staffing by quantiles.
Category forecasts: Multinomial samples for demand split across regions.
Queue times: Gamma samples to stress-test SLAs.

Working alongside your data

# Example: simulate extra demand noise around observed daily means
observed = np.array([10, 12, 9, 14, 11, 13, 8])   # 7 days
lam = observed.mean()                              # crude baseline
scenarios = rng.poisson(lam=lam, size=(1000, 7))   # 1000 weekly scenarios
p95 = np.percentile(scenarios.sum(axis=1), 95)
print('95th percentile of weekly demand:', p95)

Tips & pitfalls

Prefer np.random.default_rng() over legacy np.random.* to avoid global state.
Check parameterization: Gamma uses scale (θ), not rate.
For multinomial, pvals must sum to 1 (numerical tolerance applies).
Vectorize: draw in bulk with size=(...) instead of Python loops.

Practice: quick exercises

import numpy as np
rng = np.random.default_rng(123)

# 1) Binomial power check: simulate 10k experiments of 100 trials with p=0.55
exp = rng.binomial(100, 0.55, size=10_000)
print('Mean successes:', exp.mean())

# 2) Multinomial sanity: verify each row sums to n
M = rng.multinomial(50, [0.1, 0.2, 0.7], size=5)
print(M, M.sum(axis=1))

# 3) Poisson queueing: probability daily calls exceed 8 when λ=5
calls = rng.poisson(5, size=100_000)
print('P(X>8) ~', (calls > 8).mean())

# 4) Gamma fit intuition: empirical mean/var vs kθ and kθ^2
g = rng.gamma(3.0, 2.0, size=50_000)
print(g.mean(), g.var())

Download the above full source code from Github or run the code in your Google colab platform.

Statistical Distributions
https://github.com/plus2net/numpy/blob/main/numpy_16_statistical_distributions.ipynb

Numpy Random Generator rand() randint() randn() random_sample()

Pandas Python - Tutorials »

Subhendu Mohapatra

Author

🎥 Join me live on YouTube

Passionate about coding and teaching, I publish practical tutorials on PHP, Python, JavaScript, SQL, and web development. My goal is to make learning simple, engaging, and project‑oriented with real examples and source code.

Subscribe to our YouTube Channel here