Sampling from the right probability distribution is essential for bootstrapping, A/B testing, Monte Carlo simulation, synthetic data, and stress testing. NumPy’s modern Generator
API provides fast, reliable draws with clear parameterization and reproducible streams.
import numpy as np
rng = np.random.default_rng(42) # PCG64 by default; reproducible results
When to use: Number of successes in n
independent Bernoulli trials with success probability p
(e.g., email opens). Parameters: (n, p)
.
# 1000 users, each sees 10 trials (e.g., 10 offers), p=0.2 success
samples = rng.binomial(n=10, p=0.2, size=1000)
print(samples[:10], samples.mean(), samples.var())
# Single-trial Bernoulli is binomial with n=1
bernoulli = rng.binomial(1, 0.3, size=20)
When to use: Counts across multiple categories for n
trials with class probabilities p=[p1,...,pk]
(e.g., traffic share across channels).
# 100 trials split across 3 classes with probs 0.2, 0.5, 0.3
counts = rng.multinomial(n=100, pvals=[0.2, 0.5, 0.3], size=5)
print(counts)
print(counts.sum(axis=1)) # each row sums to 100
# One-hot samples: draw 1 trial at a time and stack
onehots = rng.multinomial(1, [0.7, 0.2, 0.1], size=10)
When to use: Number of events occurring in a fixed interval with average rate λ
(e.g., calls per minute, defects per batch).
# Average 3.5 events per interval; sample 1000 intervals
counts = rng.poisson(lam=3.5, size=1000)
print(counts.mean(), counts.var()) # mean≈var≈λ for Poisson
When to use: Positive, continuous variables like waiting times; sum of exponentials. Common parameterizations: shape=k
and scale=θ
(mean = k·θ).
# Service times with shape=2.0 and scale=1.5
times = rng.gamma(shape=2.0, scale=1.5, size=10000)
print(times.mean(), times.std())
Most distribution methods accept size
as tuple. Use broadcasting to simulate arrays of experiments efficiently.
# 100 experiments × 30 trials each from binomial
exp_binom = rng.binomial(n=30, p=0.4, size=(100,))
# 100 experiments × 7 days (Poisson daily counts)
weekly = rng.poisson(lam=2.2, size=(100,7))
print(weekly.shape)
Quick sanity-checks by method-of-moments:
# Poisson: E[X]=Var[X]=λ
x = rng.poisson(4.0, size=5000)
print(x.mean(), x.var())
# Gamma(k, θ): mean = kθ, var = kθ^2
g = rng.gamma(2.5, 1.2, size=5000)
print(g.mean(), g.var())
# Example: simulate extra demand noise around observed daily means
observed = np.array([10, 12, 9, 14, 11, 13, 8]) # 7 days
lam = observed.mean() # crude baseline
scenarios = rng.poisson(lam=lam, size=(1000, 7)) # 1000 weekly scenarios
p95 = np.percentile(scenarios.sum(axis=1), 95)
print('95th percentile of weekly demand:', p95)
np.random.default_rng()
over legacy np.random.*
to avoid global state.pvals
must sum to 1 (numerical tolerance applies).size=(...)
instead of Python loops.import numpy as np
rng = np.random.default_rng(123)
# 1) Binomial power check: simulate 10k experiments of 100 trials with p=0.55
exp = rng.binomial(100, 0.55, size=10_000)
print('Mean successes:', exp.mean())
# 2) Multinomial sanity: verify each row sums to n
M = rng.multinomial(50, [0.1, 0.2, 0.7], size=5)
print(M, M.sum(axis=1))
# 3) Poisson queueing: probability daily calls exceed 8 when λ=5
calls = rng.poisson(5, size=100_000)
print('P(X>8) ~', (calls > 8).mean())
# 4) Gamma fit intuition: empirical mean/var vs kθ and kθ^2
g = rng.gamma(3.0, 2.0, size=50_000)
print(g.mean(), g.var())
Author
🎥 Join me live on YouTubePassionate about coding and teaching, I publish practical tutorials on PHP, Python, JavaScript, SQL, and web development. My goal is to make learning simple, engaging, and project‑oriented with real examples and source code.