NumPy Random Generator (PCG64) & Reproducibility: Seeding, Streams, Best Practices

Generator vs legacy `np.random.*`

Modern NumPy splits random number generation into two layers: a BitGenerator (produces raw random bits, e.g., PCG64) and a Generator (the user-facing API: normal, integers, choice, etc.). Prefer np.random.default_rng() over the legacy global functions such as np.random.seed and np.random.rand, which share hidden global state that any imported module can modify.

import numpy as np

# Modern API (recommended)
rng = np.random.default_rng(seed=42)  # PCG64 by default
print(rng.integers(0, 10, size=5))
print(rng.normal(0, 1, size=3))

# Legacy (still works, not recommended for new code)
# np.random.seed(42)
# np.random.randint(0, 10, 5)
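
To make the "hidden global state" point concrete, here is a minimal sketch of passing a Generator into a function explicitly. The helper name noisy_mean is hypothetical and used only for illustration.

```python
import numpy as np

def noisy_mean(values, rng):
    """Mean of `values` after adding standard-normal noise drawn from `rng`."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values + rng.standard_normal(values.shape)))

data = [1.0, 2.0, 3.0]

# Two generators built from the same seed give the same result.
result_a = noisy_mean(data, np.random.default_rng(42))
result_b = noisy_mean(data, np.random.default_rng(42))

# Draws from an unrelated generator do not disturb rng_main,
# because each Generator carries its own state.
rng_main = np.random.default_rng(42)
rng_other = np.random.default_rng(0)
rng_other.random(1000)
result_c = noisy_mean(data, rng_main)

print(result_a == result_b == result_c)
```

Because the generator is an argument, a caller controls reproducibility without touching any global seed.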

Choose a BitGenerator (PCG64 default)

PCG64 is the BitGenerator used by default_rng(), offering excellent statistical quality and speed. NumPy also ships Philox, SFC64, and MT19937 (the Mersenne Twister used by the legacy API), but PCG64 is a solid choice for most work.

from numpy.random import Generator, PCG64, Philox

rng_pcg = Generator(PCG64(12345))
rng_philox = Generator(Philox(2025))

print(rng_pcg.integers(10, size=4))
print(rng_philox.uniform(size=3))
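
The following sketch checks which BitGenerator sits behind default_rng() and shows that the Generator API stays the same when the BitGenerator is swapped. It assumes a current NumPy release in which default_rng(seed) is equivalent to Generator(PCG64(seed)).

```python
import numpy as np
from numpy.random import Generator, PCG64, Philox, SFC64

rng_default = np.random.default_rng(12345)
default_name = type(rng_default.bit_generator).__name__
print(default_name)

# An explicit PCG64 with the same seed should follow the same stream.
rng_explicit = Generator(PCG64(12345))
same_stream = np.array_equal(
    rng_default.integers(0, 100, size=5),
    rng_explicit.integers(0, 100, size=5),
)
print(same_stream)

# Different BitGenerators, identical Generator methods.
for bitgen_cls in (PCG64, Philox, SFC64):
    rng = Generator(bitgen_cls(12345))
    print(bitgen_cls.__name__, rng.random(2))
```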

Reproducibility with `SeedSequence`

SeedSequence mixes an initial seed into well-distributed starting state and can spawn many independent child seeds, which makes it well suited to creating multiple reproducible streams (e.g., for parallel workers).

from numpy.random import SeedSequence, PCG64, Generator

ss = SeedSequence(2025)
# Spawn 3 independent child sequences
child_ss = ss.spawn(3)
streams = [Generator(PCG64(s)) for s in child_ss]

# Each stream is independent and reproducible
for i, rng in enumerate(streams, 1):
    print('stream', i, rng.integers(0, 100, size=3))
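
As a quick check of the reproducibility claim, the sketch below rebuilds the streams from the same root seed and compares the draws. The helper make_streams is a hypothetical name. It also shows that calling spawn again on the same SeedSequence object continues with new children rather than repeating the earlier ones.

```python
import numpy as np
from numpy.random import SeedSequence, PCG64, Generator

def make_streams(seed, n):
    """Build n generators from a fresh SeedSequence(seed)."""
    return [Generator(PCG64(child)) for child in SeedSequence(seed).spawn(n)]

run1 = [rng.integers(0, 100, size=3) for rng in make_streams(2025, 3)]
run2 = [rng.integers(0, 100, size=3) for rng in make_streams(2025, 3)]
reproducible = all(np.array_equal(a, b) for a, b in zip(run1, run2))
print(reproducible)

# Each child is identified by its spawn_key under the root seed.
ss = SeedSequence(2025)
first_batch = [child.spawn_key for child in ss.spawn(2)]
second_batch = [child.spawn_key for child in ss.spawn(2)]
print(first_batch, second_batch)
```

To reproduce a run, recreate the root SeedSequence and spawn the same number of children in the same order.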

Parallel streams: safe patterns

Avoid sharing a single RNG across threads or processes. Instead, create one Generator per worker from SeedSequence.spawn. This makes overlapping streams extremely unlikely and avoids contention on shared state.

# Example sketch (no actual multiprocessing shown)
from numpy.random import SeedSequence, PCG64, Generator

base = SeedSequence(123)
children = base.spawn(4)                        # one child seed per worker
rngs = [Generator(PCG64(s)) for s in children]

# worker k uses rngs[k] exclusively
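
The pattern can be tried with real workers. This is a minimal sketch using a thread pool from the standard library; the function names worker and run are hypothetical, and threads are used only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from numpy.random import SeedSequence, PCG64, Generator

def worker(child_seed, n_draws=10_000):
    """Each worker builds its own Generator from its own child seed."""
    rng = Generator(PCG64(child_seed))
    return float(rng.standard_normal(n_draws).mean())

def run(seed, n_workers=4):
    children = SeedSequence(seed).spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in input order, so output k belongs to child k
        return list(pool.map(worker, children))

means_1 = run(123)
means_2 = run(123)
print(means_1 == means_2)
```

The same structure should carry over to a process pool, with the child seeds passed to the workers and the pool created under an `if __name__ == "__main__":` guard.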

Saving & restoring RNG state

You can serialize either the state (exact position in the sequence) or the seed (to regenerate from start). State-based replay resumes mid-stream; seed-based replay starts from the beginning.

import numpy as np

rng = np.random.default_rng(7)
state = rng.bit_generator.state            # dict snapshot, taken before any draws

# Draw after the snapshot
a = rng.standard_normal(5)

# Restore the snapshot into a fresh generator and repeat the same draws
rng2 = np.random.default_rng()
rng2.bit_generator.state = state
b = rng2.standard_normal(5)

print(np.allclose(a, b))  # True (b replays the draws made after the snapshot)
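
The difference between seed-based and state-based replay is easier to see side by side. In this sketch the variable names are illustrative only.

```python
import numpy as np

SEED = 7
rng = np.random.default_rng(SEED)
first = rng.random(3)
checkpoint = rng.bit_generator.state   # snapshot taken mid-stream
second = rng.random(3)

# Seed-based replay: starts again from the beginning of the stream.
from_seed = np.random.default_rng(SEED)
seed_replays_first = np.array_equal(from_seed.random(3), first)

# State-based replay: resumes exactly where the snapshot was taken.
from_state = np.random.default_rng()
from_state.bit_generator.state = checkpoint
state_replays_second = np.array_equal(from_state.random(3), second)

print(seed_replays_first, state_replays_second)
```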

Distributions in `Generator`

Common methods include integers, random, choice, normal, lognormal, poisson, gamma, beta, and binomial. Most accept a size argument that sets the shape of the output.

import numpy as np

rng = np.random.default_rng(101)
print(rng.choice(['A','B','C'], size=5, replace=True, p=[0.2,0.5,0.3]))
print(rng.normal(loc=10, scale=2, size=(2,3)))
print(rng.poisson(lam=3.5, size=4))
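
A few more Generator methods and options that come up often are sketched below: an inclusive upper bound for integers, sampling without replacement, and shuffling.

```python
import numpy as np

rng = np.random.default_rng(101)

# integers excludes the upper bound unless endpoint=True is passed
die = rng.integers(1, 6, size=10, endpoint=True)

# choice with replace=False never repeats an item
unique_pick = rng.choice(10, size=4, replace=False)

# permutation returns a shuffled copy; shuffle works in place
deck = np.arange(5)
shuffled_copy = rng.permutation(deck)
rng.shuffle(deck)

# random draws floats from the half-open interval [0, 1)
unit = rng.random(3)

print(die, unique_pick, shuffled_copy, deck, unit)
```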

Replacing legacy calls

np.random.rand(n) → rng.random(n)
np.random.randn(n) → rng.standard_normal(n) or rng.normal(size=n)
np.random.randint(a, b) → rng.integers(a, b) (b is excluded in both; pass endpoint=True to include it)
np.random.choice(...) → rng.choice(...)
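
Applied to a toy script, the mapping above might look like the following sketch. The function names simulate_legacy and simulate are hypothetical. Note that the two versions do not return the same numbers even for the same seed, since they sit on different BitGenerators.

```python
import numpy as np

def simulate_legacy(n, seed):
    """Legacy style: seeds and reads the hidden global state."""
    np.random.seed(seed)
    u = np.random.rand(n)
    z = np.random.randn(n)
    k = np.random.randint(0, 10, n)
    return u, z, k

def simulate(n, rng):
    """Generator style: the caller supplies the RNG explicitly."""
    u = rng.random(n)
    z = rng.standard_normal(n)
    k = rng.integers(0, 10, size=n)
    return u, z, k

u, z, k = simulate(5, np.random.default_rng(42))
u2, z2, k2 = simulate(5, np.random.default_rng(42))
print(np.array_equal(u, u2), np.array_equal(z, z2), np.array_equal(k, k2))
```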

Good practices for reproducible science

Use Generator instances and avoid the global legacy state.
Record your initial seed; if you did not pass one, record SeedSequence.entropy so the run can be recreated.
For parallel jobs, create per-worker RNGs via SeedSequence.spawn.
Save RNG state if you need to resume exactly mid-stream.
Note the NumPy version in your experiments: a seeded BitGenerator produces the same raw stream across sessions, but the values returned by Generator distribution methods may change between NumPy releases.
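
When no seed is chosen up front, the entropy drawn by SeedSequence can still be logged and reused. A minimal sketch, where the run_record dict is a hypothetical stand-in for whatever experiment log you keep:

```python
import numpy as np
from numpy.random import SeedSequence, PCG64, Generator

ss = SeedSequence()                      # no seed given: entropy comes from the OS
run_record = {"entropy": ss.entropy, "numpy_version": np.__version__}

rng = Generator(PCG64(ss))
original = rng.random(3)

# Later: rebuild the same stream from the recorded entropy.
replay_rng = Generator(PCG64(SeedSequence(run_record["entropy"])))
replayed = replay_rng.random(3)

print(np.array_equal(original, replayed))
```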

File I/O: storing seeds & states

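Keep experiment seeds in config files, or store bit_generator.state as JSON (for PCG64 the state is a dict of strings and Python integers, so the json module handles it directly). For bulk arrays, save with .npy/.npz (see the File I/O page).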

import json
import numpy as np

rng = np.random.default_rng(2026)
state = rng.bit_generator.state
with open('rng_state.json', 'w') as f:
    json.dump(state, f)

# Later:
with open('rng_state.json') as f:
    state2 = json.load(f)
rng2 = np.random.default_rng()
rng2.bit_generator.state = state2
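
To confirm that a JSON round trip really preserves the stream position, the sketch below writes a checkpoint to a temporary directory and compares the next draws. The file name and the layout of the checkpoint dict are assumptions made for this example.

```python
import json
import os
import tempfile
import numpy as np

rng = np.random.default_rng(2026)
rng.random(10)                           # advance the stream a little

checkpoint = {
    "seed": 2026,
    "numpy_version": np.__version__,
    "state": rng.bit_generator.state,
}
path = os.path.join(tempfile.mkdtemp(), "rng_checkpoint.json")
with open(path, "w") as f:
    json.dump(checkpoint, f)

expected = rng.random(3)                 # what the original stream produces next

with open(path) as f:
    loaded = json.load(f)
restored = np.random.default_rng()
restored.bit_generator.state = loaded["state"]
resumed = restored.random(3)

print(np.array_equal(expected, resumed))
```

Storing the seed and NumPy version next to the state keeps enough context to rerun from the start as well as to resume.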

Quality & speed notes

PCG64: excellent statistical properties; a fast general-purpose choice.
Philox: good for parallel workloads and counter-based usage.
Vectorize draws (size=(...)) instead of looping in Python.
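
The cost of looping can be measured directly. This sketch times a Python loop of scalar draws against one vectorized call; the timings depend on your machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
N = 20_000

def draw_loop():
    # one Python-level call per sample
    return np.array([rng.standard_normal() for _ in range(N)])

def draw_vectorized():
    # a single call that fills the whole array
    return rng.standard_normal(N)

t_loop = timeit.timeit(draw_loop, number=3)
t_vec = timeit.timeit(draw_vectorized, number=3)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```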

Practice: quick exercises

# 1) Create three independent RNG streams using SeedSequence.spawn and draw 2 Poisson numbers from each.
import numpy as np
from numpy.random import SeedSequence, PCG64, Generator
ss = SeedSequence(1234)
streams = [Generator(PCG64(s)) for s in ss.spawn(3)]
for i, r in enumerate(streams, 1):
    print(i, r.poisson(2.5, size=2))

# 2) Save RNG state after 5 draws, restore it, and confirm the next 5 draws match
rng = np.random.default_rng(77)
first = rng.normal(size=5)
st = rng.bit_generator.state
cont1 = rng.normal(size=5)

rng2 = np.random.default_rng()
rng2.bit_generator.state = st
cont2 = rng2.normal(size=5)
print(np.allclose(cont1, cont2))

# 3) Replace legacy np.random calls with Generator equivalents in a toy script.

# 4) Simulate a biased 6-sided die with probabilities and verify empirical frequencies.
rng = np.random.default_rng(9)
p = np.array([0.10, 0.15, 0.20, 0.25, 0.20, 0.10])
rolls = rng.choice(np.arange(1,7), size=10_000, p=p)
print(np.round(np.bincount(rolls, minlength=7)[1:] / rolls.size, 3))

Download the full source code above from GitHub, or run it in Google Colab.

Random Generator
https://github.com/plus2net/numpy/blob/main/numpy_9_random_generator.ipynb


Subhendu Mohapatra

Author
