NaN (Not-a-Number) represents missing or invalid floating-point values. Note: use np.isnan, np.isfinite, and np.isinf to test values, and use the np.nan* functions to ignore NaNs in statistics.

import numpy as np
x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])
print(np.isnan(x)) # [False True False False False]
print(np.isfinite(x)) # [ True False False False True]
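A short sketch of why a dedicated test function is needed: NaN never compares equal to anything, including itself, so an equality check silently finds nothing. The variable names here are illustrative only.

```python
import numpy as np

x = np.array([1.0, np.nan, 5.0])

# NaN is not equal to anything, not even to itself
print(np.nan == np.nan)        # False

# An equality comparison therefore never locates the NaN entries
eq_mask = (x == np.nan)

# np.isnan is the reliable elementwise test
nan_mask = np.isnan(x)

print(eq_mask.any())           # False
print(nan_mask.sum())          # 1
```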
Use the nan* family to skip NaNs automatically.
arr = np.array([1.0, np.nan, 3.0, 4.0])
print(np.nanmean(arr)) # 2.666...
print(np.nanmedian(arr)) # 3.0
print(np.nanstd(arr)) # ignores NaNs
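For contrast, here is a minimal sketch of what the regular reductions do with the same data: a single NaN propagates through np.mean, while the nan* variants compute over the remaining values.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, 4.0])

plain = np.mean(arr)        # NaN propagates through the ordinary reduction
skipped = np.nanmean(arr)   # mean of 1.0, 3.0 and 4.0 only
total = np.nansum(arr)      # NaN is treated as zero in the sum

print(plain)    # nan
print(skipped)
print(total)    # 8.0
```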
Common strategies: fill with a constant, fill with column statistics, or drop the affected rows.
A = np.array([[1., np.nan, 3.],
[4., 5., np.nan],
[np.nan, 7., 9.]])
# Fill with column means
col_means = np.nanmean(A, axis=0)
inds = np.where(np.isnan(A))
A[inds] = np.take(col_means, inds[1])
# Drop rows with any NaN
B = np.array([[1.,2.],[np.nan,3.],[4.,np.nan],[5.,6.]])
mask_rows_no_nan = np.all(np.isfinite(B), axis=1)
clean = B[mask_rows_no_nan]
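The first strategy, filling with a constant, is not shown above. One way it could look, assuming 0.0 is an acceptable placeholder for your data (that choice is an assumption, not a recommendation):

```python
import numpy as np

A = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

# np.where builds a new array and leaves A untouched
filled_where = np.where(np.isnan(A), 0.0, A)

# np.nan_to_num does the same for NaN; note that by default it also
# replaces +/-inf with very large finite numbers
filled_n2n = np.nan_to_num(A, nan=0.0)

print(filled_where)
print(np.isnan(A).sum())   # 2, the original still contains its NaNs
```

np.where is the more explicit option when only NaN (and not infinity) should be replaced.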
Use astype to convert dtypes. Be mindful of precision loss and NaN handling.
# Integers can't store NaN; cast to float first:
z = np.array([1, 2, 3])
zf = z.astype('float64')
zf[1] = np.nan
# Downcasting can lose precision
f64 = np.array([1.123456789], dtype=np.float64)
f32 = f64.astype(np.float32)
print(f32) # rounded
# Safe casting check (raises on unsafe)
try:
    np.asarray([1.2, 3.4]).astype(np.int64, casting='safe')
except TypeError as e:
    print('Safe cast blocked:', e)
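If you would rather check before casting than catch the exception, np.can_cast answers the same question for a pair of dtypes. A small sketch follows; the particular dtype pairs are chosen only for illustration.

```python
import numpy as np

# float64 -> int64 would drop the fractional part, so it is not "safe"
float_to_int = np.can_cast(np.float64, np.int64, casting='safe')

# int32 -> float64 can represent every value exactly
int_to_float = np.can_cast(np.int32, np.float64, casting='safe')

# float64 -> float32 loses precision: not "safe", but allowed as "same_kind"
down_safe = np.can_cast(np.float64, np.float32, casting='safe')
down_same_kind = np.can_cast(np.float64, np.float32, casting='same_kind')

print(float_to_int, int_to_float, down_safe, down_same_kind)
```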
Mixed-type operations promote to a common dtype. Integers + floats become floats; operations with NaN produce float results.
a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.5, 1.5, np.nan], dtype=np.float32)
c = a + b
print(c.dtype) # float64: float32 cannot hold every int32 exactly, so the pair promotes to float64
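To see which dtype a mixed operation will produce without running it, np.result_type can be queried directly. The sketch below covers a few integer/float pairs; the result depends on whether the float type can represent every value of the integer type.

```python
import numpy as np

# int16 fits exactly in float32, so float32 is enough
r16 = np.result_type(np.int16, np.float32)

# int32 and int64 do not fit exactly in float32, so the result widens
r32 = np.result_type(np.int32, np.float32)
r64 = np.result_type(np.int64, np.float32)

print(r16, r32, r64)
```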
np.ma.MaskedArray lets you mask invalid entries without altering dtype, useful for integer data with missing values.
data = np.array([1, -999, 3, -999, 5]) # -999 means missing
m = np.ma.masked_equal(data, -999)
print(m.mean()) # masked value ignored
print(m.mask) # True where missing
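A brief sketch of two follow-up steps that often come next: counting the valid entries, and converting the masked integers to a float array with NaN in the masked slots. It reuses the -999 sentinel from above.

```python
import numpy as np

data = np.array([1, -999, 3, -999, 5])   # -999 means missing
m = np.ma.masked_equal(data, -999)

# The underlying dtype is still an integer type
print(m.dtype)

# Number of unmasked (valid) entries
valid = m.count()

# Cast to float first, then fill the masked slots with NaN
as_float = m.astype(np.float64).filled(np.nan)

print(valid)       # 3
print(as_float)
```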
Control floating-point warnings (divide by zero, invalid ops) with np.seterr or the np.errstate context manager.
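np.seterr(divide='warn', invalid='warn', over='ignore', under='ignore')
with np.errstate(divide='ignore', invalid='raise'):
    y = np.array([1.0, 0.0, np.nan])
    try:
        # 0.0 / 0.0 is an "invalid" operation and raises here;
        # 1.0 / y would not, since 1/0 counts as "divide" and 1/nan just propagates NaN
        out = 0.0 / y
    except FloatingPointError:
        print('Invalid operation trapped')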
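It can help to see which category each operation falls into. In the sketch below, a nonzero value divided by zero is reported under "divide" and yields inf, while zero divided by zero is reported under "invalid" and yields NaN. The one-element arrays are just for illustration.

```python
import numpy as np

one = np.array([1.0])
zero = np.array([0.0])

# Silence both categories and inspect the results
with np.errstate(divide='ignore', invalid='ignore'):
    inf_result = one / zero     # "divide" category
    nan_result = zero / zero    # "invalid" category

# Turn only the "divide" category into an exception
trapped = False
try:
    with np.errstate(divide='raise'):
        one / zero
except FloatingPointError:
    trapped = True

print(inf_result, nan_result, trapped)
```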
When reading CSV/TSV, declare missing values explicitly to avoid strings or wrong dtypes. See File I/O.
from io import StringIO
csv = StringIO("""a,b,c
1,,3
4,5,
,7,9
""")
arr = np.genfromtxt(csv, delimiter=',', skip_header=1)
print(arr) # NaNs where empty
print(arr.dtype) # float64 (genfromtxt's default dtype, which can hold NaN)
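When a file marks missing entries with a text token rather than an empty field, genfromtxt's missing_values and filling_values parameters let you state that explicitly. A minimal sketch, assuming the token is the string NA (the token and the sample data are made up for illustration):

```python
from io import StringIO
import numpy as np

# Hypothetical file content: "NA" and an empty field both mean "missing"
text = StringIO("a,b,c\n1,NA,3\n4,5,\n")

arr = np.genfromtxt(
    text,
    delimiter=',',
    skip_header=1,
    missing_values='NA',     # token to treat as missing
    filling_values=np.nan,   # value stored in its place
)

print(arr)
print(arr.dtype)
```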
# 1) Replace NaNs column-wise using medians
X = np.array([[1., np.nan, 3.],
[np.nan, 5., 6.],
[7., 8., np.nan]])
med = np.nanmedian(X, axis=0)
idx = np.where(np.isnan(X))
X[idx] = np.take(med, idx[1])
print(X)
# 2) Convert an int array to float, inject NaNs at odd indices
a = np.arange(10, dtype=np.int32)
b = a.astype(np.float64)
b[1::2] = np.nan
print(np.isnan(b).sum())
# 3) Use masked arrays to compute mean of integer data with -1 as missing
y = np.array([5, -1, 7, 9, -1, 2])
ym = np.ma.masked_equal(y, -1)
print(float(ym.mean()))
# 4) Demonstrate unsafe vs safe casting
f = np.array([1.9, 2.1])
print(f.astype(np.int64, casting='unsafe')) # truncation
try:
    print(f.astype(np.int64, casting='safe')) # should raise
except TypeError as e:
    print('Safe cast blocked:', e)