Handling NaNs & Type Casting Safely in NumPy

NaN, +Inf, -Inf basics

NaN (Not-a-Number) marks a missing or invalid floating-point value; +Inf and -Inf mark overflow or a nonzero number divided by zero. Note:

  • Integer arrays cannot hold NaN — cast to float first.
  • Use np.isnan, np.isfinite, np.isinf to test values.
  • Use np.nan* functions to ignore NaNs in stats.

import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])
print(np.isnan(x))    # [False  True False False False]
print(np.isfinite(x)) # [ True False False False  True]
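
The first two bullets above can be checked directly. The following is a minimal sketch (the variable names are illustrative only) showing why an equality test cannot find NaN and what happens when NaN is assigned into an integer array:

```python
import numpy as np

x = np.array([1.0, np.nan, 5.0])

# NaN never compares equal to anything, including itself,
# so an equality test finds nothing.
eq_hits = (x == np.nan)
nan_hits = np.isnan(x)
print(eq_hits.any())   # False
print(nan_hits)        # [False  True False]

# Assigning NaN into an integer array fails instead of storing it.
ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan
    assigned = True
except ValueError as e:
    assigned = False
    print('Cannot store NaN in an int array:', e)
```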

NaN-aware statistics

Use the nan* family to skip NaNs automatically.

arr = np.array([1.0, np.nan, 3.0, 4.0])
print(np.nanmean(arr))   # 2.666...
print(np.nanmedian(arr)) # 3.0
print(np.nanstd(arr))    # 1.247... (population std, ddof=0)
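
For contrast, here is a short sketch of what the plain reductions do with the same data, plus two more members of the nan* family. It assumes the same arr as above:

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, 4.0])

plain_mean = np.mean(arr)      # NaN propagates through the plain reduction
nan_sum = np.nansum(arr)       # 8.0, NaN is treated as zero
nan_max = np.nanmax(arr)       # 4.0
n_valid = np.count_nonzero(~np.isnan(arr))  # 3 usable values

print(plain_mean, nan_sum, nan_max, n_valid)
```

Counting the valid entries alongside the statistic is a cheap way to see how much data a result is actually based on.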

Filling or removing NaNs

Common strategies: fill with constant, with column statistics, or drop.

A = np.array([[1.,  np.nan, 3.],
              [4.,  5.,     np.nan],
              [np.nan, 7.,  9.]])

# Fill with column means
col_means = np.nanmean(A, axis=0)
inds = np.where(np.isnan(A))
A[inds] = np.take(col_means, inds[1])

# Drop rows with any NaN
B = np.array([[1.,2.],[np.nan,3.],[4.,np.nan],[5.,6.]])
mask_rows_no_nan = ~np.isnan(B).any(axis=1)   # np.isfinite would also drop rows containing Inf
clean = B[mask_rows_no_nan]
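
The constant-fill strategy mentioned above is not shown in the listing, so here is one possible sketch. The fill values 0.0 and -1.0 are arbitrary choices for illustration, and the nan= keyword of np.nan_to_num assumes NumPy 1.17 or later:

```python
import numpy as np

A = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0]])

# Fill with a constant; np.where returns a new array and leaves A untouched.
filled_zero = np.where(np.isnan(A), 0.0, A)

# np.nan_to_num does the same for NaN (nan= keyword: NumPy 1.17+).
filled_neg = np.nan_to_num(A, nan=-1.0)

# Drop columns (rather than rows) that contain any NaN.
keep_cols = ~np.isnan(A).any(axis=0)
A_cols = A[:, keep_cols]
print(A_cols.ravel())  # [1. 4. 7.]
```

Note that np.nan_to_num also replaces +Inf and -Inf with large finite numbers by default, which may or may not be wanted.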

Type casting safely with astype

Use astype to convert dtypes. Be mindful of precision loss and NaN handling.

# Integers can't store NaN; cast to float first:
z = np.array([1, 2, 3])
zf = z.astype('float64')
zf[1] = np.nan

# Downcasting can lose precision
f64 = np.array([1.123456789], dtype=np.float64)
f32 = f64.astype(np.float32)
print(f32)  # rounded

# Safe casting check (raises on unsafe)
try:
    np.asarray([1.2, 3.4]).astype(np.int64, casting='safe')
except TypeError as e:
    print('Safe cast blocked:', e)
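
casting='safe' rejects every float-to-int conversion, even when the values happen to be whole numbers. A sketch of a value-based check is below; to_int_checked is a hypothetical helper written for this example, not a NumPy function, and it does not check that values fit the target integer range:

```python
import numpy as np

def to_int_checked(values, dtype=np.int64):
    """Hypothetical helper: cast floats to ints only if nothing is lost."""
    values = np.asarray(values, dtype=np.float64)
    if not np.isfinite(values).all():
        raise ValueError('NaN or Inf cannot be represented as an integer')
    if not np.equal(np.mod(values, 1), 0).all():
        raise ValueError('fractional values would be truncated')
    return values.astype(dtype)

# np.can_cast answers the dtype-level question without touching data.
print(np.can_cast(np.int32, np.float64, casting='safe'))   # True
print(np.can_cast(np.float64, np.int64, casting='safe'))   # False

ok = to_int_checked([1.0, 2.0, 3.0])
print(ok, ok.dtype)   # [1 2 3] int64

try:
    to_int_checked([1.0, np.nan])
except ValueError as e:
    print('Rejected:', e)
```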

Upcasting & dtype promotion

Mixed-type operations promote to a common dtype that can represent both operands. Integers combined with floats become floats, and because NaN is itself a float, any result that contains it is a float array.

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.5, 1.5, np.nan], dtype=np.float32)
c = a + b
print(c.dtype)    # float64: int32 and float32 arrays promote to float64
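
The promoted dtype can be queried without running the operation. A minimal sketch using np.result_type and np.promote_types, with an explicit cast for the case where float32 output is wanted:

```python
import numpy as np

print(np.result_type(np.int32, np.float32))    # float64
print(np.promote_types(np.int16, np.float32))  # float32
print(np.promote_types(np.int64, np.float32))  # float64

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.5, 1.5, np.nan], dtype=np.float32)

# Cast explicitly if float32 output is required.
c32 = a.astype(np.float32) + b
print(c32.dtype)   # float32
```

The pattern is that the result must hold every value of both inputs: float32 can represent all int16 values exactly but not all int32 values, so the latter pairing moves up to float64.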

Masked arrays (when NaN isn't enough)

np.ma.MaskedArray lets you mask invalid entries without altering dtype, useful for integer data with missing values.

data = np.array([1, -999, 3, -999, 5])   # -999 means missing
m = np.ma.masked_equal(data, -999)
print(m.mean())                           # masked value ignored
print(m.mask)                             # True where missing
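
A few more masked-array operations that tend to be useful with sentinel-coded data, sketched with the same -999 convention as above. The fill value 0 is an arbitrary choice for the example:

```python
import numpy as np

data = np.array([1, -999, 3, -999, 5])
m = np.ma.masked_equal(data, -999)

print(m.count())        # 3 unmasked values
print(m.filled(0))      # [1 0 3 0 5]

# Convert to a plain float array with NaN at the masked positions.
as_float = m.astype(np.float64).filled(np.nan)
print(as_float)

# Going the other way: mask NaN and Inf in an existing float array.
f = np.array([1.0, np.nan, np.inf])
mf = np.ma.masked_invalid(f)
print(mf.mask)          # [False  True  True]
```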

Fine-tuning warnings & errors

Control floating-point warnings (divide by zero, invalid ops) with np.seterr or a context manager.

np.seterr(divide='warn', invalid='warn', over='ignore', under='ignore')

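with np.errstate(divide='ignore', invalid='raise'):
    num = np.array([1.0, 0.0])
    den = np.array([0.0, 0.0])
    try:
        # 1/0 gives inf (ignored here); 0/0 is an invalid operation and raises.
        # Dividing by an existing NaN just propagates it and does not raise.
        out = num / den
    except FloatingPointError:
        print('Invalid operation trapped')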
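
When the aim is to avoid the warning rather than trap it, one option is to skip the problematic elements altogether. The sketch below uses the where= and out= arguments of np.divide; the choice of NaN as the placeholder is an assumption for this example:

```python
import numpy as np

num = np.array([1.0, 2.0, 0.0])
den = np.array([2.0, 0.0, 0.0])

# Divide only where the denominator is nonzero; the other slots keep the
# NaN that out= was pre-filled with, so no warning is triggered.
ratio = np.divide(num, den, out=np.full_like(num, np.nan), where=den != 0)
print(ratio)
```

Passing out= matters here: without it, the skipped positions would hold uninitialized values.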

I/O gotchas with NaNs

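When reading CSV/TSV, handle missing values explicitly so that columns do not end up as strings or with the wrong dtype. With its default float dtype, np.genfromtxt turns empty fields into NaN. See File I/O.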

from io import StringIO

csv = StringIO("""a,b,c
1,,3
4,5,
,7,9
""")
arr = np.genfromtxt(csv, delimiter=',', skip_header=1)
print(arr)                 # NaNs where empty
print(arr.dtype)           # float64 (genfromtxt's default dtype)
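
If NaN is not the desired placeholder, np.genfromtxt can substitute a value at read time through its filling_values parameter. A minimal sketch, assuming -1.0 is an acceptable sentinel for this data:

```python
import numpy as np
from io import StringIO

text = "a,b,c\n1,,3\n4,5,\n,7,9\n"

# filling_values replaces empty fields while reading (assumed sentinel: -1.0).
filled = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1,
                       filling_values=-1.0)
print(filled)
print(np.isnan(filled).any())
```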

Practical patterns & tips

  • Choose float dtypes when you expect missing values.
  • Reserve int/bool for truly complete data; otherwise use masked arrays.
  • Be explicit with dtypes when saving/loading to preserve precision.
  • Use nan-functions for analytics; avoid manual masking unless necessary.
  • Check memory when upcasting large arrays (int32 → float64 doubles size).
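
The memory point in the last tip is easy to verify with nbytes. The sketch below also shows why float32 is not always a free way to save space; the array size and the test value are arbitrary choices for illustration:

```python
import numpy as np

counts = np.zeros(1_000_000, dtype=np.int32)
as_f64 = counts.astype(np.float64)
as_f32 = counts.astype(np.float32)

print(counts.nbytes)  # 4000000
print(as_f64.nbytes)  # 8000000
print(as_f32.nbytes)  # 4000000

# float32 has a 24-bit significand: integers above 2**24 may not round-trip.
big = np.array([2**24 + 1], dtype=np.int32)
round_trip_ok = bool(big.astype(np.float32).astype(np.int32)[0] == big[0])
print(round_trip_ok)  # False
```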

Practice: quick exercises

# 1) Replace NaNs column-wise using medians
X = np.array([[1., np.nan, 3.],
              [np.nan, 5., 6.],
              [7., 8., np.nan]])
med = np.nanmedian(X, axis=0)
idx = np.where(np.isnan(X))
X[idx] = np.take(med, idx[1])
print(X)

# 2) Convert an int array to float, inject NaNs at odd indices
a = np.arange(10, dtype=np.int32)
b = a.astype(np.float64)
b[1::2] = np.nan
print(np.isnan(b).sum())

# 3) Use masked arrays to compute mean of integer data with -1 as missing
y = np.array([5, -1, 7, 9, -1, 2])
ym = np.ma.masked_equal(y, -1)
print(float(ym.mean()))

# 4) Demonstrate unsafe vs safe casting
f = np.array([1.9, 2.1])
print(f.astype(np.int64, casting='unsafe'))   # truncation
try:
    print(f.astype(np.int64, casting='safe')) # should raise
except TypeError as e:
    print('Safe cast blocked:', e)
Subhendu Mohapatra — author at plus2net