Handling NaNs & Type Casting Safely in NumPy: Clean Data, Correct dtypes

NaN, +Inf, -Inf basics

NaN (Not-a-Number) represents missing/invalid floats. Note:

Integer arrays cannot hold NaN — cast to float first.
Use np.isnan, np.isfinite, np.isinf to test values.
Use np.nan* functions to ignore NaNs in stats.

import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])
print(np.isnan(x))    # [False  True False False False]
print(np.isfinite(x)) # [ True False False False  True]
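
A related point that is easy to miss: NaN never compares equal to anything, including itself, so an equality test cannot find it. The following is a minimal sketch (the variable names are illustrative, not from the original) contrasting == with np.isnan and showing what happens when NaN is assigned into an integer array.

```python
import numpy as np

x = np.array([1.0, np.nan, 5.0])

# NaN is not equal to itself, so == never finds it
eq_hits = (x == np.nan)        # all False
nan_hits = np.isnan(x)         # True only at the NaN position
print(eq_hits.any(), nan_hits.sum())

# Integer arrays cannot hold NaN: the assignment raises ValueError
ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan
    int_assign_failed = False
except ValueError as e:
    int_assign_failed = True
    print('Cannot store NaN in int array:', e)
```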

NaN-aware statistics

Use the nan* family to skip NaNs automatically.

arr = np.array([1.0, np.nan, 3.0, 4.0])
print(np.nanmean(arr))   # 2.666...
print(np.nanmedian(arr)) # 3.0
print(np.nanstd(arr))    # ignores NaNs
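
To see why the nan* variants matter, it can help to put them next to their plain counterparts. This small sketch reuses the array above; the variable names are only for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, 4.0])

plain_mean = np.mean(arr)      # nan: a single NaN poisons the result
safe_mean = np.nanmean(arr)    # mean of 1, 3, 4
safe_sum = np.nansum(arr)      # NaN treated as zero
safe_max = np.nanmax(arr)      # largest non-NaN value
valid_count = np.count_nonzero(~np.isnan(arr))  # how many values were used
print(plain_mean, safe_mean, safe_sum, safe_max, valid_count)
```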

Filling or removing NaNs

Common strategies: fill with constant, with column statistics, or drop.

A = np.array([[1.,  np.nan, 3.],
              [4.,  5.,     np.nan],
              [np.nan, 7.,  9.]])

# Fill with column means
col_means = np.nanmean(A, axis=0)
inds = np.where(np.isnan(A))
A[inds] = np.take(col_means, inds[1])

# Drop rows with any NaN (np.isfinite also rejects +Inf and -Inf)
B = np.array([[1.,2.],[np.nan,3.],[4.,np.nan],[5.,6.]])
mask_rows_no_nan = np.all(np.isfinite(B), axis=1)
clean = B[mask_rows_no_nan]
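
The constant-fill strategy mentioned above is not shown in the snippet, so here is one possible sketch of it. It assumes 0.0 is an acceptable fill value and uses arbitrary stand-ins (1e6 and -1e6) for the infinities; pick values that make sense for your data.

```python
import numpy as np

A = np.array([[1., np.nan, 3.],
              [np.inf, 5., -np.inf]])

# Replace NaN only, leaving infinities untouched
filled_where = np.where(np.isnan(A), 0.0, A)

# Replace NaN and both infinities in one call (fill values are arbitrary here)
filled_all = np.nan_to_num(A, nan=0.0, posinf=1e6, neginf=-1e6)

print(filled_where)
print(filled_all)
```

Both calls return new arrays, so A itself still contains its NaN and Inf entries.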

Type casting safely with `astype`

Use astype to convert dtypes. Be mindful of precision loss and NaN handling.

# Integers can't store NaN; cast to float first:
z = np.array([1, 2, 3])
zf = z.astype('float64')
zf[1] = np.nan

# Downcasting can lose precision
f64 = np.array([1.123456789], dtype=np.float64)
f32 = f64.astype(np.float32)
print(f32)  # rounded to about 7 significant digits

# Safe casting check (raises on unsafe)
try:
    np.asarray([1.2, 3.4]).astype(np.int64, casting='safe')
except TypeError as e:
    print('Safe cast blocked:', e)
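
casting='safe' works on dtypes, not values, so it rejects every float-to-int cast even when the data happen to be whole numbers. One way to handle that case is sketched below. to_int_checked is a hypothetical helper written for this example, not a NumPy function, and it does not check whether values fit the int64 range.

```python
import numpy as np

# Ask NumPy whether a dtype-to-dtype cast is lossless under the 'safe' rule
int_to_float_ok = np.can_cast(np.int32, np.float64)   # True
float_to_int_ok = np.can_cast(np.float64, np.int64)   # False
print(int_to_float_ok, float_to_int_ok)

def to_int_checked(values):
    """Hypothetical helper: cast floats to int64 only if no value is altered."""
    values = np.asarray(values, dtype=np.float64)
    if not np.all(np.isfinite(values)):
        raise ValueError('NaN or Inf cannot be represented as an integer')
    if not np.all(values == np.round(values)):
        raise ValueError('fractional values would be truncated')
    return values.astype(np.int64)

print(to_int_checked([1.0, 2.0, 3.0]))
```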

Upcasting & dtype promotion

Mixed-type operations promote to a common dtype. Integers + floats become floats; operations with NaN produce float results.

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.5, 1.5, np.nan], dtype=np.float32)
c = a + b
print(c.dtype)    # float64: int32 + float32 promotes to float64
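
You can ask NumPy for the promoted dtype without running the operation. The sketch below uses np.result_type on a few dtype pairs and shows one way to keep the result in float32 by casting the integer operand first; the name c32 is illustrative.

```python
import numpy as np

print(np.result_type(np.int32, np.float32))   # float64
print(np.result_type(np.int16, np.float32))   # float32
print(np.result_type(np.int64, np.float32))   # float64

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.5, 1.5, np.nan], dtype=np.float32)

# Casting the integer side explicitly keeps the result in float32
c32 = a.astype(np.float32) + b
print(c32.dtype)
```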

Masked arrays (when NaN isn't enough)

np.ma.MaskedArray lets you mask invalid entries without altering dtype, useful for integer data with missing values.

data = np.array([1, -999, 3, -999, 5])   # -999 means missing
m = np.ma.masked_equal(data, -999)
print(m.mean())                           # masked value ignored
print(m.mask)                             # True where missing
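
A few more masked-array operations are often useful alongside mean(). This sketch reuses the same data and shows that the dtype stays integer; the fill value 0 is just an example.

```python
import numpy as np

data = np.array([1, -999, 3, -999, 5])   # -999 means missing
m = np.ma.masked_equal(data, -999)

print(m.dtype.kind)     # 'i': still an integer array
print(m.count())        # number of unmasked entries
print(m.sum())          # sum over unmasked entries only
print(m.filled(0))      # plain ndarray with masked slots replaced by 0
print(m.compressed())   # plain ndarray of the unmasked values
```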

Fine-tuning warnings & errors

Control floating-point warnings (divide by zero, invalid ops) with np.seterr or a context manager.

np.seterr(divide='warn', invalid='warn', over='ignore', under='ignore')

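with np.errstate(divide='ignore', invalid='raise'):
    y = np.array([1.0, 0.0, np.nan])
    out = 1.0 / y          # 1/0 gives inf silently (divide ignored); 1/nan stays nan
    try:
        bad = y / y        # 0/0 is an invalid operation, so this raises
    except FloatingPointError:
        print('Invalid operation trapped')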
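
The practical difference between the two is scope: np.seterr changes the settings until you change them back, while np.errstate restores them when the block exits. A small sketch that checks this with np.geterr (variable names are illustrative):

```python
import numpy as np

before = np.geterr()                 # current settings as a dict

with np.errstate(divide='ignore'):
    inside = np.geterr()['divide']   # 'ignore' only inside the block
    out = np.array([1.0]) / np.array([0.0])   # inf, no warning emitted

after = np.geterr()                  # settings restored on exit
print(inside, out, before == after)
```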

I/O gotchas with NaNs

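When reading CSV/TSV, handle missing values explicitly so that empty fields do not end up as strings or force an unexpected dtype. With its default float dtype, np.genfromtxt turns empty fields into NaN. See File I/O.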

from io import StringIO

csv = StringIO("""a,b,c
1,,3
4,5,
,7,9
""")
arr = np.genfromtxt(csv, delimiter=',', skip_header=1)
print(arr)                 # NaNs where empty
print(arr.dtype)           # float64 (genfromtxt's default dtype, which can hold NaN)
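
If NaN is not what you want for empty fields, genfromtxt accepts a filling_values argument. The sketch below reads the same data from an inline string and assumes 0.0 is a sensible replacement, which will not be true for every dataset.

```python
from io import StringIO
import numpy as np

text = "a,b,c\n1,,3\n4,5,\n,7,9\n"

# Empty fields are filled with 0.0 instead of the default NaN
filled = np.genfromtxt(StringIO(text), delimiter=',', skip_header=1,
                       filling_values=0.0)
print(filled)
print(filled.dtype)
```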

Practical patterns & tips

Choose float dtypes when you expect missing values.
Reserve int/bool for truly complete data; otherwise use masked arrays.
Be explicit with dtypes when saving/loading to preserve precision.
Use nan-functions for analytics; avoid manual masking unless necessary.
Check memory when upcasting large arrays (int32 → float64 doubles size).
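
The memory tip can be checked directly with nbytes. This sketch uses an arbitrary array of one million elements and also shows why float32 is not always a drop-in way to save space: it cannot represent every int32 value exactly.

```python
import numpy as np

ints = np.zeros(1_000_000, dtype=np.int32)
as_f64 = ints.astype(np.float64)
as_f32 = ints.astype(np.float32)
print(ints.nbytes, as_f64.nbytes, as_f32.nbytes)  # 4000000 8000000 4000000

# float32 has a 24-bit significand, so integers above 2**24 may be rounded
big = np.array([2**24 + 1], dtype=np.int32)
round_trip = big.astype(np.float32).astype(np.int32)
print(round_trip[0] == big[0])   # False: the value changed on the way
```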

Practice: quick exercises

# 1) Replace NaNs column-wise using medians
X = np.array([[1., np.nan, 3.],
              [np.nan, 5., 6.],
              [7., 8., np.nan]])
med = np.nanmedian(X, axis=0)
idx = np.where(np.isnan(X))
X[idx] = np.take(med, idx[1])
print(X)

# 2) Convert an int array to float, inject NaNs at odd indices
a = np.arange(10, dtype=np.int32)
b = a.astype(np.float64)
b[1::2] = np.nan
print(np.isnan(b).sum())

# 3) Use masked arrays to compute mean of integer data with -1 as missing
y = np.array([5, -1, 7, 9, -1, 2])
ym = np.ma.masked_equal(y, -1)
print(float(ym.mean()))

# 4) Demonstrate unsafe vs safe casting
f = np.array([1.9, 2.1])
print(f.astype(np.int64, casting='unsafe'))   # truncation
try:
    print(f.astype(np.int64, casting='safe')) # should raise
except TypeError as e:
    print('Safe cast blocked:', e)

Download the full source code above from GitHub, or run it in Google Colab.

NaN and dtypes
https://github.com/plus2net/numpy/blob/main/numpy_10_nan_dtypes.ipynb


Subhendu Mohapatra

Author

