NumPy powers Pandas under the hood. Moving between ndarray and Pandas objects lets you combine fast array math with rich labeled analytics (indexes, groupby, joins). The key is to preserve dtypes, missing values, and alignment while avoiding unnecessary copies.
Use to_numpy() for a clean conversion; it respects dtypes better than the historical .values. For mixed dtypes, Pandas may upcast to a common dtype (often object).
import numpy as np
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3],
'b': [1.5, np.nan, 3.2],
'c': pd.Categorical(['x','y','x'])
})
A = df.to_numpy() # may become object if columns are mixed
print(A.dtype)
# Column subset with homogeneous types -> better numeric dtype
A_num = df[['a','b']].to_numpy(dtype='float64')
print(A_num.dtype, A_num.shape)
Wrap arrays with labels to enable alignment-aware operations.
X = np.arange(6).reshape(3,2)
df2 = pd.DataFrame(X, columns=['col1','col2'], index=['r1','r2','r3'])
s = pd.Series(np.array([10, 20, 30]), index=df2.index, name='bonus')
print(df2)
print(s)
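Once an array carries labels, element order stops mattering for arithmetic. The sketch below illustrates this with a hypothetical `bonus_shuffled` Series (not part of the example above) whose index is deliberately out of order; the addition still pairs values by label.

```python
import numpy as np
import pandas as pd

X = np.arange(6).reshape(3, 2)
df2 = pd.DataFrame(X, columns=['col1', 'col2'], index=['r1', 'r2', 'r3'])

# Hypothetical Series whose labels are in a different order than df2's rows
bonus_shuffled = pd.Series(np.array([30, 10, 20]),
                           index=['r3', 'r1', 'r2'], name='bonus')

# Addition matches values on labels, not on positions
total = df2['col1'] + bonus_shuffled
print(total)
```

If you had added the raw arrays instead, `r1` would have been paired with the value meant for `r3`.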
Pandas aligns on index/columns when combining objects; NumPy uses positional semantics. This is powerful but can surprise you.
dfA = pd.DataFrame({'x':[1,2,3]}, index=['a','b','c'])
dfB = pd.DataFrame({'x':[10,20,30]}, index=['b','c','d'])
# Pandas aligns by index labels
print((dfA + dfB))
# rows 'a' and 'd' become NaN due to misalignment
# NumPy would add positionally if you convert to arrays:
print(dfA.to_numpy() + dfB.to_numpy()) # shapes must match
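If the NaNs from misalignment are not what you want, two common remedies are sketched here: fill the missing side with a default, or restrict both frames to their shared labels before going positional. The names `filled`, `left`, `right`, and `shared_sum` are illustrative only.

```python
import pandas as pd

dfA = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
dfB = pd.DataFrame({'x': [10, 20, 30]}, index=['b', 'c', 'd'])

# Option 1: treat a label missing from one frame as 0 instead of producing NaN
filled = dfA.add(dfB, fill_value=0)

# Option 2: keep only the labels both frames share, then add positionally
left, right = dfA.align(dfB, join='inner')
shared_sum = left.to_numpy() + right.to_numpy()

print(filled)
print(shared_sum)
```

Aligning first makes the positional NumPy addition safe, because both arrays are guaranteed to describe the same rows in the same order.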
Pandas uses NA-friendly dtypes and masks; NumPy floats use NaN to represent missing values. Converting ints with missing values will often upcast to float or object.
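# A plain list [1, None, 3] would be inferred as float64, so request Int64 explicitly
df = pd.DataFrame({'i': pd.array([1, None, 3], dtype='Int64'), 'f':[1.0, np.nan, 3.2]})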
print(df.dtypes) # Int64 (nullable) and float64
A = df.to_numpy()
print(A.dtype) # may be object to hold mixed types
# Strategy: convert subsets with a numeric dtype
A_float = df[['f']].to_numpy(dtype='float64') # preserves NaN
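For a nullable integer column, `to_numpy()` accepts an `na_value` argument that controls what the missing entries become. A minimal sketch, assuming a standalone Series named `ints` created just for this illustration:

```python
import numpy as np
import pandas as pd

# Nullable integer column: the gap is stored as pd.NA, not NaN
ints = pd.Series(pd.array([1, None, 3], dtype='Int64'), name='i')

# Export as float64, mapping pd.NA to NaN explicitly
as_float = ints.to_numpy(dtype='float64', na_value=np.nan)

# Or drop the gaps first and keep a true integer array
as_int = ints.dropna().to_numpy(dtype='int64')

print(as_float, as_int)
```

The first form keeps the row count intact at the cost of a float dtype; the second keeps integers but changes the length, so it only suits cases where the positions of the gaps do not matter.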
Run fast array math on DataFrame columns using .to_numpy() or direct broadcasting, then assign the result back as a new column.
df = pd.DataFrame({'x': np.arange(5), 'y': np.arange(5,10)})
xz = (df['x'].to_numpy() - df['x'].mean()) / df['x'].std()
df['x_z'] = xz
print(df.head())
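One detail to watch when mixing the two libraries for standardization: pandas `std()` defaults to the sample standard deviation (`ddof=1`), while NumPy `std()` defaults to the population version (`ddof=0`). The sketch below shows the difference on the same data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(5))

sample_std = x.std()                     # pandas default: ddof=1
population_std = x.to_numpy().std()      # NumPy default: ddof=0
matched_std = x.to_numpy().std(ddof=1)   # NumPy made to match pandas

print(sample_std, population_std, matched_std)
```

Pick one convention and pass `ddof` explicitly so z-scores computed in pandas and in NumPy agree.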
After NumPy computations, rebuild labeled data.
X = np.random.randn(4,3)
mu = X.mean(axis=0)
sd = X.std(axis=0)
stats = pd.DataFrame({'mean': mu, 'std': sd}, index=['c1','c2','c3'])
print(stats)
Pandas has rich dtypes like datetime64[ns], Timedelta, and Categorical. On conversion, these can become object or int64-backed arrays. Preserve intent by converting columns separately.
dates = pd.date_range('2025-01-01', periods=3, freq='D')
df = pd.DataFrame({'when': dates, 'val':[1,2,3]})
print(df['when'].dtype) # datetime64[ns]
dt64 = df['when'].to_numpy() # NumPy datetime64[ns]
print(dt64.dtype)
# Categorical -> use .cat.codes if you need numeric arrays
cats = pd.Categorical(['low','med','high'])
arr_codes = pd.Series(cats).cat.codes.to_numpy()
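The integer codes follow the order of the categories, which for strings defaults to lexical order unless you specify otherwise. A small sketch, assuming an explicit category list named `levels`, shows how to fix that order and how to recover the labels from the codes afterwards:

```python
import pandas as pd

levels = ['low', 'med', 'high']
cats = pd.Categorical(['low', 'high', 'med', 'low'],
                      categories=levels, ordered=True)

# Small integer array: 0 = 'low', 1 = 'med', 2 = 'high'
codes = pd.Series(cats).cat.codes.to_numpy()

# Rebuild the labeled categorical from the codes after NumPy processing
restored = pd.Categorical.from_codes(codes, categories=levels, ordered=True)

print(codes, list(restored))
```

Keeping `levels` alongside the code array is what makes the round trip possible; the codes alone do not carry the labels.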
to_numpy() returns a NumPy view when possible (e.g., homogeneous numeric blocks) but may return a copy depending on dtypes/contiguity. Avoid accidental copies by operating columnwise and requesting explicit dtypes only when needed.
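Whether a given conversion is a view or a copy depends on the pandas version and on how the frame is stored internally, so it is safer to check than to assume. The sketch below uses `np.shares_memory` for the check and `copy=True` to force an independent array; `df_num` is a made-up frame for illustration.

```python
import numpy as np
import pandas as pd

df_num = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})

maybe_view = df_num.to_numpy()             # view or copy: pandas decides
forced_copy = df_num.to_numpy(copy=True)   # always a fresh array

# Mutating the forced copy never touches the DataFrame
forced_copy[0, 0] = -1.0
print(df_num.loc[0, 'a'])

# Version-dependent: True means two conversions share one buffer
print(np.shares_memory(maybe_view, df_num.to_numpy()))
```

Request `copy=True` whenever you plan to modify the array in place and the original DataFrame must stay unchanged.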
Use Pandas for alignment-heavy ops (join/merge/groupby). Use NumPy for numeric kernels, then bring results back with labels.
# Example: compute standardized column with NumPy, join back:
vals = df[['val']].to_numpy(dtype='float64').ravel()
z = (vals - vals.mean()) / vals.std()
df['z_val'] = z
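NumPy structured (record) arrays also convert cleanly: each named field becomes a DataFrame column.
rec = np.array([(1, 1.5), (2, 2.5)], dtype=[('id','i4'),('score','f8')])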
df_rec = pd.DataFrame.from_records(rec)
print(df_rec)
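The conversion works in the other direction too. As a brief sketch, `DataFrame.to_records` turns the frame back into a NumPy record array whose field names come from the column labels; `rec_back` is an illustrative name.

```python
import numpy as np
import pandas as pd

rec = np.array([(1, 1.5), (2, 2.5)], dtype=[('id', 'i4'), ('score', 'f8')])
df_rec = pd.DataFrame.from_records(rec)

# Back to a NumPy record array; index=False omits the row index field
rec_back = df_rec.to_records(index=False)

print(rec_back.dtype.names)
print(rec_back['score'])
```

Leaving `index=False` out would add the row index as an extra leading field, which is useful only when the index carries meaning.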
Pandas arithmetic methods (add, sub) dispatch to NumPy under the hood.
For integer columns with missing values, use a nullable dtype (Int64) or cast to float when exporting to NumPy.
import numpy as np
import pandas as pd
# 1) Convert a mixed-type DataFrame; keep numeric block as float64
df = pd.DataFrame({'a':[1,2,3], 'b':[1.1, np.nan, 3.3], 'c':['x','y','x']})
num = df[['a','b']].to_numpy(dtype='float64')
print(num.dtype, num.shape)
# 2) Create two frames with different indexes; show alignment and fix with reindex
A = pd.DataFrame({'v':[1,2,3]}, index=['r1','r2','r3'])
B = pd.DataFrame({'v':[10,20,30]}, index=['r2','r3','r4'])
print(A + B) # alignment -> NaNs
print((A.reindex(B.index) + B)) # aligned on B's index
# 3) Standardize each numeric column using NumPy and assign back
X = df[['a','b']]
arr = X.to_numpy(dtype='float64')
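# Column 'b' contains NaN, so use the NaN-aware reductions; plain mean/std would turn the whole column into NaN
Z = (arr - np.nanmean(arr, axis=0)) / np.nanstd(arr, axis=0)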
df[['a_z','b_z']] = Z
print(df.head())