dropna(): Remove rows or columns based on missing values
import pandas as pd
import numpy as np
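# Sample data with missing values (None and NaN) spread over several columns
my_dict = {'NAME': ['Ravi', 'Raju', None, None, 'King', 'Alex'],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [81, 70, 40, np.nan, np.nan, 30]}  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data=my_dict)
print(df)
print(df.dropna())  # remove every row that contains NaN or None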
Output: the original DataFrame, followed by the result of dropna()
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
3  None  NaN   NaN      NaN
4  King  5.0  82.0      NaN
5  Alex  6.0  30.0     30.0

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
5  Alex  6.0  30.0     30.0
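To see why rows 2, 3 and 4 disappear, the same result can be rebuilt by hand with a boolean mask. This is only a minimal sketch for illustration, not a description of how pandas implements dropna() internally; the names mask and by_hand are made up for this example.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', None, None, 'King', 'Alex'],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [81, 70, 40, np.nan, np.nan, 30]}
df = pd.DataFrame(data=my_dict)

# True for a row only when every column in that row holds a value
mask = df.notna().all(axis=1)
by_hand = df[mask]  # illustrative name, keeps rows 0, 1 and 5

print(mask)
print(by_hand.equals(df.dropna()))  # True
```

Rows 2 and 3 fail the mask because NAME is None, and row 4 fails because ENGLISH is NaN. One missing value is enough, which is the default how='any' behaviour.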
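With axis=1 and how='all', a column is dropped only when every value in it is missing. The DataFrame above has no such column, so nothing is removed.
print(df.dropna(axis=1, how='all'))  # nothing is removed
Now let us change the DataFrame so that one column, ENGLISH, holds only missing values.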
import pandas as pd
import numpy as np
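# ENGLISH now contains only missing values (a mix of NaN and None)
my_dict = {'NAME': ['Ravi', 'Raju', None, None, 'King', 'Alex'],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan, None, np.nan, np.nan, np.nan, None]}  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data=my_dict)
print(df)
print(df.dropna(axis=1, how='all'))  # remove a column if all of its values are missing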
Output of dropna(): the column 'ENGLISH' is dropped because all of its values are NaN (axis=1 selects columns)
   NAME   ID  MATH
0  Ravi  1.0  80.0
1  Raju  2.0  40.0
2  None  3.0  70.0
3  None  NaN   NaN
4  King  5.0  82.0
5  Alex  6.0  30.0
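Before dropping columns, it can help to check which ones are entirely missing. The sketch below uses isna().all() for that check and contrasts how='all' with how='any' on the same data; the names all_missing, dropped_all and dropped_any are illustrative only.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', None, None, 'King', 'Alex'],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan, None, np.nan, np.nan, np.nan, None]}
df = pd.DataFrame(data=my_dict)

# One boolean per column: True when every value in the column is missing
all_missing = df.isna().all()
print(all_missing)

# how='all' drops only the fully missing column
dropped_all = df.dropna(axis=1, how='all')
print(list(dropped_all.columns))  # ['NAME', 'ID', 'MATH']

# how='any' drops every column that has at least one missing value;
# here each column has one, so no columns remain
dropped_any = df.dropna(axis=1, how='any')
print(dropped_any.shape)  # (6, 0)
```

With this data how='any' is too aggressive for columns, since row 3 alone puts a missing value in every column.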
thresh
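Minimum number of non-NaN values a row (or column) must contain to be kept. With thresh=3, any row with fewer than three non-missing values is dropped.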
import pandas as pd
import numpy as np
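my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data=my_dict)
# Keep rows that have at least 3 non-NaN values.
# how is left out: pandas 2.0 and later raise a TypeError if how and thresh are both passed.
df = df.dropna(axis=0, thresh=3)
print(df)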
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
4  King  5.0  82.0      NaN
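The effect of thresh is easier to follow once the non-missing values in each row are counted. The sketch below uses count(axis=1) for that and filters on the count, which should give the same rows as dropna(thresh=3); the names counts and kept are illustrative only.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=my_dict)

# Number of non-missing values in each row
counts = df.count(axis=1)
print(list(counts))  # [3, 3, 3, 0, 3, 2]

# Keep rows with at least 3 non-missing values, as thresh=3 does
kept = df[counts >= 3]
print(kept.equals(df.dropna(thresh=3)))  # True
```

Row 3 has no values at all and row 5 has only two (ID and MATH), so both fall below the threshold of 3.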