dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Returns a new DataFrame with the missing values removed ( returns None if inplace=True ).
axis | 0 or 1. With 0 ( default ) rows are removed, with 1 columns are removed. |
how | Takes values 'any' or 'all'. Check examples below. |
thresh | int : Minimum number of non-NaN values required to keep the row or column. |
subset | Labels along the other axis to consider, for example column names when rows are being dropped. |
inplace | Boolean, default False. If True, the original ( source ) DataFrame is modified directly and dropna() returns None. |
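The return behaviour can be checked with a minimal sketch. The small DataFrame demo is made up for this illustration only and is not part of the examples that follow.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data, used only for this illustration
demo = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

cleaned = demo.dropna()             # default: a new DataFrame is returned
print(len(demo), len(cleaned))      # the source still has all of its rows

result = demo.dropna(inplace=True)  # the source itself is modified
print(result)                       # nothing is returned in this case
print(len(demo))
```

Because inplace=True returns None, writing df=df.dropna(inplace=True) would replace df with None, so use either the assignment or inplace, not both.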
Examples using options
Here is a DataFrame with NaN and None values.
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
'ID':[1,2,3,np.nan,5,6],
'MATH':[80,40,70,np.nan,82,30],
'ENGLISH':[81,70,40,np.nan,np.nan,30]}
df = pd.DataFrame(data=my_dict)
print(df)
Output is here
NAME ID MATH ENGLISH
0 Ravi 1.0 80.0 81.0
1 Raju 2.0 40.0 70.0
2 Alex 3.0 70.0 40.0
3 None NaN NaN NaN
4 King 5.0 82.0 NaN
5 None 6.0 30.0 30.0
how
any : a row is removed if at least one of its values is NaN
all : a row is removed only if all of its values are NaN
df=df.dropna(how='any')
print(df)
Rows 3, 4 and 5 are removed, as each of them contains at least one NaN or None value.
Output
NAME ID MATH ENGLISH
0 Ravi 1.0 80.0 81.0
1 Raju 2.0 40.0 70.0
2 Alex 3.0 70.0 40.0
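We will use how='all' on the original DataFrame ( create df again first, as the previous example has overwritten it ).
df=df.dropna(how='all')
print(df)
Only row 3 is dropped, as all of its values are NaN or None ( axis=0 is the default ). Output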
NAME ID MATH ENGLISH
0 Ravi 1.0 80.0 81.0
1 Raju 2.0 40.0 70.0
2 Alex 3.0 70.0 40.0
4 King 5.0 82.0 NaN
5 None 6.0 30.0 30.0
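One way to understand the two values of how is to rebuild the same result with boolean masks. This is only a sketch for illustration, using the sample DataFrame from above; the names any_mask and all_mask are made up here, and this is not how dropna() is implemented internally.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [81, 70, 40, np.nan, np.nan, 30]}
df = pd.DataFrame(data=my_dict)

# how='any' keeps only the rows in which every value is present
any_mask = df.notna().all(axis=1)
# how='all' keeps the rows in which at least one value is present
all_mask = df.notna().any(axis=1)

print(df[any_mask].equals(df.dropna(how='any')))  # True
print(df[all_mask].equals(df.dropna(how='all')))  # True
```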
Remove the row if a particular column has a Null value
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
'ID':[1,2,np.nan,4,5,6],
'MATH':[80,40,70,np.nan,82,30],
'ENGLISH':[81,70,40,np.nan,np.nan,30]}
df = pd.DataFrame(data=my_dict)
df=df.dropna(axis=0,subset=['ENGLISH'])
print(df)
Output
NAME ID MATH ENGLISH
0 Ravi 1.0 80.0 81.0
1 Raju 2.0 40.0 70.0
2 Alex NaN 70.0 40.0
5 None 6.0 30.0 30.0
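subset also accepts more than one label, and it can be combined with how. The sketch below reuses the same sample data; the variable names either and both are chosen here only for illustration.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, np.nan, 4, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [81, 70, 40, np.nan, np.nan, 30]}
df = pd.DataFrame(data=my_dict)

# Drop a row if ID or ENGLISH is missing (how='any' is the default)
either = df.dropna(subset=['ID', 'ENGLISH'])
print(either.index.tolist())  # [0, 1, 5]

# Drop a row only if ID and ENGLISH are both missing
both = df.dropna(subset=['ID', 'ENGLISH'], how='all')
print(both.index.tolist())    # [0, 1, 2, 3, 4, 5]
```

No row has both ID and ENGLISH missing, so the second call keeps every row. The missing NAME values play no role because NAME is not listed in subset.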
axis
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
'ID':[1,2,3,np.nan,5,6],
'MATH':[80,40,70,np.nan,82,30],
'ENGLISH':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=my_dict)
print(df)
Using how='any' with axis=1
df=df.dropna(how='any',axis=1)
print(df)
All columns are deleted, as each column contains at least one NaN or None value.
Output
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5]
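Using how='all' with axis=1 on the original DataFrame ( create df again first, as the previous example has removed all the columns )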
df=df.dropna(how='all',axis=1)
print(df)
Output : Only the column 'ENGLISH' is dropped, as all of its values are NaN.
NAME ID MATH
0 Ravi 1.0 80.0
1 Raju 2.0 40.0
2 Alex 3.0 70.0
3 None NaN NaN
4 King 5.0 82.0
5 None 6.0 30.0
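With axis=1 the roles of the other options are swapped: subset takes row labels, and thresh counts the non-NaN values in each column. The following is a small sketch under that reading, using the same sample data; the names by_rows and by_count are made up for illustration.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan] * 6}
df = pd.DataFrame(data=my_dict)

# Look only at rows 0 and 1 when deciding which columns to drop
by_rows = df.dropna(axis=1, subset=[0, 1])
print(by_rows.columns.tolist())   # ['NAME', 'ID', 'MATH']

# Keep only the columns that have at least 5 non-NaN values
by_count = df.dropna(axis=1, thresh=5)
print(by_count.columns.tolist())  # ['ID', 'MATH']
```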
thresh
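Minimum number of non-NaN values required to keep the row. Here rows with fewer than 3 non-NaN values are dropped. Do not pass how along with thresh, recent versions of pandas raise a TypeError if both are set.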
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
'ID':[1,2,3,np.nan,5,6],
'MATH':[80,40,70,np.nan,82,30],
'ENGLISH':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=my_dict)
df=df.dropna(axis=0,thresh=3)
print(df)
Output
NAME ID MATH ENGLISH
0 Ravi 1.0 80.0 NaN
1 Raju 2.0 40.0 NaN
2 Alex 3.0 70.0 NaN
4 King 5.0 82.0 NaN
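To see why rows 3 and 5 are dropped, we can count the non-NaN values in each row ourselves. This sketch uses the same sample data; the names non_missing and kept are chosen here for illustration.

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan] * 6}
df = pd.DataFrame(data=my_dict)

# Number of non-NaN values in each row
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())  # [3, 3, 3, 0, 3, 2]

# Keeping the rows with a count of 3 or more matches thresh=3
kept = df[non_missing >= 3]
print(kept.index.tolist())   # [0, 1, 2, 4]
print(kept.equals(df.dropna(thresh=3)))
```

Row 3 has no values at all and row 5 has only 2 ( ID and MATH ), so both fall below the threshold of 3.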
Handling NaT values
NaT ( Not a Time ) : the missing value marker for date and time data.
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
'ID':[1,2,3,np.nan,5,6],
'MATH':[80,40,70,np.nan,82,30],
'ENGLISH':[81,70,40,np.nan,np.nan,30],
'Entry':['1/1/2020','2/1/2020',pd.NaT,
pd.NaT,'5/1/2020','1/2/2020']}
df = pd.DataFrame(data=my_dict)
print(df)
Remove the row if Entry column has NaT
df=df.dropna(axis=0,subset=['Entry'])
print(df)
Output
NAME ID MATH ENGLISH Entry
0 Ravi 1.0 80.0 81.0 1/1/2020
1 Raju 2.0 40.0 70.0 2/1/2020
4 King 5.0 82.0 NaN 5/1/2020
5 None 6.0 30.0 30.0 1/2/2020
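In the example above the Entry column holds plain strings. A common variation is to convert the column to datetime first, so that invalid entries also become NaT and can be removed in the same way. This is a sketch only: the shortened DataFrame, the 'not a date' entry and the day/month/year format are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical shortened data; 'not a date' stands for an invalid entry
df = pd.DataFrame({'NAME': ['Ravi', 'Raju', 'Alex', 'King'],
                   'Entry': ['1/1/2020', 'not a date', None, '5/1/2020']})

# errors='coerce' turns anything that cannot be parsed into NaT
# (the format string assumes day/month/year)
df['Entry'] = pd.to_datetime(df['Entry'], format='%d/%m/%Y', errors='coerce')
print(df['Entry'].isna().tolist())  # [False, True, True, False]

# Remove the rows where Entry is NaT
df = df.dropna(axis=0, subset=['Entry'])
print(df.index.tolist())            # [0, 3]
```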
inplace
We will use inplace=True so the original DataFrame is changed.
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
'ID':[1,2,3,np.nan,5,6],
'MATH':[80,40,70,np.nan,82,30],
'ENGLISH':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=my_dict)
df.dropna(how='all',inplace=True)
print(df)
Output
NAME ID MATH ENGLISH
0 Ravi 1.0 80.0 NaN
1 Raju 2.0 40.0 NaN
2 Alex 3.0 70.0 NaN
4 King 5.0 82.0 NaN
5 None 6.0 30.0 NaN
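After dropna() the remaining rows keep their original index labels, so label 3 is missing in the output above. If a continuous index is needed, one option is reset_index(), as in this sketch with the same sample data ( the names before and after are used only for illustration ).

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [np.nan] * 6}
df = pd.DataFrame(data=my_dict)

df.dropna(how='all', inplace=True)
before = df.index.tolist()
print(before)  # [0, 1, 2, 4, 5]

# drop=True discards the old labels instead of adding them as a column
df.reset_index(drop=True, inplace=True)
after = df.index.tolist()
print(after)   # [0, 1, 2, 3, 4]
```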
Counting and identifying NaN values
We can count and display records with NaN by using isnull()
Filling NaN values
Instead of removing them, rows or columns with NaN can be filled by using fillna()
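As a short sketch of these two related methods, using the first sample DataFrame of this page ( the names counts, rows_with_nan and filled are chosen here for illustration, and filling with 0 is only an example value ):

```python
import pandas as pd
import numpy as np

my_dict = {'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
           'ID': [1, 2, 3, np.nan, 5, 6],
           'MATH': [80, 40, 70, np.nan, 82, 30],
           'ENGLISH': [81, 70, 40, np.nan, np.nan, 30]}
df = pd.DataFrame(data=my_dict)

# Number of missing values in each column
counts = df.isnull().sum()
print(counts)

# Rows that contain at least one missing value
rows_with_nan = df[df.isnull().any(axis=1)]
print(rows_with_nan.index.tolist())  # [3, 4, 5]

# Fill the missing marks with 0 instead of dropping the rows
filled = df.fillna({'MATH': 0, 'ENGLISH': 0})
print(filled)
```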