Pandas DataFrame dropna()


dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)


Returns a new DataFrame with the missing values removed, or None if inplace=True.
axis : 0 or 1, decides whether rows ( 0 ) or columns ( 1 ) are removed.
how : Takes the value any or all. Check the examples below.
thresh : int, minimum number of non-NaN values required to keep the row or column. In recent pandas versions it cannot be combined with how.
subset : Labels along the other axis to consider.
inplace : Boolean, if True the original ( source ) DataFrame is modified by dropna() and None is returned.

Examples using options

Here is a DataFrame with NaN values.
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.nan,5,6],
         'MATH':[80,40,70,np.nan,82,30],
         'ENGLISH':[81,70,40,np.nan,np.nan,30]}
df = pd.DataFrame(data=my_dict)
print(df)
Output is here
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  Alex  3.0  70.0     40.0
3  None  NaN   NaN      NaN
4  King  5.0  82.0      NaN
5  None  6.0  30.0     30.0

how

any : a row is removed if at least one of its values is NaN
all : a row is removed only if all of its values are NaN
df_any=df.dropna(how='any')
print(df_any)
Rows 3, 4 and 5 are removed as each contains at least one NaN or None value.
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  Alex  3.0  70.0     40.0
We will use how='all'
df=df.dropna(how='all')
print(df)
Only row 3 is dropped ( axis=0 is the default ) as all its values are NaN. Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  Alex  3.0  70.0     40.0
4  King  5.0  82.0      NaN
5  None  6.0  30.0     30.0
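The difference between any and all can also be seen by building the row masks by hand. This is only a sketch to illustrate the rule, using the same sample data; the variable names any_mask and all_mask are chosen here for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
    'ID': [1, 2, 3, np.nan, 5, 6],
    'MATH': [80, 40, 70, np.nan, 82, 30],
    'ENGLISH': [81, 70, 40, np.nan, np.nan, 30],
})

# how='any' keeps a row only when every value in it is present
any_mask = df.notna().all(axis=1)
# how='all' keeps a row when at least one value in it is present
all_mask = df.notna().any(axis=1)

rows_any = df[any_mask]
rows_all = df[all_mask]

print(rows_any.index.tolist())  # [0, 1, 2]
print(rows_all.index.tolist())  # [0, 1, 2, 4, 5]
```

Both results should match df.dropna(how='any') and df.dropna(how='all') respectively.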

Remove the row if a particular column has a Null value

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,np.nan,4,5,6],
         'MATH':[80,40,70,np.nan,82,30],
         'ENGLISH':[81,70,40,np.nan,np.nan,30]}
df = pd.DataFrame(data=my_dict)
df=df.dropna(axis=0,subset=['ENGLISH'])
print(df)
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  Alex  NaN  70.0     40.0
5  None  6.0  30.0     30.0
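subset also accepts more than one column, and it can be combined with how. The sketch below assumes we want to check the NAME and ENGLISH columns together; the variable names missing_either and missing_both are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
    'ID': [1, 2, np.nan, 4, 5, 6],
    'MATH': [80, 40, 70, np.nan, 82, 30],
    'ENGLISH': [81, 70, 40, np.nan, np.nan, 30],
})

# Drop a row if NAME or ENGLISH is missing (how='any' is the default)
missing_either = df.dropna(subset=['NAME', 'ENGLISH'])
# Drop a row only if NAME and ENGLISH are both missing
missing_both = df.dropna(subset=['NAME', 'ENGLISH'], how='all')

print(missing_either.index.tolist())  # [0, 1, 2]
print(missing_both.index.tolist())    # [0, 1, 2, 4, 5]
```

The NaN in the ID column of row 2 is ignored in both cases because ID is not listed in subset.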

axis

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.nan,5,6],
         'MATH':[80,40,70,np.nan,82,30],
         'ENGLISH':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=my_dict)
print(df)
With how='any' and axis=1, all columns are deleted because each one contains at least one NaN or None value.
print(df.dropna(how='any',axis=1))
Output
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5]
Now we will use how='all' with axis=1
df=df.dropna(how='all',axis=1)
print(df)
Output : Column 'ENGLISH' is dropped as all are NaN with axis=1
   NAME   ID  MATH
0  Ravi  1.0  80.0
1  Raju  2.0  40.0
2  Alex  3.0  70.0
3  None  NaN   NaN
4  King  5.0  82.0
5  None  6.0  30.0
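Before dropping columns it can help to list which ones would be affected. The following is a small sketch on the same data; cols_any and cols_all are illustrative names. It also uses axis='columns', which pandas accepts as an alternative to axis=1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
    'ID': [1, 2, 3, np.nan, 5, 6],
    'MATH': [80, 40, 70, np.nan, 82, 30],
    'ENGLISH': [np.nan] * 6,
})

missing = df.isna()
# Columns that how='any' would drop (at least one missing value)
cols_any = df.columns[missing.any()].tolist()
# Columns that how='all' would drop (every value missing)
cols_all = df.columns[missing.all()].tolist()

print(cols_any)  # ['NAME', 'ID', 'MATH', 'ENGLISH']
print(cols_all)  # ['ENGLISH']

# axis='columns' behaves the same as axis=1
trimmed = df.dropna(axis='columns', how='all')
print(trimmed.columns.tolist())  # ['NAME', 'ID', 'MATH']
```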

thresh

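Minimum number of non-NaN values required to keep a row. Here a row is kept only if it has at least 3 non-NaN values. Recent pandas versions raise an error if how and thresh are given together, so only thresh is used.
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.nan,5,6],
         'MATH':[80,40,70,np.nan,82,30],
         'ENGLISH':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=my_dict)
df=df.dropna(axis=0,thresh=3)
print(df)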
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
4  King  5.0  82.0      NaN
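To see why rows 3 and 5 are removed, we can count the non-NaN values in each row ourselves. This is a sketch of the rule that thresh applies, not the internal pandas code; counts and manual are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
    'ID': [1, 2, 3, np.nan, 5, 6],
    'MATH': [80, 40, 70, np.nan, 82, 30],
    'ENGLISH': [np.nan] * 6,
})

# Number of non-NaN values in each row
counts = df.notna().sum(axis=1)
print(counts.tolist())  # [3, 3, 3, 0, 3, 2]

# Keep rows with at least 3 non-NaN values, same rule as thresh=3
manual = df[counts >= 3]
kept = df.dropna(thresh=3)

print(kept.index.tolist())  # [0, 1, 2, 4]
```

Row 3 has no valid value and row 5 has only two ( ID and MATH ), so both fall below the threshold.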

Handling NaT values

NaT : Missing value in date and time data ( Not a Time ).
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.nan,5,6],
         'MATH':[80,40,70,np.nan,82,30],
         'ENGLISH':[81,70,40,np.nan,np.nan,30],
         'Entry':['1/1/2020','2/1/2020',pd.NaT,
                   pd.NaT,'5/1/2020','1/2/2020']}
df = pd.DataFrame(data=my_dict)
print(df)
Remove the row if Entry column has NaT
df=df.dropna(axis=0,subset=['Entry'])
print(df)
Output
   NAME   ID  MATH  ENGLISH     Entry
0  Ravi  1.0  80.0     81.0  1/1/2020
1  Raju  2.0  40.0     70.0  2/1/2020
4  King  5.0  82.0      NaN  5/1/2020
5  None  6.0  30.0     30.0  1/2/2020
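In the example above the Entry column holds date strings. When such a column is converted with pd.to_datetime(), values that cannot be parsed can be turned into NaT and then removed with dropna(). This is a sketch with assumed data: the month/day/year format and the 'not a date' entry are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', 'King'],
    'Entry': ['1/1/2020', 'not a date', None, '5/1/2020'],
})

# errors='coerce' converts unparseable or missing values to NaT
# The format string is an assumption about how the dates are written
df['Entry'] = pd.to_datetime(df['Entry'], format='%m/%d/%Y', errors='coerce')

# Remove the rows where Entry is NaT
clean = df.dropna(subset=['Entry'])
print(clean.index.tolist())  # [0, 3]
```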

inplace

We will use inplace=True so the original DataFrame is changed.
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.nan,5,6],
         'MATH':[80,40,70,np.nan,82,30],
         'ENGLISH':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}
df = pd.DataFrame(data=my_dict)
df.dropna(how='all',inplace=True)
print(df)
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
4  King  5.0  82.0      NaN
5  None  6.0  30.0      NaN
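A common mistake is to assign the result of dropna() back to the variable while also using inplace=True. With inplace=True the method returns None, as this short sketch on the same data shows; result is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
    'ID': [1, 2, 3, np.nan, 5, 6],
    'MATH': [80, 40, 70, np.nan, 82, 30],
    'ENGLISH': [np.nan] * 6,
})

# With inplace=True the DataFrame itself is changed and None is returned
result = df.dropna(how='all', inplace=True)

print(result)              # None
print(df.index.tolist())   # [0, 1, 2, 4, 5]
```

So use either df = df.dropna(...) or df.dropna(..., inplace=True), not both together.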

Counting and identifying NaN values

We can count and display records with NaN by using isnull()
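As a short sketch of this, using the first sample DataFrame of this page, the NaN values can be counted per column and the affected rows displayed before deciding what to drop. per_column and rows_with_nan are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex', None, 'King', None],
    'ID': [1, 2, 3, np.nan, 5, 6],
    'MATH': [80, 40, 70, np.nan, 82, 30],
    'ENGLISH': [81, 70, 40, np.nan, np.nan, 30],
})

# Number of missing values in each column
per_column = df.isnull().sum()
print(per_column)

# Rows having at least one missing value
rows_with_nan = df[df.isnull().any(axis=1)]
print(rows_with_nan.index.tolist())  # [3, 4, 5]
```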

Removing rows or columns by using dropna()

Rows or columns can be filled by using fillna()