dropna(): Remove rows or columns based on missing values


Youtube Live session on Tkinter

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju',None,None,'King','Alex'],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[81,70,40,np.NaN,np.NaN,30]}
df = pd.DataFrame(data=my_dict)
print(df)
print(df.dropna()) # remove all rows with NaN or None values
Output is here
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
3  None  NaN   NaN      NaN
4  King  5.0  82.0      NaN
5  Alex  6.0  30.0     30.0
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
5  Alex  6.0  30.0     30.0

dropna(): Remove rows or columns based on missing values #C01

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Return the Modified DataFrame ( if inplace=True ).
axis0 (default ) or 1, decide row or column to remove.
howTakes values any or all. Check examples below.
threshint : Minimum NaN values required.
subsetLabels along other axis to consider
inplaceBoolean , along with method if value is True then original ( source ) dataframe is replaced after applying dropna()

how

any : rows are removed if any value contains NaN
all : rows are removed if all values are contains NaN
print(df.dropna(how='any'))
2,3 and 4 numbered rows are removed as it contains NaN or None values ( at least one )
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
5  Alex  6.0  30.0     30.0
We will use how=all , remove the row or column if all values contains NaN.
print(df.dropna(how='all')) 
row 3 (having all NaN values) is dropped with axis=0, output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
4  King  5.0  82.0      NaN
5  Alex  6.0  30.0     30.0

Remove the row if a perticular column has NaN value

print(df.dropna(axis=0,subset=['ENGLISH']))
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
5  Alex  6.0  30.0     30.0

axis

With axis=1 and how='any', all columns are deleted.
print(df.dropna(how='any',axis=1))
Output
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5]
how=all
print(df.dropna(axis=1,how='all'))# Nothing will be removed. 
Let us change the dataframe by keeping all NaN values to one column.
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju',None,None,'King','Alex'],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[np.NaN,None,np.NaN,np.NaN,np.NaN,None]}
df = pd.DataFrame(data=my_dict)
print(df)
print(df.dropna(axis=1,how='all')) #remove column if all data is NaN
Output : Column 'ENGLISH' is dropped as all are NaN with axis=1
   NAME   ID  MATH
0  Ravi  1.0  80.0
1  Raju  2.0  40.0
2  None  3.0  70.0
3  None  NaN   NaN
4  King  5.0  82.0
5  Alex  6.0  30.0

thresh

Minimum NaN values required
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]}
df = pd.DataFrame(data=my_dict)
df=df.dropna(how='any',axis=0,thresh=3)
print(df)
Output
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
4  King  5.0  82.0      NaN

Handling NaT values

NaT : Missing value in Date and time.
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[81,70,40,np.NaN,np.NaN,30],
         'Entry':['1/1/2020','2/1/2020',pd.NaT,
                   pd.NaT,'5/1/2020','1/2/2020']}
df = pd.DataFrame(data=my_dict)
print(df)
Remove the row if Entry column has NaT
df=df.dropna(axis=0,subset=['Entry'])
print(df)
Output
   NAME   ID  MATH  ENGLISH     Entry
0  Ravi  1.0  80.0     81.0  1/1/2020
1  Raju  2.0  40.0     70.0  2/1/2020
4  King  5.0  82.0      NaN  5/1/2020
5  None  6.0  30.0     30.0  1/2/2020

inplace

We will use inplace=True so the original DataFrame is changed.
import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]}
df = pd.DataFrame(data=my_dict)
df.dropna(how='all',inplace=True,axis=1)
print(df)
Output ( ENGLISH column is removed )
   NAME   ID  MATH
0  Ravi  1.0  80.0
1  Raju  2.0  40.0
2  Alex  3.0  70.0
3  None  NaN   NaN
4  King  5.0  82.0
5  None  6.0  30.0
Change the value of inplace to False and check the output
df.dropna(how='all',inplace=False,axis=1)
   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
3  None  NaN   NaN      NaN
4  King  5.0  82.0      NaN
5  None  6.0  30.0      NaN

Counting and identifying NaN values

We can count and display records with NaN by using isnull()
isnull()

Removing rows or columns by using dropna()

Rows or columns can be filled by using fillna()
fillna()

Questions

Data Cleaning
contains() to display and delete row based on Conditions
loc at mask
Pandas Pandas DataFrame iloc - rows and columns by integers
Subscribe to our YouTube Channel here


Subscribe

* indicates required
Subscribe to plus2net

    plus2net.com



    Post your comments , suggestion , error , requirements etc here





    Python Video Tutorials
    Python SQLite Video Tutorials
    Python MySQL Video Tutorials
    Python Tkinter Video Tutorials
    We use cookies to improve your browsing experience. . Learn more
    HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
    ©2000-2024 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer