Python Pandas DataFrame dropna() to remove labels on given axis when any data found to be missing or NaN

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju',None,None,'King','Alex'],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[81,70,40,np.NaN,np.NaN,30]}
df = pd.DataFrame(data=my_dict)
print(df)
print(df.dropna()) # remove all rows with NaN or None values

Output is here

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
3  None  NaN   NaN      NaN
4  King  5.0  82.0      NaN
5  Alex  6.0  30.0     30.0

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
5  Alex  6.0  30.0     30.0

dropna(): Remove rows or columns based on missing values #C01

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return the Modified DataFrame ( if inplace=True ).

`axis`	0 (default ) or 1, decide row or column to remove.
`how`	Takes values any or all. Check examples below.
`thresh`	int : Minimum NaN values required.
`subset`	Labels along other axis to consider
`inplace`	Boolean , along with method if value is True then original ( source ) dataframe is replaced after applying dropna()

how

any : rows are removed if any value contains NaN
all : rows are removed if all values are contains NaN

print(df.dropna(how='any'))

2,3 and 4 numbered rows are removed as it contains NaN or None values ( at least one )
Output

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
5  Alex  6.0  30.0     30.0

We will use how=all , remove the row or column if all values contains NaN.

print(df.dropna(how='all'))

row 3 (having all NaN values) is dropped with axis=0, output

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
4  King  5.0  82.0      NaN
5  Alex  6.0  30.0     30.0

Remove the row if a perticular column has NaN value

print(df.dropna(axis=0,subset=['ENGLISH']))

Output

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0     81.0
1  Raju  2.0  40.0     70.0
2  None  3.0  70.0     40.0
5  Alex  6.0  30.0     30.0

axis

With axis=1 and how='any', all columns are deleted.

print(df.dropna(how='any',axis=1))

Output

Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5]

how=all

print(df.dropna(axis=1,how='all'))# Nothing will be removed.

Let us change the dataframe by keeping all NaN values to one column.

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju',None,None,'King','Alex'],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[np.NaN,None,np.NaN,np.NaN,np.NaN,None]}
df = pd.DataFrame(data=my_dict)
print(df)
print(df.dropna(axis=1,how='all')) #remove column if all data is NaN

Output : Column 'ENGLISH' is dropped as all are NaN with axis=1

   NAME   ID  MATH
0  Ravi  1.0  80.0
1  Raju  2.0  40.0
2  None  3.0  70.0
3  None  NaN   NaN
4  King  5.0  82.0
5  Alex  6.0  30.0

thresh

Minimum NaN values required

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]}
df = pd.DataFrame(data=my_dict)
df=df.dropna(how='any',axis=0,thresh=3)
print(df)

Output

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
4  King  5.0  82.0      NaN

Thresh is used for data cleaning where some threshold value of valid data is considered for retaining the rows or columns. Check here to know how data cleaning is done by using thresh option.

Handling NaT values

NaT : Missing value in Date and time.

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[81,70,40,np.NaN,np.NaN,30],
         'Entry':['1/1/2020','2/1/2020',pd.NaT,
                   pd.NaT,'5/1/2020','1/2/2020']}
df = pd.DataFrame(data=my_dict)
print(df)

Remove the row if Entry column has NaT

df=df.dropna(axis=0,subset=['Entry'])
print(df)

Output

   NAME   ID  MATH  ENGLISH     Entry
0  Ravi  1.0  80.0     81.0  1/1/2020
1  Raju  2.0  40.0     70.0  2/1/2020
4  King  5.0  82.0      NaN  5/1/2020
5  None  6.0  30.0     30.0  1/2/2020

inplace

We will use inplace=True so the original DataFrame is changed.

import pandas as pd
import numpy as np 
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,3,np.NaN,5,6],
         'MATH':[80,40,70,np.NaN,82,30],
         'ENGLISH':[np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]}
df = pd.DataFrame(data=my_dict)
df.dropna(how='all',inplace=True,axis=1)
print(df)

Output ( ENGLISH column is removed )

   NAME   ID  MATH
0  Ravi  1.0  80.0
1  Raju  2.0  40.0
2  Alex  3.0  70.0
3  None  NaN   NaN
4  King  5.0  82.0
5  None  6.0  30.0

Change the value of inplace to False and check the output

df.dropna(how='all',inplace=False,axis=1)

   NAME   ID  MATH  ENGLISH
0  Ravi  1.0  80.0      NaN
1  Raju  2.0  40.0      NaN
2  Alex  3.0  70.0      NaN
3  None  NaN   NaN      NaN
4  King  5.0  82.0      NaN
5  None  6.0  30.0      NaN

Counting and identifying NaN values

We can count and display records with NaN by using isnull()
isnull()

Removing rows or columns by using dropna()

Rows or columns can be filled by using fillna()
fillna()

Questions

What is the dropna() function in Pandas?
What are the parameters of the dropna() function?
What is the difference between the `how='any'` and `how='all'` options in the dropna() function?
What is the `thresh` parameter in the dropna() function?
What is the `subset` parameter in the dropna() function?
What is the `inplace` parameter in the dropna() function?
What are the advantages and disadvantages of using the dropna() function?
How can you use the dropna() function to handle missing values in a Pandas DataFrame?

How does the dropna() function interact with the fillna() function?
How can you use the dropna() function to drop rows or columns that contain a certain percentage of missing values?
How can you use the dropna() function to drop rows or columns that contain only missing values?

Data Cleaning
contains() to display and delete row based on Conditions
loc at mask
Pandas Pandas DataFrame iloc - rows and columns by integers

Numpy arrays Python & MySQL Python- Tutorials

Subscribe to our YouTube Channel here