import pandas as pd
import numpy as np
# np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0
my_dict={'NAME':['Ravi','Raju','Alex',None,'King',None],
         'ID':[1,2,np.nan,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,np.nan,30]}
df = pd.DataFrame(data=my_dict)
print(df.notnull())
Output : All values that are neither None nor NaN are marked True; None and NaN are marked False
NAME ID MATH ENGLISH
0 True True True True
1 True True True True
2 True False True True
3 False True True True
4 True True True False
5 False True True True
notnull() checks each element for a non-null value and maps it to True; None and NaN are mapped to False.
It returns a Boolean mask with the same shape as the object it is called on.
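As a minimal sketch of how the mask behaves, the snippet below builds a small frame (the name df_demo and its values are made up for illustration) and compares notnull() with the inverse of isnull() and with notna(), which pandas provides as an alias of notnull().

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, not taken from the tutorial's files
df_demo = pd.DataFrame({'NAME': ['Ravi', None, 'Alex'],
                        'ID': [1, np.nan, 3]})

mask = df_demo.notnull()      # True where a value is present
inverse = ~df_demo.isnull()   # logical opposite of isnull()
alias = df_demo.notna()       # notna() is an alias of notnull()

print(mask)
print(mask.equals(inverse))   # True
print(mask.equals(alias))     # True
```

Both None and NaN count as missing here, so the second row is False in both columns.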
notnull(): Filtering out None and NaN values
In real-life situations we usually get data from an Excel or CSV file, or from a database. Here we read data from an Excel file and create our DataFrame from it.
import pandas as pd
import numpy as np
# Adjust the path to your Excel file; the raw string keeps the backslash literal
df = pd.read_excel(r'D:\student-isnull.xlsx')
print(df)
Output
id name class1 mark gender
0 1.0 John Deo Four 75.0 female
1 2.0 Max Ruin Three 85.0 male
2 NaN Arnold Three 55.0 male
3 4.0 Krish Star Four 60.0 female
4 NaN John Mike Four 60.0 female
5 6.0 Alex John Four 55.0 NaN
6 7.0 My John Rob Five 78.0 male
7 NaN NaN NaN NaN NaN
8 9.0 Tes Qry Six 78.0 male
9 10.0 NaN Four 55.0 female
10 11.0 Ronald Six NaN female
11 12.0 Recky Six 94.0 female
Checking whether any real (non-NaN) data is present
We can check whether the DataFrame holds at least one actual (non-NaN) value.
print(df.notnull().values.any())
Output ( any() returns True if at least one value in the DataFrame is real data )
True
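Beyond a single True or False, the same mask can be summed to count the real values. The sketch below uses a small stand-in frame (df_demo and its values are assumptions, since the Excel file is not available here); summing works because True counts as 1 and False as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data
df_demo = pd.DataFrame({'id': [1.0, np.nan, 3.0],
                        'name': ['John', 'Max', None],
                        'mark': [75.0, np.nan, np.nan]})

has_any = df_demo.notnull().values.any()   # at least one real value anywhere?
per_column = df_demo.notnull().sum()       # count of real values per column
total = int(per_column.sum())              # count of real values overall

print(has_any)
print(per_column)
print(total)
```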
We can also check a single column, for example the name column, for the presence of any value that is neither NaN nor None.
To test whether every value in that column is present, use all() instead of any().
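A minimal sketch of the column-level check, using a made-up frame (df_demo is hypothetical) whose name column has one missing entry:

```python
import pandas as pd

# Hypothetical demo data with one missing name
df_demo = pd.DataFrame({'name': ['John Deo', None, 'Arnold']})

any_present = df_demo['name'].notnull().any()   # at least one name is present
all_present = df_demo['name'].notnull().all()   # every name is present

print(any_present)   # True
print(all_present)   # False
```

Because one name is missing, any() reports True while all() reports False.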
Showing rows that are entirely NaN, and rows that have at least one real value
print(df[~df.notnull().any(axis=1)] ) # rows where every column is null
# rows having at least one non-null value
print(df.loc[df.notnull().any(axis=1) ])
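The difference between any(axis=1) and all(axis=1) is easy to mix up, so here is a small sketch on made-up data (df_demo is hypothetical). all(axis=1) keeps only rows with no NaN in any column, while any(axis=1) keeps rows with at least one real value.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: row 0 complete, row 1 partial, row 2 entirely missing
df_demo = pd.DataFrame({'id': [1.0, np.nan, np.nan],
                        'name': ['John', 'Arnold', None]})

mask = df_demo.notnull()

complete = df_demo[mask.all(axis=1)]    # rows with no NaN at all
partial = df_demo[mask.any(axis=1)]     # rows with at least one real value
empty = df_demo[~mask.any(axis=1)]      # rows where every column is NaN

print(complete)
print(partial)
print(empty)
```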
Display rows where the id column is not NaN
print(df[df['id'].notnull()])
Output ( rows 2, 4 and 7 are removed as they have NaN in the id column )
id name class1 mark gender
0 1.0 John Deo Four 75.0 female
1 2.0 Max Ruin Three 85.0 male
3 4.0 Krish Star Four 60.0 female
5 6.0 Alex John Four 55.0 NaN
6 7.0 My John Rob Five 78.0 male
8 9.0 Tes Qry Six 78.0 male
9 10.0 NaN Four 55.0 female
10 11.0 Ronald Six NaN female
11 12.0 Recky Six 94.0 female
Display rows where the name column is not NaN
print(df[df['name'].notnull()])
Output ( two rows, 7 and 9, are removed as they have NaN in the name column )
id name class1 mark gender
0 1.0 John Deo Four 75.0 female
1 2.0 Max Ruin Three 85.0 male
2 NaN Arnold Three 55.0 male
3 4.0 Krish Star Four 60.0 female
4 NaN John Mike Four 60.0 female
5 6.0 Alex John Four 55.0 NaN
6 7.0 My John Rob Five 78.0 male
8 9.0 Tes Qry Six 78.0 male
10 11.0 Ronald Six NaN female
11 12.0 Recky Six 94.0 female
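The two single-column filters above can be combined. As a sketch on made-up data (df_demo is hypothetical), the masks are joined with & to keep rows where both id and name are present; dropna() with its subset parameter should give the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with gaps in both columns
df_demo = pd.DataFrame({'id': [1.0, np.nan, 3.0, 4.0],
                        'name': ['John', 'Arnold', None, 'Krish']})

# Keep rows where both id and name hold real values
both = df_demo[df_demo['id'].notnull() & df_demo['name'].notnull()]

# Equivalent filter using dropna() restricted to the same columns
same = df_demo.dropna(subset=['id', 'name'])

print(both)
print(both.equals(same))   # True
```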