import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,82,30],
'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('NAME == "Raju"')
print(df)
Output
NAME ID MATH ENGLISH
1 Raju 2 40 70
Now let us try numeric filtering.
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,82,30],
'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('MATH > 80')
print(df)
Output
NAME ID MATH ENGLISH
4 King 5 82 60
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,82,30],
'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data.query('MATH > 80',inplace=True)
print(my_data)
Output: Now the original DataFrame is changed. ( we are printing my_data after applying query )
NAME ID MATH ENGLISH
4 King 5 82 60
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,82,30],
'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data.query('MATH >= 50 & ENGLISH >=50',inplace=True)
print(my_data)
Output is here
NAME ID MATH ENGLISH
0 Ravi 1 80 81
3 Ron 4 70 50
4 King 5 82 60
In above code you can replace AND operator with OR operator to get rows with any one of the subject is equal to or more than 80.
my_data.query('MATH >= 80 | ENGLISH >=80',inplace=True)
Output
NAME ID MATH ENGLISH
0 Ravi 1 80 81
4 King 5 82 60
Using variables inside query
my_limit=80
my_data.query("MATH >= @my_limit | ENGLISH >=@my_limit",inplace=True)
print(my_data)
Output
NAME ID MATH ENGLISH
0 Ravi 1 80 81
4 King 5 82 60
my_data.query('NAME in ["Raju","Ron"] ',inplace=True)
print(my_data)
Output
NAME ID MATH ENGLISH
1 Raju 2 40 70
3 Ron 4 70 50
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,40,70,70,82,30],
'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_list=['Raju','Ron']
my_data.query('NAME == @my_list ',inplace=True)
print(my_data)
Output
NAME ID MATH ENGLISH
1 Raju 2 40 70
3 Ron 4 70 50
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex','Ron',np.nan,'Jack'],
'ID':[1,2,3,4,5,6],
'MATH':[80,np.nan,70,70,82,30],
'ENGLISH':[81,70,40,np.nan,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('MATH.isnull()', engine='python')
print(df)
Output ( rows where Math is NaN )
NAME ID MATH ENGLISH
1 Raju 2 NaN 70.0
Let us use notnull() in our query in NAME columns
df=my_data.query('NAME.notnull()', engine='python')
print(df)
Output ( all rows where NAME column is not having NaN )
NAME ID MATH ENGLISH
0 Ravi 1 80.0 81.0
1 Raju 2 NaN 70.0
2 Alex 3 70.0 40.0
3 Ron 4 70.0 NaN
5 Jack 6 30.0 30.0
loc
at
where
mask03-10-2020 | |
Given two dataframes, data and a subset data2, how can you filter the subset data2 using a query on the values of a variable that is in data but is not in data2 ? |
05-10-2020 | |
Use merge |