Python Pandas DataFrame query to get filtered data based on condition

Pandas

Filter the data based on Query conditions
Parameters

expr : The query String to apply on the DataFrame for filtering.
inplace: Default is False , if it is set True then original DataFrame is changed.
kwargs : the keyword arguments accepted.

Examples

Let us search for a matching string. We will collect record for which NAME=Raju

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('NAME == "Raju"')
print(df)

Output

   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70

Now let us try numeric filtering.

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('MATH > 80')
print(df)

Output

   NAME  ID  MATH  ENGLISH
4  King   5    82       60

inplace

By default inplace=Flase , this will not change the original DataFrame. By making it to True that is inplace=True we can change the original DataFrame.

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data.query('MATH > 80',inplace=True)
print(my_data)

Output: Now the original DataFrame is changed. ( we are printing my_data after applying query )

   NAME  ID  MATH  ENGLISH
4  King   5    82       60

Query with multiple conditions

Let us try by using two conditions with one AND operator.
Filter the records to get rows where MATH and ENGLISH both are equal to or above 50

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data.query('MATH >= 50 & ENGLISH >=50',inplace=True)
print(my_data)

Output is here

   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       81
3   Ron   4    70       50
4  King   5    82       60

In above code you can replace AND operator with OR operator to get rows with any one of the subject is equal to or more than 80.

my_data.query('MATH >= 80 | ENGLISH >=80',inplace=True)

Output

   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       81
4  King   5    82       60

Using variables inside query

my_limit=80
my_data.query("MATH >= @my_limit | ENGLISH >=@my_limit",inplace=True)
print(my_data)

Output

   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       81
4  King   5    82       60

Query using in

We can use in to get matching records by providing more than one matching conditions.

my_data.query('NAME in ["Raju","Ron"] ',inplace=True)
print(my_data)

Output

   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70
3   Ron   4    70       50

Using variable with Query

Let us store our names in a list and use it inside our query.

import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_list=['Raju','Ron']
my_data.query('NAME == @my_list ',inplace=True)
print(my_data)

Output

   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70
3   Ron   4    70       50

Query with isnull and notnull to check NaN data

Read more on isnull() here.

Here we have to use engine='Python' to check NaN data. We have added some data with NaN value

import pandas as pd 
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex','Ron',np.nan,'Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,np.nan,70,70,82,30],
         'ENGLISH':[81,70,40,np.nan,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('MATH.isnull()', engine='python')
print(df)

Output ( rows where Math is NaN )

   NAME  ID  MATH  ENGLISH
1  Raju   2   NaN     70.0

Let us use notnull() in our query in NAME columns

df=my_data.query('NAME.notnull()', engine='python')
print(df)

Output ( all rows where NAME column is not having NaN )

   NAME  ID  MATH  ENGLISH
0  Ravi   1  80.0     81.0
1  Raju   2   NaN     70.0
2  Alex   3  70.0     40.0
3   Ron   4  70.0      NaN
5  Jack   6  30.0     30.0

loc at where mask

Pandas Pandas DataFrame iloc - rows and columns by integers

Numpy arrays Python & MySQL Python- Tutorials

Subscribe to our YouTube Channel here

	03-10-2020
Given two dataframes, data and a subset data2, how can you filter the subset data2 using a query on the values of a variable that is in data but is not in data2 ?

	05-10-2020
Use merge