Pandas DataFrame query

Pandas

Filter the data based on Query conditions
Parameters

expr : The query String to apply on the DataFrame for filtering.
inplace: Default is False , if it is set True then original DataFrame is changed.
kwargs : the keyword arguments accepted.

Examples

Let us search for a matching string. We will collect record for which NAME=Raju
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('NAME == "Raju"')
print(df)
Output
   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70
Now let us try numeric filtering.
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('MATH > 80')
print(df)
Output
   NAME  ID  MATH  ENGLISH
4  King   5    82       60

inplace

By default inplace=Flase , this will not change the original DataFrame. By making it to True that is inplace=True we can change the original DataFrame.
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data.query('MATH > 80',inplace=True)
print(my_data)
Output: Now the original DataFrame is changed. ( we are printing my_data after applying query )
   NAME  ID  MATH  ENGLISH
4  King   5    82       60

Query with multiple conditions

Let us try by using two conditions with one AND operator.
Filter the records to get rows where MATH and ENGLISH both are equal to or above 50
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_data.query('MATH >= 50 & ENGLISH >=50',inplace=True)
print(my_data)
Output is here
   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       81
3   Ron   4    70       50
4  King   5    82       60
In above code you can replace AND operator with OR operator to get rows with any one of the subject is equal to or more than 80.
my_data.query('MATH >= 80 | ENGLISH >=80',inplace=True)
Output
   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       81
4  King   5    82       60
Using variables inside query
my_limit=80
my_data.query("MATH >= @my_limit | ENGLISH >=@my_limit",inplace=True)
print(my_data)
Output
   NAME  ID  MATH  ENGLISH
0  Ravi   1    80       81
4  King   5    82       60

Query using in

We can use in to get matching records by providing more than one matching conditions.
my_data.query('NAME in ["Raju","Ron"] ',inplace=True)
print(my_data)
Output
   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70
3   Ron   4    70       50

Using variable with Query

Let us store our names in a list and use it inside our query.
import pandas as pd 
my_dict={'NAME':['Ravi','Raju','Alex','Ron','King','Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,40,70,70,82,30],
         'ENGLISH':[81,70,40,50,60,30]}
my_data = pd.DataFrame(data=my_dict)
my_list=['Raju','Ron']
my_data.query('NAME == @my_list ',inplace=True)
print(my_data)
Output
   NAME  ID  MATH  ENGLISH
1  Raju   2    40       70
3   Ron   4    70       50

Query with isnull and notnull to check NaN data

Read more on isnull() here.

Here we have to use engine='Python' to check NaN data. We have added some data with NaN value
import pandas as pd 
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex','Ron',np.nan,'Jack'],
         'ID':[1,2,3,4,5,6],
         'MATH':[80,np.nan,70,70,82,30],
         'ENGLISH':[81,70,40,np.nan,60,30]}
my_data = pd.DataFrame(data=my_dict)
df=my_data.query('MATH.isnull()', engine='python')
print(df)
Output ( rows where Math is NaN )
   NAME  ID  MATH  ENGLISH
1  Raju   2   NaN     70.0
Let us use notnull() in our query in NAME columns
df=my_data.query('NAME.notnull()', engine='python')
print(df)
Output ( all rows where NAME column is not having NaN )
   NAME  ID  MATH  ENGLISH
0  Ravi   1  80.0     81.0
1  Raju   2   NaN     70.0
2  Alex   3  70.0     40.0
3   Ron   4  70.0      NaN
5  Jack   6  30.0     30.0
loc at where mask

Pandas Pandas DataFrame iloc - rows and columns by integers


plus2net.com



03-10-2020

Given two dataframes, data and a subset data2, how can you filter the subset data2 using a query on the values of a variable that is in data but is not in data2 ?

05-10-2020

Use merge

Post your comments , suggestion , error , requirements etc here




We use cookies to improve your browsing experience. . Learn more
HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
©2000-2020 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer