Pandas loc to get rows based on column labels using callback functions from DataFrame
Difference between iloc and loc
loc is label based so we have to use column names. iloc is position based ( integer ).
We used DataFrame of 35 rows by reading from a CSV file.
For better understanding we can use bigger DataFrame with more rows. Copy and use or download the CSV or Excel file to create the sample DataFrame with 35 rows and 5 columns. Student ( Sample ) DataFrame →
id name class mark gender
0 1 John Deo Four 75 female
1 2 Max Ruin Three 85 male
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
----------------
----------------
---------------
You can display all 35 rows , Now let us try our scripts.
This line will display one column class of all rows
df.loc[:,'class'] # all rows of column class
Output
0 Four
1 Three
2 Three
-------
-------
All 35 rows will be displayed in above code.
All rows showing with two columns class, name
df.loc[:,['class','name']] # all rows of column class and name
class name
0 Four John Deo
1 Three Max Ruin
2 Three Arnold
3 Four Krish Star
------------
------------
------------
All 35 rows will be displayed in above code
Value at position
First rows with two columns
df.loc[0,['class','name']] # first row with two columns class and name
Output
class Four
name John Deo
Name: 0, dtype: object
2nd and 3rd rows with two columns
df.loc[[2,3],['class','name']] # 2nd and 3rd row with two columns class and name
Output
class name
2 Three Arnold
3 Four Krish Star
2nd and 3rd rows with all columns
df.loc[[2,3],:] # 2nd and 3rd row with two columns class and name
Output
id name class mark gender
2 3 Arnold Three 55 male
3 4 Krish Star Four 60 female
Matching data
While reading csv file we will add index to name column and then search.
import pandas as pd
df=pd.read_csv('student.csv',index_col='name')
df.loc['Tade Row']
Output
id 15
class Four
mark 88
gender male
Name: Tade Row, dtype: object
Matching two names
df.loc[['Tade Row','John Deo']] # printing row with two name matching
Output
id class mark gender
name
Tade Row 15 Four 88 male
John Deo 1 Four 75 female
All matching records
import pandas as pd
df=pd.read_csv('test.csv',index_col='class')
df.loc['Four'] # printing all rows with class='Four'
Output
id name mark gender
class
Four 1 John Deo 75 female
Four 4 Krish Star 60 female
Four 5 John Mike 60 female
Four 6 Alex John 55 male
Four 10 Big John 55 female
Four 15 Tade Row 88 male
Four 16 Gimmy 88 male
Four 21 Babby John 69 female
Four 31 Marry Toeey 88 male
Rows between ranges
All the records between two id
import pandas as pd
df=pd.read_csv('student.csv',index_col='id')
df.loc[4:6] # printing rows between two ids
Output
name class mark gender
id
4 Krish Star Four 60 female
5 John Mike Four 60 female
6 Alex John Four 55 male
In above code we have two unique ids and rows between these two ids will be displayed. If data are not unique then there will be an error. Try be using two class names, you will get error "Cannot get left slice bound for non-unique label: 'Five'"
Between two names ( column data)
import pandas as pd
df=pd.read_csv('test.csv',index_col='name')
#print(df)
df.loc['Ronald':'Bigy'] # Records between two names
Here are records between name 'Ronald' and 'Bigy'
id class mark gender
name
Ronald 11 Six 89 female
Recky 12 Six 94 female
Kty 13 Seven 88 female
Bigy 14 Seven 88 female
We can search name column after declaring as index column.
df= pd.read_csv('student.csv',index_col='name') # DataFrame
print(df.loc['Arnold','mark':]) # Mark of Arnold ( name )
print(df.loc[:,'mark']) # All marks ( with name )
print(df.loc['Arnold',:]) # All columns of Arnold
Using Callable function
print(df.loc[lambda x: x.name == 'Arnold'])
print(df.loc[lambda x: list(x.mark > 90)]) # mark more than 90
print(df.loc[lambda x: list(x.gender == 'male'),:])
print(df.loc[lambda x: list(x.name.str.len()>10),:]) # name is not index col
print(df.loc[lambda x: list(x.index.str.len()>10),:]) # name is index col
Record of matching string
import pandas as pd
df=pd.read_csv('student.csv')
print(df.loc[df['name']=='Alex John']) # Search for name
Output
id name class mark gender
5 6 Alex John Four 55 male
Partial matching string
In above code we have filtered records with exact matching. Sometime we have to search for partial matching.
Let us find out rows where part of the column string matching with search string.
import pandas as pd
df=pd.read_csv("student.csv")
df[df.name.str.contains('De',case=False)]
Output is here
id name class mark gender
0 1 John Deo Four 75 female
14 15 Tade Row Four 88 male
endswith : string ending with
print(df.loc[df['name'].str.endswith('John')])
Output
id name class mark gender
5 6 Alex John Four 55 male
9 10 Big John Four 55 female
20 21 Babby John Four 69 female
Not ending with
print(df.loc[~df['name'].str.endswith('John')])
startswith : string starting with
import pandas as pd
df=pd.read_csv("student.csv")
df[df.name.str.startswith('Jo')]
Output
id name class mark gender
0 1 John Deo Four 75 female
4 5 John Mike Four 60 female
Not starting with string
print(df[~df.name.str.startswith('Jo')])
Any Matching with more than one string
We can match multiple strings where any one can match ( OR )
import pandas as pd
df=pd.read_csv("student.csv")
df[df.name.str.contains('|'.join(['Jo','Al']))]
name column can have Jo or Al, output is here
id name class mark gender
0 1 John Deo Four 75 female
4 5 John Mike Four 60 female
5 6 Alex John Four 55 male
6 7 My John Rob Fifth 78 male
9 10 Big John Four 55 female
20 21 Babby John Four 69 female
Matching more than one string
All search strings must be present within the string ( AND )
import pandas as pd
df=pd.read_csv("student.csv")
df[df.name.str.contains('Jo') & df.name.str.contains('Al')]
Both strings Jo and Al must be present within name column.
id name class mark gender
5 6 Alex John Four 55 male
Matching two columns with OR. Here one is string and other column is integer. Either name contains Mike OR id == 6 .
don't forget to wrap the sub-statements with ()
str1 = df["id"] == 5 # id column value
str2 = df.name.str.contains("Al", case=False) # name column value matching Al
str3 = df["mark"] < 50 # mark column value less than 50
df2 = df[str1 | str2 | str3] # combine all conditions using | operator
Not matching
All rows where searched string is NOT present ( not matching )
df[~df.name.str.contains('Jo')]
When any one of the string is not present.
df[~df.name.str.contains('|'.join(['Jo','Al']))]
Getting rows from end
Last record
import pandas as pd
df=pd.read_csv('student.csv')
print(df.tail(1)) # Last records
Output
id name class mark gender
34 35 Rows Noump Six 88 female
Last n records ( five records from end )
print(df.tail(5)) # Last five records
Output
id name class mark gender
30 31 Marry Toeey Four 88 male
31 32 Binn Rott Seven 90 female
32 33 Kenn Rein Six 96 female
33 34 Gain Toe Seven 69 male
34 35 Rows Noump Six 88 female
Adding list to DataFrame
We will create a list and then add the list ( new_list ) at the end of the DataFrame. the Function len() will return the number of rows, the index position of last row is 1 less than the number of rows. So the new row will be added at the end of the DataFrame.
We will use loc to add the list.