Pandas DataFrame loc


Youtube Live session on Tkinter

Column name based position location using loc

Pandas loc to get rows based on column labels using callback functions from DataFrame


We used DataFrame of 35 rows by reading from a CSV file.

Read more on how to read csv file here
import pandas as pd 
df= pd.read_excel('D:\\my_data\\student.xlsx') # excel file 
df= pd.read_csv('D:\\my_data\\student.csv') # csv  file
Output
    id         name  class  mark     gender
0    1     John Deo   Four    75  female
1    2     Max Ruin  Three    85    male
2    3       Arnold  Three    55    male
3    4   Krish Star   Four    60  female
 ----------------
 ----------------
 ---------------
You can display all 35 rows , Now let us try our scripts.
This line will display one column class of all rows
df.loc[:,'class'] # all rows of column class
Output
0      Four
1     Three
2     Three
-------
-------
All 35 rows will be displayed in above code.
All rows showing with two columns class, name
df.loc[:,['class','name']] # all rows of column class and name
	class	name
0	Four	John Deo
1	Three	Max Ruin
2	Three	Arnold
3	Four	Krish Star
------------
------------
------------
All 35 rows will be displayed in above code

Value at position

First rows with two columns
df.loc[0,['class','name']] # first row with two columns class and name
Output
class        Four
name     John Deo
Name: 0, dtype: object

2nd and 3rd rows with two columns

df.loc[[2,3],['class','name']] # 2nd and 3rd row with two columns class and name
Output
	class	name
2	Three	Arnold
3	Four	Krish Star

2nd and 3rd rows with all columns

df.loc[[2,3],:] # 2nd and 3rd row with two columns class and name
Output
	id	name		class	mark	gender
2	3	Arnold		Three	55	male
3	4	Krish Star	Four	60	female

Matching data

While reading csv file we will add index to name column and then search.
import pandas as pd 
df=pd.read_csv('student.csv',index_col='name')
df.loc['Tade Row']
Output
id         15
class    Four
mark       88
gender      male
Name: Tade Row, dtype: object
Matching two names
df.loc[['Tade Row','John Deo']] # printing row with two name matching
Output
		id	class	mark	gender
name				
Tade Row	15	Four	88	male
John Deo	1	Four	75	female

All matching records

import pandas as pd 
df=pd.read_csv('test.csv',index_col='class')
df.loc['Four'] # printing all rows with class='Four'
Output
	id	name		mark	gender
class				
Four	1	John Deo	75	female
Four	4	Krish Star	60	female
Four	5	John Mike	60	female
Four	6	Alex John	55	male
Four	10	Big John	55	female
Four	15	Tade Row	88	male
Four	16	Gimmy	88	male
Four	21	Babby John	69	female
Four	31	Marry Toeey	88	male

Rows between ranges

All the records between two id
import pandas as pd 
df=pd.read_csv('student.csv',index_col='id')
df.loc[4:6] # printing rows between two ids 
Output
	name		class	mark	gender
id				
4	Krish Star	Four	60	female
5	John Mike	Four	60	female
6	Alex John	Four	55	male
In above code we have two unique ids and rows between these two ids will be displayed. If data are not unique then there will be an error. Try be using two class names, you will get error "Cannot get left slice bound for non-unique label: 'Five'"

Between two names ( column data)

import pandas as pd 
df=pd.read_csv('test.csv',index_col='name')
#print(df)
df.loc['Ronald':'Bigy']  # Records between two names 
Here are records between name 'Ronald' and 'Bigy'
	id	class	mark	gender
name				
Ronald	11	Six	89	female
Recky	12	Six	94	female
Kty	13	Seven	88	female
Bigy	14	Seven	88	female
We can search name column after declaring as index column.
df= pd.read_csv('student.csv',index_col='name') # DataFrame 
print(df.loc['Arnold','mark':]) # Mark of Arnold ( name )
print(df.loc[:,'mark']) # All marks ( with name )
print(df.loc['Arnold',:]) # All columns of Arnold 

Using Callable function

print(df.loc[lambda x: x.name == 'Arnold']) 
print(df.loc[lambda x: list(x.mark > 90)]) # mark more than 90
print(df.loc[lambda x: list(x.gender == 'male'),:])
print(df.loc[lambda x: list(x.name.str.len()>10),:]) # name is not index col
print(df.loc[lambda x: list(x.index.str.len()>10),:]) # name is index col

Record of matching string

import pandas as pd 
df=pd.read_csv('student.csv')
print(df.loc[df['name']=='Alex John']) # Search for name
Output

   id       name class  mark   gender
5   6  Alex John  Four    55  male

Partial matching string

In above code we have filtered records with exact matching. Sometime we have to search for partial matching.
Let us find out rows where part of the column string matching with search string.
import pandas as pd 
df=pd.read_csv("student.csv")
df[df.name.str.contains('De',case=False)]
Output is here
	id	name		class	mark	gender
0	1	John Deo	Four	75	female
14	15	Tade Row	Four	88	male

endswith : string ending with

print(df.loc[df['name'].str.endswith('John')])
Output
    id        name class  mark     gender
5    6   Alex John  Four    55    male
9   10    Big John  Four    55  female
20  21  Babby John  Four    69  female
Not ending with
print(df.loc[~df['name'].str.endswith('John')])

startswith : string starting with

import pandas as pd 
df=pd.read_csv("student.csv")
df[df.name.str.startswith('Jo')]
Output
	id	name		class	mark	gender
0	1	John Deo	Four	75	female
4	5	John Mike	Four	60	female

Not starting with string

print(df[~df.name.str.startswith('Jo')])

Any Matching with more than one string

We can match multiple strings where any one can match ( OR )
import pandas as pd 
df=pd.read_csv("student.csv")
df[df.name.str.contains('|'.join(['Jo','Al']))]
name column can have Jo or Al, output is here
	id	name		class	mark	gender
0	1	John Deo	Four	75	female
4	5	John Mike	Four	60	female
5	6	Alex John	Four	55	male
6	7	My John Rob	Fifth	78	male
9	10	Big John	Four	55	female
20	21	Babby John	Four	69	female

Matching more than one string

All search strings must be present within the string ( AND )
import pandas as pd 
df=pd.read_csv("student.csv")
df[df.name.str.contains('Jo') & df.name.str.contains('Al')]
Both strings Jo and Al must be present within name column.
	id	name		class	mark	gender
5	6	Alex John	Four	55	male
Using OR condition |
df2 = df[df.name.str.contains("Jo") | df.name.str.contains("Al")]
Matching two columns with OR. Here one is string and other column is integer. Either name contains Mike OR id == 6 .
don't forget to wrap the sub-statements with ()
df2 = df[((df.name.str.contains("Mike")) | (df["id"] == 6))]

Combine conditions

str1 = df["id"] == 5 # id column value 
str2 = df.name.str.contains("Al", case=False) # name column value matching Al
str3 = df["mark"] < 50 # mark column value less than 50
df2 = df[str1 | str2 | str3] # combine all conditions using | operator

Not matching

All rows where searched string is NOT present ( not matching )
df[~df.name.str.contains('Jo')]
When any one of the string is not present.
df[~df.name.str.contains('|'.join(['Jo','Al']))]

Getting rows from end

Last record
import pandas as pd 
df=pd.read_csv('student.csv')
print(df.tail(1)) # Last  records
Output
    id        name class  mark     gender
34  35  Rows Noump   Six    88  female
Last n records ( five records from end )
print(df.tail(5)) # Last five records 
Output
    id         name  class  mark     gender
30  31  Marry Toeey   Four    88    male
31  32    Binn Rott  Seven    90  female
32  33    Kenn Rein    Six    96  female
33  34     Gain Toe  Seven    69    male
34  35   Rows Noump    Six    88  female

Adding list to DataFrame

We will create a list and then add the list ( new_list ) at the end of the DataFrame. the Function len() will return the number of rows, the index position of last row is 1 less than the number of rows. So the new row will be added at the end of the DataFrame. We will use loc to add the list.
new_list=[36,'new name','Four',77,'male']
df.loc[len(df)]=new_list
print(df)

Replacing data based on condition using loc

Make the mark column value = 100 for students who got mark>=90.)
df.loc[df["mark"] >=90, "mark"] = 100
Adding multiple conditions ( this will replace with mark = 100 for class Six with more than or equal to 90 mark.
condition=(df['mark']>=90) & (df['class']=='Six')
df.loc[condition, "mark"] = 100
Instead of updating mark, change the class value to 'Seven' for class ='Six' and mark >=90
condition=(df['mark']>=90) & (df['class']=='Six')
df.loc[condition, "class"] = 'Seven'
df.loc[df['name']=='Rows Noump','name']='New Name'
Note that here the new list or data is added to the DataFrame only, not to the csv file at source.
Replace data based multiple condition like CASE THEN ( SQL ) by using np.where
at mask contains() to display and delete row based on Conditions

Pandas Pandas DataFrame iloc - rows and columns by integers
Subscribe to our YouTube Channel here


Subscribe

* indicates required
Subscribe to plus2net

    plus2net.com



    Post your comments , suggestion , error , requirements etc here





    Python Video Tutorials
    Python SQLite Video Tutorials
    Python MySQL Video Tutorials
    Python Tkinter Video Tutorials
    We use cookies to improve your browsing experience. . Learn more
    HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
    ©2000-2024 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer