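str.contains() searches each string in a Series (for example a DataFrame column) for a substring or a regular-expression pattern, with options for case sensitivity and regex matching.
It returns a boolean Series (True where the pattern is found), which can be used to select rows.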
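A minimal sketch of the boolean Series on its own, before it is used as a row filter. The Series `names` is made up for illustration, reusing a few values from the sample data below.

```python
import pandas as pd

# A small Series of names (values borrowed from the sample data below)
names = pd.Series(['$John', 'Ma51', 'Arnold1', 'Krish0'])

# str.contains() returns one True/False value per element
mask = names.str.contains('ar', case=False)
print(mask)

# The boolean Series can be used directly to keep only the matching elements
print(names[mask])
```

The same idea is used throughout this page: `df[mask]` keeps the rows where the mask is True.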
Create a DataFrame.
import pandas as pd
my_dict={
'id':[1,2,3,4,5,6,7],
'name':['$John','Ma51','Arnold1','Krish0','Roni','Krish','Max'],
'class':['Four','Three','#Three','Four','7Four','Four,','%Three'],
'mark':[75,85,55,60,60,60,85]
}
df = pd.DataFrame(data=my_dict)
print(df)
This prints all seven rows.
We will use contains() to get only the rows having ar in the name column. The option case=False makes the match case-insensitive; case=True (the default) makes it case-sensitive.
print(df[df['name'].str.contains('ar', case=False)])
Output
id name class mark
2 3 Arnold1 #Three 55
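Because case=True is the default, leaving out case=False changes the result here: the only match is the capitalised Ar in Arnold1. A quick comparison on the same sample data:

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

# Default (case=True): only a lower-case 'ar' would match
sensitive = df[df['name'].str.contains('ar')]
# case=False: 'ar', 'Ar', 'aR' and 'AR' all match
insensitive = df[df['name'].str.contains('ar', case=False)]

print(len(sensitive), len(insensitive))
```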
regex=True | False
With regex=True (the default) the pattern is treated as a regular expression; with regex=False it is matched as a plain string.
Here we collect the rows where the name column starts with A or R.
print(df[df['name'].str.contains('^[AR]', case=True, regex=True)])
Output
id name class mark
2 3 Arnold1 #Three 55
4 5 Roni 7Four 60
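The regex option matters whenever the search text contains a regex special character. As a sketch on the same data: with regex=True, $ is the end-of-string anchor and matches every name, while regex=False searches for the literal $ character.

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

anchored = df['name'].str.contains('$')              # regex: end of string, always found
literal = df['name'].str.contains('$', regex=False)  # plain text: the '$' character

print(anchored.sum(), literal.sum())
print(df.loc[literal, 'name'].tolist())
```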
Rows where the name column ends with h
print(df[df['name'].str.contains('h$', case=True, regex=True)])
Output
id name class mark
5 6 Krish Four, 60
Rows where the name column ends with h or n
print(df[df['name'].str.contains('[hn]$', case=True, regex=True)])
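No output is listed for the h-or-n pattern, so here is a small check on the sample data. It also compares the regex with str.endswith(), which does the same job without a regular expression.

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

# '[hn]$' : last character is h or n
by_regex = df[df['name'].str.contains('[hn]$', regex=True)]
# Same test with two endswith() checks combined by OR
by_endswith = df[df['name'].str.endswith('h') | df['name'].str.endswith('n')]

print(by_regex)
print(by_regex.equals(by_endswith))
```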
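Rows where the name column does not contain ar (the ~ operator inverts the boolean Series)
print(df[~df['name'].str.contains('ar', case=False)])
Or, using a negative lookahead (case=False keeps it equivalent to the line above, so Arnold1 is excluded in both)
print(df[df['name'].str.contains('^(?:(?!ar).)*$', case=False, regex=True)])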
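The two filters above should select the same rows as long as both are case-insensitive. A quick check on the sample data; the pattern uses a non-capturing group (?:...) because pandas may warn about match groups in a str.contains() pattern.

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

# Approach 1: invert the match with ~
not_ar_1 = df[~df['name'].str.contains('ar', case=False)]
# Approach 2: every character must not start an 'ar' (ignoring case)
not_ar_2 = df[df['name'].str.contains('^(?:(?!ar).)*$', case=False, regex=True)]

print(not_ar_1.equals(not_ar_2))
```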
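Display all rows where the class column contains special characters. Inside the brackets, +-/ is read as a range from + to /, so the pattern also matches the comma and the dot; that is why Four, appears in the output.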
print(df[df['class'].str.contains(r'[@#&$%+-/*]')])
Output
id name class mark
2 3 Arnold1 #Three 55
5 6 Krish Four, 60
6 7 Max %Three 85
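If the comma should not count as a special character, the hyphen can be escaped so that it is a literal -. A sketch comparing both patterns on the sample data (which one you want is a judgement call; the outputs on this page use the unescaped pattern):

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

loose = df['class'].str.contains(r'[@#&$%+-/*]')    # '+-/' is a range: + , - . /
strict = df['class'].str.contains(r'[@#&$%+\-/*]')  # escaped hyphen: only a literal '-'

print(df.loc[loose, 'class'].tolist())
print(df.loc[strict, 'class'].tolist())
```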
Display all rows where the name column contains a digit
print(df[df['name'].str.contains(r'\d', regex=True)])
Output
id name class mark
1 2 Ma51 Three 85
2 3 Arnold1 #Three 55
3 4 Krish0 Four 60
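The sample data has no missing values. In many pandas versions, str.contains() returns a missing value (not True/False) for a missing string, and using that result as a row filter can raise an error. The na parameter sets the value used for missing entries. A sketch with a small made-up Series:

```python
import pandas as pd

# Hypothetical data: the second name is missing
s = pd.Series(['Ma51', None, 'Roni'])

# na=False treats the missing entry as "no match", so the mask is purely True/False
mask = s.str.contains(r'\d', na=False)

print(mask.tolist())
print(s[mask])
```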
Display all rows where the name column contains 0
print(df[df['name'].str.contains('0')])
Output
id name class mark
3 4 Krish0 Four 60
Display all rows where the name contains 0 or the class column contains special characters (OR combination)
print(df[df['class'].str.contains(r'[@#&$%+-/*]') |
df['name'].str.contains('0')])
Output
id name class mark
2 3 Arnold1 #Three 55
3 4 Krish0 Four 60
5 6 Krish Four, 60
6 7 Max %Three 85
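An AND combination works the same way with &. A sketch on the sample data; note that a comparison such as df['mark'] >= 60 needs its own parentheses, because & binds more tightly than >=.

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

special_class = df['class'].str.contains(r'[@#&$%+-/*]')
digit_in_name = df['name'].str.contains(r'\d')

# AND: both conditions must hold
both = df[special_class & digit_in_name]
print(both)

# Mixing str.contains() with a comparison: parentheses around the comparison
digit_and_mark = df[digit_in_name & (df['mark'] >= 60)]
print(digit_and_mark)
```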
Deleting the rows matching the condition
In all the cases above we displayed the matching rows. We can use drop() to delete the matching rows and return the remaining ones. By default drop() returns a new DataFrame and does not change the original one.
Delete the rows having 0 in the name column.
df2=df.drop(df[df['name'].str.contains('0')].index)
print(df2)
Output (id 4 is deleted)
id name class mark
0 1 $John Four 75
1 2 Ma51 Three 85
2 3 Arnold1 #Three 55
4 5 Roni 7Four 60
5 6 Krish Four, 60
6 7 Max %Three 85
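The same result can be had without drop() by inverting the mask with ~. A sketch checking that both give the same rows on the sample data; the mask version does not depend on the index labels, whereas drop() removes every row whose label matches, which matters if the index has duplicate labels.

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

# drop(): remove rows by the index labels of the matches
by_drop = df.drop(df[df['name'].str.contains('0')].index)
# Boolean mask: keep rows that do NOT match
by_mask = df[~df['name'].str.contains('0')]

print(by_drop.equals(by_mask))
print(len(df))  # the original DataFrame still has all 7 rows
```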
Delete the rows having special characters in the class column
df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)
Output (ids 3, 6 and 7 are deleted)
id name class mark
0 1 $John Four 75
1 2 Ma51 Three 85
3 4 Krish0 Four 60
4 5 Roni 7Four 60
Delete the rows having special characters in class or name columns.
df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]') |
df['name'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)
Output (ids 1, 3, 6 and 7 are deleted)
id name class mark
1 2 Ma51 Three 85
3 4 Krish0 Four 60
4 5 Roni 7Four 60
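When the same pattern is checked in several columns, repeating str.contains() for each column gets long. One alternative, sketched here on the sample data, applies the check to a list of columns and keeps the rows where no column matches; the names pattern, cols and hits are just for illustration.

```python
import pandas as pd

# Same sample data as at the top of this page
my_dict = {
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
}
df = pd.DataFrame(data=my_dict)

pattern = r'[@#&$%+-/*]'
cols = ['class', 'name']

# One boolean column per checked column: True where that cell has a special char
hits = df[cols].apply(lambda col: col.str.contains(pattern))

# Keep rows where none of the checked columns matched
df2 = df[~hits.any(axis=1)]
print(df2)
```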
Delete rows having ? in the Page column
From a list of pages (URLs) we want to remove the pages that have a query string. Here Page is a column of the DataFrame df. The ? is escaped because it is a special character in regular expressions.
df = df[~df['Page'].str.contains(r'\?')]
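A self-contained sketch of the Page filter; the URLs are made up for illustration. An unescaped ? with regex matching raises a regex error ("nothing to repeat"), so either escape it or switch regex matching off.

```python
import pandas as pd

# Hypothetical list of pages (made-up URLs)
df = pd.DataFrame({'Page': [
    '/python/pandas/',
    '/python/pandas/?page=2',
    '/php/',
    '/search.php?q=contains',
]})

# Option 1: escape '?' so the regex engine treats it as a plain character
no_query_1 = df[~df['Page'].str.contains(r'\?')]
# Option 2: turn off regex matching and search for the literal '?'
no_query_2 = df[~df['Page'].str.contains('?', regex=False)]

print(no_query_2)
```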