str.contains() to search for a string or pattern in a column (Series) using Pandas

df=df[df['name'].str.contains('ar',case=False)]

str.contains() returns a Boolean Series (True for each matching row). Using it inside df[ ] keeps only the matching rows.

Create a DataFrame.

import pandas as pd
my_dict={
  'id':[1,2,3,4,5,6,7],
  'name':['$John','Ma51','Arnold1','Krish0','Roni','Krish','Max'],
  'class':['Four','Three','#Three','Four','7Four','Four,','%Three'],
  'mark':[75,85,55,60,60,60,85]
}
df = pd.DataFrame(data=my_dict)
print(df)

This prints all seven rows.

We will use str.contains() to get only the rows having ar in the name column. The option case=False makes the matching case-insensitive, so Arnold1 (with Ar) is also matched. The default is case=True, which makes the matching case-sensitive.

print(df[df['name'].str.contains('ar',case=False)])

Output

   id     name   class  mark
2   3  Arnold1  #Three    55
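To see the Boolean Series itself, and how the case option changes it, here is a small sketch that rebuilds only the name column of the sample DataFrame. Printing the mask with tolist() is just for inspection; the filtering step is the same df[mask] form used above.

```python
import pandas as pd

# Only the name column of the sample DataFrame is needed here
df = pd.DataFrame({'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max']})

# case=False: 'Ar' in Arnold1 counts as a match for 'ar'
mask_insensitive = df['name'].str.contains('ar', case=False)

# case=True is the default: no name contains lowercase 'ar'
mask_sensitive = df['name'].str.contains('ar')

print(mask_insensitive.tolist())
print(mask_sensitive.tolist())
print(df[mask_insensitive])  # only Arnold1 is kept
```

With case-sensitive matching the mask is all False, so df[mask_sensitive] would return an empty DataFrame.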

Delete the rows having a matching substring in one column. Adding ~ before the condition negates it, so only the rows that do not match are kept.

my_str='abcd'
df=df[~df['col1'].str.contains(my_str)] # remove rows where col1 contains abcd
#df=df[~df.index.str.contains(r'\?')] # remove rows whose index contains ? as a substring
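Since col1 above is a placeholder column, here is a runnable sketch on the sample name column. The substring 'kri' is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max']})

my_str = 'kri'  # illustrative substring to remove
matched = df['name'].str.contains(my_str, case=False)

# ~ flips True/False, so only the rows WITHOUT the substring are kept
df_kept = df[~matched]
print(df_kept)  # Krish0 and Krish are removed
```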

Check this exercise on how to use str.contains(), dataframe.max() and min() to analyse search queries and clicks.

regex=True | False

Regular expression pattern matching is controlled by the regex option, which is True by default.
Here we collect the rows where the name column starts with A or R.

print(df[df['name'].str.contains('^[AR]',case=True,regex=True)])

Output

   id     name   class  mark
2   3  Arnold1  #Three    55
4   5     Roni   7Four    60
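The effect of regex=False can be checked with a short sketch on the sample names: the same text '^[AR]' is then searched literally and matches nothing. It also shows a common trap: '$' is a regex anchor, so searching for the dollar sign itself needs regex=False (or an escaped pattern like r'\$').

```python
import pandas as pd

df = pd.DataFrame({'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max']})

# regex=True (default): '^[AR]' means "starts with A or R"
starts_ar = df['name'].str.contains('^[AR]', regex=True)

# regex=False: the characters ^ [ A R ] are searched literally
literal = df['name'].str.contains('^[AR]', regex=False)

# '$' alone as a regex would match every string (end of text),
# so regex=False is used to look for the dollar sign itself
dollar = df['name'].str.contains('$', regex=False)

print(df[starts_ar])
print(literal.any())
print(df[dollar])
```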

Name column ending with h

print(df[df['name'].str.contains('h$',case=True,regex=True)])

Output

   id   name  class  mark
5   6  Krish  Four,    60

Name column ending with h or n

print(df[df['name'].str.contains('[hn]$',case=True,regex=True)])
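No output is listed for this pattern, so here is a quick check on the sample names. The square brackets form a character class, so [hn]$ means "the last character is h or n".

```python
import pandas as pd

df = pd.DataFrame({'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max']})

ends_hn = df[df['name'].str.contains('[hn]$', regex=True)]
print(ends_hn['name'].tolist())  # $John ends with n, Krish ends with h
```

Krish0 is not matched because its last character is 0.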

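Name column not having ar. Both lines below keep the same rows: the first negates the mask with ~, the second uses a negative lookahead that matches only names where ar never appears.

print(df[~df['name'].str.contains('ar',case=False)])

print(df[df['name'].str.contains('^(?:(?!ar).)*$',case=False,regex=True)])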
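A small sketch to confirm that the ~ approach and the negative lookahead pattern select the same rows. The group is written as non-capturing (?:...), which avoids the pandas warning that str.contains() gives for patterns with capture groups; both calls use case=False so they are comparable.

```python
import pandas as pd

df = pd.DataFrame({'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max']})

# Method 1: negate the mask returned by str.contains()
no_ar_1 = df[~df['name'].str.contains('ar', case=False)]

# Method 2: every character must be one where 'ar' does not start
no_ar_2 = df[df['name'].str.contains('^(?:(?!ar).)*$', case=False, regex=True)]

print(no_ar_1.equals(no_ar_2))
print(no_ar_1)
```

The ~ version is easier to read; the lookahead form is useful when only a single pattern can be passed, for example to a function that accepts a regex string.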

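Display all rows where the class column has special characters. Inside the square brackets, +-/ is read as a range, so it matches + , - . and / as well; that is why Four, (with a comma) is matched.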

print(df[df['class'].str.contains(r'[@#&$%+-/*]')])

Output

   id     name   class  mark
2   3  Arnold1  #Three    55
5   6    Krish   Four,    60
6   7      Max  %Three    85
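To see which characters a character class really covers, the sketch below compares the pattern used above with a variant where the hyphen is escaped (r'[@#&$%+\-/*]'), so only the listed symbols are matched. Which version is right depends on whether a comma should count as a special character.

```python
import pandas as pd

df = pd.DataFrame({'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three']})

original = r'[@#&$%+-/*]'   # '+-/' is a range: + , - . /
escaped = r'[@#&$%+\-/*]'   # escaped hyphen: only @ # & $ % + - / *

print(df[df['class'].str.contains(original)]['class'].tolist())
print(df[df['class'].str.contains(escaped)]['class'].tolist())
```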

Display all rows where the name column has a digit.

print(df[df['name'].str.contains('\\d',regex=True)])

Output

   id     name   class  mark
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
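Because the mask is a Boolean Series, calling sum() on it counts the matching rows (True counts as 1). A short sketch on the sample names:

```python
import pandas as pd

df = pd.DataFrame({'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max']})

has_digit = df['name'].str.contains(r'\d')  # r'\d' is the same pattern as '\\d'
print(has_digit.sum())                      # number of names with at least one digit
print(df.loc[has_digit, 'name'].tolist())
```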

Display all rows where the name column contains 0

print(df[df['name'].str.contains('0')])

Output

   id    name class  mark
3   4  Krish0  Four    60

Display all rows where the name column contains 0 or the class column has special chars (OR combination).

print(df[df['class'].str.contains(r'[@#&$%+-/*]') | 
     df['name'].str.contains('0')])

Output


   id     name   class  mark
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85
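When a str.contains() mask is combined with a comparison, the comparison must be wrapped in parentheses: in Python, | binds more tightly than >, < and ==. A sketch on the sample data, where the mark > 80 condition is only an illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

# Without the parentheses Python would try to evaluate 80 | df['name']...
mask = (df['mark'] > 80) | df['name'].str.contains('0')
print(df[mask])  # Ma51 and Max (mark 85) plus Krish0
```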

Deleting the rows matching the condition

In all the cases above we displayed the matching rows. To delete them instead, pass the index of the matching rows to drop(); it returns the remaining rows. Note that drop() returns a new DataFrame and does not change the original one unless inplace=True is used.
Delete the rows having 0 in the name column.

df2=df.drop(df[df['name'].str.contains('0')].index)
print(df2)

Output ( id 4 is deleted )

   id     name   class  mark
0   1    $John    Four    75
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
4   5     Roni   7Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85
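The drop() approach gives the same result as keeping the rows that do not match with ~. This sketch compares the two and confirms the original DataFrame is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

# drop(): remove rows by the index labels of the matching rows
df2 = df.drop(df[df['name'].str.contains('0')].index)

# Boolean mask: keep the rows that do NOT match
df3 = df[~df['name'].str.contains('0')]

print(df2.equals(df3))  # same six rows, same index
print(len(df))          # original DataFrame still has 7 rows
```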

Delete the rows having special chars in the class column

df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)

Output ( id 3, 6 and 7 are deleted )

   id    name  class  mark
0   1   $John   Four    75
1   2    Ma51  Three    85
3   4  Krish0   Four    60
4   5    Roni  7Four    60

Delete the rows having special characters in class or name columns.

df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]') |
              df['name'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)

Output ( id 1, 3, 6 and 7 are deleted )

   id    name  class  mark
1   2    Ma51  Three    85
3   4  Krish0   Four    60
4   5    Roni  7Four    60
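With more than two columns, chaining | gets long. One option, sketched below, applies the same str.contains() call to a list of columns and uses any(axis=1) to flag rows where at least one column matches. The column list is the same pair used above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

pattern = r'[@#&$%+-/*]'
cols = ['class', 'name']

# One Boolean column per text column
matches = df[cols].apply(lambda s: s.str.contains(pattern))

# Keep rows where no column matches
df2 = df[~matches.any(axis=1)]
print(df2)  # ids 2, 4 and 5 remain
```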

Delete rows having ? in Page column

From a list of pages (URLs) we want to remove the pages having a query string. Here Page is a column of the DataFrame df. The ? is escaped because it is a special character in regular expressions.

df=df[~df.Page.str.contains(r'\?')]
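An alternative, sketched here with a few hypothetical URLs, is regex=False, which treats ? literally so no escaping is needed. If the column can have missing values, na=False counts them as "no match"; without it the mask would contain missing values and ~ could fail.

```python
import pandas as pd

# Hypothetical page list; None stands for a missing value
df = pd.DataFrame({'Page': ['/python/', '/search.php?q=pandas', '/pandas/', None]})

# regex=False: plain substring search for '?'
# na=False: missing values give False, so they are kept by ~
has_query = df['Page'].str.contains('?', regex=False, na=False)
df_clean = df[~has_query]
print(df_clean)
```

Whether rows with a missing Page should be kept or dropped is a separate decision; dropna() can remove them afterwards if needed.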

Combine conditions

Here three conditions are combined with OR (|). The output contains every row that matches at least one of the conditions.

str1 = df["id"] == 5 # id column value 
str2 = df.name.str.contains("Al", case=False) # name column value matching Al
str3 = df["mark"] < 50 # mark column value less than 50
df2 = df[str1 | str2 | str3] # combine all conditions using | operator

With AND (&) matching, a row must match all the conditions to appear in the output.

df2 = df[str1 & str2 & str3] # combine all conditions using & operator
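Running the three conditions on the sample DataFrame shows the difference. No name contains al and no mark is below 50, so OR returns only the row with id 5, while AND returns an empty DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

str1 = df['id'] == 5                           # id column value is 5
str2 = df.name.str.contains('Al', case=False)  # name contains Al (any case)
str3 = df['mark'] < 50                         # mark less than 50

print(df[str1 | str2 | str3])  # any condition: only id 5
print(df[str1 & str2 & str3])  # all conditions: no rows
```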

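Using OR, AND combination of words

Here queries is a column of search queries. Joining the words with | builds a regular expression such as tkinter|color, which matches if any one of the words is present.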

conditions=['tkinter','color']
my_str = '|'.join(conditions) # Any one word to be present ( OR ) 
print(my_str)
df2=df[df['queries'].str.contains(my_str,case=False)]
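A runnable sketch with a few made-up queries (the real exercise uses its own data). re.escape() is added as a precaution so that words containing regex symbols such as + or . are matched literally; for plain words like these it leaves them unchanged.

```python
import re
import pandas as pd

# Hypothetical search-query data
df = pd.DataFrame({'queries': ['Tkinter button color', 'tkinter entry',
                               'pandas color map', 'numpy array']})

conditions = ['tkinter', 'color']
my_str = '|'.join(re.escape(w) for w in conditions)  # any one word ( OR )
print(my_str)

df2 = df[df['queries'].str.contains(my_str, case=False)]
print(df2)  # every query except 'numpy array'
```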

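Using AND combination ( All words should be present )

Each (?=.*word) is a lookahead: it checks that the word appears somewhere in the text without consuming any characters, so chaining one lookahead per word requires all the words to be present.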

conditions=['tkinter','color']

my_str=''
for w in conditions: # All words to be present ( AND ) 
    my_str=my_str + "(?=.*" + w + ")" 

print(my_str)
#df2=df[df['queries'].str.contains(r'^(?=.*tkinter)(?=.*color)',case=False)]
df2=df[df['queries'].str.contains(my_str,case=False)]
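The lookahead pattern can be checked against a plainer approach: one str.contains() call per word, combined with &. This sketch uses the same kind of made-up queries and confirms both select the same rows.

```python
import re
import pandas as pd

# Hypothetical search-query data
df = pd.DataFrame({'queries': ['Tkinter button color', 'tkinter entry',
                               'pandas color map', 'numpy array']})
conditions = ['tkinter', 'color']

# Regex version: one lookahead per word, checked from the start
my_str = '^' + ''.join('(?=.*' + re.escape(w) + ')' for w in conditions)
by_regex = df[df['queries'].str.contains(my_str, case=False)]

# Mask version: start with all True and AND in one mask per word
mask = pd.Series(True, index=df.index)
for w in conditions:
    mask &= df['queries'].str.contains(w, case=False, regex=False)
by_mask = df[mask]

print(by_regex.equals(by_mask))
print(by_regex)  # only 'Tkinter button color' has both words
```

The mask version avoids regex syntax entirely; the lookahead version is handy when a single pattern string is required.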
