str.contains(): search a column for a string or a pattern

df=df[df['name'].str.contains('ar',case=False)]
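str.contains() returns a boolean Series (True where the text is found). Using that Series inside df[...] keeps only the rows where it is True.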

Create a DataFrame.
import pandas as pd 
my_dict={
  'id':[1,2,3,4,5,6,7],
  'name':['$John','Ma51','Arnold1','Krish0','Roni','Krish','Max'],
  'class':['Four','Three','#Three','Four','7Four','Four,','%Three'],
  'mark':[75,85,55,60,60,60,85]  
}
df = pd.DataFrame(data=my_dict)
print(df)
This prints all seven rows of the DataFrame.

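We will use contains() to get only the rows having ar in the name column. We used the option case=False, so the matching is case insensitive. The default is case=True, which makes the matching case sensitive. Note that each example below assigns the result back to df, so re-create the DataFrame before trying the next example.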
df=df[df['name'].str.contains('ar',case=False)]
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
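To see what happens in two steps, here is a small sketch using the same sample data. The names mask, matched and with_missing are only for illustration. The last lines assume a column with a missing value, where na=False treats the missing entry as not matching.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

# Step 1: str.contains() gives one True / False value per row
mask = df['name'].str.contains('ar', case=False)
print(mask.tolist())  # [False, False, True, False, False, False, False]

# Step 2: the boolean Series selects the rows
matched = df[mask]
print(matched)

# Illustrative Series with a missing value: na=False counts it as no match
with_missing = pd.Series(['Mark', None]).str.contains('ar', na=False)
print(with_missing.tolist())  # [True, False]
```

Keeping the mask in its own variable makes it easy to reuse it or to combine it with other conditions later.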
Delete the rows having a matching substring in one column. The ~ operator inverts the boolean Series, so only the non-matching rows are kept.
my_str='abcd'
df=df[~df['col1'].str.contains(my_str)]
#df=df[~df.index.str.contains(r'\?')] # string index having ? as sub string (raw string, ? escaped)
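The commented line above filters on the index instead of a column. Here is a minimal sketch of that idea, assuming a DataFrame whose index holds page names as strings. The DataFrame pages and its hits column are made up for this example.

```python
import pandas as pd

# Hypothetical data: the index holds the page names
pages = pd.DataFrame(
    {'hits': [10, 4, 7]},
    index=['a.php', 'a.php?id=1', 'b.php'],
)

# ? is a regex metacharacter, so it is escaped inside a raw string
has_query = pages.index.str.contains(r'\?')

# ~ inverts the mask, keeping the rows whose index has no ?
clean = pages[~has_query]
print(clean)
```

This only works when the index contains strings. On a default integer index the .str accessor raises an error.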

regex=True | False

str.contains() treats the search text as a regular expression by default (regex=True). Set regex=False to match the text literally.
We will collect rows where the name column starts with A or R
df=df[df['name'].str.contains('^[AR]',case=True,regex=True)] 
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
4   5     Roni   7Four    60
Name column ending with h
df=df[df['name'].str.contains('h$',case=True,regex=True)]
Output
   id   name  class  mark
5   6  Krish  Four,    60
Name column ending with h or n
df=df[df['name'].str.contains('[hn]$',case=True,regex=True)]
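The three anchor patterns above can be tried side by side on the sample data. This is only a sketch, and the result names are chosen for illustration. Each result is stored in its own variable so the original df is not overwritten.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

# ^ anchors the match at the start of the string
starts_a_or_r = df[df['name'].str.contains(r'^[AR]')]

# $ anchors the match at the end of the string
ends_h = df[df['name'].str.contains(r'h$')]
ends_h_or_n = df[df['name'].str.contains(r'[hn]$')]

print(starts_a_or_r['name'].tolist())  # ['Arnold1', 'Roni']
print(ends_h['name'].tolist())         # ['Krish']
print(ends_h_or_n['name'].tolist())    # ['$John', 'Krish']
```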
Name column not having ar
df=df[~df['name'].str.contains('ar',case=False)]
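Or, using a negative lookahead. The case option must be the same in both forms (case=False here), otherwise Arnold1 is treated differently. A non-capturing group (?: ) is used because pandas shows a warning when the pattern has capturing groups.
df=df[df['name'].str.contains(r'^(?:(?!ar).)*$',case=False,regex=True)]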
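Here is a short check, on the sample data, that the ~ form and the negative lookahead form select the same rows when both ignore case. The variable names are illustrative. The last mask shows what changes if the lookahead is made case sensitive.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

# Form 1: invert a plain "contains ar" mask
not_ar = ~df['name'].str.contains('ar', case=False)

# Form 2: every position in the string must not be the start of "ar"
lookahead = df['name'].str.contains(r'^(?:(?!ar).)*$', case=False)

print(not_ar.equals(lookahead))        # True
print(df[not_ar]['name'].tolist())     # Arnold1 is excluded

# Case sensitive lookahead: "Ar" is not "ar", so Arnold1 is kept
lookahead_cs = df['name'].str.contains(r'^(?:(?!ar).)*$', case=True)
print(df[lookahead_cs]['name'].tolist())
```

The ~ form is usually easier to read. The lookahead form can help when the condition has to be part of one larger pattern.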
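Display all rows where the class column has special characters. Inside the square brackets, +-/ is read as a range from + to /, which covers + , - . and /, so the comma in Four, also matches.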
print(df[df['class'].str.contains(r'[@#&$%+-/*]')])
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
5   6    Krish   Four,    60
6   7      Max  %Three    85
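The row with Four, appears in the output because of how the hyphen is read inside the square brackets. The sketch below compares the pattern used above with a variant where the hyphen is escaped. The variant (explicit_class) is an assumption about what a stricter pattern could look like, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

# As written above: +-/ is a range that includes + , - . /
range_class = r'[@#&$%+-/*]'

# Hyphen escaped: only the characters actually listed will match
explicit_class = r'[@#&$%+\-/*]'

with_range = df[df['class'].str.contains(range_class)]
with_explicit = df[df['class'].str.contains(explicit_class)]

print(with_range['class'].tolist())     # ['#Three', 'Four,', '%Three']
print(with_explicit['class'].tolist())  # ['#Three', '%Three']
```

If the comma should count as a special character, it is clearer to list it inside the brackets than to rely on the range.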
Display all rows where the name column has a number.
print(df[df['name'].str.contains(r'\d',regex=True)])
Output
   id     name   class  mark
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
Display all rows where name contains 0
print(df[df['name'].str.contains('0')] )
Output
   id    name class  mark
3   4  Krish0  Four    60
Display all rows where name contains 0 or the class column has special characters (OR combination).
print(df[df['class'].str.contains(r'[@#&$%+-/*]') | 
     df['name'].str.contains('0')])  
Output

   id     name   class  mark
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85

Deleting the rows matching the condition

In all the above cases we displayed the matching rows. We can use drop() to delete the matching rows and return the rest. Note that drop() returns a new DataFrame and does not change the main DataFrame unless inplace=True is passed.
Deleting the rows having 0 in name column.
df2=df.drop(df[df['name'].str.contains('0')].index)
print(df2)
Output ( id 4 is deleted )
   id     name   class  mark
0   1    $John    Four    75
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
4   5     Roni   7Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85
Delete the rows having special chars in class column
df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)
Output ( id 3, 6 and 7 are deleted )
   id    name  class  mark
0   1   $John   Four    75
1   2    Ma51  Three    85
3   4  Krish0   Four    60
4   5    Roni  7Four    60
Delete the rows having special characters in class or name columns.
df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]') |
              df['name'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)
Output ( id 1, 3, 6 and 7 are deleted )
   id    name  class  mark
1   2    Ma51  Three    85
3   4  Krish0   Four    60
4   5    Roni  7Four    60
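As a cross-check, here is a small sketch showing that drop() with the matching index gives the same result as filtering with the inverted mask on this sample data. The names mask, dropped and filtered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

mask = df['name'].str.contains('0')

# Way 1: drop the rows by their index labels
dropped = df.drop(df[mask].index)

# Way 2: keep the rows where the mask is False
filtered = df[~mask]

print(dropped.equals(filtered))  # True
print(len(df))                   # 7, the main DataFrame is unchanged
```

The two ways agree here because the index labels are unique. With repeated index labels, drop() removes every row carrying a matching label, while the mask works row by row.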

Delete rows having ? in Page column

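From a list of pages (URLs) we want to remove the pages having a query string. Here Page is the column name in our DataFrame df. The ? is a special character in regular expressions, so it is escaped and written in a raw string.
df=df[~df.Page.str.contains(r'\?')]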
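A minimal sketch of this clean-up, assuming a small made-up list of URLs in a Page column. It shows two ways to search for the ? character: escaping it in a regular expression, or turning off regex matching with regex=False.

```python
import pandas as pd

# Hypothetical list of pages for illustration
pages = pd.DataFrame({'Page': [
    'index.php',
    'list.php?page=2',
    'about.html',
    'view.php?id=7&x=1',
]})

# Option 1: escaped ? inside a raw string (regex matching)
by_regex = pages[~pages['Page'].str.contains(r'\?')]

# Option 2: plain text matching, no escaping needed
by_literal = pages[~pages['Page'].str.contains('?', regex=False)]

print(by_regex['Page'].tolist())    # ['index.php', 'about.html']
print(by_literal['Page'].tolist())  # ['index.php', 'about.html']
```

regex=False is often the simpler choice when the search text comes from a user or a file and may contain special characters.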

Combine conditions

Here three conditions are combined using OR matching. Each condition is a boolean Series, and the final output contains all rows matching at least one of the conditions.
str1 = df["id"] == 5 # id column value 
str2 = df.name.str.contains("Al", case=False) # name column value matching Al
str3 = df["mark"] < 50 # mark column value less than 50
df2 = df[str1 | str2 | str3] # combine all conditions using | operator
We can use AND matching so that the output contains only the rows matching all the conditions.
df2 = df[str1 & str2 & str3] # combine all conditions using & operator
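The same idea can be tried on the sample data. This is a sketch with values adapted so that some rows match (ar instead of Al, and mark below 60 instead of 50). Those values and the variable names are assumptions made for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

is_id_5 = df['id'] == 5
name_has_ar = df['name'].str.contains('ar', case=False)
low_mark = df['mark'] < 60

# OR: rows matching at least one condition
any_match = df[is_id_5 | name_has_ar | low_mark]
print(any_match['id'].tolist())  # [3, 5]

# AND: rows matching every condition (none in this data)
all_match = df[is_id_5 & name_has_ar & low_mark]
print(len(all_match))            # 0

# Written inline, each comparison needs its own parentheses
# because | and & bind tighter than == and <
inline = df[(df['id'] == 5) | (df['mark'] < 60)]
print(inline['id'].tolist())     # [3, 5]
```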

Using OR , AND combination of words

conditions=['tkinter','color']
my_str = '|'.join(conditions) # Any one word to be present ( OR ) 
print(my_str)
df2=df[df['queries'].str.contains(my_str,case=False)]
Using AND combination ( All words should be present )
conditions=['tkinter','color']

my_str=''
for w in conditions: # All words to be present ( AND ) 
    my_str=my_str + "(?=.*" + w + ")" 

print(my_str)
#df2=df[df['queries'].str.contains(r'^(?=.*tkinter)(?=.*color)',case=False)]
df2=df[df['queries'].str.contains(my_str,case=False)]
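Both patterns can be built from the list of words in one line each. The sketch below assumes a made-up queries column and adds re.escape() so that a word containing special characters (for example c++) is searched as plain text. The data and variable names are for illustration only.

```python
import re
import pandas as pd

# Hypothetical search queries
queries = pd.DataFrame({'queries': [
    'tkinter button color',
    'Tkinter canvas',
    'background color in CSS',
    'pandas read_csv',
]})

words = ['tkinter', 'color']

# OR: word1|word2 matches when any one word is present
or_pattern = '|'.join(re.escape(w) for w in words)

# AND: one lookahead per word, all of them must succeed
and_pattern = ''.join(f'(?=.*{re.escape(w)})' for w in words)

any_word = queries[queries['queries'].str.contains(or_pattern, case=False)]
all_words = queries[queries['queries'].str.contains(and_pattern, case=False)]

print(or_pattern)                       # tkinter|color
print(and_pattern)                      # (?=.*tkinter)(?=.*color)
print(any_word.index.tolist())          # [0, 1, 2]
print(all_words['queries'].tolist())    # ['tkinter button color']
```

The lookaheads do not consume any text, so the order of the words inside the string does not matter for the AND pattern.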