str.contains()

Pandas

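Tests each string in a Series for a substring or regular-expression pattern, with options such as case sensitivity and regex matching.
Returns a boolean Series (True where the pattern is found), which can be used to select rows of a DataFrame.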
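To see the boolean Series itself, here is a minimal sketch on a small standalone Series. The values, including the None used as a missing entry, are illustrative assumptions rather than the tutorial's data. The na parameter sets the result for missing values. With na=False they become False, so the mask can be used directly for selection.

```python
import pandas as pd

# Illustrative Series (assumed values); None stands in for a missing entry
names = pd.Series(['John', 'Arnold', None, 'Mark'])

# One True/False per element; na=False maps the missing entry to False
mask = names.str.contains('ar', case=False, na=False)
print(mask.tolist())  # [False, True, False, True]

# The boolean Series selects the matching elements
print(names[mask].tolist())  # ['Arnold', 'Mark']
```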

Create a DataFrame.
import pandas as pd
my_dict={
  'id':[1,2,3,4,5,6,7],
  'name':['$John','Ma51','Arnold1','Krish0','Roni','Krish','Max'],
  'class':['Four','Three','#Three','Four','7Four','Four,','%Three'],
  'mark':[75,85,55,60,60,60,85]
}
df = pd.DataFrame(data=my_dict)
print(df)
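Output (all seven rows)
   id     name   class  mark
0   1    $John    Four    75
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
4   5     Roni   7Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85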

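We will use contains() to get only the rows having ar in the name column. The option case=False makes the match case-insensitive, so Arnold1 matches even though it starts with a capital A. The default, case=True, is case-sensitive. Printing the filtered result, instead of assigning it back to df, keeps the full DataFrame available for the next example.
print(df[df['name'].str.contains('ar',case=False)])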
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
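A short sketch comparing the two case settings on the name values from this tutorial. It also passes Python's re.IGNORECASE through the flags parameter, which for a regex pattern gives the same result as case=False.

```python
import re
import pandas as pd

# Same name values as the tutorial's DataFrame
name = pd.Series(['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'])

# case=True (the default): only lowercase 'ar' matches, so nothing is found
sensitive = name.str.contains('ar', case=True)

# case=False: 'Ar' in 'Arnold1' now matches
insensitive = name.str.contains('ar', case=False)

# flags=re.IGNORECASE gives the same result as case=False
with_flag = name.str.contains('ar', flags=re.IGNORECASE)

print(sensitive.sum(), insensitive.sum())  # 0 1
print(insensitive.equals(with_flag))       # True
```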

regex=True | False

With regex=True (the default), the pattern is treated as a regular expression; with regex=False it is matched as a plain string.
Rows where the name column starts with A or R:
print(df[df['name'].str.contains('^[AR]',case=True,regex=True)])
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
4   5     Roni   7Four    60
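A sketch of why the regex option matters, assuming we want names containing a literal $ (as in $John). As a regular expression, $ is the end-of-string anchor and matches every value; with regex=False, or with the escaped pattern \$, only $John matches.

```python
import pandas as pd

name = pd.Series(['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'])

# regex=True: '$' is the end-of-string anchor, an empty match found in every string
as_regex = name.str.contains('$', regex=True)

# regex=False: '$' is an ordinary character
as_literal = name.str.contains('$', regex=False)

print(as_regex.sum())             # 7
print(name[as_literal].tolist())  # ['$John']

# Keeping regex=True but escaping the dollar sign gives the literal match
escaped = name.str.contains(r'\$', regex=True)
print(escaped.equals(as_literal))  # True
```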
Name column ending with h
print(df[df['name'].str.contains('h$',case=True,regex=True)])
Output
   id   name  class  mark
5   6  Krish  Four,    60
Name column ending with h or n
print(df[df['name'].str.contains('[hn]$',case=True,regex=True)])
Output
   id   name  class  mark
0   1  $John   Four    75
5   6  Krish  Four,    60
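The [hn]$ filter can also be written without a regular expression by combining two str.endswith() tests with |. A small sketch checking that both give the same boolean Series on the tutorial's names:

```python
import pandas as pd

name = pd.Series(['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'])

# Regex: the last character is h or n
by_regex = name.str.contains('[hn]$', regex=True)

# Same test with plain string methods, one call per ending letter
by_endswith = name.str.endswith('h') | name.str.endswith('n')

print(name[by_regex].tolist())       # ['$John', 'Krish']
print(by_regex.equals(by_endswith))  # True
```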
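Name column not having ar (case-insensitive), using ~ to negate the boolean Series
print(df[~df['name'].str.contains('ar',case=False)])
Or, with a negative-lookahead regular expression. Use case=False here as well; with case=True, Arnold1 would be kept because Ar does not match ar.
print(df[df['name'].str.contains('^((?!ar).)*$',case=False,regex=True)])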
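A sketch comparing the two ways of excluding ar. It uses the non-capturing group (?:...) in place of (...): the matching is the same, and pandas does not raise its UserWarning about match groups, which it emits for patterns with capturing groups. The last check shows that with case=True the lookahead keeps Arnold1.

```python
import pandas as pd

name = pd.Series(['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'])

# Negate the case-insensitive match with ~
not_ar = ~name.str.contains('ar', case=False)

# Negative lookahead: no position in the string may start 'ar'
lookahead_ci = name.str.contains('^(?:(?!ar).)*$', case=False, regex=True)
lookahead_cs = name.str.contains('^(?:(?!ar).)*$', case=True, regex=True)

print(not_ar.equals(lookahead_ci))  # True
print(lookahead_cs.all())           # True: 'Arnold1' is kept, 'Ar' is not 'ar'
```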
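Display all rows where the class column has special characters. Inside the brackets, +-/ is read as a range from + to /, which also covers the comma and the period; that is why Four, is included.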
print(df[df['class'].str.contains(r'[@#&$%+-/*]')])
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
5   6    Krish   Four,    60
6   7      Max  %Three    85
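A sketch of the character-range effect on a few illustrative strings (assumed values, not the tutorial's data). Moving the hyphen to the end of the brackets makes it a literal -, so the comma and period no longer count as special characters.

```python
import pandas as pd

s = pd.Series(['Four,', 'a.b', 'x-y', '#Three', 'plain'])

# '+-/' inside [...] is a range: + , - . /
ranged = s.str.contains(r'[@#&$%+-/*]')

# Hyphen placed last is a literal '-'
literal = s.str.contains(r'[@#&$%+/*-]')

print(s[ranged].tolist())   # ['Four,', 'a.b', 'x-y', '#Three']
print(s[literal].tolist())  # ['x-y', '#Three']
```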
Display all rows where the name column contains a digit (\d matches any digit).
print(df[df['name'].str.contains(r'\d',regex=True)])
Output
   id     name   class  mark
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
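Because True counts as 1, adding .sum() to the boolean Series gives the number of matching rows. A sketch on the tutorial's names; the second pattern, \d{2} (two digits in a row), is an illustrative addition.

```python
import pandas as pd

name = pd.Series(['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'])

# r'\d' (raw string) is the same pattern as '\\d'
has_digit = name.str.contains(r'\d')

# Sum of the boolean Series = number of names containing a digit
print(has_digit.sum())  # 3

# Stricter pattern: two consecutive digits
print(name[name.str.contains(r'\d{2}')].tolist())  # ['Ma51']
```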
Display all rows where name contains 0
print(df[df['name'].str.contains('0')])
Output
   id    name class  mark
3   4  Krish0  Four    60
Display all rows where name contains 0 or the class column has special characters (OR combination using |).
print(df[df['class'].str.contains(r'[@#&$%+-/*]') | 
     df['name'].str.contains('0')])  
Output
   id     name   class  mark
2   3  Arnold1  #Three    55
3   4   Krish0    Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85
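An AND combination works the same way with &. This sketch keeps rows whose class has a special character and whose mark is above 80 (the mark condition is an illustrative assumption). The comparison needs its own parentheses because & binds more tightly than >.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

special = df['class'].str.contains(r'[@#&$%+-/*]')

# Both conditions must be True; parentheses around the comparison are required
result = df[special & (df['mark'] > 80)]
print(result['id'].tolist())  # [7]
```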

Deleting the rows matching the condition

In all the above cases we displayed the matching rows. We can pass the index of the matching rows to drop() to delete them and get back the remaining rows. drop() returns a new DataFrame; the original df is not changed (unless inplace=True is used).
Deleting the rows having 0 in the name column.
df2=df.drop(df[df['name'].str.contains('0')].index)
print(df2)
Output (id 4 is deleted)
   id     name   class  mark
0   1    $John    Four    75
1   2     Ma51   Three    85
2   3  Arnold1  #Three    55
4   5     Roni   7Four    60
5   6    Krish   Four,    60
6   7      Max  %Three    85
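The same result can be obtained by negating the mask with ~ instead of calling drop(). A sketch checking that both give the same DataFrame and that df itself keeps all seven rows. One design point: drop() works on index labels, so with a non-unique index it would remove every row sharing a matching label, while ~mask filters row by row.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

mask = df['name'].str.contains('0')

# drop() removes rows by the index labels of the matching rows
df2 = df.drop(df[mask].index)

# Equivalent: keep only the rows where the mask is False
df3 = df[~mask]

print(df2.equals(df3))  # True
print(len(df))          # 7: the original DataFrame is unchanged
```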
Delete the rows having special characters in the class column
df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)
Output (ids 3, 6 and 7 are deleted)
   id    name  class  mark
0   1   $John   Four    75
1   2    Ma51  Three    85
3   4  Krish0   Four    60
4   5    Roni  7Four    60
Delete the rows having special characters in the class or name column.
df2=df.drop(df[df['class'].str.contains(r'[@#&$%+-/*]') |
              df['name'].str.contains(r'[@#&$%+-/*]')].index)
print(df2)
Output (ids 1, 3, 6 and 7 are deleted)
   id    name  class  mark
1   2    Ma51  Three    85
3   4  Krish0   Four    60
4   5    Roni  7Four    60
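When the same pattern must be tested on several columns, one option (a sketch, not the only approach) is to apply str.contains() to each column and combine the results with any(axis=1). On the tutorial's data it removes the same rows as the | combination above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['$John', 'Ma51', 'Arnold1', 'Krish0', 'Roni', 'Krish', 'Max'],
    'class': ['Four', 'Three', '#Three', 'Four', '7Four', 'Four,', '%Three'],
    'mark': [75, 85, 55, 60, 60, 60, 85],
})

pattern = r'[@#&$%+-/*]'

# str.contains on each listed column; the result is a DataFrame of booleans
hits = df[['class', 'name']].apply(lambda col: col.str.contains(pattern))

# any(axis=1) is True when the pattern matched in at least one column
df2 = df[~hits.any(axis=1)]
print(df2['id'].tolist())  # [2, 4, 5]
```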
©2000-2021 plus2net.com All rights reserved worldwide