str.split(): Break the string using delimiters

import pandas as pd 
my_dict={'name':['Ravi King','Raju Queen','Alex Jack']}
df = pd.DataFrame(data=my_dict)
print(df.name.str.split()) # no delimiter given: split on whitespace
Output
0     [Ravi, King]
1    [Raju, Queen]
2     [Alex, Jack]
Returns a Series or Index of lists by default, or a DataFrame ( MultiIndex for an Index ) when expand=True is used.
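When no delimiter is passed, the split is done on whitespace, and a run of spaces counts as one separator, as with Python's own str.split(). Passing ' ' explicitly splits on every single space instead. A small sketch of the difference, using made-up names with irregular spacing (the sample data is an assumption for illustration only):

```python
import pandas as pd

# Hypothetical sample data with extra spaces
names = pd.Series(['Ravi  King', ' Raju Queen ', 'Alex Jack'])

# No delimiter: split on any run of whitespace, edges are trimmed
default_split = names.str.split()

# Explicit single space: every space is a split point,
# so extra spaces produce empty strings in the lists
space_split = names.str.split(' ')

print(default_split.tolist())
print(space_split.tolist())
```

If the data may contain double spaces or leading and trailing spaces, leaving the delimiter out is usually the safer choice.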

Options

Split each email address at @ to separate the userid from the domain part.
import pandas as pd 
my_dict={'email':['Ravi@example.com','Raju@example.com','Alex@example.com']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.split('@'))
Output
0    [Ravi, example.com]
1    [Raju, example.com]
2    [Alex, example.com]
With the option expand=True, each part goes into its own column and the result is a DataFrame instead of a Series of lists. The columns are labelled 0, 1, ... and can be used to pick the part we need.
print(df.email.str.split('@',expand=True))
Output
      0            1
0  Ravi  example.com
1  Raju  example.com
2  Alex  example.com
The userid part can be collected like this
print(df.email.str.split('@',expand=True)[0])
Use column label 1 ( [1] ) in place of 0 to get the domain part.
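The two parts can also be stored as new columns of the same DataFrame. This is a minimal sketch; the column names userid and domain are chosen here for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'email': ['Ravi@example.com',
                             'Raju@example.com',
                             'Alex@example.com']})

# expand=True returns a DataFrame with integer column labels 0 and 1
parts = df['email'].str.split('@', expand=True)

# 'userid' and 'domain' are illustrative names for the new columns
df['userid'] = parts[0]
df['domain'] = parts[1]

print(df)
```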

Handling NaN

import numpy as np
import pandas as pd 
my_dict={'email':['Ravi@example.com','Raju@example.com',np.nan,'Alex@example.com']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.split('@'))
Output
0    [Ravi, example.com]
1    [Raju, example.com]
2                    NaN
3    [Alex, example.com]
Using str.get() to pick one element from each list. Rows with NaN remain NaN.
print(df.email.str.split('@').str.get(0))
Output
0    Ravi
1    Raju
2     NaN
3    Alex
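str.get() also returns NaN when the requested position does not exist in a list, for example when a value has no @ at all. The sketch below assumes a small made-up Series with one missing value and one value without the delimiter.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one valid address, one NaN, one value without '@'
emails = pd.Series(['Ravi@example.com', np.nan, 'no-at-sign'])

parts = emails.str.split('@')   # Series of lists, NaN stays NaN

userid = parts.str.get(0)       # first element of each list
domain = parts.str.get(1)       # NaN where there is no second element

print(userid)
print(domain)
```

Checking domain.isna() afterwards is one way to find rows that are missing or not in the expected format.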

n = int ( default -1, all splits )

We can limit the number of splits with n. By default ( n=-1 ) the string is split at every occurrence of the delimiter. The sample data below has more than one delimiter in each value, and with n=1 only the first dot is used.
import numpy as np
import pandas as pd 
my_dict={'email':['id.Ravi@example.co.in','id.Raju@example.co.in',np.nan,'id.Alex@example.co.in']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.split('.',expand=True,n=1))
Output
     0                   1
0   id  Ravi@example.co.in
1   id  Raju@example.co.in
2  NaN                 NaN
3   id  Alex@example.co.in
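To see the effect of n, we can compare the number of columns returned for different values. A short sketch using the same style of addresses (the variable names are assumptions for illustration):

```python
import pandas as pd

emails = pd.Series(['id.Ravi@example.co.in', 'id.Raju@example.co.in'])

# Default n=-1: split at every dot, three dots give four columns
all_parts = emails.str.split('.', expand=True)

# n=2: only the first two dots are used, the rest stays together
two_splits = emails.str.split('.', n=2, expand=True)

print(all_parts.shape)
print(two_splits.shape)
print(two_splits[2].tolist())
```

A value of n always gives at most n+1 parts.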

rsplit()

rsplit() works like split() but starts from the right side ( the end ) of the string. The difference is visible only when n is given: with n=1 the split is done at the last delimiter.
import numpy as np
import pandas as pd 
my_dict={'email':['id.Ravi@example.co.in','id.Raju@example.co.in',np.nan,'id.Alex@example.co.in']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.rsplit('.',expand=True,n=1))
Output
                    0    1
0  id.Ravi@example.co   in
1  id.Raju@example.co   in
2                 NaN  NaN
3  id.Alex@example.co   in
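Placing split() and rsplit() side by side with the same n makes the direction clear. This is a small sketch with two of the sample addresses; the names left and right are chosen here for illustration.

```python
import pandas as pd

emails = pd.Series(['id.Ravi@example.co.in', 'id.Alex@example.co.in'])

# One split from the left: breaks at the first dot
left = emails.str.split('.', n=1, expand=True)

# One split from the right: breaks at the last dot
right = emails.str.rsplit('.', n=1, expand=True)

print(left)
print(right)
```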

Uses of split()

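A common requirement is to separate the directory and the file name from a path. Here is some sample data with page addresses ( URLs ). Let us collect the directory name and the file name from each one. Splitting on / gives five parts: 'https:', an empty string ( between the two slashes ), the domain, the directory and the file, so the directory is in column 3.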
import pandas as pd 
my_dict={'Page':['https://www.plus2net.com/html_tutorial/button-linking.php',
                 'https://www.plus2net.com/c-tutorial/grade.php',
                 'https://www.plus2net.com/sql_tutorial/between-date.php',
                 'https://www.plus2net.com/php_tutorial/variables2.php',
                 'https://www.plus2net.com/sql_tutorial/sql_like.php',
                 'https://www.plus2net.com/sql_tutorial/sql_sum-multiple.php',
                 'https://www.plus2net.com/sql_tutorial/date-lastweek.php',
                 'https://www.plus2net.com/sql_tutorial/sql_max.php',
                 'https://www.plus2net.com/sql_tutorial/sql_count.php',
                 'https://www.plus2net.com/html_tutorial/html_marquee_behvr.php',
                 'https://www.plus2net.com/javascript_tutorial/clock.php',
                 'https://www.plus2net.com/php_tutorial/php_drop_down_list.php'
]}
df = pd.DataFrame(data=my_dict)
print(df.Page.str.split('/',expand=True)[3])
Output is here
0           html_tutorial
1              c-tutorial
2            sql_tutorial
3            php_tutorial
4            sql_tutorial
5            sql_tutorial
6            sql_tutorial
7            sql_tutorial
8            sql_tutorial
9           html_tutorial
10    javascript_tutorial
11           php_tutorial
To get the file name, use column 4.
print(df.Page.str.split('/',expand=True)[4])
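If only the last parts of the path are needed, rsplit() with a small n and a negative position in str.get() avoids counting the columns from the left. This is an alternative sketch under the assumption that every URL ends with directory/file; it uses two of the sample URLs.

```python
import pandas as pd

pages = pd.Series([
    'https://www.plus2net.com/html_tutorial/button-linking.php',
    'https://www.plus2net.com/c-tutorial/grade.php',
])

# Split once from the right: the last element is the file name
file_name = pages.str.rsplit('/', n=1).str.get(-1)

# Split twice from the right: the second-to-last element is the directory
dir_name = pages.str.rsplit('/', n=2).str.get(-2)

print(file_name.tolist())
print(dir_name.tolist())
```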
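After the split we can add all five parts as new columns. The second part is an empty string because of the double slash after https: .
df[['protocol','empty','domain','dir','file']]=df.Page.str.split('/',expand=True)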


When we are not sure how many columns the split will return ( sometimes 2 columns, sometimes more ), we can name the columns from whatever comes back and join them to the DataFrame.
df3 = df['Page'].str.split('/', expand=True) # column name is 'Page'
df3.columns = ['page_id{}'.format(x+1) for x in df3.columns]
df = df.join(df3)
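The following sketch shows what happens when the rows give a different number of parts. The second URL is a hypothetical shorter address added for illustration; rows with fewer parts are padded with missing values.

```python
import pandas as pd

df = pd.DataFrame({'Page': [
    'https://www.plus2net.com/sql_tutorial/sql_max.php',
    'https://example.com/index.php',   # hypothetical URL with no directory
]})

# Number of columns is decided by the row with the most parts
parts = df['Page'].str.split('/', expand=True)

# Rename 0, 1, 2, ... to page_id1, page_id2, page_id3, ...
parts.columns = ['page_id{}'.format(i + 1) for i in parts.columns]

df = df.join(parts)
print(df.columns.tolist())
print(df['page_id5'].isna().tolist())
```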