str.split()

Pandas

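Break each string in a Series or Index into a list of parts using a delimiter. If no delimiter is given, the string is split on whitespace.
Returns a Series or Index of lists. With expand=True it returns a DataFrame ( or a MultiIndex when called on an Index ).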

No delimiter

import pandas as pd 
my_dict={'name':['Ravi King','Raju Queen','Alex Jack']}
df = pd.DataFrame(data=my_dict)
print(df.name.str.split())
Output
0     [Ravi, King]
1    [Raju, Queen]
2     [Alex, Jack]
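
Splitting without a delimiter follows the behaviour of Python's own str.split(): runs of spaces or tabs count as one separator, and leading or trailing whitespace is ignored. A small sketch with made-up sample strings ( not from the data above ):

```python
import pandas as pd

# sample strings with double spaces, a tab and padding (illustrative data)
s = pd.Series(['Ravi  King', ' Raju\tQueen ', 'Alex Jack'])

# no delimiter: split on any run of whitespace
parts = s.str.split()
print(parts.tolist())
```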

Options

Split each email address at @ to separate the userid from the domain part.
import pandas as pd 
my_dict={'email':['Ravi@example.com','Raju@example.com','Alex@example.com']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.split('@'))
Output
0    [Ravi, example.com]
1    [Raju, example.com]
2    [Alex, example.com]
With the option expand=True, each part goes into its own column and the result is a DataFrame. The columns are numbered from 0, so we can select a part by its column number.
print(df.email.str.split('@',expand=True))
Output
      0            1
0  Ravi  example.com
1  Raju  example.com
2  Alex  example.com
The userid part is in column 0:
print(df.email.str.split('@',expand=True)[0])
Use column 1 ( [1] ) to get the domain part.
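
The two parts can also be stored as new columns of the same DataFrame. This is only a sketch: the column names userid and domain are chosen here for illustration, and n=1 is added so that each address gives at most two parts.

```python
import pandas as pd

df = pd.DataFrame({'email': ['Ravi@example.com',
                             'Raju@example.com',
                             'Alex@example.com']})

# n=1 limits the result to two columns, matching the two target names
# 'userid' and 'domain' are illustrative names, not part of the original data
df[['userid', 'domain']] = df['email'].str.split('@', n=1, expand=True)
print(df)
```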

Handling NaN

import numpy as np
import pandas as pd 
my_dict={'email':['Ravi@example.com','Raju@example.com',np.nan,'Alex@example.com']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.split('@'))
Output
0    [Ravi, example.com]
1    [Raju, example.com]
2                    NaN
3    [Alex, example.com]
str.get() picks one element from each list by its position. Rows holding NaN stay NaN.
print(df.email.str.split('@').str.get(0))
Output
0    Ravi
1    Raju
2     NaN
3    Alex
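
str.get() also returns NaN when the requested position does not exist, for example when an address has no @ at all. The sketch below uses made-up data to show both cases, and one possible way to skip the missing rows with dropna() before splitting.

```python
import numpy as np
import pandas as pd

# illustrative data: one missing value and one string without '@'
emails = pd.Series(['Ravi@example.com', np.nan, 'no-at-sign'])

# position 1 is missing for the NaN row and for the string without '@'
domain = emails.str.split('@').str.get(1)
print(domain)

# drop missing rows first; the original index labels are kept
userid = emails.dropna().str.split('@').str.get(0)
print(userid)
```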

n : int ( default -1, all splits )

We can limit the number of splits with n. By default ( n=-1 ) the string is split at every occurrence of the delimiter. With n=1 only the first dot is used, so each string gives at most two parts. The sample data now contains several delimiters.
import numpy as np
import pandas as pd 
my_dict={'email':['id.Ravi@example.co.in','id.Raju@example.co.in',np.nan,'id.Alex@example.co.in']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.split('.',expand=True,n=1))
Output
     0                   1
0   id  Ravi@example.co.in
1   id  Raju@example.co.in
2  NaN                 NaN
3   id  Alex@example.co.in
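
To see the effect of n more closely, here is a short comparison on a single address from the sample data. The variable names are only for this sketch.

```python
import pandas as pd

s = pd.Series(['id.Ravi@example.co.in'])

one = s.str.split('.', n=1).tolist()    # split at the first dot only
two = s.str.split('.', n=2).tolist()    # split at the first two dots
every = s.str.split('.').tolist()       # default n=-1, split at every dot

print(one)    # [['id', 'Ravi@example.co.in']]
print(two)    # [['id', 'Ravi@example', 'co.in']]
print(every)  # [['id', 'Ravi@example', 'co', 'in']]
```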

rsplit()

rsplit() works like split() but starts from the right side ( the end ) of the string. With n=1 only the last dot is used.
import numpy as np
import pandas as pd 
my_dict={'email':['id.Ravi@example.co.in','id.Raju@example.co.in',np.nan,'id.Alex@example.co.in']}
df = pd.DataFrame(data=my_dict)
print(df.email.str.rsplit('.',expand=True,n=1))
Output
                    0    1
0  id.Ravi@example.co   in
1  id.Raju@example.co   in
2                 NaN  NaN
3  id.Alex@example.co   in
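
The difference between split() and rsplit() only shows when n limits the number of splits. A minimal side-by-side sketch on one address from the sample data:

```python
import pandas as pd

s = pd.Series(['id.Ravi@example.co.in'])

left = s.str.split('.', n=1).tolist()    # first dot from the left
right = s.str.rsplit('.', n=1).tolist()  # first dot from the right

print(left)   # [['id', 'Ravi@example.co.in']]
print(right)  # [['id.Ravi@example.co', 'in']]

# without n, both methods return the same parts
same = s.str.split('.').tolist() == s.str.rsplit('.').tolist()
print(same)   # True
```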

Uses of split()

A common requirement is to separate the directory and the file name from a path. Here is some sample data containing addresses ( URLs ). Let us collect the directory name and the file name from each one.
import pandas as pd 
my_dict={'Page':['https://www.plus2net.com/html_tutorial/button-linking.php',
                 'https://www.plus2net.com/c-tutorial/grade.php',
                 'https://www.plus2net.com/sql_tutorial/between-date.php',
                 'https://www.plus2net.com/php_tutorial/variables2.php',
                 'https://www.plus2net.com/sql_tutorial/sql_like.php',
                 'https://www.plus2net.com/sql_tutorial/sql_sum-multiple.php',
                 'https://www.plus2net.com/sql_tutorial/date-lastweek.php',
                 'https://www.plus2net.com/sql_tutorial/sql_max.php',
                 'https://www.plus2net.com/sql_tutorial/sql_count.php',
                 'https://www.plus2net.com/html_tutorial/html_marquee_behvr.php',
                 'https://www.plus2net.com/javascript_tutorial/clock.php',
                 'https://www.plus2net.com/php_tutorial/php_drop_down_list.php'
]}
df = pd.DataFrame(data=my_dict)
print(df.Page.str.split('/',expand=True)[3])
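Output ( column 3 holds the directory name, because columns 0 to 2 hold https:, the empty string between the two slashes, and the host name )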
0           html_tutorial
1              c-tutorial
2            sql_tutorial
3            php_tutorial
4            sql_tutorial
5            sql_tutorial
6            sql_tutorial
7            sql_tutorial
8            sql_tutorial
9           html_tutorial
10    javascript_tutorial
11           php_tutorial
To get the file name, use column 4:
print(df.Page.str.split('/',expand=True)[4])
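
Column numbers 3 and 4 depend on how many slashes come before the directory. One possible alternative is to count from the right with rsplit() and n=2, which always leaves the last two parts in the last two columns. This is a sketch using two of the URLs above; the column names site, directory and file are chosen here for illustration.

```python
import pandas as pd

pages = pd.Series(['https://www.plus2net.com/html_tutorial/button-linking.php',
                   'https://www.plus2net.com/c-tutorial/grade.php'])

# split at the last two slashes only: site, directory, file
parts = pages.str.rsplit('/', n=2, expand=True)
parts.columns = ['site', 'directory', 'file']  # illustrative names
print(parts[['directory', 'file']])
```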


©2000-2020 plus2net.com All rights reserved worldwide