Split Pandas DataFrame column by Multiple delimiters

Pandas

In this post, we will learn about how to Split the Pandas DataFrame columns by Multiple delimiters with the help of the Pandas series.str.split() Method and regular expression for multiple delimiters.

To better understand the Split Pandas DataFrame column by Multiple delimiters let us first understand how the Pandas series.str.split() method works.

Pandas series.str.split() Method


The Pandas.series.str.split() method is used to split the string based on a delimiter. If delimiter is not given by default it uses whitespace to split the string.

Syntax

series.str.split((pat=None, n=- 1, expand=False)

Parmeters

  • Pat : String or regular expression.If not given ,split is based on whitespace.
  • N :It limits the number of split in output.
  • Expand : Expand the string into separate column.
    • If True, return DataFrame/MultiIndex expanding dimensionality.
    • If False, return Series/Index, containing lists of strings.

1. Split Pandas DataFrame column by single Delimiter


In this example, we are splitting columns into multiple columns using the str.split() method with delimiter hyphen(-). We have dataframe column “Mark” that we are splitting into “Mark” and “Mark_” columns.

We can use any delimiter as per need.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','num-100'],['Jack','Sci','x-100'],
                   ['Max','Phy','f-99'],['Rack','Music','num-80']], 
                  columns = ['Name','Subj','Mark'])

#column split based on delimiter hypen

df_stu[[' Mark_','Mark']] = df_stu['Mark'].str.split('-',expand=True)


print(df_stu)

Output

   Name   Subj Mark   Mark_
0  John   Math  100    num
1  Jack    Sci  100     x
2   Max    Phy   99     f
3  Rack  Music   80     num

2. Split Pandas DataFrame column by Mutiple Delimiter


In this example, we are using the str.split() method to split the “Mark ” column into multiple columns by using this multiple delimiter (- _; / %) The “Mark ” column will be split as “Mark “ and “Mark _”.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','num%100'],['Jack','Sci','x/100'],
                  ['Max','Phy','f_99'],['Rack','Music','num;80'],['Jon','Phy','num-100']], 
                  columns = ['Name','Subj','Mark'])


df_stu[[' Mark_','Mark']] = df_stu['Mark'].str.split('/|;|_|%|-',expand=True)                 


print(df_stu)

Output

   Name   Subj Mark  Mark_
0  John   Math  100    num
1  Jack    Sci  100      x
2   Max    Phy   99      f
3  Rack  Music   80    num
4   Jon    Phy  100    num


3. Split column by Multiple delimiters no digit check


In this example, we are splitting the column “MobileNo” that contains multiple delimiters instead of splitting based on multiple delimiters. We have used only no digit check on column”MobileNo” to remove multiple delimiters and split into multiple columns using str.split() method

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','100','21345443/960010678'],['Jack','Sci','100','12345442/950010679'],
                  ['Max','Phy','99','12343087;555676731'],['Rack','Music','80','23450055,636414129'],['Jon','Phy','100','24346442/850010670']], 
                  columns = ['Name','Subj','Mark','MobileNo'])


df_stu[[' MobileNo_1','MobileNo']] = df_stu['MobileNo'].str.split(r'\D',expand=True)                 


print('non-digit check:\n',df_stu)



Output

   Name   Subj Mark   MobileNo  MobileNo_1
0  John   Math  100  960010678    21345443
1  Jack    Sci  100  950010679    12345442
2   Max    Phy   99  555676731    12343087
3  Rack  Music   80  636414129    23450055
4   Jon    Phy  100  850010670    24346442

4. Split column by Multiple delimiters based on digit check


In this example, we are splitting the Dataframe column into multiple columns by using the str.split() method with only a digit check.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','100','21345443/960010678'],['Jack','Sci','100','12345442/950010679'],
                  ['Max','Phy','99','12343087;555676731'],['Rack','Music','80','23450055,636414129'],['Jon','Phy','100','24346442/850010670']], 
                  columns = ['Name','Subj','Mark','MobileNo'])


df_stu[[' MobileNo_1','MobileNo']] = df_stu['MobileNo'].str.extractall(r'(\d+)')[0].unstack()                


print(df_stu)

5. Split column by Domain name and emailID


In this example, we are splitting the EmailID column based on a delimiter (@) into multiple columns with the help str.split() method by splitting the emailID from Domain.

Program Example

# Python PandasPprogram to extract domain name from EmailID

import pandas as pd

df_stu = pd.DataFrame([['John','Math','kkk@gmail.com'],['Jack','Sci','fff@devenum.com'],
                  ['Max','Phy','ggg@devenum.com'],['Rack','Music','sss@yahoo.com'],['Jon','Phy','sss@gmail.com']], 
                  columns = ['Name','Subj','EmailID'])


df_stu['Domain'] = df_stu['EmailID'].str.split('@').str[1]

print(df_stu)

Output

  Name   Subj          EmailID       Domain
0  John   Math    kkk@gmail.com    gmail.com
1  Jack    Sci  fff@devenum.com  devenum.com
2   Max    Phy  ggg@devenum.com  devenum.com
3  Rack  Music    sss@yahoo.com    yahoo.com
4   Jon    Phy    sss@gmail.com    gmail.com

6. Split column based on Mutiple possible delimiter


In this example, we are using the regular expression to split the column based on Multiple possible delimiters by using str.split(‘[ /,;]’,expand=True).The str.split() method except single or multiple delimiters using regular expression

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','Mr.John,fox,Smith'],['Jack','Sci','Mr.Jack/snow,sami'],
                  ['Max','Phy','Mr.Max,rax;devid']], 
                  columns = ['Name','Subj','FullName'])


df_stu[['FirstName','MiddleName','LastName']]  = df_stu.FullName.str.split('[ /,;]', expand=True)

print(df_stu)

Output

   Name  Subj           FullName FirstName MiddleName LastName
0  John  Math  Mr.John,fox,Smith   Mr.John        fox    Smith
1  Jack   Sci  Mr.Jack/snow,sami   Mr.Jack       snow     sami
2   Max   Phy   Mr.Max,rax;devid    Mr.Max        rax    devid

Summary

In this post, we have learned multiple ways to Split the Pandas DataFrame column by Multiple delimiters with the help of examples that includes a single delimiter, multiple delimiters, Using a regular expression, split based on only digit check or non-digit check by using Pandas series.str.split() method.