In this post, we will learn about how to Split the Pandas DataFrame columns by Multiple delimiters with the help of the Pandas series.str.split() Method and regular expression for multiple delimiters. To better understand the Split Pandas DataFrame column by Multiple delimiters let us first understand how the Pandas series.str.split() method works.
Pandas series.str.split() Method
The Pandas.series.str.split() method is used to split the string based on a delimiter. If delimiter is not given by default it uses whitespace to split the string.
Syntax
series.str.split((pat=None, n=- 1, expand=False)
Parmeters
- Pat : String or regular expression.If not given ,split is based on whitespace.
- N :It limits the number of split in output.
- Expand : Expand the string into separate column.
- If
True
, return DataFrame/MultiIndex expanding dimensionality. - If
False
, return Series/Index, containing lists of strings.
- If
1. Split Pandas DataFrame column by single Delimiter
In this example, we are splitting columns into multiple columns using the str.split() method with delimiter hyphen(-). We have dataframe column “Mark” that we are splitting into “Mark” and “Mark_” columns.
We can use any delimiter as per need.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','num-100'],['Jack','Sci','x-100'],
['Max','Phy','f-99'],['Rack','Music','num-80']],
columns = ['Name','Subj','Mark'])
#column split based on delimiter hypen
df_stu[[' Mark_','Mark']] = df_stu['Mark'].str.split('-',expand=True)
print(df_stu)
Output
Name Subj Mark Mark_
0 John Math 100 num
1 Jack Sci 100 x
2 Max Phy 99 f
3 Rack Music 80 num
2. Split Pandas DataFrame column by Mutiple Delimiter
In this example, we are using the str.split() method to split the “Mark ” column into multiple columns by using this multiple delimiter (- _; / %) The “Mark ” column will be split as “Mark “ and “Mark _”.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','num%100'],['Jack','Sci','x/100'],
['Max','Phy','f_99'],['Rack','Music','num;80'],['Jon','Phy','num-100']],
columns = ['Name','Subj','Mark'])
df_stu[[' Mark_','Mark']] = df_stu['Mark'].str.split('/|;|_|%|-',expand=True)
print(df_stu)
Output
Name Subj Mark Mark_
0 John Math 100 num
1 Jack Sci 100 x
2 Max Phy 99 f
3 Rack Music 80 num
4 Jon Phy 100 num
3. Split column by Multiple delimiters no digit check
In this example, we are splitting the column “MobileNo” that contains multiple delimiters instead of splitting based on multiple delimiters. We have used only no digit check on column”MobileNo” to remove multiple delimiters and split into multiple columns using str.split() method
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','100','21345443/960010678'],['Jack','Sci','100','12345442/950010679'],
['Max','Phy','99','12343087;555676731'],['Rack','Music','80','23450055,636414129'],['Jon','Phy','100','24346442/850010670']],
columns = ['Name','Subj','Mark','MobileNo'])
df_stu[[' MobileNo_1','MobileNo']] = df_stu['MobileNo'].str.split(r'\D',expand=True)
print('non-digit check:\n',df_stu)
Output
Name Subj Mark MobileNo MobileNo_1
0 John Math 100 960010678 21345443
1 Jack Sci 100 950010679 12345442
2 Max Phy 99 555676731 12343087
3 Rack Music 80 636414129 23450055
4 Jon Phy 100 850010670 24346442
4. Split column by Multiple delimiters based on digit check
In this example, we are splitting the Dataframe column into multiple columns by using the str.split() method with only a digit check.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','100','21345443/960010678'],['Jack','Sci','100','12345442/950010679'],
['Max','Phy','99','12343087;555676731'],['Rack','Music','80','23450055,636414129'],['Jon','Phy','100','24346442/850010670']],
columns = ['Name','Subj','Mark','MobileNo'])
df_stu[[' MobileNo_1','MobileNo']] = df_stu['MobileNo'].str.extractall(r'(\d+)')[0].unstack()
print(df_stu)
5. Split column by Domain name and emailID
In this example, we are splitting the EmailID column based on a delimiter (@) into multiple columns with the help str.split() method by splitting the emailID from Domain.
Program Example
# Python PandasPprogram to extract domain name from EmailID
import pandas as pd
df_stu = pd.DataFrame([['John','Math','kkk@gmail.com'],['Jack','Sci','fff@devenum.com'],
['Max','Phy','ggg@devenum.com'],['Rack','Music','sss@yahoo.com'],['Jon','Phy','sss@gmail.com']],
columns = ['Name','Subj','EmailID'])
df_stu['Domain'] = df_stu['EmailID'].str.split('@').str[1]
print(df_stu)
Output
Name Subj EmailID Domain
0 John Math kkk@gmail.com gmail.com
1 Jack Sci fff@devenum.com devenum.com
2 Max Phy ggg@devenum.com devenum.com
3 Rack Music sss@yahoo.com yahoo.com
4 Jon Phy sss@gmail.com gmail.com
6. Split column based on Mutiple possible delimiter
In this example, we are using the regular expression to split the column based on Multiple possible delimiters by using str.split(‘[ /,;]’,expand=True).The str.split() method except single or multiple delimiters using regular expression
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','Mr.John,fox,Smith'],['Jack','Sci','Mr.Jack/snow,sami'],
['Max','Phy','Mr.Max,rax;devid']],
columns = ['Name','Subj','FullName'])
df_stu[['FirstName','MiddleName','LastName']] = df_stu.FullName.str.split('[ /,;]', expand=True)
print(df_stu)
Output
Name Subj FullName FirstName MiddleName LastName
0 John Math Mr.John,fox,Smith Mr.John fox Smith
1 Jack Sci Mr.Jack/snow,sami Mr.Jack snow sami
2 Max Phy Mr.Max,rax;devid Mr.Max rax devid
Summary
In this post, we have learned multiple ways to Split the Pandas DataFrame column by Multiple delimiters with the help of examples that includes a single delimiter, multiple delimiters, Using a regular expression, split based on only digit check or non-digit check by using Pandas series.str.split() method.