A DataFrame is a tabular data structure. Often we do not need the whole dataframe and want to split it by rows or columns instead. In this post, we split a Pandas DataFrame by rows and columns using delimiters, indexes, and column names, with multiple approaches and examples.
1. Split Pandas dataframe column by delimiter
This dataframe contains a Mark column whose values are delimited by a hyphen (-). We split each value of the Mark column on the hyphen into two columns: Mark (the part after the hyphen) and Mark_ (the part before it).
Any other delimiter, such as a comma (,), hyphen (-), or slash (/), can be used as required.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','num-100'],['Jack','Sci','x-100'],
['Max','Phy','f-99'],['Rack','Music','num-80']],
columns = ['Name','Subj','Mark'])
#column split based on delimiter
df_stu[['Mark_','Mark']] = df_stu['Mark'].str.split('-',expand=True)
print(df_stu)
Output
Name Subj Mark Mark_
0 John Math 100 num
1 Jack Sci 100 x
2 Max Phy 99 f
3 Rack Music 80 num
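The split above assumes every value contains exactly one hyphen. As a minimal sketch of what to do when that may not hold, the n argument of str.split() limits the number of splits so each row still yields two parts. The data and the column names Mark_ and Score below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the second value contains two hyphens.
df_demo = pd.DataFrame({'Mark': ['num-100', 'x-y-90']})

# n=1 splits only at the first hyphen, so every row yields exactly two parts.
df_demo[['Mark_', 'Score']] = df_demo['Mark'].str.split('-', n=1, expand=True)
print(df_demo)
```

Without n=1, the second row would produce three parts and the two-column assignment would fail.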
2. Split Pandas dataframe column by multiple delimiters
Sometimes a column has to be split on several delimiters instead of a single one. Here we split the dataframe column Mark on any of the delimiters in the regular expression '/|;|_|%|-', where the pipe (|) means "or".
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math','num%100'],['Jack','Sci','x/100'],
['Max','Phy','f_99'],['Rack','Music','num;80']],
columns = ['Name','Subj','Mark'])
df_stu[['Mark_','Mark']] = df_stu['Mark'].str.split('/|;|_|%|-',expand=True)
print(df_stu)
Output
Name Subj Mark Mark_
0 John Math 100 num
1 Jack Sci 100 x
2 Max Phy 99 f
3 Rack Music 80 num
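The same multi-delimiter split can also be written with a regular-expression character class instead of a list of alternatives. This is only a sketch on a standalone Series (the variable names marks and parts are illustrative); it relies on str.split() treating a pattern longer than one character as a regular expression.

```python
import pandas as pd

# Same kind of values as above, plus one hyphen-delimited value.
marks = pd.Series(['num%100', 'x/100', 'f_99', 'num;80', 'y-70'])

# Character class: split on any one of / ; _ % -
# The hyphen is placed last so it is read literally, not as a range.
parts = marks.str.split(r'[/;_%-]', expand=True)
parts.columns = ['Mark_', 'Mark']
print(parts)
```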
3. iloc to split dataframe by multiple row indexes
The dataframe iloc indexer slices a dataframe by integer position, in the form df.iloc[rows, columns]. A bare colon (:) selects every row or every column on that axis. In the example below, iloc[1:,:] keeps the rows from position 1 onward and iloc[:3,:] keeps the first three rows, each with all columns.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
['Max','Phy',99],['Rack','Music',80]],
columns = ['Name','Subj','Mark'])
res_df = df_stu.iloc[1:,:]
res_df1 = df_stu.iloc[:3,:]
print(res_df,'\n')
print(res_df1)
Output
Name Subj Mark
1 Jack Sci 100
2 Max Phy 99
3 Rack Music 80
Name Subj Mark
0 John Math 100
1 Jack Sci 100
2 Max Phy 99
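The two slices above overlap. If the goal is to cut the dataframe into two non-overlapping pieces, one possible sketch is to slice on both sides of the same position. The name split_at and its value are assumptions chosen for illustration.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

split_at = 2  # hypothetical cut position

# Rows before the cut position and rows from the cut position onward.
top_df = df_stu.iloc[:split_at]
bottom_df = df_stu.iloc[split_at:]

print(top_df, '\n')
print(bottom_df)
```

Concatenating top_df and bottom_df with pd.concat() gives back the original dataframe.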
4. Pandas Split dataframe by list of indexes
In this example, we use the dataframe iloc indexer with a list of row positions. The rows are returned in the order given in the list, so [2, 0] returns row 2 followed by row 0.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
['Max','Phy',99],['Rack','Music',80]],
columns = ['Name','Subj','Mark'])
res_df = df_stu.iloc[[2, 0], :]
print(res_df,'\n')
Output
Name Subj Mark
2 Max Phy 99
0 John Math 100
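To also keep the rows that were not picked, one option is to drop the selected index labels from the original dataframe. This is a sketch, not the only way to do it; the names picked_pos, picked_df, and rest_df are illustrative.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

picked_pos = [2, 0]  # row positions to pull out

# Selected rows, in the order given in the list.
picked_df = df_stu.iloc[picked_pos]

# Everything else: translate positions to index labels, then drop them.
rest_df = df_stu.drop(df_stu.index[picked_pos])

print(picked_df, '\n')
print(rest_df)
```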
5. iloc to split dataframe by column index
In this example, we use the dataframe iloc indexer to split the dataframe by column position. The slice :2 selects the columns at positions 0 and 1, and the slice :3 selects the columns at positions 0 to 2. The end position of a slice is not included.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
['Max','Phy',99],['Rack','Music',80]],
columns = ['Name','Subj','Mark'])
#first two columns (positions 0 and 1)
res_df = df_stu.iloc[:,:2]
#first three columns (positions 0 to 2)
res_df1 = df_stu.iloc[:,:3]
print(res_df,'\n')
print(res_df1)
Output
Name Subj
0 John Math
1 Jack Sci
2 Max Phy
3 Rack Music
Name Subj Mark
0 John Math 100
1 Jack Sci 100
2 Max Phy 99
3 Rack Music 80
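Here the second result contains every column of the first. For a split into two disjoint sets of columns, a minimal sketch is to slice on both sides of one column position (left_df and right_df are illustrative names).

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

# Columns at positions 0 and 1 go left, the remaining columns go right.
left_df = df_stu.iloc[:, :2]
right_df = df_stu.iloc[:, 2:]

print(left_df, '\n')
print(right_df)
```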
6. Pandas split dataframe by column Name
In this example, instead of using index positions, we split the dataframe by column labels, passing a list of column names in square brackets.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
['Max','Phy',99],['Rack','Music',80]],
columns = ['Name','Subj','Mark'])
res_df = df_stu[['Subj','Mark']]
print(res_df,'\n')
Output
Subj Mark
0 Math 100
1 Sci 100
2 Phy 99
3 Music 80
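If the columns that were left out are needed as a second dataframe, drop(columns=...) is one way to get them. The sketch below assumes the same df_stu data; wanted, selected_df, and remaining_df are illustrative names.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

wanted = ['Subj', 'Mark']

# Columns selected by label, and the complement obtained by dropping them.
selected_df = df_stu[wanted]
remaining_df = df_stu.drop(columns=wanted)

print(selected_df, '\n')
print(remaining_df)
```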
7. Split dataframe Using Groupby
The pandas groupby() method groups the rows of a dataframe by the values of one or more columns, so each group can be pulled out as its own dataframe.
In this example, we use groupby() to group the data by "Subj" and then use get_group() to select the rows where the subject is "Math".
Program Example
import pandas as pd
df_stu = pd.DataFrame({
'Name': ["John","Max","Rack","Tax"],
'Subj': ["Math","Math","Sci","Math"],
'Marks':[100,100,99,99]
})
result_df = df_stu.groupby('Subj')
print(result_df.get_group('Math'))
Output
Name Subj Marks
0 John Math 100
1 Max Math 100
3 Tax Math 99
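get_group() returns one group at a time. To split the dataframe into all of its groups at once, one possible sketch is to iterate over the groupby object and collect each group in a dictionary keyed by subject. The name groups is illustrative.

```python
import pandas as pd

df_stu = pd.DataFrame({
    'Name': ["John", "Max", "Rack", "Tax"],
    'Subj': ["Math", "Math", "Sci", "Math"],
    'Marks': [100, 100, 99, 99]
})

# Iterating a groupby object yields (group key, sub-dataframe) pairs.
groups = {subj: group for subj, group in df_stu.groupby('Subj')}

for subj, group in groups.items():
    print(subj)
    print(group, '\n')
```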
8. sample() method to split dataframe in Pandas
The sample() method returns a random sample of rows from the dataframe, or of columns when axis=1 is passed.
In this example, frac=0.9 asks for 90% of the rows and random_state makes the same random selection come back on every run. Because the dataframe has only four rows, 90% (3.6 rows) is rounded to four, so all rows are returned in shuffled order.
Program Example
import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
['Max','Phy',99],['Rack','Music',80]],
columns = ['Name','Subj','Mark'])
result_df = df_stu.sample(frac=0.9,random_state=60)
print("split from df_student: \n")
print(result_df)
Output
split from df_student:
Name Subj Mark
2 Max Phy 99
0 John Math 100
3 Rack Music 80
1 Jack Sci 100
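To split the dataframe into a random part and the remainder, a common pattern is to sample first and then drop the sampled index labels. The sketch below uses frac=0.75 so that the four-row dataframe gives three sampled rows and one left over; the fraction and the names part_df and rest_df are chosen for illustration.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

# Random 75% of the rows (3 of 4), reproducible through random_state.
part_df = df_stu.sample(frac=0.75, random_state=60)

# The rows that were not sampled.
rest_df = df_stu.drop(part_df.index)

print(part_df, '\n')
print(rest_df)
```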
Summary
In this post, we have learned how to split a Pandas DataFrame by delimiter, by row and column index, by column name, with groupby(), and with the sample() method.