Split Pandas DataFrame by rows and columns

A DataFrame is a tabular data structure. Often we do not need the whole DataFrame, only a part of it selected by rows or by columns. This post shows several ways to split a Pandas DataFrame by rows and columns: by delimiter, by index, by column name, with groupby(), and with sample(), each with an example.

1. Split Pandas dataframe column by delimiter


This dataframe has a Mark column whose values contain a hyphen (-) delimiter. We split each value of the Mark column at the hyphen into two columns: Mark_ (the text before the delimiter) and Mark (the number after it).

Any other delimiter, such as a comma (,) or a slash (/), can be used in the same way.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','num-100'],['Jack','Sci','x-100'],
                   ['Max','Phy','f-99'],['Rack','Music','num-80']], 
                  columns = ['Name','Subj','Mark'])

# split the Mark column at the hyphen; expand=True returns one column per part

df_stu[['Mark_','Mark']] = df_stu['Mark'].str.split('-',expand=True)


print(df_stu)

Output

   Name   Subj  Mark  Mark_
0  John   Math   100    num
1  Jack    Sci   100      x
2   Max    Phy    99      f
3  Rack  Music    80    num
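
A value may contain the delimiter more than once, in which case a plain split produces more than two parts and the two-column assignment above no longer fits. As a minimal sketch, assuming hypothetical data (the df_demo frame and its values are made up for illustration), the n parameter of str.split() limits the number of splits:

```python
import pandas as pd

# Hypothetical data: the second value contains two hyphens.
df_demo = pd.DataFrame({'Mark': ['num-100', 'x-y-95', 'f-99']})

# n=1 splits only at the first hyphen, so every row yields exactly two parts.
df_demo[['Mark_', 'Mark']] = df_demo['Mark'].str.split('-', n=1, expand=True)

print(df_demo)
```

With n=1, everything after the first hyphen stays together in the second part.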

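2. Split Pandas dataframe column by multiple delimiters


Sometimes the values in a column use different delimiters. Here we split the dataframe column Mark on any of several delimiters by passing them as alternatives separated by the pipe character: '/|;|_|%|-'. A pattern longer than one character is treated as a regular expression.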

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math','num%100'],['Jack','Sci','x/100'],
                  ['Max','Phy','f_99'],['Rack','Music','num;80']], 
                  columns = ['Name','Subj','Mark'])


# split the Mark column at any of the delimiters / ; _ % -
df_stu[['Mark_','Mark']] = df_stu['Mark'].str.split('/|;|_|%|-',expand=True)


print(df_stu)

Output

   Name   Subj  Mark  Mark_
0  John   Math   100    num
1  Jack    Sci   100      x
2   Max    Phy    99      f
3  Rack  Music    80    num
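
The same set of single-character delimiters can also be written as a regular-expression character class. The sketch below is one possible variant, not the only way to do it; the names df_marks and parts are chosen here for illustration. It also converts the numeric part to integers, since the pieces returned by str.split() are strings.

```python
import pandas as pd

df_marks = pd.DataFrame({'Mark': ['num%100', 'x/100', 'f_99', 'num;80']})

# A character class matches any one of the listed delimiters.
# The hyphen is placed last so it is read literally, not as a range.
parts = df_marks['Mark'].str.split(r'[/;_%-]', expand=True)
parts.columns = ['Mark_', 'Mark']

# The split pieces are strings; convert the numeric part explicitly.
parts['Mark'] = parts['Mark'].astype(int)

print(parts)
```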

3. iloc() method to split dataframe by multiple row indexes


The dataframe iloc indexer slices the dataframe by integer position, selecting entries from a range of row and column positions. It is written with square brackets rather than called like a function. The end of a slice is exclusive, so iloc[:3,:] returns the rows at positions 0, 1 and 2. If a bare colon (:) is passed for the rows or for the columns, all rows or all columns are included in the output.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                   ['Max','Phy',99],['Rack','Music',80]], 
                  columns = ['Name','Subj','Mark'])

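# rows from position 1 to the end, all columns
res_df = df_stu.iloc[1:,:]
# rows at positions 0, 1 and 2 (the end position 3 is excluded), all columns
res_df1 = df_stu.iloc[:3,:]
print(res_df,'\n')
print(res_df1)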
 

Output

   Name   Subj  Mark
1  Jack    Sci   100
2   Max    Phy    99
3  Rack  Music    80 

   Name  Subj  Mark
0  John  Math   100
1  Jack   Sci   100
2   Max   Phy    99
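
The two slices above overlap. To cut a dataframe into two parts that do not overlap, the same position can be used as the end of one slice and the start of the other. A minimal sketch, where split_at is a hypothetical cut position chosen for illustration:

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

split_at = 2  # hypothetical cut position

top = df_stu.iloc[:split_at]      # rows at positions 0 and 1
bottom = df_stu.iloc[split_at:]   # rows from position 2 to the end

print(top, '\n')
print(bottom)
```

Because the slice end is exclusive, no row lands in both parts, and concatenating top and bottom gives back the original dataframe.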

4. Pandas Split dataframe by list of indexes


In this example, we pass a list of integer positions to the dataframe iloc indexer. The selected rows are returned in the order given in the list, here position 2 followed by position 0.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                   ['Max','Phy',99],['Rack','Music',80]], 
                  columns = ['Name','Subj','Mark'])

# rows at positions 2 and 0, in that order, with all columns
res_df = df_stu.iloc[[2, 0], :]
print(res_df,'\n')

Output

   Name  Subj  Mark
2   Max   Phy    99
0  John  Math   100
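
When the rows that were not selected are needed as a second dataframe, one option is to drop the selected index labels from the original. This is a sketch under the assumption that the index labels are unique; the names positions, selected and rest are illustrative.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

positions = [2, 0]

# Rows at the listed positions.
selected = df_stu.iloc[positions]

# Translate the positions to index labels and drop them to get the other rows.
rest = df_stu.drop(df_stu.index[positions])

print(selected, '\n')
print(rest)
```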

5. iloc() to split dataframe by column index


In this example, we use the DataFrame iloc indexer to split the dataframe by column position. The slice :2 selects the first two columns (positions 0 and 1) and the slice :3 selects the first three columns (positions 0, 1 and 2); the end of the slice is not included.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                   ['Max','Phy',99],['Rack','Music',80]], 
                  columns = ['Name','Subj','Mark'])


# all rows, columns at positions 0 and 1
res_df = df_stu.iloc[:,:2]

# all rows, columns at positions 0, 1 and 2
res_df1 = df_stu.iloc[:,:3]

print(res_df,'\n')
print(res_df1)    

Output

   Name   Subj
0  John   Math
1  Jack    Sci
2   Max    Phy
3  Rack  Music 

   Name   Subj  Mark
0  John   Math   100
1  Jack    Sci   100
2   Max    Phy    99
3  Rack  Music    80
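
Row and column selections can be combined in a single iloc expression. As a small sketch, the rows and columns picked below are arbitrary choices for illustration:

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

# Rows at positions 1 and 2, columns at positions 0 and 2 (Name and Mark).
subset = df_stu.iloc[1:3, [0, 2]]

print(subset)
```

The first argument selects rows and the second selects columns; each can be a slice or a list of positions.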

6. Pandas split dataframe by column Name


In this example, instead of using index positions, we split the dataframe by column label, passing a list of the column names to keep.

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                   ['Max','Phy',99],['Rack','Music',80]], 
                  columns = ['Name','Subj','Mark'])

res_df  = df_stu[['Subj','Mark']]
   
print(res_df,'\n')

Output

    Subj  Mark
0   Math   100
1    Sci   100
2    Phy    99
3  Music    80 
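
Two related label-based options are sketched below: a label slice with loc, and drop(columns=...) to keep everything except the named columns. Unlike iloc, a loc label slice includes its end label. The variable names by_slice and remaining are illustrative.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

# Label slice: every column from 'Subj' through 'Mark', both ends included.
by_slice = df_stu.loc[:, 'Subj':'Mark']

# The complement: all columns except the ones named.
remaining = df_stu.drop(columns=['Subj', 'Mark'])

print(by_slice, '\n')
print(remaining)
```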

7. Split dataframe Using Groupby


The pandas groupby() method groups the rows of a dataframe by the values of one or more columns. Each group can then be retrieved as its own dataframe, which splits the data by category.

In this example, we are using groupby() to group data based on “Subj” and selecting the value of subject “Math”.

Program Example

import pandas as pd

df_stu = pd.DataFrame({
    'Name': ["John","Max","Rack","Tax"],
    'Subj': ["Math","Math","Sci","Math"],
    'Marks':[100,100,99,99]
})


result_df = df_stu.groupby('Subj')


print(result_df.get_group('Math'))

Output

   Name  Subj  Marks
0  John  Math    100
1   Max  Math    100
3   Tax  Math     99
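
To get every group at once rather than one group by name, the groupby object can be iterated; each step yields the group key and the matching sub-dataframe. A minimal sketch that collects them in a dictionary (the name groups is illustrative):

```python
import pandas as pd

df_stu = pd.DataFrame({
    'Name': ["John", "Max", "Rack", "Tax"],
    'Subj': ["Math", "Math", "Sci", "Math"],
    'Marks': [100, 100, 99, 99]
})

# One dataframe per distinct value of 'Subj', keyed by that value.
groups = {subj: frame for subj, frame in df_stu.groupby('Subj')}

for subj, frame in groups.items():
    print(subj)
    print(frame, '\n')
```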

8. Sample() method to split dataframe in Pandas


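The sample() method returns a random sample of rows (or of columns, with axis=1) from the dataframe.

In this example, frac=0.9 asks for 90% of the rows, and random_state makes the sample reproducible so that the same rows are returned on every run. Because this dataframe has only 4 rows and the requested count is rounded (0.9 x 4 = 3.6, which rounds to 4), all four rows come back, in shuffled order.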

Program Example

import pandas as pd

df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                   ['Max','Phy',99],['Rack','Music',80]], 
                  columns = ['Name','Subj','Mark'])



result_df = df_stu.sample(frac=0.9,random_state=60)

print("split from df_student: \n")
print(result_df)


Output

split from df_student: 

   Name   Subj  Mark
2   Max    Phy    99
0  John   Math   100
3  Rack  Music    80
1  Jack    Sci   100
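
A common use of sample() is to split a dataframe into a random part and the remainder. The sketch below assumes unique index labels and uses frac=0.5 purely for illustration; the names sampled and rest are not from the original example.

```python
import pandas as pd

df_stu = pd.DataFrame([['John', 'Math', 100], ['Jack', 'Sci', 100],
                       ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
                      columns=['Name', 'Subj', 'Mark'])

# Randomly pick half of the rows; random_state keeps the pick reproducible.
sampled = df_stu.sample(frac=0.5, random_state=60)

# Everything that was not sampled, found by dropping the sampled index labels.
rest = df_stu.drop(sampled.index)

print(sampled, '\n')
print(rest)
```

Together the two parts contain every row of the original dataframe exactly once.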

Summary

In this post, we have learned how to split a Pandas DataFrame by delimiter, by row and column index, by column name, with groupby(), and with the sample() method.