How to group by multiple columns Pandas

In this post, we will learn How to group by multiple columns in Pandas. The groupby() function is used to group the same repeated values in a given data and return a Dataframegroupby containing results after applying aggregate functions like sum() and mean(), max(), and count(), min(),median().In this example, we are using the groupby() function by multiple conditions.

1. How to group by multiple columns Pandas and Sum


To group by multiple columns of Pandas Dataframe, We have passed the list of columns [‘Name’,’ Marks’] as parameters to the groupby() function that will group the same values of columns ‘Name’,’ Marks’ and apply the sum() function on columns ‘Fee’, ‘Tution_Fee’.The reset_index() is used to set a new index of Dataframe by applying it to display results properly.

  • Group the data of multiple columns [‘Name’,’ Marks’].
  • The sum() function is applied to the result returned by groupby().
import pandas as pd
 
    
data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],     
    'Marks':[97,97,100,97],    
    'Fee':[100,100,300,400],    
    'Tution_Fee':[400,400,600,700]
}
  
  
dfobj = pd.DataFrame(data) 
 
dfobj  = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].sum().reset_index()
 
 
 
print(dfobj)

output

  Name  Marks  Fee  Tution_Fee
0   Max    100  300         600
1  Rama     97  600        1500

2. How to group by multiple columns Pandas


In the above example, We have used groupby() with function sum(). We can return the sum of multiple or single columns on groupby using agg().We have used agg() to apply multiple statistical functions like ‘mean’, ‘count’, ‘sum’,’min’,’max’ that will return sum, mean count min and max from multiple columns[‘Fee’, ‘Tution_Fee’] by using group by [‘Name’,’Marks’].

  • To apply a single function using group by change the below statement to dfobj.groupby([‘Name’,’Marks’]) [‘Fee’, ‘Tution_Fee’].agg(‘sum’).
import pandas as pd
 
    
data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],     
    'Marks':[97,97,100,97],    
    'Fee':[100,100,300,100],    
    'Tution_Fee':[400,400,600,400]
}
 
  
  
dfobj = pd.DataFrame(data)
 
 

 
 
dfobj  = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].agg(['mean', 'count', 'sum','min','max'])

 
 
 
print(dfobj)

Output

Fee                      Tution_Fee                      
           mean count  sum  min  max       mean count   sum  min  max
Name Marks                                                           
Max  100    300     1  300  300  300        600     1   600  600  600
Rama 97     100     3  300  100  100        400     3  1200  400  400

3. How to group by multiple columns Pandas count


To achieve the Pandas group by count, First, we have grouped the entire dataframe based on identical values of columns [‘Name’,’Marks’] and applied the count() function to the resulting group data. The reset_index() is used to set the new index of the Dataframe to display the dataframe in a correct format.

import pandas as pd
 
    
data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],     
    'Marks':[97,97,100,97],    
    'Fee':[100,100,300,100],    
    'Tution_Fee':[400,400,600,400]
}
 
  
  
dfobj = pd.DataFrame(data)
 
 

 
 
dfobj  = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].count().reset_index()

print(dfobj)

Output

  Name  Marks  Fee  Tution_Fee
0   Max    100    1           1
1  Rama     97    3           3

4. Pandas groupby and agg() multiple columns


In this example, we have grouped the entire dataframe based on identical values of columns [‘Name’,’Marks’] and aggregate the values of columns [‘Fee’, ‘Tution_Fee’] using the apply() method of Pandas dataframe. The apply() function is used to apply a function on that data frame.

import pandas as pd
 
    
data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],     
    'Marks':[97,97,100,97],    
    'Fee':[100,100,300,100],    
    'Tution_Fee':[400,400,600,400]
}
 
  
  
dfobj = pd.DataFrame(data)
 
 

fun= {'Fee':['sum','mean'], 'Tution_Fee':['max','min']}
 
dfobj  = dfobj.groupby(['Name','Marks']).agg(fun).reset_index()




print(dfobj)

Output

Name Marks  Fee      Tution_Fee     
               sum mean        max  min
0   Max   100  300  300        600  600
1  Rama    97  300  100        400  400

5. Pandas groupby multiple columns count distinct


Sometimes we have grouped by multiple columns with identical values and we want to count distinct values from the group result data. In this below example We have grouped the same values in a dataframe by using multiple columns [‘Name’,’Marks’] and count distinct by using nunique() method. The reset_index() to set a new index after group data to display the dataframe.

import pandas as pd
 
    
data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],     
    'Marks':[97,97,100,97],    
    'Fee':[100,100,300,100],    
    'Tution_Fee':[400,400,600,400]
}
 
  
  
dfobj = pd.DataFrame(data)
 
 
dfobj  = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].nunique().reset_index()


print(dfobj)

Output

   Name  Marks  Fee  Tution_Fee
0   Max    100    1           1
1  Rama     97    1           1