In this post, we will learn How to group by multiple columns in Pandas. The groupby() function is used to group the same repeated values in a given data and return a Dataframegroupby containing results after applying aggregate functions like sum() and mean(), max(), and count(), min(),median().In this example, we are using the groupby() function by multiple conditions.
1. How to group by multiple columns Pandas and Sum
To group by multiple columns of Pandas Dataframe, We have passed the list of columns [‘Name’,’ Marks’] as parameters to the groupby() function that will group the same values of columns ‘Name’,’ Marks’ and apply the sum() function on columns ‘Fee’, ‘Tution_Fee’.The reset_index() is used to set a new index of Dataframe by applying it to display results properly.
- Group the data of multiple columns [‘Name’,’ Marks’].
- The sum() function is applied to the result returned by groupby().
import pandas as pd
data = {
'Name': ['Rama', 'Rama', 'Max', 'Rama'],
'Marks':[97,97,100,97],
'Fee':[100,100,300,400],
'Tution_Fee':[400,400,600,700]
}
dfobj = pd.DataFrame(data)
dfobj = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].sum().reset_index()
print(dfobj)
output
Name Marks Fee Tution_Fee
0 Max 100 300 600
1 Rama 97 600 1500
- Pandas sum rows by columns(6 ways)
- Pandas sum columns by multiple conditions
- How to Pandas sum all columns except one
- How to sum columns with Nan Pandas Dataframe
- How to sum specific rows Pandas dataframe
- How to Sum rows by condition Pandas Dataframe
- How to sum Pandas columns into new column
2. How to group by multiple columns Pandas
In the above example, We have used groupby() with function sum(). We can return the sum of multiple or single columns on groupby using agg().We have used agg() to apply multiple statistical functions like ‘mean’, ‘count’, ‘sum’,’min’,’max’ that will return sum, mean count min and max from multiple columns[‘Fee’, ‘Tution_Fee’] by using group by [‘Name’,’Marks’].
- To apply a single function using group by change the below statement to dfobj.groupby([‘Name’,’Marks’]) [‘Fee’, ‘Tution_Fee’].agg(‘sum’).
import pandas as pd
data = {
'Name': ['Rama', 'Rama', 'Max', 'Rama'],
'Marks':[97,97,100,97],
'Fee':[100,100,300,100],
'Tution_Fee':[400,400,600,400]
}
dfobj = pd.DataFrame(data)
dfobj = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].agg(['mean', 'count', 'sum','min','max'])
print(dfobj)
Output
Fee Tution_Fee
mean count sum min max mean count sum min max
Name Marks
Max 100 300 1 300 300 300 600 1 600 600 600
Rama 97 100 3 300 100 100 400 3 1200 400 400
3. How to group by multiple columns Pandas count
To achieve the Pandas group by count, First, we have grouped the entire dataframe based on identical values of columns [‘Name’,’Marks’] and applied the count() function to the resulting group data. The reset_index() is used to set the new index of the Dataframe to display the dataframe in a correct format.
import pandas as pd
data = {
'Name': ['Rama', 'Rama', 'Max', 'Rama'],
'Marks':[97,97,100,97],
'Fee':[100,100,300,100],
'Tution_Fee':[400,400,600,400]
}
dfobj = pd.DataFrame(data)
dfobj = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].count().reset_index()
print(dfobj)
Output
Name Marks Fee Tution_Fee
0 Max 100 1 1
1 Rama 97 3 3
4. Pandas groupby and agg() multiple columns
In this example, we have grouped the entire dataframe based on identical values of columns [‘Name’,’Marks’] and aggregate the values of columns [‘Fee’, ‘Tution_Fee’] using the apply() method of Pandas dataframe. The apply() function is used to apply a function on that data frame.
import pandas as pd
data = {
'Name': ['Rama', 'Rama', 'Max', 'Rama'],
'Marks':[97,97,100,97],
'Fee':[100,100,300,100],
'Tution_Fee':[400,400,600,400]
}
dfobj = pd.DataFrame(data)
fun= {'Fee':['sum','mean'], 'Tution_Fee':['max','min']}
dfobj = dfobj.groupby(['Name','Marks']).agg(fun).reset_index()
print(dfobj)
Output
Name Marks Fee Tution_Fee
sum mean max min
0 Max 100 300 300 600 600
1 Rama 97 300 100 400 400
5. Pandas groupby multiple columns count distinct
Sometimes we have grouped by multiple columns with identical values and we want to count distinct values from the group result data. In this below example We have grouped the same values in a dataframe by using multiple columns [‘Name’,’Marks’] and count distinct by using nunique() method. The reset_index() to set a new index after group data to display the dataframe.
import pandas as pd
data = {
'Name': ['Rama', 'Rama', 'Max', 'Rama'],
'Marks':[97,97,100,97],
'Fee':[100,100,300,100],
'Tution_Fee':[400,400,600,400]
}
dfobj = pd.DataFrame(data)
dfobj = dfobj.groupby(['Name','Marks'])['Fee', 'Tution_Fee'].nunique().reset_index()
print(dfobj)
Output
Name Marks Fee Tution_Fee
0 Max 100 1 1
1 Rama 97 1 1