How to group by and count unique values in Pandas

In this post, we are going to learn how to group by and count unique values in Pandas. Install Pandas on the local system with the pip command "pip install pandas" and import it into the code with "import pandas as pd" to use its DataFrame class. The Pandas groupby() function groups the rows that share the same values in the given columns, splitting the dataframe into groups. It returns a DataFrameGroupBy object. We can apply aggregate functions such as sum(), mean(), max(), min(), median(), and count() to the grouped dataframe. We can also apply one or several functions at once by passing them to agg(), either as a single name such as 'sum' or as a list of names such as ['sum', 'max', 'min'].
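As a minimal sketch of the agg() usage described above, the snippet below passes first a single function name and then a list of names. The small dataframe and the variable names (grouped, total_fee, fee_stats) are illustrative assumptions, not part of the examples that follow.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df = pd.DataFrame({
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],
    'Fee': [100, 100, 300, 100],
})

# groupby() returns a DataFrameGroupBy object; nothing is computed yet
grouped = df.groupby('Name')

# A single aggregation passed by name returns one value per group
total_fee = grouped['Fee'].agg('sum')

# A list of names returns one column per aggregation
fee_stats = grouped['Fee'].agg(['sum', 'max', 'min'])

print(total_fee)
print(fee_stats)
```

With a list, the result is a dataframe indexed by the group key, with one column per function in the order given.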

1. Pandas groupby multiple columns count distinct


Sometimes we need to group by multiple columns and count the distinct values within each resulting group. In the example below, we group the dataframe by the columns ['Name', 'Marks'] and count the distinct values of 'Fee' and 'Tution_Fee' in each group with the nunique() method. The reset_index() call then turns the group keys back into regular columns, so the result is displayed as a flat dataframe with a new index.

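import pandas as pd

data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],
    'Marks': [97, 97, 100, 97],
    'Fee': [100, 100, 300, 100],
    'Tution_Fee': [400, 400, 600, 400]
}

dfobj = pd.DataFrame(data)

# Group by both key columns, then count the distinct values in each selected column.
# The column selection must be a list (double brackets); recent pandas versions reject a bare tuple.
dfobj = dfobj.groupby(['Name', 'Marks'])[['Fee', 'Tution_Fee']].nunique().reset_index()

print(dfobj)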

Output

   Name  Marks  Fee  Tution_Fee
0   Max    100    1           1
1  Rama     97    1           1

2. Pandas groupby single column distinct values


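In this example, we group the dataframe by a single column ('Name') and collect the distinct 'Fee' values of each group with the unique() method. Note that unique() returns the distinct values themselves as an array; to get their count instead, use nunique(). The reset_index() method resets the index of the resulting dataframe.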

import pandas as pd

data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],
    'Marks': [97, 97, 100, 97],
    'Fee': [100, 100, 300, 100],
    'Tution_Fee': [400, 400, 600, 400]
}

dfobj = pd.DataFrame(data)

# Group by 'Name' and collect the array of distinct 'Fee' values per group
dfobj = dfobj.groupby('Name')['Fee'].unique().reset_index()

print(dfobj)

Output

   Name    Fee
0   Max  [300]
1  Rama  [100]
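
If the goal is the number of distinct values in a single column rather than the values themselves, nunique() can be used in place of unique(). The following is a small sketch on the same sample data; the variable name fee_counts and the column name 'Distinct_Fee' are assumptions chosen for illustration.

```python
import pandas as pd

data = {
    'Name': ['Rama', 'Rama', 'Max', 'Rama'],
    'Marks': [97, 97, 100, 97],
    'Fee': [100, 100, 300, 100],
    'Tution_Fee': [400, 400, 600, 400]
}

df = pd.DataFrame(data)

# nunique() returns how many distinct 'Fee' values each 'Name' group has;
# reset_index(name=...) gives the count column a label (hypothetical name)
fee_counts = df.groupby('Name')['Fee'].nunique().reset_index(name='Distinct_Fee')

print(fee_counts)
```

Each name in the sample data has only one distinct fee, so every group reports a count of 1.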