How to drop duplicate columns in a Pandas DataFrame

In this post, we will learn how to drop duplicate columns in a Pandas DataFrame. We use the Pandas library, so first install it on the local system with the pip command “pip install pandas” and then import it into your code with “import pandas as pd”.

Pandas drop_duplicates()


DataFrame.drop_duplicates() removes duplicate rows from a DataFrame and returns a new DataFrame (Index.drop_duplicates() is the variant that returns an Index). Because it compares rows, dropping duplicate columns with it means transposing the DataFrame first, removing the duplicates, and transposing back. The keep parameter chooses which occurrence of a duplicate stays in the result.

Syntax

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)

Parameters

  • The keep parameter takes one of the following values; the default is 'first'.
  • 'first': Drop duplicates except for the first occurrence.
  • 'last': Drop duplicates except for the last occurrence.
  • False: Drop every occurrence of a duplicate. This is the boolean False, not the string 'False'.
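The three keep values can be checked on a small frame. The sketch below uses made-up data (the names and values are illustrative only) and shows which row labels survive in each case; note that drop_duplicates() compares rows.

```python
import pandas as pd

# Illustrative data: rows 0 and 2 are identical.
df = pd.DataFrame({'Name': ['Jack', 'Rack', 'Jack'],
                   'Fee': [100, 200, 100]})

first = df.drop_duplicates(keep='first')  # keeps row 0, drops row 2
last = df.drop_duplicates(keep='last')    # keeps row 2, drops row 0
none = df.drop_duplicates(keep=False)     # drops both row 0 and row 2

print(first.index.tolist())  # [0, 1]
print(last.index.tolist())   # [1, 2]
print(none.index.tolist())   # [1]
```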

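1. Drop duplicate columns with drop_duplicates()


In this example, we use drop_duplicates() to drop the duplicated columns of a DataFrame. A Python dict keeps only one value per key, so the DataFrame is built from rows with the column labels passed separately; that way it really contains ‘Name’ and ‘Fee’ twice. Since drop_duplicates() compares rows, we transpose the DataFrame, drop the duplicates, and transpose back. Two columns count as duplicates only when their values match in every row.

import pandas as pd

# A dict cannot hold the same key twice, so pass the column labels separately.
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
    ['Jack', 'Jack', 97, 100, 100, 400],
    ['Rack', 'Rack', 97, 200, 200, 500],
    ['Max', 'Max', 100, 300, 300, 600],
    ['David', 'David', 100, 400, 400, 700],
]

dfobj = pd.DataFrame(rows, columns=columns)

# drop_duplicates() compares rows: transpose, drop, and transpose back.
# infer_objects() restores the numeric dtypes lost by the transpose.
dfobj = dfobj.T.drop_duplicates().T.infer_objects()
print(dfobj)

Output

    Name  Marks  Fee  Tution_Fee
0   Jack     97  100         400
1   Rack     97  200         500
2    Max    100  300         600
3  David    100  400         700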
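When building test data with repeated column names, keep in mind that a Python dict stores only one value per key. The sketch below (illustrative values) contrasts a dict with a repeated key against a list of rows with explicit column labels.

```python
import pandas as pd

# A dict literal keeps only the last value given for a repeated key.
data = {'Name': ['Jack', 'Rack'], 'Name': ['Rama', 'Rack'], 'Fee': [100, 200]}
from_dict = pd.DataFrame(data)
print(from_dict.columns.tolist())  # ['Name', 'Fee']
print(from_dict['Name'].tolist())  # ['Rama', 'Rack']

# Passing the labels separately keeps both 'Name' columns.
from_rows = pd.DataFrame([['Jack', 'Rama', 100], ['Rack', 'Rack', 200]],
                         columns=['Name', 'Name', 'Fee'])
print(from_rows.columns.tolist())  # ['Name', 'Name', 'Fee']
```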

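3. Keep the last occurrence with keep='last'


In this example we pass keep='last' so that the last occurrence of each duplicated column is kept and the earlier ones are dropped. To keep the first occurrence instead, use keep='first' (lowercase; it is also the default). Here the two ‘Name’ columns differ in the first row, so neither is a duplicate and both stay; only the repeated ‘Fee’ column is removed. Because the two ‘Fee’ columns are identical, the printed result looks the same whichever copy is kept.

import pandas as pd

# A dict cannot hold the same key twice, so pass the column labels separately.
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
    ['Jack', 'Rama', 97, 100, 100, 400],
    ['Rack', 'Rack', 97, 200, 200, 500],
    ['Max', 'Max', 100, 300, 300, 600],
    ['David', 'David', 100, 400, 400, 700],
]

dfobj = pd.DataFrame(rows, columns=columns)

# Transpose so that drop_duplicates() sees each column as a row.
# infer_objects() restores the numeric dtypes lost by the transpose.
dfobj = dfobj.T.drop_duplicates(keep='last').T.infer_objects()
print(dfobj)

Output

    Name   Name  Marks  Fee  Tution_Fee
0   Jack   Rama     97  100         400
1   Rack   Rack     97  200         500
2    Max    Max    100  300         600
3  David  David    100  400         700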
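The effect of keep is easier to see when the repeated columns carry different labels. The sketch below is only an illustration: 'Fee_Copy' is a hypothetical column that repeats the values of 'Fee', and the frame is transposed because drop_duplicates() compares rows.

```python
import pandas as pd

# Illustrative data: 'Fee_Copy' is a hypothetical column repeating 'Fee'.
df = pd.DataFrame({'Name': ['Jack', 'Rack'],
                   'Fee': [100, 200],
                   'Fee_Copy': [100, 200]})

first = df.T.drop_duplicates(keep='first').T  # keeps 'Fee'
last = df.T.drop_duplicates(keep='last').T    # keeps 'Fee_Copy'
none = df.T.drop_duplicates(keep=False).T     # drops both copies

print(first.columns.tolist())  # ['Name', 'Fee']
print(last.columns.tolist())   # ['Name', 'Fee_Copy']
print(none.columns.tolist())   # ['Name']
```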

4. Drop duplicate columns by name with columns.duplicated()


dfobj.columns.duplicated() returns a boolean array with one entry per column: False marks the first occurrence of a column label and True marks a repeated label. The check looks only at the labels, not at the values in the columns. The expression ~dfobj.columns.duplicated() inverts the array, so passing it to .loc selects the non-duplicate columns.

import pandas as pd

# A dict cannot hold the same key twice, so pass the column labels separately.
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
    ['Jack', 'Jack', 97, 100, 100, 400],
    ['Rack', 'Rack', 97, 200, 200, 500],
    ['Max', 'Max', 100, 300, 300, 600],
    ['David', 'David', 100, 400, 400, 700],
]

dfobj = pd.DataFrame(rows, columns=columns)

# Keep only the first column for each label.
dfobj = dfobj.loc[:, ~dfobj.columns.duplicated()].copy()

print(dfobj)

Output

    Name  Marks  Fee  Tution_Fee
0   Jack     97  100         400
1   Rack     97  200         500
2    Max    100  300         600
3  David    100  400         700
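To see what the mask looks like, the sketch below prints it for a one-row frame with illustrative values. The two 'Fee' columns deliberately hold different numbers to show that columns.duplicated() compares labels only.

```python
import pandas as pd

# Illustrative data: the two 'Fee' columns share a label but not a value.
df = pd.DataFrame([['Jack', 97, 100, 150]],
                  columns=['Name', 'Marks', 'Fee', 'Fee'])

mask_first = df.columns.duplicated()            # default keep='first'
mask_last = df.columns.duplicated(keep='last')

print(mask_first.tolist())  # [False, False, False, True]
print(mask_last.tolist())   # [False, False, True, False]

# The second 'Fee' column is dropped even though its value differs.
deduped = df.loc[:, ~mask_first]
print(deduped.columns.tolist())  # ['Name', 'Marks', 'Fee']
print(deduped['Fee'].tolist())   # [100]
```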

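5. Lambda and apply to drop duplicate columns


The DataFrame apply() method takes a function and applies it along the requested axis of the DataFrame. In the example below we transpose the DataFrame so that each original column becomes a row, use apply() with a lambda to turn each of those rows into a tuple, and call duplicated() on the result to flag columns whose values repeat an earlier column. The two ‘Name’ columns differ in the first row, so both are kept; the second ‘Fee’ column is dropped.

import pandas as pd

# A dict cannot hold the same key twice, so pass the column labels separately.
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
    ['Jack', 'Rama', 97, 100, 100, 400],
    ['Rack', 'Rack', 97, 200, 200, 500],
    ['Max', 'Max', 100, 300, 300, 600],
    ['David', 'David', 100, 400, 400, 700],
]

dfobj = pd.DataFrame(rows, columns=columns)

# Each row of dfobj.T is one original column; a tuple lets it be compared as a whole.
is_duplicate = dfobj.T.apply(lambda col: tuple(col), axis=1).duplicated()
dfobj = dfobj.loc[:, ~is_duplicate.to_numpy()].copy()
print(dfobj)

Output

    Name   Name  Marks  Fee  Tution_Fee
0   Jack   Rama     97  100         400
1   Rack   Rack     97  200         500
2    Max    Max    100  300         600
3  David  David    100  400         700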
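For comparison, a value-based check can also be written without apply(), by calling DataFrame.duplicated() on the transpose and using the result as a column mask. This is a minimal sketch with illustrative data; because the mask is applied to the original frame, the column dtypes are left untouched.

```python
import pandas as pd

# Illustrative data with a repeated 'Fee' column.
df = pd.DataFrame(
    [['Jack', 97, 100, 100], ['Rack', 97, 200, 200]],
    columns=['Name', 'Marks', 'Fee', 'Fee'],
)

# True for every column whose values repeat an earlier column.
mask = df.T.duplicated().to_numpy()
result = df.loc[:, ~mask]

print(result.columns.tolist())  # ['Name', 'Marks', 'Fee']
print(result['Fee'].dtype)      # int64
```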

6. Drop duplicate columns after merging


Sometimes we need to drop duplicated columns while merging two DataFrames in Pandas. In the example below, both DataFrames contain the same columns, so merging them on the index would produce every column twice. We give the right-hand copies the suffix ‘_DROP’ and then use filter() with a regular expression to keep only the columns whose names do not contain that suffix.

import pandas as pd

   
data = {
    'Name': ['Rama', 'Rack', 'Max', 'David'],     
    'Marks':[97,97,100,100],    
    'Fee':[100,200,300,400],    
    'Tution_Fee':[400,500,600,700]
}

data_one = {
    'Name': ['Rama', 'Rack', 'Max', 'David'],     
    'Marks':[97,97,100,100],    
    'Fee':[100,200,300,400],
    'Tution_Fee':[400,500,600,700]
}
 
 
dfobj = pd.DataFrame(data)
dfobj2 = pd.DataFrame(data_one)



Merge_df = dfobj.merge(dfobj2,left_index=True, right_index=True,
             how='outer', suffixes=('', '_DROP')).filter(regex='^(?!.*_DROP)')


print(Merge_df)

Output

    Name  Marks  Fee  Tution_Fee
0   Rama     97  100         400
1   Rack     97  200         500
2    Max    100  300         600
3  David    100  400         700
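Another option is to leave the overlapping columns out of the right-hand DataFrame before merging. The sketch below uses illustrative frames and assumes the shared columns hold the same data on both sides; 'Marks' stands in for a column that exists only on the right.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Rama', 'Rack'], 'Fee': [100, 200]})
right = pd.DataFrame({'Name': ['Rama', 'Rack'], 'Fee': [100, 200],
                      'Marks': [97, 97]})  # 'Marks' exists only on the right

# Columns of `right` that `left` does not already have.
new_cols = right.columns.difference(left.columns)

merged = left.merge(right[new_cols], left_index=True, right_index=True,
                    how='outer')
print(merged.columns.tolist())  # ['Name', 'Fee', 'Marks']
```

This avoids creating the suffixed columns at all, at the cost of silently trusting the left-hand values wherever the two frames overlap.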

Summary

In this post, we learned how to drop duplicate columns from a Pandas DataFrame with examples, using drop_duplicates(), columns.duplicated(), apply() with a lambda, and a suffix filter after merging.