In this post, we will learn how to drop duplicate columns in a Pandas dataframe. We will use the Pandas library, so first install it on the local system with the pip command "pip install pandas", then import it into the code with "import pandas as pd" to use its functions.
Pandas drop_duplicates()
DataFrame.drop_duplicates() removes duplicates from a dataframe. It compares rows and returns a new DataFrame with the duplicate rows removed; to apply it to columns, the dataframe has to be transposed first. The keep parameter lets us choose which occurrence of a duplicate stays in the dataframe.
Syntax
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)
Parameters
- The keep parameter takes any of these values; the default is 'first'
- 'first': Remove all duplicates except the first occurrence.
- 'last': Remove all duplicates except the last occurrence.
- False: Remove every occurrence of a duplicate, keeping none. This is the boolean False, not the string 'False'.
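The three keep settings can be compared side by side. This is a minimal sketch on a small made-up frame (the city and temp columns are illustrative and not part of the examples in this post); note that the comparison is between rows:

```python
import pandas as pd

# Illustrative data: rows 0 and 2 are identical
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune"], "temp": [30, 35, 30]})

first = df.drop_duplicates()             # default keep='first'
last = df.drop_duplicates(keep="last")   # keep the final copy of each duplicate
none = df.drop_duplicates(keep=False)    # drop every copy of a duplicated row

print(list(first.index))  # [0, 1]
print(list(last.index))   # [1, 2]
print(list(none.index))   # [1]
```

The surviving index labels show which copy of the repeated row was kept in each case.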
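1. Drop duplicate columns with drop_duplicates()
In this example, we use drop_duplicates() to drop all duplicated columns from the dataframe. The dataframe contains 'Name' and 'Fee' twice. A Python dict cannot hold the same key twice (the later value silently replaces the earlier one), so the duplicate labels are passed through the columns argument instead. Because drop_duplicates() compares rows, we transpose the dataframe, drop the duplicate rows, and transpose back. Columns count as duplicates only when their values match.
import pandas as pd
# Duplicate labels must go through columns=; dict keys are unique
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
['Jack', 'Jack', 97, 100, 100, 400],
['Rack', 'Rack', 97, 200, 200, 500],
['Max', 'Max', 100, 300, 300, 600],
['David', 'David', 100, 400, 400, 700]
]
dfobj = pd.DataFrame(rows, columns=columns)
# Transpose, drop the duplicate rows, then transpose back
dfobj = dfobj.T.drop_duplicates().T
print(dfobj)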
Output
Name Marks Fee Tution_Fee
0 Jack 97 100 400
1 Rack 97 200 500
2 Max 100 300 600
3 David 100 400 700
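Since drop_duplicates() compares rows, pointing it at columns means transposing the dataframe first, and that round trip has a side effect: a frame with mixed types comes back with object columns. The sketch below shows the effect on a small illustrative frame and assumes infer_objects() is an acceptable way to restore the dtypes:

```python
import pandas as pd

# Illustrative frame with two columns both labelled 'Fee'
df = pd.DataFrame(
    [["Jack", 100, 100], ["Rack", 200, 200]],
    columns=["Name", "Fee", "Fee"],
)

# Transpose so columns become rows, drop duplicate rows, transpose back
deduped = df.T.drop_duplicates().T
print(deduped.dtypes["Fee"])   # object

# infer_objects() re-detects a better dtype for each object column
restored = deduped.infer_objects()
print(restored.dtypes["Fee"])  # int64
```

On wide or very large frames the double transpose can also be slow, which is one reason to prefer a label-based mask when the duplicate columns share a name.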
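3. Keep the last occurrence of duplicate columns
In this example we pass keep='last' to keep the last occurrence of each duplicated column and delete the earlier ones. To keep the first occurrence instead, change the value to keep='first'; the value is case-sensitive, so 'First' is not accepted. As drop_duplicates() compares rows, the dataframe is transposed before the call and transposed back afterwards. The duplicated columns hold identical values, so the printed result looks the same for either setting; only which copy survives differs.
import pandas as pd
# Duplicate labels must go through columns=; dict keys are unique
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
['Rama', 'Rama', 97, 100, 100, 400],
['Rack', 'Rack', 97, 200, 200, 500],
['Max', 'Max', 100, 300, 300, 600],
['David', 'David', 100, 400, 400, 700]
]
dfobj = pd.DataFrame(rows, columns=columns)
# Transpose, keep the last copy of each duplicate row, transpose back
dfobj = dfobj.T.drop_duplicates(keep='last').T
print(dfobj)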
Output
Name Marks Fee Tution_Fee
0 Rama 97 100 400
1 Rack 97 200 500
2 Max 100 300 600
3 David 100 400 700
4. Drop duplicate column names with columns.duplicated()
dfobj.columns.duplicated() returns a boolean NumPy array with one entry per column. False means the column label has not appeared before, and True means the label repeats an earlier one. The comparison uses the labels only, not the values. The expression ~dfobj.columns.duplicated() inverts the array, so passing it to loc selects the first column for each label.
import pandas as pd
# Duplicate labels must go through columns=; dict keys are unique
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
['Jack', 'Jack', 97, 100, 100, 400],
['Rack', 'Rack', 97, 200, 200, 500],
['Max', 'Max', 100, 300, 300, 600],
['David', 'David', 100, 400, 400, 700]
]
dfobj = pd.DataFrame(rows, columns=columns)
# Keep only the first column for each label
dfobj = dfobj.loc[:, ~dfobj.columns.duplicated()].copy()
print(dfobj)
Output
Name Marks Fee Tution_Fee
0 Jack 97 100 400
1 Rack 97 200 500
2 Max 100 300 600
3 David 100 400 700
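It can help to print the mask itself before using it. The sketch below uses a small illustrative frame in which the two 'Name' columns hold different values, and shows that columns.duplicated() also accepts keep='last' to retain the later column for each label:

```python
import pandas as pd

# Illustrative frame: the two 'Name' columns hold different values
df = pd.DataFrame(
    [["Jack", "Rama", 97], ["Rack", "Rack", 97]],
    columns=["Name", "Name", "Marks"],
)

print(df.columns.duplicated())             # [False  True False]
print(df.columns.duplicated(keep="last"))  # [ True False False]

# Keep the last column carrying each label instead of the first
last = df.loc[:, ~df.columns.duplicated(keep="last")]
print(last["Name"].tolist())               # ['Rama', 'Rack']
```

Because the mask is built from labels alone, the two 'Name' columns are treated as duplicates even though their contents differ.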
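5. Lambda and apply to drop duplicate columns in a Pandas dataframe
The Pandas apply() function takes a function and applies it along the requested axis of the dataframe. In the example below, axis=1 passes each row to duplicated(), which marks every value that repeats an earlier value in the same row. all() then flags the columns that were marked in every row, and ~ inverts that flag so loc keeps the remaining columns. This check compares values, not labels.
import pandas as pd
# Duplicate labels must go through columns=; dict keys are unique
columns = ['Name', 'Name', 'Marks', 'Fee', 'Fee', 'Tution_Fee']
rows = [
['Rama', 'Rama', 97, 100, 100, 400],
['Rack', 'Rack', 97, 200, 200, 500],
['Max', 'Max', 100, 300, 300, 600],
['David', 'David', 100, 400, 400, 700]
]
dfobj = pd.DataFrame(rows, columns=columns)
# Drop the columns whose value repeats an earlier one in every row
dfobj = dfobj.loc[:,~dfobj.apply(lambda x: x.duplicated(),axis=1).all()].copy()
print(dfobj)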
Output
Name Marks Fee Tution_Fee
0 Rama 97 100 400
1 Rack 97 200 500
2 Max 100 300 600
3 David 100 400 700
6. Pandas drop duplicate columns after merging
Sometimes we need to drop the duplicated columns while merging two dataframes in Pandas. In the example below, both dataframes contain the same columns, so merging them on the index would create a DataFrame with every column twice. The suffixes argument leaves the left-hand names unchanged and appends '_DROP' to the right-hand ones, and filter() with the regex keeps only the columns whose names do not contain '_DROP'.
import pandas as pd
data = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,300,400],
'Tution_Fee':[400,500,600,700]
}
data_one = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,300,400],
'Tution_Fee':[400,500,600,700]
}
dfobj = pd.DataFrame(data)
dfobj2 = pd.DataFrame(data_one)
Merge_df = dfobj.merge(dfobj2,left_index=True, right_index=True,
how='outer', suffixes=('', '_DROP')).filter(regex='^(?!.*_DROP)')
print(Merge_df)
Output
Name Marks Fee Tution_Fee
0 Rama 97 100 400
1 Rack 97 200 500
2 Max 100 300 600
3 David 100 400 700
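A different way to reach the same result is to avoid creating the overlapping columns in the first place. This is only a sketch of that alternative, not the method used above: it selects the right-hand columns that the left frame lacks before merging. The 'Grade' column is hypothetical and added purely so the right frame contributes something new.

```python
import pandas as pd

left = pd.DataFrame({"Name": ["Rama", "Rack"], "Fee": [100, 200]})
right = pd.DataFrame(
    {"Name": ["Rama", "Rack"], "Fee": [100, 200], "Grade": ["A", "B"]}
)

# Labels that exist only in the right frame
extra = right.columns.difference(left.columns)

# Merge on the index, bringing over just the new columns
merged = left.merge(right[extra], left_index=True, right_index=True, how="outer")
print(list(merged.columns))  # ['Name', 'Fee', 'Grade']
```

This skips the suffix and regex step, at the cost of assuming that columns with the same name hold the same data in both frames.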
Summary
In this post, we have learned how to drop duplicate columns from a Pandas dataframe, with examples that use drop_duplicates(), columns.duplicated(), apply() with a lambda, and merge() with suffixes.