In this post, we will learn how to remove rows with duplicate indices from a Pandas DataFrame, keeping either the first or the last occurrence of each index label. We will use the Pandas library, so first install it on your local system with the pip command “pip install pandas”, then import it into your code with “import pandas as pd” to use its functions. Examples 5 and 6 also use NumPy, which pandas installs as a dependency.
1. Remove rows with duplicate indices in Pandas using index.duplicated()
In this example, we get the index of the DataFrame with dfobj.index and call dfobj.index.duplicated(keep='first'). This returns a boolean array that is True for every row whose index label has already appeared earlier and False for the first occurrence of each label. The ~ operator (element-wise NOT) inverts that array, so dfobj[~dfobj.index.duplicated(keep='first')] returns a subset of the original DataFrame that keeps the first occurrence of each index label and removes the other duplicates.
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max','Kom'],
'Marks':[100,100, 100,100],
'Fee':[100,200,300,400],
'Subject': ['Math', 'Math', 'Music','Phy']
}
dfobj = pd.DataFrame(Student_dict,index =[1,1,2,2])
# duplicated() marks repeated index labels after the first; ~ keeps the remaining rows
df3 = dfobj[~dfobj.index.duplicated(keep='first')]
print('dataframe after remove duplicate index:\n',df3)
Output
dataframe after remove duplicate index:
Name Marks Fee Subject
1 Jack 100 100 Math
2 Max 100 300 Music
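To see what the mask in this example actually contains, the following sketch uses a reduced version of the example DataFrame (only the Name and Fee columns) and prints the array returned by index.duplicated() for each of the three keep values, along with the names that survive the filter. With keep=False every repeated label is marked, so in this DataFrame, where every label repeats, no rows survive.

```python
import pandas as pd

# Reduced version of the example DataFrame (only two columns kept)
dfobj = pd.DataFrame(
    {"Name": ["Jack", "Rack", "Max", "Kom"], "Fee": [100, 200, 300, 400]},
    index=[1, 1, 2, 2],
)

kept = {}
for keep in ("first", "last", False):
    # Boolean NumPy array: True marks a row that will be dropped
    mask = dfobj.index.duplicated(keep=keep)
    kept[keep] = list(dfobj[~mask]["Name"])
    print(f"keep={keep!r}: mask={mask}, kept={kept[keep]}")
```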
2. Remove rows with duplicate indices in Pandas using reset_index() and drop_duplicates()
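The reset_index() method moves the current index into an ordinary column (named index when the index has no name), and drop_duplicates() removes rows whose values in the chosen columns repeat. Calling drop_duplicates(subset='index') therefore removes the rows with a repeated index label, and set_index('index') turns that column back into the index. The restored index keeps the name index, which is why it appears in the output. The keep parameter lets us choose which duplicate to keep.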
Syntax
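DataFrame.drop_duplicates(subset=None, keep='first')
Parameters
- subset: the column label (or list of labels) used to identify duplicates. Here it is 'index', the column that reset_index() creates from the old index. The default, None, compares all columns.
- keep: which occurrence to keep. It accepts one of these values, and the default is 'first':
- 'first': remove all duplicates except the first occurrence.
- 'last': remove all duplicates except the last occurrence.
- False: remove every row whose value is duplicated.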
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max','Kom'],
'Marks':[100,100, 100,100],
'Fee':[100,200,300,400],
'Subject': ['Math', 'Math', 'Music','Phy']
}
dfobj = pd.DataFrame(Student_dict,index =[1,1,2,2])
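# Move the index into a column named 'index', drop rows whose 'index' value repeats,
# then restore that column as the index
result = dfobj.reset_index().drop_duplicates(subset='index', keep='first').set_index('index')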
print('dataframe after remove duplicate index:\n',result)
Output
dataframe after remove duplicate index:
Name Marks Fee Subject
index
1 Jack 100 100 Math
2 Max 100 300 Music
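The three keep values listed earlier can be compared side by side. The following sketch runs the same reset_index() / drop_duplicates() / set_index() chain on a reduced version of the example DataFrame for each keep value. It also shows that Index.drop_duplicates(), which works on the index labels alone, returns an Index of unique labels rather than DataFrame rows.

```python
import pandas as pd

dfobj = pd.DataFrame(
    {"Name": ["Jack", "Rack", "Max", "Kom"], "Fee": [100, 200, 300, 400]},
    index=[1, 1, 2, 2],
)

# The unnamed index becomes a regular column called "index"
flat = dfobj.reset_index()

results = {}
for keep in ("first", "last", False):
    frame = flat.drop_duplicates(subset="index", keep=keep).set_index("index")
    results[keep] = list(frame["Name"])
    print(f"keep={keep!r}: {results[keep]}")

# Index.drop_duplicates works on the labels alone and returns an Index, not rows
unique_labels = dfobj.index.drop_duplicates(keep="last")
print(unique_labels)
```

If the original index has a name, reset_index() uses that name for the new column, so the subset and set_index() arguments must change to match.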
3. Remove rows with duplicate indices in Pandas using query()
In this example, we use DataFrame.query(), which filters rows with an expression string. Inside that string, the name index refers to the DataFrame's index, so dfobj.query("~index.duplicated(keep='first')") calls index.duplicated(keep='first') to get a boolean array that marks every repeated index label after its first occurrence. The ~ operator inverts each value, and query() returns the subset of the original DataFrame without the duplicates.
- To keep the last occurrence instead, change the expression to "~index.duplicated(keep='last')".
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max','Kom'],
'Marks':[100,100, 100,100],
'Fee':[100,200,300,400],
'Subject': ['Math', 'Math', 'Music','Phy']
}
dfobj = pd.DataFrame(Student_dict,index =[1,1,2,2])
# In the query string, index refers to dfobj.index and ~ negates the boolean mask
result = dfobj.query("~index.duplicated(keep='first')")
print('dataframe after remove duplicate index:\n',result)
Output
dataframe after remove duplicate index:
Name Marks Fee Subject
1 Jack 100 100 Math
2 Max 100 300 Music
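The following sketch applies the keep='last' variant to a reduced version of the example DataFrame and checks that query() selects the same rows as the plain boolean-mask approach from the first example.

```python
import pandas as pd

# Reduced version of the example DataFrame
dfobj = pd.DataFrame(
    {"Name": ["Jack", "Rack", "Max", "Kom"], "Fee": [100, 200, 300, 400]},
    index=[1, 1, 2, 2],
)

# keep='last' marks every occurrence except the last one as a duplicate
last_rows = dfobj.query("~index.duplicated(keep='last')")

# The same filter written as a plain boolean mask
mask_rows = dfobj[~dfobj.index.duplicated(keep="last")]

print(last_rows)
print(last_rows.equals(mask_rows))
```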
4. Remove rows with duplicate indices in Pandas using groupby()
In this example, we group the rows by their index with groupby(level=0). The level argument selects which index level to group on: level=0 is the first (here, the only) level, and with a MultiIndex you can group on one or several specific levels. GroupBy.last() then returns the last value of each column within each group, and the group keys become the new index. Note that last() skips missing values by default, so if a column is NaN in the last duplicate row, that column's value comes from an earlier row in the same group.
- To keep the first occurrence of each index instead, use groupby(level=0).first().
- Another way to keep the first occurrence with groupby is dfobj = dfobj.groupby(dfobj.index).first().
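The level argument matters most with a MultiIndex. The following sketch uses a hypothetical two-level index (class, roll) to contrast grouping on both levels, which only collapses rows whose full label pair repeats, with grouping on the outer level alone, which keeps one row per class.

```python
import pandas as pd

# Hypothetical two-level index: (class, roll number)
idx = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 1), ("A", 2), ("B", 1)], names=["class", "roll"]
)
df = pd.DataFrame({"Name": ["Jack", "Rack", "Max", "Kom"]}, index=idx)

# Group on both levels: only the repeated ("A", 1) pair is collapsed
by_pair = df.groupby(level=[0, 1]).first()

# Group on the outer level only: one row per class
by_class = df.groupby(level="class").first()

# For whole-tuple duplicates, index.duplicated() keeps the same rows
by_mask = df[~df.index.duplicated(keep="first")]

print(by_pair)
print(by_class)
```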
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max','Kom'],
'Marks':[100,100, 100,100],
'Fee':[100,200,300,400],
'Subject': ['Math', 'Math', 'Music','Phy']
}
dfobj = pd.DataFrame(Student_dict,index =[1,1,2,2])
result = dfobj.groupby(level=0).last()
print('dataframe after remove duplicate index:\n',result)
# Another way to use groupby() to remove duplicate indices
dfobj = dfobj.groupby(dfobj.index).last()
print('dataframe after remove duplicate index:\n',dfobj)
Output
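dataframe after remove duplicate index:
Name Marks Fee Subject
1 Rack 100 200 Math
2 Kom 100 400 Phy
dataframe after remove duplicate index:
Name Marks Fee Subject
1 Rack 100 200 Math
2 Kom 100 400 Phy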
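Because last() skips missing values column by column, it can combine values from different rows. The following sketch, on a hypothetical two-row DataFrame where the last duplicate has a missing Fee, shows this and contrasts it with GroupBy.tail(1) and index.duplicated(), which both keep the whole last row unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row for index 1 has a missing Fee
df = pd.DataFrame(
    {"Name": ["Jack", "Rack"], "Fee": [100.0, np.nan]},
    index=[1, 1],
)

# last() takes the last non-null value per column, so Fee comes from Jack's row
by_last = df.groupby(level=0).last()

# tail(1) and index.duplicated() keep the whole last row, NaN included
by_tail = df.groupby(level=0).tail(1)
by_mask = df[~df.index.duplicated(keep="last")]

print(by_last)
print(by_tail)
```

If the goal is to keep exactly one original row per index label, the row-based approaches are the safer choice; last() and first() suit cases where filling gaps from earlier rows is acceptable.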
5. Remove rows with duplicate indices in Pandas, keeping the first occurrence (np.unique)
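In this example, we keep the first occurrence of each index label and remove the duplicates. np.unique(dfobj.index.values, return_index=True) returns two arrays: the unique index labels, sorted in ascending order, and the position of the first occurrence of each label. Taking element [1] gives those positions, and dataframe.iloc[] selects the rows at those positions. Because np.unique sorts the labels, the result is ordered by index label rather than by the original row order.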
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Jack', 'Rack', 'Max','Kom'],
'Marks':[100,100, 100,100],
'Fee':[100,200,300,400],
'Subject': ['Math', 'Math', 'Music','Phy']
}
dfobj = pd.DataFrame(Student_dict,index =[1,1,2,2])
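# return_index=True also returns the position of each label's first occurrence
idx = np.unique(dfobj.index.values, return_index=True)[1]
# Select the rows at those positions
dfobj = dfobj.iloc[idx]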
print('dataframe after remove duplicate index:\n',dfobj)
Output
dataframe after remove duplicate index:
Name Marks Fee Subject
1 Jack 100 100 Math
2 Max 100 300 Music
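The ordering point matters when the index is not already sorted. In the following sketch, which uses a hypothetical DataFrame with the index [2, 2, 1, 1], the np.unique approach returns the rows ordered by label, while sorting the first-occurrence positions with np.sort restores the original row order and matches the index.duplicated() approach from the first example.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame whose index is not sorted
df = pd.DataFrame(
    {"Name": ["Max", "Kom", "Jack", "Rack"]},
    index=[2, 2, 1, 1],
)

# Sorted unique labels and the position of each label's first occurrence
labels, first_pos = np.unique(df.index.values, return_index=True)
print(labels, first_pos)  # [1 2] [2 0]

by_label = df.iloc[first_pos]               # Jack (1), Max (2)
in_row_order = df.iloc[np.sort(first_pos)]  # Max (2), Jack (1)
by_mask = df[~df.index.duplicated(keep="first")]
```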
6. Remove rows with duplicate indices in Pandas, keeping the last occurrence (np.unique)
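In this example, we keep the last occurrence of each index label. dfobj[::-1] reverses the row order, so the last occurrence of each label in the original DataFrame becomes its first occurrence. np.unique(..., return_index=True)[1] then returns the positions of those first occurrences in the reversed DataFrame, and the iloc[] indexer selects the rows at those positions.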
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Jack', 'Rack', 'Max','Kom'],
'Marks':[100,100, 100,100],
'Fee':[100,200,300,400],
'Subject': ['Math', 'Math', 'Music','Phy']
}
dfobj = pd.DataFrame(Student_dict,index =[1,1,2,2])
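# Reverse the rows so that each label's last occurrence comes first
dfobj = dfobj[::-1]
# Keep the first occurrence of each label in the reversed DataFrame
dfobj = dfobj.iloc[np.unique(dfobj.index.values, return_index=True)[1]]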
print('dataframe after remove duplicate index:\n',dfobj)
Output
dataframe after remove duplicate index:
Name Marks Fee Subject
1 Rack 100 200 Math
2 Kom 100 400 Phy
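In the example above, the index is already sorted, so the result happens to be in the original row order. The following sketch uses a hypothetical DataFrame with the unsorted index [2, 2, 1, 1] to compare the reverse-and-np.unique approach with index.duplicated(keep='last'): both keep the same rows, but np.unique returns them sorted by label, while the boolean mask preserves the original row order.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame whose index is not sorted
df = pd.DataFrame(
    {"Name": ["Max", "Kom", "Jack", "Rack"]},
    index=[2, 2, 1, 1],
)

# Reverse-and-np.unique approach from the example above
rev = df[::-1]
via_reverse = rev.iloc[np.unique(rev.index.values, return_index=True)[1]]

# Boolean-mask approach: keeps the last occurrence in original row order
via_mask = df[~df.index.duplicated(keep="last")]

print(list(via_reverse["Name"]))  # ['Rack', 'Kom'], sorted by label
print(list(via_mask["Name"]))     # ['Kom', 'Rack'], original order
```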