Pandas merge Dataframes on multiple columns

Python

In this post, we are going to understand how to do Pandas merge Dataframes on multiple columns with code examples that include Pandas Merge without key column name, Pandas Merge on multiple columns with the same name, Pandas Merge on multiple columns with different names. First, we need to understand the merge() function of the pandas dataframe.

Python Pandas merge() function


The merge() function takes data frames and column names as arguments on which the merge has to be performed. The default join is performed by the merge() function is an inner join. The list of columns that are passed for merge should be present in both dataframe.

Syntax

pd.merge(df_left,df_right, how='inner', on=None, left_on=None, right_on=None)

Parameters

  • df_left :The left dataframe.
  • df_right : The right dataframe.
  • how : It can be any join for these inner join,left join,full join,right join
  • on : if columns name are same in both dataframes ,on parameter is used to pass the list of columns.
  • if the column names are different Then ‘left_on’ and ‘right_on’ parmeters is use to pass columns list.
    • ‘left_on’ : is the name of the columns in the left dataframe.
    • ‘right_on’ : is the name of columns in the right dataframe.

1. Pandas Merge without key column name


In this example, we are merging the dataframe without specifying the column name. The merge() method gather all the common columns in both dataframe and give result as a single dataframe.

We have not specified any join, so by default merge() function will perform an inner join.

Program Example

#program to merge Dataframe on mutiple columns in Pandas 

import pandas as pd
Employee = pd.DataFrame({
    'Emp_ID':[211,212,213,214,215,216],
    'Emp_Dept':['IT','IT','IT','admin','HR','admin'],
    'Dept_ID':[1,1,1,2,3,2],
    'Emp_Name':['Jack','Rack','Tax','Max','REX','TOM'],
    'Emp_Salary':[1000,1300,1500,2999,14999,1600]

})
 

 
 
Dept = pd.DataFrame({
    'Dept_ID':[1,2,3,4,5,6],
    'Dept_Name':['IT','Admin','HR','Marketing','ERD','CPM'],
    'Dept_manager':['Tom','Yack','Tawn','bruise','Hon','Sam']
      
        
})
print(pd.merge(Employee, Dept))

Output

 Emp_ID Emp_Dept  Dept_ID Emp_Name  Emp_Salary Dept_Name Dept_manager
0     211       IT        1     Jack        1000        IT          Tom
1     212       IT        1     Rack        1300        IT          Tom
2     213       IT        1      Tax        1500        IT          Tom
3     214    admin        2      Max        2999     Admin         Yack
4     216    admin        2      TOM        1600     Admin         Yack
5     215       HR        3      REX       14999        HR         Tawn

2.Pandas Merge on multiple columns with same name


In these examples, the names of the columns are present in both data frames and have the same name so we have used the ON parameter to pass the list of columns that we need to merge.

We have specified the left join by using the parameter how = ‘left’.We can change to any join as per need.

Program Example

#program to merge Dataframe on mutiple columns have same columnname in Pandas 

import pandas as pd
Employee = pd.DataFrame({
    'Emp_ID':[211,212,213,214,215,216],
    'Dept_Name':['IT','IT','IT','admin','HR','admin'],
    'Dept_ID':[1,1,1,2,3,2],
    'Emp_Name':['Jack','Rack','Tax','Max','REX','TOM'],
    'Emp_Salary':[1000,1300,1500,2999,14999,1600]

})
 

 
 
Dept = pd.DataFrame({
    'Dept_ID':[1,2,3,4,5,6],
    'Dept_Name':['IT','Admin','HR','Marketing','ERD','CPM'],
    'Dept_manager':['Tom','Yack','Tawn','bruise','Hon','Sam']
      
        
})
print(pd.merge(Employee, Dept, on=['Dept_ID', 'Dept_Name'], how='left'))

Output

     Emp_ID Dept_Name  Dept_ID Emp_Name  Emp_Salary Dept_manager
0     211        IT        1     Jack        1000          Tom
1     212        IT        1     Rack        1300          Tom
2     213        IT        1      Tax        1500          Tom
3     214     admin        2      Max        2999          NaN
4     215        HR        3      REX       14999         Tawn
5     216     admin        2      TOM        1600          NaN     

3.Pandas Merge on mutiple columns with different names


In these examples, the names of the columns are present in both data frames but with a different name so we have used the left_on for the left dataframe column name and right_on to pass the list of columns in right dataframe that we need to merge.

Program Example

#program to merge Dataframe on mutiple columns have different columnname in Pandas

import pandas as pd
Employee = pd.DataFrame({
    'Emp_ID':[211,212,213,214,215,216],
    'Emp_Dept':['IT','IT','IT','admin','HR','admin'],
    'Dept_ID':[1,1,1,2,3,2],
    'Emp_Name':['Jack','Rack','Tax','Max','REX','TOM'],
    'Emp_Salary':[1000,1300,1500,2999,14999,1600]

})
 

 
 
Dept = pd.DataFrame({
    'Dept_ID':[1,2,3,4,5,6],
    'Dept_Name':['IT','Admin','HR','Marketing','ERD','CPM'],
 'Dept_manager':['Tom','Yack','Tawn','bruise','Hon','Sam']
      
        
})
print(pd.merge(Employee, Dept, left_on=['Dept_ID', 'Emp_Dept'],right_on=['Dept_ID','Dept_Name'], how='left'))

Output

   Emp_ID Emp_Dept  Dept_ID Emp_Name  Emp_Salary Dept_Name Dept_manager
0     211       IT        1     Jack        1000        IT          Tom
1     212       IT        1     Rack        1300        IT          Tom
2     213       IT        1      Tax        1500        IT          Tom
3     214    admin        2      Max        2999       NaN          NaN
4     215       HR        3      REX       14999        HR         Tawn
5     216    admin        2      TOM        1600       NaN          NaN

Summary

In this post we have explored 3 different ways of how to do Pandas merge on multiple columns with code examples that include Pandas Merge without key column name, Pandas Merge on multiple columns with the same name, Pandas Merge on multiple columns with different names.