In this post, we will understand How to merge DataFrames on multiple columns Pandas with code examples that include Pandas Merge without key column name, Pandas Merging in various columns with the same name, and Pandas Merging on numerous columns with different names. First, we need to understand the merge() function of the pandas dataframe.
Python Pandas merge() function
The merge() function takes data frames and column names as arguments on which the merge has to be performed. The default join is performed by the merge() function is an inner join. The list of columns that are passed for merge should be present in both dataframe.
Syntax
pd.merge(df_left,df_right, how='inner', on=None, left_on=None, right_on=None)
Parameters
- df_left :The left dataframe.
- df_right : The right dataframe.
- how : It can be any join for these inner join,left join,full join,right join
- on : if columns name are same in both dataframes ,on parameter is used to pass the list of columns.
- if the column names are different Then ‘left_on’ and ‘right_on’ parmeters is use to pass columns list.
- ‘left_on’ : is the name of the columns in the left dataframe.
- ‘right_on’ : is the name of columns in the right dataframe.
1. How to merge DataFrames on multiple columns Pandas without column name
In this example, we are merging the dataframe without specifying the column name. The merge() method gather all the common columns in both dataframe and give result as a single dataframe.
We have not specified any join, so by default merge() function will perform an inner join.
Program Example
#program to merge Dataframe on mutiple columns in Pandas
import pandas as pd
Employee = pd.DataFrame({
'Emp_ID':[211,212,213,214,215,216],
'Emp_Dept':['IT','IT','IT','admin','HR','admin'],
'Dept_ID':[1,1,1,2,3,2],
'Emp_Name':['Jack','Rack','Tax','Max','REX','TOM'],
'Emp_Salary':[1000,1300,1500,2999,14999,1600]
})
Dept = pd.DataFrame({
'Dept_ID':[1,2,3,4,5,6],
'Dept_Name':['IT','Admin','HR','Marketing','ERD','CPM'],
'Dept_manager':['Tom','Yack','Tawn','bruise','Hon','Sam']
})
print(pd.merge(Employee, Dept))
Output
Emp_ID Emp_Dept Dept_ID Emp_Name Emp_Salary Dept_Name Dept_manager
0 211 IT 1 Jack 1000 IT Tom
1 212 IT 1 Rack 1300 IT Tom
2 213 IT 1 Tax 1500 IT Tom
3 214 admin 2 Max 2999 Admin Yack
4 216 admin 2 TOM 1600 Admin Yack
5 215 HR 3 REX 14999 HR Tawn
- Select rows and columns by name or index in Pandas
- Select rows by multiple conditions in Pandas
- Find unique value in column of Pandas DataFrame
- 6 ways to Get List of column names in Pandas
- How to check Pandas Dataframe is empty
- Find max value index in rows and columns of Dataframe
2. How to merge DataFrames on multiple columns Pandas with same name
In these examples, the names of the columns are present in both data frames and have the same name so we have used the ON parameter to pass the list of columns we need to merge.
We have specified the left join by using the parameter how = ‘left’.We can change to any join as per need.
Program Example
#program to merge Dataframe on mutiple columns have same columnname in Pandas
import pandas as pd
Employee = pd.DataFrame({
'Emp_ID':[211,212,213,214,215,216],
'Dept_Name':['IT','IT','IT','admin','HR','admin'],
'Dept_ID':[1,1,1,2,3,2],
'Emp_Name':['Jack','Rack','Tax','Max','REX','TOM'],
'Emp_Salary':[1000,1300,1500,2999,14999,1600]
})
Dept = pd.DataFrame({
'Dept_ID':[1,2,3,4,5,6],
'Dept_Name':['IT','Admin','HR','Marketing','ERD','CPM'],
'Dept_manager':['Tom','Yack','Tawn','bruise','Hon','Sam']
})
print(pd.merge(Employee, Dept, on=['Dept_ID', 'Dept_Name'], how='left'))
Output
Emp_ID Dept_Name Dept_ID Emp_Name Emp_Salary Dept_manager
0 211 IT 1 Jack 1000 Tom
1 212 IT 1 Rack 1300 Tom
2 213 IT 1 Tax 1500 Tom
3 214 admin 2 Max 2999 NaN
4 215 HR 3 REX 14999 Tawn
5 216 admin 2 TOM 1600 NaN
3. How to merge DataFrames on multiple columns Pandas with different names
In these examples, the names of the columns are present in both data frames but with a different name so we have used the left_on for the left dataframe column name and right_on to pass the list of columns in right dataframe that we need to merge.
Program Example
#program to merge Dataframe on mutiple columns have different columnname in Pandas
import pandas as pd
Employee = pd.DataFrame({
'Emp_ID':[211,212,213,214,215,216],
'Emp_Dept':['IT','IT','IT','admin','HR','admin'],
'Dept_ID':[1,1,1,2,3,2],
'Emp_Name':['Jack','Rack','Tax','Max','REX','TOM'],
'Emp_Salary':[1000,1300,1500,2999,14999,1600]
})
Dept = pd.DataFrame({
'Dept_ID':[1,2,3,4,5,6],
'Dept_Name':['IT','Admin','HR','Marketing','ERD','CPM'],
'Dept_manager':['Tom','Yack','Tawn','bruise','Hon','Sam']
})
print(pd.merge(Employee, Dept, left_on=['Dept_ID', 'Emp_Dept'],right_on=['Dept_ID','Dept_Name'], how='left'))
Output
Emp_ID Emp_Dept Dept_ID Emp_Name Emp_Salary Dept_Name Dept_manager
0 211 IT 1 Jack 1000 IT Tom
1 212 IT 1 Rack 1300 IT Tom
2 213 IT 1 Tax 1500 IT Tom
3 214 admin 2 Max 2999 NaN NaN
4 215 HR 3 REX 14999 HR Tawn
5 216 admin 2 TOM 1600 NaN NaN
Summary
In this post we have explored 3 different ways of How to merge DataFrames on multiple columns Pandas with code examples that include Pandas Merging without key column name, Pandas Merge on multiple columns with the same name, Pandas Merge on multiple columns with different names.