We can join two or more DataFrames by using merge, We need to identify a matching column in both dataframes to use merger. The linking columns need not have same name.

DataFrame.merge(self,right, how, on, left_on, right_on,left_index, right_index, sort, suffix, copy, indicator,validation)
self: The first DataFrame or left side one.
right : The second DataFrame or right side one
how :left, right, inner ( default ) , outer. Type of merge
on : label or list , column name using which merge will apply
left_on: label or list if name in left DataFrame is different than right
ight_on: label or list if name in right DataFrame is different than left
sort: Boolean , sort the join keys
suffixes: Suffix to apply on overlapping columns.
copy: Boolean,
indicator: (Boolean, default False ) Add indicator to column
validate: check the merger type

Type | Table 1 | Table 2 | Details |
LEFT | A + B | D | All from Left ( Table 1 ) and matching from right (Table 2 ) |
RIGHT | A | C + D | All from Right ( Table 2 ) and matching from left (Table 1 ) |
INNER | A | D | Matching from Left ( Table 1 ) and right (Table 2 ) |
OUTER | A + B | C + D | All from Left ( Table 1 ) and all from right (Table 2 ) |
« Download sample DataFrame for Sales, Product and Customer here.
how = 'left'

All the rows of left table ( or table 1 ) and matching rows of right ( Table 2 ) . In this case all rows of sales DataFrame have matching product in product table. Now we have extra columns like price required for our further calculation.
my_data=pd.merge(sales,products,on='p_id',how='left')
print(my_data)
Output is here
sale_id c_id p_id product_x qty store product_y price
0 1 2 3 Monitor 2 ABC Monitor 75
1 2 2 4 CPU 1 DEF CPU 55
2 3 1 3 Monitor 3 ABC Monitor 75
3 4 4 2 RAM 2 DEF RAM 90
4 5 2 3 Monitor 3 ABC Monitor 75
5 6 3 3 Monitor 2 DEF Monitor 75
6 7 2 2 RAM 3 ABC RAM 90
7 8 3 2 RAM 2 DEF RAM 90
8 9 2 3 Monitor 2 ABC Monitor 75
We will change the tables to know about the products which are not there in our sales dataframe.
my_data=pd.merge(products,sales,on='p_id',how='left')
Output is here
p_id product_x price sale_id c_id product_y qty store
0 1 Hard Disk 80 NaN NaN NaN NaN NaN
1 2 RAM 90 4.0 4.0 RAM 2.0 DEF
2 2 RAM 90 7.0 2.0 RAM 3.0 ABC
3 2 RAM 90 8.0 3.0 RAM 2.0 DEF
4 3 Monitor 75 1.0 2.0 Monitor 2.0 ABC
5 3 Monitor 75 3.0 1.0 Monitor 3.0 ABC
6 3 Monitor 75 5.0 2.0 Monitor 3.0 ABC
7 3 Monitor 75 6.0 3.0 Monitor 2.0 DEF
8 3 Monitor 75 9.0 2.0 Monitor 2.0 ABC
9 4 CPU 55 2.0 2.0 CPU 1.0 DEF
10 5 Keyboard 20 NaN NaN NaN NaN NaN
11 6 Mouse 10 NaN NaN NaN NaN NaN
12 7 Motherboard 50 NaN NaN NaN NaN NaN
13 8 Power supply 20 NaN NaN NaN NaN NaN
Now in our products DataFrame there is no matching row in sales DataFrame, so we are getting NaN as value.
Watch the product_x and product_y columns
my_data=pd.merge(products,sales,on='p_id',how='left')
print(my_data[my_data['sale_id'].isnull()])
Output
p_id product_x price sale_id c_id product_y qty store
0 1 Hard Disk 80 NaN NaN NaN NaN NaN
10 5 Keyboard 20 NaN NaN NaN NaN NaN
11 6 Mouse 10 NaN NaN NaN NaN NaN
12 7 Motherboard 50 NaN NaN NaN NaN NaN
13 8 Power supply 20 NaN NaN NaN NaN NaN
Read more on isnull() here.
how='right'

With how='right' all rows of right DataFrame and matching rows of left DataFrame will be displayed.
my_data=pd.merge(sales,products,on='p_id',how='right')
print(my_data)
Output
sale_id c_id p_id product_x qty store product_y price
0 1.0 2.0 3 Monitor 2.0 ABC Monitor 75
1 3.0 1.0 3 Monitor 3.0 ABC Monitor 75
2 5.0 2.0 3 Monitor 3.0 ABC Monitor 75
3 6.0 3.0 3 Monitor 2.0 DEF Monitor 75
4 9.0 2.0 3 Monitor 2.0 ABC Monitor 75
5 2.0 2.0 4 CPU 1.0 DEF CPU 55
6 4.0 4.0 2 RAM 2.0 DEF RAM 90
7 7.0 2.0 2 RAM 3.0 ABC RAM 90
8 8.0 3.0 2 RAM 2.0 DEF RAM 90
9 NaN NaN 1 NaN NaN NaN Hard Disk 80
10 NaN NaN 5 NaN NaN NaN Keyboard 20
11 NaN NaN 6 NaN NaN NaN Mouse 10
12 NaN NaN 7 NaN NaN NaN Motherboard 50
13 NaN NaN 8 NaN NaN NaN Power supply 20
how='inner'

Matching rows of both the tables.
my_data=pd.merge(sales,products,on='p_id',how='inner')
Output
sale_id c_id p_id product_x qty store product_y price
0 1 2 3 Monitor 2 ABC Monitor 75
1 3 1 3 Monitor 3 ABC Monitor 75
2 5 2 3 Monitor 3 ABC Monitor 75
3 6 3 3 Monitor 2 DEF Monitor 75
4 9 2 3 Monitor 2 ABC Monitor 75
5 2 2 4 CPU 1 DEF CPU 55
6 4 4 2 RAM 2 DEF RAM 90
7 7 2 2 RAM 3 ABC RAM 90
8 8 3 2 RAM 2 DEF RAM 90
how='outer'

Joins all the records of both the tables irrespective of matching or not.
my_data=pd.merge(sales,products,on='p_id',how='outer')
Output
sale_id c_id p_id product_x qty store product_y price
0 1.0 2.0 3 Monitor 2.0 ABC Monitor 75
1 3.0 1.0 3 Monitor 3.0 ABC Monitor 75
2 5.0 2.0 3 Monitor 3.0 ABC Monitor 75
3 6.0 3.0 3 Monitor 2.0 DEF Monitor 75
4 9.0 2.0 3 Monitor 2.0 ABC Monitor 75
5 2.0 2.0 4 CPU 1.0 DEF CPU 55
6 4.0 4.0 2 RAM 2.0 DEF RAM 90
7 7.0 2.0 2 RAM 3.0 ABC RAM 90
8 8.0 3.0 2 RAM 2.0 DEF RAM 90
9 NaN NaN 1 NaN NaN NaN Hard Disk 80
10 NaN NaN 5 NaN NaN NaN Keyboard 20
11 NaN NaN 6 NaN NaN NaN Mouse 10
12 NaN NaN 7 NaN NaN NaN Motherboard 50
13 NaN NaN 8 NaN NaN NaN Power supply 20
Merge Without common key
We may require one cross join where we don't have any key to use on. For this we will add one new key to both DataFrames and then merge the two. After merging we can remove the newly added column.
import pandas as pd
d1={'NAME':['Alex','Ravi','John'],'AGE':[22,23,21]}
d2={'SUBJECT':['Hindi','English']}
df1=pd.DataFrame(data=d1)
df2=pd.DataFrame(data=d2)
df1['key']=1
df2['key']=1
df1.merge(df2,how='outer',on='key').drop("key", 1)
output
NAME AGE SUBJECT
0 Alex 22 Hindi
1 Alex 22 English
2 Ravi 23 Hindi
3 Ravi 23 English
4 John 21 Hindi
5 John 21 English
« Pandas
« Pandas DataFrame
← Subscribe to our YouTube Channel here