We can join two or more DataFrames by using merge, We need to identify a matching column in both dataframes to use merger. The linking columns need not have same name.
DataFrame.merge(self,right, how, on, left_on, right_on,left_index, right_index, sort, suffix, copy, indicator,validation)
self: The first DataFrame or left side one. right : The second DataFrame or right side one how :left, right, inner ( default ) , outer. Type of merge on : label or list , column name using which merge will apply left_on: label or list if name in left DataFrame is different than right ight_on: label or list if name in right DataFrame is different than left sort: Boolean , sort the join keys suffixes: Suffix to apply on overlapping columns. copy: Boolean, indicator: (Boolean, default False ) Add indicator to column validate: check the merger type
Type
Table 1
Table 2
Details
LEFT
A + B
D
All from Left ( Table 1 ) and matching from right (Table 2 )
RIGHT
A
C + D
All from Right ( Table 2 ) and matching from left (Table 1 )
INNER
A
D
Matching from Left ( Table 1 ) and right (Table 2 )
OUTER
A + B
C + D
All from Left ( Table 1 ) and all from right (Table 2 )
All the rows of left table ( or table 1 ) and matching rows of right ( Table 2 ) . In this case all rows of sales DataFrame have matching product in product table. Now we have extra columns like price required for our further calculation.
p_id product_x price sale_id c_id product_y qty store
0 1 Hard Disk 80 NaN NaN NaN NaN NaN
1 2 RAM 90 4.0 4.0 RAM 2.0 DEF
2 2 RAM 90 7.0 2.0 RAM 3.0 ABC
3 2 RAM 90 8.0 3.0 RAM 2.0 DEF
4 3 Monitor 75 1.0 2.0 Monitor 2.0 ABC
5 3 Monitor 75 3.0 1.0 Monitor 3.0 ABC
6 3 Monitor 75 5.0 2.0 Monitor 3.0 ABC
7 3 Monitor 75 6.0 3.0 Monitor 2.0 DEF
8 3 Monitor 75 9.0 2.0 Monitor 2.0 ABC
9 4 CPU 55 2.0 2.0 CPU 1.0 DEF
10 5 Keyboard 20 NaN NaN NaN NaN NaN
11 6 Mouse 10 NaN NaN NaN NaN NaN
12 7 Motherboard 50 NaN NaN NaN NaN NaN
13 8 Power supply 20 NaN NaN NaN NaN NaN
Now in our products DataFrame there is no matching row in sales DataFrame, so we are getting NaN as value.
Watch the product_x and product_y columns
p_id product_x price sale_id c_id product_y qty store
0 1 Hard Disk 80 NaN NaN NaN NaN NaN
10 5 Keyboard 20 NaN NaN NaN NaN NaN
11 6 Mouse 10 NaN NaN NaN NaN NaN
12 7 Motherboard 50 NaN NaN NaN NaN NaN
13 8 Power supply 20 NaN NaN NaN NaN NaN
sale_id c_id p_id product_x qty store product_y price
0 1.0 2.0 3 Monitor 2.0 ABC Monitor 75
1 3.0 1.0 3 Monitor 3.0 ABC Monitor 75
2 5.0 2.0 3 Monitor 3.0 ABC Monitor 75
3 6.0 3.0 3 Monitor 2.0 DEF Monitor 75
4 9.0 2.0 3 Monitor 2.0 ABC Monitor 75
5 2.0 2.0 4 CPU 1.0 DEF CPU 55
6 4.0 4.0 2 RAM 2.0 DEF RAM 90
7 7.0 2.0 2 RAM 3.0 ABC RAM 90
8 8.0 3.0 2 RAM 2.0 DEF RAM 90
9 NaN NaN 1 NaN NaN NaN Hard Disk 80
10 NaN NaN 5 NaN NaN NaN Keyboard 20
11 NaN NaN 6 NaN NaN NaN Mouse 10
12 NaN NaN 7 NaN NaN NaN Motherboard 50
13 NaN NaN 8 NaN NaN NaN Power supply 20
sale_id c_id p_id product_x qty store product_y price
0 1.0 2.0 3 Monitor 2.0 ABC Monitor 75
1 3.0 1.0 3 Monitor 3.0 ABC Monitor 75
2 5.0 2.0 3 Monitor 3.0 ABC Monitor 75
3 6.0 3.0 3 Monitor 2.0 DEF Monitor 75
4 9.0 2.0 3 Monitor 2.0 ABC Monitor 75
5 2.0 2.0 4 CPU 1.0 DEF CPU 55
6 4.0 4.0 2 RAM 2.0 DEF RAM 90
7 7.0 2.0 2 RAM 3.0 ABC RAM 90
8 8.0 3.0 2 RAM 2.0 DEF RAM 90
9 NaN NaN 1 NaN NaN NaN Hard Disk 80
10 NaN NaN 5 NaN NaN NaN Keyboard 20
11 NaN NaN 6 NaN NaN NaN Mouse 10
12 NaN NaN 7 NaN NaN NaN Motherboard 50
13 NaN NaN 8 NaN NaN NaN Power supply 20
Merge Without common key
We may require one cross join where we don't have any key to use on. For this we will add one new key to both DataFrames and then merge the two. After merging we can remove the newly added column.