merge: Pandas DataFrame

We can join two or more DataFrames by using merge, We need to identify a matching column in both dataframes to use mreger. The linking columns need not have same name.
Merge DataFrame on a column
DataFrame.merge(self,right, how, on, left_on, right_on,left_index, right_index, sort, suffix, copy, indicator,validation)

self: The first DataFrame or left side one.
right : The second DataFrame or right side one
how :left, reight, inner ( default ) , outer. Type of merge
on : label or list , column name using which merge will apply
left_on: label or list if name in left DataFrame is different than right
ight_on: label or list if name in right DataFrame is different than left
sort: boolean , sort the join keys
suffixes: Suffix to apply on overlapping columns.
copy: bookean,
indicator: (boolean, default False ) Add indicator to column
validate: check the merger type

Merge DataFrame Inner, left, right ,outer
TypeTable 1Table 2Details
LEFTA + B D All from Left ( Table 1 ) and matching from right (Table 2 )
RIGHTA C + D All from Right ( Table 2 ) and matching from left (Table 1 )
INNERA D Matching from Left ( Table 1 ) and right (Table 2 )
OUTERA + B C + D All from Left ( Table 1 ) and all from right (Table 2 )
Download sample DataFrame for Sales, Product and Customer here.

how = 'left'

Merge how = left
All the rows of left table ( or table 1 ) and matching rows of right ( Table 2 ) . In this case all rows of sales DataFrame have matching product in product table. Now we have extra columns like price required for our further calculation.
my_data=pd.merge(sales,products,on='p_id',how='left')
print(my_data)
Output is here
   sale_id  c_id  p_id product_x  qty store product_y  price
0        1     2     3   Monitor    2   ABC   Monitor     75
1        2     2     4       CPU    1   DEF       CPU     55
2        3     1     3   Monitor    3   ABC   Monitor     75
3        4     4     2       RAM    2   DEF       RAM     90
4        5     2     3   Monitor    3   ABC   Monitor     75
5        6     3     3   Monitor    2   DEF   Monitor     75
6        7     2     2       RAM    3   ABC       RAM     90
7        8     3     2       RAM    2   DEF       RAM     90
8        9     2     3   Monitor    2   ABC   Monitor     75
We will change the tables to know about the products which are not there in our sales dataframe.
my_data=pd.merge(products,sales,on='p_id',how='left')
Output is here
    p_id     product_x  price  sale_id  c_id product_y  qty store
0      1     Hard Disk     80      NaN   NaN       NaN  NaN   NaN
1      2           RAM     90      4.0   4.0       RAM  2.0   DEF
2      2           RAM     90      7.0   2.0       RAM  3.0   ABC
3      2           RAM     90      8.0   3.0       RAM  2.0   DEF
4      3       Monitor     75      1.0   2.0   Monitor  2.0   ABC
5      3       Monitor     75      3.0   1.0   Monitor  3.0   ABC
6      3       Monitor     75      5.0   2.0   Monitor  3.0   ABC
7      3       Monitor     75      6.0   3.0   Monitor  2.0   DEF
8      3       Monitor     75      9.0   2.0   Monitor  2.0   ABC
9      4           CPU     55      2.0   2.0       CPU  1.0   DEF
10     5      Keyboard     20      NaN   NaN       NaN  NaN   NaN
11     6         Mouse     10      NaN   NaN       NaN  NaN   NaN
12     7   Motherboard     50      NaN   NaN       NaN  NaN   NaN
13     8  Power supply     20      NaN   NaN       NaN  NaN   NaN
Now in our products DataFrame there is no matching row in sales DataFrame, so we are getting NaN as value.
Watch the product_x and product_y columns
my_data=pd.merge(products,sales,on='p_id',how='left')
print(my_data[my_data['sale_id'].isnull()])
Output
    p_id     product_x  price  sale_id  c_id product_y  qty store
0      1     Hard Disk     80      NaN   NaN       NaN  NaN   NaN
10     5      Keyboard     20      NaN   NaN       NaN  NaN   NaN
11     6         Mouse     10      NaN   NaN       NaN  NaN   NaN
12     7   Motherboard     50      NaN   NaN       NaN  NaN   NaN
13     8  Power supply     20      NaN   NaN       NaN  NaN   NaN

how='right'

Merge how = right
With how='right' all rows of right DataFrame and matching rows of left DataFrame will be displayed.
my_data=pd.merge(sales,products,on='p_id',how='right')
print(my_data)
Output

    sale_id  c_id  p_id product_x  qty store     product_y  price
0       1.0   2.0     3   Monitor  2.0   ABC       Monitor     75
1       3.0   1.0     3   Monitor  3.0   ABC       Monitor     75
2       5.0   2.0     3   Monitor  3.0   ABC       Monitor     75
3       6.0   3.0     3   Monitor  2.0   DEF       Monitor     75
4       9.0   2.0     3   Monitor  2.0   ABC       Monitor     75
5       2.0   2.0     4       CPU  1.0   DEF           CPU     55
6       4.0   4.0     2       RAM  2.0   DEF           RAM     90
7       7.0   2.0     2       RAM  3.0   ABC           RAM     90
8       8.0   3.0     2       RAM  2.0   DEF           RAM     90
9       NaN   NaN     1       NaN  NaN   NaN     Hard Disk     80
10      NaN   NaN     5       NaN  NaN   NaN      Keyboard     20
11      NaN   NaN     6       NaN  NaN   NaN         Mouse     10
12      NaN   NaN     7       NaN  NaN   NaN   Motherboard     50
13      NaN   NaN     8       NaN  NaN   NaN  Power supply     20

how='inner'

Merge how = inner
Matching rows of both the tables.
my_data=pd.merge(sales,products,on='p_id',how='inner')
Output
   sale_id  c_id  p_id product_x  qty store product_y  price
0        1     2     3   Monitor    2   ABC   Monitor     75
1        3     1     3   Monitor    3   ABC   Monitor     75
2        5     2     3   Monitor    3   ABC   Monitor     75
3        6     3     3   Monitor    2   DEF   Monitor     75
4        9     2     3   Monitor    2   ABC   Monitor     75
5        2     2     4       CPU    1   DEF       CPU     55
6        4     4     2       RAM    2   DEF       RAM     90
7        7     2     2       RAM    3   ABC       RAM     90
8        8     3     2       RAM    2   DEF       RAM     90

how='outer'

Merge how = outer
Joins all the records of both the tables irrespective of matching or not.
my_data=pd.merge(sales,products,on='p_id',how='outer')
Output
    sale_id  c_id  p_id product_x  qty store     product_y  price
0       1.0   2.0     3   Monitor  2.0   ABC       Monitor     75
1       3.0   1.0     3   Monitor  3.0   ABC       Monitor     75
2       5.0   2.0     3   Monitor  3.0   ABC       Monitor     75
3       6.0   3.0     3   Monitor  2.0   DEF       Monitor     75
4       9.0   2.0     3   Monitor  2.0   ABC       Monitor     75
5       2.0   2.0     4       CPU  1.0   DEF           CPU     55
6       4.0   4.0     2       RAM  2.0   DEF           RAM     90
7       7.0   2.0     2       RAM  3.0   ABC           RAM     90
8       8.0   3.0     2       RAM  2.0   DEF           RAM     90
9       NaN   NaN     1       NaN  NaN   NaN     Hard Disk     80
10      NaN   NaN     5       NaN  NaN   NaN      Keyboard     20
11      NaN   NaN     6       NaN  NaN   NaN         Mouse     10
12      NaN   NaN     7       NaN  NaN   NaN   Motherboard     50
13      NaN   NaN     8       NaN  NaN   NaN  Power supply     20
Pandas DataFrame


plus2net.com



Post your comments , suggestion , error , requirements etc here




We use cookies to improve your browsing experience. . Learn more
HTML MySQL PHP JavaScript ASP Photoshop Articles FORUM . Contact us
©2000-2019 plus2net.com All rights reserved worldwide Privacy Policy Disclaimer