Exercise on using Pandas DataFrame with merge and groupby

We have three CSV files.

customer.csv : Two columns, customer id and customer name
products.csv : Three columns ,p_id( product id) , product ( name) and price
sales.csv Six columns , sale_id, c_id ( customer id ), p_id (product_id), qty ( quantity sold) ,store ( name )

Sample data is kept inside the csv files, click the csv file name to download.

Download sample DataFrame for Sales, Product and Customer

From the above data we will produce following reports.

Exercise on Pandas DataFrame
List of products sold
List of quantity sold against each product.
List of quantity and total sales against each product
List of quantity sold against each product and against each store.
List of quantity sold against each Store with total turnover of the store.
List of products which are not sold
List of customers who have not purchased any product.

We will be using Pandas DataFrame methods merger and groupby to generate these reports.

In all the sample codes below if you are using the sample DataFrames of three tables then reading from CSV file is not required. List of products sold

List of products sold

Our sales table will have list of products sold.

import pandas as pd 
sales=pd.read_csv("sales.csv") # reading from csv file
print(sales)

Output

   sale_id  c_id  p_id  product  qty store
0        1     2     3  Monitor    2   ABC
1        2     2     4      CPU    1   DEF
2        3     1     3  Monitor    3   ABC
3        4     4     2      RAM    2   DEF
4        5     2     3  Monitor    3   ABC
5        6     3     3  Monitor    2   DEF
6        7     2     2      RAM    3   ABC
7        8     3     2      RAM    2   DEF
8        9     2     3  Monitor    2   ABC

2. List of quantity sold against each products

We will use groupby to count total sale against each product.

print(sales.groupby(['product','p_id'])[['qty']].sum())

Output

              qty
product p_id     
CPU     4       1
Monitor 3      12
RAM     2       7

3. List of quantity and total sales against each product

You can un-comment the print commands and check the intermediate results. We used merge to join two DataFrames and to get Product details ( price of the product ).
We added one more column total_sales by multiplying total sales with price.

import pandas as pd 
sales=pd.read_csv("sales.csv") 
#print(sales)
# using groupby get the list of products and its sum sold
my_sale=sales.groupby(['product','p_id', 'store'])[['qty']].sum()
#print(my_sale)
product=pd.read_csv("products.csv")
#print(product)
my_sum=pd.merge(my_sale,product,how='left',on='p_id')
#print(my_sum)
#We added one more column total_sales by multiplying total sales with price. 
my_sum['total_sale']=my_sum['qty']*my_sum['price']
print(my_sum)

Output

   p_id  qty  product  price  total_sale
0     4    1      CPU     55          55
1     3   10  Monitor     75         750
2     3    2  Monitor     75         150
3     2    3      RAM     90         270
4     2    4      RAM     90         360

In above code the output of groupby() is to be indexed by using reset_index()

my_sale=sales.groupby(['product','p_id', 'store'])[['qty']].sum().reset_index()

  product_x  p_id store  qty product_y  price  total_sale
0       CPU     4   DEF    1       CPU     55          55
1   Monitor     3   ABC   10   Monitor     75         750
2   Monitor     3   DEF    2   Monitor     75         150
3       RAM     2   ABC    3       RAM     90         270
4       RAM     2   DEF    4       RAM     90         360

4. List of quantity sold against each product and against each store.

import pandas as pd 
sales=pd.read_csv("sales.csv") 
print(sales.groupby(['product','p_id','store'])[['qty']].sum())

Output

                    qty
product p_id store     
CPU     4    DEF      1
Monitor 3    ABC     10
             DEF      2
RAM     2    ABC      3
             DEF      4

5. List of quantity sold against each Store with total turnover of the store.

We will first find out the total price ( or turnover ) against each sale by multiplying quantity sold with price of each unit.
To get price of each unit ( which is in product dataframe ) we have to merge sales with product DataFrame using product id ( p_id ).

Once we get the total turnover against each sale, we will use groupby method to find out total sales ( turnover ) and quantity against each store.

import pandas as pd 
sales=pd.read_csv("sales.csv") 
#print(sales)
product=pd.read_csv("products.csv")
my_sum=pd.merge(sales,product,how='left',on=['p_id'])
my_sum['sales_total']=my_sum['qty']*my_sum['price']
print(my_sum.groupby(['store'])[['qty','sales_total']].sum())

Output

       qty  sales_total
store                  
ABC     13         1020
DEF      7          565

6.List of products which are not sold

7.List of customers who have not purchased any product.

import pandas as pd 
sales=pd.read_csv("sales.csv") 
customer=pd.read_csv("customer.csv") 
my_data=pd.merge(sales,customer,on='c_id',how='right')
my_data=my_data[my_data['sale_id'].isnull()] # products which are not sold
#print(my_data)
print(my_data.loc[:,'Customer']) # to display customers who has not purchased

Output

9     King
10    Ronn
11     Jem
12     Tom
Name: Customer, dtype: object

Pandas Pandas DataFrame

Numpy arrays Python & MySQL Python- Tutorials

Subhendu Mohapatra

Author

🎥 Join me live on YouTube

Passionate about coding and teaching, I publish practical tutorials on PHP, Python, JavaScript, SQL, and web development. My goal is to make learning simple, engaging, and project‑oriented with real examples and source code.