import pandas as pd
my_dict={'sale_id':[1,2,3,4,5,6,7,8,9],
'c_id':[2,2,1,4,2,3,2,3,2],
'p_id':[3,4,3,2,3,3,2,2,3],
'product':['Monitor','CPU','Monitor','RAM','Monitor','Monitor','RAM','RAM','Monitor'],
'qty':[2,1,3,2,3,2,3,2,2],
'store':['ABC','DEF','ABC','DEF','ABC','DEF','ABC','DEF','ABC']}
sales = pd.DataFrame(data=my_dict)
print(sales)
using groupby on product
import pandas as pd
sales=pd.read_csv("sales.csv")
my_sale=sales.groupby(['product'])
print(my_sale.first())
Output is here
sale_id c_id p_id qty store
product
CPU 2 2 4 1 DEF
Monitor 1 2 3 2 ABC
RAM 4 4 2 2 DEF
Along with product we are also getting first matching row values.
my_sale=sales.groupby(['product','store'])
Output
sale_id c_id p_id qty
product store
CPU DEF 2 2 4 1
Monitor ABC 1 2 3 2
DEF 6 3 3 2
RAM ABC 7 2 2 3
DEF 4 4 2 2
Using as_index=False
. Note the difference in product column.
my_sale=sales.groupby(['product','store'],as_index=False)
product store sale_id c_id p_id qty
0 CPU DEF 2 2 4 1
1 Monitor ABC 1 2 3 2
2 Monitor DEF 6 3 3 2
3 RAM ABC 7 2 2 3
4 RAM DEF 4 4 2 2
We can apply all above parameters
my_sale=sales.groupby(['product'],observed=False,squeeze=False,as_index=False,group_keys=True,sort=False)
Output
product sale_id c_id p_id qty store
0 Monitor 1 2 3 2 ABC
1 CPU 2 2 4 1 DEF
2 RAM 4 4 2 2 DEF
my_sale=sales.groupby(['product'])
print(my_sale.get_group('Monitor'))
Output ( rows of all Monitor under product )
sale_id c_id p_id product qty store
0 1 2 3 Monitor 2 ABC
2 3 1 3 Monitor 3 ABC
4 5 2 3 Monitor 3 ABC
5 6 3 3 Monitor 2 DEF
8 9 2 3 Monitor 2 ABC
my_sale=sales.groupby(['product'])[['qty']].sum()
Output
qty
product
CPU 1
Monitor 12
RAM 7
We can use two columns for grouping and get the breakup of sum of quantity sold
my_sale=sales.groupby(['product','store'])[['qty']].sum()
Output
qty
product store
CPU DEF 1
Monitor ABC 10
DEF 2
RAM ABC 3
DEF 4
my_sale=sales.groupby(['product'])[['qty']].max()
Output
qty
product
CPU 1
Monitor 3
RAM 3
my_sale=sales.groupby(['product'])[['qty']].min()
Output
qty
product
CPU 1
Monitor 2
RAM 2
my_sale=sales.groupby(['product'])[['qty']].mean()
Output
qty
product
CPU 1.000000
Monitor 2.400000
RAM 2.333333
my_sale=sales.groupby(['product'])[['qty']].std()
qty
product
CPU NaN
Monitor 0.547723
RAM 0.577350
my_sale=sales.groupby(['product'])[['qty']].count()
Output
qty
product
CPU 1
Monitor 5
RAM 3
my_sale=sales.groupby(['product'])
print(my_sale.apply(lambda x: x['qty'].sum() > 2))
Output
product
CPU False
Monitor True
RAM True
dtype: bool
as_index=False
is kept to keep the product values in all rows.
import pandas as pd
my_dict={'sale_id':[1,2,3,4,5,6,7,8,9],
'c_id':[2,2,1,4,2,3,2,3,2],
'p_id':[3,4,3,2,3,3,2,2,3],
'price':[30,20,40,20,30,40,20,30,40],
'product':['Monitor','CPU','Monitor','RAM','Monitor','Monitor','RAM','RAM','Monitor'],
'qty':[2,1,3,2,3,2,3,2,2],
'store':['ABC','DEF','ABC','DEF','ABC','DEF','ABC','DEF','ABC']}
sales = pd.DataFrame(data=my_dict)
my_sale=sales.groupby(['product','store'],as_index=False).agg({'qty':'sum','price':'mean'})
print(my_sale)
Output
product store qty price
0 CPU DEF 1 20
1 Monitor ABC 10 35
2 Monitor DEF 2 40
3 RAM ABC 3 20
4 RAM DEF 4 25
We can use dictionary to apply aggregate function on multiple columns.
d={'qty':'sum','price':'mean'} # dictionary to use
my_sale=sales.groupby(['product','store'],as_index=False).agg(d)
Mulitple aggregate values of same column.
d={'qty':['sum'],'price':['mean','max','min']}
Output is here
product store qty price
sum mean max min
0 CPU DEF 1 20 20 20
1 Monitor ABC 10 35 40 30
2 Monitor DEF 2 40 40 40
3 RAM ABC 3 20 20 20
4 RAM DEF 4 25 30 20
dropna=False
.
my_sale=sales.groupby(['product','store'],dropna=False).agg(d)
Author
🎥 Join me live on YouTubePassionate about coding and teaching, I publish practical tutorials on PHP, Python, JavaScript, SQL, and web development. My goal is to make learning simple, engaging, and project‑oriented with real examples and source code.