Pivot tables are summarized data where data is grouped for giving meaning full insights.
pivot_table(data, values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False, dropna=True,
margins_name='All', observed=False)
data: DataFrame
values: Columns to aggregate
index: columns or grouper
columns: columns or grouper
aggfunc: defalut numpy.mean , functions
fill_value: replace missing value
margins : bool, default False, subtotal grand total
dropna: bool , default True , remove columns if all are NaN.
observed bool, default false
This is our sample DataFrame and we created by using sales.csv file or use this
import pandas as pd
my_dict={'sale_id':[1,2,3,4,5,6,7,8,9],
'c_id':[2,2,1,4,2,3,2,3,2],
'p_id':[3,4,3,2,3,3,2,2,3],
'product':['Monitor','CPU','Monitor','RAM','Monitor','Monitor','RAM','RAM','Monitor'],
'qty':[2,1,3,2,3,2,3,2,2],
'store':['ABC','DEF','ABC','DEF','ABC','DEF','ABC','DEF','ABC']}
sales = pd.DataFrame(data=my_dict)
print(sales)
sale_id | c_id |
p_id | product | qty | store |
1 |
2 |
3 |
Monitor | 2 | ABC |
2 | 2 | 4 | CPU | 1 | DEF |
3 | 1 | 3 | Monitor | 3 | ABC |
4 | 4 | 2 | RAM | 2 | DEF |
5 | 2 | 3 | Monitor | 3 | ABC |
6 | 3 | 3 | Monitor | 2 | DEF |
7 | 2 | 2 | RAM | 3 | ABC |
8 | 3 | 2 | RAM | 2 | DEF |
9 | 2 | 3 | Monitor | 2 | ABC |
Using this DataFrame we will use the method pivot_table()
Here we are getting the average value of quantity ( qty ) against each product. By using aggfunc we can get sum, max, min value. Here we used one single column product as index.
pvt=pd.pivot_table(sales,values='qty',index=['product'])
| qty |
product | |
CPU | 1.000000 |
Monitor | 2.400000 |
RAM | 2.333333 |
We can use more than one column as index. Here product and store are used.
pvt=pd.pivot_table(sales,values='qty',index=['product','store'])
| | qty |
product | store | |
CPU | DEF | 1.0 |
Monitor | ABC | 2.5 |
DEF | 2.0 |
RAM | ABC | 3.0 |
DEF | 2.0 |
aggfunc
default value for aggfunc is mean
( np.mean) , we can specify others line sum , max , min etc or use numpy function ( import numpy before using numpy functions). Here we are using sum to get the sum of quanity ( qty ) column.
pvt=pd.pivot_table(sales,values='qty',
index=['product','store'],aggfunc=sum)
| | qty |
product | store | |
CPU | DEF | 1 |
Monitor | ABC | 10 |
DEF | 2 |
RAM | ABC | 3 |
DEF | 4 |
margins
We can display total or subtotal by using margins=True
, by default it is False.
pvt=pd.pivot_table(sales,values='qty',
index=['product','store'],aggfunc=sum,margins=True)
| | qty |
product | store | |
CPU | DEF | 1 |
Monitor | ABC | 10 |
DEF | 2 |
RAM | ABC | 3 |
DEF | 4 |
All | | 20 |
columns
pvt=pd.pivot_table(sales,index=['product'],columns=['store'],
values=['qty'])
| qty |
store | ABC | DEF |
product | | |
CPU | NaN | 1.0 |
Monitor | 2.5 | 2.0 |
RAM | 3.0 | 2.0 |
Difference between pivot_table and pivot
pivot_table() : Index column pair need not be unique and it can be used with aggragrate function list sum , average etc
pivot() :If the index column pair has multiple values then it will generate value error. We can't use aggragrate functions.
« Pandas
« pivot()
« Pandas DataFrame
contains() to display and delete row based on Conditions »
← Subscribe to our YouTube Channel here