Frequency conversion and resampling of time series data.
resample(rule, axis=0, closed=None, label=None, convention='start',
kind=None, loffset=None, base=None, on=None, level=None, origin='start_day',
offset=None)
Creating DataFrame and adding datetime index.
import pandas as pd
my_dict={
'my_date':['2022-06-01 00:00:00','2022-06-01 00:01:00',
'2022-06-01 00:02:00','2022-06-01 00:03:00',
'2022-06-01 00:04:00','2022-06-01 00:05:00'],
'value':[2,2,3,1,3,2]
}
df = pd.DataFrame(data=my_dict) # create dataframe
df['my_date']=pd.to_datetime(df['my_date']) # column to datetime dtype
df.set_index('my_date',inplace=True) # use my_date as the index
print(df)
Output
                     value
my_date
2022-06-01 00:00:00      2
2022-06-01 00:01:00      2
2022-06-01 00:02:00      3
2022-06-01 00:03:00      1
2022-06-01 00:04:00      3
2022-06-01 00:05:00      2
By using to_datetime() we converted the my_date column to datetime dtype, and set_index() made it the index of the DataFrame.
We will resample the above DataFrame at 2-minute intervals and aggregate the value column with sum(). (Other aggregate functions such as mean(), max(), min() and std() can be used here.)
df = pd.DataFrame(data=my_dict)
df['my_date']=pd.to_datetime(df['my_date'])
df.set_index('my_date',inplace=True)
df=df.resample('2min').sum()
print(df)
Output
                     value
my_date
2022-06-01 00:00:00      4
2022-06-01 00:02:00      4
2022-06-01 00:04:00      5
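As a minimal sketch of the other aggregate functions mentioned above, the snippet below rebuilds the same sample data (the names idx and summary are only illustrative) and applies several aggregations to each 2-minute bin in one call using agg().

```python
import pandas as pd

# Rebuild the sample data: six rows, one per minute
idx = pd.date_range('2022-06-01 00:00:00', periods=6, freq='min')
df = pd.DataFrame({'value': [2, 2, 3, 1, 3, 2]}, index=idx)
df.index.name = 'my_date'

# Apply several aggregate functions to every 2-minute bin
summary = df['value'].resample('2min').agg(['sum', 'mean', 'max', 'min'])
print(summary)
```

Each requested function becomes one column of summary, with one row per 2-minute bin.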
Here the value rule='2min' can be replaced with a different frequency string, so the DataFrame can be resampled in other units such as seconds, hours or days.
Arguments
rule: We already used this in the code above. Frequency strings can be used alone or combined to create the rule. Here is one sample.
df=df.resample(rule='2D5h2min10s').mean() # 2 days, 5 hours, 2 minutes, 10 seconds
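To see how a combined rule behaves on the small sample data, here is a hedged sketch. It assumes the same six rows as above and compares the combined rule '1min30s' with the single-unit rule '90s'; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the sample data: six rows, one per minute
idx = pd.date_range('2022-06-01 00:00:00', periods=6, freq='min')
df = pd.DataFrame({'value': [2, 2, 3, 1, 3, 2]}, index=idx)
df.index.name = 'my_date'

# A combined rule: 1 minute plus 30 seconds
combined = df['value'].resample('1min30s').sum()

# The same interval written with a single unit
single = df['value'].resample('90s').sum()

print(combined)
print(single)
```

Both rules describe the same 90-second interval, so the two results should contain the same bins and the same sums.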
closed: Which side of each bin interval is included? The default value is None. If it is 'right' then the left edge of the interval is not included.
df['c_left']=df['value'].resample('2min',closed='left').mean()
Output
                     value  c_left
my_date
2022-06-01 00:00:00      2     2.0
2022-06-01 00:01:00      2     NaN
2022-06-01 00:02:00      3     2.0
2022-06-01 00:03:00      1     NaN
2022-06-01 00:04:00      3     2.5
2022-06-01 00:05:00      2     NaN
df['c_right']=df['value'].resample('2min',closed='right').mean()
Output
                     value  c_right
my_date
2022-06-01 00:00:00      2      2.5
2022-06-01 00:01:00      2      NaN
2022-06-01 00:02:00      3      2.0
2022-06-01 00:03:00      1      NaN
2022-06-01 00:04:00      3      2.0
2022-06-01 00:05:00      2      NaN
As per the manual:
Which side of bin interval is closed. The default is 'left' for all frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of 'right'.
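The effect of closed is easier to see when the resampled result is printed on its own instead of being aligned back to the original rows. The sketch below assumes the same sample data and uses sum() so the bin membership is easy to check by hand; the names left and right are illustrative.

```python
import pandas as pd

# Rebuild the sample data: six rows, one per minute
idx = pd.date_range('2022-06-01 00:00:00', periods=6, freq='min')
df = pd.DataFrame({'value': [2, 2, 3, 1, 3, 2]}, index=idx)
df.index.name = 'my_date'

# Bins are [start, end): the row at 00:00 falls in the bin starting at 00:00
left = df['value'].resample('2min', closed='left').sum()

# Bins are (start, end]: the row at 00:00 falls in the bin that ends at 00:00
right = df['value'].resample('2min', closed='right').sum()

print(left)
print(right)
```

With closed='right' the first row belongs to a bin that starts before the data does, so the result has one extra bin labelled 2022-05-31 23:58:00. That extra row disappears when the result is assigned back as a column of the original DataFrame.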
label: As per the manual:
Which bin edge label to label bucket with. The default is 'left' for all frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of 'right'.
df['l_left']=df['value'].resample('2min',label='left').sum()
Output
                     value  l_left
my_date
2022-06-01 00:00:00      2     4.0
2022-06-01 00:01:00      2     NaN
2022-06-01 00:02:00      3     4.0
2022-06-01 00:03:00      1     NaN
2022-06-01 00:04:00      3     5.0
2022-06-01 00:05:00      2     NaN
df['l_right']=df['value'].resample('2min',label='right').sum()
Output
                     value  l_right
my_date
2022-06-01 00:00:00      2      NaN
2022-06-01 00:01:00      2      NaN
2022-06-01 00:02:00      3      4.0
2022-06-01 00:03:00      1      NaN
2022-06-01 00:04:00      3      4.0
2022-06-01 00:05:00      2      NaN
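The label argument changes only the timestamp attached to each bin, not which rows fall into it. The following sketch, again assuming the same sample data and using illustrative variable names, prints both results on their own so the shifted labels are visible.

```python
import pandas as pd

# Rebuild the sample data: six rows, one per minute
idx = pd.date_range('2022-06-01 00:00:00', periods=6, freq='min')
df = pd.DataFrame({'value': [2, 2, 3, 1, 3, 2]}, index=idx)
df.index.name = 'my_date'

# Each bin is labelled with its left (starting) edge
left_lab = df['value'].resample('2min', label='left').sum()

# Each bin is labelled with its right (ending) edge
right_lab = df['value'].resample('2min', label='right').sum()

print(left_lab)
print(right_lab)
```

Both results hold the same three sums. With label='right' each timestamp is 2 minutes later, so the last bin is labelled 00:06:00. That label does not exist in the original index, which is why the sum 5 does not appear in the l_right column above.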
on: In our examples above the datetime column (my_date) is the index column. If the resample is to be applied to any other column (it must be of datetime type) then on can be used to provide the column name.
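A minimal sketch of on, assuming the same sample values but leaving my_date as an ordinary column instead of the index (the name result is illustrative):

```python
import pandas as pd

# my_date stays a normal column here; the index is the default 0..5
df = pd.DataFrame({
    'my_date': pd.date_range('2022-06-01 00:00:00', periods=6, freq='min'),
    'value': [2, 2, 3, 1, 3, 2],
})

# Resample on the my_date column without calling set_index() first
result = df.resample('2min', on='my_date')['value'].sum()
print(result)
```

The column named in on becomes the index of the resampled result.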
offset: An offset timedelta that is added to the origin, which shifts the bin edges.
df['offset_3min']=df['value'].resample(rule='2min',offset='3min').sum()
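To show what the offset does to the bin edges, here is a hedged sketch that assumes the same sample data and prints the resampled result on its own. With the default origin='start_day' the bins would normally start on even minutes; adding 3 minutes moves the edges to odd minutes.

```python
import pandas as pd

# Rebuild the sample data: six rows, one per minute
idx = pd.date_range('2022-06-01 00:00:00', periods=6, freq='min')
df = pd.DataFrame({'value': [2, 2, 3, 1, 3, 2]}, index=idx)
df.index.name = 'my_date'

# Shift the bin edges by 3 minutes from the start of the day
shifted = df['value'].resample('2min', offset='3min').sum()
print(shifted)
```

The first row (00:00) now falls into a bin that starts at 23:59 on the previous day, and the remaining bins start at 00:01, 00:03 and 00:05.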