Resampling or changing the frequency of time-series data in a DataFrame

Frequency conversion and resampling of time series data.

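resample(rule, axis=0, closed=None, label=None, convention='start',
 kind=None, loffset=None, base=None, on=None, level=None, origin='start_day',
 offset=None)

This is the pandas 1.x signature. The base and loffset arguments were deprecated in pandas 1.1.0 in favour of origin and offset, and were removed in pandas 2.0.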

Creating a DataFrame and adding a datetime index.

import pandas as pd
my_dict={
    'my_date':['2022-06-01 00:00:00','2022-06-01 00:01:00',
               '2022-06-01 00:02:00','2022-06-01 00:03:00',
               '2022-06-01 00:04:00','2022-06-01 00:05:00'],
    'value':[2,2,3,1,3,2]
    }
df = pd.DataFrame(data=my_dict) # create DataFrame
df['my_date']=pd.to_datetime(df['my_date']) # convert column to datetime dtype
df.set_index('my_date',inplace=True) # use the my_date column as the index
print(df)
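resample() only works on a datetime-like index (or on a datetime column passed with on=), which is why the column is converted and set as the index first. The sketch below, using the same my_dict data, shows the TypeError pandas raises when that step is skipped; the names df_plain and error_raised are only for illustration.

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}

# Skipping to_datetime() and set_index(): df_plain keeps a default RangeIndex
df_plain = pd.DataFrame(data=my_dict)
error_raised = False
try:
    df_plain.resample('2min').sum()
except TypeError as err:
    # pandas only resamples a DatetimeIndex, TimedeltaIndex or PeriodIndex
    error_raised = True
    print('TypeError:', err)

# With the conversion and the index in place, the same call succeeds
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)
print(type(df.index).__name__)  # DatetimeIndex
```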

Output

                     value
my_date
2022-06-01 00:00:00      2
2022-06-01 00:01:00      2
2022-06-01 00:02:00      3
2022-06-01 00:03:00      1
2022-06-01 00:04:00      3
2022-06-01 00:05:00      2

to_datetime() converts the my_date column from strings to datetime values, and set_index() makes it the index of the DataFrame.
We will resample the above DataFrame into 2-minute intervals, using sum() as the aggregate function for the value column. (Other aggregate functions such as mean(), max(), min() and std() can be used here.)

df = pd.DataFrame(data=my_dict)
df['my_date']=pd.to_datetime(df['my_date'])
df.set_index('my_date',inplace=True)
df=df.resample('2min').sum()
print(df)

Output

                     value
my_date
2022-06-01 00:00:00      4
2022-06-01 00:02:00      4
2022-06-01 00:04:00      5
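Since other aggregate functions can be used, here is a minimal sketch that applies several of them to the same 2-minute bins in one call with agg(). Selecting the value column first keeps the result to one column per function.

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)

# One column per aggregate function, one row per 2-minute bin
summary = df['value'].resample('2min').agg(['sum', 'mean', 'max', 'min'])
print(summary)
# sum:  4, 4, 5
# mean: 2.0, 2.0, 2.5
# max:  2, 3, 3
# min:  2, 1, 2
```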

The rule '2min' can be changed to other frequencies by using pandas offset aliases (for example 'min' for minutes, 'D' for days and 'W' for weeks) to resample the DataFrame in different units.
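As an illustration on the same data, a sketch with two other rules: '3min' (3-minute bins) and 'h' (hourly). Recent pandas releases use lowercase 'h' as the hour alias; older releases document it as 'H'.

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)

three_min = df['value'].resample('3min').sum()  # bins [00:00, 00:03) and [00:03, 00:06)
hourly = df['value'].resample('h').sum()        # all six rows fall in the 00:00 hour
print(three_min)  # 7 at 00:00, 6 at 00:03
print(hourly)     # 13 at 00:00
```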

Arguments

rule: We already used this in the code above. Offset aliases can also be combined into a single rule string, for example:

df=df.resample(rule='2D5H2min10S').mean() # 2 days, 5 hours, 2 minutes and 10 seconds
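The rule above is far wider than our six minutes of data, so every row would fall in one bin. A sketch with a smaller combined rule, '1min30s', makes the effect visible: the two aliases add up to 90-second bins anchored at midnight.

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)

# '1min30s' combines two aliases into 90-second bins
ninety = df['value'].resample('1min30s').sum()
print(ninety)
# 00:00:00 -> 4 (rows 00:00, 00:01)
# 00:01:30 -> 3 (row 00:02)
# 00:03:00 -> 4 (rows 00:03, 00:04)
# 00:04:30 -> 2 (row 00:05)
```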

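closed: Which side of each bin interval is included. The default value is None, which resolves to 'left' for most frequencies. With closed='left' a 2-minute bin covers [00:00, 00:02); with closed='right' it covers (00:00, 00:02], so the left edge is excluded. The examples in this section start from the original six-row df, not the resampled one.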

df['c_left']=df['value'].resample('2min',closed='left').mean()

Output

                     value  c_left
my_date
2022-06-01 00:00:00      2     2.0
2022-06-01 00:01:00      2     NaN
2022-06-01 00:02:00      3     2.0
2022-06-01 00:03:00      1     NaN
2022-06-01 00:04:00      3     2.5
2022-06-01 00:05:00      2     NaN

df['c_right']=df['value'].resample('2min',closed='right').mean()

Output

                     value  c_right
my_date
2022-06-01 00:00:00      2      2.5
2022-06-01 00:01:00      2      NaN
2022-06-01 00:02:00      3      2.0
2022-06-01 00:03:00      1      NaN
2022-06-01 00:04:00      3      2.0
2022-06-01 00:05:00      2      NaN

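With closed='right', the bin labeled 00:00 covers 00:01 and 00:02, so its mean is (2 + 3) / 2 = 2.5, and the bin labeled 00:02 averages 00:03 and 00:04.

As per the manual:

Which side of bin interval is closed. The default is 'left' for all frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of 'right'.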
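To see where the closed='right' bins actually fall, it helps to look at the resampled result before it is assigned back to df. In this sketch, the first bin, (23:58, 00:00], holds only the 00:00 row and is labeled 2022-05-31 23:58; that label is not in df's index, which is why it does not appear in the c_right column.

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)

closed_right = df['value'].resample('2min', closed='right').mean()
print(closed_right)
# 2022-05-31 23:58 -> 2.0  bin (23:58, 00:00]
# 2022-06-01 00:00 -> 2.5  bin (00:00, 00:02]
# 2022-06-01 00:02 -> 2.0  bin (00:02, 00:04]
# 2022-06-01 00:04 -> 2.0  bin (00:04, 00:06]
```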

label: As per the manual:

Which bin edge label to label bucket with. The default is 'left' for all frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' which all have a default of 'right'.

df['l_left']=df['value'].resample('2min',label='left').sum()

Output

                     value  l_left
my_date
2022-06-01 00:00:00      2     4.0
2022-06-01 00:01:00      2     NaN
2022-06-01 00:02:00      3     4.0
2022-06-01 00:03:00      1     NaN
2022-06-01 00:04:00      3     5.0
2022-06-01 00:05:00      2     NaN

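With label='right', each bin is labeled with its right edge instead, so the sum of 00:00 and 00:01 (4) is placed at 00:02. The last bin would be labeled 00:06, which is not in df's index, so it is dropped when the result is assigned as a column.

df['l_right']=df['value'].resample('2min',label='right').sum()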

Output

                     value  l_right
my_date
2022-06-01 00:00:00      2      NaN
2022-06-01 00:01:00      2      NaN
2022-06-01 00:02:00      3      4.0
2022-06-01 00:03:00      1      NaN
2022-06-01 00:04:00      3      4.0
2022-06-01 00:05:00      2      NaN
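closed and label are independent arguments and are often set together. A sketch with both set to 'right', so that each bin (a, b] is labeled by its right edge b:

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)

# Bins (23:58, 00:00], (00:00, 00:02], (00:02, 00:04], (00:04, 00:06]
both_right = df['value'].resample('2min', closed='right', label='right').sum()
print(both_right)
# 00:00 -> 2, 00:02 -> 5, 00:04 -> 4, 00:06 -> 2
```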

on: In the examples above, the datetime column (my_date) is the index. To resample on a different column instead, pass its name with on=; that column must have a datetime dtype.
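A minimal sketch of on=, assuming the my_date column is converted with to_datetime() but never set as the index:

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df_col = pd.DataFrame(data=my_dict)
df_col['my_date'] = pd.to_datetime(df_col['my_date'])  # stays a regular column

# on= tells resample() which datetime column defines the bins
by_col = df_col.resample('2min', on='my_date')['value'].sum()
print(by_col)  # 4, 4, 5 at 00:00, 00:02, 00:04
```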

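offset: A timedelta added to the origin before the bins are laid out. With the default origin='start_day' (midnight), offset='3min' shifts the 2-minute bins so that they start at odd minutes (23:59, 00:01, 00:03, 00:05) instead of even ones.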

df['offset_3min']=df['value'].resample(rule='2min',offset='3min').sum()
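The offset example does not show its result. A sketch that prints the resampled series directly, so the shifted bin labels are visible (after assignment to df, only the labels 00:01, 00:03 and 00:05 would survive):

```python
import pandas as pd

my_dict = {
    'my_date': ['2022-06-01 00:00:00', '2022-06-01 00:01:00',
                '2022-06-01 00:02:00', '2022-06-01 00:03:00',
                '2022-06-01 00:04:00', '2022-06-01 00:05:00'],
    'value': [2, 2, 3, 1, 3, 2],
}
df = pd.DataFrame(data=my_dict)
df['my_date'] = pd.to_datetime(df['my_date'])
df.set_index('my_date', inplace=True)

# origin = midnight + 3min, so bin edges fall on odd minutes
shifted = df['value'].resample('2min', offset='3min').sum()
print(shifted)
# 2022-05-31 23:59 -> 2, 00:01 -> 5, 00:03 -> 4, 00:05 -> 2
```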

Practice these exercises to learn more about working with dates and times in a Pandas DataFrame.
Exercise3 on Date and time
Exercise3-2 on basics of Date and time
