« Pandas date & time « Pandas « Numpy
Date and time calculations using Numpy timedelta64.
Different units are used with timedelta64 for calculations, the list of units are given at the end of this tutorial.
Let us create DataFrame with two datetime columns to calculate the difference.
import pandas as pd
my_dict={'NAME':['Ravi','Raju','Alex'],
'dt_start':['1/1/2020','2/1/2020','5/1/2020'],
'dt_end':['6/15/2022','7/22/2022','11/15/2023']
}
my_data = pd.DataFrame(data=my_dict)
my_data['dt_start'] = pd.to_datetime(my_data['dt_start'])
my_data['dt_end'] = pd.to_datetime(my_data['dt_end'])
print(my_data)
Output
NAME dt_start dt_end
0 Ravi 2020-01-01 2022-06-15
1 Raju 2020-02-01 2022-07-22
2 Alex 2020-05-01 2023-11-15
The column dt_start and dt_end is changed to datetime type.
Difference in Days
We will add one column diff_days to our dataframe and add difference in days to it.
my_data['diff_days']=my_data['dt_end']-my_data['dt_start']
print(my_data)
Output
NAME dt_start dt_end diff_days
0 Ravi 2020-01-01 2022-06-15 896 days
1 Raju 2020-02-01 2022-07-22 902 days
2 Alex 2020-05-01 2023-11-15 1293 days
Here our column diff_days is not an integer field, so we can use any comparison to get days higher than x ( > 50 days ) . For this we have to convert diff_days to integer ( or float if NaN is present ) by dividing with np.timedelta64(1, 'D')
To use Numpy, first import it by adding this line. import numpy as np
my_data['diff_days'] = my_data['diff_days']/ np.timedelta64(1, 'D')
Read how np.timedelta64() is used in our Exercise3-4 to convert date to float value
Now we can filter based on number of days .
my_data=my_data[my_data['diff_days']>900]
Let us findout difference in months. We will add Numpy library as we are going to use timedelta64 now.
import pandas as pd
import numpy as np
my_dict={'NAME':['Ravi','Raju','Alex'],
'dt_start':['1/1/2020','2/1/2020','5/1/2020'],
'dt_end':['6/15/2022','7/22/2022','11/15/2023']
}
my_data = pd.DataFrame(data=my_dict)
my_data['dt_start'] = pd.to_datetime(my_data['dt_start'])
my_data['dt_end'] = pd.to_datetime(my_data['dt_end'])
my_data['diff_days']=my_data['dt_end']-my_data['dt_start']
my_data['diff_months']=my_data['diff_days']/np.timedelta64(1, 'M')
#my_data['diff_months']=my_data['diff_months'].astype(int)#to integer
print(my_data)
We divided the difference with np.timedelta64(1,'M') to findout the difference in months.
Output , you can display the diff_months as integer by un commenting the line above.
NAME dt_start dt_end diff_days diff_months
0 Ravi 2020-01-01 2022-06-15 896 days 29.437976
1 Raju 2020-02-01 2022-07-22 902 days 29.635105
2 Alex 2020-05-01 2023-11-15 1293 days 42.481365
Difference in Years
We will use np.timedelta64(1,'Y') to find out difference in Years. Here is the change in code.
my_data['diff_years']=my_data['diff_days']/np.timedelta64(1, 'Y')
Output is here
NAME dt_start dt_end diff_days diff_years
0 Ravi 2020-01-01 2022-06-15 896 days 2.453165
1 Raju 2020-02-01 2022-07-22 902 days 2.469592
2 Alex 2020-05-01 2023-11-15 1293 days 3.540114
Difference in Weeks
We will use np.timedelta64(1,'W') to find out difference in Weeks . Here is the change in code.
my_data['diff_weeks']=my_data['diff_days']/np.timedelta64(1, 'W')
Output
NAME dt_start dt_end diff_days diff_weeks
0 Ravi 2020-01-01 2022-06-15 896 days 128.000000
1 Raju 2020-02-01 2022-07-22 902 days 128.857143
2 Alex 2020-05-01 2023-11-15 1293 days 184.714286
Here also we can display difference in weeks by changing the data to integer by adding the line like this.
my_data['diff_weeks']=my_data['diff_days']/np.timedelta64(1, 'W')
my_data['diff_weeks']=my_data['diff_weeks'].astype(int) # to integer
Now output will change like this.
NAME dt_start dt_end diff_days diff_weeks
0 Ravi 2020-01-01 2022-06-15 896 days 128
1 Raju 2020-02-01 2022-07-22 902 days 128
2 Alex 2020-05-01 2023-11-15 1293 days 184
Difference in Time
Difference in Hours we can get by using np.timedelta64(1,'h'). Here is the code
my_data['diff_hours']=my_data['diff_days']/np.timedelta64(1, 'h')
Output
NAME dt_start dt_end diff_days diff_hours
0 Ravi 2020-01-01 2022-06-15 896 days 21504.0
1 Raju 2020-02-01 2022-07-22 902 days 21648.0
2 Alex 2020-05-01 2023-11-15 1293 days 31032.0
Similarly you can findout difference in minutes, seconds etc. Here is a table to use the date and time units
Time | Date |
h | Hour |
m | Minute |
s | Second |
ms | MilliSecond |
us | MicroSecond |
ns | NanoSecond |
ps | PicoSecond |
fs | FemtoSecond |
as | AttoSecond |
|
|
Add or subtract Date parts by using DateOffset()
« Pandas date& time
date_range()
period_range()
strftime()
← Subscribe to our YouTube Channel here