Date and time calculations using NumPy timedelta64.
timedelta64 supports different units for calculations; the list of units is given at the end of this tutorial.
Let us create a DataFrame with two datetime columns and calculate the difference between them.
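A minimal sketch of this step, assuming pandas is installed. The names and dates are taken from the output shown in this tutorial; the dictionary name my_dict is only illustrative, while my_data follows the snippets used later on.

```python
import pandas as pd

# Sample data taken from the output table in this tutorial
my_dict = {
    'NAME': ['Ravi', 'Raju', 'Alex'],
    'dt_start': ['2020-01-01', '2020-02-01', '2020-05-01'],
    'dt_end': ['2022-06-15', '2022-07-22', '2023-11-15'],
}
my_data = pd.DataFrame(data=my_dict)

# Convert the string columns to datetime so they can be subtracted
my_data['dt_start'] = pd.to_datetime(my_data['dt_start'])
my_data['dt_end'] = pd.to_datetime(my_data['dt_end'])

# Subtracting two datetime columns gives a timedelta column
my_data['diff_days'] = my_data['dt_end'] - my_data['dt_start']
print(my_data)
```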
   NAME   dt_start     dt_end diff_days
0  Ravi 2020-01-01 2022-06-15  896 days
1  Raju 2020-02-01 2022-07-22  902 days
2  Alex 2020-05-01 2023-11-15 1293 days
Here our column diff_days is a timedelta field, not an integer, so we cannot compare it directly with a number to get the rows with days higher than x ( > 50 days ). For this we have to convert diff_days to a number by dividing it by np.timedelta64(1, 'D'). The division returns a float, which can be converted to an integer if no NaN is present.
To use NumPy, first import it by adding this line: import numpy as np
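The conversion and the numeric comparison can be sketched as follows. The column name days_num and the threshold of 900 are assumptions chosen only for illustration; they are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex'],
    'dt_start': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-05-01']),
    'dt_end': pd.to_datetime(['2022-06-15', '2022-07-22', '2023-11-15']),
})
my_data['diff_days'] = my_data['dt_end'] - my_data['dt_start']

# Dividing by one day turns the timedelta column into plain floats
my_data['days_num'] = my_data['diff_days'] / np.timedelta64(1, 'D')

# A numeric comparison now works; 900 is an arbitrary threshold
long_gaps = my_data[my_data['days_num'] > 900]
print(long_gaps[['NAME', 'days_num']])
```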
We divide the difference by np.timedelta64(1,'M') to find out the difference in months.
The output is shown below. You can display diff_months as an integer by uncommenting the line above.
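A month is not a fixed length of time, and recent pandas releases (2.0 onward, to the best of my knowledge) reject the 'M' unit in this division. The sketch below is therefore an alternative, not the original code: it converts to days first and then divides by an average month length of 365.2425 / 12 days. That constant is an assumption, chosen because it reproduces the diff_months values shown in this tutorial.

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex'],
    'dt_start': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-05-01']),
    'dt_end': pd.to_datetime(['2022-06-15', '2022-07-22', '2023-11-15']),
})
my_data['diff_days'] = my_data['dt_end'] - my_data['dt_start']

# Assumption: one month is 1/12 of an average Gregorian year
AVG_DAYS_PER_MONTH = 365.2425 / 12  # 30.436875 days

days = my_data['diff_days'] / np.timedelta64(1, 'D')  # timedelta -> float days
my_data['diff_months'] = days / AVG_DAYS_PER_MONTH
# my_data['diff_months'] = my_data['diff_months'].astype(int)  # to integer
print(my_data)
```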
   NAME   dt_start     dt_end diff_days  diff_months
0  Ravi 2020-01-01 2022-06-15  896 days    29.437976
1  Raju 2020-02-01 2022-07-22  902 days    29.635105
2  Alex 2020-05-01 2023-11-15 1293 days    42.481365
Difference in Years
We will use np.timedelta64(1,'Y') to find out the difference in years. Here is the change in code.
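Like a month, a year is not a fixed length of time, and recent pandas releases (2.0 onward, as far as I know) do not accept the 'Y' unit in this division. The sketch below is an alternative that divides the day count by an average year of 365.2425 days. That constant is an assumption, chosen because it reproduces the diff_years values shown in this tutorial.

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex'],
    'dt_start': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-05-01']),
    'dt_end': pd.to_datetime(['2022-06-15', '2022-07-22', '2023-11-15']),
})
my_data['diff_days'] = my_data['dt_end'] - my_data['dt_start']

# Assumption: average length of a Gregorian year in days
AVG_DAYS_PER_YEAR = 365.2425

days = my_data['diff_days'] / np.timedelta64(1, 'D')  # timedelta -> float days
my_data['diff_years'] = days / AVG_DAYS_PER_YEAR
print(my_data)
```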
   NAME   dt_start     dt_end diff_days  diff_years
0  Ravi 2020-01-01 2022-06-15  896 days    2.453165
1  Raju 2020-02-01 2022-07-22  902 days    2.469592
2  Alex 2020-05-01 2023-11-15 1293 days    3.540114
Difference in Weeks
We will use np.timedelta64(1,'W') to find out the difference in weeks. Here is the change in code.
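A week is always seven days, so the 'W' unit can be used directly as the divisor. This is a minimal self-contained sketch using the sample data from this tutorial.

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex'],
    'dt_start': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-05-01']),
    'dt_end': pd.to_datetime(['2022-06-15', '2022-07-22', '2023-11-15']),
})
my_data['diff_days'] = my_data['dt_end'] - my_data['dt_start']

# Dividing by one week gives the difference as a float number of weeks
my_data['diff_weeks'] = my_data['diff_days'] / np.timedelta64(1, 'W')
print(my_data)
```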
   NAME   dt_start     dt_end diff_days  diff_weeks
0  Ravi 2020-01-01 2022-06-15  896 days  128.000000
1  Raju 2020-02-01 2022-07-22  902 days  128.857143
2  Alex 2020-05-01 2023-11-15 1293 days  184.714286
Here too, we can display the difference in weeks as an integer by adding a line like this. Note that astype(int) drops the fractional part instead of rounding.
my_data['diff_weeks']=my_data['diff_days']/np.timedelta64(1, 'W')
my_data['diff_weeks']=my_data['diff_weeks'].astype(int) # to integer
Now the output will change like this.
   NAME   dt_start     dt_end diff_days  diff_weeks
0  Ravi 2020-01-01 2022-06-15  896 days         128
1  Raju 2020-02-01 2022-07-22  902 days         128
2  Alex 2020-05-01 2023-11-15 1293 days         184
Difference in Time
We can get the difference in hours by using np.timedelta64(1,'h'). Here is the code.
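A minimal self-contained sketch of the hours calculation, again using the sample data from this tutorial.

```python
import numpy as np
import pandas as pd

my_data = pd.DataFrame({
    'NAME': ['Ravi', 'Raju', 'Alex'],
    'dt_start': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-05-01']),
    'dt_end': pd.to_datetime(['2022-06-15', '2022-07-22', '2023-11-15']),
})
my_data['diff_days'] = my_data['dt_end'] - my_data['dt_start']

# Dividing by one hour gives the difference as a float number of hours
my_data['diff_hours'] = my_data['diff_days'] / np.timedelta64(1, 'h')
print(my_data)
```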
   NAME   dt_start     dt_end diff_days  diff_hours
0  Ravi 2020-01-01 2022-06-15  896 days     21504.0
1  Raju 2020-02-01 2022-07-22  902 days     21648.0
2  Alex 2020-05-01 2023-11-15 1293 days     31032.0
Similarly, you can find out the difference in minutes, seconds, etc. Here is a table of the date and time units to use.