Create a Duration Column in Pandas


Are you working with a pandas dataset that has two datetime columns, and you want to know the time difference between them? Perhaps one column is a pick-up datetime and the other is a drop-off datetime. How many minutes did the delivery take? Say your dataset is called df0 and you want to add a new column to that DataFrame. The assign method will do this, but it returns a new object rather than modifying df0 in place, so you will usually want to assign the result back to the original name (df0 in this case). I’ve done that in the code shown below. Subtracting one datetime column from another gives a timedelta column, and .dt.total_seconds() converts it to seconds, which are then divided by 60 to get minutes. Data scientists call this kind of derived column feature engineering.

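# .dt is the pandas Series accessor, so no import of the datetime module is required.
# Subtract the datetime columns, then convert the resulting timedeltas to minutes.
df0 = df0.assign(durationMin=(df0['dropoff'] - df0['pickup']).dt.total_seconds() / 60)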
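To see the assign approach run end to end, here is a minimal sketch on a tiny DataFrame. The two sample rows are made up for illustration; only the column names pickup, dropoff and durationMin come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; real data would come from your own file or database.
df0 = pd.DataFrame({
    "pickup": pd.to_datetime(["2023-01-01 08:00:00", "2023-01-01 09:15:00"]),
    "dropoff": pd.to_datetime(["2023-01-01 08:30:00", "2023-01-01 10:00:30"]),
})

# Subtracting two datetime columns gives a timedelta column;
# .dt.total_seconds() turns it into float seconds, and / 60 gives minutes.
df0 = df0.assign(durationMin=(df0["dropoff"] - df0["pickup"]).dt.total_seconds() / 60)

duration_minutes = df0["durationMin"].tolist()
print(duration_minutes)
```

The new column holds plain floats, so it can be averaged, plotted or filtered like any other numeric column.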

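Here is another way to get the same result, this time using NumPy. Both dropoff and pickup must be datetime data types. Dividing the difference by np.timedelta64(1,'m'), a timedelta of one minute, gives the duration in minutes. This code creates a new column called duration in the DataFrame called df.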

import numpy as np
df['duration'] = (df['dropoff'] - df['pickup'])/np.timedelta64(1,'m')
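If your columns arrive as text rather than datetimes, the subtraction will fail, so they need converting first. The sketch below assumes the timestamps are strings in a format that pd.to_datetime can parse. The sample rows are invented, and the final check simply confirms that the NumPy division agrees with the total_seconds approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the timestamps were read in as plain strings.
df = pd.DataFrame({
    "pickup": ["2023-01-01 08:00:00", "2023-01-01 09:15:00"],
    "dropoff": ["2023-01-01 08:30:00", "2023-01-01 10:00:30"],
})

# Convert both columns to datetime before subtracting.
df["pickup"] = pd.to_datetime(df["pickup"])
df["dropoff"] = pd.to_datetime(df["dropoff"])

# Dividing a timedelta column by a one-minute timedelta gives float minutes.
df["duration"] = (df["dropoff"] - df["pickup"]) / np.timedelta64(1, "m")

# Cross-check against the total_seconds() version.
via_seconds = (df["dropoff"] - df["pickup"]).dt.total_seconds() / 60
same = bool(np.allclose(df["duration"], via_seconds))
print(df["duration"].tolist(), same)
```

Either form works. The total_seconds version reads a little more explicitly, while the NumPy version lets you change the unit by changing the 'm' to 's' or 'h'.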
