How to sample a Python DataFrame at a daily rate when the date range is greater than 500 years

I need to sample a dataframe that has a date range of 100 years at a daily rate because I want to get yearly totals (so I thought resample at daily rate then sum the yearly totals).

I tried

import datetime
import pandas as pd

# set date to model start date
d0 = start_date
ind = Time_data2['datetime']
n_days = (max(ind) - d0).days  # total number of days in the simulation
df_out = pd.DataFrame(index=range(n_days), columns=['datetime', 'year', 'value'])

for i in range(n_days):                        # for every day in the simulation
    d = d0 + datetime.timedelta(days=i)        # get a particular day (= start_date + timedelta)
    df_out.loc[i, 'datetime'] = d              # assign datetime for each day
    df_out.loc[i, 'year'] = d.year             # assign year for each day
    # Assign the first value in the raw time series that falls on or after the day being filled;
    # this is equivalent to a backfill with the pandas resample
    for t in model_flow_ts.index:
        dt = t - d             # timedelta between each index value in model_flow_ts and the current day
        if dt.days < 0:
            continue
        else:
            v = model_flow_ts.loc[t]  # get the value
            break
    df_out.loc[i, 'value'] = v
    if i % 50000 == 0:         # print progress every 50,000 days
        print(i)

But it takes a really long time because there are so many days to sample...

Does anyone have any suggestions on how to speed it up?
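
For reference, here is a minimal sketch of the same backfill-then-sum idea without the nested Python loops. It assumes the raw timestamps and values can be passed as plain arrays and that day resolution is enough. The function name `daily_backfill_yearly_totals` and its parameters are hypothetical, not from my model code. It uses NumPy `datetime64[D]` values rather than pandas timestamps, since nanosecond-resolution pandas timestamps only cover roughly 584 years.

```python
import numpy as np
import pandas as pd


def daily_backfill_yearly_totals(times, values, start):
    """Backfill raw values onto a daily grid, then sum them per calendar year.

    times  : raw timestamps (anything NumPy can convert to datetime64[D])
    values : raw values, one per timestamp
    start  : first day of the daily grid (e.g. the model start date)
    """
    times = np.asarray(times, dtype="datetime64[D]")
    values = np.asarray(values, dtype=float)

    # Sort by time so that a binary search can be used below.
    order = np.argsort(times, kind="stable")
    times, values = times[order], values[order]

    # Every day from `start` up to, but not including, the last raw timestamp
    # (the same span as range((max(ind) - d0).days) in the loop version).
    days = np.arange(np.datetime64(start, "D"), times[-1], dtype="datetime64[D]")

    # For each day, find the first raw timestamp on or after it (a backfill).
    pos = np.searchsorted(times, days, side="left")
    daily = values[pos]

    # Calendar year of each day, as plain integers.
    years = days.astype("datetime64[Y]").astype(int) + 1970

    return pd.Series(daily, index=years).groupby(level=0).sum()


# Small made-up example: three raw values spanning just over one year.
totals = daily_backfill_yearly_totals(
    ["2000-01-03", "2000-01-05", "2001-01-02"], [1.0, 2.0, 3.0], "2000-01-01"
)
print(totals)  # 2000 -> 1090.0, 2001 -> 3.0
```

The binary search replaces the inner loop over `model_flow_ts.index`, so the cost per day no longer grows with the length of the raw series.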

cheers



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
