Pandas: compute daily statistics when chunking

Consider a PostgreSQL table that holds 200 values at various times on 2022-05-01:

time                        value                                                                                
2022-05-01 00:17:20+00:00  17175 
2022-05-01 13:33:56+00:00  18000
...

I need to read the data chunk by chunk with chunksize = 50. Resampling and aggregating each chunk to compute daily statistics produces four rows with the same index, each holding only part of the daily total.

import pandas as pd

with engine.connect().execution_options(stream_results=True) as conn:
    # Read through the streaming connection, 50 rows at a time
    for chunk_df in pd.read_sql(query, conn, chunksize=50):
        chunk_df.index = pd.to_datetime(chunk_df.time, utc=True)
        chunk_df.sort_index(inplace=True)
        result_df = chunk_df[['value']].resample('1D').sum()
        print(result_df)

time                        value                                                                                
2022-05-01 00:00:00+00:00  52175 


time                        value                                                                                
2022-05-01 00:00:00+00:00  12001 


time                        value                                                                                
2022-05-01 00:00:00+00:00  3506 


time                        value                                                                                
2022-05-01 00:00:00+00:00  45623 

Is there a solution that computes the correct aggregated value directly? In other words, how can the chunk size be set to match the resampling interval? The expected result is:

time                        value                                                                                
2022-05-01 00:00:00+00:00  113305 


Solution 1:[1]

If I understood the question correctly, a query like this should do the trick:

select date_trunc('day', time), sum(value) from table_name group by 1;
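From pandas, this means sending the aggregate query to the database instead of resampling each chunk. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in so that it runs without a server, and therefore groups by SQLite's date(time) where PostgreSQL would use date_trunc('day', time). The table name readings and the sample rows are made up.

```python
import sqlite3

import pandas as pd

# Stand-in database (assumption): in-memory SQLite instead of PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (time TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2022-05-01 00:17:20", 17175),
        ("2022-05-01 13:33:56", 18000),
        ("2022-05-02 08:00:00", 500),
    ],
)

# The database returns one row per day, so no chunking or resampling is needed.
# On PostgreSQL the grouping expression would be date_trunc('day', time).
query = """
    SELECT date(time) AS day, SUM(value) AS value
    FROM readings
    GROUP BY date(time)
    ORDER BY day
"""
daily = pd.read_sql(query, conn, parse_dates=["day"]).set_index("day")
print(daily)
```

Because the grouping happens in the database, the result holds one row per day no matter how many rows the table contains.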

You can also add

  • order by 1 asc/desc to sort it
  • where date_trunc('day', time) = '2020-03-16 00:00:00' to filter by date
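If the aggregation has to stay in pandas, one option (not part of the answer above, and only a sketch) is to keep the daily sum of every chunk and add the partial results together once all chunks are read. This works because a total can be built from partial sums; a mean would need the sum and the count carried separately. In the code below, chunks stands in for the iterator returned by pd.read_sql(..., chunksize=50), and the sample values are made up.

```python
import pandas as pd

# Made-up sample standing in for the table: 4 readings on one day, 1 on the next.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2022-05-01 00:17:20",
                "2022-05-01 06:00:00",
                "2022-05-01 13:33:56",
                "2022-05-01 23:59:00",
                "2022-05-02 01:00:00",
            ],
            utc=True,
        ),
        "value": [17175, 100, 18000, 200, 500],
    }
)

# Stand-in for pd.read_sql(query, conn, chunksize=2): chunks of 2 rows each.
chunks = (df.iloc[i : i + 2] for i in range(0, len(df), 2))

partial_sums = []
for chunk_df in chunks:
    chunk_df = chunk_df.set_index("time").sort_index()
    # Daily sum for this chunk only; the same day may appear in several chunks.
    partial_sums.append(chunk_df["value"].resample("1D").sum())

# Rows sharing a day label are merged by summing the partial results.
daily = pd.concat(partial_sums).groupby(level=0).sum()
print(daily)
```

Only the small per-chunk results are kept in memory, so the chunk size does not have to line up with the resampling interval.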

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Evgeniy Chekan