'Selecting first and last row within a time interval

I have a table named trades for holding currency trading data with the following schema:

id        - uuid
timestamp - timestamp without time zone
price     - numeric

I would like to be able to query in a way that I can build a candle chart. For this I need the first price, the last price, the max price and the min price, grouped by time intervals. So far I have this:

CREATE FUNCTION ts_round( timestamptz, INT4 ) RETURNS TIMESTAMPTZ AS $$
SELECT 'epoch'::timestamptz
     + '1 second'::INTERVAL * ( $2 * ( extract( epoch FROM $1 )::INT4 / $2 ) );
$$ LANGUAGE SQL;

SELECT ts_round( timestamp, 300 ) AS interval_timestamp
     , max(price) AS max, min(price) AS min
FROM trades
GROUP BY interval_timestamp
ORDER BY interval_timestamp DESC

How do I get the first price and last price within these intervals?



Solution 1:[1]

I think this is the query you want:

SELECT ts_round( timestamp, 300 ) AS interval_timestamp,
       max(firstprice) as firstprice,
       max(lastprice) as lastprice,
       max(price) AS maxprice, min(price) AS minprice
FROM (SELECT t.*,
             first_value(price) over (partition by ts_round(timestamp, 300) order by timestamp) as firstprice,
             first_value(price) over (partition by ts_round(timestamp, 300) order by timestamp desc) as lastprice
      FROM trades t
     ) t
GROUP BY interval_timestamp
ORDER BY interval_timestamp DESC;

Solution 2:[2]

This uses a single window for all window functions and no subquery. Should be faster than the currently accepted answer.

SELECT DISTINCT ON (1)
       ts_round(timestamp, 300) AS interval_timestamp
     , min(price)         OVER w AS min_price
     , max(price)         OVER w AS max_price
     , first_value(price) OVER w AS first_price
     , last_value(price)  OVER w AS last_price
FROM   trades
WINDOW w AS (PARTITION BY ts_round(timestamp, 300) ORDER BY timestamp
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY 1 DESC;

To define "first" and "last" per timestamp, this column needs to be unique or the query is ambiguous and yo get an arbitrary pick from equal peers.

Similar answer with explanation for the custom window frame:

Explanation for the reference by ordinal numbers:

Aside: don't use "timestamp" as identifier. It's a basic type name, which is error-prone.

Solution 3:[3]

This can also be done without creating a confusing function and just using the built-in floor function with a unix timestamp.

SELECT symbol, timestamp AS interval_timestamp,
   max(firstprice) as firstprice,
   max(lastprice) as lastprice,
   max(price) AS maxprice, min(price) AS minprice, 
   max(vol)-min(vol) as volume
FROM (SELECT t.*,
first_value(price) over (partition by 
floor(unix_timestamp(timestamp)/(5*60)) order by timestamp) as 
firstprice,
first_value(price) over (partition by 
floor(unix_timestamp(timestamp)/(5*60)) order by timestamp desc) as 
lastprice
  FROM trades t
 ) t
GROUP BY floor(unix_timestamp(timestamp)/(5*60))
ORDER BY timestamp

Note here you are turning the timestamp into a unix timestamp (if you are storing unix timestmaps no need to convert) and then dividing that by number of minutes * 60. So, in this example we are returning 5 minutes intervals. I also added in the trade volume for the time interval because, why not?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Erwin Brandstetter
Solution 2 Community
Solution 3 Mitchell Leefers