'how to select a value based on multiple criteria

I'm trying to select some values based on some proprietary data, and I just changed the variables to reference house prices.

I am trying to get the total offers for houses where they were sold at the bid or at the ask price, with offers under 15 and offers * sale price less than 5,000,000.

I then want to get the total number of offers for each neighborhood on each day, but instead I'm getting the total offers across each neighborhood (n1 + n2 + n3 + n4 + n5) across all dates and the total offers in the dataset across all dates.

My current query is this:

SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers) 
   FROM `big_query.a_table_name.houseprices` 
   WHERE ((offers * accepted_sale_price < 5000000) 
AND (offers < 15) 
AND (house_bid = sale_price OR 
      house_ask = sale_price))) as bid_ask_off,    
(SELECT SUM(offers) 
 FROM `big_query.a_table_name.houseprices`) as 
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100

Which I am expecting a result like, with date being repeated throughout as d1, d2, d3, etc.:

table_1

but am instead receiving

table_2

I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.

It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.

Any help or advice would be greatly appreciated, and I hope I've provided enough information.

Here is a sample dataframe.

neighborhood    date_of_sale    offers  accepted_sale_price house_bid   house_ask
bronx           4/1/2022        3       323                 320 323
manhattan       4/1/2022        4       244                 230 244
manhattan       4/1/2022        8       856                 856 900
queens          4/1/2022        15      110                 110 135
brooklyn        4/2/2022        12      115                 100 115
manhattan       4/2/2022        9       255                 255 275
bronx           4/2/2022        6       330                 300 330
queens          4/2/2022        10      405                 395 405
brooklyn        4/2/2022        4       254                 254 265
staten_island   4/3/2022        2       442                 430 442
staten_island   4/3/2022        13      195                 195 225
bronx           4/3/2022        4       650                 650 690
manhattan       4/3/2022        2       286                 266 286
manhattan       4/3/2022        6       356                 356 400
staten_island   4/4/2022        4       361                 361 401
staten_island   4/4/2022        5       348                 348 399
bronx           4/4/2022        8       397                 340 397
manhattan       4/4/2022        9       333                 333 394
manhattan       4/4/2022        11      392                 325 392


Solution 1:[1]

I think that this is what you need.
As we group by neighbourhood we do not need DISTINCT.
We take sum(offers) for total_offers directly from the table and bids from a sub-query which we join to so that it is grouped by neighbourhood.

SELECT 
  h.neighborhood,
  DATE(h.date_of_sale) AS date_,
  s.bids AS bid_ask_off,    
  SUM(h.offers) AS total_offers,
FROM 
  `big_query.a_table_name.houseprices` h
  LEFT JOIN
  (SELECT
     neighborhood,
     SUM(offers) AS bids
   FROM 
      `big_query.a_table_name.houseprices` 
   WHERE offers * accepted_sale_price < 5000000
   AND offers < 15 
   AND (house_bid = sale_price OR 
      house_ask = sale_price)
   GROUP BY neighborhood) s 
ON h.neighborhood = s.neighborhood
GROUP BY 
  h.neighborhood, 
  DATE(date_of_sale),
  s.bids
LIMIT 100;

Or the following which modifies more the initial query but may be more like what you need.

SELECT 
  h.neighborhood,
  DATE(h.date_of_sale) AS date_,
  s.bids AS bid_ask_off,    
  SUM(h.offers) AS total_offers,
FROM 
  `big_query.a_table_name.houseprices` h
  LEFT JOIN
  (SELECT
     date_of_sale dos,
     neighborhood,
     SUM(offers) AS bids
   FROM 
      `big_query.a_table_name.houseprices` 
   WHERE offers * accepted_sale_price < 5000000
   AND offers < 15 
   AND (house_bid = sale_price OR 
      house_ask = sale_price)
   GROUP BY 
      neighborhood,
      date_of_sale) s 
ON h.neighborhood = s.neighborhood
AND h.date_of_sale = s.dos
GROUP BY 
  h.neighborhood, 
  DATE(date_of_sale),
  s.bids
LIMIT 100;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1