Spark SQL grouping: Add to group by or wrap in first() if you don't care which value you get.
I have a query in Spark SQL like:

```sql
select count(ts), truncToHour(ts)
from myTable
group by truncToHour(ts)
```
Here `ts` is of timestamp type, and `truncToHour` is a UDF that truncates the timestamp to the hour. This query does not work. If I try:
```sql
select count(ts), ts from myTable group by truncToHour(ts)
```
I get the error `expression 'ts' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.`, but `first()` is not defined if I do:
```sql
select count(ts), first(ts) from myTable group by truncToHour(ts)
```
Is there any way to get what I want without using a subquery? Also, why does the error say to "wrap in first()" when first() is not defined?
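For context, here is a minimal PySpark sketch of the setup described above. The sample rows, the Python body of `truncToHour`, and the Spark version are assumptions, not part of the original question; older Spark releases rejected this query as described, while current versions resolve it.

```python
# Hypothetical reproduction of the setup: sample data, the UDF body, and the
# Spark version are assumptions, not taken from the original question.
from pyspark.sql import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.appName("truncToHour-demo").getOrCreate()

# Register a UDF that truncates a timestamp to the start of its hour.
def trunc_to_hour(ts):
    return ts.replace(minute=0, second=0, microsecond=0) if ts is not None else None

spark.udf.register("truncToHour", trunc_to_hour, TimestampType())

# Hypothetical stand-in for myTable.
df = spark.createDataFrame(
    [("2015-06-01 10:15:00",), ("2015-06-01 10:45:00",), ("2015-06-01 11:05:00",)],
    ["ts_str"],
).selectExpr("CAST(ts_str AS TIMESTAMP) AS ts")
df.createOrReplaceTempView("myTable")

# Selecting the grouping expression itself resolves in current Spark versions;
# older releases rejected it, which matches the behaviour described above.
spark.sql("""
    SELECT truncToHour(ts) AS hr, COUNT(ts) AS cnt
    FROM myTable
    GROUP BY truncToHour(ts)
""").show()
```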
Solution 1:[1]
Solution 2:[2]
I got a solution (using `truncToHour`, the UDF name from the question):

```sql
SELECT max(truncToHour(ts)), COUNT(ts) FROM myTable GROUP BY truncToHour(ts)
```

or

```sql
SELECT truncToHour(max(ts)), count(ts) FROM myTable GROUP BY truncToHour(ts)
```
Is there any better solution?
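If the DataFrame API is an option, a possibly cleaner variant is to materialize the truncated column once and group on it. This is a sketch under the same assumed setup as the snippet above; the variable and column names here are illustrative, not from the original answer.

```python
# Hedged DataFrame-API sketch, assuming truncToHour was registered as shown
# earlier; "hourly", "truncHrTs", and "cnt" are made-up names.
from pyspark.sql import functions as F

hourly = (
    df.withColumn("truncHrTs", F.expr("truncToHour(ts)"))  # apply the UDF once
      .groupBy("truncHrTs")                                 # group on the derived column
      .agg(F.count("ts").alias("cnt"))
)
hourly.show()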
Solution 3:[3]
This seems better, but it requires nesting:
```sql
select truncHrTs, count(ts)
from (
    select ts, truncToHour(ts) AS truncHrTs
    from myTable
)
group by truncHrTs
```
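Run through `spark.sql()` under the same assumed setup as the earlier sketch, this could look like the snippet below; the subquery alias `t` is an addition of mine, since some Spark/Hive versions require derived tables in `FROM` to be named.

```python
# The nested-query approach above, executed via spark.sql(); the alias "t" on
# the derived table is an assumption, added because some Spark/Hive versions
# insist that subqueries in FROM be named.
spark.sql("""
    SELECT truncHrTs, COUNT(ts) AS cnt
    FROM (
        SELECT ts, truncToHour(ts) AS truncHrTs
        FROM myTable
    ) t
    GROUP BY truncHrTs
""").show()
```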
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kumar Deepak |
| Solution 2 | Mike Sukmanowsky |
| Solution 3 | alwaysLearning |
