'How to flatten a latlong array in bigquery to produce a linestring?
I have a nested table structure, like this:
[
{
"startTime": "2017-09-02 09:08:00:000",
"endTime": "2017-09-02 09:09:00:000",
"startTimeMillis": "1504343280000",
"endTimeMillis": "1504343340000",
"uuid": "1748750880",
"country": "CI",
"city": "Punta Arenas",
"x": "-70.906904",
"y": "-53.133514"
},
{
"startTime": "2017-09-02 09:08:00:000",
"endTime": "2017-09-02 09:09:00:000",
"startTimeMillis": "1504343280000",
"endTimeMillis": "1504343340000",
"uuid": "1748750880",
"country": "CI",
"city": "Punta Arenas",
"x": "-70.907353",
"y": "-53.133253"
},
{
"startTime": "2017-09-02 09:08:00:000",
"endTime": "2017-09-02 09:09:00:000",
"startTimeMillis": "1504343280000",
"endTimeMillis": "1504343340000",
"uuid": "1748750880",
"country": "CI",
"city": "Punta Arenas",
"x": "-70.90771",
"y": "-53.133041"
},
{
"startTime": "2017-09-02 09:08:00:000",
"endTime": "2017-09-02 09:09:00:000",
"startTimeMillis": "1504343280000",
"endTimeMillis": "1504343340000",
"uuid": "1748750880",
"country": "CI",
"city": "Punta Arenas",
"x": "-70.908979",
"y": "-53.132287"
}
]
A resulting table is something like this:
Row|startTime|endTime|startTimeMillis|endTimeMillis|uuid|country|city|x|y|
1|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.906904|-53.133514|
2|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.907353|-53.133253|
3|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.90771|-53.133041|
4|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.908979|-53.132287|
I'd like to concat the repeated fields x and y to produce a GIS linestring, in a single line, like this:
Row|startTime|endTime|startTimeMillis|endTimeMillis|uuid|country|city|linestring
1|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|LINESTRING(-70.906904 -53.133514, -70.907353 -53.133253, -70.90771 -53.133041, -70.908979 -53.132287)
How can I do this? The original x and y values are floats.
Solution 1:[1]
You could use the ARRAY_AGG function available in Standard SQL, something like:
#standardSQL
WITH data AS(
SELECT "2017-09-02 09:08:00:000" AS startTime, "2017-09-02 09:09:00:000" endTime, "1504343280000" AS startTimeMillis, "1504343340000" endTimeMillis, "1748750880" AS uuid, "CI" AS country, "Punta Arenas" AS city, "-70.906904" AS x, "-53.133514" AS y UNION ALL
SELECT "2017-09-02 09:08:00:000", "2017-09-02 09:09:00:000", "1504343280000", "1504343340000", "1748750880", "CI", "Punta Arenas", "-70.907353", "-53.133253" UNION ALL
SELECT "2017-09-02 09:08:00:000", "2017-09-02 09:09:00:000", "1504343280000", "1504343340000", "1748750880", "CI", "Punta Arenas", "-70.90771", "-53.133041" UNION ALL
SELECT "2017-09-02 09:08:00:000", "2017-09-02 09:09:00:000", "1504343280000", "1504343340000", "1748750880", "CI", "Punta Arenas", "-70.908979", "-53.132287"
)
SELECT
startTime,
endTime,
startTimeMillis,
endTimeMillis,
uuid,
country,
city,
ARRAY_AGG(STRUCT(x, y)) AS LINESTRING
FROM data
GROUP BY
startTime,
endTime,
startTimeMillis,
endTimeMillis,
uuid,
country,
city
Result:
Even though result is an ARRAY with the elements x and y, notice that they have been structured together as a STRUCT which will allow you to access each field by its respective name.
Solution 2:[2]
Thank you all!
I'm using Mikhail Berlyant solution!
SELECT
w.startTime, w.endTime, w.startTimeMillis, w.endTimeMillis,
jams_u.uuid, jams_u.country, jams_u.city, jams_u.street,
jams_u.roadType, jams_u.turnType,
jams_u.type, jams_u.length, jams_u.speed, jams_u.level, jams_u.delay,
jams_u.startNode, jams_u.endNode, jams_u.pubMillis,
TIMESTAMP_MILLIS(jams_u.pubMillis) as pubdatetime_utc,
STRING_AGG(CONCAT(CAST(line_u.x AS STRING),' ',CAST(line_u.y AS STRING))) linestring_4326
FROM
a_import.table w,
UNNEST(jams) jams_u,
UNNEST(line) line_u
GROUP BY
w.startTime, w.endTime, w.startTimeMillis, w.endTimeMillis,
jams_u.uuid, jams_u.country, jams_u.city, jams_u.street,
jams_u.roadType, jams_u.turnType,
jams_u.type, jams_u.length, jams_u.speed, jams_u.level, jams_u.delay,
jams_u.startNode, jams_u.endNode, jams_u.pubMillis,
pubdatetime_utc
Solution 3:[3]
One concern with the proposed solutions that use GROUP BY only - without an ORDER BY operator within group, the order of the elements in the GROUP BY group is undefined. So you can get arbitrary order of points in the linestring, which is probably not what you want. Unfortunately, with small inline datasets you get stable results, but this might break once you have real data.
To solve this you need to define which attributes define group, and which define order. E.g. if uuid defines a linestring, and start timestamp defines order (they would need to be different, unlike in your sample), your query might group by uuid, and sort by timestamp.
I also prefer to use new Geospatial functions to construct WKT linestring, rather than string concatenation, which gives:
#standardSQL
WITH `yourTable` AS (
SELECT * FROM UNNEST([
STRUCT('2017-09-02 09:08:00:000' AS startTime, '2017-09-02 09:09:00:000' AS endTime, 1504343280002 AS startTimeMillis, 1504343340000 AS endTimeMillis,
1748750880 AS uuid, 'CI' AS country, 'Punta Arenas' AS city, -70.906904 AS x, -53.133514 AS y),
STRUCT('2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280001, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.907353, -53.133253),
STRUCT('2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280004, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.90771, -53.133041),
STRUCT('2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280003, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.908979, -53.132287)])
)
SELECT uuid, MIN(startTime) startTime, MAX(endTime) endTime,
ANY_VALUE(country), ANY_VALUE(city),
ST_MakeLine(ARRAY_AGG(ST_GeogPoint(x, y)
ORDER BY startTime, startTimeMillis)) line
FROM `yourTable`
GROUP BY uuid
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Willian Fuks |
| Solution 2 | user2308155 |
| Solution 3 | Michael Entin |

