'How to cluster rows in dataset with two geospatial attributes

Title might be misleading as I wasn't sure how to properly summarize the problem.

I have a dataset of trips with two locations (source and destination) and also other attributes (about customer, cargo, equipment, etc).

Are there any algorithms that I could apply in order to cluster those trips, given that I want to use both spatial points (source and destination) for clustering, not just one.

Let's say if I have following trips:

A1 -> B1
A2 -> B2
A1 -> C1
A2 -> C2

I want to get clusters like:

A -> B
A -> C

Solution 1:^[1]

A very simple solution I can think about is to cluster each location independently, and then use cluster ids to group by both clusters.

Something like (this was tested in Google BigQuery):

with data as (
  select st_geogpoint(100, 50) a, st_geogpoint(101, 51) b
  union all
  select st_geogpoint(100.01, 50) a, st_geogpoint(101.01, 51) b
  union all
  select st_geogpoint(100, 50.01) a, st_geogpoint(90, 51) b
  union all
  select st_geogpoint(100.01, 50.01) a, st_geogpoint(90, 51.01) b
),
clusters as (
  select 
      a, b, 
      st_clusterdbscan(a, 1e4, 1) OVER() a_id, 
      st_clusterdbscan(b, 1e4, 1) OVER() b_id
  from data
)
select 
    a_id, b_id, 
    st_centroid_agg(a) a_center, 
    st_centroid_agg(b) b_center
from clusters
group by a_id, b_id


a_id    b_id    a_center                        b_center
0       0       POINT(100.005 50.0000001074259) POINT(101.005 51.0000001066994)
0       1       POINT(100.005 50.0100001074192) POINT(90 51.005)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Michael Entin

'How to cluster rows in dataset with two geospatial attributes

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]