'Calculate the minimum radius around a point within which there are X other points

I have a dataset with 500k coordinates (latitude and longitude). I'd like to calculate the minimum radius for each point within which there are X neighbouring points.

I've used scipy.spatial.KDTree to count the number of points within a defined radius. Is there a way to use this package to do the inverse? Or another package or feasible brute force method that can achieve this?



Solution 1:[1]

  • have used cities as sample data. This data set is far smaller than one that you have stated.
  • extended on approach you started. Start by creating KDTree from points in geodataframe
  • take result from sparse_distance_matrix() into a pandas dataframe
  • pick nth() point where v will be distance you want
  • have used UTM CRS so that distances are in meters for points. This could have accuracy issues with points that span many UTM zones.
  • pulled it back together and also to validate generated a list of points that are within this distance
import geopandas as gpd
from scipy.spatial import KDTree
import pandas as pd

# get some points...
gdf = (
    gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
    .sjoin(
        gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).loc[
            lambda d: d["continent"].eq("Europe")
        ]
    )
    .reset_index(drop=True)
).loc[:, ["name_left", "geometry"]]

# 2d array of points
# use UTM so distances will be in meters
points = (
    gdf.to_crs(gdf.estimate_utm_crs())["geometry"]
    # gdf["geometry"]
    .apply(lambda p: {"x": p.x, "y": p.y})
    .apply(pd.Series)
    .values
)
kdtree = KDTree(points)

# how many points to consider for distance
N = 5
# create a dataframe from sparse distance matrix.
# nth(N+1) as self is included, so want N other points
df = (
    pd.DataFrame(kdtree.sparse_distance_matrix(kdtree, 10 ** 7, output_type="ndarray"))
    .sort_values(["i", "v"])
    .groupby("i")
    .nth(N + 1)
)

# join back to geodataframe
gdf = gdf.join(df)

# no need to do this,  but helps validate results...
gdf["nearest"] = gdf.apply(
    lambda r: kdtree.query_ball_point(points[r.name], r.v), axis=1
)

gdf

sample output

name_left geometry j v nearest
0 Vatican City POINT (12.45338654497177 41.90328217996012) 25 532056 [6, 1, 0, 2, 9, 25, 19]
1 San Marino POINT (12.44177015780014 43.936095834768) 3 422411 [6, 3, 1, 0, 2, 9, 19]
2 Rome POINT (12.481312562874 41.89790148509894) 25 530240 [6, 1, 0, 2, 9, 25, 19]
3 Vaduz POINT (9.516669472907267 47.13372377429357) 32 490694 [6, 12, 3, 1, 9, 5, 32]
4 Vienna POINT (16.36469309674374 48.20196113681686) 21 493170 [9, 32, 21, 19, 26, 10, 4]
5 Luxembourg POINT (6.130002806227083 49.61166037912108) 42 490936 [8, 12, 3, 5, 35, 39]
6 Monaco POINT (7.406913173465057 43.73964568785249) 7 499408 [7, 6, 12, 3, 1, 0, 2]
7 Andorra POINT (1.51648596050552 42.5000014435459) 5 868547 [8, 7, 37, 6, 12, 3]
8 Paris POINT (2.33138946713035 48.86863878981461) 3 569852 [8, 12, 5, 35, 42, 39]
9 Ljubljana POINT (14.51496903347413 46.0552883087945) 25 392927 [1, 9, 25, 19, 26, 10, 4]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rob Raymond