'Calculate the minimum radius around a point within which there are X other points
I have a dataset with 500k coordinates (latitude and longitude). I'd like to calculate the minimum radius for each point within which there are X neighbouring points.
I've used scipy.spatial.KDTree to count the number of points within a defined radius. Is there a way to use this package to do the inverse? Or another package or feasible brute force method that can achieve this?
Solution 1:[1]
- have used cities as sample data. This data set is far smaller than one that you have stated.
- extended on approach you started. Start by creating KDTree from points in geodataframe
- take result from
sparse_distance_matrix()into a pandas dataframe - pick
nth()point where v will be distance you want - have used UTM CRS so that distances are in meters for points. This could have accuracy issues with points that span many UTM zones.
- pulled it back together and also to validate generated a list of points that are within this distance
import geopandas as gpd
from scipy.spatial import KDTree
import pandas as pd
# get some points...
gdf = (
gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
.sjoin(
gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).loc[
lambda d: d["continent"].eq("Europe")
]
)
.reset_index(drop=True)
).loc[:, ["name_left", "geometry"]]
# 2d array of points
# use UTM so distances will be in meters
points = (
gdf.to_crs(gdf.estimate_utm_crs())["geometry"]
# gdf["geometry"]
.apply(lambda p: {"x": p.x, "y": p.y})
.apply(pd.Series)
.values
)
kdtree = KDTree(points)
# how many points to consider for distance
N = 5
# create a dataframe from sparse distance matrix.
# nth(N+1) as self is included, so want N other points
df = (
pd.DataFrame(kdtree.sparse_distance_matrix(kdtree, 10 ** 7, output_type="ndarray"))
.sort_values(["i", "v"])
.groupby("i")
.nth(N + 1)
)
# join back to geodataframe
gdf = gdf.join(df)
# no need to do this, but helps validate results...
gdf["nearest"] = gdf.apply(
lambda r: kdtree.query_ball_point(points[r.name], r.v), axis=1
)
gdf
sample output
| name_left | geometry | j | v | nearest | |
|---|---|---|---|---|---|
| 0 | Vatican City | POINT (12.45338654497177 41.90328217996012) | 25 | 532056 | [6, 1, 0, 2, 9, 25, 19] |
| 1 | San Marino | POINT (12.44177015780014 43.936095834768) | 3 | 422411 | [6, 3, 1, 0, 2, 9, 19] |
| 2 | Rome | POINT (12.481312562874 41.89790148509894) | 25 | 530240 | [6, 1, 0, 2, 9, 25, 19] |
| 3 | Vaduz | POINT (9.516669472907267 47.13372377429357) | 32 | 490694 | [6, 12, 3, 1, 9, 5, 32] |
| 4 | Vienna | POINT (16.36469309674374 48.20196113681686) | 21 | 493170 | [9, 32, 21, 19, 26, 10, 4] |
| 5 | Luxembourg | POINT (6.130002806227083 49.61166037912108) | 42 | 490936 | [8, 12, 3, 5, 35, 39] |
| 6 | Monaco | POINT (7.406913173465057 43.73964568785249) | 7 | 499408 | [7, 6, 12, 3, 1, 0, 2] |
| 7 | Andorra | POINT (1.51648596050552 42.5000014435459) | 5 | 868547 | [8, 7, 37, 6, 12, 3] |
| 8 | Paris | POINT (2.33138946713035 48.86863878981461) | 3 | 569852 | [8, 12, 5, 35, 42, 39] |
| 9 | Ljubljana | POINT (14.51496903347413 46.0552883087945) | 25 | 392927 | [1, 9, 25, 19, 26, 10, 4] |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rob Raymond |
