Applies distance-based thinning using either a kd-tree or a brute-force neighbor search. Two kd-tree variants (local kd-trees, and estimation of the maximum number of neighbors) are also implemented and scale better to large datasets. The function removes points that lie closer to one another than a specified distance while maximizing spatial representation.
Arguments
- coordinates
A matrix of coordinates to thin, with two columns representing longitude and latitude.
- thin_dist
A positive numeric value representing the thinning distance in kilometers.
- trials
An integer specifying the number of trials to run for thinning. Default is 10.
- all_trials
A logical indicating whether to return results of all attempts (`TRUE`) or only the best attempt with the most points retained (`FALSE`). Default is `FALSE`.
- search_type
A character string indicating the neighbor search method: `c("local_kd_tree", "k_estimation", "kd_tree", "brute")`. The default value is `"local_kd_tree"`. See Details.
- target_points
Optional integer specifying the number of points to retain. If `NULL` (default), the function tries to maximize the number of points retained.
- distance
Distance metric to use: `c("haversine", "euclidean")`. Default is `"haversine"`, which suits geographic coordinates.
- R
Radius of the Earth in kilometers (default: 6371 km).
- n_cores
Number of cores for parallel processing (only for `"local_kd_tree"`). Default is 1.
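For reference, the Haversine distance that the `distance` and `R` arguments refer to can be sketched in base R as below. The helper name `haversine_km` is hypothetical and is not part of the package; it only illustrates how the Earth radius `R` scales the great-circle angle into kilometers.

```r
# Great-circle (Haversine) distance in kilometers between lon/lat points.
# `haversine_km` is a hypothetical helper for illustration only.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * R * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator, about 111.19 km for R = 6371
haversine_km(0, 0, 1, 0)
```

Because `thin_dist` is given in kilometers, a different `R` (for example, for another planetary body) changes which points count as neighbors.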
Value
A list. If `all_trials` is `FALSE`, the list contains a single logical vector indicating which points are kept in the best trial. If `all_trials` is `TRUE`, the list contains a logical vector for each trial.
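Since the return value is a list of logical vectors, it can be used directly to subset the input matrix. The sketch below uses made-up masks rather than real output from the function, so the object names and values are assumptions for illustration.

```r
# Example coordinates (longitude, latitude)
coords <- cbind(lon = c(0, 0.01, 10, 20), lat = c(0, 0.01, 10, 20))

# Made-up mask shaped like the all_trials = FALSE result:
# a list holding a single logical vector
kept <- list(c(TRUE, FALSE, TRUE, TRUE))
thinned <- coords[kept[[1]], , drop = FALSE]
nrow(thinned)

# Made-up masks shaped like the all_trials = TRUE result:
# count retained points per trial and pick the trial that keeps the most
trials <- list(c(TRUE, FALSE, TRUE, TRUE), c(FALSE, TRUE, FALSE, TRUE))
n_kept <- vapply(trials, sum, integer(1))
best <- trials[[which.max(n_kept)]]
```

Using `drop = FALSE` keeps the result a matrix even when only one point is retained.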
Details
- `"kd_tree"`: Uses a single kd-tree for efficient nearest-neighbor searches.
- `"local_kd_tree"`: Builds multiple smaller kd-trees for better scalability.
- `"k_estimation"`: Approximates a maximum number of neighbors per point to reduce search complexity.
- `"brute"`: Computes all pairwise distances (inefficient for large datasets).
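To make the brute-force idea concrete, here is a minimal base R sketch. It assumes one plausible greedy strategy (repeatedly drop the point with the most neighbors closer than `thin_dist`, breaking ties at random); this is an assumption for illustration and not necessarily what the package does internally. The function name `brute_thin_sketch` is hypothetical.

```r
# Hypothetical helper: brute-force thinning of a lon/lat matrix.
brute_thin_sketch <- function(coordinates, thin_dist, R = 6371) {
  rad <- coordinates * pi / 180
  n <- nrow(rad)
  # All pairwise Haversine distances in km (O(n^2) time and memory)
  d <- outer(seq_len(n), seq_len(n), function(i, j) {
    a <- sin((rad[j, 2] - rad[i, 2]) / 2)^2 +
      cos(rad[i, 2]) * cos(rad[j, 2]) * sin((rad[j, 1] - rad[i, 1]) / 2)^2
    2 * R * asin(pmin(1, sqrt(a)))
  })
  too_close <- d < thin_dist
  diag(too_close) <- FALSE
  keep <- rep(TRUE, n)
  repeat {
    # Neighbors still kept, counted only for points that are still kept
    n_neighbors <- rowSums(too_close[, keep, drop = FALSE]) * keep
    if (all(n_neighbors == 0)) break
    worst <- which(n_neighbors == max(n_neighbors))
    # Avoid sample()'s special handling of a length-one vector
    drop_idx <- if (length(worst) > 1) sample(worst, 1) else worst
    keep[drop_idx] <- FALSE
  }
  keep
}

set.seed(1)
pts <- cbind(lon = c(0, 0.01, 0.02, 10), lat = c(0, 0, 0, 10))
keep <- brute_thin_sketch(pts, thin_dist = 5)
# Two points remain: one of the three clustered points plus the distant one
pts[keep, , drop = FALSE]
```

The random tie-breaking is why repeated trials can retain different points, and the full distance matrix is why this approach becomes impractical as the number of points grows.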
Examples
# Generate sample coordinates
set.seed(123)
result <- matrix(runif(20, min = -180, max = 180), ncol = 2) # 10 random points
# Perform thinning with local kd-trees
result_partitioned <- distance_thinning(result, thin_dist = 5000, trials = 5,
                                        search_type = "local_kd_tree", all_trials = TRUE)
print(result_partitioned)
#> [[1]]
#> [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#>
#> [[2]]
#> [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#>
#> [[3]]
#> [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#>
#> [[4]]
#> [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
#>
#> [[5]]
#> [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#>
# Perform thinning estimating max number of neighbors
result_estimated <- distance_thinning(result, thin_dist = 5000, trials = 5,
                                      search_type = "k_estimation", all_trials = TRUE)
print(result_estimated)
#> [[1]]
#> [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#>
#> [[2]]
#> [1] FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
#>
#> [[3]]
#> [1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#>
#> [[4]]
#> [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
#>
#> [[5]]
#> [1] FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE
#>