Perform K-D Tree ANN Thinning — kd_tree

This function applies the K-D tree Approximate Nearest Neighbors (ANN) thinning algorithm to a set of spatial coordinates. It can optionally partition the coordinates into grid cells before thinning, which can improve efficiency on large datasets.
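
As a rough illustration of what distance-based thinning with randomized trials does, the sketch below uses a brute-force pairwise distance matrix on planar coordinates instead of a k-d tree. `thin_once` is a hypothetical helper written for this example; it is not part of the package, and the package's actual algorithm may differ.

```r
# Illustrative brute-force thinning on planar (x, y) coordinates.
# `thin_once` is a hypothetical helper, not the package's implementation.
thin_once <- function(xy, thin_dist) {
  d <- as.matrix(dist(xy))            # pairwise Euclidean distances
  keep <- rep(FALSE, nrow(xy))
  for (i in sample(nrow(xy))) {       # visit points in random order
    # Keep point i only if no already-kept point lies within thin_dist
    if (!any(d[i, keep] < thin_dist)) keep[i] <- TRUE
  }
  keep
}

set.seed(1)
xy <- cbind(x = runif(50, 0, 100), y = runif(50, 0, 100))

# Each trial uses a different random order, so the number of retained
# points can vary from trial to trial.
trial_keeps <- replicate(5, thin_once(xy, thin_dist = 10), simplify = FALSE)
vapply(trial_keeps, sum, integer(1))
```

Because the outcome depends on the visiting order, running several trials and comparing the retained counts is a simple way to look for a larger thinned set.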

Usage

kd_tree_thinning(
  coordinates,
  thin_dist = 10,
  trials = 10,
  all_trials = FALSE,
  space_partitioning = FALSE,
  euclidean = FALSE,
  R = 6371
)

Arguments

coordinates: A matrix of coordinates to thin, with two columns representing longitude and latitude.
thin_dist: A numeric value representing the thinning distance in kilometers. Points closer than this distance to each other are considered redundant and may be removed. Default is 10.
trials: An integer specifying the number of trials to run for thinning. Multiple trials can help achieve a better result by randomizing the thinning process. Default is 10.
all_trials: A logical value indicating whether to return results of all attempts (`TRUE`) or only the best attempt with the most points retained (`FALSE`). Default is `FALSE`.
space_partitioning: A logical value indicating whether to use space partitioning to divide the coordinates into grid cells before thinning. This can improve efficiency in large datasets. Default is `FALSE`.
euclidean: A logical value indicating whether to compute Euclidean distance (`TRUE`) or Haversine distance (`FALSE`). Default is `FALSE`.
R: A numeric value representing the radius of the Earth in kilometers. The default is 6371 km.
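
For reference, the Haversine distance is a great-circle distance that depends on a sphere radius such as `R`. The following is a minimal base R sketch of the standard Haversine formula; `haversine_km` is a hypothetical helper written for this example, not a function of this package, and the package's internal implementation may differ.

```r
# Haversine great-circle distance in kilometers.
# Hypothetical helper for illustration; inputs are in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * R * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator: R * pi / 180, about 111.19 km
haversine_km(0, 0, 1, 0)
```

Under this formula, two points on the equator one degree apart are farther apart than a `thin_dist` of 10, so neither would count as redundant to the other.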

Value

A list of logical vectors, each with one element per input point, where `TRUE` marks a point that is kept. If `all_trials` is `FALSE`, the list contains a single logical vector for the best trial (the one that retains the most points). If `all_trials` is `TRUE`, the list contains one logical vector per trial.
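
The returned logical vectors can be used directly to subset the input matrix. The sketch below uses a made-up list in place of a real return value, so it runs without the package; `coords` and `trials_out` are hypothetical names and values chosen for illustration.

```r
# Hypothetical input: five points as a two-column (longitude, latitude) matrix
coords <- cbind(lon = c(0, 0.01, 5, 10, 10.02), lat = c(0, 0, 5, 10, 10))

# Made-up stand-in for a result obtained with all_trials = TRUE:
# one logical vector per trial, TRUE = point kept
trials_out <- list(
  c(TRUE, FALSE, FALSE, TRUE, FALSE),
  c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Number of points retained in each trial
n_kept <- vapply(trials_out, sum, integer(1))

# Pick the trial that keeps the most points and subset the coordinates
best <- trials_out[[which.max(n_kept)]]
thinned <- coords[best, , drop = FALSE]
thinned
```

With `all_trials = FALSE` the list has a single element, so `coords[result[[1]], , drop = FALSE]` would give the thinned coordinates in the same way.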

Examples

# Generate sample coordinates
set.seed(123)
coordinates <- matrix(runif(20, min = -180, max = 180), ncol = 2) # 10 random points

# Perform K-D Tree thinning without space partitioning
result <- kd_tree_thinning(coordinates, thin_dist = 10, trials = 5, all_trials = FALSE)
print(result)
#> [[1]]
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 

# Perform K-D Tree thinning with space partitioning
result_partitioned <- kd_tree_thinning(coordinates, thin_dist = 5000, trials = 5,
                                       space_partitioning = TRUE, all_trials = TRUE)
print(result_partitioned)
#> [[1]]
#>  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
#> 
#> [[2]]
#>  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
#> 
#> [[3]]
#>  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
#> 
#> [[4]]
#>  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
#> 
#> [[5]]
#>  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
#> 

# Perform K-D Tree thinning with Cartesian coordinates
cartesian_coordinates <- long_lat_to_cartesian(coordinates[, 1], coordinates[, 2])
result_cartesian <- kd_tree_thinning(cartesian_coordinates, thin_dist = 10, trials = 5,
                                     euclidean = TRUE)
print(result_cartesian)
#> [[1]]
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#>