This function performs spatial thinning of geographic points to reduce point density while maintaining spatial representation. Points are thinned based on a specified distance, grid, or precision, and multiple trials can be performed to identify the best thinned dataset.

Usage

thin_points(
  data,
  lon_col = "lon",
  lat_col = "lat",
  group_col = NULL,
  method = c("distance", "grid", "precision"),
  trials = 10,
  all_trials = FALSE,
  seed = NULL,
  verbose = FALSE,
  ...
)

Arguments

data

A data frame or tibble containing the points to thin. Must contain longitude and latitude columns.

lon_col

Name of the column with longitude coordinates (default: "lon").

lat_col

Name of the column with latitude coordinates (default: "lat").

group_col

Name of the column for grouping points (e.g., species name, year). If NULL, no grouping is applied.

method

Thinning method to use: one of `"distance"`, `"grid"`, or `"precision"`.

trials

Number of thinning iterations to perform (default: 10). Must be a positive number.

all_trials

If TRUE, returns results of all attempts; if FALSE, returns the best attempt with the most points retained (default: FALSE).

seed

Optional; an integer seed for reproducibility of results.

verbose

If TRUE, prints progress messages (default: FALSE).

...

Additional parameters passed to specific thinning methods. See Details.

Value

A list with a data.frame/matrix/tibble of thinned points if `all_trials = FALSE`, or a combined result of all attempts if `all_trials = TRUE`.
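
The interplay between `trials`, `seed`, and `all_trials` can be pictured with a small base R sketch. It is only an illustration under simplifying assumptions (planar Euclidean distance, a randomized greedy pass), not the algorithm the package uses; `thin_once` is a hypothetical helper. Here `runs` plays the role of the "all attempts" result and `best` the single attempt that retains the most points.

```r
# Hypothetical helper: one randomized greedy pass that keeps a point only if
# it lies at least `min_dist` (Euclidean, in coordinate units) from every
# point kept so far.
thin_once <- function(xy, min_dist) {
  kept <- integer(0)
  for (i in sample(nrow(xy))) {
    d <- sqrt((xy[kept, 1] - xy[i, 1])^2 + (xy[kept, 2] - xy[i, 2])^2)
    if (all(d >= min_dist)) kept <- c(kept, i)
  }
  sort(kept)
}

set.seed(42)  # a fixed seed makes the random visiting orders reproducible
xy <- cbind(x = runif(50), y = runif(50))

# Run several trials; each one may retain a different subset of rows
runs <- lapply(seq_len(10), function(i) thin_once(xy, min_dist = 0.2))

# Keep the trial that retains the most points
best <- runs[[which.max(lengths(runs))]]
```

Because each pass visits the points in a different random order, the trials can retain different numbers of points, which is why repeating the process and selecting the largest result is useful.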

Details

The thinning methods available are:

`distance`

Enforces a specified minimum distance between retained points.

`grid`

Overlays a grid on the points and retains a limited number of points per grid cell.

`precision`

Rounds coordinates to a given number of decimal places and thins points that coincide after rounding.

Distance-based thinning

The specific parameters for distance-based thinning are:

`thin_dist`

A positive numeric value representing the thinning distance in kilometers.

`search_type`

A character string indicating the neighbor search method: one of `"local_kd_tree"`, `"k_estimation"`, `"kd_tree"`, or `"brute"`. The default value is `"local_kd_tree"`.

`distance`

Distance metric to use: `"haversine"` or `"euclidean"`. Default is Haversine for geographic coordinates.

`R`

The radius of the Earth in kilometers. Default is 6371 km.

`target_points`

Optional integer specifying the number of points to retain. If `NULL` (default), the function tries to maximize the number of points retained.

`n_cores`

Number of cores for parallel processing (only for `"local_kd_tree"`). Default is 1.
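
For reference, the Haversine metric mentioned above can be sketched in base R as follows. This is an illustrative helper, not the package's internal code: the name `haversine_km` is hypothetical, and `R = 6371` simply mirrors the default Earth radius documented above.

```r
# Hypothetical helper: great-circle distance in kilometers between two points.
# Coordinates are in decimal degrees; R is the Earth radius in kilometers.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * R * asin(pmin(1, sqrt(a)))
}

# One degree of longitude along the equator: roughly 111.19 km
d <- haversine_km(0, 0, 1, 0)
```

Two points closer than `thin_dist` under this metric cannot both be retained by distance-based thinning.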

Grid-based thinning

The specific parameters for grid-based thinning are:

`thin_dist`

A positive numeric value representing the thinning distance in kilometers.

`resolution`

A numeric value representing the resolution (in degrees) of the raster grid. If provided, this takes priority over `thin_dist`.

`origin`

A numeric vector of length 2 (e.g., `c(0, 0)`), specifying the origin of the raster grid (optional).

`raster_obj`

An optional `terra::SpatRaster` object to use for grid thinning. If provided, the raster object will be used instead of creating a new one.

`n`

A positive integer specifying the maximum number of points to retain per grid cell (default: 1).

`crs`

An optional CRS (Coordinate Reference System) to project the coordinates and raster (default WGS84, `epsg:4326`). This can be an EPSG code, a PROJ.4 string, or a `terra::crs` object.

`priority`

A numeric vector of the same length as the number of points with numerical values indicating the priority of each point. Instead of eliminating points randomly, higher values are preferred during thinning.
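
To illustrate the idea behind these parameters, the sketch below bins points into square cells of a given `resolution` (in degrees) and keeps at most `n` points per cell, preferring higher `priority` values. It is a simplified base R illustration under stated assumptions (cells aligned to `origin`, no projection, ties resolved by row order rather than randomly), not the implementation used by `grid_thinning()`; `grid_thin_sketch` is a hypothetical name.

```r
# Hypothetical helper: returns a logical vector marking the points to keep.
grid_thin_sketch <- function(lon, lat, resolution, origin = c(0, 0),
                             n = 1, priority = NULL) {
  if (is.null(priority)) priority <- rep(0, length(lon))
  # Integer cell indices relative to the grid origin
  cell_x <- floor((lon - origin[1]) / resolution)
  cell_y <- floor((lat - origin[2]) / resolution)
  cell_id <- paste(cell_x, cell_y, sep = "_")
  # Visit points from highest to lowest priority
  ord <- order(-priority)
  # Rank of each point within its cell, in priority order
  rank_in_cell <- ave(seq_along(ord), cell_id[ord], FUN = seq_along)
  keep <- logical(length(lon))
  keep[ord[rank_in_cell <= n]] <- TRUE
  keep
}

pts <- data.frame(lon = c(0.1, 0.4, 1.2, 1.8, 2.5),
                  lat = c(0.2, 0.7, 0.3, 0.9, 0.1))
keep <- grid_thin_sketch(pts$lon, pts$lat, resolution = 1,
                         priority = c(1, 5, 2, 2, 0))
pts[keep, ]  # rows 2, 3 and 5: one point per 1-degree cell
```

In the first cell the second point wins because of its higher priority; in the second cell the two priorities tie, so this sketch keeps the earlier row.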

Precision-based thinning

The specific parameters for precision-based thinning are:

`precision`

A positive integer specifying the number of decimal places to which coordinates should be rounded. Default is 4.

`priority`

A numeric vector of the same length as the number of points with numerical values indicating the priority of each point. Instead of eliminating points randomly, higher values are preferred during thinning.
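
The rounding idea can be sketched in a few lines of base R: coordinates are rounded to `precision` decimal places, and only one point is kept per unique rounded location. This is a minimal illustration, not the code behind `precision_thinning()`; `precision_thin_sketch` is a hypothetical name, and ties are resolved by row order here rather than randomly.

```r
# Hypothetical helper: returns a logical vector marking the points to keep.
precision_thin_sketch <- function(lon, lat, precision = 4, priority = NULL) {
  if (is.null(priority)) priority <- rep(0, length(lon))
  # Points sharing the same rounded coordinates are treated as duplicates
  key <- paste(round(lon, precision), round(lat, precision), sep = "_")
  ord <- order(-priority)
  keep <- logical(length(lon))
  # Keep the first (highest-priority) point seen at each rounded location
  keep[ord[!duplicated(key[ord])]] <- TRUE
  keep
}

lon <- c(10.12341, 10.12344, 10.5)
lat <- c(45.00001, 45.00002, 45.5)
keep <- precision_thin_sketch(lon, lat, precision = 4)
keep  # TRUE FALSE TRUE: the first two points coincide at 4 decimal places
```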

For more information on specific thinning methods and inputs, refer to their respective documentation:

  • `distance_thinning()`

  • `grid_thinning()`

  • `precision_thinning()`

Examples

# Generate sample data
set.seed(123)
sample_data <- data.frame(
  lon = runif(100, -180, 180),
  lat = runif(100, -90, 90)
)

# Perform thinning using distance method
thinned_data <- thin_points(sample_data,
                             lon_col = "lon",
                             lat_col = "lat",
                             method = "distance",
                             trials = 5,
                             verbose = TRUE)
#> Starting spatial thinning at 2025-03-27 19:20:41 
#> Thinning using method: distance 
#> Thinning process completed.
#> Total execution time: 0 seconds

# Perform thinning with grouping
sample_data$species <- sample(c("species_A", "species_B"), 100, replace = TRUE)
thinned_grouped_data <- thin_points(sample_data,
                                     lon_col = "lon",
                                     lat_col = "lat",
                                     group_col = "species",
                                     method = "distance",
                                     trials = 10)