--- title: "Retrieve Distribution with wcvp_distribution()" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Retrieve Distribution with wcvp_distribution()} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE ) if (file.exists("../DESCRIPTION") && requireNamespace("pkgload", quietly = TRUE)) { pkgload::load_all("..", export_all = FALSE, helpers = FALSE, quiet = TRUE) } else { library(wcvpmatch) } library(tibble) library(dplyr) ``` `wcvp_distribution()` retrieves geographic distribution from a pair of WCVP-like tables: one names table and one distribution table. In regular use those tables can come from `wcvpdata::wcvp_checklist_names` and `wcvpdata::wcvp_checklist_distribution`, but the function also works with user-supplied tables that follow the same schema. This vignette uses a compact reproducible example so every code block runs without depending on external data packages. ## Example backbone ```{r} make_distribution_names <- function() { tibble( plant_name_id = c(1, 2, 3, 4, 5, 6), accepted_plant_name_id = c(NA, 3, NA, NA, 1, NA), taxon_rank = c("Species", "Species", "Species", "Species", "Species", "Species"), taxon_status = c("Accepted", "Synonym", "Accepted", "Accepted", "Synonym", "Accepted"), family = c("Cactaceae", "Cactaceae", "Cactaceae", "Fagaceae", "Cactaceae", "Cactaceae"), genus = c("Opuntia", "Nopalea", "Opuntia", "Quercus", "Opuntia", "Mammillaria"), species = c("ficus-indica", "cochenillifera", "cochenillifera", "robur", "tuna", "elongata"), taxon_name = c( "Opuntia ficus-indica", "Nopalea cochenillifera", "Opuntia cochenillifera", "Quercus robur", "Opuntia tuna", "Mammillaria elongata" ) ) } make_distribution_records <- function() { tibble( plant_locality_id = 1:7, plant_name_id = c(1, 2, 3, 3, 4, 5, 6), continent_code_l1 = c("8", "8", "8", "4", "1", "8", "8"), continent = c( "SOUTHERN AMERICA", "SOUTHERN AMERICA", "SOUTHERN AMERICA", "NORTHERN AMERICA", "EUROPE", "SOUTHERN AMERICA", "SOUTHERN AMERICA" ), region_code_l2 = c("83", "83", "83", "41", "10", "85", "83"), region = c( "Western South America", "Western South America", "Western South America", "Mexico", "Europe", "Southern South America", "Western South America" ), area_code_l3 = c("MEX", "PER", "COL", "MEX", "ESP", "GAL", "MEX"), area = c("Mexico", "Peru", "Colombia", "Mexico", "Spain", "Galapagos", "Mexico"), introduced = c(0, 0, 0, 1, 0, 0, 0), extinct = c(0, 0, 0, 0, 0, 0, 0), location_doubtful = c(0, 0, 0, 0, 0, 0, 0) ) } distribution_names <- make_distribution_names() distribution_records <- make_distribution_records() distribution_names distribution_records ``` ## Species-level retrieval At species level, `wcvp_distribution()` parses input names, matches them through the same backend used by `wcvp_matching()`, resolves accepted names when possible, and then joins the result to the distribution table. ```{r} species_out <- wcvp_distribution( c("Nopalea cochenilliferaa", "Taxon inexistente"), taxon_rank = "species", wcvp_names = distribution_names, wcvp_distributions = distribution_records ) species_out |> select( submited_name, matched_taxon, accepted_taxon_name, area_code_l3, area, distribution_status ) ``` The first query is fuzzy-matched to a synonym and then resolved to the accepted taxon `Opuntia cochenillifera`. The second query is preserved in the output with `distribution_status = "no_match"`. ## One row per submitted name When `summarise_by_input = TRUE`, the function collapses the area-related columns to one row per input taxon. ```{r} species_summary <- wcvp_distribution( c("Nopalea cochenilliferaa", "Taxon inexistente"), taxon_rank = "species", summarise_by_input = TRUE, wcvp_names = distribution_names, wcvp_distributions = distribution_records ) species_summary |> select( submited_name, accepted_taxon_name, distribution_status, area_codes, distribution, n_areas ) ``` This format is useful for reporting and export because `distribution`, `areas`, `regions`, and `continents` are returned as collapsed text values separated by `" - "`. ## Genus-level retrieval At genus level the function aggregates all accepted taxa in the genus and then summarises occurrences by area. ```{r} genus_out <- wcvp_distribution( "Opuntia", taxon_rank = "genus", introduced = FALSE, wcvp_names = distribution_names, wcvp_distributions = distribution_records ) genus_out |> select(matched_taxon, area_code_l3, area, occurrence_type, distribution_status) ``` The introduced Mexico record is excluded here because `introduced = FALSE`. ## Family-level retrieval Family-level lookup can also use fuzzy matching through `fozziejoin`. ```{r} family_out <- wcvp_distribution( "Cactacee", taxon_rank = "family", max_dist = 1, wcvp_names = distribution_names, wcvp_distributions = distribution_records ) family_out |> select(matched_taxon, match_distance, area_code_l3, area) |> distinct() ``` ## Fallback from species to genus If a species query cannot recover species-level distribution, the function can fall back to the matched genus without interrupting execution. ```{r} fallback_out <- wcvp_distribution( "Opuntia especieinventada", taxon_rank = "species", wcvp_names = distribution_names, wcvp_distributions = distribution_records ) fallback_out |> select(submited_name, matched_taxon, area_code_l3, area, distribution_status) ``` This fallback is marked with `distribution_status = "genus_distribution_fallback"`. ## Practical notes - Use `summarise_by_input = TRUE` when you need one row per submitted name. - Keep the default output when you need one row per taxon-area combination. - The columns `native`, `introduced`, `extinct`, and `location_doubtful` can be used as filters before the final summary is returned. - If `wcvpdata` is installed, you can omit `wcvp_names` and `wcvp_distributions` to use the default backbone.