---
title: "Resolve Names with wcvp_matching()"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Resolve Names with wcvp_matching()}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)

if (file.exists("../DESCRIPTION") && requireNamespace("pkgload", quietly = TRUE)) {
  pkgload::load_all("..", export_all = FALSE, helpers = FALSE, quiet = TRUE)
} else {
  library(wcvpmatch)
}

library(tibble)
library(dplyr)
```

`wcvp_matching()` is the main reconciliation function in `wcvpmatch`. It takes parsed names, or a minimal table with genus and species columns, matches them against a backbone modeled on the World Checklist of Vascular Plants (WCVP), and returns both the matched taxon and its accepted name.

This vignette uses a compact in-memory backbone so that every example runs quickly and reproducibly.

## Example backbone

```{r}
make_matching_backbone <- function() {
  tibble(
    genus = c("Aniba", "Jaltomata", "Veronica", "Veronica"),
    species = c("heterotepala", "sagastegui", "vulcanica", "spathulata"),
    infraspecific_rank = NA_character_,
    infraspecies = NA_character_,
    plant_name_id = c(1, 2, 10, 200),
    taxon_name = c(
      "Aniba heterotepala",
      "Jaltomata sagastegui",
      "Veronica vulcanica",
      "Veronica spathulata"
    ),
    taxon_authors = c("A.Author", "B.Author", "C.Author", "D.Author"),
    taxon_status = c("Accepted", "Accepted", "Synonym", "Accepted"),
    # Accepted names point to themselves; the synonym (id 10) points to id 200
    accepted_plant_name_id = c(1, 2, 200, 200)
  )
}

matching_backbone <- make_matching_backbone()
matching_backbone
```

The third row is a synonym: its `accepted_plant_name_id` (200) points to `Veronica spathulata`.

## Parse the input names

`classify_spnames()` standardizes the incoming names before matching. It is the recommended entry point when you start from raw taxon strings.
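To make "standardization" concrete, the chunk below sketches the simplest part of the job in base R: normalizing whitespace and capitalization, then splitting a binomial into genus and epithet. `split_binomial()` is a hypothetical helper written for this illustration only. It is not how `classify_spnames()` is implemented, and the real function returns more fields, such as `Rank`.

```{r}
# Hypothetical helper for illustration only; NOT the implementation of
# classify_spnames(), which returns more fields (e.g. Rank).
split_binomial <- function(x) {
  # Collapse runs of whitespace to a single space and trim both ends
  x <- trimws(gsub("\\s+", " ", x))
  parts <- strsplit(x, " ", fixed = TRUE)
  # First word is the genus, second word (if any) the specific epithet
  genus <- vapply(parts, function(p) p[1], character(1))
  species <- vapply(
    parts,
    function(p) if (length(p) >= 2) p[2] else NA_character_,
    character(1)
  )
  # Capitalize the genus and lower-case the epithet
  genus <- paste0(toupper(substr(genus, 1, 1)), tolower(substr(genus, 2, nchar(genus))))
  data.frame(
    input_name = x,
    genus = genus,
    species = tolower(species),
    stringsAsFactors = FALSE
  )
}

split_binomial(c("  aniba   HETEROTEPALA", "Veronica vulcanica"))
```

Real name strings also carry authors, infraspecific ranks, and hybrid markers, which is why a dedicated parser is preferable to ad hoc splitting like this.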
```{r}
parsed_names <- classify_spnames(
  c("Aniba heterotepala", "Jaltometa sagasteguii", "Veronica vulcanica")
)

parsed_names |>
  select(Input.Name, Orig.Genus, Orig.Species, Rank)
```

## Run the matching pipeline

The example below shows three common outcomes:

- an exact match to an accepted name (`Aniba heterotepala`)
- a fuzzy recovery of a misspelled genus and epithet (`Jaltometa sagasteguii`)
- an exact match to a synonym that resolves to a different accepted name (`Veronica vulcanica`)

```{r}
matched <- wcvp_matching(
  parsed_names,
  target_df = matching_backbone,
  allow_duplicates = TRUE,
  max_dist = 2,             # distance threshold for fuzzy matches
  method = "osa",           # string distance method
  add_name_distance = TRUE, # adds the matched_dist column
  output_name_style = "snake_case"
)

matched |>
  select(
    input_name,
    matched_taxon_name,
    accepted_taxon_name,
    taxon_status,
    matched_dist
  )
```

## Read the pathway flags

The logical stage flags record which step of the pipeline recovered each input. They are useful for auditing matching behavior or for separating exact from fuzzy recoveries.

```{r}
matched |>
  select(
    input_name,
    direct_match,
    genus_match,
    fuzzy_match_genus,
    direct_match_species_within_genus,
    suffix_match_species_within_genus,
    fuzzy_match_species_within_genus,
    matched
  )
```

In this example, `Aniba heterotepala` is recovered directly, while `Jaltometa sagasteguii`, whose genus is misspelled, is resolved through fuzzy matching.

## Accepted-name resolution

`wcvp_matching()` returns both the matched taxon and the accepted taxon. This distinction matters when the submitted name is a synonym.

```{r}
matched |>
  filter(input_name == "Veronica vulcanica") |>
  select(
    input_name,
    matched_taxon_name,
    accepted_taxon_name,
    matched_taxon_authors,
    accepted_taxon_authors,
    taxon_status,
    is_accepted_name
  )
```

Here the matched taxon is the synonym `Veronica vulcanica`, but the accepted taxon is `Veronica spathulata`.

## Duplicate handling

Duplicate input rows are allowed when `allow_duplicates = TRUE`. In that case, the `input_index` column records which input row each output row came from, so repeated names remain distinguishable.
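The general idea behind an index column can be sketched in base R: tag every input row with its position before matching, then carry that position through any joins. The toy `lookup` table below stands in for matching results and is hypothetical; this is not a description of how `wcvp_matching()` handles duplicates internally.

```{r}
# Hypothetical illustration of tracking row identity with an index column
inputs <- data.frame(
  input_name = c("Aniba heterotepala", "Aniba heterotepala", "Veronica vulcanica"),
  stringsAsFactors = FALSE
)
# 1. Tag every input row with its original position
inputs$input_index <- seq_len(nrow(inputs))

# 2. Toy results for each distinct name (values taken from the example backbone)
lookup <- data.frame(
  input_name = c("Aniba heterotepala", "Veronica vulcanica"),
  accepted_taxon_name = c("Aniba heterotepala", "Veronica spathulata"),
  stringsAsFactors = FALSE
)

# 3. Join results back; both copies of the duplicated name keep their own index
result <- merge(inputs, lookup, by = "input_name", all.x = TRUE)
# merge() sorts by the key column, so restore the original input order
result <- result[order(result$input_index), c("input_index", "input_name", "accepted_taxon_name")]
rownames(result) <- NULL
result
```

Because the index survives the join, results can always be aligned with the original rows, even when the same name appears more than once.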
```{r}
# The same name submitted twice, as a minimal genus/species table
duplicate_input <- tibble(
  Genus = c("Aniba", "Aniba"),
  Species = c("heterotepala", "heterotepala"),
  Input.Name = c("Aniba heterotepala", "Aniba heterotepala")
)

wcvp_matching(
  duplicate_input,
  target_df = matching_backbone,
  allow_duplicates = TRUE,
  output_name_style = "snake_case"
) |>
  select(input_index, input_name, matched_taxon_name, accepted_taxon_name)
```

## Profiling

For larger jobs, `profile = TRUE` records how long each stage of the pipeline takes and attaches the results to the output as a `"timings"` attribute.

```{r}
profiled <- wcvp_matching(
  parsed_names,
  target_df = matching_backbone,
  allow_duplicates = TRUE,
  max_dist = 2,
  method = "osa",
  add_name_distance = TRUE,
  output_name_style = "snake_case",
  profile = TRUE
)

attr(profiled, "timings") |>
  select(stage, rows, elapsed_seconds)
```

## Practical notes

- Start from `classify_spnames()` when your input is raw text.
- Use `allow_duplicates = TRUE` for production data that may contain repeated names.
- Set `output_name_style = "snake_case"` to get snake_case column names such as `input_name`, which are easier to work with in `dplyr`.
- Use `add_name_distance = TRUE` when you want a compact numeric summary (`matched_dist`) of how far each submitted name is from the matched name.
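For intuition about `method = "osa"` and `max_dist = 2`, the chunk below implements the optimal string alignment (OSA) distance in base R. We assume that `"osa"` denotes this metric, as it does in the stringdist package; `osa_dist()` is a sketch written for this vignette, not the package's code, and exactly which strings `matched_dist` compares (whole names or genus and epithet separately) is not shown here. OSA counts insertions, deletions, substitutions, and swaps of adjacent characters, where each substring may be edited at most once.

```{r}
# Optimal string alignment (restricted Damerau-Levenshtein) distance.
# Illustrative sketch only, not the wcvpmatch implementation.
osa_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between x[1:i] and y[1:j]
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L, # deletion
        d[i + 1, j] + 1L, # insertion
        d[i, j] + cost    # substitution (or match)
      )
      # Swap of two adjacent characters counts as a single edit
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Genus misspelling from the example: one substitution (e -> a)
osa_dist("Jaltometa", "Jaltomata")
# Epithet misspelling: one deletion
osa_dist("sagasteguii", "sagastegui")
# An adjacent swap is one edit in OSA but two in plain Levenshtein (utils::adist)
osa_dist("ab", "ba")
adist("ab", "ba")

# Keep only backbone genera within max_dist = 2 of the misspelled genus
genera <- c("Aniba", "Jaltomata", "Veronica")
dists <- vapply(genera, osa_dist, integer(1), a = "Jaltometa")
within_max <- genera[dists <= 2]
within_max
```

With a threshold of 2, a single typo in the genus is tolerated, while unrelated genera fall far outside the cutoff.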