| Title: | Check Threatened Plant Species Status Against Peru's Supreme Decree 043-2006-AG |
|---|---|
| Description: | Provides tools to match plant species names against the official threatened species list of Peru (Supreme Decree 043-2006-AG, 2006). Implements a hierarchical matching pipeline with exact, fuzzy, and suffix matching algorithms to handle naming variations and taxonomic changes. Supports both the original 2006 nomenclature and updated taxonomic names, allowing users to check protection status regardless of nomenclatural changes since the decree's publication. Threat categories follow International Union for Conservation of Nature standards (Critically Endangered, Endangered, Vulnerable, Near Threatened). |
| Authors: | Paul E. Santos Andrade [aut, cre] (ORCID: <https://orcid.org/0000-0002-6635-0375>) |
| Maintainer: | Paul E. Santos Andrade |
| License: | MIT + file LICENSE |
| Version: | 0.2.3 |
| Built: | 2026-05-16 05:18:58 UTC |
| Source: | https://github.com/PaulESantos/peruflorads43 |
Simplified interface for checking DS 043-2006-AG status with automatic consolidation of original and updated nomenclature.
```r
check_ds043(splist, return_simple = FALSE)
```

| Argument | Description |
|---|---|
| `splist` | Character vector of species names. |
| `return_simple` | Logical. If `TRUE`, returns only `"Protected"` or `"Not protected"`. |

Returns a character vector with the protection status of each name.
```r
## Not run:
species <- c("Brassia ocanensis", "Persea americana")
check_ds043(species)
## End(Not run)
```
Creates a side-by-side comparison table useful for understanding nomenclatural changes and their impact on DS 043-2006-AG status.
```r
comparison_table_ds043(splist)
```

| Argument | Description |
|---|---|
| `splist` | Character vector of species names. |

Returns a tibble with the side-by-side comparison.
Extracts information about ambiguous matches (multiple candidates with tied distances) from matching results. This is useful for quality control and manual curation of uncertain matches.
```r
get_ambiguous_matches(
  match_result,
  type = c("genus", "species", "infraspecies", "all"),
  save_to_file = FALSE,
  output_dir = tempdir()
)
```

| Argument | Description |
|---|---|
| `match_result` | A tibble returned by matching functions such as … |
| `type` | Character. Type of ambiguous matches to retrieve: one of `"genus"`, `"species"`, `"infraspecies"`, or `"all"`. |
| `save_to_file` | Logical. If `TRUE`, saves results to a CSV file. Default is `FALSE` (CRAN compliant: no automatic file writing). |
| `output_dir` | Character. Directory in which to save the file when `save_to_file = TRUE`. Defaults to `tempdir()`. |
During fuzzy matching, multiple candidates may have identical string distances, making the choice of match ambiguous. The matching algorithm automatically selects the first candidate, but this function allows you to:

- Review all ambiguous matches for quality control
- Export them for manual curation
- Make informed decisions about match quality
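
The tie situation can be illustrated with base R alone. The sketch below uses a hypothetical helper, `find_tied_candidates()`, which is not part of the package: it scores candidates with `utils::adist()` (a generalized Levenshtein distance, which may differ from the package's own metric), keeps every candidate at the minimum distance, and selects the first one, mirroring the behaviour described above. The misspelled query `"Eda"` is a toy input.

```r
# Hypothetical helper: find all candidates tied at the minimum edit distance.
find_tied_candidates <- function(query, candidates) {
  # utils::adist() returns a 1 x n matrix of generalized Levenshtein distances
  d <- as.vector(utils::adist(query, candidates))
  best <- min(d)
  tied <- candidates[d == best]
  list(
    selected  = tied[1],            # first tied candidate is kept
    tied      = tied,               # all candidates sharing the best distance
    distance  = best,
    ambiguous = length(tied) > 1    # TRUE when the choice was arbitrary
  )
}

# Toy example: "Eda" is one substitution away from both "Ada" and "Ida"
res <- find_tied_candidates("Eda", c("Ada", "Ida", "Brassia"))
res$tied
```

A result with `ambiguous = TRUE` is the kind of case that `get_ambiguous_matches()` is meant to surface for manual review.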
A tibble with ambiguous match details, or NULL if no ambiguous matches exist. Columns depend on the match type but typically include original names, matched names, and distance metrics.
When `save_to_file = TRUE`, a timestamped CSV file is created:

- Filename format: `"threatenedperu_ambiguous_[type]_[timestamp].csv"`
- Location: `output_dir` (defaults to `tempdir()`)
- Contains all ambiguous matches with metadata
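
A minimal base R sketch of how such a timestamped path could be built and written. The helper names `ambiguous_csv_path()` and `save_ambiguous()` and the `%Y%m%d_%H%M%S` timestamp layout are assumptions; only the `threatenedperu_ambiguous_[type]_[timestamp].csv` pattern and the `tempdir()` default come from the documentation above.

```r
# Hypothetical helpers; the timestamp layout "%Y%m%d_%H%M%S" is an assumption.
ambiguous_csv_path <- function(type, output_dir = tempdir(), time = Sys.time()) {
  stamp <- format(time, "%Y%m%d_%H%M%S")
  file.path(output_dir,
            paste0("threatenedperu_ambiguous_", type, "_", stamp, ".csv"))
}

save_ambiguous <- function(ambiguous, type, output_dir = tempdir()) {
  path <- ambiguous_csv_path(type, output_dir)
  utils::write.csv(ambiguous, path, row.names = FALSE)
  invisible(path)   # return the path so callers can report it
}

# Fixed time so the generated name is reproducible
path <- ambiguous_csv_path("species",
                           time = as.POSIXct("2024-01-02 03:04:05", tz = "UTC"))
basename(path)
```

Writing only on explicit request, and defaulting to `tempdir()`, keeps the behaviour compatible with CRAN's rule against writing to the user's file space.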
Provides summary statistics for the threatened species databases.
```r
get_database_summary(type = c("both", "original", "updated"))
```

| Argument | Description |
|---|---|
| `type` | Character string: `"original"`, `"updated"`, or `"both"` (default). |

Returns a tibble with summary statistics.
```r
# Get summary of both databases
summary_stats <- get_database_summary()
print(summary_stats)

# Get summary of just the original
summary_original <- get_database_summary("original")
print(summary_original)
```
Retrieves the threatened plant species database for Peru. This function provides controlled access to the internal datasets used by the package.
```r
get_threatened_database(type = c("original", "updated"))
```

| Argument | Description |
|---|---|
| `type` | Character string specifying which database version to retrieve: `"original"` or `"updated"` (described below). |

Returns a tibble containing the threatened species database.
**Original Database** (`type = "original"`):

- ~777 species as listed in DS 043-2006-AG
- Supports quaternomial names (Rank 4)
- Includes both accepted names and synonyms
- Columns: `scientific_name`, `genus`, `species`, `tag`, `infraspecies`, `tag_2`, `infraspecies_2`, `threat_category`, `accepted_name_author`, `taxonomic_status`, `accepted_name`, `family`, `protected_ds_043`

**Updated Database** (`type = "updated"`):

- Updated nomenclature using WCVP and POWO
- Supports trinomial names (Rank 3 maximum)
- Only accepted names (synonyms resolved)
- Columns: `scientific_name`, `genus`, `species`, `tag_acc`, `infraspecies`, `threat_category`, `accepted_name_author`, `taxonomic_status`, `accepted_name`, `family`, `protected_ds_043`
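
To make the rank terminology concrete, here is a hypothetical parser, `parse_name()`, that splits a name into the original database's name columns and derives a rank (1 = genus, 2 = binomial, 3 = trinomial, 4 = quaternomial). The list of infraspecific tags, the use of rank 1 for genus-only input, and the assumption that names carry no author strings are illustrative choices, not the package's parsing rules.

```r
# Hypothetical parser; the tag list is an assumption and author strings are
# not handled (input is expected to be a bare scientific name).
infra_tags <- c("subsp.", "var.", "subvar.", "f.")

parse_name <- function(name) {
  words <- strsplit(trimws(name), "\\s+")[[1]]
  is_tag <- words %in% infra_tags
  epithets <- words[!is_tag]
  tags <- words[is_tag]
  pick <- function(x, i) if (length(x) >= i) x[i] else NA_character_
  data.frame(
    genus          = pick(epithets, 1),
    species        = pick(epithets, 2),
    tag            = pick(tags, 1),
    infraspecies   = pick(epithets, 3),
    tag_2          = pick(tags, 2),
    infraspecies_2 = pick(epithets, 4),
    rank           = min(length(epithets), 4L),  # 1 = genus ... 4 = quaternomial
    stringsAsFactors = FALSE
  )
}

parse_name("Haageocereus acranthus subsp. olowinskianus")
```

Under this scheme a rank-4 input can only be matched against the original database, since the updated database stops at rank 3.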
**Threat categories** (IUCN):

- `CR`: Critically Endangered
- `EN`: Endangered
- `VU`: Vulnerable
- `NT`: Near Threatened
Data based on Supreme Decree DS 043-2006-AG, Ministry of Agriculture, Peru (July 13, 2006), which establishes the official list of threatened wild flora species in Peru.
This function is primarily for advanced users who need direct access
to the database structure. For most use cases, use the higher-level
functions: is_threatened_peru or is_ds043_2006_ag.
**See also:**

- `is_threatened_peru` to check the threat status of species
- `is_ds043_2006_ag` to check DS 043 protection status
```r
# Get original database
db_original <- get_threatened_database(type = "original")
str(db_original)
nrow(db_original)

# Get updated database
db_updated <- get_threatened_database(type = "updated")
str(db_updated)

# Compare number of species
n_original <- nrow(db_original)
n_updated <- nrow(db_updated)
cat("Original:", n_original, "| Updated:", n_updated, "\n")

# Count by threat category
table(db_original$threat_category)

# Find critically endangered orchids
orchids <- db_original[db_original$family == "ORCHIDACEAE" &
                         db_original$threat_category == "CR", ]
head(orchids$scientific_name)
```
Performs consolidated matching that searches species names in both the original DS 043-2006-AG list (2006 names) and the updated nomenclature database. This ensures that users with updated names can still identify whether their species are protected under DS 043-2006-AG, even if the nomenclature has changed since 2006.
```r
is_ds043_2006_ag(splist, prioritize = "original", return_details = FALSE)
```

| Argument | Description |
|---|---|
| `splist` | Character vector of species names to check. |
| `prioritize` | Character. Which result to prioritize when both databases match: `"original"` (default) or `"updated"`. |
| `return_details` | Logical. Return detailed matching information. |
The function performs a two-stage search, then consolidates the results:

1. Searches the original DS 043-2006-AG list (names as listed in 2006).
2. Searches the updated nomenclature database (current accepted names).
3. Consolidates the results, indicating which database provided the match.
4. Identifies original names that are now synonyms.
This approach handles cases where:

- The user provides the original 2006 name: it is found in the original database.
- The user provides an updated name: it is found in the updated database and linked to the DS 043-2006-AG list.
- The name matches in both databases: the result from the database selected by `prioritize` is returned.
- The original name is now a synonym: this is indicated with a "(synonym)" marker.
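
The consolidation step can be sketched as follows, assuming each database search has already produced one threat category per input name (`NA` when nothing matched). `consolidate_status()` is hypothetical; the fallback label `"Not threatened"` and the placement of the `"(synonym)"` marker are assumptions about the output format, not the package's implementation.

```r
# Hypothetical consolidation of two per-name searches (NA = no match).
consolidate_status <- function(original, updated,
                               is_synonym = rep(FALSE, length(original)),
                               prioritize = c("original", "updated")) {
  prioritize <- match.arg(prioritize)
  other  <- setdiff(c("original", "updated"), prioritize)
  first  <- if (prioritize == "original") original else updated
  second <- if (prioritize == "original") updated else original

  # Record which database supplied the kept result
  from_db <- ifelse(!is.na(first), prioritize,
                    ifelse(!is.na(second), other, NA_character_))
  status <- ifelse(!is.na(first), first, second)
  status[is.na(status)] <- "Not threatened"

  # Mark hits on the 2006 list whose names are now synonyms (assumed format)
  syn <- !is.na(from_db) & from_db == "original" & is_synonym
  status[syn] <- paste(status[syn], "(synonym)")

  data.frame(status = status, source = from_db, stringsAsFactors = FALSE)
}

res <- consolidate_status(
  original   = c("EN", NA,   "VU", NA),
  updated    = c("EN", "CR", NA,   NA),
  is_synonym = c(FALSE, FALSE, TRUE, FALSE)
)
res
```

Keeping the `source` column alongside the status is what lets a detailed result report which database provided each match.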
If return_details = FALSE: Character vector with consolidated threat status. If return_details = TRUE: Tibble with detailed reconciliation information.
```r
## Not run:
# Species with nomenclatural changes
species <- c(
  "Haageocereus acranthus subsp. olowinskianus", # Original name
  "Brassia ocanensis",                           # Updated name (was Ada)
  "Ida locusta",                                 # Updated name
  "Lycaste locusta",                             # Now a synonym
  "Persea americana"                             # Not threatened
)

# Get consolidated status
status <- is_ds043_2006_ag(species)

# Get detailed information
details <- is_ds043_2006_ag(species, return_details = TRUE)
View(details)
## End(Not run)
```
This function checks whether each name in a list of species names is threatened according to the Peruvian threatened species database. It allows fuzzy matching within a maximum distance threshold to handle potential typos or variations in species names.
```r
is_threatened_peru(splist, source = "original", return_details = FALSE)
```

| Argument | Description |
|---|---|
| `splist` | A character vector containing the species names to be checked for threatened status in Peru. |
| `source` | Character string specifying which database version to use: `"original"` (default) or `"updated"`. |
| `return_details` | Logical. If `TRUE`, returns detailed matching results. If `FALSE` (default), returns only the threat status vector. |
If return_details = FALSE: A character vector indicating the threat status of each species ("Not threatened", "Threatened - CR", "Threatened - EN", "Threatened - VU", "Threatened - NT", or "Threatened - Unknown category").
If return_details = TRUE: A tibble with detailed matching results including matched names, threat categories, and matching process information.
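
The mapping from a matched category to these status strings can be sketched as below. `format_threat_status()` is a hypothetical helper, and treating both `NA` and `"Not threatened"` as non-matches is an assumption; the output strings themselves are the ones listed above.

```r
# Hypothetical mapping from a matched IUCN category to a status string.
format_threat_status <- function(category) {
  known <- c("CR", "EN", "VU", "NT")
  not_threatened <- is.na(category) | category == "Not threatened"
  # as.character() keeps a character(0) result for empty input
  as.character(ifelse(
    not_threatened, "Not threatened",
    ifelse(category %in% known,
           paste("Threatened -", category),
           "Threatened - Unknown category")
  ))
}

format_threat_status(c("CR", NA, "VU", "XX"))
```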
```r
# Example 1: Basic usage ("Fake species" is deliberately not a real name)
species_list <- c("Cattleya maxima", "Polylepis incana", "Fake species")

# Simple status check
threat_status <- tryCatch(
  is_threatened_peru(species_list),
  error = function(e) {
    message("Error in matching: ", e$message)
    rep("Error", length(species_list))
  }
)
print(threat_status)

# Example 2: Detailed results
detailed_results <- tryCatch(
  is_threatened_peru(species_list, return_details = TRUE),
  error = function(e) {
    message("Error in detailed matching: ", e$message)
    NULL
  }
)
if (!is.null(detailed_results)) {
  print(detailed_results)
}

# Example 3: Handling NA values gracefully
species_with_na <- c("Cattleya maxima", NA, "Polylepis incana")
status_with_na <- is_threatened_peru(species_with_na)
print(status_with_na)

# Example 4: Empty input handling
empty_result <- is_threatened_peru(character(0))
print(empty_result) # Should return character(0)

# Example 5: Using updated database
updated_results <- tryCatch(
  is_threatened_peru(species_list, source = "updated"),
  error = function(e) {
    message("Error with updated database: ", e$message)
    rep("Error", length(species_list))
  }
)
print(updated_results)
```
This function matches given species names against the internal database of threatened plant species in Peru. It uses a hierarchical matching strategy that includes direct matching, genus-level matching, fuzzy matching, and suffix matching to maximize successful matches while maintaining accuracy.
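
The documentation does not spell out the suffix-matching rules. One common reading in taxonomic name matching is tolerance for Latin epithet endings that vary with grammatical gender or spelling (for example *-us*/*-a*, *-ii*/*-i*). The hypothetical sketch below implements that reading; `strip_suffix()`, `suffix_match()`, the ending list, and the minimum stem length are all assumptions, and the package's actual rules may differ.

```r
# Hypothetical suffix-tolerant comparison of epithets. Two-letter endings are
# listed before one-letter ones so that "-ii" is tried before "-i".
latin_endings <- c("ii", "us", "um", "is", "a", "e", "i")

strip_suffix <- function(epithet) {
  for (ending in latin_endings) {
    pattern <- paste0(ending, "$")
    # Only strip when at least a 3-letter stem remains
    if (grepl(pattern, epithet) && nchar(epithet) >= nchar(ending) + 3) {
      return(sub(pattern, "", epithet))
    }
  }
  epithet
}

suffix_match <- function(query, candidates) {
  stems <- vapply(candidates, strip_suffix, character(1), USE.NAMES = FALSE)
  candidates[stems == strip_suffix(query)]
}

suffix_match("incanus", c("incana", "racemosa"))
```

Because stripping endings can merge genuinely different epithets, a step like this would sit after exact and fuzzy matching in the hierarchy rather than before them.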
```r
matching_threatenedperu(
  splist,
  source = c("original", "updated"),
  quiet = TRUE
)
```

| Argument | Description |
|---|---|
| `splist` | A character vector containing the species names to be matched. It may include duplicate names; results are expanded to match the input. |
| `source` | Character string specifying which database version to use: `"original"` or `"updated"`. |
| `quiet` | Logical, default `TRUE`. If `FALSE`, prints informative messages. |
**Duplicate Handling:** When the input contains duplicate names, the function automatically:

- Detects duplicates and creates a tracking column (`sorters`)
- Processes only unique names (efficient matching)
- Expands results to restore all original positions
- Preserves the original input order via the `sorter` column
The duplicate handling uses a 'sorters' column that concatenates all original sorter values for duplicate names (e.g., "1 - 3" for a name appearing at positions 1 and 3), enabling accurate result expansion.
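
The duplicate-handling scheme can be reproduced in base R. In the sketch below, `match_with_duplicates()` and the stand-in matcher are hypothetical; only the idea of a `sorters` string such as `"1 - 3"` comes from the text above. Missing (`NA`) names are not handled.

```r
# Hypothetical sketch: match each unique name once, then expand to all inputs.
match_with_duplicates <- function(splist, match_one) {
  # Positions of every occurrence, grouped by name in order of first appearance
  positions <- split(seq_along(splist), factor(splist, levels = unique(splist)))
  uniq <- data.frame(
    name    = names(positions),
    sorters = vapply(positions, paste, character(1), collapse = " - ",
                     USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  # One matcher call per unique name
  uniq$result <- vapply(uniq$name, match_one, character(1), USE.NAMES = FALSE)

  # Expand: turn "1 - 3" back into positions 1 and 3
  idx <- lapply(strsplit(uniq$sorters, " - ", fixed = TRUE), as.integer)
  out <- data.frame(
    sorter = unlist(idx),
    name   = rep(uniq$name, lengths(idx)),
    result = rep(uniq$result, lengths(idx)),
    stringsAsFactors = FALSE
  )
  out <- out[order(out$sorter), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Stand-in matcher that counts how often it is called
counter <- new.env()
counter$n <- 0
fake_matcher <- function(x) {
  counter$n <- counter$n + 1
  toupper(x)
}

res <- match_with_duplicates(
  c("Cattleya maxima", "Polylepis incana", "Cattleya maxima"), fake_matcher)
res
```

The matcher runs twice for three inputs, while the output still has one row per input position in the original order.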
**Matching Strategy:**

1. Direct exact matching
2. Genus-level matching (exact and fuzzy)
3. Species-level matching within genus
4. Infraspecies-level matching (up to 2 levels for original database)
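
A simplified sketch of the first three steps for binomial names, using base R. `match_binomial()`, the maximum edit distance of 1, and the returned `level` labels are assumptions for illustration; ties are resolved by taking the first candidate, as described for `get_ambiguous_matches()`. The three-name database is a toy list, not the package data.

```r
# Hypothetical sketch for binomials: exact name, then genus (exact or fuzzy),
# then species epithet among names of that genus only.
match_binomial <- function(name, db_names, max_dist = 1) {
  # Step 1: direct exact match
  if (name %in% db_names) return(list(match = name, level = "exact"))

  parts    <- strsplit(name, " ", fixed = TRUE)[[1]]
  db_genus <- sub(" .*$", "", db_names)
  genera   <- unique(db_genus)

  # Step 2: genus match; distance 0 is an exact genus match
  g_dist <- as.vector(utils::adist(parts[1], genera))
  if (min(g_dist) > max_dist) return(list(match = NA_character_, level = "no genus"))
  genus <- genera[which.min(g_dist)]   # first candidate wins on ties
  if (length(parts) < 2) return(list(match = genus, level = "genus"))

  # Step 3: species epithet, compared only within the matched genus
  in_genus <- db_names[db_genus == genus]
  epithets <- sub("^[^ ]+ ", "", in_genus)
  s_dist   <- as.vector(utils::adist(parts[2], epithets))
  if (min(s_dist) > max_dist) return(list(match = NA_character_, level = "genus only"))
  list(match = in_genus[which.min(s_dist)], level = "fuzzy")
}

db <- c("Cattleya maxima", "Polylepis incana", "Polylepis racemosa")  # toy list
match_binomial("Polylepis incanna", db)
```

Restricting step 3 to the matched genus keeps a misspelled epithet from drifting to a similar epithet in an unrelated genus.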
**Rank Validation:** The algorithm implements strict rank validation to prevent false positives.
A tibble with detailed matching results including:

- Integer. Original position in input vector
- Character. Original input name (standardized)
- Character. Matched name from database or `"—"`
- Character. IUCN threat category or "Not threatened"
- Integer. Input taxonomic rank (1-4)
- Integer. Matched taxonomic rank
- Logical. Whether ranks match exactly
- Character. Description of match quality
- Logical. Whether a match was found
**See also:**

- `is_threatened_peru` for a simplified interface
- `get_ambiguous_matches` to retrieve ambiguous match details
- `get_threatened_database` to access the raw databases
```r
## Not run:
# Basic usage
species_list <- c("Cattleya maxima", "Polylepis incana")
results <- matching_threatenedperu(species_list, source = "original")

# With duplicates
species_dup <- c("Cattleya maxima", "Polylepis incana", "Cattleya maxima")
results_dup <- matching_threatenedperu(species_dup)
nrow(results_dup) == 3 # TRUE - preserves duplicates

# Access metadata
attr(results, "match_rate")

# Check for ambiguous matches
get_ambiguous_matches(results, type = "infraspecies")
## End(Not run)
```