| Title: | Taxonomic Backbone and Name Validation Tools for Mammals of Peru |
|---|---|
| Description: | Provides a curated taxonomic backbone of mammal species from Peru based on Pacheco et al. (2021) "Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización" <doi:10.15381/rpb.v28i4.21019>. The package includes standardized species data, occurrence by ecoregions, endemism status, and tools for validating and matching scientific names through exact and fuzzy procedures. It is designed as a lightweight and dependable reference for ecological, environmental, biogeographic, and conservation workflows that require reliable species information for Peruvian mammals. |
| Authors: | Paul E. Santos Andrade [aut, cre] (ORCID: <https://orcid.org/0000-0002-6635-0375>), Fiorella N. Gonzales Guillen [ctb] (ORCID: <https://orcid.org/0000-0001-5240-2464>) |
| Maintainer: | Paul E. Santos Andrade <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.1 |
| Built: | 2026-05-25 07:30:53 UTC |
| Source: | https://github.com/PaulESantos/perumammals |
Simplified boolean check for species presence in Peru mammals database. Useful for filtering and logical operations.
    found_in_peru(splist, exact_only = FALSE)

| Argument | Description |
|---|---|
| splist | Character vector of species names |
| exact_only | Logical. If TRUE, only exact matches return TRUE (default: FALSE) |
Logical vector (TRUE = found, FALSE = not found)
    species <- c("Panthera onca", "Tremarctos orrnatus", "Tremarctos orrnatos", "Felis catus")

    # Check presence (includes fuzzy matches)
    found_in_peru(species)

    # Use inside a pipeline; the new column flags presence in the backbone
    tibble::tibble(splist = species) |>
      dplyr::mutate(found = found_in_peru(splist))
Extracts information about ambiguous matches (multiple candidates with tied distances) from matching results. Useful for quality control and manual curation. Adapted for peru_mammals (genus and species only).
    get_ambiguous_matches(
      match_result,
      type = c("genus", "species", "all"),
      save_to_file = FALSE,
      output_dir = tempdir()
    )

| Argument | Description |
|---|---|
| match_result | A tibble returned by matching functions. |
| type | Character. Type of ambiguous matches to retrieve: |
| save_to_file | Logical. If TRUE, saves results to CSV. Default is FALSE (CRAN compliant). |
| output_dir | Character. Directory to save file if save_to_file = TRUE. Defaults to |
During fuzzy matching, multiple candidates may have identical string distances. The matching algorithm automatically selects the first candidate, but this function allows you to review all alternatives for quality control.
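The tie situation described above can be illustrated with a minimal base R sketch. It assumes Levenshtein distance via `utils::adist()` and a toy reference vector; the names in `reference` are hypothetical and the package's internal routine may differ.

```r
# Toy reference list (hypothetical, not the package backbone)
reference <- c("MYOTIS", "MYOTUS", "MARMOSA")
query <- "MYOTAS"

# Edit distance between the query and every candidate
d <- drop(utils::adist(query, reference))
names(d) <- reference

# Candidates sharing the minimum distance are the ambiguous ones
tied <- names(d)[d == min(d)]

# Mimic "select the first candidate" while keeping the alternatives
first_pick <- tied[1]
tied
first_pick
```

Here two candidates sit at the same distance from the query, so an automatic pick of the first one hides an equally plausible alternative; that is the kind of case worth reviewing manually.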
A tibble with ambiguous match details, or NULL if none exist. Includes original names, matched names, distances, and database metadata.
Returns taxonomic classification and common names for species validated against the Peru mammals database.
    get_common_names_peru(splist, return_details = FALSE)

| Argument | Description |
|---|---|
| splist | Character vector of species names |
| return_details | Logical. If TRUE, includes full taxonomic information (default: FALSE) |

If return_details = FALSE: Character vector with common names.
If return_details = TRUE: Tibble with taxonomic and common name information.
    species <- c("Panthera onca", "Tremarctos ornatus", "Puma concolor", "Myotis bakeri")

    # Get common names as a character vector
    get_common_names_peru(species)

    # Add common names as a tibble column
    tibble::tibble(splist = species) |>
      dplyr::mutate(common_name = get_common_names_peru(splist))

    # Get full taxonomic information
    taxonomy <- get_common_names_peru(species, return_details = TRUE)
    taxonomy
Simplified wrapper specifically for checking endemism status of mammals in Peru. Only evaluates species that are confirmed to occur in Peru.
    is_endemic_peru(splist, return_logical = FALSE, filter_exact = FALSE)

| Argument | Description |
|---|---|
| splist | Character vector of species names |
| return_logical | Logical. If TRUE, returns logical vector (TRUE/FALSE/NA). If FALSE, returns descriptive character vector (default: FALSE) |
| filter_exact | Logical. If TRUE, only considers exact matches (default: FALSE) |

If return_logical = FALSE: Character vector with endemism status.
If return_logical = TRUE: Logical vector (TRUE = endemic, FALSE = not endemic, NA = not found or endemism unknown).

    species <- c("Panthera onca", "Atelocynus microtis", "Felis catus", "Myotis bakeri")
    is_endemic_peru(species)

    # Descriptive output
    tibble::tibble(splist = species) |>
      dplyr::mutate(endemic = is_endemic_peru(splist))
Main wrapper function that validates species names against the Peru mammals database with various output options for match quality, endemism status, and detailed information.
    is_peru_mammal(
      splist,
      return_details = FALSE,
      match_type = "status",
      filter_exact = FALSE
    )

| Argument | Description |
|---|---|
| splist | Character vector of species names to check |
| return_details | Logical. If TRUE, returns full validation tibble. If FALSE, returns simplified status vector (default: FALSE) |
| match_type | Character. Type of information to return when return_details = FALSE: |
| filter_exact | Logical. If TRUE, only returns exact matches (genus_dist = 0 AND species_dist = 0). Fuzzy matches are treated as "Not found" (default: FALSE) |
This function wraps validate_peru_mammals() to provide flexible output
formats for different use cases:
Basic presence/absence checking
Match quality assessment (exact vs fuzzy)
Endemism status queries
The function handles taxonomic matching with fuzzy string matching to accommodate minor spelling variations while maintaining data quality.
When filter_exact = TRUE, only matches with zero edit distance in both genus and species names are considered valid matches. All fields related to fuzzy matches are set to NA or "—" to maintain consistency.
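As a rough illustration of the exact-only rule, the sketch below applies it to a toy result table in base R. Only `genus_dist` and `species_dist` come from the text; the other column names and the "Found" label are assumptions made for the example, not the package's actual output.

```r
# Toy validation result; columns other than genus_dist and
# species_dist are hypothetical
res <- data.frame(
  input        = c("Panthera onca", "Pantera onca", "Felis domesticus"),
  matched_name = c("Panthera onca", "Panthera onca", NA),
  genus_dist   = c(0L, 1L, NA),
  species_dist = c(0L, 0L, NA)
)

# A row is exact only when both distances are present and equal to zero
is_exact <- !is.na(res$genus_dist) & !is.na(res$species_dist) &
  res$genus_dist == 0 & res$species_dist == 0

# Blank out every field that came from a fuzzy match
res$matched_name[!is_exact] <- NA
res$genus_dist[!is_exact]   <- NA
res$species_dist[!is_exact] <- NA

# Fuzzy matches end up in the same bucket as unmatched names
status <- ifelse(is_exact, "Found", "Not found")
status
```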
If return_details = FALSE: Character vector with requested information.
If return_details = TRUE: Tibble with complete validation information.

    species <- c(
      "Panthera onca",        # Exact match
      "Pantera onca",         # Fuzzy match (genus misspelled)
      "Tremarctos orrnatus",  # Fuzzy match (species misspelled)
      "Felis domesticus",     # Not in Peru
      "Myotis bakeri"
    )

    # Check if species are found (includes fuzzy matches)
    is_peru_mammal(species)

    # Check with exact matches only
    is_peru_mammal(species, filter_exact = TRUE)

    # Check match quality
    is_peru_mammal(species, match_type = "match_quality")

    # Check endemism
    is_peru_mammal(species, match_type = "endemic")

    # Get detailed information
    is_peru_mammal(species, return_details = TRUE)

    # Get detailed information with exact matches only
    is_peru_mammal(species, return_details = TRUE, filter_exact = TRUE)
Returns the quality of taxonomic name matching (exact vs fuzzy) for species validated against the Peru mammals database.
    match_quality_peru(splist, return_details = FALSE)

| Argument | Description |
|---|---|
| splist | Character vector of species names |
| return_details | Logical. If TRUE, includes distance metrics and matching information (default: FALSE) |
Match quality categories:
"Exact": Perfect match with no spelling differences (genus_dist = 0, species_dist = 0)
"Fuzzy": Match found with minor spelling variations (genus_dist > 0 or species_dist > 0)
"Not found": No match in database
The function uses string distance metrics to quantify matching quality:
genus_dist: Edit distance for genus name
species_dist: Edit distance for species epithet
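The three categories can be expressed as a small classifier over the two distances. This is a sketch under one simplifying assumption: any missing distance is treated as "Not found". The helper name `classify_match()` is hypothetical and is not part of the package.

```r
# Hypothetical helper: map edit distances to the quality labels above
classify_match <- function(genus_dist, species_dist) {
  missing <- is.na(genus_dist) | is.na(species_dist)
  ifelse(
    missing, "Not found",
    ifelse(genus_dist == 0 & species_dist == 0, "Exact", "Fuzzy")
  )
}

quality <- classify_match(
  genus_dist   = c(0, 1, NA, 0),
  species_dist = c(0, 0, NA, 2)
)
quality
```

A match counts as "Fuzzy" as soon as either distance is above zero, which mirrors the "or" in the category definition.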
If return_details = FALSE: Character vector with match quality.
If return_details = TRUE: Tibble with detailed matching information.

    species <- c(
      "Panthera onca",        # Exact
      "Tremarctos orrnatus",  # Fuzzy (spelling error)
      "Felis domesticus",     # Not found
      "Myotis bakeri"
    )

    # Simple quality check
    match_quality_peru(species)

    # Detailed information with edit distances
    details <- match_quality_peru(species, return_details = TRUE)
    details
A backbone of the terrestrial and marine mammal species known for Peru, compiled from Pacheco et al. (2021) "Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización".
    data("peru_mammals")
A tibble with 573 rows and 12 variables:
Character. Internal stable identifier for the species, combining the original numeric id and an abbreviation of the genus. Intended for internal linking between tables.
Character. Taxonomic order (e.g. Didelphimorphia, Rodentia, Chiroptera).
Character. Taxonomic family.
Character. Genus name.
Character. Specific epithet.
Character. Binomial scientific name (Genus species), without authorship. This is the main field used for name validation.
Character. Full scientific name including authorship and year, as provided in the original annex.
Character. Authorship and year of the species name.
Character. Common name in Spanish, when available.
Logical. TRUE if the species is considered endemic
to Peru in Pacheco et al. (2021), FALSE otherwise.
Character. Comma-separated codes of Peruvian
ecoregions where the species occurs, using the abbreviations
defined by Pacheco et al. (2021) (e.g. "YUN, SB, SP"). See
peru_mammals_ecoregions_meta for code definitions.
Character. Bibliographic notes or specific references supporting the presence or taxonomy of the species.
Each row corresponds to a single species as listed in the original annex of the paper. This dataset is the main taxonomic backbone used by the perumammals package.
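Pacheco Torres, V. R., Diaz, S., Graham Angeles, L. A., Flores-Quispe, M., Calizaya-Mamani, G., Ruelas, D., & Sánchez-Vendizú, P. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. Revista Peruana De Biología, 28(4), e21019. doi:10.15381/rpb.v28i4.21019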
A one-row tibble with metadata about the taxonomic backbone used in perumammals, including its bibliographic source, year, number of species and the date when the internal data objects were created.
    data("peru_mammals_backbone")
A tibble with 1 row and 4 variables:
Character. Short bibliographic reference to the backbone source (Pacheco et al. 2021).
Integer. Publication year of the backbone source (2021).
Integer. Number of species included in the backbone
(as rows in peru_mammals).
Date. Date when the backbone data objects were generated (in the package build process).
This object is intended for internal bookkeeping and for functions that report the origin and version of the backbone.
A long-format table linking each mammal species to the Peruvian ecoregions where it occurs, based on Pacheco et al. (2021).
    data("peru_mammals_ecoregions")
A tibble with one row per species–ecoregion combination and 3 variables:
Character. Internal species identifier, matching
peru_mammals.
Character. Binomial scientific name (Genus species).
Character. Abbreviation of the ecoregion where
the species occurs (e.g. "YUN", "SB", "COS").
See peru_mammals_ecoregions_meta for code definitions.
Each row corresponds to a single combination of species and ecoregion.
This dataset is derived from the ecoregions field of
peru_mammals.
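How a comma-separated ecoregions field can be expanded into one row per species and ecoregion is sketched below in base R. The species names and the column names `scientific_name`, `ecoregions` and `ecoregion_code` are placeholders for illustration; the package's own build script is not shown in this manual.

```r
# Toy wide-format table with a comma-separated ecoregions field
species <- data.frame(
  scientific_name = c("Species a", "Species b"),
  ecoregions      = c("YUN, SB, SP", "COS")
)

# Split each field on commas, tolerating optional spaces
codes <- strsplit(species$ecoregions, ",\\s*")

# Repeat each species once per code to obtain the long format
long <- data.frame(
  scientific_name = rep(species$scientific_name, lengths(codes)),
  ecoregion_code  = unlist(codes)
)
long
```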
Pacheco et al. (2021).
peru_mammals,
peru_mammals_ecoregions_meta
Definitions of the ecoregion codes used in peru_mammals
and peru_mammals_ecoregions. The codes follow the
abbreviations used by Pacheco et al. (2021), based on Peruvian
ecoregion schemes.
    data("peru_mammals_ecoregions_meta")
A tibble with one row per ecoregion code and 2 variables:
Character. Ecoregion abbreviation. The codes used in the dataset are:
"OCE" – Oceánica
"BPP" – Bosque Pluvial del Pacífico
"BSE" – Bosque Seco Ecuatorial
"COS" – Costa
"VOC" – Vertiente Occidental
"PAR" – Páramo
"PUN" – Puna
"YUN" – Yungas
"SB" – Selva Baja
"SP" – Sabana de Palmera
Character. Human-readable label/description of the ecoregion in Spanish.
Pacheco et al. (2021).
peru_mammals,
peru_mammals_ecoregions
Displays summary information about the taxonomic backbone used in perumammals. The backbone is based on the taxonomic checklist published by Pacheco et al. (2021), which was digitised from the original PDF publication into a structured tibble format.
    pm_backbone_info()

Invisibly returns a one-row tibble containing the backbone metadata, with the same structure as peru_mammals_backbone.
Called primarily for its side effect of printing the summary information.
Pacheco Torres, V. R., Diaz, S., Graham Angeles, L. A., Flores-Quispe, M., Calizaya-Mamani, G., Ruelas, D., & Sánchez-Vendizú, P. (2021). Lista actualizada de la diversidad de los mamíferos del Perú y una propuesta para su actualización. Revista Peruana De Biología, 28(4), e21019. doi:10.15381/rpb.v28i4.21019
peru_mammals_backbone for the complete backbone data.
    # Display backbone information
    pm_backbone_info()

    # Access the data invisibly returned
    backbone_data <- pm_backbone_info()
    backbone_data$n_species
Convenience wrapper to list species occurring in one or more Peruvian
ecoregions. This function uses pm_species() internally and
therefore supports the same taxonomic and endemism filters.
    pm_by_ecoregion(
      ecoregion,
      order = NULL,
      family = NULL,
      genus = NULL,
      endemic = NULL
    )

| Argument | Description |
|---|---|
| ecoregion | Character vector with one or more ecoregion codes (e.g. |
| order | Optional character vector with one or more taxonomic orders to keep. If |
| family | Optional character vector with one or more families to keep. If |
| genus | Optional character vector with one or more genera to keep. If |
| endemic | Optional logical. If |
A tibble with a subset of rows from peru_mammals
corresponding to species present in at least one of the requested
ecoregions. Returns an empty tibble if no species match the criteria.
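The "present in at least one of the requested ecoregions" rule can be sketched in base R as follows. The table and its column names are toy assumptions; the point is that codes are compared as whole tokens after splitting, rather than by substring search.

```r
# Toy species table with comma-separated ecoregion codes
species <- data.frame(
  scientific_name = c("Species a", "Species b", "Species c"),
  ecoregions      = c("YUN, SB", "COS", "SB, SP")
)

wanted <- c("YUN", "SP")

# Split the field into individual codes, then test for any overlap
codes <- strsplit(species$ecoregions, ",\\s*")
keep  <- vapply(codes, function(x) any(x %in% wanted), logical(1))

subset_tbl <- species[keep, ]
subset_tbl
```

A species is kept when it shares one or more codes with `wanted`, so requesting several ecoregions widens the result rather than narrowing it.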
pm_list_ecoregions() to see available ecoregion codes,
pm_species() for the underlying function.
    # All species in Yungas
    pm_by_ecoregion("YUN")

    # Endemic species in Selva Baja (SB)
    pm_by_ecoregion("SB", endemic = TRUE)

    # Rodents in Costa and Vertiente Occidental
    pm_by_ecoregion(c("COS", "VOC"), order = "Rodentia")

    # Bats in multiple ecoregions
    pm_by_ecoregion(c("YUN", "SB"), order = "Chiroptera")
    pm_by_ecoregion(c("YUN", "SB"), order = "Chiroptera", endemic = TRUE)
Computes a summary of species richness and endemism for each ecoregion in the Peruvian mammal backbone.
    pm_ecoregion_summary(sort_by = c("code", "species", "endemic", "label"))

| Argument | Description |
|---|---|
| sort_by | Character string indicating how to sort the results. Options are: |
The summary is based on the long-format table
peru_mammals_ecoregions and joins metadata from
peru_mammals_ecoregions_meta and endemism information
from peru_mammals.
A tibble with one row per ecoregion and the following columns:
ecoregion_code – ecoregion abbreviation.
ecoregion_label – ecoregion description in Spanish.
n_species – total number of species recorded in the ecoregion.
n_endemic – number of endemic species recorded in the ecoregion.
pct_endemic – percentage of endemic species in the ecoregion.
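The join-then-summarise logic behind these columns can be approximated in base R with `merge()` and `aggregate()`. This is a sketch on toy tables: the species names are placeholders, and rounding the percentage to one decimal is an assumption.

```r
# Toy long-format table, species table and ecoregion metadata
long <- data.frame(
  scientific_name = c("Sp a", "Sp a", "Sp b", "Sp c"),
  ecoregion_code  = c("YUN", "SB", "YUN", "YUN")
)
sp <- data.frame(
  scientific_name = c("Sp a", "Sp b", "Sp c"),
  endemic         = c(TRUE, FALSE, TRUE)
)
meta <- data.frame(
  ecoregion_code  = c("SB", "YUN"),
  ecoregion_label = c("Selva Baja", "Yungas")
)

# Attach endemism to every species-ecoregion row
joined <- merge(long, sp, by = "scientific_name")

# Count species and endemic species per ecoregion
n_species <- aggregate(endemic ~ ecoregion_code, data = joined, FUN = length)
names(n_species)[2] <- "n_species"
n_endemic <- aggregate(endemic ~ ecoregion_code, data = joined, FUN = sum)
names(n_endemic)[2] <- "n_endemic"

# Combine with the labels and derive the percentage
eco <- merge(merge(meta, n_species), n_endemic)
eco$pct_endemic <- round(100 * eco$n_endemic / eco$n_species, 1)
eco
```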
pm_list_ecoregions() for ecoregion metadata,
pm_by_ecoregion() to list species by ecoregion.
    # Get summary for all ecoregions (sorted by code)
    pm_ecoregion_summary()

    # Sort by species richness
    pm_ecoregion_summary(sort_by = "species")

    # Sort by number of endemic species
    pm_ecoregion_summary(sort_by = "endemic")

    # Find ecoregion with highest species richness
    eco_summary <- pm_ecoregion_summary(sort_by = "species")
    eco_summary[1, ]

    # Ecoregions with more than 100 species
    eco_summary <- pm_ecoregion_summary()
    subset(eco_summary, n_species > 100)

    # Compare richness between lowland and highland ecoregions
    eco_summary <- pm_ecoregion_summary(sort_by = "species")
    lowland <- eco_summary[eco_summary$ecoregion_code %in% c("SB", "SP"), ]
    highland <- eco_summary[eco_summary$ecoregion_code %in% c("PUN", "PAR"), ]
Returns endemic species from the Peruvian mammal backbone, with optional filters by order, family, genus and/or ecoregion.

    pm_endemics(order = NULL, family = NULL, genus = NULL, ecoregion = NULL)

| Argument | Description |
|---|---|
| order | Optional character vector with one or more taxonomic orders to keep. If |
| family | Optional character vector with one or more families to keep. If |
| genus | Optional character vector with one or more genera to keep. If |
| ecoregion | Optional character vector with one or more ecoregion codes (e.g. |
This is a convenience wrapper around pm_species() with
endemic = TRUE.
A tibble with endemic species (subset of peru_mammals).
    # All endemic species
    pm_endemics()

    # Endemic rodents
    pm_endemics(order = "Rodentia")

    # Endemic species in Yungas (YUN)
    pm_endemics(ecoregion = "YUN")
Displays summary information about the ecoregions used in the Peruvian mammal backbone. Ecoregions follow the Brack-Egg (1986) classification system used in Peruvian biogeography to describe the distribution of mammal species across different ecological regions.
    pm_list_ecoregions(include_endemic = FALSE)

| Argument | Description |
|---|---|
| include_endemic | Logical. If |
The ecoregion classification follows Brack-Egg (1986), a widely-used biogeographic framework for Peru that recognizes 10 distinct ecological regions based on climate, vegetation, and elevation. This classification is used in Pacheco et al. (2021) to document the distribution patterns of Peruvian mammals.
The function prints a formatted summary to the console and invisibly returns the complete data for further analysis.
A tibble with one row per ecoregion, arranged in descending order by species richness, with the following columns:
Abbreviated ecoregion code (e.g., "SB", "YUN")
Full ecoregion name in Spanish
Total number of mammal species recorded in the ecoregion
Percentage of Peru's total mammal diversity (0-100)
(Only if include_endemic = TRUE) Number of
endemic species in the ecoregion
(Only if include_endemic = TRUE) Percentage of
endemic species relative to total species in the ecoregion (0-100)
Brack-Egg, A. (1986). Ecología de un país complejo. In J. Mejía Baca (Ed.), Gran Geografía del Perú: Naturaleza y Hombre (Vol. 2, pp. 175-319). Barcelona: Manfer-Mejía Baca.
peru_mammals_ecoregions_meta for the complete ecoregion metadata,
peru_mammals_ecoregions for species-ecoregion associations,
pm_by_ecoregion() to filter species by ecoregion,
pm_ecoregion_summary() for species richness summaries by ecoregion.
    # Display ecoregion information
    pm_list_ecoregions()

    # Include endemic species information
    pm_list_ecoregions(include_endemic = TRUE)

    # Access the data for further analysis
    ecoregion_data <- pm_list_ecoregions()

    # Ecoregions with highest species richness
    ecoregion_data
Summarises the diversity of endemic mammal species in Peru, grouped by taxonomic order. Provides counts of families, genera, and species that are endemic to Peru within each order. Optionally includes endemism rates relative to total species richness.
    pm_list_endemic(include_rate = FALSE)

| Argument | Description |
|---|---|
| include_rate | Logical. If |
This function focuses exclusively on species that are endemic to Peru (i.e., species found nowhere else in the world). Orders without any endemic species are not included in the output.
When include_rate = FALSE (default), results are sorted by the
number of endemic species in descending order, highlighting which orders
have the highest endemic diversity.
When include_rate = TRUE, results are sorted by total species
richness in descending order, and include endemism rates to show what
proportion of each order's diversity is endemic to Peru. A summary row
labeled "Total" is appended to show overall statistics.
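A base R sketch of the rate calculation with an appended "Total" row follows. It rests on several assumptions: the toy data, the column names `endemism_rate` and `endemism_pct`, and a total computed only over the orders that are shown. The package's real output may be defined differently.

```r
# Toy species table (hypothetical counts)
sp <- data.frame(
  order   = c("Rodentia", "Rodentia", "Rodentia",
              "Chiroptera", "Chiroptera", "Carnivora"),
  endemic = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Species and endemic species per order
n_species <- tapply(sp$endemic, sp$order, length)
n_endemic <- tapply(sp$endemic, sp$order, sum)
rates <- data.frame(
  order     = names(n_species),
  n_endemic = as.integer(n_endemic),
  n_species = as.integer(n_species)
)

# Drop orders without endemics, then sort by total richness (descending)
rates <- rates[rates$n_endemic > 0, ]
rates <- rates[order(-rates$n_species), ]

# Append a summary row (assumption: totals over the displayed orders)
total <- data.frame(
  order     = "Total",
  n_endemic = sum(rates$n_endemic),
  n_species = sum(rates$n_species)
)
rates <- rbind(rates, total)

rates$endemism_rate <- rates$n_endemic / rates$n_species
rates$endemism_pct  <- round(100 * rates$endemism_rate, 1)
rates
```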
A tibble with one row per order containing endemic species, sorted by number of endemic species by default (or by total species richness when include_rate = TRUE), with the following columns:
Taxonomic order
Number of families with endemic species in the order
Number of genera with endemic species in the order
Number of endemic species in the order
(Only if include_rate = TRUE) Total number of
species in the order
(Only if include_rate = TRUE) Proportion of
endemic species (0-1)
(Only if include_rate = TRUE) Percentage of
endemic species (0-100)
    # Summary of endemic species by order
    pm_list_endemic()

    # Include endemism rates
    pm_list_endemic(include_rate = TRUE)
Summarises the number of genera, species and endemic species per family. Optionally filters the output to one or more taxonomic orders.
    pm_list_families(order = NULL)

| Argument | Description |
|---|---|
| order | Optional character vector specifying one or more taxonomic orders to include. If |
A tibble with one row per family, arranged by order and family name, with the following columns:
Taxonomic order
Family name
Number of genera in the family
Number of species in the family
Number of species endemic to Peru in the family

    # All families
    pm_list_families()

    # Only families within Rodentia
    pm_list_families(order = "Rodentia")

    # Multiple orders
    pm_list_families(order = c("Rodentia", "Chiroptera"))
Summarises the number of species and endemic species per genus. Optionally restricts the output to one or more orders and/or families. Genera with missing values are excluded from the results.
    pm_list_genera(order = NULL, family = NULL)

| Argument | Description |
|---|---|
| order | Optional character vector with one or more taxonomic orders to keep. If |
| family | Optional character vector with one or more families to keep. If |
The function validates input parameters and warns if invalid order or family names are provided. It also warns if the filters result in an empty dataset.
A tibble with one row per genus and the following columns:
order – taxonomic order.
family – family name.
genus – genus name.
n_species – number of species in the genus.
n_endemic – number of endemic species in the genus.
Returns an empty tibble with the same structure if no records match the specified filters.
    # All genera
    pm_list_genera()

    # Genera within Chiroptera (bats)
    pm_list_genera(order = "Chiroptera")

    # Multiple orders
    pm_list_genera(order = c("Didelphimorphia", "Chiroptera"))

    # Genera within a specific family
    bat_genera <- pm_list_genera(family = "Phyllostomidae")

    # Count total endemic species in a family
    sum(bat_genera$n_endemic)

    # Combination of filters
    pm_list_genera(order = "Chiroptera", family = "Phyllostomidae")
Summarises the number of families, genera, species and endemic species
per order in peru_mammals.
    pm_list_orders()
A tibble with one row per order and the following columns:
order – taxonomic order.
n_families – number of families in the order.
n_genera – number of genera in the order.
n_species – number of species in the order.
n_endemic – number of endemic species in the order.
    pm_list_orders()
Convenience wrapper around peru_mammals to subset species by
taxonomic group, endemism and/or ecoregion.
    pm_species(
      order = NULL,
      family = NULL,
      genus = NULL,
      endemic = NULL,
      ecoregion = NULL
    )

| Argument | Description |
|---|---|
| order | Optional character vector with one or more taxonomic orders to keep. If |
| family | Optional character vector with one or more families to keep. If |
| genus | Optional character vector with one or more genera to keep. If |
| endemic | Optional logical. If |
| ecoregion | Optional character vector with one or more ecoregion codes (e.g. |
A tibble with a subset of rows from peru_mammals.
    # All species
    pm_species()

    # Only Rodentia
    pm_species(order = "Rodentia")

    # Endemic bats (Chiroptera)
    pm_species(order = "Chiroptera", endemic = TRUE)

    # Species present in Yungas (YUN) and Selva Baja (SB)
    pm_species(ecoregion = c("YUN", "SB"))
Matches given species names against the official list of mammal species of Peru (Pacheco et al. 2021). Uses a hierarchical matching strategy that includes direct matching, genus-level matching, and fuzzy matching to maximize successful matches while maintaining accuracy.
Peru Mammals Database:
573 mammal species (one per row of peru_mammals)
Binomial nomenclature only (no infraspecific taxa)
Includes 6 undescribed species ("sp." cases)
Fields: genus, species, scientific_name, common_name, family, order, endemic
    validate_peru_mammals(splist, quiet = TRUE)

| Argument | Description |
|---|---|
| splist | A character vector containing the species names to be matched. Names can be in any format (uppercase, lowercase, with underscores, etc.). Duplicate names are preserved in the output. |
| quiet | Logical, default TRUE. If FALSE, prints informative messages about the matching progress. |
Matching Strategy: The function implements a hierarchical matching pipeline:
Node 1 - Direct Match: Exact matching of binomial names (genus + species)
Node 2 - Genus Match: Exact matching at genus level
Node 3 - Fuzzy Genus: Fuzzy matching for genus with typos (max distance = 1)
Node 4 - Fuzzy Species: Fuzzy matching for species within matched genus
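The four nodes can be mimicked with a compact base R sketch built on `utils::adist()`. This is not the package's implementation: the toy `backbone`, the helper `match_one()` and the species distance limit of 2 are assumptions introduced for illustration. Only the genus limit of 1 comes from the text.

```r
# Toy backbone (a tiny excerpt, uppercase for case-insensitive matching)
backbone <- data.frame(
  genus   = c("PANTHERA", "PUMA", "TREMARCTOS"),
  species = c("ONCA", "CONCOLOR", "ORNATUS")
)

# Hypothetical helper: returns the matched binomial or NA
match_one <- function(name, max_genus_dist = 1, max_species_dist = 2) {
  parts <- strsplit(toupper(trimws(name)), "\\s+")[[1]]
  if (length(parts) != 2) return(NA_character_)

  # Node 1: direct match of the full binomial
  direct <- backbone$genus == parts[1] & backbone$species == parts[2]
  if (any(direct)) return(paste(parts, collapse = " "))

  # Nodes 2 and 3: exact genus (distance 0) or fuzzy genus (distance 1)
  g_dist <- drop(utils::adist(parts[1], backbone$genus))
  if (min(g_dist) > max_genus_dist) return(NA_character_)
  genus_hit <- backbone$genus[which.min(g_dist)]  # first candidate on ties

  # Node 4: fuzzy species restricted to the matched genus
  cand   <- backbone$species[backbone$genus == genus_hit]
  s_dist <- drop(utils::adist(parts[2], cand))
  if (min(s_dist) > max_species_dist) return(NA_character_)
  paste(genus_hit, cand[which.min(s_dist)])
}

inputs  <- c("Panthera onca", "Pantera onca", "Tremarctos orrnatus", "Felis catus")
matched <- vapply(inputs, match_one, character(1), USE.NAMES = FALSE)
matched
```

Restricting the species search to the matched genus keeps the candidate set small, which is what makes a permissive species threshold tolerable.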
Special Cases:
Handles "sp." cases: "Akodon sp. Ancash", "Oligoryzomys sp. B", etc.
Case-insensitive matching
Removes common qualifiers (CF., AFF.)
Standardizes spacing and formatting
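A possible pre-processing step covering these cases is sketched below. The helper `standardize_name()` and its regular expressions are assumptions for illustration; the package may normalise names differently.

```r
# Hypothetical helper: normalise raw names before matching
standardize_name <- function(x) {
  x <- toupper(x)                                  # case-insensitive matching
  x <- gsub("_", " ", x, fixed = TRUE)             # underscores to spaces
  x <- gsub("\\b(CF|AFF)\\.?(\\s+|$)", "", x, perl = TRUE)  # drop qualifiers
  x <- gsub("\\s+", " ", x)                        # collapse repeated spaces
  trimws(x)
}

clean <- standardize_name(c(
  "panthera_onca",
  "  Puma  cf. concolor ",
  "Akodon aff. mollis"
))
clean
```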
Rank System:
Rank 1: Genus level only (e.g., "Panthera")
Rank 2: Binomial (genus + species, e.g., "Panthera onca")
Ambiguous Matches:
When multiple candidates have identical fuzzy match scores, a warning is
issued and the first match is selected. Use get_ambiguous_matches()
to examine these cases.
Input Requirements:
Species names must be provided as binomials (Genus species) WITHOUT:
Author information: "Panthera onca Linnaeus"
Infraspecific taxa: "Panthera onca onca"
Parenthetical authors: "Panthera onca (Linnaeus, 1758)"
Valid formats:
Standard binomial: "Panthera onca"
Undescribed species: "Akodon sp. Ancash"
Case-insensitive: "PANTHERA ONCA" or "panthera onca"
Names with three or more elements are rejected automatically with a warning, except for the undescribed-species ("sp.") forms listed above.
A tibble with the following columns:
Integer. Original position in input vector
Character. Original input name (standardized)
Character. Matched name from database or "—"
Character. Quality of match ("Exact rank", "No match", etc.)
Logical. Whether a match was found
Integer. Input taxonomic rank (1 or 2)
Integer. Matched taxonomic rank (1 or 2)
Logical. Whether ranks match exactly
Logical. Whether match is valid at correct rank
Character. Input genus (uppercase)
Character. Input species (uppercase)
Character. Taxonomic authority if provided
Character. Matched genus (uppercase)
Character. Matched species (uppercase)
Integer. Edit distance for genus (0=exact, >0=fuzzy, NA=no match)
Integer. Edit distance for species (0=exact, >0=fuzzy, NA=no match or genus-only)
Character. Scientific name from peru_mammals
Character. Common name in Spanish
Character. Family
Character. Order
Logical. Endemic to Peru?
Attributes:
The output includes metadata accessible via attr():
target_database: "peru_mammals"
matching_date: Date of matching
n_input: Number of input names
n_matched: Number of successful matches
match_rate: Percentage of successful matches
n_fuzzy_genus: Number of fuzzy genus matches
n_fuzzy_species: Number of fuzzy species matches
ambiguous_genera: Ambiguous genus matches (if any)
ambiguous_species: Ambiguous species matches (if any)
get_ambiguous_matches to retrieve ambiguous match details
    # Basic usage
    species_list <- c("Panthera onca", "Tremarctos ornatus", "Puma concolor")
    results <- validate_peru_mammals(species_list)

    # Check results
    table(results$matched)
    table(results$Match.Level)

    # View matched species
    results |>
      dplyr::filter(matched) |>
      dplyr::select(Orig.Name, Matched.Name, common_name, endemic)

    # With typos (fuzzy matching)
    typos <- c("Pumma concolor", "Tremarctos ornatu") # Spelling errors
    results_fuzzy <- validate_peru_mammals(typos, quiet = FALSE)

    # Check for ambiguous matches
    get_ambiguous_matches(results_fuzzy, type = "genus")

    # Access metadata
    attr(results, "match_rate")
    attr(results, "n_fuzzy_genus")

    # With special "sp." cases
    sp_cases <- c("Akodon sp. Ancash", "Oligoryzomys sp. B")
    results_sp <- validate_peru_mammals(sp_cases) # Should match exactly