| Title: | A Glimpse at the Diversity of Peru's Endemic Plants |
|---|---|
| Description: | Introducing a novel and updated database showcasing Peru's endemic plants. This meticulously compiled and revised botanical collection encompasses a remarkable assemblage of over 7,898 distinct species. The data for this resource was sourced from the work of Govaerts, R., Nic Lughadha, E., Black, N. et al., titled 'The World Checklist of Vascular Plants: A continuously updated resource for exploring global plant diversity', published in Sci Data 8, 215 (2021) <doi:10.1038/s41597-021-00997-6>. |
| Authors: | Paul E. Santos Andrade [aut, cre] (ORCID: <https://orcid.org/0000-0002-6635-0375>), Lucely L. Vilca Bustamante [aut] (ORCID: <https://orcid.org/0000-0002-5559-1296>) |
| Maintainer: | Paul E. Santos Andrade <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-05-16 05:18:27 UTC |
| Source: | https://github.com/PaulESantos/ppendemic |
This function checks if a list of species names are endemic in the ppendemic database. The function allows fuzzy matching for species names with a maximum distance threshold to handle potential typos or variations in species names.
is_ppendemic( splist, max_dist = 2, save_ambiguous = FALSE, ambiguous_path = "ambiguous_genera.csv" )is_ppendemic( splist, max_dist = 2, save_ambiguous = FALSE, ambiguous_path = "ambiguous_genera.csv" )
splist |
A character vector containing the list of species names to be checked for endemic in the ppendemic database. |
max_dist |
Maximum edit distance used in fuzzy matching steps. Defaults to 2. |
save_ambiguous |
Logical flag. If |
ambiguous_path |
File path used when |
A character vector indicating if each species is endemic or not endemic.
is_ppendemic(c("Aa aurantiaca", "Aa aurantiaaia", "Werneria nubigena"))is_ppendemic(c("Aa aurantiaca", "Aa aurantiaaia", "Werneria nubigena"))
The ppendemic_tab14 dataset is a tibble (data frame) that provides easy access to a comprehensive database of Peru's endemic plant species. It contains a total of 7,898 records with essential botanical information, including the accepted name, accepted family, genus, species, infraspecific information, taxon authors, primary author, place of publication, volume and page, publication years, and version details.
ppendemic_tab14ppendemic_tab14
A tibble (data frame) with 7,898 rows and 18 columns:
Character vector. The accepted name of the endemic plant species.
Character vector. The taxonomic status of the species (e.g., "Accepted").
Character vector. The family of the accepted name of the endemic plant species.
Character vector. The genus of the endemic plant species.
Character vector. The specific epithet of the endemic plant species.
Character vector. The infraspecific rank (e.g., "subsp.", "var.") when applicable.
Character vector. The infraspecific epithet when applicable.
Character vector. The author(s) of the accepted name of the endemic plant species.
Character vector. The primary author(s) of the publication containing the endemic plant species information.
Character vector. The place of publication of the endemic plant species information.
Character vector. The volume and page number of the publication containing the endemic plant species information.
Character vector. The first published year of the publication containing the endemic plant species information.
Numeric vector. The actual year of publication extracted from first_published.
Numeric vector. The nominal year of publication extracted from first_published.
Character vector. Both actual and nominal years when different, extracted from first_published.
Logical vector. Indicates whether the actual and nominal publication years differ (TRUE when both_years contains the pattern "YYYY|YYYY").
Character vector. The version identifier "V-14" of the ppendemic database.
Character vector. The version date "28-05-2025" indicating when this version was created.
The dataset provides a curated and up-to-date collection of Peru's endemic plant species, gathered from reputable botanical sources and publications. The data for this database was extracted and compiled from the World Checklist of Vascular Plants (WCVP) database, which is a comprehensive and reliable repository of botanical information.
This version (ppendemic_tab14) includes enhanced temporal information with separate numeric fields for actual and nominal publication years. This allows for more precise bibliographic tracking and citation accuracy. The dataset also includes improved infraspecific taxonomy handling with dedicated fields for ranks and epithets.
The year extraction process uses sophisticated pattern matching to distinguish between actual publication years and nominal years, with the has_different_years field automatically flagging records where these differ. This is particularly important for historical botanical publications where publication delays were common.
The dataset has been carefully compiled and updated to offer the latest insights into Peru's endemic plant species. The data is sourced from the World Checklist of Vascular Plants (WCVP) database, an international collaborative programme initiated in 1988 by Rafaƫl Govaerts that provides high-quality expert-reviewed taxonomic data on all vascular plants.
For detailed methodology, see Govaerts et al. (2021) "The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity" in Nature Scientific Data.
# Load the package library(ppendemic) # Access the dataset data("ppendemic_tab14") # View the structure of the dataset str(ppendemic_tab14) # View first few rows head(ppendemic_tab14) # Check for species with different actual and nominal years different_years <- subset(ppendemic_tab14, has_different_years == TRUE) nrow(different_years) # View records with both years information head(ppendemic_tab14$both_years[ppendemic_tab14$has_different_years])# Load the package library(ppendemic) # Access the dataset data("ppendemic_tab14") # View the structure of the dataset str(ppendemic_tab14) # View first few rows head(ppendemic_tab14) # Check for species with different actual and nominal years different_years <- subset(ppendemic_tab14, has_different_years == TRUE) nrow(different_years) # View records with both years information head(ppendemic_tab14$both_years[ppendemic_tab14$has_different_years])
The ppendemic_tab15 dataset is a tibble (data frame) providing a curated
and taxonomically validated list of vascular plant species that occur exclusively
in Peru. The dataset is derived from version 15 of the World Checklist of Vascular
Plants (WCVP), facilitated by the Royal Botanic Gardens, Kew, and corresponds to
the extraction performed on 06 January 2026.
ppendemic_tab15ppendemic_tab15
A tibble (data frame) with 7,892 rows and 18 columns:
Character vector. The full scientific name of the accepted endemic species (binomial or trinomial), constructed from genus, species and, where applicable, infraspecific epithets, following WCVP standards.
Character vector. The taxonomic status of the name according
to WCVP (e.g., "Accepted").
Character vector. The botanical family to which the endemic taxon belongs, following WCVP family circumscription.
Character vector. The genus name of the endemic taxon.
Character vector. The specific epithet of the endemic taxon.
Character vector. The infraspecific rank (e.g.,
"subsp.", "var.") when applicable.
Character vector. The infraspecific epithet when applicable.
Character vector. The standardized authorship of the taxon name, concatenating parenthetical and primary authors following botanical nomenclature conventions.
Character vector. The author(s) who validly published the scientific name.
Character vector. The journal, book or other publication in which the taxon name was effectively published.
Character vector. The volume and page reference of the
original publication of the taxon name (e.g., "13(2): 99").
Character vector. The year of publication of the name, enclosed in parentheses, as reported by WCVP.
Numeric vector. The actual year of publication extracted
from first_published.
Numeric vector. The nominal year of publication extracted
from first_published.
Character vector. A concatenation of actual and nominal years
when these differ (format "YYYY|YYYY").
Logical vector. Indicates whether the actual and nominal publication years differ.
Character vector. The database version identifier "V-15",
corresponding to WCVP version 15.
Character vector. The extraction date of the WCVP source
dataset ("06-01-2026").
This database contains only accepted endemic species names, following the taxonomic backbone of WCVP and filtered by geographic distribution records indicating occurrence restricted to Peru.
The dataset was constructed from the World Checklist of Vascular Plants (WCVP) version 15, a continuously updated global taxonomic resource curated by the Royal Botanic Gardens, Kew. Only taxa of rank species and below with accepted status were considered.
Geographic distribution fields from WCVP were used to identify taxa whose native range is restricted exclusively to Peru. Introduced, cultivated, misapplied, synonymic and unplaced names were excluded.
The nomenclature, authorship, family assignment and publication details strictly follow the International Code of Nomenclature for algae, fungi and plants (ICN) and WCVP editorial standards.
Temporal metadata was processed to extract both nominal and actual publication years, allowing detailed historical bibliographic analysis of Peruvian endemic flora.
Data derived from:
Govaerts, R. (ed.). 2026. WCVP: World Checklist of Vascular Plants. Facilitated by the Royal Botanic Gardens, Kew. Version 15. Extracted 06 January 2026. https://doi.org/10.34885/rvc3-4d77
For methodological details:
Govaerts, R., Nic Lughadha, E., et al. (2021). The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity. Scientific Data, 8, 215. https://doi.org/10.1038/s41597-021-00997-6
Royal Botanic Gardens, Kew (2026). World Checklist of Vascular Plants (WCVP), Version 15. https://wcvp.science.kew.org