Rpadrino: An R package to access and use PADRINO, an open access database of Integral Projection Models

Methods in Ecology and Evolution (2022)

Abstract
Demography provides an excellent approach to examine the ecology (Crone et al., 2011), evolutionary biology (Metcalf & Pavard, 2007), and conservation biology of any species (Doak & Morris, 2002). Environmental conditions and biotic interactions influence vital rates (e.g. survival, development, and reproduction) across the entire life cycle, which then govern a population's short-term and long-term performance (Caswell, 2001). A variety of methods exist for combining vital rates into demographic models; discrete-time, structured population models are among the most popular (Caswell, 2001; Crone et al., 2011). Indeed, there is a rich history of using such structured population models across a variety of sub-disciplines in ecology (e.g. Adler et al., 2010; Caswell, 2001; Easterling et al., 2000; Ellner et al., 2016).

In ecology, matrix projection models (MPMs) are the most widely used structured population model. MPMs divide the population into discrete classes corresponding to some trait value (e.g. developmental state, age, or size), and then model the population using vital rates computed for each class. Researchers have also recognized that, for some species, vital rates are best predicted as a function of one or more continuous traits (e.g. size, height, mass), rather than as a function of discrete classes (Easterling et al., 2000).

Integral projection models (IPMs), which are continuously structured population models, have become an increasingly important tool for ecologists interested in addressing broad biological questions through a demographic lens (Gonzalez et al., 2021). IPMs combine vital rate functions of continuous traits into projection kernels, which describe how the abundance and distribution of trait values in a population change in discrete time (Easterling et al., 2000). IPMs have been used to investigate a variety of topics, such as invasive species spread (e.g. Erickson et al., 2017; Jongejans et al., 2011), evolutionarily stable strategies (e.g. Childs et al., 2004), the effect of climate drivers on population persistence (Compagnoni, Pardini, & Knight, 2021; Salguero-Gómez et al., 2012), and linking evolutionary feedbacks to population dynamics (Coulson et al., 2011).

In order to reconstruct and use an IPM, researchers need, at a minimum, the symbolic representation of the model and the associated parameter values. Existing demographic databases enter transition values directly, rather than storing a symbolic version of the model separately from the values associated with its symbols. For example, COMPADRE and COMADRE store transition matrices as numeric matrices (sub-matrices corresponding to survival and development (U), sexual reproduction (F), asexual reproduction (C), and their sum (A)), rather than as symbolic matrices with separate parameter values. In general, this data format limits the variety of potential analyses, because individual matrix elements may be composed of multiple vital rates and this information is lost by storing only the resulting values (e.g. the elements of F may combine both the probability of reproducing and the per-capita number of propagules produced). To avoid this issue for IPMs, one needs to reconstruct the IPM using the functional form of the kernels and vital rates, as well as the associated parameter estimates. One can use tools that associate the symbols with their values to accomplish this task (e.g. metaprogramming and rlang, Henry & Wickham, 2021).

ipmr is an R package for users to interactively develop their own IPMs from symbolic model representations and parameter estimates, and perform downstream analyses (Levin et al., 2021). Rpadrino extends this framework to include reconstructing previously published IPMs that are stored in the PADRINO database. Here, we introduce Rpadrino. Rpadrino provides access to PADRINO, an open access database of IPMs.
Specifically, PADRINO houses symbolic representations of IPMs, their parameter values, and associated metadata to aid users in selecting appropriate models. Rpadrino is an R package that enables users to download PADRINO, manage the dataset locally, and modify, reconstruct, and analyse IPMs from PADRINO. In the following, we describe how to interact with PADRINO using Rpadrino and discuss future directions for Rpadrino and PADRINO. We also provide two case studies that demonstrate (a) how to use PADRINO and Rpadrino to reconstruct published IPMs, conduct perturbation analyses, compute some life cycle events, and troubleshoot problems, and (b) how to use Rpadrino and ipmr to combine PADRINO IPMs with user-specified IPMs, and then how to use PADRINO data with other databases, using BIEN (Maitner et al., 2017) and COMPADRE (Salguero-Gómez et al., 2014) as examples. The latter is intended to demonstrate the potential for Rpadrino in broad, interoperable, macro-ecological applications. Finally, our supplementary materials also contain a detailed overview of the PADRINO database, along with the associated assumptions and challenges.

Before introducing Rpadrino, we provide a brief overview of PADRINO. PADRINO is an open-access database of integral projection models. PADRINO defines a syntax to symbolically represent IPMs as text strings, and stores the values of those symbols in separate tables. The syntax used is very similar to the mathematical notation of IPMs and is largely 'language-agnostic' (i.e. aims to avoid idiosyncrasies of specific programming languages). For example, a survival/growth kernel with the form $P(z', z) = s(z) * G(z', z)$ would be P = s * G in PADRINO's syntax. $G(z', z) = f_G(z' \mid \mu_g(z), \sigma_G)$ (where $f_G$ denotes a normal probability density function) becomes G = Norm(mu_g, sd_g). This notation should be translatable to many computing languages beyond just R (e.g. Python or Julia).
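As a rough illustration of how a symbolic model and its parameter values can be stored separately and then recombined, the base R sketch below evaluates PADRINO-style expressions against a named list of parameter values. The parameter names and values (s_int, s_slope, g_int, g_slope, sd_g), the logistic and linear vital rate forms, the size variables z_1 and z_2, and the mapping of Norm() onto dnorm() are all assumptions made for this example. They are not taken from PADRINO, and this is not how Rpadrino translates the syntax internally.

```r
# Hypothetical parameter values, stored separately from the model expressions.
pars <- list(s_int = -1.0, s_slope = 0.8, g_int = 0.5, g_slope = 0.9, sd_g = 0.4)

# Model expressions stored as text, loosely following the syntax in the text.
# The vital rate forms for 's' and 'mu_g' are invented for illustration.
exprs <- c(
  s    = "1 / (1 + exp(-(s_int + s_slope * z_1)))",
  mu_g = "g_int + g_slope * z_1",
  G    = "Norm(mu_g, sd_g)",
  P    = "s * G"
)

# Environment holding the parameter values.
env <- list2env(pars)

# Assumed translation: Norm(mean, sd) is a normal density for size at t + 1.
env$Norm <- function(mu, sd) dnorm(env$z_2, mean = mu, sd = sd)

# One pair of sizes: z_1 at time t, z_2 at time t + 1.
env$z_1 <- 2.0
env$z_2 <- 2.3

# Evaluate each expression in order, storing the result under its own name
# so that later expressions (e.g. P) can use earlier ones (e.g. s and G).
for (nm in names(exprs)) {
  assign(nm, eval(parse(text = exprs[[nm]]), envir = env), envir = env)
}

P_value <- get("P", envir = env)
```

Because the expressions and the values live in different objects, changing a single parameter (for example the survival slope) and re-evaluating is enough to obtain a modified kernel value, which is not possible when only the final numeric result is stored.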
Additionally, PADRINO stores extensive metadata to help researchers find IPMs that work for their questions. A more complete description of the database, how IPMs are digitized, and the associated challenges is available in the ESM and the project webpage (https://padrinodb.github.io/Padrino/, Table 1, Appendix, Tables S1 and S2).

Rpadrino is an R package that contains functions for downloading the PADRINO database, data querying and management, modifying existing functional forms and parameter values, and reconstructing models. Model reconstruction is powered by the ipmr R package (Levin et al., 2021). While users do not need to know how to use ipmr to use Rpadrino, the two packages are designed to work with and enhance each other. This means that users can combine IPMs reconstructed with Rpadrino with IPMs of their own constructed with ipmr in a single, coherent analysis (case study 2). Furthermore, users can go from downloading the database to reconstructing IPM objects in as few as three function calls. A more in-depth workflow is provided below.

The flexibility of IPMs and their broad application across ecology, evolution, and conservation biology mean that there is no fixed set of steps in a workflow using Rpadrino. However, there are generally four steps that a researcher must take when using Rpadrino. The first step is to identify studies of interest (Figure 1, Step 1a), and, optionally, augment PADRINO's metadata with additional information from other sources (e.g. environmental data, GBIF, Figure 1, Step 1). Rpadrino represents PADRINO objects as a list of data.frames (referred to as tables in subsequent text). Rpadrino uses the shared ipm_id column across all tables to track information related to each IPM. Therefore, subsetting relies on identifying the correct ipm_ids, and then using those to select the IPMs of interest (Box 1, case studies 1 and 2).
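The subsetting logic described above can be sketched with plain data.frames. The toy object below only mimics the general structure (a list of tables that share an ipm_id column). The table names, the column names other than ipm_id, the species names, and the ipm_id values are invented for illustration and do not reproduce PADRINO's actual schema.

```r
# Toy stand-in for a PADRINO object: a list of tables sharing 'ipm_id'.
# All table names, column names and values here are made up.
toy_pdb <- list(
  Metadata = data.frame(
    ipm_id = c("id_1", "id_2", "id_3"),
    species_accepted = c("Species_a", "Species_b", "Species_a"),
    stringsAsFactors = FALSE
  ),
  ParameterValues = data.frame(
    ipm_id = c("id_1", "id_1", "id_2", "id_3"),
    parameter_name = c("s_int", "s_slope", "s_int", "s_int"),
    parameter_value = c(-1.0, 0.8, -0.5, 0.2),
    stringsAsFactors = FALSE
  )
)

# Identify the ipm_ids of interest from the metadata table.
is_target <- toy_pdb$Metadata$species_accepted == "Species_a"
keep_ids <- toy_pdb$Metadata$ipm_id[is_target]

# Apply the same ipm_id filter to every table in the list.
toy_subset <- lapply(
  toy_pdb,
  function(tab) tab[tab$ipm_id %in% keep_ids, , drop = FALSE]
)

# Augmenting a table is an ordinary merge on the shared key.
# 'site_temp' is a hypothetical covariate from an external source.
extra <- data.frame(
  ipm_id = c("id_1", "id_3"),
  site_temp = c(11.2, 8.7),
  stringsAsFactors = FALSE
)
toy_subset$Metadata <- merge(toy_subset$Metadata, extra, by = "ipm_id", all.x = TRUE)
```

Filtering every table by the same set of identifiers keeps the tables consistent with one another, which is the property that later model reconstruction would rely on.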
data.frames should be familiar to most R users, and the ability to modify them should readily accommodate the range of further analyses that researchers may be interested in. Users may augment any table with additional information corresponding to, for example, spatial or temporal covariates from other open access databases. Furthermore, Rpadrino provides numerous accessor functions for metadata that streamline subsetting (Box 1).

The second step in the Rpadrino workflow is to construct a list of proto_ipm objects using pdb_make_proto_ipm() (Figure 1, Box 1). This function translates PADRINO's syntax into ipmr code, and then builds a proto_ipm object for each unique ipm_id. For some models, users may choose to create deterministic or stochastic IPMs at this step. Rpadrino's default behaviour is to generate deterministic models whenever possible. This behaviour encompasses instances where authors generated models with no time- or space-varying parameters, and where authors included discretely varying environments. The latter can be implemented as deterministic models because all parameter values are known before the IPM is built. IPMs with continuous environmental variation require sampling the environment at each model iteration, usually by sampling from distributions randomly. These are always considered stochastic models. This is also the step where, if needed, users should combine their own proto_ipm's produced by ipmr with the proto_ipm's produced by Rpadrino.

The third step in the Rpadrino workflow is creating IPM objects with pdb_make_ipm() (Figure 1, Box 1). pdb_make_ipm() uses ipmr's make_ipm() function to build IPM objects. Users may specify additional options to pass to make_ipm() (e.g. normalize the population size to always equal 1, return the vital rate function values as well as the sub-kernels and population state). The various arguments users can modify are described in the ipmr documentation for make_ipm().
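To give a sense of what building and iterating an IPM involves numerically, the base R sketch below discretizes a simple kernel with the midpoint rule and iterates it, rescaling the population to sum to 1 at each step. It is not ipmr's or Rpadrino's implementation. The vital rate functions, parameter values, size bounds, mesh size, and number of iterations are all assumptions chosen for illustration.

```r
# Hypothetical vital rate functions of size z (z_2 is size at t + 1).
s_z   <- function(z) plogis(-1 + 0.8 * z)                 # survival probability
g_z2z <- function(z_2, z) dnorm(z_2, 0.5 + 0.9 * z, 0.4)  # growth density
f_z2z <- function(z_2, z) {
  0.3 * exp(0.2 * z) * dnorm(z_2, 0.8, 0.3)               # recruit density
}

# Midpoint rule: n_mesh bins of width h between assumed size bounds L and U.
L <- 0
U <- 8
n_mesh <- 100
h <- (U - L) / n_mesh
z <- L + (seq_len(n_mesh) - 0.5) * h

# Discretized sub-kernels. Rows index size at t + 1, columns index size at t.
P <- outer(z, z, function(z_2, z_1) s_z(z_1) * g_z2z(z_2, z_1)) * h
F_k <- outer(z, z, f_z2z) * h
K <- P + F_k

# Iterate from a uniform start, normalizing the population size to 1 each step.
n_t <- rep(1 / n_mesh, n_mesh)
lambda_t <- numeric(500)
for (i in seq_along(lambda_t)) {
  n_next <- as.vector(K %*% n_t)
  lambda_t[i] <- sum(n_next)    # per-step growth factor, since sum(n_t) is 1
  n_t <- n_next / sum(n_next)
}

# Growth rate from iteration, and from an eigen-decomposition for comparison.
lambda_iter  <- lambda_t[length(lambda_t)]
lambda_eigen <- Re(eigen(K, only.values = TRUE)$values[1])
```

After enough iterations the per-step growth factor settles on the dominant eigenvalue of the discretized kernel, and the normalized vector n_t approximates the stable size distribution. Normalizing at each step only prevents numerical overflow or underflow; it does not change the estimated growth rate.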
The fourth and final step in an Rpadrino workflow is to conduct the analyses of interest (Figure 1, Box 1). Rpadrino provides functions to extract per-capita growth rates and eigenvectors (Caswell, 2001; Ellner et al., 2016, Ch. 2, demonstrated in Box 1), assess convergence to asymptotic dynamics (Caswell, 2001), compute mean kernels for stochastic IPMs (Ellner et al., 2016, Ch. 7), and modify existing IPMs with new parameter values and functional forms. Additionally, the documentation on the Rpadrino website (https://padrinODB.github.io/Rpadrino/index.html) and the Supplementary Materials for this paper contain details on how to conduct more complicated analyses with IPM objects (e.g. perturbation analyses (Ellner et al., 2016, Ch. 4), size at death calculations (Metcalf et al., 2009)). The package documentation and the recent publication describing ipmr also contain code demonstrating analyses on single IPM objects (Levin et al., 2021). These can be extended via the apply family of functions.

Box 1

    # Install and load the CRAN version:
    # install.packages("Rpadrino")
    library(Rpadrino)

    # Step 1 from main text -----
    # pdb_download() downloads a copy of PADRINO. We can specify a path to save
    # the downloaded database using `save = TRUE` and
    # `destination = 'path/to/file/'`. We'll call the object we create 'pdb',
    # which is short for Padrino DataBase.
    pdb <- pdb_download(save = FALSE)

    # We can use Rpadrino's metadata accessors to get a selection of ipm_ids
    # that we want to use. For this example, we'll select models for Carpobrotus
    # species and Geum radiatum. First, extract the 'species_accepted' column.
    # The output of this will be named, and the names are the ipm_id associated
    # with each piece of metadata. Thus, we can subset the names of the 'spps'
    # object to get the ipm_ids we need.
    spps <- pdb_species_accepted(pdb)
    ids <- names(spps)[spps %in% c("Carpobrotus_spp", "Geum_radiatum")]

    # Step 2 from main text -----
    # Next, we create a list of proto_ipm's using pdb_make_proto_ipm().
    my_proto_ipms <- pdb_make_proto_ipm(pdb, ipm_id = ids)

    # Step 3 from main text -----
    # After creating the proto_ipm list, we can call pdb_make_ipm() to construct
    # actual IPM objects.
    my_ipms <- pdb_make_ipm(my_proto_ipms)

    # Step 4 from main text -----
    # After re-building our published IPMs, the next step is to analyse them.
    # In this case, we'll just extract the asymptotic population growth rates,
    # stable size distribution, and the reproductive values. Note that for the
    # Geum IPMs, there are multiple year-specific values that are returned.
    # All values related to population-level traits are computed via iteration,
    # as this approach handles more complicated IPM systems more efficiently
    # than eigenvector/eigenvalue based approaches for larger IPMs, and
    # introduces little to no additional computation time for simpler and/or
    # smaller IPMs.
    lambdas <- lambda(my_ipms)
    ssds <- right_ev(my_ipms, iterations = 150, tolerance = 1e-7)
    repro_vs <- left_ev(my_ipms, iterations = 150, tolerance = 1e-7)

The first step in using Rpadrino is to install and load the package. After that, we can use Rpadrino to download PADRINO and, optionally, save it locally on our computer. Once the data are downloaded, we can make use of Rpadrino's metadata accessor functions to quickly select models that meet our criteria (step 1). The concept of the ipm_id is explained in greater detail in the Appendix of this manuscript. The next step is to use these ipm_ids to create a list of proto_ipm's using pdb_make_proto_ipm() (step 2). After this step, we can create actual IPM objects using pdb_make_ipm() (step 3). Once IPM objects are created, the subsequent steps depend on the demands of the research question.
In this case, asymptotic population growth rates, stable size distributions, and reproductive values are extracted (step 4). Note that since the Geum radiatum model includes a number of year-specific estimates, multiple values are generated for each quantity we want to extract. The concise representation and reconstruction of models such as this is powered by ipmr's parameter set index notation, which is described in greater detail on the package website (https://levisc8.github.io/ipmr/articles/index-notation.html). However, users do not need to be familiar with this notation unless they wish to modify the IPM in question (see case study 1 for an example of modifying PADRINO IPMs with Rpadrino).

There are numerous challenges associated with reproducing published IPMs. Challenges related to digitizing and storing IPMs are discussed in the ESM. Important challenges remain in the reconstruction of IPMs. Semi- or non-parametric models may be used to generate IPMs whose functional form is not known a priori. We have not yet developed a general syntax for representing these models in PADRINO, though work is ongoing. Additionally, ipmr is not yet able to handle two-sex models (e.g. Stubberud et al., 2019), time-lagged models (e.g. Kuss et al., 2008), or periodic models (e.g. Letcher et al., 2014). These types of IPMs do not yet represent a substantial portion of the literature. Nonetheless, it is our intention to continue developing functionality to accommodate them in future releases of Rpadrino, ipmr and PADRINO.

Rpadrino presents unique opportunities for synthesis in both theoretical and applied contexts. The expanded range of phylogenetic and geographical coverage can be used in conjunction with other demographic databases (e.g. COM(P)ADRE (Salguero-Gómez et al., 2014; Salguero-Gómez et al., 2016), popler (Compagnoni et al., 2019), DATLife (DATLife, 2021)) to power larger scale syntheses than were possible before (e.g. Compagnoni, Levin, et al., 2021).
For example, one could use IPMs from PADRINO and matrix population models from COMPADRE and COMADRE to create life tables (Jones et al., 2021), which could then be combined with life tables from DATLife for further analysis (e.g. Jones et al., 2014). The intermediate life table conversion steps may not be necessary, as many of the same life history traits and population level parameters may be calculated from all of these models (Caswell, 2001; Ellner et al., 2016). Furthermore, recent publications combine biotic and abiotic interactions into demographic models, providing a robust theoretical toolbox for exploring species responses to environmental drivers such as climate change (e.g. Abrego et al., 2021; Simmonds et al., 2020). Rpadrino also provides functionality to modify parameter values and functional forms of the IPMs it stores, giving theoreticians a wide array of realistic life histories to experiment with. These features will enable researchers to carry out more detailed and comprehensive analyses at various spatial, temporal, and phylogenetic scales. The examples given here are far from exhaustive, but we hope they demonstrate the potential for this new tool in demography, ecology and evolutionary biology (Table 1).

S.C.L. designed ipmr and Rpadrino with contributions from all authors, and S.C.L. implemented the packages; S.E., T.P., M.P.G., and S.C.L. entered the data into PADRINO; S.C.L. wrote the first draft of the manuscript and all authors provided comments.

We thank L. Sfedu for help with designing the figures, and the associate editor and two anonymous reviewers for comments that greatly improved this manuscript. R.S.-G. was supported by a NERC Independent Research Fellowship (NE/M018458/1). S.C.L., A.C., S.E. and T.M.K. were funded by the Alexander von Humboldt Foundation in the framework of the Alexander von Humboldt Professorship of T.M.K.

The authors declare no conflict of interest.
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13910.

PADRINO (Levin et al., 2022a, 2022b) is available via the Rpadrino R package, as well as on GitHub (https://github.com/padrinoDB/Padrino) and Zenodo (https://zenodo.org/badge/latestdoi/109448718). Rpadrino (Levin et al., 2022a, 2022b) is available on CRAN (https://cran.r-project.org/package=Rpadrino), GitHub (https://github.com/padrinoDB/Rpadrino), and Zenodo (https://zenodo.org/badge/latestdoi/124245125). pdbDigitUtils (Levin et al., 2022a, 2022b) is available on GitHub (https://github.com/padrinoDB/pdbDigitUtils) and Zenodo (https://zenodo.org/badge/latestdoi/348737812). There are no other data associated with this paper.

Appendix S1 Case study 1
Appendix S2 Case study 2
Appendix S3 Supporting information
Keywords
models