Here is a sample of modeling approaches commonly used to estimate and map species distribution. Practitioners are also encouraged to search for and use existing species distribution maps. However, when using existing maps, it is important for practitioners to understand and report the data and methods used to generate them.
Generalized linear model (GLM) approaches fit the relationship between the mean of the response variable and a linear combination of the explanatory variables, typically by maximum likelihood (computed with iteratively reweighted least squares). The response variables for distribution models are usually represented with simple species presence, presence–absence, or abundance observations at geographic locations based on random or stratified field sampling, expert opinion, or observations obtained opportunistically. Explanatory variables in this approach represent environmental data that are assumed to directly or indirectly affect the species (Austin 2007). The assumed relationship between the response and explanatory variables is defined with a link function together with one of several probability distributions for the response (e.g., normal, Poisson, negative binomial, or gamma distribution) (Guisan et al. 2002).
Logistic regression is a special kind of GLM used to evaluate how a suite of environmental variables predicts the presence of a species. The species data are summarized into a binomial response (presence or absence) for each sampled area. The logistic regression model constrains the predicted probability of presence between 0 and 1 with a logit link function and assumes a binomial error distribution.
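As a rough illustration of the logit link described above, the following minimal sketch fits a one-variable logistic regression by gradient ascent on the binomial log-likelihood. The data (`canopy`, `present`), the function names, and the fitting settings are hypothetical and chosen only for illustration; in practice a statistical package would be used instead.

```python
import math

def logistic(z):
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, n_iter=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the binomial log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # Gradient terms: sum over sites of (observed - predicted probability).
        g0 = sum(yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y))
        g1 = sum((yi - logistic(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1

# Hypothetical data: canopy cover (proportion) and presence (1) or absence (0) at ten sites.
canopy = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95]
present = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(canopy, present)
p_low = logistic(b0 + b1 * 0.1)   # predicted probability of presence at 10% canopy cover
p_high = logistic(b0 + b1 * 0.9)  # predicted probability of presence at 90% canopy cover
print(f"P(presence) at 10% cover: {p_low:.2f}, at 90% cover: {p_high:.2f}")
```

Whatever the fitted coefficients are, the logit link keeps every predicted probability strictly between 0 and 1.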
Both GLM and logistic regression models require field observations and measurements of the environmental factors expected to influence an organism's distribution. The environmental data can be collected with in situ field sampling methods or with remote sensing methods.
GLM (and logistic regression) has long been used in biological research for a wide breadth of studies to estimate species' distributions (Guisan and Thuiller 2005). The approach allows much flexibility in selecting the environmental data. It also allows researchers to test several working hypotheses by using maximum likelihood methods to determine the most parsimonious model that best fits the observed data.
GLM models are dependent upon the quality of the data and the structure of the candidate models developed by the researcher. Logistic regression analysis depends on the assumption that a species does not occur where it is deemed absent (as opposed to being present but undetected). There may be uncertainty about whether locations were sampled extensively enough to verify that individuals are not present.
Also, spurious results can occur if the model includes environmental variables that have little or no influence on the response variable (Burnham and Anderson 2002).
Linear regressions are parametric statistical analyses limited by the following four main assumptions:
Regression models are relatively easy to construct, run, and interpret with the help of many statistical packages (e.g., SAS and R).
An occupancy modeling approach estimates the distribution or proportion of geographical locations occupied by a species (MacKenzie et al. 2002). Since the probability of observing a species can be < 1 when the species is present, the occupancy model also incorporates the probability of detecting the species within a site and allows that probability to vary as a function of site characteristics, time, or environmental variables (MacKenzie et al. 2002). With multiple site visits to detect the species, this approach estimates the probability that a species will be detected at a site given that it is present (MacKenzie et al. 2005).
Estimating a species' occupancy within a site and distribution between sites involves multiple visits to sites when the species may be detectable (MacKenzie et al. 2002). For this approach, sites may represent discrete habitat patches or sampling units (e.g., quadrats) regularly visited as part of a large-scale monitoring program. Each survey is conducted in a discrete time period, and the investigator records whether the species was detected or not detected on each occasion. The set of detection histories for each site is used to estimate the proportion of sites occupied by the species. Investigators can also collect site-level characteristics (e.g., area and dominant vegetation) and environmental variables expected to influence the probability of detecting organisms (e.g., weather conditions and time of sampling).
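To show how detection histories separate occupancy from detection, here is a minimal sketch of a single-season occupancy likelihood with constant occupancy probability (`psi`) and detection probability (`p`), with no covariates. The detection histories, the function names, and the coarse grid search are hypothetical stand-ins for illustration; they are not the algorithm used by any particular software.

```python
import math

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history.

    history: list of 1 (detected), 0 (not detected), or None (visit missed)
    psi: probability that the site is occupied
    p: probability of detecting the species on a visit, given occupancy
    """
    occupied = psi  # probability of the observed history if the site is occupied
    for obs in history:
        if obs is None:
            continue  # a missed visit contributes no information
        occupied *= p if obs == 1 else (1.0 - p)
    if any(obs == 1 for obs in history):
        return occupied
    # No detections: the site is either occupied but missed, or truly unoccupied.
    return occupied + (1.0 - psi)

def log_likelihood(histories, psi, p):
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)

# Hypothetical detection histories: ten sites, up to three visits each.
histories = [
    [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0],
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 0, None], [0, 0, 0],
]

# Naive occupancy: share of sites with at least one detection.
naive = sum(any(o == 1 for o in h) for h in histories) / len(histories)

# Coarse grid search for the maximum likelihood estimates of psi and p.
grid = [i / 100 for i in range(1, 100)]
best_psi, best_p = max(
    ((psi, p) for psi in grid for p in grid),
    key=lambda pair: log_likelihood(histories, pair[0], pair[1]),
)
print(f"naive occupancy: {naive:.2f}, estimated psi: {best_psi:.2f}, estimated p: {best_p:.2f}")
```

Because some occupied sites can yield no detections, the estimated occupancy is at least as large as the naive proportion of sites with a detection.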
The occupancy modeling approach can be implemented more easily and less expensively than the methods used for abundance estimation. In addition, occupancy modeling can be applied to monitoring programs with a large spatial extent to determine a species' spatial distribution throughout a region. Covariates expected to influence detection or occupancy can be easily included in the occupancy model to account for heterogeneity in detection probability and variation in occupancy among sites. Missed sampling events can be accommodated by slightly modifying the likelihood used to estimate the probability of presence.
One of the main weaknesses of this method is the requirement of many visits to a single site, which for some study systems may be logistically difficult and time-consuming. Habitat patches also need to be delineated by the investigator. However, increasing the number of visits per site improves the precision of the estimated occupancy rate, and the resulting increase in information improves the accuracy of the estimate when detection probabilities are low (MacKenzie et al. 2002). This approach provides information only on occupancy of a patch and no information about the population dynamics or abundance of the species in the patch. Therefore, it is difficult to use these data to speculate on the viability of the population.
Key assumptions for the occupancy modeling approach include (MacKenzie et al. 2002):
The development of the occupancy modeling approach has led to detailed documentation describing sampling procedures and analysis (MacKenzie et al. 2005). In addition, the freely downloadable program PRESENCE (version 3.1) is available to analyze the data.
A maximum entropy (MaxEnt) modeling approach uses a machine-learning algorithm to predict a species' geographic distribution based on locations of known occurrences and layers of environmental data (Elith et al. 2006, Phillips et al. 2006). The maximum entropy modeling approach estimates the species distribution by finding the distribution of maximum entropy (i.e., closest to uniform), constrained by the environmental data associated with the species' known locations (Phillips et al. 2006).
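The idea of a "closest to uniform" distribution subject to environmental constraints can be sketched with a toy example. The code below assumes a single environmental variable used as one linear feature, no regularization, and a hypothetical eight-cell landscape. The Maxent software is considerably more elaborate, so this is an illustration of the principle rather than of that program.

```python
import math

def maxent_distribution(features, weight):
    """Distribution over cells with probability proportional to exp(weight * feature)."""
    scores = [math.exp(weight * f) for f in features]
    total = sum(scores)
    return [s / total for s in scores]

def fit_maxent(features, presence_cells, step=1.0, n_iter=2000):
    """Find the weight whose distribution matches the mean feature value at presence cells."""
    target = sum(features[i] for i in presence_cells) / len(presence_cells)
    weight = 0.0  # a weight of 0 gives the uniform (unconstrained maximum entropy) distribution
    for _ in range(n_iter):
        probs = maxent_distribution(features, weight)
        expected = sum(pr * f for pr, f in zip(probs, features))
        # Move the weight until the model's expected feature value equals the observed mean.
        weight += step * (target - expected)
    return weight

# Hypothetical landscape: one environmental variable scaled to [0, 1] for eight grid cells.
moisture = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
presence_cells = [4, 5, 6, 6, 7]  # indices of cells with occurrence records (cell 6 has two)

weight = fit_maxent(moisture, presence_cells)
probs = maxent_distribution(moisture, weight)
for value, prob in zip(moisture, probs):
    print(f"moisture {value:.1f}: relative suitability {prob:.3f}")
```

The fitted distribution stays as flat as the constraint allows: it departs from uniform only enough for its average moisture to equal the average moisture at the occurrence cells.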
Maximum entropy modeling requires two types of input data: the geographic coordinates of species occurrences and geographically explicit environmental variables likely to influence the distribution of a species at the relevant spatial and temporal scale (Phillips et al. 2006). Occurrence locations only need to be presence-only records (e.g., natural history museum or herbarium records), and at least 50 to 100 occurrence locations are recommended to obtain predictions close to the optimal distribution (Phillips et al. 2006).
There are many advantages to using the maximum entropy approach when modeling species distribution (Phillips et al. 2006):
Maps (geo-referenced data) of the environmental variables or covariates representing environmental conditions need to be available for the entire landscape. In addition, the environmental variables and the species occurrence locations should be measured for similar time periods (Phillips et al. 2006). The environmental variables used in conjunction with the occurrence locations may not be sufficient to describe the species distribution. The occurrence locations may be biased or spatially autocorrelated, or sampling intensity and methods may have varied widely across the study area (Phillips et al. 2006). For example, museum samples may have been collected near roads and within a small segment of the population. There could also be errors in recording the occurrence locations, or the species may have been misidentified during field observations.
Basic knowledge of GIS is needed to ensure that all environmental data have the same format (projection, extent, and resolution). The freely downloadable program Maxent is available to analyze the occurrence locations and environmental grids.
A resource selection probability function (RSPF) is a mathematical function that predicts a species' use of resources or habitats relative to the availability of those resources or habitats (Manly et al. 2002), and so serves as a habitat suitability measure. The approach uses species occurrence location data to estimate where habitat use exceeds availability. An RSPF can take many mathematical forms (Manly et al. 2002), but logistic regression is the most common form used to estimate habitat suitability.
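The comparison of use against availability that underlies this approach can be illustrated with a simple selection ratio: the share of used locations in a habitat class divided by the share of available locations in that class. This is a simpler quantity than a fitted RSPF and is shown only to make the idea concrete. The habitat classes and counts below are hypothetical.

```python
def selection_ratios(used_counts, available_counts):
    """Share of used locations in each habitat class divided by the share available."""
    total_used = sum(used_counts.values())
    total_available = sum(available_counts.values())
    ratios = {}
    for habitat, available in available_counts.items():
        used_share = used_counts.get(habitat, 0) / total_used
        available_share = available / total_available
        ratios[habitat] = used_share / available_share
    return ratios

# Hypothetical counts of animal locations (used) and random landscape points (available).
used = {"forest": 30, "shrubland": 15, "grassland": 5}
available = {"forest": 200, "shrubland": 300, "grassland": 500}

ratios = selection_ratios(used, available)
for habitat, ratio in ratios.items():
    print(f"{habitat}: selection ratio {ratio:.2f}")
```

A ratio above 1 indicates a habitat used more than its availability would suggest, a ratio near 1 indicates use in proportion to availability, and a ratio below 1 indicates a habitat used less than expected.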
There are three data requirements to estimate RSPF:
RSPFs are flexible enough to parameterize the environmental data with a wide range of functional relationships (e.g., polynomial terms and interactions). This approach fits easily into a maximum likelihood framework with model selection to determine which environmental variables influence species distribution. An RSPF approach allows researchers to easily interpret the environmental variables estimated in the "best" model. The analysis can be conducted at multiple ecological levels (individuals, populations, or species).
Model output can be sensitive to sampling of available locations in relation to observed used locations.
Basic knowledge of GIS is needed. A GIS tool to execute RSPF has been developed by the Yellowstone Ecological Research Center.
A commonly used multivariate modeling approach to estimate species distribution is Mahalanobis distance (MD). MD is a dimensionless measure of dissimilarity representing the standardized squared distance between a set of environmental variables and ideal habitat quality (Clark et al. 1993). A distance threshold is then used to define the boundary of the species distribution (Tsoar et al. 2007). When mapping species distribution in relation to habitat quality, the MD metric can be used to rank each cell in the habitat map relative to a statistical description of habitats used by a species. Each cell on the MD habitat map is scored relative to the vector describing the multivariate characteristics of habitats at cells where the species was located.
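The ranking of map cells described above can be sketched for two environmental variables. The code computes the mean vector and covariance matrix of the cells where the species was recorded, then the squared Mahalanobis distance of candidate cells from that mean. The variables (elevation and canopy cover), the values, and the cell names are hypothetical. A real analysis would use statistical software and handle more than two variables.

```python
def mean_vector(rows):
    """Mean of each variable across the used cells."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance_2x2(rows, mean):
    """Sample covariance matrix for two variables."""
    n = len(rows)

    def cov(a, b):
        return sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)

    return [[cov(0, 0), cov(0, 1)], [cov(1, 0), cov(1, 1)]]

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for two variables: d' * inverse(cov) * d."""
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    # Inverse of a 2x2 matrix written out explicitly.
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    return (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
            + d1 * (inv[1][0] * d0 + inv[1][1] * d1))

# Hypothetical (elevation in m, percent canopy cover) at cells where the species was recorded.
used_cells = [(1200, 60), (1250, 65), (1300, 72), (1180, 55), (1270, 70), (1220, 58)]
mu = mean_vector(used_cells)
sigma = covariance_2x2(used_cells, mu)

# Rank candidate map cells: a smaller distance means conditions more similar to used habitat.
candidates = {"cell_a": (1240, 64), "cell_b": (1600, 20)}
distances = {name: mahalanobis_sq(x, mu, sigma) for name, x in candidates.items()}
for name, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: squared Mahalanobis distance {dist:.2f}")
```

Because the distance is scaled by the covariance matrix, variables measured in different units (meters and percent here) contribute on a comparable footing.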
This approach requires species occurrence data and environmental data expected to influence species distribution.
Environmental variables can be correlated and the assumption of multivariate normality does not have to be met because MD creates new and uncorrelated variables (Clark et al. 1993, Knick and Dyer 1997). Environmental data can be continuous or categorical.
The MD approach assumes that the species is distributed optimally at the mean environmental conditions, and that any deviation from the mean (optimal) conditions is associated with lower suitability (Farber and Kadmon 2003). As with many multivariate analyses, it may be difficult to interpret how the environmental variables directly relate to species distribution.
A surface representing Mahalanobis distance for species distribution can be calculated with statistical software (e.g., R or SAS). This analysis can also be conducted using multivariate statistical software such as PC-ORD.
When data are limited, investigators may consult groups of experts to subjectively delineate species distributions or define environmental features that influence species distributions. Expert opinion can be incorporated into species distribution modeling by providing input into data preparation, identifying suspect records of species occurrences, selecting relevant environmental features influencing species distribution, developing various models, or grouping vegetation into habitat suitability classes (Pearce et al. 2001).
This approach requires limited field data collection. However, identifying and interviewing experts for various ecosystems or species is a time-consuming process. Published literature (peer-reviewed articles and reports) should also be reviewed to supplement expert opinion. If expert opinion models are displayed spatially, then all relevant environmental features are needed as spatial data layers such as grids or vectors.
Since there is little or no field data collection, this method is relatively inexpensive. For a few species and ecosystems, experts are available with extensive knowledge based on decades of field experience.
There is limited published information or available expert knowledge for many rare and federally protected species. When experts are available, the degree of their expertise may be difficult to evaluate, and it can be difficult to standardize interview techniques. While the approach is cost-effective because it requires limited field data collection, incorporating expert opinion into distribution modeling can be a slow and tedious process and is usually performed on a species-by-species basis (Seoane et al. 2005). Distribution models created from expert opinion are rarely validated with independent data. Therefore, a high level of uncertainty is present in the model until observations confirm the presence of the species in relation to the environment.
To address some of the uncertainties arising from varying expert opinions, species distribution models created with expert opinion can be subjected to a pairwise comparison technique, the Analytic Hierarchy Process developed by Saaty (1980), whereby experts rank the relative importance of each variable in a pair using a continuous scale. For example, each expert selects the variable deemed more important in each pairwise comparison and ranks how important the selected variable is, compared with the other, on a scale of 1 (equally important) to 9 (extremely more important). The pairwise comparisons are transformed into a matrix of ranks based on the Analytic Hierarchy Process model. Those ranks can be calculated by averaging the survey scores of all respondents for each pairwise comparison to represent the relative importance of each variable against another variable.
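The pairwise comparison procedure can be sketched as follows. The code averages each pairwise score across experts, as described above, fills a reciprocal comparison matrix, and converts it to variable weights. The weights are derived from normalized row geometric means, a common approximation that is assumed here rather than taken from the source. The variable names and survey scores are hypothetical.

```python
import math

def ahp_weights(variables, expert_scores):
    """Average pairwise scores across experts, then derive a weight for each variable.

    expert_scores: one dict per expert mapping (a, b) to a score on the 1 to 9 scale,
    meaning "a is this many times as important as b".
    """
    n = len(variables)
    index = {v: i for i, v in enumerate(variables)}
    matrix = [[1.0] * n for _ in range(n)]  # diagonal: each variable equals itself
    for a, b in expert_scores[0].keys():
        mean_score = sum(e[(a, b)] for e in expert_scores) / len(expert_scores)
        matrix[index[a]][index[b]] = mean_score
        matrix[index[b]][index[a]] = 1.0 / mean_score  # reciprocal entry
    # Assumption: weights from row geometric means, normalized to sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return {v: g / total for v, g in zip(variables, geo_means)}

# Hypothetical survey: two experts compare three environmental variables.
variables = ["vegetation", "elevation", "distance_to_water"]
experts = [
    {("vegetation", "elevation"): 3, ("vegetation", "distance_to_water"): 5,
     ("elevation", "distance_to_water"): 2},
    {("vegetation", "elevation"): 5, ("vegetation", "distance_to_water"): 7,
     ("elevation", "distance_to_water"): 2},
]

weights = ahp_weights(variables, experts)
for name, weight in weights.items():
    print(f"{name}: weight {weight:.2f}")
```

The resulting weights sum to 1 and could be used to weight the corresponding environmental layers when they are combined into a suitability map.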
Beyond locating and interviewing experts, modeling species distribution with expert opinion usually requires GIS knowledge to compile a map overlaying relevant environmental features expected to influence species distribution.