Yale Framework
Approaches and Tools for Conducting Assessments of Species Distributions

The following is a sample of modeling approaches commonly used to estimate and map species distributions. Practitioners are also encouraged to search for and use existing species distribution maps. However, when using existing maps, practitioners should understand and report the data and methods used to generate them.

Generalized linear model and logistic regression model

Overview

Generalized linear model (GLM) approaches use maximum likelihood methods (typically computed by iteratively reweighted least squares) to fit the relationship between the mean of the response variable and a linear combination of the explanatory variables. The response variables for distribution models are usually simple species presence, presence–absence, or abundance observations at geographic locations, based on random or stratified field sampling, expert opinion, or opportunistic observations. Explanatory variables in this approach represent environmental data that are assumed to affect the species directly or indirectly (Austin 2007). The assumed relationship between the response and explanatory variables is defined by a link function paired with a probability distribution for the response (e.g., normal, Poisson, negative binomial, or gamma distribution) (Guisan et al. 2002).

Logistic regression is a special case of the GLM used to evaluate how well a suite of environmental variables predicts the presence of a species. The species data are summarized as a binomial response (presence or absence) for each sampled area. The logistic regression model constrains the predicted probability of presence to lie between 0 and 1 with a logit link function and assumes the response follows a binomial distribution.
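As a rough illustration of the logit link described above, the following sketch fits a one-variable logistic regression by maximizing the binomial log-likelihood with simple gradient ascent. The data are synthetic, and the variable name (canopy), the "true" coefficients, and the fitting settings are all assumptions made for the example. In practice a statistical package such as R or SAS would be used instead.

```python
import math
import random

def sigmoid(z):
    """Inverse of the logit link: maps any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit presence ~ intercept + slope * x by gradient ascent on the
    mean binomial log-likelihood. Returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Synthetic (made-up) data: a standardized environmental variable and
# presence/absence drawn from an assumed true relationship.
random.seed(1)
canopy = [random.uniform(-2, 2) for _ in range(200)]
presence = [1 if random.random() < sigmoid(-0.5 + 1.5 * c) else 0
            for c in canopy]

b0, b1 = fit_logistic(canopy, presence)
predicted = [sigmoid(b0 + b1 * c) for c in canopy]
print("intercept:", round(b0, 2), "slope:", round(b1, 2))
```

Every value in predicted lies strictly between 0 and 1, which is the constraint the logit link imposes.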

Data Requirements

Both GLM and logistic regression models require field observations and measurements of environmental factors expected to influence an organism's distribution. The environmental data can be collected with in situ field sampling methods or with remote sensing methods.

Strengths

GLM (and logistic regression) has long been used in biological research for a wide breadth of studies to estimate species' distributions (Guisan and Thuiller 2005). The approach allows much flexibility in selecting the environmental data. It also allows researchers to test several working hypotheses by using maximum likelihood methods to determine the most parsimonious model that best fits the observed data.
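The text does not name a specific criterion for choosing the most parsimonious model. One widely used option, assumed here purely for illustration, is Akaike's information criterion (AIC), which penalizes the maximized log-likelihood by the number of fitted parameters. The candidate models and their log-likelihood values below are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L); lower values indicate a better trade-off
    between fit and number of parameters."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical candidates: (name, maximized log-likelihood, parameters)
candidates = [
    ("elevation", -120.4, 2),
    ("elevation + canopy", -112.9, 3),
    ("elevation + canopy + slope", -112.6, 4),
]

scores = [aic(ll, k) for _, ll, k in candidates]
weights = akaike_weights(scores)
best_index = scores.index(min(scores))
best_name = candidates[best_index][0]

for (name, _, _), s, w in zip(candidates, scores, weights):
    print(f"{name}: AIC={s:.1f}, weight={w:.2f}")
```

In this made-up example the third model fits slightly better than the second, but not by enough to justify its extra parameter, so the two-variable model has the lowest AIC.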

Weaknesses/Assumptions

GLM models are dependent upon the quality of data and the structure of the candidate models developed by the researcher. The logistic regression analysis is dependent upon the assumption that a species does not occur where it is deemed absent (as opposed to being present but undetected). There may be uncertainty about whether locations were sampled extensively enough to verify that individuals are not present.

Also, spurious results can occur if the model includes environmental variables that have little or no influence on the response variable (Burnham and Anderson 2002).

Linear regressions are parametric statistical analyses that rest on the following four main assumptions:

Each environmental variable's error is assumed to be identically and independently distributed
The variance of the response variable is constant across observations
Each environmental variable's error is assumed to follow the probability distribution selected with the link function
The regression function is linear in the predictors

Capacity Needed (construct and run model)

Regression models are relatively easy to construct, run, and interpret with the help of many statistical packages (e.g., SAS and R).

Occupancy models

Overview

An occupancy modeling approach estimates the distribution or proportion of geographical locations occupied by a species (MacKenzie et al. 2002). Because the probability of observing a species can be less than 1 even when the species is present, the occupancy model also incorporates the probability of detecting the species at a site, and allows that probability to vary as a function of site characteristics, time, or environmental variables (MacKenzie et al. 2002). With multiple visits to each site, this approach estimates the probability that a species will be detected at a site given that it is present (MacKenzie et al. 2005).

Data Requirements

Estimating a species' occupancy within a site and its distribution among sites requires multiple visits to sites during periods when the species may be detectable (MacKenzie et al. 2002). For this approach, sites may represent discrete habitat patches or sampling units (e.g., quadrats) regularly visited as part of a large-scale monitoring program. Each survey is conducted during a discrete time period, and on each occasion the investigator records whether the species was detected. The set of detection histories for each site is used to estimate the proportion of sites occupied by the species. Investigators can also collect site-level characteristics (e.g., area and dominant vegetation) and environmental variables expected to influence the probability of detecting organisms (e.g., weather conditions and time of sampling).
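To make the role of detection histories concrete, the sketch below writes out a simplified single-season likelihood in the spirit of MacKenzie et al. (2002), assuming a constant occupancy probability (psi) and a constant detection probability (p) with no covariates or missed visits. The detection histories are hypothetical, and the coarse grid search stands in for the numerical optimization that dedicated software would perform.

```python
import math

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 values).
    A site with any detection must be occupied; a site with none is
    either occupied but missed on every visit, or unoccupied."""
    k = len(history)
    d = sum(history)
    occupied = psi * (p ** d) * ((1 - p) ** (k - d))
    if d > 0:
        return occupied
    return occupied + (1 - psi)

def log_likelihood(histories, psi, p):
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)

def fit_grid(histories, steps=99):
    """Grid search over psi and p in 0.01, 0.02, ..., 0.99."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    best = max((log_likelihood(histories, psi, p), psi, p)
               for psi in grid for p in grid)
    return best[1], best[2]

# Hypothetical data: 10 sites, 3 visits each (1 = detected, 0 = not).
histories = [
    [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0],
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0],
]

naive = sum(1 for h in histories if any(h)) / len(histories)
psi_hat, p_hat = fit_grid(histories)
print("naive occupancy:", naive)
print("estimated psi:", psi_hat, "estimated p:", p_hat)
```

Because some occupied sites are likely to be missed on all three visits, the estimated occupancy is higher than the naive proportion of sites with at least one detection.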

Strengths

The occupancy modeling approach can be implemented more easily and less expensively than the methods used for abundance estimation. In addition, occupancy modeling can be applied to monitoring programs of large spatial extent to determine a species' spatial distribution throughout a region. Covariates expected to influence detection or occupancy can be easily included in the occupancy model to account for heterogeneity in detection probability and for variation in occupancy among sites. Missed sampling events can be accommodated by slightly modifying the maximum likelihood model that estimates the likelihood of presence.

Weaknesses/Assumptions

One of the main weaknesses of this method is the requirement of many visits to each site, which can be logistically difficult and time consuming for some study systems. However, increasing the number of visits per site improves the precision of the estimated occupancy rate, and the resulting increase in information improves the accuracy of the estimate when detection probabilities are low (MacKenzie et al. 2002). Habitat patches also need to be delineated by the investigator. This approach provides information only on the occupancy of a patch, not on the population dynamics or abundance of the species in the patch. Therefore, it is difficult to use these data to speculate on the viability of the population.

Key assumptions for the occupancy modeling approach include (MacKenzie et al. 2002):

Sites are closed to changes in occupancy during sampling (i.e., closed system): occupied sites remain occupied for the duration of the survey period, no new sites become occupied after surveying has begun, and no sites are abandoned before surveying ends.
Detection of the species at a site is independent of detection of the species at all other sites.
Species are never falsely detected at a site when absent, and a species may or may not be detected at a site when present.

Capacity needed

The development of the occupancy modeling approach has led to detailed documentation describing sampling procedures and analysis (MacKenzie et al. 2005). In addition, a freely downloadable program, PRESENCE (version 3.1), is available to analyze the data.

Maximum entropy models

Overview

A maximum entropy (MaxEnt) modeling approach uses a machine-learning algorithm to predict a species' geographic distribution based on locations of known occurrences and layers of environmental data (Elith et al. 2006, Phillips et al. 2006). The approach estimates the species distribution by finding the distribution of maximum entropy (i.e., closest to uniform), constrained by the environmental data associated with the species' known locations (Phillips et al. 2006).
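The idea of "closest to uniform, subject to constraints" can be sketched with a toy example. Assuming a landscape of ten grid cells and a single environmental feature, the maximum entropy distribution whose expected feature value matches the mean observed at presence locations takes the form p(cell) proportional to exp(lambda * feature). The feature values, presence cells, and step size below are hypothetical, and the sketch omits the multiple feature types and regularization used by the actual Maxent program.

```python
import math

def gibbs(features, lam):
    """Distribution over cells proportional to exp(lam * feature).
    With lam = 0 this is the uniform (unconstrained) distribution."""
    weights = [math.exp(lam * f) for f in features]
    z = sum(weights)
    return [w / z for w in weights]

def fit_maxent(features, presence_idx, lr=2.0, n_iter=2000):
    """Adjust lam until the model's expected feature value matches
    the mean feature value at the presence locations."""
    target = sum(features[i] for i in presence_idx) / len(presence_idx)
    lam = 0.0
    for _ in range(n_iter):
        probs = gibbs(features, lam)
        model_mean = sum(p * f for p, f in zip(probs, features))
        lam += lr * (target - model_mean)  # gradient of the log-likelihood
    return lam, target

# Hypothetical scaled environmental values for 10 grid cells, and the
# cell indices where the species was recorded (presence-only data).
features = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
presence_idx = [6, 7, 8, 8, 9]

lam, target = fit_maxent(features, presence_idx)
probs = gibbs(features, lam)
model_mean = sum(p * f for p, f in zip(probs, features))
print("lambda:", round(lam, 2))
print("target mean:", round(target, 3), "model mean:", round(model_mean, 3))
```

The fitted distribution places more probability on cells with high feature values, but only as much as is needed to satisfy the constraint.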

Data Requirements

Maximum entropy modeling requires two types of input data: the geographic coordinates of species occurrences, and geographically explicit environmental variables likely to influence the distribution of a species at the relevant spatial and temporal scale (Phillips et al. 2006). Occurrence locations need only be presence-only records (e.g., natural history museum or herbarium records), and at least 50 to 100 occurrence locations are recommended to obtain predictions close to the optimal distribution (Phillips et al. 2006).

Strengths

There are many advantages to using the maximum entropy approach when modeling species distributions (Phillips et al. 2006):

Only presence data are required for species occurrences
Environmental grids can contain continuous and categorical information
There is an efficient deterministic algorithm for obtaining the optimal probability distribution, obviating the need for uncertainty analyses
Overfitting can be avoided by adjusting the regularization parameter
One of the output products is a continuous map, allowing fine distinctions in the predicted species distribution across the entire region
The model provides insight into the relative importance of each environmental feature and its relationship to the predicted species distribution

Weaknesses/Assumptions

Maps (geo-referenced data) of the environmental variables or covariates representing environmental conditions need to be available for the entire landscape. In addition, the environmental variables and the species occurrence locations should be measured for similar time periods (Phillips et al. 2006). The environmental variables used in conjunction with the occurrence locations may not be sufficient to describe the species distribution. The occurrence locations may be biased or spatially auto-correlated, or sampling intensity and methods may have varied widely across the study area (Phillips et al. 2006). For example, museum samples may have been collected near roads and from only a small segment of the population. There could also be errors in the recorded occurrence locations, or the species may have been misidentified during field observations.

Capacity needed

Basic knowledge of GIS is needed to ensure that all environmental data have the same format (projection, extent, and resolution). A freely downloadable program, Maxent, is available to analyze the occurrence locations and environmental grids.

Resource selection probability functions

Multivariate Models

Expert opinion

Overview

When data are limited, investigators may consult groups of experts to subjectively delineate species distributions or define environmental features that influence species distributions. Expert opinion can be incorporated into species distribution modeling by providing input into data preparation, identifying suspect records of species occurrences, selecting relevant environmental features influencing species distribution, developing various models, or grouping vegetation into habitat suitability classes (Pearce et al. 2001).

Data Requirements

This approach requires limited field data collection. However, identifying and interviewing experts for various ecosystems or species is a time-consuming process. Published literature (peer-reviewed articles and reports) should also be reviewed to supplement expert opinion. If expert opinion models are displayed spatially, then all relevant environmental features are needed as spatial data layers such as grids or vectors.

Strengths

Since there is little or no field data collection, this method is relatively inexpensive. For a few species and ecosystems, experts are available with extensive knowledge based on decades of field experience.

Weaknesses

There is limited published information or available expert knowledge for many rare and federally protected species. When experts are available, the degree of their expertise may be difficult to evaluate, and it can be difficult to standardize interview techniques. While the approach is cost-effective because it requires little field data collection, incorporating expert opinion into distribution modeling can be a slow and tedious process and is usually performed on a species-by-species basis (Seoane et al. 2005). Distribution models created from expert opinion are rarely validated with independent data. Therefore, a high level of uncertainty remains in the model until observations confirm the presence of the species in relation to the environment.

To address some of the uncertainty arising from varying expert opinions, species distribution models created with expert opinion can be subjected to a pairwise comparison technique, the Analytic Hierarchy Process developed by Saaty (1980), whereby experts rank the relative importance of each variable in a pair using a continuous scale. For example, each expert selects the variable deemed more important in each pairwise comparison and ranks how much more important it is than the other on a scale of 1 (equally important) to 9 (extremely more important). The pairwise comparisons are transformed into a matrix of ranks based on the Analytic Hierarchy Process model. Those ranks can be calculated by averaging the survey scores of all respondents for each pairwise comparison, so that they represent the relative importance of each variable against another variable.
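The pairwise comparison procedure can be sketched as follows. The three variables and the two experts' scores are hypothetical. Scores are averaged across experts as described above, placed in a reciprocal comparison matrix, and converted to weights using normalized row geometric means, which is assumed here as a simple stand-in for the eigenvector calculation normally associated with the Analytic Hierarchy Process.

```python
import math
from itertools import combinations

def average_scores(expert_scores, pairs):
    """Mean score across experts for each pair (a, b). A score above 1
    means a was judged more important than b."""
    n = len(expert_scores)
    return {pair: sum(e[pair] for e in expert_scores) / n for pair in pairs}

def comparison_matrix(variables, avg):
    """Reciprocal matrix: entry [i][j] is the importance of variable i
    relative to j, and entry [j][i] is its inverse."""
    index = {v: i for i, v in enumerate(variables)}
    size = len(variables)
    m = [[1.0] * size for _ in range(size)]
    for (a, b), score in avg.items():
        i, j = index[a], index[b]
        m[i][j] = score
        m[j][i] = 1.0 / score
    return m

def priority_weights(matrix):
    """Normalized geometric mean of each row (an approximation)."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

# Hypothetical variables and scores on the 1 to 9 scale.
variables = ["elevation", "canopy", "water"]
pairs = list(combinations(variables, 2))
experts = [
    {("elevation", "canopy"): 3, ("elevation", "water"): 5, ("canopy", "water"): 3},
    {("elevation", "canopy"): 5, ("elevation", "water"): 7, ("canopy", "water"): 1},
]

avg = average_scores(experts, pairs)
matrix = comparison_matrix(variables, avg)
weights = priority_weights(matrix)
for name, w in zip(variables, weights):
    print(f"{name}: {w:.3f}")
```

The resulting weights sum to 1 and could be used to weight the corresponding environmental layers when compiling a habitat suitability map.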

Capacity needed

Beyond locating and interviewing experts, modeling species distribution with expert opinion usually requires GIS knowledge to compile a map overlaying relevant environmental features expected to influence species distribution.
