TUESDAY, 4/16, LAB #1: "An Introduction to Methods for Calculating Measures of Community Similarity"

I. INTRODUCTION

Ecologists who sample communities for the purposes of community classification often record the following types of information:

1) a list of observed species

2) data on species presence/absence, or more ideally, a numerical estimate of their abundances.

Comparing these data among communities lets ecologists determine whether the communities are distinct enough in their floral and faunal composition to warrant separate community classifications, or whether they should be classified together.

Such determinations can have important practical implications for species conservation. For example, it may be necessary to determine how similar two adjacent communities are when they border a proposed conservation area or an area proposed for development. Accurate estimates of community similarity can therefore play a key role in ensuring that such communities receive adequate conservation and protection.

In today’s lab, we will calculate and compare a variety of "Measures of Community Similarity".

 

II. TWO BROAD CLASSES OF MEASURES OF SIMILARITY:

1) Binary Similarity: Used when presence/absence data are the only data available;

2) Quantitative Similarity: Requires that some measure of relative abundance is available for each species. Relative abundance may be measured by number of individuals, biomass, cover, productivity, or any measure that quantifies the importance of the species in the community.

 

 

III. TWO DESIRABLE ATTRIBUTES OF ALL MEASURES OF SIMILARITY

1) The measure should be independent of sample size and of the number of species in the community;

2) The measure should increase smoothly from a fixed minimum to a fixed maximum as the communities become more similar.

 

 

 

 

IV. TYPICAL DATA FORMAT FOR BINARY COEFFICIENTS

2X2 BINARY (ASSOCIATION) COEFFICIENTS

                          SAMPLE A: # PRESENT     SAMPLE A: # ABSENT
SAMPLE B: # PRESENT               a                       b
SAMPLE B: # ABSENT                c                       d

where a = number of species present in both sample A and sample B ("joint occurrences")

b = number of species present in sample B but not in sample A

c = number of species present in sample A but not in sample B

d* = number of species absent from both samples ("zero-zero matches")

*d may be biologically meaningless and is often excluded from binary similarity measures.
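As a minimal Python sketch (not part of the original lab), the four cells of the table above can be tallied from two species lists. Because d counts species absent from both samples, it is only defined relative to some list of candidate species; the `pool` argument below is a hypothetical regional species list assumed for illustration, and the species names are made-up example data.

```python
def binary_counts(sample_a, sample_b, pool):
    """Return (a, b, c, d) for the 2x2 binary table.

    ASSUMPTION: `pool` is a hypothetical list of every species that could
    have been recorded; d is counted relative to it.
    """
    A, B, P = set(sample_a), set(sample_b), set(pool)
    if not (A | B) <= P:
        raise ValueError("every observed species must appear in the pool")
    a = len(A & B)        # present in both samples (joint occurrences)
    b = len(B - A)        # present in sample B only
    c = len(A - B)        # present in sample A only
    d = len(P - (A | B))  # absent from both samples (zero-zero matches)
    return a, b, c, d


# Hypothetical example data
sample_a = ["oak", "maple", "birch", "pine"]
sample_b = ["oak", "maple", "ash"]
pool = ["oak", "maple", "birch", "pine", "ash", "elm", "spruce"]
print(binary_counts(sample_a, sample_b, pool))  # (2, 1, 2, 2)
```

Note that enlarging the pool changes only d, which is one reason d is treated with suspicion.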

V. FOUR COMMONLY EMPLOYED BINARY SIMILARITY COEFFICIENTS:

1) Coefficient of Jaccard

$$S_J = \frac{a}{a + b + c}$$

2) Coefficient of Sørensen

$$S_S = \frac{2a}{2a + b + c}$$

*Notes and Remarks: This coefficient weights matches in species composition (a) more heavily than mismatches (b and c); whether this weighting is appropriate depends on the data. It is better to use this coefficient than Jaccard's when many species known to be present in a community are absent from a sample taken from that community.

3) Simple Matching Coefficient

$$S_{SM} = \frac{a + d}{a + b + c + d}$$

*Notes and Remarks:

This is the simplest binary coefficient that uses both positive matches (a) and negative matches (d).

4) Baroni-Urbani and Buser Coefficient

$$S_B = \frac{\sqrt{ad} + a}{a + b + c + \sqrt{ad}}$$

*Notes and Remarks:

A more complex binary coefficient than the Simple Matching Coefficient that makes use of negative matches.
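The four coefficients can be sketched in Python directly from the cell counts a, b, c and d. This is an illustrative sketch with hypothetical counts, not a prescribed implementation; the functions simply raise `ZeroDivisionError` if both samples are empty.

```python
import math


def jaccard(a, b, c):
    """Jaccard coefficient S_J = a / (a + b + c)."""
    return a / (a + b + c)


def sorensen(a, b, c):
    """Sorensen coefficient S_S = 2a / (2a + b + c)."""
    return 2 * a / (2 * a + b + c)


def simple_matching(a, b, c, d):
    """Simple matching coefficient S_SM = (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)


def baroni_urbani_buser(a, b, c, d):
    """Baroni-Urbani and Buser coefficient S_B."""
    root = math.sqrt(a * d)
    return (root + a) / (a + b + c + root)


# Hypothetical counts: 3 shared species, 2 only in B, 1 only in A, 4 absent from both
a, b, c, d = 3, 2, 1, 4
for name, value in [
    ("Jaccard", jaccard(a, b, c)),
    ("Sorensen", sorensen(a, b, c)),
    ("Simple matching", simple_matching(a, b, c, d)),
    ("Baroni-Urbani & Buser", baroni_urbani_buser(a, b, c, d)),
]:
    print(f"{name}: {value:.3f}")
```

With these counts the printed values are 0.500, 0.667, 0.700 and 0.683: the two coefficients that use d come out higher because the four joint absences count as matches.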

ADDITIONAL CAVEATS FOR BINARY SIMILARITY COEFFICIENTS

Binary coefficients are the crudest measures of community similarity because they do not account for commonness and rarity. They should therefore be used only when data quality is poor (presence/absence data only) or when it is reasonable to weight all species equally. Because these coefficients are sensitive to sample size, samples from all communities should be of nearly equal size whenever possible.

VI. MEASURES OF DISSIMILARITY: DISTANCE COEFFICIENTS

1) "Distance coefficients" require some measure of abundance for each species in the community.

2) "Distance coefficients" provide a measure of community dissimilarity: when the coefficient is zero, the communities are identical. For coefficients bounded between 0 and 1 (such as Bray-Curtis and Canberra), a similarity estimate can be obtained as the complement of the distance (1 - distance).

VII. FIVE COMMONLY USED DISTANCE (DISSIMILARITY) COEFFICIENTS:

1) Euclidean Distance:

$$\Delta_{jk} = \sqrt{\sum_{i=1}^{n} (X_{ij} - X_{ik})^2} \qquad \text{(D1)}$$

where Δjk is the Euclidean distance between samples j and k

Xij is the number of individuals (or biomass) of species i in sample j

Xik is the number of individuals (or biomass) of species i in sample k

n is the total number of species

Notes and Remarks:

*increases with the number of species in the samples, so the average distance is usually calculated instead

*varies from 0 to infinity; the larger the distance, the less similar the two communities

2) Average Euclidean Distance:

$$d_{jk} = \sqrt{\frac{\Delta_{jk}^2}{n}} \qquad \text{(D2)}$$

where djk = average Euclidean distance between samples j and k

Δjk = Euclidean distance calculated in (D1)

n = number of species

Notes and Remarks:

*varies from 0 to infinity; the larger the distance, the less similar the two communities
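A minimal Python sketch of (D1) and (D2), assuming each sample is a list of abundances for the same species in the same order (the abundance values are hypothetical).

```python
import math


def euclidean(xj, xk):
    """Euclidean distance (D1) between two samples.

    xj and xk list the abundance of the same species, in the same order.
    """
    if len(xj) != len(xk):
        raise ValueError("samples must list the same species in the same order")
    return math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(xj, xk)))


def average_euclidean(xj, xk):
    """Average Euclidean distance (D2): sqrt(Delta_jk**2 / n), n = number of species."""
    n = len(xj)
    return math.sqrt(euclidean(xj, xk) ** 2 / n)


# Hypothetical abundances of four species in samples j and k
xj = [10, 5, 2, 0]
xk = [7, 1, 2, 0]
print(euclidean(xj, xk))          # 5.0
print(average_euclidean(xj, xk))  # 2.5
```

The fourth species is absent from both samples: dropping it leaves Δjk at 5.0 but changes n to 3, raising djk to about 2.89, so it is worth stating which species were counted in n.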

3) Manhattan, or city-block metric (D3)

$$d_M(j,k) = \sum_{i=1}^{n} \lvert X_{ij} - X_{ik} \rvert$$

where dM(j,k) = Manhattan distance between samples j and k

Xij and Xik = number of individuals of species i in samples j and k

n = number of species
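The Manhattan metric simply sums absolute abundance differences; a minimal Python sketch, assuming both samples list the same species in the same order (hypothetical data):

```python
def manhattan(xj, xk):
    """Manhattan (city-block) distance (D3): sum of absolute abundance differences."""
    if len(xj) != len(xk):
        raise ValueError("samples must list the same species in the same order")
    return sum(abs(x1 - x2) for x1, x2 in zip(xj, xk))


# Hypothetical abundances of four species in samples j and k
print(manhattan([10, 5, 2, 0], [7, 1, 2, 0]))  # 7
```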

 

4) Bray-Curtis Measure of Dissimilarity (D4)

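$$B = \frac{\sum_{i=1}^{n} \lvert X_{ij} - X_{ik} \rvert}{\sum_{i=1}^{n} (X_{ij} + X_{ik})}$$

where B = Bray-Curtis measure of dissimilarity

Xij and Xik = number of individuals of species i in samples j and k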

n = number of species in samples

 

Notes and Remarks:

* A modified form of the Manhattan metric, rescaled so that it ranges from 0 (identical) to 1 (completely dissimilar)

* Some people prefer to use the complement of the dissimilarity (1.0 - B) as a similarity measure

* Ignores species absent from both communities, and is dominated by the abundant species; rare species add very little to the value of the coefficient

* Strongly affected by sample size: performs poorly in diverse communities sampled with large sample sizes; best used when species diversity is low and sample sizes are small.
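A minimal Python sketch of the Bray-Curtis measure and its complement, with hypothetical abundances. A species absent from both samples adds 0 to both the numerator and the denominator, which is why the measure ignores joint absences.

```python
def bray_curtis(xj, xk):
    """Bray-Curtis dissimilarity (D4) between two abundance lists."""
    if len(xj) != len(xk):
        raise ValueError("samples must list the same species in the same order")
    numerator = sum(abs(x1 - x2) for x1, x2 in zip(xj, xk))
    denominator = sum(x1 + x2 for x1, x2 in zip(xj, xk))
    return numerator / denominator


xj = [10, 5, 2, 0]  # hypothetical abundances, sample j
xk = [7, 1, 2, 0]   # hypothetical abundances, sample k
B = bray_curtis(xj, xk)
print(round(B, 3), round(1 - B, 3))  # 0.259 0.741
```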

 

5) Canberra-Metric

$$C = \frac{1}{n} \sum_{i=1}^{n} \frac{\lvert X_{ij} - X_{ik} \rvert}{X_{ij} + X_{ik}}$$

where C = Canberra metric coefficient of dissimilarity between samples j and k

Xij and Xik = number of individuals of species i in samples j and k

n = number of species in the samples

* Ranges from 0 to 1 and can be converted to a similarity measure using the complement of the dissimilarity (1.0 - C)

* Differs from the Bray-Curtis measure because it is not so strongly affected by the most abundant species in the community

* However, this index suffers from two major problems:

1) It is undefined when a species is absent from both community samples (that species' term is 0/0), so jointly missing species contribute nothing to the measure and must be ignored; n then counts only the species present in at least one sample.

2) If a species is absent from one sample but present in the other, its term reaches the maximum value of 1, whatever its abundance in the other sample. Suggestion: replace all zero values by a small number such as 0.1.

* Strongly affected by sample size: performs poorly in diverse communities sampled with large sample sizes; best used when species diversity is low and sample sizes are small.
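A minimal Python sketch of the Canberra metric that handles both problems listed above: species absent from both samples are dropped before n is counted, and an optional `zero_value` argument (a name chosen here for illustration) replaces the remaining zeros, following the 0.1 suggestion. The data are hypothetical.

```python
def canberra(xj, xk, zero_value=None):
    """Canberra metric C: mean over species of |xj - xk| / (xj + xk).

    Species absent from both samples (0/0 terms) are dropped, so n counts
    only species present in at least one sample. If `zero_value` is given
    (the text suggests 0.1), remaining zeros are replaced by it first.
    """
    if len(xj) != len(xk):
        raise ValueError("samples must list the same species in the same order")
    pairs = [(x1, x2) for x1, x2 in zip(xj, xk) if not (x1 == 0 and x2 == 0)]
    if not pairs:
        raise ValueError("no species present in either sample")
    if zero_value is not None:
        pairs = [(zero_value if x1 == 0 else x1, zero_value if x2 == 0 else x2)
                 for x1, x2 in pairs]
    n = len(pairs)
    return sum(abs(x1 - x2) / (x1 + x2) for x1, x2 in pairs) / n


# Hypothetical example: the second species is absent from sample j only
print(canberra([4, 0], [4, 6]))                            # 0.5
print(round(canberra([4, 0], [4, 6], zero_value=0.1), 3))  # 0.484
```

Without the replacement, the second species contributes the maximum term of 1 regardless of whether sample k held 6 individuals or 600.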

 

VIII. OTHER SIMILARITY MEASURES: "PERCENTAGE SIMILARITY"

$$P = \sum_{i=1}^{n} \min(P_{1,i}, P_{2,i}) \qquad \text{(P1)}$$

where P is the percentage similarity between samples 1 and 2

P1,i is the percentage of species i in community sample 1

P2,i is the percentage of species i in community sample 2

n is the number of species

*Percentage similarity is sometimes referred to as the Renkonen Index. To use this measure, each community sample must first be standardized as percentages, so that the relative abundances within each sample sum to 100%.

*In spite of its simplicity, this is one of the best quantitative similarity coefficients available: it remains relatively unaffected by sample size and species diversity, and it is not affected by proportional differences in abundance between samples.

*Index ranges from 0% (no similarity) to 100% (complete similarity)
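A minimal Python sketch of (P1): each sample is converted to percentages, then the smaller percentage for each species is summed. The counts are hypothetical; the third sample is the second with every count doubled, illustrating why proportional differences in abundance do not affect the index.

```python
def percentage_similarity(counts1, counts2):
    """Percentage similarity P (P1): sum over species of min(P1i, P2i).

    Each sample is first converted to percentages summing to 100.
    """
    if len(counts1) != len(counts2):
        raise ValueError("samples must list the same species in the same order")
    total1, total2 = sum(counts1), sum(counts2)
    p1 = [100 * x / total1 for x in counts1]
    p2 = [100 * x / total2 for x in counts2]
    return sum(min(a, b) for a, b in zip(p1, p2))


# Hypothetical counts of three species
sample1 = [50, 30, 20]
sample2 = [10, 60, 30]
sample2b = [20, 120, 60]  # sample2 with every count doubled
print(percentage_similarity(sample1, sample2))   # 60.0
print(percentage_similarity(sample1, sample2b))  # 60.0
```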

 

IXA) Species Diversity Measures: Heterogeneity

Before using Heterogeneity Measures, determine whether or not you are more interested in emphasizing the dominant or rare species in your community of interest. Type I Indices are most sensitive to changes in the rare species in the community sample and include the Shannon-Wiener and the Brillouin Indices. Type II Indices are more sensitive to changes in the more abundant species and include Simpson’s Indices.

 

Type I Indices:

1) Shannon-Wiener Function:

$$H' = -\sum_{i=1}^{s} p_i \log p_i \qquad \text{(HIa)}$$

where H' = information content of the sample = index of species diversity

s = number of species

pi = proportion of the total sample belonging to the ith species

2) Modified Shannon-Wiener Function: The best heterogeneity measure that is sensitive to the abundances of rare species in the community

$$N_1 = e^{H'} \qquad \text{EQ(HIb)}$$

where e = 2.71828 (base of natural logs)

H' = Shannon-Wiener function from (HIa), calculated using base-e logs

N1 = number of equally common species that would produce the same diversity as H'
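A minimal Python sketch of (HIa) and EQ(HIb), working from species counts (hypothetical data). Natural logs are the default because N1 requires them; species with zero counts are skipped, following the usual convention that 0 · log 0 = 0.

```python
import math


def shannon(counts, base=math.e):
    """Shannon-Wiener H' (HIa) from species counts; natural logs by default."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:  # species with zero counts contribute nothing (0 * log 0 -> 0)
            p = n / total
            h -= p * math.log(p, base)
    return h


def hill_n1(counts):
    """N1 = e**H' (HIb); H' must be computed with natural logs."""
    return math.exp(shannon(counts))


counts = [25, 25, 25, 25]  # hypothetical: four equally common species
print(round(shannon(counts), 3))  # 1.386 (= ln 4)
print(round(hill_n1(counts), 3))  # 4.0
```

As the definition of N1 implies, four equally common species give N1 = 4.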

Type II Indices:

1) Simpson's Non-Parametric Diversity Measure (Simpson, 1949): Diversity is inversely related to the probability that two individuals picked at random belong to the same species:

$$\lambda = \sum_{i=1}^{s} p_i^2 \qquad \text{EQ(HIIa)}$$

where λ = "Simpson's Index"

pi = proportion of species i in the community

s = number of species

To use this as a measure of diversity, take the complement of Simpson’s original Index (HIIa):

2) "Simpson's Index of Diversity" = probability of picking two organisms at random that are different species

= 1 - (probability of picking two organisms that are the same species)

$$1 - \lambda = 1 - \sum_{i=1}^{s} p_i^2 \qquad \text{(HIIb)}$$

where λ = "Simpson's Index"

pi = proportion of species i in the community

*Index ranges from 0 (low diversity) to a maximum of 1 - 1/s, which approaches 1 as the number of species s increases

3) "Simpson's Reciprocal Index" (Williams, 1964; MacArthur, 1972): interpreted as "the number of equally common species required to generate the observed heterogeneity of the given sample":

$$\frac{1}{\lambda} = \frac{1}{\sum_{i=1}^{s} p_i^2} \qquad \text{EQ(HIIc)}$$

where 1/λ = "Simpson's Reciprocal Index"

pi = proportion of species i in the community

*Index ranges from 1 (low diversity) to s, the number of species in the sample.
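A minimal Python sketch of the three Simpson quantities, computed from species counts (hypothetical data). It uses the large-population form based on proportions, matching EQ(HIIa).

```python
def simpson_lambda(counts):
    """Simpson's index lambda = sum of p_i**2 (HIIa)."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)


def simpson_diversity(counts):
    """Simpson's index of diversity 1 - lambda (HIIb)."""
    return 1 - simpson_lambda(counts)


def simpson_reciprocal(counts):
    """Simpson's reciprocal index 1 / lambda (HIIc)."""
    return 1 / simpson_lambda(counts)


counts = [50, 30, 20]  # hypothetical counts of three species
print(round(simpson_lambda(counts), 3))      # 0.38
print(round(simpson_diversity(counts), 3))   # 0.62
print(round(simpson_reciprocal(counts), 3))  # 2.632
```

The reciprocal index of about 2.63 sits between 1 and s = 3, as the stated range requires.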

IXB) Species Diversity Measures: Species Evenness Measures

1) Simpson's Measure of Evenness: For Simpson's measure of heterogeneity, maximum diversity is obtained when all abundances are equal (i.e., pi = 1/s), so in a very large population:

$$\lambda_{min} = \sum_{i=1}^{s} \left(\frac{1}{s}\right)^2 = \frac{1}{s} \qquad \text{EQ(E1a)}$$

where λmin = minimum possible value of Simpson's Index EQ(HIIa), which corresponds to maximum diversity

s = number of species in the sample

It follows that the maximum possible value of the reciprocal of Simpson's Index (1/λ) is always equal to the number of species observed in the sample. This leads to a simple definition of:

Simpson's Index of Evenness:

$$E_{1/\lambda} = \frac{1/\lambda}{s} \qquad \text{EQ(E1b)}$$

where E1/λ is "Simpson's Measure of Evenness"

λ is "Simpson's Index"

s is the number of species in the sample

* Index ranges from near 0 (it approaches 1/s as a single species becomes dominant) to 1 (all species equally abundant), and is relatively unaffected by the rare species in the sample
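A minimal Python sketch of EQ(E1b). It assumes that s counts only species actually observed (nonzero counts); that choice is ours, not stated in the original. The data are hypothetical.

```python
def simpson_evenness(counts):
    """Simpson's evenness E = (1 / lambda) / s (E1b).

    ASSUMPTION: s counts only species with nonzero counts.
    """
    present = [n for n in counts if n > 0]
    s = len(present)
    total = sum(present)
    lam = sum((n / total) ** 2 for n in present)
    return (1 / lam) / s


print(simpson_evenness([25, 25, 25, 25]))        # 1.0
print(round(simpson_evenness([50, 30, 20]), 3))  # 0.877
```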

2) Camargo’s Index of Evenness: Camargo (1993) proposed a new index of species evenness that is unaffected by species richness and is easy to compute:

$$E' = 1 - \sum_{i=1}^{s} \sum_{j=i+1}^{s} \frac{\lvert p_i - p_j \rvert}{s} \qquad \text{EQ(E2)}$$

where E' = Camargo's index of evenness

pi = proportion of species i in total sample

pj = proportion of species j in total sample

s = number of species in total sample

* Camargo’s Index, like Simpson’s, is relatively unaffected by rare species in the sample
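A minimal Python sketch of EQ(E2), summing |pi - pj| over every pair of species once (j > i). As in the Simpson sketch, it assumes s counts only species with nonzero counts; the data are hypothetical.

```python
def camargo_evenness(counts):
    """Camargo's evenness E' (E2): 1 - sum over pairs i < j of |p_i - p_j| / s.

    ASSUMPTION: s counts only species with nonzero counts.
    """
    present = [n for n in counts if n > 0]
    s = len(present)
    total = sum(present)
    p = [n / total for n in present]
    pair_sum = sum(abs(p[i] - p[j]) for i in range(s) for j in range(i + 1, s))
    return 1 - pair_sum / s


print(camargo_evenness([25, 25, 25, 25]))        # 1.0
print(round(camargo_evenness([50, 30, 20]), 3))  # 0.8
```

For the second sample the pairwise differences are 0.2, 0.3 and 0.1, which sum to 0.6; dividing by s = 3 and subtracting from 1 gives 0.8.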

3) Smith and Wilson’s Index of Evenness:

Smith and Wilson (1996) proposed an index of evenness based on the variance in abundance among species. The variance is measured over the logarithms of the abundances so that proportional, rather than absolute, differences in abundance are used. The index is:

$$E_{var} = 1 - \frac{2}{\pi} \arctan\left\{ \frac{1}{s} \sum_{i=1}^{s} \left( \ln N_i - \frac{1}{s} \sum_{j=1}^{s} \ln N_j \right)^2 \right\} \qquad \text{EQ(E3)}$$

where Evar is Smith and Wilson's Index of Evenness

Ni = number of individuals of species i in the sample (i = 1, 2, 3, ..., s)

Nj = number of individuals of species j in the sample (j = 1, 2, 3, ..., s)

s = number of species in the sample

and the arctangent is measured as an angle in radians
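A minimal Python sketch of EQ(E3). It assumes species with zero individuals are dropped, since ln 0 is undefined (our assumption, not stated in the original). Python's `math.atan` already returns radians, as the definition requires. The abundances are hypothetical.

```python
import math


def smith_wilson_evar(counts):
    """Smith and Wilson's E_var (E3) from species abundances.

    ASSUMPTION: species with zero individuals are dropped, since ln(0) is undefined.
    """
    logs = [math.log(n) for n in counts if n > 0]
    s = len(logs)
    mean_log = sum(logs) / s
    variance = sum((x - mean_log) ** 2 for x in logs) / s  # variance of ln N over species
    return 1 - (2 / math.pi) * math.atan(variance)       # math.atan returns radians


print(smith_wilson_evar([25, 25, 25, 25]))  # 1.0
print(round(smith_wilson_evar([100, 10, 1]), 3))
print(round(smith_wilson_evar([200, 20, 2]), 3))  # same value as the line above
```

Doubling every abundance shifts each ln Ni by the same constant (ln 2), leaving the variance unchanged. This is the proportional-difference property described above.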