TUESDAY, 4/16, LAB #1: "An Introduction to Methods for Calculating Measures of Community Similarity"
I. INTRODUCTION
Ecologists who sample communities for the purposes of community classification often record the following types of information:
1) a list of observed species
2) data on species presence/absence or, ideally, a numerical estimate of species abundances.
These data allow communities to be compared. The purpose of such comparisons is to determine whether the communities are distinct enough in their floral and faunal compositions to warrant separate community classifications, or whether they should be classified together.
Such determinations can have important practical implications for species conservation. For example, it may be necessary to determine the degree of similarity between two adjacent communities that border a proposed conservation area or an area proposed for development. An accurate estimate of community similarity can play a key role in ensuring that such communities are adequately conserved and protected.
In today's lab, we will calculate and compare a variety of "Measures of Community Similarity".
II. TWO BROAD CLASSES OF MEASURES OF SIMILARITY:
1) Binary Similarity: Used when presence/absence data are the only data available;
2) Quantitative Similarity: Requires that some measure of relative abundance is available for each species. Relative abundance may be measured by number of individuals, biomass, cover, productivity, or any measure that quantifies the importance of the species in the community.
III. TWO DESIRABLE ATTRIBUTES OF ALL MEASURES OF SIMILARITY
1) Measure should be independent of sample size and the number of species in the community;
2) Measure should increase smoothly from a fixed minimum to fixed maximum as the communities become more similar.
IV. TYPICAL DATA FORMAT FOR BINARY COEFFICIENTS
2X2 BINARY (ASSOCIATION) COEFFICIENTS
|                    | SAMPLE A # PRESENT | SAMPLE A # ABSENT |
| SAMPLE B # PRESENT |         a          |         b         |
| SAMPLE B # ABSENT  |         c          |         d         |
Where a = Number of species present in both sample A and sample B ("joint occurrences")
b = Number of species in sample B but not in sample A
c = Number of species in sample A but not in sample B
d = Number of species absent in both samples ("zero-zero matches")*
*May be biologically meaningless and is often excluded from binary similarity measures
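The four cell counts can be tallied directly from two species lists. The following is a minimal sketch, assuming each sample is given as a set of species names and that a reference species pool is available for counting joint absences (d). The function name, the pool, and the species names are hypothetical and used only for illustration.

```python
def binary_counts(sample_a, sample_b, species_pool):
    """Return the 2x2 cell counts (a, b, c, d) for two presence/absence samples."""
    sample_a, sample_b = set(sample_a), set(sample_b)
    species_pool = set(species_pool)
    a = len(sample_a & sample_b)                    # present in both samples
    b = len(sample_b - sample_a)                    # in sample B but not in sample A
    c = len(sample_a - sample_b)                    # in sample A but not in sample B
    d = len(species_pool - (sample_a | sample_b))   # absent from both samples
    return a, b, c, d


# Hypothetical species pool and samples
pool = {"oak", "maple", "birch", "pine", "fir", "ash"}
sample_A = {"oak", "maple", "birch"}
sample_B = {"oak", "birch", "pine", "fir"}

print(binary_counts(sample_A, sample_B, pool))  # (2, 2, 1, 1)
```

Note that d depends entirely on how the species pool is defined, which is one reason zero-zero matches are often excluded.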
V. FOUR COMMONLY EMPLOYED "SIMILARITY COEFFICIENTS":
1) Coefficient of Jaccard
$S_j = \frac{a}{a + b + c}$
2) Coefficient of Sorensen
$S_s = \frac{2a}{2a + b + c}$
*Notes and Remarks: This coefficient weights matches in species composition more heavily than mismatches. Whether that weighting is appropriate depends on the data: it is better to use this coefficient than Jaccard's if many species are known to be present in a community but are absent from a sample of that community.
3) Simple Matching Coefficient
$S_{SM} = \frac{a + d}{a + b + c + d}$
*Notes and Remarks:
This represents the simplest binary coefficient that uses both positive and negative matches.
4) Baroni-Urbani and Buser Coefficient
$S_B = \frac{\sqrt{ad} + a}{a + b + c + \sqrt{ad}}$
*Notes and Remarks:
A more complex binary coefficient than the Simple Matching Coefficient that makes use of negative matches.
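The four binary coefficients can be compared side by side from the same 2x2 counts. This is a minimal sketch, assuming the standard forms Sj = a/(a+b+c), Ss = 2a/(2a+b+c), SSM = (a+d)/(a+b+c+d), and SB = (sqrt(ad)+a)/(a+b+c+sqrt(ad)). The function names and the example counts are hypothetical.

```python
import math


def jaccard(a, b, c):
    """Joint occurrences over all species seen in at least one sample."""
    return a / (a + b + c)


def sorensen(a, b, c):
    """Like Jaccard, but joint occurrences are counted twice."""
    return 2 * a / (2 * a + b + c)


def simple_matching(a, b, c, d):
    """Positive and negative matches over all species considered."""
    return (a + d) / (a + b + c + d)


def baroni_urbani_buser(a, b, c, d):
    """Uses negative matches through the geometric-mean term sqrt(a*d)."""
    root = math.sqrt(a * d)
    return (root + a) / (a + b + c + root)


# Hypothetical counts from a 2x2 table
a, b, c, d = 8, 2, 2, 2
print(f"Jaccard:             {jaccard(a, b, c):.3f}")
print(f"Sorensen:            {sorensen(a, b, c):.3f}")
print(f"Simple matching:     {simple_matching(a, b, c, d):.3f}")
print(f"Baroni-Urbani/Buser: {baroni_urbani_buser(a, b, c, d):.3f}")
```

For the same table, Sorensen is never smaller than Jaccard, because joint occurrences receive double weight.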
ADDITIONAL CAVEATS FOR BINARY SIMILARITY COEFFICIENTS
Binary coefficients are the crudest measures of community similarity because they do not account for commonness and rarity. They should therefore be used only when data quality is poor (presence/absence data) or when it is reasonable to weight all species equally. The coefficients are sensitive to sample size, so use nearly equal sample sizes in all communities whenever possible.
VI. MEASURES OF DISSIMILARITY: DISTANCE COEFFICIENTS
1) "Distance coefficients" require some measure of abundance for each species in the community.
2) "Distance coefficients" provide a measure of community dissimilarity. When the coefficient is zero, communities are identical. Similarity estimates can be obtained from the complement of distance coefficients.
VII. FIVE COMMONLY USED DISTANCE (DISSIMILARITY) COEFFICIENTS:
1) Euclidean Distance:
$\Delta_{jk} = \sqrt{\sum_{i=1}^{n} (X_{ij} - X_{ik})^2}$   (D1)
where Δjk is the Euclidean distance between samples j and k
Xij is the number of individuals (or biomass) of species i in sample j
Xik is the number of individuals (or biomass) of species i in sample k
n is the total number of species
Notes and Remarks:
*increases with the number of species in the samples, so the average distance (D2) is usually calculated instead
*varies from 0 to infinity; the larger the distance, the less similar the two communities
2) Average Euclidean Distance:
$d_{jk} = \sqrt{\frac{\Delta_{jk}^{2}}{n}}$   (D2)
where djk = Average Euclidean distance between samples j and k
Δjk = Euclidean distance calculated in (D1)
n = number of species in the samples
Notes and Remarks:
*varies from 0 to infinity; the larger the distance, the less similar the two communities
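Both Euclidean measures can be worked through on a small example. This is a minimal sketch, assuming each sample is a list of abundances in the same species order and that n in (D2) is the number of species. The function names and abundance values are hypothetical.

```python
import math


def euclidean_distance(x_j, x_k):
    """Square root of the summed squared abundance differences, as in (D1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_j, x_k)))


def average_euclidean_distance(x_j, x_k):
    """Euclidean distance rescaled by the number of species n, as in (D2)."""
    n = len(x_j)  # assumption: n counts species, one entry per species
    return math.sqrt(euclidean_distance(x_j, x_k) ** 2 / n)


# Hypothetical abundances of three species in two samples
sample_j = [3, 0, 4]
sample_k = [0, 4, 4]

print(euclidean_distance(sample_j, sample_k))          # 5.0
print(average_euclidean_distance(sample_j, sample_k))
```

Dividing the squared distance by n keeps the measure from growing simply because more species were recorded.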
3) Manhattan, or city-block metric (D3)
$d_M(j,k) = \sum_{i=1}^{n} |X_{ij} - X_{ik}|$
where dM (j,k) = Manhattan distance between samples j and k
Xij and Xik = Number of individuals of species i in each sample (j and k)
n = number of species in the samples
4) Bray-Curtis Measure of Dissimilarity (D4)
$B = \frac{\sum_{i=1}^{n} |X_{ij} - X_{ik}|}{\sum_{i=1}^{n} (X_{ij} + X_{ik})}$
where B = Bray-Curtis Measure of Dissimilarity
Xij and Xik = Number of individuals of species i in each sample (j and k)
n = number of species in samples
Notes and Remarks:
* Modified form of the Manhattan metric, rescaled so that it ranges from 0 (identical) to 1 (completely dissimilar)
* Some people use the complement of the dissimilarity (1.0 - B) as a preferred similarity measure
* Ignores cases where a species is absent in both communities and is dominated by the abundant species; rare species add very little to the value of the coefficient
* Strongly affected by sample size: performs poorly in diverse communities with large sample sizes; best used when species diversity is low and sample sizes are small.
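The relationship between the Manhattan metric and the Bray-Curtis measure is easiest to see in code. This is a minimal sketch, assuming the Manhattan distance is the sum of absolute abundance differences and that Bray-Curtis divides that sum by the total abundance in both samples. Function names and abundances are hypothetical.

```python
def manhattan_distance(x_j, x_k):
    """Sum of absolute abundance differences across species."""
    return sum(abs(a - b) for a, b in zip(x_j, x_k))


def bray_curtis(x_j, x_k):
    """Manhattan distance scaled by total abundance, giving a value in [0, 1]."""
    total = sum(a + b for a, b in zip(x_j, x_k))
    return manhattan_distance(x_j, x_k) / total


# Hypothetical abundances of three species in two samples
sample_j = [3, 0, 4]
sample_k = [0, 4, 4]

print(manhattan_distance(sample_j, sample_k))   # 7
print(bray_curtis(sample_j, sample_k))          # dissimilarity B
print(1.0 - bray_curtis(sample_j, sample_k))    # complement used as similarity
```

A species absent from both samples adds zero to both the numerator and the denominator, which is why joint absences are ignored.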
5) Canberra Metric
$C = \frac{1}{n} \sum_{i=1}^{n} \frac{|X_{ij} - X_{ik}|}{X_{ij} + X_{ik}}$
Where C = Canberra metric coefficient of dissimilarity between samples j and k
Xij and Xik = Number of individuals of species i in each sample (j and k)
n = number of species in samples
* Ranges from 0 to 1 and can be converted to a "similarity measure" using the complement of dissimilarity (1.0 - C)
* Differs from the Bray-Curtis measure in that it is not so strongly affected by the more abundant species in the community
* However, this index suffers from two major problems:
1) It is undefined when a species is absent from both community samples (the term becomes 0/0), so such species contribute nothing to the measure and must be ignored.
2) If a species is absent from one sample but present in the other, the term for that species reaches its maximum value (1.0) regardless of abundance. Suggestion: replace all zero values by a small number such as 0.1.
* Strongly affected by sample size: performs poorly in diverse communities with large sample sizes; best used when species diversity is low and sample sizes are small.
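The two problems with the Canberra metric show up clearly in a small implementation. This is a minimal sketch, assuming that joint absences are skipped, that n counts only the species present in at least one sample, and that the optional zero replacement is applied after joint absences are dropped. Those choices, the function name, and the data are assumptions made for illustration.

```python
def canberra(x_j, x_k, zero_replacement=None):
    """Mean of |Xij - Xik| / (Xij + Xik) over species present in either sample."""
    terms = []
    for a, b in zip(x_j, x_k):
        if a == 0 and b == 0:
            continue  # joint absence gives 0/0, so the species is ignored
        if zero_replacement is not None:
            # Optional fix: swap a remaining zero for a small value such as 0.1
            a = a if a != 0 else zero_replacement
            b = b if b != 0 else zero_replacement
        terms.append(abs(a - b) / (a + b))
    if not terms:
        raise ValueError("no species present in either sample")
    return sum(terms) / len(terms)


# Hypothetical abundances; the fourth species is absent from both samples
sample_j = [3, 0, 4, 0]
sample_k = [0, 4, 4, 0]

print(canberra(sample_j, sample_k))                        # one-sided absences score 1.0
print(canberra(sample_j, sample_k, zero_replacement=0.1))  # slightly lower
```

Without the replacement, a species with 3 individuals in one sample and none in the other contributes exactly as much as one with 300 and none.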
VIII. OTHER SIMILARITY MEASURES: "PERCENTAGE SIMILARITY"
$P = \sum_{i} \min(P_{1,i}, P_{2,i})$   (P1)
where
P1,i is the percentage of species i in community sample 1
P2,i is the percentage of species i in community sample 2
*Percentage similarity is sometimes referred to as the Renkonen Index. To use this measure of similarity, each community sample must be standardized to percentages so that the relative abundances within each sample sum to 100%.
*In spite of its simplicity, this is one of the best quantitative similarity coefficients available: it remains relatively unaffected by sample size and species diversity, and it is not affected by proportional differences in abundance between samples.
*Index ranges from 0 (no similarity) to 100% (complete similarity)
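The standardization step and the sum of minimum percentages can be sketched as follows. This is a minimal example, assuming raw counts are supplied in the same species order for both samples. The function names and counts are hypothetical.

```python
def to_percentages(counts):
    """Standardize raw abundances so that they sum to 100%."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def percentage_similarity(counts_1, counts_2):
    """Renkonen index: sum over species of the smaller of the two percentages."""
    p1 = to_percentages(counts_1)
    p2 = to_percentages(counts_2)
    return sum(min(a, b) for a, b in zip(p1, p2))


# Hypothetical counts of three species in two samples
sample_1 = [50, 30, 20]
sample_2 = [10, 10, 20]

print(percentage_similarity(sample_1, sample_2))

# Samples that differ only by a constant factor are treated as identical
print(percentage_similarity([5, 3, 2], [50, 30, 20]))
```

The second call illustrates why the index is unaffected by proportional differences in abundance: only relative abundances enter the calculation.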
IXA) Species Diversity Measures: Heterogeneity
Before using heterogeneity measures, decide whether you are more interested in emphasizing the dominant or the rare species in your community of interest. Type I indices are most sensitive to changes in the rare species in the community sample and include the Shannon-Wiener and Brillouin indices. Type II indices are more sensitive to changes in the more abundant species and include Simpson's indices.
Type I Indices:
1) Shannon-Wiener Function:
$H = -\sum_{i=1}^{s} p_i \log p_i$   EQ(HIa)
where H = Information content of sample = Index of species diversity
s = Number of species
pi = Proportion of total sample belonging to ith species
2) Modified Shannon-Wiener Function: The best heterogeneity measure that is sensitive to the abundances of rare species in the community
$N_1 = e^{H}$   EQ(HIb)
Where e = 2.71828 (base of natural logs)
H = Shannon-Wiener function (calculated using base e logs) in EQ(HIa)
N1 = Number of equally common species that would produce the same diversity as H
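A short calculation shows how H and N1 relate. This is a minimal sketch, assuming H = -sum(pi * ln pi) with natural logs and N1 = exp(H), and assuming species with zero counts are dropped because they contribute nothing to the sum. The function names and counts are hypothetical.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener H using natural logs; zero counts are skipped."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def equally_common_species(counts):
    """N1 = e**H: number of equally common species giving the same H."""
    return math.exp(shannon_wiener(counts))


even_sample = [25, 25, 25, 25]    # hypothetical: four equally common species
uneven_sample = [70, 10, 10, 10]  # hypothetical: one dominant species

print(equally_common_species(even_sample))    # close to 4, the species count
print(equally_common_species(uneven_sample))  # fewer than 4 "effective" species
```

When all species are equally common, N1 recovers the actual number of species; dominance by one species pulls it below that number.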
Type II Indices:
1) Simpson's Non-Parametric Diversity Measure (Simpson, 1949): Diversity is inversely related to the probability that two individuals picked at random belong to the same species:
$D = \sum_{i=1}^{s} p_i^{2}$   EQ(HIIa)
Where
D = "Simpson's Index"
pi = Proportion of species i in the community
To use this as a measure of diversity, take the complement of Simpson's original Index (HIIa):
2) "Simpson's Index of Diversity" = (Probability of picking two organisms at random that are different species)
= 1 - (Probability of picking two organisms that are the same species)
$1 - D = 1 - \sum_{i=1}^{s} p_i^{2}$   EQ(HIIb)
Where
D = "Simpson's Index"
pi = Proportion of species i in the community
*Index ranges from 0 (low diversity) to almost 1 (the maximum is 1 - 1/s)
3) "Simpson's Reciprocal Index" (Williams, 1964; MacArthur, 1972): Index interpreted as "the number of equally common species required to generate the observed heterogeneity of the given sample":
$\frac{1}{D} = \frac{1}{\sum_{i=1}^{s} p_i^{2}}$   EQ(HIIc)
Where 1/D = "Simpson's Reciprocal Index"
pi = Proportion of species i in the community
*Index ranges from 1 (low diversity) to s, the number of species in the sample.
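The three Simpson measures are all derived from the same sum of squared proportions. This is a minimal sketch, assuming the large-population form in which Simpson's Index is the sum of pi squared. The symbol D, the function names, and the counts are assumptions used for illustration.

```python
def simpson_index(counts):
    """D: probability that two randomly picked individuals share a species."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def simpson_diversity(counts):
    """1 - D: probability that two randomly picked individuals differ."""
    return 1.0 - simpson_index(counts)


def simpson_reciprocal(counts):
    """1 / D: number of equally common species giving the same heterogeneity."""
    return 1.0 / simpson_index(counts)


sample = [50, 30, 20]  # hypothetical counts of three species

print(simpson_index(sample))       # about 0.38
print(simpson_diversity(sample))   # about 0.62
print(simpson_reciprocal(sample))  # between 1 and s = 3
```

Because the proportions are squared, the abundant species dominate the sum, which is why these are Type II indices.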
IXB) Species Diversity Measures: Species Evenness Measures
1) Simpson's Measure of Evenness: For Simpson's Measure of Heterogeneity, the maximum diversity is obtained when all abundances are equal (i.e. p = 1/s), so in a very large population:
$D_{max} = \sum_{i=1}^{s} \left(\frac{1}{s}\right)^{2} = \frac{1}{s}$   EQ(E1a)
where
Dmax = Value of Simpson's Index EQ(HIIa) at maximum diversity (this is the smallest value the index itself can take)
s = Number of species in sample
It follows from this that the maximum possible value of the reciprocal of Simpson's Index (1/D) is always equal to the number of species observed in the sample. This leads to a simple definition of:
Simpson's Index of Evenness:
$E_{1/D} = \frac{1/D}{s}$   EQ(E1b)
where
E1/D is "Simpson's Measure of Evenness"
D is "Simpson's Index"
s is Number of species in the sample
* Index ranges from 0 to 1 and is relatively unaffected by the rare species in the sample
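Simpson's evenness can be checked numerically against the claim that the reciprocal index equals s when all species are equally abundant. This is a minimal sketch, assuming evenness is the reciprocal of Simpson's Index divided by the number of species. The function names and counts are hypothetical.

```python
def simpson_index(counts):
    """Sum of squared proportions (large-population form)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def simpson_evenness(counts):
    """Reciprocal Simpson index divided by the number of species s."""
    s = len(counts)
    return (1.0 / simpson_index(counts)) / s


print(simpson_evenness([10, 10, 10]))  # equal abundances: evenness close to 1
print(simpson_evenness([50, 30, 20]))  # moderately uneven
print(simpson_evenness([98, 1, 1]))    # one dominant species: low evenness
```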
2) Camargo's Index of Evenness: Camargo (1993) proposed a new index of species evenness that is unaffected by species richness and is easy to compute:
$E' = 1.0 - \sum_{i=1}^{s} \sum_{j=i+1}^{s} \frac{|p_i - p_j|}{s}$   EQ(E2)
where
E' = Camargo's index of evenness
pi = Proportion of species i in total sample
pj = Proportion of species j in total sample
s = Number of species in total sample
* Camargo's Index, like Simpson's, is relatively unaffected by rare species in the sample
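Camargo's index sums the absolute differences in proportion over every pair of species. This is a minimal sketch, assuming the form E' = 1 - sum over pairs i < j of |pi - pj| / s. The function name and counts are hypothetical.

```python
from itertools import combinations


def camargo_evenness(counts):
    """1 minus the summed pairwise |pi - pj| divided by the species count s."""
    total = sum(counts)
    s = len(counts)
    proportions = [c / total for c in counts]
    # combinations(..., 2) visits each unordered pair (i < j) exactly once
    pairwise = sum(abs(pi - pj) for pi, pj in combinations(proportions, 2))
    return 1.0 - pairwise / s


print(camargo_evenness([10, 10, 10]))  # equal abundances give 1.0
print(camargo_evenness([50, 30, 20]))  # about 0.8
```

For the second sample the pairwise differences are 0.2, 0.3, and 0.1, which sum to 0.6; dividing by s = 3 and subtracting from 1 gives about 0.8.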
3) Smith and Wilson's Index of Evenness:
Smith and Wilson (1996) proposed a new index of evenness based on the variance in species abundances. The variance is measured over the log of the abundances so that proportional differences, rather than absolute differences, in abundance are used. The new index is:
$E_{var} = 1 - \frac{2}{\pi} \arctan\left\{ \frac{1}{s} \sum_{i=1}^{s} \left( \ln N_i - \frac{1}{s} \sum_{j=1}^{s} \ln N_j \right)^{2} \right\}$   EQ(E3)
where Evar is Smith and Wilson's Index of Evenness;
Ni = Number of individuals in species i in sample (i = 1, 2, 3, ..., s)
Nj = Number of individuals in species j in sample (j = 1, 2, 3, ..., s)
s = Number of species in the sample
And the arctangent is measured as an angle in radians
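The log-variance calculation behind this index can be sketched in a few lines. This is a minimal example, assuming the form Evar = 1 - (2/pi) * arctan(variance of ln Ni), with the variance divided by s rather than s - 1. The function name and counts are hypothetical, and every count must be greater than zero because the log of zero is undefined.

```python
import math


def smith_wilson_evenness(counts):
    """1 - (2/pi) * arctan of the variance of the log abundances."""
    s = len(counts)
    logs = [math.log(n) for n in counts]  # requires every count > 0
    mean_log = sum(logs) / s
    variance = sum((x - mean_log) ** 2 for x in logs) / s
    # math.atan returns radians, as the index requires
    return 1.0 - (2.0 / math.pi) * math.atan(variance)


print(smith_wilson_evenness([10, 10, 10]))  # equal abundances: close to 1
print(smith_wilson_evenness([100, 10, 1]))  # strongly uneven: much lower
```

The arctangent maps a variance anywhere between zero and infinity onto 0 to pi/2, so the 2/pi factor bounds the index between 0 and 1.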