vegdist {vegan} | R Documentation |
The function computes dissimilarity indices that are useful for or
popular with community ecologists. All indices use quantitative data,
although they are named after the corresponding binary index; the
binary index can be calculated by setting the binary argument.
If you do not find your favourite index here, you can see if it can be
implemented using designdist.
Gower, Bray–Curtis, Jaccard and Kulczynski indices are good at
detecting underlying ecological gradients (Faith et al. 1987).
Morisita, Horn–Morisita, Binomial and Chao indices should be able to
handle different sample sizes (Wolda 1981, Krebs 1999, Anderson &
Millar 2004), and Mountford (1962) and Raup–Crick indices for
presence–absence data should be able to handle unknown (and variable)
sample sizes.
vegdist(x, method="bray", binary=FALSE, diag=FALSE, upper=FALSE, na.rm = FALSE, ...)
x |
Community data matrix. |
method |
Dissimilarity index, partial match to "manhattan",
"euclidean", "canberra", "bray", "kulczynski",
"jaccard", "gower", "morisita", "horn",
"mountford", "raup", "binomial" or "chao". |
binary |
Perform presence/absence standardization before analysis
using decostand. |
diag |
Compute diagonals. |
upper |
Return only the upper diagonal. |
na.rm |
Pairwise deletion of missing observations when computing dissimilarities. |
... |
Other parameters. These are ignored, except in
method = "gower", which accepts the range.global parameter of
decostand. |
Jaccard ("jaccard"), Mountford ("mountford"), Raup–Crick ("raup"),
Binomial and Chao indices are discussed below.
The other indices are defined as:
euclidean | d[jk] = sqrt(sum((x[ij]-x[ik])^2)) |
manhattan | d[jk] = sum(abs(x[ij]-x[ik])) |
gower | d[jk] = (1/M) * sum(abs(x[ij]-x[ik])/(max(x[i])-min(x[i]))) |
where M is the number of columns (excluding missing values) | |
canberra | d[jk] = (1/NZ) * sum(abs(x[ij]-x[ik])/(x[ij]+x[ik])) |
where NZ is the number of non-zero entries. | |
bray | d[jk] = sum(abs(x[ij]-x[ik]))/sum(x[ij]+x[ik]) |
kulczynski | d[jk] = 1 - 0.5*(sum(min(x[ij],x[ik]))/sum(x[ij]) + sum(min(x[ij],x[ik]))/sum(x[ik])) |
morisita | d[jk] = 1 - 2*sum(x[ij]*x[ik])/((lambda[j]+lambda[k]) * sum(x[ij])*sum(x[ik])) |
where lambda[j] = sum(x[ij]*(x[ij]-1))/(sum(x[ij])*sum(x[ij]-1)) | |
horn | Like morisita, but lambda[j] = sum(x[ij]^2)/(sum(x[ij])^2) |
binomial | d[jk] = sum((x[ij]*log(x[ij]/n[i]) + x[ik]*log(x[ik]/n[i]) - n[i]*log(1/2))/n[i]) |
where n[i] = x[ij] + x[ik] | |
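As a rough check of the formulas above, the following base R sketch evaluates a few of them by hand for two sites. The abundance vectors xj and xk are hypothetical values made up for illustration, and the code does not call vegdist, so it only illustrates the arithmetic.

```r
# Two hypothetical sites with abundances of five species (made-up numbers).
xj <- c(10, 0, 3, 5, 0)
xk <- c(4, 2, 0, 5, 1)

# Bray-Curtis: sum of absolute differences over the sum of all abundances.
bray <- sum(abs(xj - xk)) / sum(xj + xk)

# Kulczynski: one minus the mean of the shared abundance relative to
# each site total.
shared <- sum(pmin(xj, xk))
kulczynski <- 1 - 0.5 * (shared / sum(xj) + shared / sum(xk))

# Manhattan and Euclidean, which can be compared with base R dist().
manhattan <- sum(abs(xj - xk))
euclidean <- sqrt(sum((xj - xk)^2))

print(c(bray = bray, kulczynski = kulczynski,
        manhattan = manhattan, euclidean = euclidean))
```

For these vectors the Bray–Curtis value is 12/30 and the Kulczynski value is 1 - 0.5*(9/18 + 9/12).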
Jaccard index is computed as 2B/(1+B), where B is Bray–Curtis dissimilarity.
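The relation between the two indices can be sketched in base R as follows. The helper bray_curtis and the site vectors are hypothetical and only serve this illustration; the presence/absence step is done by hand here rather than through the binary argument.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

xj <- c(10, 0, 3, 5, 0)
xk <- c(4, 2, 0, 5, 1)

# Quantitative Jaccard from Bray-Curtis, J = 2B/(1+B).
B <- bray_curtis(xj, xk)
jaccard_quant <- 2 * B / (1 + B)

# The same value written directly as 1 - sum(min)/sum(max).
jaccard_minmax <- 1 - sum(pmin(xj, xk)) / sum(pmax(xj, xk))

# Presence/absence version: transform to 0/1 first, then apply the same relation.
pj <- as.numeric(xj > 0)
pk <- as.numeric(xk > 0)
B_bin <- bray_curtis(pj, pk)
jaccard_bin <- 2 * B_bin / (1 + B_bin)

# Classic binary Jaccard: one minus shared species over species in either site.
jaccard_classic <- 1 - sum(pj & pk) / sum(pj | pk)

print(c(jaccard_quant, jaccard_minmax, jaccard_bin, jaccard_classic))
```

With 0/1 data the result coincides with the familiar binary Jaccard dissimilarity, which is what the same name refers to in both cases.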
Binomial index is derived from Binomial deviance under null hypothesis that the two compared communities are equal. It should be able to handle variable sample sizes. The index does not have a fixed upper limit, but can vary among sites with no shared species. For further discussion, see Anderson & Millar (2004).
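A minimal base R sketch of the binomial formula may help to see why the index has no fixed upper limit. The function binomial_index is hypothetical, and two conventions in it are assumptions rather than statements about vegdist: 0*log(0) is taken as 0, and species absent from both sites are skipped.

```r
# Hypothetical helper implementing the binomial formula for two sites.
binomial_index <- function(a, b) {
  n <- a + b
  keep <- n > 0                      # assumption: skip species absent from both sites
  a <- a[keep]; b <- b[keep]; n <- n[keep]
  # assumption: a zero count contributes 0 (limit of x*log(x) as x -> 0)
  xlogx <- function(x, n) ifelse(x > 0, x * log(x / n), 0)
  sum((xlogx(a, n) + xlogx(b, n) - n * log(1/2)) / n)
}

# Identical sites give zero.
d_same <- binomial_index(c(3, 5, 0), c(3, 5, 0))

# Sites with no shared species: each species present adds log(2),
# so the value grows with the number of species.
d_disjoint_3 <- binomial_index(c(2, 0, 0), c(0, 7, 1))
d_disjoint_4 <- binomial_index(c(2, 4, 0, 0), c(0, 0, 7, 1))

print(c(d_same, d_disjoint_3, d_disjoint_4))
```

Under these assumptions two sites without shared species get a value of log(2) times the number of species present, which varies from pair to pair.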
Mountford index is defined as M = 1/α, where α is the parameter of
Fisher's logseries, assuming that the compared communities are samples
from the same community (cf. fisherfit, fisher.alpha). The index M is
found as the positive root of the equation exp(a*M) + exp(b*M) = 1 +
exp((a+b-j)*M), where j is the number of species occurring in both
communities, and a and b are the numbers of species in each separate
community (so the index uses presence–absence information). Mountford
index is usually misrepresented in the literature: Mountford (1962)
suggested an approximation to be used as the starting value in
iterations, but the proper index is defined as the root of the
equation above. The function vegdist solves M with the Newton method.
Please note that if either a or b is equal to j, one of the
communities could be a subset of the other, and the dissimilarity is
0, meaning that non-identical objects may be regarded as similar and
the index is non-metric. The Mountford index is in the range 0 ...
log(2), but the dissimilarities are divided by log(2) so that the
results will be in the conventional range 0 ... 1.
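The root finding can be sketched in base R with a plain Newton iteration. This is not the code used by vegdist: the function mountford_root is hypothetical, the starting value 2j/(2ab - (a+b)j) is the approximation commonly quoted as the Mountford index and is assumed here, and the handling of the edge cases (no shared species, or one community a subset of the other) is an assumption chosen to agree with the range described above.

```r
# Hypothetical Newton solver for exp(a*M) + exp(b*M) = 1 + exp((a+b-j)*M).
mountford_root <- function(a, b, j, tol = 1e-10, maxit = 50) {
  if (j == 0) return(0)                    # assumption: no shared species
  if (j == a || j == b) return(log(2))     # assumption: subset case, upper limit
  f  <- function(M) exp(a * M) + exp(b * M) - 1 - exp((a + b - j) * M)
  df <- function(M) a * exp(a * M) + b * exp(b * M) -
    (a + b - j) * exp((a + b - j) * M)
  M <- 2 * j / (2 * a * b - (a + b) * j)   # assumed starting approximation
  for (i in seq_len(maxit)) {
    step <- f(M) / df(M)
    M <- M - step
    if (abs(step) < tol) break
  }
  M
}

# a = b = 2 species, j = 1 shared: the equation reduces to y^3 - 2*y^2 + 1 = 0
# with y = exp(M), whose root above 1 is the golden ratio.
M <- mountford_root(2, 2, 1)
print(M)
print(M / log(2))   # rescaled to the range 0 ... 1
```

The starting value matters because M = 0 always satisfies the equation trivially; starting from the approximation keeps the iteration near the positive root in this example.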
Raup–Crick dissimilarity (method = "raup") is a probabilistic index
based on presence/absence data. It is defined as 1 - prob(j), where
prob(j) is based on the probability of observing at least j shared
species in the compared communities. Legendre & Legendre (1998)
suggest using simulations to assess the probability, but the current
function uses an analytic result from the hypergeometric distribution
(phyper) instead. This probability (and the index) is dependent on the
number of species missing in both sites, and adding all-zero species
to the data or removing missing species from the data will influence
the index. The probability (and the index) may be almost zero or
almost one for a wide range of parameter values. The index is
nonmetric: two communities with no shared species may have a
dissimilarity slightly below one, and two identical communities may
have dissimilarity slightly above zero.
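The dependence on the size of the species pool can be illustrated with phyper directly. The sketch below only computes the probability of at least j shared species under random draws; the function prob_shared_at_least and the numbers are hypothetical, and the exact way vegdist turns such a probability into the index is not reproduced here.

```r
# Probability of at least j shared species when sites with a and b species
# are drawn at random from a pool of N species (hypergeometric distribution).
prob_shared_at_least <- function(j, a, b, N) {
  phyper(j - 1, m = a, n = N - a, k = b, lower.tail = FALSE)
}

# Same two sites, but the data matrix has 20 or 40 species columns;
# the extra 20 columns would be all-zero species for this pair.
p_small_pool <- prob_shared_at_least(j = 3, a = 5, b = 6, N = 20)
p_large_pool <- prob_shared_at_least(j = 3, a = 5, b = 6, N = 40)

print(c(p_small_pool, p_large_pool))
```

Doubling the pool makes three shared species less likely by chance, so any index built on this probability changes when all-zero species are added or removed.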
Chao index tries to take into account the number of unseen species
pairs, similarly to Chao's method in specpool. Function vegdist
implements a Jaccard type index defined as
d_{jk} = 1 - U_j U_k/(U_j + U_k - U_j U_k), where
U_j = C_j/N_j + (N_k - 1)/N_k times a_1/(2 a_2) times S_j/N_j. Here
C_j is the total number of individuals in species shared with
site k, N is the total number of individuals, a_1 and a_2 are the
numbers of species occurring only with one or two individuals in
another site, and S_j is the number of individuals in species that
occur only with one individual in another site (Chao et al. 2005).
Morisita index can be used with genuine count data (integers) only. Its Horn–Morisita variant is able to handle any abundance data.
Euclidean and Manhattan dissimilarities are not good in gradient separation without proper standardization but are still included for comparison and special needs.
Bray–Curtis and Jaccard indices are rank-order similar, and some
other indices become identical or rank-order similar after some
standardizations, especially with presence/absence transformation or
equalizing site totals with decostand. Jaccard index is metric, and
probably should be preferred instead of the default Bray–Curtis, which
is semimetric.
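Both points can be illustrated with a small base R sketch. The three sites and the helper functions are hypothetical, and a single triplet is of course only an illustration, not a proof that either index is or is not metric.

```r
# Hypothetical helpers for the illustration.
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)
to_jaccard  <- function(B) 2 * B / (1 + B)

# Three hypothetical sites with two species.
A <- c(1, 0)
B <- c(0, 1)
C <- c(1, 1)

bray <- c(AB = bray_curtis(A, B), AC = bray_curtis(A, C), BC = bray_curtis(B, C))
jacc <- to_jaccard(bray)

# Rank-order similarity: 2B/(1+B) is increasing in B, so ranks agree.
same_ranks <- identical(rank(bray), rank(jacc))

# Triangle inequality d(A,B) <= d(A,C) + d(C,B) for this triplet.
bray_ok <- bray[["AB"]] <= bray[["AC"]] + bray[["BC"]]
jacc_ok <- jacc[["AB"]] <= jacc[["AC"]] + jacc[["BC"]] + 1e-12

print(rbind(bray, jacc))
print(c(same_ranks = same_ranks, bray_ok = bray_ok, jacc_ok = jacc_ok))
```

For this triplet Bray–Curtis violates the triangle inequality (1 against 1/3 + 1/3) while Jaccard does not (1 against 0.5 + 0.5).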
The naming conventions vary. The one adopted here is traditional
rather than truthful to priority. The function finds either
quantitative or binary variants of the indices under the same name,
which correctly may refer only to one of these alternatives. For
instance, the Bray index is also known as the Steinhaus, Czekanowski
and Sørensen index. The quantitative version of Jaccard should
probably be called the Ružička index.
The abbreviation "horn" for the Horn–Morisita index is misleading,
since there is a separate Horn index. The abbreviation will be changed
if that index is implemented in vegan.
Should provide a drop-in replacement for dist and return a distance
object of the same type.
The function is an alternative to dist, adding some ecologically
meaningful indices. Both methods should produce similar types of
objects which can be interchanged in any method accepting either.
Manhattan and Euclidean dissimilarities should be identical in both
methods. Canberra index is divided by the number of variables in
vegdist, but not in dist. So these differ by a constant multiplier,
and the alternative in vegdist is in range (0,1). Function daisy
(package cluster) provides an alternative implementation of the Gower
index for mixed data of numeric and class variables (but it works for
mixed variables only).
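The Canberra scaling can be sketched with base R dist alone. The two-site matrix is hypothetical, and the division by the number of columns is done by hand here to mimic the scaling described above; it is not a call to vegdist.

```r
# Two hypothetical sites, three species, no zero entries.
x <- rbind(site1 = c(1, 2, 3),
           site2 = c(3, 2, 1))

# Base R Canberra: sum of abs(x[ij]-x[ik])/(x[ij]+x[ik]) over the columns.
canberra_sum <- dist(x, method = "canberra")

# Dividing by the number of variables brings the value into the range 0 ... 1.
canberra_scaled <- canberra_sum / ncol(x)

print(canberra_sum)
print(canberra_scaled)
```

Here the three terms are 2/4, 0/4 and 2/4, so the unscaled sum is 1 and the scaled value is 1/3.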
Most dissimilarity indices in vegdist are designed for community data,
and they will give misleading values if there are negative data
entries. The results may also be misleading or NA or NaN if there are
empty sites. In principle, you cannot study species composition
without species and you should remove empty sites from community data.
Jari Oksanen, with contributions from Tyler Smith (Gower index) and Michael Bedward (Raup–Crick index).
Anderson, M.J. and Millar, R.B. (2004). Spatial variation and effects of habitat on temperate reef fish assemblages in northeastern New Zealand. Journal of Experimental Marine Biology and Ecology 305, 191–221.
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T. (2005). A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8, 148–159.
Faith, D. P., Minchin, P. R. and Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57–68.
Krebs, C. J. (1999). Ecological Methodology. Addison Wesley Longman.
Legendre, P. and Legendre, L. (1998). Numerical Ecology. 2nd English Edition. Elsevier.
Mountford, M. D. (1962). An index of similarity and its application to classification problems. In: P. W. Murphy (ed.), Progress in Soil Zoology, 43–50. Butterworths.
Wolda, H. (1981). Similarity indices, sample size and diversity. Oecologia 50, 296–302.
Function designdist can be used for defining your own dissimilarity
index. Alternative dissimilarity functions include dist in base R,
daisy (package cluster), and dsvdis (package labdsv). Function
betadiver provides indices intended for the analysis of beta
diversity.
data(varespec)
vare.dist <- vegdist(varespec)
# Orlóci's Chord distance: range 0 .. sqrt(2)
vare.dist <- vegdist(decostand(varespec, "norm"), "euclidean")
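The range of the chord distance in the example above can be checked by hand in base R. The small matrix is hypothetical, and the row scaling is written out explicitly on the assumption that the "norm" standardization scales each site to unit sum of squares.

```r
# Three hypothetical sites: site1 and site3 have the same proportions,
# site2 shares no species with either of them.
x <- rbind(site1 = c(3, 0, 4, 0),
           site2 = c(0, 5, 0, 12),
           site3 = c(6, 0, 8, 0))

# Scale each row to unit Euclidean length, then take Euclidean distances.
x_norm <- x / sqrt(rowSums(x^2))
chord <- as.matrix(dist(x_norm, method = "euclidean"))

print(round(chord, 4))
```

Sites with identical proportions get distance 0 and sites with no shared species get the maximum sqrt(2), regardless of their total abundances.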