Title: | Two-Way Indicator Species Analysis |
---|---|
Description: | Classification of biological communities based on splitting first axis of Correspondence Analysis for the current subset of the data, and finding species that best indicate the splits. The method is particularly popular in vegetation science. |
Authors: | Jari Oksanen [cre, aut] , Mark O. Hill [aut] |
Maintainer: | Jari Oksanen <[email protected]> |
License: | MIT + file LICENCE |
Version: | 0.9-4 |
Built: | 2024-11-19 12:38:55 UTC |
Source: | https://github.com/jarioksa/twinspan |
Vegetation data set of 170 quadrats and 115 species with visually estimated cover percentage values (Oksanen & Ahti, 1982).
data("ahti")
The taxon names are given with 4+4 acronyms of the binomial name. The
complete scientific names are listed in vignette
"ahtinames"
; perhaps the easiest way of reading the vignette is
to use browseVignettes
(but see Examples below). Only
understorey plants are included in the current file: trees and shrubs,
also as seedlings, are not included, although they are listed in the
original article.
Oksanen & Ahti (1982) allocated the quadrats into vegetation classes
that they called ‘noda’. The name of quadrat is based on the
original nodum followed by the original number of the quadrat. In
northern Finland (northern boreal zone, roughly corresponding to the
area of intensive reindeer grazing) five noda were separated:
Ster
for Stereocaulon, Vuli
for Vaccinium
uliginosum, Vmyr
for Vaccinium myrtillus, Call
for Calluna, and Empe
for Empetrum. Most Middle
boreal, south boreal and hemiboreal stands were allocated into
Cetraria islandica nodum, but tabulated separately for young
and old forest stands and by vegetation zone: middle boreal (names
starting with M
) and southern boreal (names starting with
S
) zones divided into young (Myng
, Syng
) and old
(Mold
, Sold
) forests. Hemiboreal forests (Hbor
)
were not divided by age classes. The only separate southern boreal
type was Thymus serpyllum nodum (Thym
) on the sunny
slopes of eskers.
Oksanen & Ahti (1982).
Oksanen, J. & Ahti, T. (1982) Lichen-rich pine forest vegetation in Finland. Annales Botanici Fennici 19, 275–301.
## Read the vignette listing complete scientific names
if (interactive()) {
  vignette("ahtinames", package="twinspan")
}
Function extracts the species or quadrat classification as a
hierarchic dendrogram
.
## S3 method for class 'twinspan'
as.dendrogram(
  object,
  what = c("quadrat", "species"),
  height = c("level", "chi", "eigen"),
  ...
)
object |
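twinspan result object. |
what |
Return either a "quadrat" or "species" dendrogram. |
height |
Use either division levels ("level"), total Chi-squares ("chi") or eigenvalues ("eigen") as dendrogram heights. |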
... |
Other parameters to functions. |
The dendrogram heights are levels of divisions, total Chi-squares
of divisions and groups, or eigenvalues of divisions depending on
argument height
. Terminal groups have no eigenvalues,
because they were not considered for division. For them the method
uses an arbitrary value that is a proportion of the height of the mother
division, with the proportion depending on the number of units in the
group. Chi-squares are also evaluated for terminal groups.
There is no guarantee that eigenvalues or Chi-squares decrease in
divisions, and there may be reversals where lower levels are higher
than their mother groups, and the plotted trees can be messy and
unreadable. Chi-squares decrease more monotonically than
eigenvalues of first axis.
R has a wealth of functions to handle and display
dendrograms. See dendrogram
for general
description. There is even stronger support in packages (for
instance, dendextend).
The terminal groups of twinspan
trees are not binary,
but may have several elements (quadrats, species). In
dendrogram
plots, it is best to set
type="triangle"
for nicer looking trees.
A dendrogram
object.
as.hclust.twinspan
, dendrogram
.
## Large datasets are difficult to show in dendrograms: take only
## Northern Boreal quadrats (from 1 to 87).
data(ahti)
tw <- twinspan(ahti[1:87,])
den <- as.dendrogram(tw)
str(den, max.level = 4)
plot(den, type = "triangle", nodePar = list(lab.cex=0.6, pch=NA))
den <- as.dendrogram(tw, height="chi")
plot(den, type = "triangle", nodePar = list(lab.cex=0.6, pch=NA))
Function extracts classification as an hclust
object. The terminal items are the final groups, but quadrats or
species are not shown: hclust
cannot handle
polytomies that are needed to display group members. Use
as.dendrogram
to show the single items. The group ID
number and number of items in the terminal group are used as group
names and are displayed in plots.
## S3 method for class 'twinspan'
as.hclust(
  x,
  what = c("quadrat", "species"),
  height = c("level", "chi"),
  binname = FALSE,
  ...
)
x |
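twinspan result object. |
what |
Extract the "quadrat" or "species" classification tree. |
height |
Use either division levels ("level") or total Chi-squares of divisions ("chi") as heights of the tree. |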
binname |
Use binary labels for classes instead of decimal numbers. |
... |
Other parameters to the function (ignored). |
Function can return either a tree showing the twinspan
hierarchy or showing the heterogeneity of each group or
division. In the first case, all divisions and groups at a certain
level of hierarchy are at the same height, but in the latter the
divisions are at the height defined by their heterogeneity. The
criterion of heterogeneity is the total chi-square (also known as
inertia) of the matrix that twinspan
internally uses
in that division (see twintotalchi
). This tree gives
the visual presentation of the modified method of Roleček et
al. (2009).
When tree heights are based on heterogeneity, subgroups can be more heterogeneous than their parent group. These appear as reversed branches in the tree. A warning is issued for each such case.
an hclust
object amended with labels for
internal nodes (nodelabels
).
Roleček, J, Tichý, L., Zelený, D. & Chytrý, M. (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. J Veg Sci 20: 596–602.
as.dendrogram.twinspan
provides an
alternative which also shows the sampling units (quadrats or
species). The result is based on hclust
and can be
handled with its support
functions. plot.twinspan
,
image.twinspan
display the tree. Function
cut.twinspan
cuts the tree by a level of
hierarchy, and cuth
by heterogeneity for original
sampling units (quadrats, species).
data(ahti)
tw <- twinspan(ahti)
plot(as.hclust(tw, "species"))
cl <- as.hclust(tw)
## plot and 8 groups by hierarchy level
plot(cl)
rect.hclust(cl, 8)
## plot and 8 groups by heterogeneity
cl <- as.hclust(tw, height="chi")
plot(cl)
rect.hclust(cl, 8)
Returns a vector of twinspan
classes at a given level of
hierarchy or classes respecting group heterogeneity for quadrats or
species.
## S3 method for class 'twinspan'
cut(x, level, what = c("quadrat", "species"), binname = FALSE, ...)

twingroup(x, group, what = c("quadrat", "species"))

cuth(x, what = c("quadrat", "species"), ngroups, binname = FALSE)
x |
twinspan result object. |
level |
Level of hierarchy for classification. If missing, the final level used in the object will be returned. |
what |
Return either a "quadrat" or "species" classification vector. |
binname |
Use binary label for classes instead of decimal number. |
... |
Other parameters (ignored). |
group |
Group id number. |
ngroups |
Number of groups. |
twinspan returns only the classification at the final level, but any
upper level class can be found by integer division by 2. Function cut
returns a vector of class id numbers for a given level of
classification. Utility function twingroup returns a logical vector
that is TRUE for items belonging to a certain group at any level. It
can be more practical in subsetting data.
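The integer-division rule can be sketched in base R without the package. The helper name cut_ids and the example ids are hypothetical, and the sketch assumes that the root is class 1 at level 0, so that level L holds ids from 2^L to 2^(L+1) - 1. cut.twinspan remains the supported interface.

```r
# Hypothetical helper, not part of the twinspan package.
# Assumes root id 1 at level 0, daughters 2k and 2k + 1.
cut_ids <- function(ids, level) {
  upper <- 2^(level + 1)            # smallest id that is deeper than 'level'
  repeat {
    deep <- ids >= upper            # classes still deeper than 'level'
    if (!any(deep)) break
    ids[deep] <- ids[deep] %/% 2    # step up to the parent class
  }
  ids
}

ids <- c(16, 17, 9, 5, 24)          # made-up final-level class ids
cut_ids(ids, 3)                     # 8 8 9 5 12
cut_ids(ids, 1)                     # 2 2 2 2 3
```

Class 5 is left unchanged at level 3 because it was already a terminal group at level 2.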
Function cuth
cuts the classification by class heterogeneity
instead of level, and can be used to implement the modified method
of Roleček et al. (2009). The groups are formed with decreasing
heterogeneity but respecting the hierarchy. Total chi-square (also
known as inertia) is used as the criterion of heterogeneity. The
criterion is calculated with twintotalchi
and the
criterion is based on the same data matrix as internally used in
twinspan
. The function can also be used for species
classification, also with the internally used modified species
matrix.
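One way to read "decreasing heterogeneity but respecting the hierarchy" is a greedy loop that always splits the most heterogeneous current group into its two daughters. The following base R sketch illustrates that reading only: the function name, the chi values and the stopping rule are assumptions, not the code used by cuth.

```r
# Hypothetical sketch of cutting a twinspan-style tree by heterogeneity.
# 'chi' is a named numeric vector; names are division or group ids.
cut_by_heterogeneity <- function(chi, ngroups) {
  groups <- 1                                   # start from the root
  has_kids <- function(k) as.character(2 * k) %in% names(chi)
  while (length(groups) < ngroups) {
    splittable <- groups[vapply(groups, has_kids, logical(1))]
    if (length(splittable) == 0) break          # nothing left to divide
    k <- splittable[which.max(chi[as.character(splittable)])]
    groups <- sort(c(setdiff(groups, k), 2 * k, 2 * k + 1))
  }
  groups
}

# Made-up heterogeneities for a tree with divisions 1, 2 and 3
chi <- c("1" = 3.0, "2" = 1.0, "3" = 2.0,
         "4" = 0.4, "5" = 0.5, "6" = 0.9, "7" = 0.8)
cut_by_heterogeneity(chi, 3)                    # 2 6 7
```

With three groups requested, division 3 is split before division 2 because it is more heterogeneous, whereas a cut by level would split both or neither.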
A vector of class numbers for the given level of hierarchy
using Twinspan identifiers. For identifiers and levels, see
summary.twinspan
,
as.hclust.twinspan
.
Roleček, J, Tichý, L., Zelený, D. & Chytrý, M. (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. J Veg Sci 20: 596–602.
predict.twinspan
gives similar classes, but
based on indicator pseudospecies. cutree
provides
a similar functionality for hclust
trees. Function as.hclust.twinspan
generates
corresponding tree presentation, and
plot.twinspan
will print that tree labelling
internal nodes (divisions).
data(ahti)
tw <- twinspan(ahti)
cut(tw)
## traditional twinspan classification by level of hierarchy
cut(tw, level=3)
cut(tw, what = "species")
## number of groups as with level=3, but by group heterogeneity
cuth(tw, ngroups = 8)
Function returns the eigenvalues of twinspan
divisions.
## S3 method for class 'twinspan'
eigenvals(x, what = c("quadrat", "species"), ...)
x |
twinspan result object. |
what |
Return eigenvalues of "quadrat" or "species" divisions. |
... |
Other arguments (ignored). |
The eigenvalues are for the first correspondence analysis axis of
downweighted pseudospecies data (see twinsform
). The
eigenvalues are not evaluated for final groups which are not
divided further, nor if the analysis had terminated in a mother
class. This leaves zeros in the vector of eigenvalues.
The eigenvalues are not additive and each is based on slightly
different data due to downweighting (see twinsform
)
and all come from the first axis of respective analysis. They may
have a relation to the magnitude of differences in division, but
there is no information on the total Chi-square nor on its
reduction in division: the eigenvalues only describe the strength
of the axis used as a device of division. Neither can eigenvalues
for quadrats and species be compared: data for species clustering are
constructed in a completely different way than data for quadrat
classification.
Eigenvalues are in general not decreasing, but they can increase
when divisions proceed. Function
as.dendrogram.twinspan
can try to use division
eigenvalues as dendrogram heights, but the resulting trees often
have inversions and look messy or even unreadable.
Vector of eigenvalues ordered by division number, with zero for divisions that were not evaluated (terminal groups, division terminated in earlier steps).
eigenvals
in vegan.
summary.twinspan
displays the eigenvalues
numerically, and they can be used in
as.dendrogram.twinspan
.
data(ahti)
tw <- twinspan(ahti)
eigenvals(tw)
Function draws an image
of mean pseudospecies values
in cells of final quadrat and species classifications together with
corresponding cluster trees.
## S3 method for class 'twinspan'
image(
  x,
  leadingspecies = FALSE,
  reorder = FALSE,
  height = c("level", "chi"),
  ...
)
x |
|
leadingspecies |
Show averages of leading species (most abundant, ties broken by frequency) of each cluster instead of averages of all species of the group. |
reorder |
Do not use the original |
height |
Use either division levels ( |
... |
The function is based on vegan function
tabasco
. Mean abundances in cells are shown
by colours which can be modified by the user (see
tabasco
). The mean abundances are either the
average values of all species of the species group (default), or
the average of the most abundant species of the species group (tied
abundances broken by species frequency) if
leadingspecies=TRUE
. The species groups are rows and quadrat
groups are columns. These are labelled by their decimal group
numbers with information of group sizes, or species by their name
if only leading species are shown. These numbers and the order of
the groups are similar as in hierarchic cluster trees produced by
as.hclust.twinspan
and
summary.twinspan
. Function also adds quadrat
classification tree on the top and species classification tree on
the left margin (see as.hclust.twinspan
). These trees
show either the levels of hierarchy or the heterogeneities of
divided groups depending on the argument height
.
Function returns invisibly the matrix of mean abundances used to produce the graph.
tabasco
and heatmap
for basic
functionality. The default colour scheme is based on
heat.colors
, but better schemes can be constructed
(viridis package provides clear schemes). The
dendrograms are based on as.hclust.twinspan
.
Function twintable
provides an alternative that
can list all quadrats and all species in textual format.
data(ahti)
tw <- twinspan(ahti)
image(tw)
im <- image(tw, height = "chi", leadingspecies = TRUE)
## image returns invisibly data
head(im)
twinspan
bases its quadrat classification primarily
on ordination axis. In some cases this is in conflict with the
classification derived from indicator pseudospecies. This function
identifies these cases.
misclassified(x, level, verbose = TRUE, binname = FALSE)
x |
|
level |
Only consider misclassification down to this level. |
verbose |
If |
binname |
Use binary labels instead of decimal numbers for classes and divergent division in verbose object. |
The function compares the final twinspan
classification (from cut.twinspan
) and the
classification predicted from indicator pseudospecies (from
predict.twinspan
). If these two differ, the quadrat
is “misclassified”. With verbose=FALSE
, the function
returns a logical vector that is TRUE
for misclassified
quadrats. By default, it also returns the names, twinspan classes
and predicted classes of misclassified quadrats, and the division
where the misclassification occurred and classifications
diverged. The divisions and their numbers can be seen in
summary.twinspan
and plot.twinspan
.
If verbose=TRUE
, the function returns an object of class
"misclassified"
with following elements.
Index of misclassified quadrats.
Labels (names) of quadrats.
Final classification from twinspan
.
Final classification from predict.twinspan
.
The division where the misclassification occurred.
With verbose=FALSE
, the function returns a logical vector
that is TRUE
for misclassified quadrats.
The basic functions are cut.twinspan
and
predict.twinspan
. You can see the division
numbers with summary.twinspan
(with indicator
pseudospecies) and in plot.twinspan
.
data(ahti)
tw <- twinspan(ahti)
misclassified(tw)
## see the ID numbers of divisions
plot(tw)
## only look at misclassifications at first two levels
misclassified(tw, level = 2)
Function displays the classification tree.
## S3 method for class 'twinspan'
plot(
  x,
  what = c("quadrat", "species"),
  height = c("level", "chi"),
  main = "Twinspan Dendrogram",
  binname = FALSE,
  ...
)
x |
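twinspan result object. |
what |
Plot the "quadrat" or "species" classification tree. |
height |
Use either division levels ("level") or total Chi-squares of divisions ("chi") as heights of the tree. |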
main |
Main title of the plot. |
binname |
Use binary labels for classes and nodes instead of decimal numbers. |
... |
The internal nodes are labelled by the numbers of division. These
are the same numbers as used in summary.twinspan
and
returned by cut.twinspan
or
predict.twinspan
for the same classification
level. For terminal groups the plot shows the numeric code of the
group and the number of items (quadrats or species) in the
group. For division number $k$, its daughter divisions or
groups are coded $2k$ and $2k+1$. The tree is
similar to a plot of
as.hclust.twinspan
, but adds
numbers of internal nodes. The tree can be based either on the
levels of hierarchy or on the heterogeneity of division as assessed
by chi-square (or inertia) of the division (see
as.hclust.twinspan
).
summary.twinspan
for similar textual
presentation also showing the items (quadrats, species) in
terminal groups. vegan function
scores.hclust
can extract the coordinates
of internal (or terminal) nodes, and
ordilabel
is used to add the labels on
internal nodes.
data(ahti)
tw <- twinspan(ahti)
plot(tw, "species")
## default plot for quadrats
plot(tw)
## plot by the heterogeneity of divisions
plot(tw, height = "chi")
Function predicts the class membership for each quadrat using the reported indicator pseudospecies and limit for the indicator score for the “positive” (right) group.
## S3 method for class 'twinspan'
predict(object, newdata, level, binname = FALSE, ...)
object |
|
newdata |
Data used in prediction. The species will be matched
by their names, and the pseudospecies are based on the
|
level |
Level of hierarchy of classification. If missing, the prediction is made to the highest level of classification. |
binname |
Use binary labels instead of decimal class numbers. |
... |
Other parameters passed to the function (ignored). |
The twinspan
classification is based on splitting a polarized
ordination axis. The reported indicator pseudospecies only
indicate the decisions in each division and do not necessarily
give the same classification: the original classification cannot
necessarily be recovered when giving the original data as
newdata
. In the original TWINSPAN this is called
misclassification.
cut.twinspan
gives the original
classification, and misclassified
analyses the
differences of this and predict
.
data(ahti)
tw <- twinspan(ahti)
predict(tw)
predict(tw, level=3)
## misclassifications: predict and twinspan differ
sum(predict(tw) != cut(tw))
## build model for 4/5 of data and predict for the removed 1/5
i <- rep(1:5, length = nrow(ahti))
i <- sample(i) # shuffle in random order
tw <- twinspan(ahti[i != 1,]) # remove i==1
predict(tw, newdata = ahti[i==1,])
The function gives a compact summary of divisions with indicator species and items in final classification. The output gives the same essential information as the printed output of TWINSPAN batch program, but in more compact form.
## S3 method for class 'twinspan'
summary(object, what = c("quadrat", "species"), binname = FALSE, maxitems, ...)
object |
|
what |
Summarize either quadrat or species classification. |
binname |
Use binary labels for divisions instead of decimal numbers. |
maxitems |
Maximum number of items (members) listed for terminal groups. |
... |
Other arguments (ignored). |
For each division, summary
prints the eigenvalue. For
quadrat divisions, it also prints the indicator pseudospecies with
their signs, followed by <
and the lowest indicator score
for the ‘positive’ (right) group. If the indicator score is
below this value, follow the summary to the next item at the lower
level, and if the indicator score is at the limit or higher, follow
to the second alternative. For division number $k$, the next
items are either $2k$ (‘negative’ group) or $2k+1$
(‘positive’ group). Function
plot.twinspan
displays the division numbers in a
classification tree.
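Following the summary by hand amounts to walking down a binary key. The base R sketch below illustrates that walk under stated assumptions: the key structure, the pseudospecies names and the limits are invented, and the indicator score is taken to be the sum of the signs of the indicator pseudospecies present in the quadrat. predict.twinspan is the function that does this for a real result object.

```r
# Hypothetical key: one entry per division, named by division number.
# 'signs' holds signed indicator pseudospecies, 'limit' the lowest score
# of the positive group. All names and numbers are made up.
key <- list(
  "1" = list(signs = c(Callvulg1 = -1, Empenigr2 = 1), limit = 1),
  "3" = list(signs = c(Vaccmyrt1 = 1), limit = 1)
)

classify <- function(present, key) {
  k <- 1                                        # start at the first division
  while (!is.null(div <- key[[as.character(k)]])) {
    # indicator score: sum of signs of the indicators found in the quadrat
    score <- sum(div$signs[names(div$signs) %in% present])
    # below the limit: negative group 2k, otherwise positive group 2k + 1
    k <- if (score < div$limit) 2 * k else 2 * k + 1
  }
  k                                             # id of the terminal group
}

classify(c("Empenigr2", "Vaccmyrt1"), key)      # 7
classify("Callvulg1", key)                      # 2
```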
For terminal groups, the function gives the size of the group and lists its elements (quadrats or species).
The function returns nothing. It only prints the result object in a human-readable way.
plot.twinspan
displays the same structure
visually. Function predict.twinspan
follows the
summary structure to predict the classification with indicator
pseudospecies.
data(ahti)
tw <- twinspan(ahti)
summary(tw, maxitems = 6)
summary(tw, "species")
Total scaled Chi-square is the same as the sum of all eigenvalues
in Correspondence Analysis. It is a measure of heterogeneity of a
set of data. The function finds this measure for all divisions and
terminal groups in twinspan.
totalchi(x)

twintotalchi(x, what = c("quadrat", "species"))
x |
A matrix of non-negative data for |
what |
Analyse |
Function first reconstructs the data as it was internally used in
twinspan
using functions twin2stack
and
twin2specstack
. The scaled Chi-square is calculated
with support function totalchi
that can be called
independently for any matrix with non-negative data. The scaling in
Chi-square means that data are standardized to unit sum before
calculating the actual Chi-square. This is often called the sum of
all eigenvalues in Correspondence Analysis, but no eigenvalues are
evaluated in these functions, only their potential sum.
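The scaled Chi-square described above can be written in a few lines of base R. This is a minimal sketch of the standard total inertia formula, not the package's own totalchi code; the function name total_inertia is hypothetical, and dropping empty rows and columns is an assumption made to avoid division by zero.

```r
# Hypothetical re-implementation of total scaled Chi-square (total inertia).
total_inertia <- function(x) {
  x <- as.matrix(x)
  x <- x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]  # drop empty margins
  p <- x / sum(x)                        # standardize data to unit sum
  e <- outer(rowSums(p), colSums(p))     # expected values under independence
  sum((p - e)^2 / e)                     # Chi-square of the scaled table
}

total_inertia(diag(2))                   # 1: two completely separate groups
total_inertia(matrix(c(1, 2, 2, 4), 2))  # 0: rows are proportional
```

No eigenvalues are computed here either; the result should agree with the total inertia that a correspondence analysis of the same matrix reports.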
The basic functions are twin2stack
and
twin2specstack
that construct the data matrices.
data(ahti)
tw <- twinspan(ahti)
twintotalchi(tw)
Functions extract the data twinspan
used in its
analysis, and allow reproducing the internal ordination and
inspecting the twinspan
divisions.
twin2stack(x, subset, downweight = FALSE)

twin2mat(x)

twin2specstack(x, subset, downweight = TRUE)
x |
|
subset |
Select a subset of quadrats ( |
downweight |
Downweight infrequent pseudospecies. |
Function twin2stack
extracts the pseudospecies matrix, where
columns are pseudospecies with their cutlevels. This is similar to
the file generated with twinsform
. The default is to
return a binary matrix, where data entries are either 0 or
1. Alternatively, it is possible to extract a subset of data
with downweighting allowing scrutiny of
twinspan
divisions. When downweighted data are ordinated with correspondence
analysis (such as vegan functions
cca
, decorana
with
ira=1
) the first eigenvalue will match the eigenvalue in
twinspan
, and when a division is used as a
subset
, its eigenvalue will match with twinspan
.
Function twin2mat
extracts data file with pseudospecies
transformation. Columns are original species, and entries are
abundances after pseudospecies transformation. This is similar to
the output from vegan function
coverscale
with similar cut levels and
argument character=FALSE
. These data were not analysed in
twinspan
, but these are the data tabulated with
twintable
.
Function twin2specstack
returns similar data as used in
species classification in twinspan
. In this matrix,
species are rows and columns are “pseudocluster”
preferences. The preference of each species in each terminal group
and internal division is estimated as proportion of its abundance
(in pseudospecies scale, see twin2mat
) in the group and all
data. If this proportion is 0.8 or higher, species is regarded as
present at pseudocluster value 1, if it is 2 or higher at value 2,
and if it is 6 or higher at value 3. With downweight=FALSE
these data are returned. The columns are named by their division or
cluster number followed by a, b and c for
pseudocluster levels (and including zero columns). By default, the
pseudocluster values are still downweighted using species
frequencies as weights, and then rows are weighted by species
frequencies and columns by their totals extended to the same lowest
level of classification, giving two times higher weight to higher
“pseudocluster” levels b
and c
. When
ordinated with correspondence analysis (cca
,
decorana
with ira=1
) this gives similar
eigenvalue for the first axis as in twinspan
, and
when a division is used as a subset
, similar eigenvalue as
in that division.
For original data set instead of twinspan
result, functions twinsform
and
coverscale
are analogous to
twin2stack
and twin2mat
.
data(ahti)
dim(ahti)
range(ahti)
tw <- twinspan(ahti)
x <- twin2mat(tw)
dim(x)
range(x)
colnames(x)
x <- twin2stack(tw)
dim(x)
range(x)
colnames(x)
## Inspect group 4
x <- twin2stack(tw, subset = twingroup(tw, 4), downweight = TRUE)
## need vegan for correspondence analysis
if (suppressPackageStartupMessages(require("vegan"))) {
  cca(x)
}
## species classification
x <- twin2specstack(tw)
if (suppressPackageStartupMessages(require("vegan"))) {
  cca(x)
}
Twinspan returns the classification topology as a
single integer vector. These functions find similar
classification identifiers for each sampling unit, or
cut
that vector for a lower number of classes.
twinid(hclust, ...)

## S3 method for class 'twinid'
cut(x, level, binname = FALSE, ...)
hclust |
Cluster Analysis result compatible with
|
... |
Other parameters to functions (ignored). |
x |
Vector of classification IDs from |
level |
Level of hierarchy of classification. If missing, level used in the object will be returned. |
binname |
Use binary labels instead of decimal class numbers. |
Twinspan expresses the topology of cluster tree as an
integer. When a cluster $k$ is split into two, its
daughters will be $2k$ and $2k+1$, and its
parent cluster is found with integer division
$\lfloor k/2 \rfloor$. The
classification vector only stores the topology of the trees,
and has no information on heights.
twinspan
will not split small clusters and only
proceeds to a defined depth of divisions. In contrast,
twinid
proceeds to each terminal unit (leaf, sampling
unit, quadrat) and these will all have unique
identifiers. With cut
function you can restrict the
identifiers to certain level of classification similarly as in
twinspan
(see cut.twinspan
).
Vector of class "twinid"
giving
twinspan
id of each sampling unit.
If the classification is deep and has many (> 30) levels of hierarchy, the identifiers can exceed the integer maximum in R. Leaves may then have non-unique identifiers, and the identifiers may not recover the correct topology. However, they may still be unique beyond this limit, and the user should check this after getting a warning.
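The limit of about 30 levels follows from the doubling of identifiers at each division. A small base R check, assuming the root is id 1 at level 0 so that the largest id at level L is 2^(L+1) - 1 (the helper max_id is hypothetical):

```r
# Largest possible identifier at a given level, computed in double precision.
max_id <- function(level) 2^(level + 1) - 1

.Machine$integer.max                    # 2147483647, that is 2^31 - 1
max_id(30) <= .Machine$integer.max      # TRUE: level 30 still fits
max_id(31) <= .Machine$integer.max      # FALSE: level 31 overflows integers
```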
data(ahti)
cl <- hclust(dist(ahti, "manhattan"), "average")
(id <- twinid(cl))
cut(id, 6)
table(cut(id, 6))
Function transforms data so that Correspondence Analysis gives the
same result as in twinspan
divisions.
twinsform(x, cutlevels = c(0, 2, 5, 10, 20), subset, downweight = TRUE)
x |
Input (community) data. |
cutlevels |
Cut levels used to split quantitative data into binary pseudospecies. |
subset |
Logical vector or indices that select a subset of quadrats (sampling units). |
downweight |
Downweight result similarly as in
|
In twinspan
, quantitative species data are split into
binary (0/1) pseudospecies by cutlevels
. All these
pseudospecies are stacked as columns in a new data set. Rare
pseudospecies that occur at lower frequency than 0.2 are
downweighted within twinspan. This
reduces the weight of rare species or rare abundance levels in
correspondence analysis, but downweighting is optional in this
function.
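The stacking of pseudospecies can be illustrated in base R. The sketch below is not the twinsform code: the function name, the column naming (species name plus level number) and the boundary rule (a pseudospecies is present when the abundance is positive and at least the cut level) are assumptions, and downweighting is left out entirely.

```r
# Hypothetical sketch of splitting abundances into binary pseudospecies.
pseudostack <- function(x, cutlevels = c(0, 2, 5, 10, 20)) {
  x <- as.matrix(x)
  if (is.null(colnames(x))) colnames(x) <- paste0("sp", seq_len(ncol(x)))
  levels <- lapply(seq_along(cutlevels), function(j) {
    ps <- (x > 0 & x >= cutlevels[j]) * 1       # 0/1 matrix for cut level j
    colnames(ps) <- paste0(colnames(x), j)      # e.g. "A1", "A2", ...
    ps
  })
  out <- do.call(cbind, levels)                 # stack levels as columns
  out[, colSums(out) > 0, drop = FALSE]         # drop empty pseudospecies
}

x <- matrix(c(0, 1, 3, 25, 0, 6), nrow = 3,
            dimnames = list(NULL, c("A", "B")))  # made-up cover values
pseudostack(x)
```

In this toy table species A never reaches cover 5, so only its first two pseudospecies survive, while species B contributes all five.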
When the downweighted data are analysed with correspondence
analysis (e.g., cca
,
decorana
with option ira=1
), these will
give the same first eigenvalue and ordination as in
twinspan
. When a subset
of a
twinspan
class is used, correspondence analysis of
subdivision of the class can be obtained.
A stacked matrix of optionally downweighted pseudospecies.
downweight
in vegan: this
function is often used with Detrended Correspondence Analysis
(decorana
). However, the implementation is
slightly different in TWINSPAN, and weights differ
slightly. Function twin2stack
extracts similar
data from a twinspan
result object.
data(ahti)
tahti <- twinsform(ahti)
colnames(tahti)
## needs vegan for correspondence analysis
if (suppressPackageStartupMessages(require("vegan"))) {
  decorana(tahti, ira=1)
}
## similar first eigenvalue
eigenvals(twinspan(ahti))
Two-Way Indicator Species Analysis (TWINSPAN) is a divisive classification
method that works by splitting the first Correspondence Analysis axis into
two classes, and then recursively working with each split
subset. The current function is based on and uses much of the
original FORTRAN code of the original TWINSPAN (Hill
1979). twinspan
is the main function of this package, but it
works silently and prints very little information: you must use
separate support functions to extract various aspects of the
result.
twinspan(
  x,
  cutlevels = c(0, 2, 5, 10, 20),
  indmax = 7,
  groupmin = 5,
  levmax = 6,
  lind,
  lwgt,
  noind
)
x |
Input data, usually a species community data set where columns give the species and rows the sampling units. |
cutlevels |
Cut levels used to split quantitative data into binary pseudospecies. Max of 9 cutlevels can be used. |
indmax |
Maximum number of indicators for division (15 or less). |
groupmin |
Minimum group size for division (2 or larger). |
levmax |
Maximum depth of levels of divisions (15 or less). |
lind |
Weights for levels of pseudospecies. For example
indicator potentials |
lwgt |
Weights for the levels of pseudospecies. For example
weights |
noind |
Numbers (indices) of species that you wish to omit from list of potential indicators. Species omitted from this list are used in the calculation, but cannot appear as indicators. |
twinspan
may not print anything when it runs, but it will
return its result that you should save for later use. The functions
that reproduce most of the traditional printout are
summary.twinspan
and twintable
. The
summary
prints the history of divisions with eigenvalues,
signed indicator pseudospecies and the threshold of indicator
scores for division, and for terminal groups it prints the group
size and group members (quadrats, species). Function
twintable
prints the classified community table. In
addition, plot.twinspan
shows the dendrogram corresponding
to the summary
with division numbers and sizes and ID numbers
of terminal groups. Function image.twinspan
provides
a graphical overview of the major structure of the classification as a
prelude to twintable
. With function
as.dendrogram.twinspan
it is possible to construct a
dendrogram
of complete classification down to final
units (quadrats, species), and as.hclust.twinspan
constructs an hclust
tree down to final groups.
The twinspan
function performs the classic TWINSPAN with
fixed levels of hierarchy, but with other functions in this package
it is also possible to perform the modified method of Roleček et
al. (2009): the topology of the tree is the same, but the
division heights are based on the heterogeneity of the group, and
the groups are extracted in the order of their heterogeneity. The
measure of the heterogeneity is total standardized chi-square (or
inertia) of the divided group. This is based on the same matrix
as used internally in twinspan
code (see support functions
twintotalchi
, twin2stack
and
twin2specstack
).
The classification at any level of division can be extracted with
cut.twinspan
, and the most heterogeneous groups with
cuth
. Function predict.twinspan
provides a similar classification vector, but based on indicator
pseudospecies, and can be used also with new data that was not used
in twinspan
. These two classifications are often in
conflict, and misclassified
will detect those cases
and the divisions where the two classifications diverged. Function
eigenvals
extracts the eigenvalues of divisions, and
twintotalchi
finds the “sum of all
eigenvalues” or the standardized chi-square of each division or
final group.
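The relation between the eigenvalues and the standardized chi-square can be checked on any small table. The following base R sketch is not the code of this package: total_inertia is a hypothetical helper, the toy matrix is invented, and the calculation uses the textbook definition of correspondence analysis inertia (chi-square divided by the grand total) on raw values instead of the internal downweighted pseudospecies matrix.

```r
## Hypothetical helper: total inertia (standardized chi-square) of a table
total_inertia <- function(m) {
    P <- m / sum(m)                      # proportions
    E <- outer(rowSums(P), colSums(P))   # expected under independence
    sum((P - E)^2 / E)
}

## Invented toy data: 4 quadrats x 4 species
m <- matrix(c(5, 4, 0, 0,
              4, 5, 1, 0,
              0, 1, 5, 4,
              0, 0, 4, 5), nrow = 4, byrow = TRUE)

## Eigenvalues of correspondence analysis are the squared singular
## values of the standardized residuals
P <- m / sum(m)
E <- outer(rowSums(P), colSums(P))
eig <- svd((P - E) / sqrt(E))$d^2

eig[1]              # largest eigenvalue (first axis)
sum(eig)            # sum of all eigenvalues ...
total_inertia(m)    # ... equals the total inertia
```

A group with a high total inertia is heterogeneous in this sense, even when its first eigenvalue alone is modest.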
Function twinsform
transforms the data similarly as
twinspan
and can be used to reproduce the results of any
single division. Functions twin2mat
and
twin2stack
extract the internal data matrices in
standard R format from the twinspan
result.
Function returns an object of class "twinspan" with the
following items:
Function call.
Defined cutlevels. These will be used in
predict.twinspan
.
Maximum level depth of divisions. The divisions will end when this depth is achieved.
Number of species.
Number of quadrats.
Pseudospecies data in the internal format used in
twinspan
. Functions twin2mat
and
twin2stack
can change this into more usable format.
Results for quadrats (described below).
Results for species (described below).
The results of the analysis are stored in items quadrat
and
species
with similar structure, but species
has only
items iclass
, eig
, labels
and index
of
the following:
iclass |
ID numbers of final classes at the lowest level of
hierarchy. These can be extracted with cut.twinspan
which also can transform these to any higher level of hierarchy. |
eig |
Eigenvalues of divisions. |
labels |
Name labels of units (species, quadrats). |
index |
An index to order the units in twinspan
displays, e.g., in twintable
. |
A matrix of dimensions maximum number of
indicators × maximum number of divisions giving the
signed indices of indicator species for the division. These are
shown in labels in
summary.twinspan
and used by
predict.twinspan
.
Lowest value of indicator score for positive group in a division.
Labels of pseudospecies.
Index from pseudospecies to the corresponding species.
TWINSPAN is very complicated and has several obscure details. It is not explained in detail in this manual; consult the source code or the literature instead. Hill (1979) is the most authoritative source, but may be difficult to find. Kent & Coker (1992) do a great job in explaining the method, including many obscure details.
A strong simplification (but often sufficient to understand the basic principles) is that TWINSPAN is a divisive clustering based on splitting the first correspondence analysis axis, and applying the same method recursively to the resulting classes. The same method is used first for quadrats and then for species. In addition, it finds the species abundance levels (called ‘pseudospecies’) that best indicate these divisions, which helps ecologists to understand the classes. Species classification is performed so that it corresponds as well as possible to the preceding quadrat classification.
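The simplified principle can be sketched in base R. This is only the simplification, not the algorithm of twinspan: the hypothetical helper ca_split finds the first-axis quadrat scores by singular value decomposition and splits them at the origin, with no pseudospecies, downweighting, indicator species or tuning of the cutpoint. The toy matrix is invented.

```r
## Hypothetical helper: split rows by the sign of first CA axis scores
ca_split <- function(m) {
    P <- m / sum(m)
    r <- rowSums(P)
    E <- outer(r, colSums(P))
    u <- svd((P - E) / sqrt(E))$u[, 1]   # first axis
    score <- u / sqrt(r)                 # quadrat scores, weighted mean 0
    score > 0                            # TRUE/FALSE mark the two classes
}

## Invented toy data with two obvious groups of quadrats
m <- matrix(c(5, 4, 0, 0,
              4, 5, 1, 0,
              0, 1, 5, 4,
              0, 0, 4, 5), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("q", 1:4), paste0("sp", 1:4)))

grp <- ca_split(m)
split(rownames(m), grp)

## apply the same method recursively to one of the resulting classes,
## after dropping species that do not occur in the subset
sub <- m[grp, , drop = FALSE]
sub <- sub[, colSums(sub) > 0, drop = FALSE]
ca_split(sub)
```

The sign of a singular vector is arbitrary, so which class is labelled TRUE carries no meaning; only the grouping does.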
The following details are more technical. The analysis starts with
splitting species abundance data into discrete abundance levels
called pseudospecies. With these the function constructs a stacked
binary matrix with values of 0 and 1, where value 1 means that
the species occurs at the given threshold (called ‘cut level’) in a
quadrat. Then the pseudospecies (abundance levels) that occur in
fewer than 1/5 of quadrats are downweighted so that their presences
(values 1) are reduced linearly towards a minimum value of 0.01
according to their frequencies. This will reduce their impact in
correspondence analysis which is regarded as being sensitive to
rare species. Each cut level of the species is downweighted
independently. The first axis of correspondence analysis is found
for the downweighted data. This initial step can be reproduced with
the help of functions twinsform
or
twin2stack
. However, the division is not based on
this step only. Next the method finds the best indicator
pseudospecies for the division. Further, it polarizes the
ordination by using indicator scores for all species to find the
final classes for quadrats. It does not mechanically just
split the axis in the middle, but it finds the cutpoint so that
indicator scores from the indicator pseudospecies and final split
are as concordant as possible. Then the analysis is repeated for
both resulting groups, including downweighting within the subset of
quadrats.
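The first two steps, stacking pseudospecies and downweighting the rare ones, can be illustrated with a base R sketch on invented data. It rests on two assumptions that are not taken from the package code: a pseudospecies is taken to be present when the abundance is positive and at least the cut level, and the downweighting is one simple linear rule that agrees with the verbal description. Use twinsform or twin2stack to get the matrix that twinspan really uses.

```r
## Invented toy data: 6 quadrats (rows) x 3 species (columns), cover %
x <- matrix(c( 0, 1, 30,
               3, 0, 12,
               8, 0,  0,
              25, 0,  4,
               0, 0,  6,
               1, 0, 15), nrow = 6, byrow = TRUE,
            dimnames = list(paste0("q", 1:6), c("spA", "spB", "spC")))
cutlevels <- c(0, 2, 5, 10, 20)   # default cut levels of twinspan()

## One binary block per cut level, stacked side by side.
## ASSUMPTION: present = positive abundance at or above the cut level.
pseudo <- do.call(cbind, lapply(seq_along(cutlevels), function(k) {
    block <- (x > 0 & x >= cutlevels[k]) * 1
    colnames(block) <- paste0(colnames(x), k)
    block
}))
## drop pseudospecies that never occur
pseudo <- pseudo[, colSums(pseudo) > 0, drop = FALSE]

## ASSUMPTION: illustrative linear downweighting of pseudospecies that
## occur in fewer than 1/5 of quadrats, with a floor of 0.01
freq <- colSums(pseudo)
limit <- nrow(pseudo) / 5
w <- ifelse(freq < limit, pmax(0.01, freq / limit), 1)
weighted <- sweep(pseudo, 2, w, "*")

round(w, 2)
```

Each cut level of a species gets its own weight, so a common species can have full weight at its low levels and reduced weight at its rare high-abundance levels.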
After quadrat classification, TWINSPAN constructs species data
which are completely different from the data used in quadrat
classification. Species values depend on their ability to
discriminate quadrat classes at any level of classification. Then
species are classified in the same way and with the same code as
the quadrats. In this way species classification is concordant with
quadrat classification, and good indicators of quadrat classes are
grouped together. The species classification can be reproduced with
function twin2specstack
which also provides a more
detailed description of the data structure used at this stage.
Function twinspan
performs only the classical TWINSPAN, but
with support functions the modified method of Roleček et al. (2009)
can be performed (see cuth
,
as.hclust.twinspan
).
Hill, M.O. (1979). TWINSPAN - a FORTRAN program for arranging multivariate data in an ordered two-way table by classification of individuals and attributes. Cornell Univ., Dept of Ecology and Systematics.
Kent, M. & Coker, P. (1992). Vegetation description and analysis: A practical approach. John Wiley & Sons.
Roleček, J., Tichý, L., Zelený, D. & Chytrý, M. (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. J Veg Sci 20: 596–602.
data(ahti)
## default cut levels
(tw <- twinspan(ahti))
## visual look at the divisions and group numbers
plot(tw)
## Braun-Blanquet scale
(twb <- twinspan(ahti, cutlevels = c(0, 0.1, 1, 5, 25, 50, 75)))
plot(twb)
## compare confusion
table(cut(tw, level=3), cut(twb, level=3))
## modified method of Roleček et al. (2009)
plot(twb, height="chi", main = "Rolecek tree")
## compare against the default by hierarchy levels
table(cuth(twb, ngroups=8), cut(twb, level=3))
Prints a community table of pseudospecies ordered by
twinspan
classification.
twintable(object, maxspp, goodspecies, subset)
object |
A twinspan result object. |
maxspp |
Maximum number of most abundant species displayed. The abundance is estimated with pseudospecies cut levels. The default is to show all species. |
goodspecies |
Select “good species” for
tabulation. These are either species that were used as
indicator pseudospecies ( |
subset |
Select a subset of quadrats. |
Function prints a compact community table of pseudospecies
values. The table is ordered by clustering both species and
quadrats similarly as in summary.twinspan
or in plot
of as.dendrogram.twinspan
. The classification of each
quadrat and species is shown by a sequence of 0
and 1
indicating division of each level. This string is binary
presentation of the decimal class number without the leading
1
.
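The relation between the decimal class number and the displayed string of 0 and 1 can be sketched in base R. The helper class2bin is hypothetical and written only for this illustration; it is not a function of this package.

```r
## Hypothetical helper: binary presentation of a class number
## without the leading 1
class2bin <- function(class) {
    vapply(class, function(k) {
        bits <- integer(0)
        while (k > 1) {              # stop before the leading 1
            bits <- c(k %% 2, bits)  # collect the lowest bit first
            k <- k %/% 2
        }
        paste(bits, collapse = "")
    }, character(1))
}

class2bin(c(2, 3, 5, 12))
```

Class 5 is 101 in binary, so its string is 01: the unit went to the 0 side in the first division and to the 1 side in the second. Reading it the other way, the two halves of group k get numbers 2k and 2k + 1.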
Only one character is used for each abundance, and the table is
very compact. However, large tables can be divided over several
pages or screen windows. The width of the displayed table is
controlled by R option width
(see
options
). It is possible to select only a
subset
of the quadrats for tabulation giving narrower
tables. The number of species can be reduced by setting the maximum
number of most abundant species, or alternatively, by restricting
tabulation only to “good species” which are the most
abundant species of each species group (ties broken by species
frequency), or species used as indicators, or both.
Function returns invisibly the data that
vegemite
used. This data can also be
exported or used as any other data set.
data(ahti)
tw <- twinspan(ahti)
## complete table would be large, but we show subset of group 4
twintable(tw, subset = twingroup(tw, 4), goodspecies = "both")