%\VignetteIndexEntry{Unconstrained Gaussian Maximum Likelihood Ordination} \documentclass[a4paper,10pt]{article} \usepackage{vegan} % vegan settings \usepackage{graphicx} \usepackage{subfigure} \usepackage{tikz} % used for Gaussian response pic \newcommand{\E}{\mathrm{e}} % used in the previous tikz pic \title{Gaussian Ordination} \author{Jari Oksanen} \begin{document} \SweaveOpts{strip.white=true} <>= par(mfrow=c(1,1)) options(width=72) figset <- function() par(mar=c(4,4,1,1)+.1) options(SweaveHooks = list(fig = figset)) options("prompt" = "> ", "continue" = " ") @ \maketitle \tableofcontents \section{Gaussian response model} The Gaussian response model for a single species and a single gradient is defined as (Fig.~\ref{fig:gaussresp}) \begin{figure} \centering \begin{tikzpicture}[scale=1.2] \begin{small} \draw (-2.75,0) -- (3,0) node [near end,below]{Gradient $x$}; \draw (-2.75,0) -- (-2.75,3) node [near end,above,sloped]{Response $\mu$}; \draw[domain=-2.75:3,smooth,thick] plot (\x, {3*exp(-\x*\x/2)}); \draw (0,3) -- (0,0) node [near start,left]{$h$} node [below] {$u$}; \draw (0, {3*exp(-0.5)}) -- (1, {3*exp(-0.5)}) node [midway,above]{$t$}; \draw [dashed] (1, 0) -- (1, {exp(-0.5)*3}); \draw [dashed] (0, {exp(-0.5)*3}) -- (-1, {exp(-0.5)*3}) -- (-1,0) node [below] {$u-t$}; \draw [dashed] (2, 0) -- (2, {exp(-2)*3}) -- (0, {exp(-2)*3}) -- (-2, {exp(-2)*3}) -- (-2,0) node [below] {$u-2t$}; \draw (-2.75, 3) -- (-2.65,3) node[right] {$h$}; \draw (-2.75, {exp(-0.5)*3}) -- (-2.65,{exp(-0.5)*3}) node[right] {$h \E^{-1/2}$}; \draw (-2.75, {exp(-2)*3}) -- (-2.65,{exp(-2)*3}) node [right] {$h \E^{-2}$}; \end{small} \end{tikzpicture} \caption{Gaussian response function of eq.~(\ref{eq:gauss}).} \label{fig:gaussresp} \end{figure} \begin{equation} \label{eq:gauss} \mu_i = h \exp \left[-\frac{(x_i - u)^2}{2t^2}\right] \, , \end{equation} where $\mu_i$ is the fitted value for SU $i$, $x_i$ is the gradient value, and $h$, $u$ and $t$ are the response parameters of the species: $h$ is the maximum height of the response, $u$ is the location of the optimum where $\mu = h$ is obtained, and $t$ is the width or tolerance of the response. The Gaussian response can be re-parametrized as a generalized linear model \begin{equation} \label{eq:gpoly} g(\mu_i) = a + b x_i + c x_i^2 \, , \end{equation} which is equal to the Gaussian model of eq.~(\ref{eq:gauss}) if $c < 0$. In this model, $a$, $b$ and $c$ are species parameters and $g(\cdot)$ is a link function. For Gaussian response, the link function should be $g = \log$ so that the inverse link is $g^{-1} = \exp$. In practice, binomial models are normally fitted with logistic link function which does not strictly define a Gaussian response, but is still treated similary as $\log$ link. Polynomial form eq.~(\ref{eq:gpoly}) is a re-parametrization of the original Gaussian eq.~\ref{eq:gauss}, and the orginal Gaussian parameters can be found from the polynomial coefficients (provided that $c < 0$): \begin{align} t &= \sqrt{-\frac{1}{2c}}\\ u &= -\frac{b}{2c} = b t^2\\ g(h) &= a -\frac{b^2}{4c} = a + 0.5 b^2 t^2\, . \end{align} If we scale $x$ ``in sd units'' so that $t = 1$ (and hence $c=-0.5$), eq.~(\ref{eq:gpoly}) becomes \begin{equation} \label{eq:gpolyfix} g(\mu_i) = a + b x_i - 0.5 x_i^2 \, . \end{equation} With this scaling, $t=1$, $u = b$ and $g(h) = a + 0.5 b^2$. Curiously, the unimodal optima $u$ are expressed as linear coefficients $b$\,--\,something we discussed with Cajo ter Braak's new electronic paper on CA. In Cajo's paper that was approximate, but here the correspondence is exact since we defined that species have unit response widths $t=1$. \section{Gaussian Ordination} In usual model fitting with Gaussian responses, we regard gradient values $x_i$ as known, and estimate the species parameters so that fitted values $\mu_i$ are as similar to observed values $y_i$ as possible. In Maximum Likelihood estimation the fitted species parameters are estimated to maximize the likelihood of fitted values $\mu_i$ given data $y_i$ and $x_i$. In Gaussian Ordination we select $x_i$ and species parameters so that fitted values $\mu_i$ will maximize the likelihood function given data $y_i$ (and the model). We generalize eq.~\ref{eq:gpolyfix} with fixed $t=1$ for several species $j$ and several $k$ gradients: \begin{equation} \label{eq:GO} g(\mu_{ij}) = a_j - 0.5 \sum_k x_{ki}^2 + \sum_k b_{kj} x_{ki} \, . \end{equation} All terms ($a_j$, $b_{kj}$ and $x_{ki}$) are model parameters which are estimated to maximize the likelihood of fitted values $\mu_{ij}$ given data $y_{ij}$. The first terms $a_j$ define the scale or height for each species, second terms $-0.5 x_{ki}^2$ scale each gradient to unit tolerance, and the last terms $b_{kj} x_{ki}$ define the relationship between the species and the gradients. These last ones are the terms of interest: $b_{kj}$ are the locations of optima $u$ for species on each gradient, and $x_{ki}$ are the gradient values. \subsection{Details of implementation} Equation~(\ref{eq:GO}) can be solved with non-linear minization of log-likelihood or deviance function. With a $n \times m$ data matrix and $k$ ordination dimensions, there are $m$ parameters $a$, $km$ parameters $b$ and $kn$ parameters $x$ or altogether $m + k(m+n)$ parameters which can be a large number. There are two obvious alternative ways of estimating the model: \begin{enumerate} \item Only ordination dimensions $x$ are found by non-linear minimization, and the species parameters $a$ and $b$ are found conditionally to the current iterate of $x$ using GLM. The disadvantages of the strategy is that evaluation of the function is very expensive, because full GLM iteration is performed for every choice of $x$, and function may be evaluated hundreds of times in every iteration step. Further, it may be that sometimes function is trapped because species parameters adapt to the current estimates of $x$ and prevent changing $x$. This is apparent as stopping because steplength is too short. Analysis of a $24 \times 28$ data set used below took nearly 400\,sec in a 1.6\,GHz laptop, but used only 23 iterations. \item All parameters are found simultaneously with non-linear minization. The most obvious disadvantage is that the non-linear minimization problem can be huge. However, the same data set as above took 33\,sec, although it needed 197 iterations. Large data sets can be really slow: one case I studied was a subset of 127 most common species in 398 SUs of the Mt.\,Field vegetation data whick in 2 dimensions gave $127 + 2 \times (127 + 398) = 1177$ estimated parameters, and it took days to run this analysis in a 1.6\,GHz laptop. \end{enumerate} I have implemented both alternatives although the main development branch is based on the second alternative, and this paper only uses the second alternative. In tests, both approaches gave similar results for one dimension. Non-linear minimization needs starting values, and because problem is large, they should be rather good. I have used site scores of detrended correspondence analysis as a starting values for $x$, and coefficients of the corresponding GLM (eq.~\ref{eq:gpolyfix}) fit for species parameters $a$ and $b$. A commonly held conjecture is that if all species have Gaussian responses with equal tolerances, DCA should be able to estimate both the locations of species optima $u$ and gradient values $x$ related with them. I will add an option to supply own starting values for $x$. The target function to be minimized is the deviance of the model. The current version of the function uses quasi-Binomial deviance, but there will be other alternatives and this will be made a user choice. The Binomial model uses logit link function so that the models are not strictly Gaussian. Other error models to be inspected are quasi-Poisson and Normal (Gaussian distribution), both with $\log$ link. For effective evaluation, we need derivatives of the likelihood with respect to the parameters. If these are not supplied, numerical derivatives will be used, and for each parameter this means re-evaluation of the complete loss function. McCullagh and Nelder show that the general form of the derivate of likelihood function $l$ with respect to the parameter $p$ is \begin{align} \frac{\partial l}{\partial p} &= \frac{\partial l}{\partial \theta} \frac{d \theta}{d \mu} \frac{d \mu}{d \eta} \frac{\partial \eta}{\partial p}\\ &= \frac{W (y-\mu)}{\VAR (\mu)} \frac{d \mu}{d \eta} \frac{\partial \eta}{\partial p}\,, \end{align} where $W$ are prior weights, $y$ are observed values (possibly scaled by $W^{-1}$), $\mu$ are fitted values, and $\eta$ are the linear predictors, and $\theta$ is the parameter of the exponential family defining both fitted values and variances. The $\VAR (\mu$) and $\frac{d \mu}{d \eta}$ functions are supplied by \R{} as a part of defining the error distribution and link function. The partial derivatives of parameters with respect to linear predictor $\frac{\partial \eta}{\partial p}$ of eq.~(\ref{eq:gpoly}) are trivial: \begin{align} \frac{\partial \eta}{\partial a} = 1\,, \quad \frac{\partial \eta}{\partial b} = x \,,\quad \frac{\partial \eta}{\partial x} = b-x \,. \end{align} \subsection{Testable models} We have ML ordination and therefore we have all ML statistical tools available. Each model has deviance ($D$) which is a measure of badness of fit, and degrees of freedom of that fit. For instance, we can inspect two models with different number of extracted dimensions and see if the extra dimensions are significant. Let us inspect an alternative model with deviance $D_A$ and $k_A$ dimensions and a null model with deviance $D_0$ and $k_0$ dimensions, where $D_A \le D_0$ and $k_A > k_0$. The significance of extra dimensions $k_A - k_0$ can be evaluated with $F$ statistic \begin{equation} F_{(k_A-k_0)(m+n), mn - k_A(m+n) -m} = \frac{(D_0-D_A)/\left[(k_A-k_0)(m+n)\right]}{D_A/(mn - k_A(m+n) - m)} \, . \end{equation} A null model with no gradients ($k=0$) will only estimate $m$ coefficients for the average of each species, and can be used as a baseline of model comparison. Please note that the following does not define a meanigfully testable model: \begin{equation} g(\mu_{ij}) = a_j - 0.5 \sum_k x_{ij}^2 \, . \end{equation} This defines a model where all species have their optima $u$ at the gradient origin $x = 0$, and superficially could be regarded as a test for an overall test that species have different optima $b$ when compared against eq.~(\ref{eq:GO}). However, the location of the origin $x = 0$ is arbitrary, and tests against an arbitrary value are meaningless. The default ANOVA-style tests for GLM would use this approach, and compare model with the the linear coefficient and fixed quadratic coefficients to a model with only the fixed quadratic coefficient. In principle it would be possible to get the estimates of standard errors of the parameters, but this is not done (yet). For this, we need to use log-likelihood as a minimized function instead of deviance (which has twice larger differences), and ask the minimizer function to return the Hessian or the matrix of the second derivatives. The estimates of the standard errors of the parameters are the square roots of inverse Hessian. These standard errors could be used to display the error variation ellipses of the ordination scores both for species and for SUs. \subsection{Constrained Gaussian Ordination} This is a work not yet done. However, the constrained version of GO looks pretty simple: Instead of freely estimating the gradients $x$ in eq.~(\ref{eq:GO}), we estimate $x$ as a function of $p$ constraining variables $z$ \begin{equation} x_{ki} = \sum_p \beta_{kp} z_{ip} \,, \end{equation} and plug the estimated $x_{ki}$ into eq.~(\ref{eq:GO}). This is actually an easier problem than unconstrained GO, because we only need to estimate $kp$ regression coefficients ($\beta$) instead of $kn$ gradient values, and usually $p \ll n$. This will only give us strictly constrained scores $x$ comparable to linear combination (LC) scores in CCA and RDA. It may be possible to get related scores that are estimates from species composition, and hence comparable to WA scores in CCA/RDA. We reverse the model and ask what are the most likely values of $x$ giving the observed $y$ and the current model with fixed species parameters. This is the multivariate calibration or bioindication model that I suggested in JVS 1 in 1991. However, I am not sure that this work consistently: what would it give as $x$ if similarly applied in unconstrained GO? Consistent method should give the same $x$ as estimated originally, but I do not believe this will happen (but will try and see). \section{Examples} I use a current and always changing version of Gaussian Ordination in these examples and data sets and functions from \pkg{vegan} package. <<>>= library(GO) library(vegan) @ The file \code{GO.R} contains main functions for analysis plus some support functions. All these are under work, and they will change, in particular in their internals, and even the user-interface is not yet stabilized. The two main analysis functions are \code{GO1} which implements alternate non-linear estimation of gradient values $x$ and GLM estimation of species parameters $a, b$. The user interfaces are: <<>>= args(GO1) args(GO) @ Both of these functions share some arguments: \begin{itemize} \item \code{comm}: The community data frame. \item \code{tot}: Binomial denominator. \item \code{freqlim}: Frequency (number of occurrences) of rarest species to be analysed. $k + 1$ parameters are fitted for each species, and therefore we need some minimal data for estimation. \end{itemize} Function \code{GO1} is so far implemented only for one dimension. Because it uses the alternating non-linear and GLM fitting, it can use parallel processing in the evaluation (\code{parallel} gives the number of desired parallel processes). Function \code{GO} can fit multidimensional models, and parameter \code{k} gives the number of axes. Currently \code{k} is limited to 4 axes, but there is no particularly compelling reason to impose this limit\,--\,I only assume that things get hairier as the number of axes increases. In addition, we can pass arguments to the minimizing function \code{nlm} in these functions, and we shall set higher iteration limits with \code{iterlim}. \code{GO1} is currently only available for one dimension, and it is very slow, and therefore we use only \code{GO}. Let us first evaluate models for one to three dimensions: <<>>= data(varespec) system.time(m1 <- GO(varespec, k=1, freqlim=10, tot=100, iterlim=1000)) system.time(m2 <- GO(varespec, k=2, freqlim=10, tot=100, iterlim=1000)) system.time(m3 <- GO(varespec, k=3, freqlim=10, tot=100, iterlim=1000)) @ I have written some method functions that can be used with the result: <<>>= methods(class="GO") @ For instance, there is a \code{print} method so that we get a brief overview of the result (and typing the name of the object implicitly calls \code{print}): <<>>= m1 m2 m3 @ The \code{print} gives a brief overview of the result object, e.g., the astonishingly high proportions of deviance explained by each ordination. In fact, the result object contains a large number of components which are not displayed, but can be used by other functions: <<>>= names(m2) @ The main results are SU scores (\code{points}, $x_{ik}$) and species scores (\code{species}, $b_{jk}$ which give the optima; the constants $a_j$ are saved in item \code{b0}). The result object was constructed so that several standard \R{} functions can extract the results. Moreover, \pkg{vegan} function \code{scores} can extract species and site scores and many standard \pkg{vegan} functions can handle the result and we do not need to write new functions or interfaces; some examples we will see here are functions are \code{ordiplot}, \code{enfvit}, \code{ordisurf}, \code{procrustes}. I have made a specific \code{plot} function that displays the responses against gradients (Fig.~\ref{fig:plotm1}). \begin{figure} \subfigure[a]{\includegraphics[width=0.5\linewidth]{GO-plot-GO}}% \subfigure[b]{\includegraphics[width=0.5\linewidth]{GO-ordiplot-stack}} \caption{One-dimensional Gaussian Ordination. \textbf{a} Dedicated \code{plot} function of \code{GO} showing fitted Gaussian responses against the gradient. \textbf{b} Standard \pkg{vegan} ordination plot (\code{ordiplot}) which for one-axis models defaults to a stacked line.} \label{fig:plotm1} \end{figure} <>= plot(m1, label = TRUE) @ For multi-axis models this will display a line through origin with other gradients set to zero. It can optionally show the marginal model where the height of the curve would be the same on all gradients, but the default is a gradient line through the origin. For a classical ordination plot, we can use \pkg{vegan} \code{ordiplot} function that is able to handle \code{GO} results (Fig.~\ref{fig:plotm1}). <>= ordiplot(m1) @ Function \code{ordiplot} displays one-axis results with a \pkg{vegan} \code{linestack} function. With more axes, the default is to show ordination plots of both species and sites (Fig.~\ref{fig:plotm2}) \begin{figure} \centering \subfigure[]{\includegraphics[width=0.5\linewidth]{GO-ordiplot-m2a}}% \subfigure[]{\includegraphics[width=0.5\linewidth]{GO-ordiplot-m2b}} \caption{Two-dimensional Gaussian Ordination. \textbf{a} Species optima and gradient positions of SUs. \textbf{b} Only SUs.} \label{fig:plotm2} \end{figure} <>= ordiplot(m2, type="t") @ <>= ordiplot(m2, display="si") @ Low-dimensional solution is not a subspace in higher dimensions. We cannot add new dimensions to existing solutions, but ordination for each dimensionality must be found separately. In this respect GO and NMDS are similar. We can see this by fitting surface of one-dimensional solution onto two-dimensional solution (Fig.~\ref{fig:go-subspace}). \begin{figure} \centering \includegraphics[width=0.6\linewidth]{GO-go-subspace} \caption{One-dimensional Gaussian Ordination axis fitted to a two-dimensional solution as a smooth surface and as a vector arrow. The diameter of SU circles is proportional to the scores on the one-dimensional ordination.} \label{fig:go-subspace} \end{figure} <>= ordisurf(m2 ~ scores(m1), bubble=3) plot(envfit(m2 ~ scores(m1)), label="Dim1") @ As you see, \pkg{vegan} functions \code{ordisurf}, \code{envfit} and \code{scores} are can handle \code{GO} results. \subsection{Other methods} I will not launch any large scale comparison nor simulations. However, it may be good to peek how GO results compare with other methods: are they more or less similar or something completely different. I have used DCA solution as the initial values, and it has been claimed to approximate GO ML ordination\,--\,provided all species have $t=1$, equal heights and their optima ($u=b$) are evenly distributed along the gradients. GO used only species with frequency of $\ge 10$ and we shall analyse the same subset. <>= v10 <- varespec[, colSums(varespec>0) >= 10] ord <- decorana(v10) p1 <- procrustes(m2, ord, choices = 1:2) p1 plot(p1, to.target = FALSE, main="") @ \begin{figure} \centering \subfigure[DCA]{\includegraphics[width=0.5\textwidth]{GO-proc-dca}}% \subfigure[NMDS]{\includegraphics[width=0.5\textwidth]{GO-proc-nmds}} \caption{Procrustes rotation of 2-dimension Gaussian ordination (dots) vs (\textbf{a}) Detrended Correspondence Analysis and (\textbf{b}) Nonmetric Multidimensional Scaling (arrow heads). } \label{fig:proc-GO} \end{figure} The \pkg{vegan} \code{metaMDS} standardizes data by default, and this seems to have a large effect with these data. Therefore we turn off \code{autotransform} to analyse non-transformed data, but still use the default Bray-Curtis index. <>= nm <- metaMDS(v10, autotransform = FALSE, trace = FALSE) p2 <- procrustes(m2, nm) p2 plot(p2, to.target=FALSE, main="") @ The goodness of fit is measured in the units of the target, and the analyses are comparable. NMDS is clearly much more similar to GO than DCA, although DCA was used as the starting configuration in GO. \subsection{Fixed tolerance and free shapes} I defined the species richnesses to have strictly equal tolerances $t=1$. This does not mean that such responses really are the best ones for the gradient. In the following we fit similar second degree polynomials to all species without restricting the second degree coefficient, and plot the results over the canonical fit (Fig.~\ref{fig:freefit}): \begin{figure} \centering \includegraphics[width=0.6\textwidth]{GO-freefit} \caption{Gaussian ordination axis with species fitted with $t=1$ (solid lines) and allowing freely varying $t$ or even non-unimodal responses.} \label{fig:freefit} \end{figure} <>= ## take the same subset of species as used in GO v10 <- varespec[, colSums(varespec>0) >= 10] ## the gradient ax <- drop(scores(m1)) ## fit free quadratic quasibinomial GLMs mods <- lapply(v10, function(y) glm(cbind(y, 100-y) ~ ax + I(ax^2), family=quasibinomial)) ## predict values predx <- seq(min(ax), max(ax), len=101) fits <- sapply(mods, function(z) predict(z, type="response", newdata=data.frame(ax=predx))) ## plot plot(m1) matlines(predx, fits, lty=3) legend("topright", c("Fit with t=1", "Fit with free t"), lty=c(1,3), col=3) @ The widths of Gaussian responses differ from $t=1$ although this was imposed in fitting: <<>>= b <- sapply(mods, coef) rownames(b) sqrt(-1/2/b[3,]) @ The \code{NaN} cases had quadratic coefficient\;$>0$ which define a non-unimodal response. Surprisingly, even \code{Cla.ste} is among those non-unimodal species, although chapter~\ref{sec:contributions} will show it to be most important driver of the ordination axis. It seems that it would like to shoot up even more strongly than the restricted Gaussian allows. Even with unimodal species, the best fitting widths deviate from theoretical (and fitted) $t=1$. The difference is significant, although only barely so compared to astronomically low values in all other test (see chapter~\ref{sec:tests}): <<>>= devalt <- sum(sapply(mods, deviance)) devalt deviance(m1) ## we have 28 species dff <- 28 dfr <- df.residual(m1)-dff scl <- deviance(m1)/dfr f <- (deviance(m1) - devalt)/dff/scl f pf(f, dff, dfr, lower.tail = FALSE) @ \subsection{Tests} \label{sec:tests} We fit ML models which are special cases of GLM, and therefore we can perform all typical tests. I have packed basic ANOVA as a separate function. An overall test of the fitted model is <<>>= anova(m2) @ The significances are very high (that is, $P$-values are low). The degrees of freedom take into account that the gradient also is estimated, but the tests still are biased. We have selected $x$ to minimize residual deviance and maximize $F$ statistics. I do not know yet how to develop more correct tests. Simple permutation is out of question due to slow calculation times. The above tests gave the overall statistic for the whole multidimensional model. The packaged \code{anova} function allows testing a sequence of models for their difference. Comparison of \code{m2} against \code{m1} tests the significance of adding second axis to a one-axis model. Let us first see the significance of one-dimensional ordination: <<>>= anova(m1) @ Then we can see how adding dimensions improves the results: <<>>= anova(m1, m2, m3) @ \subsection{Contributions by species} \label{sec:contributions} This chapter is for those who complain that ordination tells us nothing about species. The deviance of the model is the sum of deviances of individual species. This test is even more approximate than previous since we cannot take into account that the gradient $x$ was estimated to minimize the sum of deviances over all species. The degrees of freedom are based only on species parameters\footnote{I could take the residual d.f. per species from a residual d.f. of the complete model divided by the number of species so that the estimation of row parameters $x$ would be divided evenly over species. This would reduce $F$ statistics and number of d.f. and move tests to more correct direction. The implemenation of $F$ distribution in \R{} accepts decimal d.f.} The canned test procedure for species is called \code{spanodev} (I know, this is an ugly name). It can be called with one model and in that case the overall model is compared against the null model. Alternatively, the function can be called with two models (but not with more models), and then these are compared against each other. Let us see first how one dimension ``explains'' each species: <<>>= spanodev(m1) @ %\begin{figure} % \centering % \includegraphics[width=0.7\linewidth]{DSCN1123.jpg} % \caption{\emph{Cladina stellaris}, the most influential species in the ordina%tion example.} % \label{fig:Claste} %\end{figure} It is remarkable how unevenly species contribute to the model fit, and some species are actually more poorly explained by the model than the Null of constant abundance. One single species (\code{Cla.ste}, \emph{Cladina stellaris}) %, Fig.~\ref{fig:Claste}) contributes more than half of the modelled deviance. The situation is similar as in non-scaled PCA where species with large variance are the only worth explaining. Some weight equalizing trick should be developed. However, \emph{Cladina stellaris} is an important species and probably also a keystone species which many other species depend on. Moreover, the major external driver in these data is reindeer grazing which reduces the \emph{C. stellaris} cover drastically. We cannot say that method fails if it picks up this species. Then we can see how adding second axis improves fits for each species: <<>>= spanodev(m1, m2) @ Adding second axis picks up new important species. The explanation for \emph{Cladina stellaris} does not improve: all was explained in the first axis. \subsection{Failures} Even with limited tests, I have found cases where Gaussian Ordination seems to fail. Sometimes it fails in an informative way, sometimes mysteriously. We fit a Gaussian response model to the data, and if that model is not appropriate, ordination should fail. One such failure case are the classic Duch Dune Meadows. Actually, one reason why I dropped developing Gaussian Ordination years ago was that I used Dune Meadows as a test case, and regarded its failure as a failure of the method. That also explains some arbitrary choices I made with the current draft version: fixing tolerances to unit scale, and starting with Binomial models with a ceiling to a response. But let us have a look at the problems: \begin{figure} \centering \subfigure[]{\includegraphics[width=0.5\textwidth]{GO-dune-resp}}% \subfigure[]{\includegraphics[width=0.5\textwidth]{GO-dune-ordi}} \caption{Gaussian Ordination of Dune Meadows.} \label{fig:dune} \end{figure} <>= data(dune) mdu1 <- GO(dune, k=1, iterlim=1000) mdu1 plot(mdu1, label=TRUE) @ <>= ordiplot(mdu1) @ The analysis did not converge, and the results are surprising (Fig.~\ref{fig:dune}). SUs are polarized in two groups very far away from each other. It seems that GO thinks that there really is no continuous gradient. Most species have their optima in the empty middle region of the axis, and species are almost sure to occur in SUs that do not exist, since all SUs are somewhere else than the species optima (Fig.~\ref{fig:dune}). There is no penalty of fitting any values for areas with no data. The problem was the same in earlier implementation years ago when I used quasi-Poisson models that shot up to the sky. I now switched to quasi-Binomial to avoid this, but it had no effect. The fit is evaluated at data points, and if there are no points, you can do anything with no penalty. I have feeling that most species are ubiquitous, but there is one species of dry patches (\emph{Anthoxanthum odoratum}), and some species of wet patches (\emph{Agrostis stolonifera}, \emph{Juncus articulatus}, \emph{Ranunculus flammula}, \emph{Eleocharis palustris}). I used the term ``patches'' because it may be that whole plot is not dry or wet, but it has a dry (hummock) or a wet (depression) patch in addition to normal flat ground with its ubiquitous species. I think Dune Meadows deserves to fail, and a method should fail if its strict assumptions are not appropriate. Preferably it should fail loudly, which was not the case now. Naturally, such a method is not robust, but should ML GO be robust? Isn't it Maximum Likelihood? Isn't it Gaussian? If data are not such, no ML GO should work\dots However, there is another case which seems to fail, and this is more problematic since here the method should have worked: GO of Mt.\,Field. I intended to send you brief overview of my experiments last Sunday, but then light mindedly decided to also run a GO on the full Mt.\,Field data set: all 398\,SUs, but only 127 most common species (frequency\,$\ge 10$). My 1.6\,GHz laptop finished the analysis on Tuesday evening, and I closed the lid only when we drove from Tuusniemi to Oulu and when I rode to job on my bicycle. It took well over 50\,hrs, and then it ended in a local optimum after 952\;iterations. I had expected a clean result, but it looks strange (Fig.~\ref{fig:mtf-nmds}). \begin{figure} \centering \includegraphics[width=0.6\textwidth]{go2mtf} \caption{Two-dimensionsl Gaussian Ordination of the Mt.\,Field data.} \label{fig:mtf-nmds} \end{figure} Fitted vectors look OK: Altitude $R^2 = 0.8427$ vs. $R^2 = 0.8336$ and Drainage $R^2 = 0.7579$ vs. $R^2=0.7610$ in NMDS with the same subset of species. Moreover, Procrustes analysis hints that this has changed from the starting configuration of DCA, since Procrustes errors are smaller against NMDS than DCA. However, there are these strange straight lines that close the points in a kind of convex polygon which indicates that there is something fishy in the result. I do not know what that could be, but it does not smell success. \end{document}