pcurve {pcurve}    R Documentation

Principal Curve Analysis

Description

Fits a principal curve to a numeric multivariate dataset in arbitrary dimensions. Produces diagnostic plots.

Usage

pcurve(x, xcan = NULL, start = "ca", rank = FALSE, cv.fit = FALSE,
penalty = 1, cv.all = FALSE, df = "vary", fit.meth = "spline",
canfit = "lm", candf = FALSE, vary.adj = FALSE, subset,
robust = FALSE, lowf = 0.5, min.df, max.df, max.df.cv.fit,
ext.dist = TRUE, ext.dc = 0.9, metric = "bray", latent = FALSE,
plot.pca = TRUE, thresh = 0.001, plot.true = TRUE,
plot.init = FALSE, plot.segs = TRUE, plot.resp = TRUE,
plot.cov = TRUE, maxit = 10, stretch = 2, fits = FALSE,
prnt.fits = TRUE, trace = TRUE, trace.all = FALSE, pch = 1,
row.chk0 = FALSE, col.chk0 = TRUE, use.loc = FALSE)
 

Arguments

x numeric data matrix or data.frame.
xcan data.frame or matrix of explanatory variables to be used in constrained PCs.
start specifies how to determine the starting configuration (location of points on initial curve): "ca" = correspondence analysis; "pca" = principal components analysis with Euclidean metric; "pca.bc" = principal components analysis with Bray-Curtis metric; "mds" = non-metric multidimensional scaling with Euclidean metric; "mds.bc" = non-metric multidimensional scaling with Bray-Curtis metric; "cs.bc" = classical scaling (metric multidimensional scaling) with Bray-Curtis metric; "ran" = random start. Alternatively, if start is numeric and of length dim(x)[1], it is used as a user-supplied configuration.
rank if TRUE the starting configuration is transformed to ranks.
cv.fit if TRUE a final iteration using cross-validation is done.
penalty penalty for the smoothing spline. A value of 1 corresponds to no additional penalty; values > 1 penalise degrees of freedom more heavily and give a smoother fit. Increasing the penalty for small data sets can reduce over-fitting. If penalty = "np", the penalty is set from the sample size N: penalty = 2 for N <= 100, penalty = 4 - log(N, 10) for 100 < N <= 1000, and penalty = 1 for N > 1000.
cv.all if TRUE a cross-validated smoothing spline is fitted at each iteration.
df if numeric specifies the df for the smoothing spline.
fit.meth specifies the smoother. "spline" = smooth.spline, "poisson" = Poisson generalized additive model, "binomial" = binomial generalized additive model, "lowess" = lowess smoother (this argument is overridden by robust = TRUE).
canfit "lm" or "gam", model used to relate pc to xcan.
candf if canfit = "gam", df for model. May be a single value or a vector of FALSE or positive integers indicating dfs for each explanatory variable in xcan. If FALSE, this is equivalent to fx=FALSE in gam, and d.f. is selected by GCV.UBRE
vary.adj if FALSE the same df are used for the smooth of each variable, otherwise each variable has its own df.
subset used to take a subset of x and start (if numeric).
robust if TRUE uses lowess smooths, if FALSE uses smoothing spline.
lowf specifies the span of the lowess smooth.
min.df specifies the min df for the smoothing.
max.df specifies the max df for the smoothing.
max.df.cv.fit
ext.dist if TRUE extended dissimilarities, computed with the flexible shortest path method, are used in calculating the initial configuration. If FALSE standard dissimilarities are used (see De'ath, 1999b and stepacross in package vegan).
ext.dc critical distance, the toolong argument in stepacross.
metric dissimilarity metric, the method argument in vegdist in package vegan.
latent if FALSE locations are rescaled after each iteration to give distance along the curve; if TRUE no rescaling is done.
plot.pca if TRUE the fitting is plotted (assuming plot.true = TRUE) in the first 2 dimensions of PCA space.
thresh threshold value of difference in cross-validation for ceasing iteration
plot.true if TRUE the fitting process is plotted.
plot.init if TRUE the initial fits to each variable are plotted.
plot.segs if TRUE segments linking the fitted points on the curves to their corresponding data points are plotted.
plot.resp if TRUE the final response curves are plotted.
plot.cov if TRUE covariate partial effects are plotted (only if xcan is not null).
maxit specifies the maximum number of iterations.
stretch end segments of the curve are stretched by this factor at each iteration.
fits if TRUE value of pcurve includes diagnostics for each variable.
prnt.fits if TRUE statistics on model fits are printed.
trace if TRUE prints out fitting diagnostics at each iteration.
trace.all if TRUE prints out all curve details at each iteration.
pch symbol for plots
row.chk0 if TRUE checks for and removes rows of x identically 0.
col.chk0 if TRUE checks for and removes columns of x identically 0.
use.loc if TRUE pauses between plots during fitting (left mouse-click to progress to the next plot).
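
As a rough illustration of the penalty = "np" rule described for the penalty argument, the sketch below computes the penalty from the sample size N. The helper name np_penalty is hypothetical and is not part of pcurve; it only restates the thresholds given above and may differ from what the package does internally.

```r
# Hypothetical helper (not a pcurve function): sample-size-based penalty
# following the thresholds documented for penalty = "np".
np_penalty <- function(n) {
  if (n > 1000) {
    1                 # large data sets: no additional penalty
  } else if (n <= 100) {
    2                 # small data sets: strongest penalty
  } else {
    4 - log(n, 10)    # in between: decreases from 2 (N = 100) to 1 (N = 1000)
  }
}

# Penalty for a few sample sizes
sapply(c(50, 100, 316, 1000, 5000), np_penalty)
```

The rule is continuous at both thresholds, since 4 - log(100, 10) = 2 and 4 - log(1000, 10) = 1.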

Details

See De'ath (1999a) for a full discussion of the functions and their application.
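
The start argument also accepts a numeric vector of length dim(x)[1]. A minimal sketch of building such a vector with base R is given below, assuming first-axis PCA scores are a reasonable starting configuration. The simulated matrix x and the name start_cfg are illustrative only, and the pcurve call is left as a comment because it requires the package to be installed.

```r
# Illustrative data only: 100 samples by 4 taxa of Poisson counts
set.seed(1)
x <- matrix(rpois(100 * 4, lambda = 5), nrow = 100, ncol = 4)

# First principal component scores: one numeric value per row of x
start_cfg <- as.numeric(prcomp(x)$x[, 1])

# A user-supplied configuration must be numeric and of length dim(x)[1]
stopifnot(is.numeric(start_cfg), length(start_cfg) == dim(x)[1])

# With pcurve installed, the vector could then be passed as:
# fit <- pcurve(x, start = start_cfg)
```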

Value

An object of class principal curve: a list with the following components

s fitted values
tag order of points along the curve
lambda locations along the curve
dist sum of squared distances of points from the curve
c call to pcurve
x data to which the curve was fitted
df degrees of freedom for the smoothers used in the fit
fit.list diagnostics for each variable, only included if fits = TRUE.
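
The sketch below shows one way the components listed above might be used together. It builds a mock list rather than calling pcurve, so all values are made up for illustration, and it assumes that tag is the index vector that sorts lambda.

```r
# Mock object with some of the documented components (illustrative values only)
set.seed(42)
lambda <- runif(6, min = 0, max = 10)                   # locations along the curve
s <- cbind(taxon1 = sin(lambda), taxon2 = cos(lambda))  # fitted values
x <- s + matrix(rnorm(12, sd = 0.1), nrow = 6)          # data = fit + noise
fit <- list(s = s, tag = order(lambda), lambda = lambda,
            dist = sum((x - s)^2), x = x)

# Arrange fitted values from one end of the curve to the other
# (assumes tag sorts lambda, which is true for this mock object)
s_ordered <- fit$s[fit$tag, , drop = FALSE]
lambda_ordered <- fit$lambda[fit$tag]

# dist is the sum of squared distances of the data from the fitted curve
resid_ss <- sum((fit$x - fit$s)^2)
```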

Author(s)

R port by Chris Walsh <Chris.Walsh@sci.monash.edu.au> from the S+ library by Glenn De'ath <g.death@aims.gov.au>. Original S code for principal curve analysis by Trevor Hastie <hastie@stat.stanford.edu>.

References

De'ath, G. 1999a Principal Curves: a new technique for indirect and direct gradient analysis. Ecology 80, 2237–2253.

De'ath, G. 1999b Extended dissimilarity: a method of robust estimation of ecological distances from high beta diversity data. Plant Ecology 144, 191–199.

Gittins, R. 1985 Canonical Analysis. A review with applications in ecology. Berlin: Springer-Verlag.

Hastie, T.J. and Tibshirani, R.J. 1990 Generalized additive models. London: Chapman and Hall.

Hastie, T.J. and Stuetzle, W. 1989 Principal Curves. Journal of the American Statistical Association 84, 502–516.

See Also

pcdiags.plt, vegdist, stepacross

Examples

# A simulated dataset with 4 response variables (taxa 1-4), n = 100.
# The response curves are Gaussian and the noise is Poisson.
    data(sim4var)
    sim4fit <- pcurve(sim4var, plot.init = FALSE, use.loc = TRUE)

#Limestone grassland community example worked by De'ath (1999a),
#from data in Gittins (1985)
    data(soilspec)
    species <- sqrt(soilspec[,2:9])
    envvar <- soilspec[,10:12]
#indirect gradient analysis
    spec.fit <- pcurve(species, start = "mds.bc", plot.init = FALSE,
                       use.loc = TRUE)
#direct gradient analysis
    soilspec.fit <- pcurve(species, xcan = envvar, 
                           start = "mds.bc", plot.init = FALSE,  
                           fits = TRUE, prnt.fits = TRUE,
                           use.loc = TRUE)

[Package pcurve version 0.6-2 Index]