
Workshop 7 - Nested, randomized block, repeated measures and partly-nested designs (mixed effects models)

4 August 2011

Basic statistics references

  • Logan (2010) - Chpt 12-14
  • Quinn & Keough (2002) - Chpt 9-11
Very basic overview of Nested ANOVA

Randomized block

Nested ANOVA - one between factor

In an unusually detailed preparation for an Environmental Effects Statement for a proposed discharge of dairy wastes into the Curdies River, in western Victoria, a team of stream ecologists wanted to describe the basic patterns of variation in a stream invertebrate thought to be sensitive to nutrient enrichment. As an indicator species, they focused on a small flatworm, Dugesia, and started by sampling populations of this worm at a range of scales. They sampled in two seasons, representing different flow regimes of the river - Winter and Summer. Within each season, they sampled three randomly chosen (well, haphazardly, because sites are nearly always chosen to be close to road access) sites. A total of six sites in all were visited, 3 in each season. At each site, they sampled six stones, and counted the number of flatworms on each stone.

Download Curdies data set

Format of curdies.csv data file
SEASON  SITE  DUGESIA  S4DUGES
WINTER  1     0.648    0.897
...     ...   ...      ...
WINTER  2     1.016    1.004
...     ...   ...      ...
WINTER  3     0.689    0.991
...     ...   ...      ...
SUMMER  4     0        0
...     ...   ...      ...

Each row represents a different stone

SEASON   Season in which flatworms were counted - fixed factor
SITE     Site from which flatworms were counted - nested within SEASON (random factor)
DUGESIA  Number of flatworms counted on a particular stone
S4DUGES  Fourth-root transformation of the DUGESIA variable

Open the curdies data file.
Show code

> curdies <- read.table("curdies.csv", header = T, sep = ",", strip.white = T)

> head(curdies)

  SEASON SITE   DUGESIA   S4DUGES
1 WINTER    1 0.6476829 0.8970995
2 WINTER    1 6.0961516 1.5713175
3 WINTER    1 1.3105639 1.0699526
4 WINTER    1 1.7252788 1.1460797
5 WINTER    1 1.4593867 1.0991136
6 WINTER    1 1.0575610 1.0140897

The SITE variable is supposed to represent a random categorical variable (which site). However, because the contents of this variable are numbers, R initially treats them as numbers, and therefore considers the variable to be numeric rather than categorical. To force R to treat this variable as a factor (categorical), it is necessary to first convert the numeric variable into a factor (HINT).
Show code

> curdies$SITE <- as.factor(curdies$SITE)

Notice that in this data set each level of the nested (random) factor is labelled uniquely (sites 1-3 in winter, 4-6 in summer) - the same label must never be repeated across levels of the nesting factor.
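If, hypothetically, the sites had instead been numbered 1-3 within each season (so that the same site label appeared in both seasons), unique labels could be constructed first. A minimal sketch (the SITEID name is made up for illustration):

#A hypothetical sketch: build unique site labels from the season-site combination

> curdies$SITEID <- interaction(curdies$SEASON, curdies$SITE)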

Q1-1. What are the main hypotheses being tested?
  1. H0 Effect 1:
  2. H0 Effect 2:
Q1-2. In the table below, list the assumptions of nested ANOVA along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption    Diagnostic/Risk Minimization
I.
II.
III.

Q1-3. Check these assumptions (HINT).
Show code

> library(nlme)

> curdies.ag <- gsummary(curdies, form = ~SEASON/SITE, mean)

Note that for the effects of SEASON (Factor A in a nested model) there are only three values (site means) for each of the two seasons. Therefore, boxplots are of limited value! Is there, however, any evidence of violations of the assumptions (HINT)?
Show code

> boxplot(DUGESIA ~ SEASON, curdies.ag)

(Y or N)
If so, assess whether a transformation will address the violations (HINT) and then make the appropriate corrections
Show code

> curdies$S4DUGES <- sqrt(sqrt(curdies$DUGESIA))

> curdies.ag <- gsummary(curdies, form = ~SEASON/SITE, mean)

> boxplot(S4DUGES ~ SEASON, curdies.ag)

Q1-4. For each of the tests, state which error (or residual) term (stated as a Mean Square) will be used as the numerator and denominator to calculate the F-ratio. Also include the degrees of freedom associated with each term.
Effect    Numerator (Mean Sq, df)    Denominator (Mean Sq, df)
SEASON
SITE

Q1-5. If there is no evidence of violations, test the model;
S4DUGES = SEASON + SITE + CONSTANT
using a nested ANOVA
(HINT). Fill out the table below (HINT), making sure that you treat SITE as a random factor when compiling the overall results.
Show code

> curdies.aov <- aov(S4DUGES ~ SEASON + Error(SITE), data = curdies)

> summary(curdies.aov)

Error: SITE
          Df Sum Sq Mean Sq F value   Pr(>F)   
SEASON     1 5.5709  5.5709  34.496 0.004198 **
Residuals  4 0.6460  0.1615                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 30 4.5555 0.15185               

> summary(lm(curdies.aov))

Call:
lm(formula = curdies.aov)
Residuals:
    Min      1Q  Median      3Q     Max 
-0.3811 -0.2618 -0.1381  0.1652  0.9023 
Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)    0.3811     0.1591   2.396  0.02303 * 
SEASONWINTER   0.7518     0.2250   3.342  0.00224 **
SITE2          0.1389     0.2250   0.618  0.54156   
SITE3         -0.2651     0.2250  -1.178  0.24798   
SITE4         -0.0303     0.2250  -0.135  0.89376   
SITE5         -0.2007     0.2250  -0.892  0.37955   
SITE6              NA         NA      NA       NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 0.3897 on 30 degrees of freedom
Multiple R-squared: 0.5771, Adjusted R-squared: 0.5066 
F-statistic: 8.188 on 5 and 30 DF,  p-value: 5.718e-05 

> library(gmodels)

> ci(lm(curdies.aov))

                Estimate    CI lower  CI upper Std. Error     p-value
(Intercept)   0.38112229  0.05622464 0.7060200  0.1590863 0.023031342
SEASONWINTER  0.75181980  0.29234513 1.2112945  0.2249821 0.002241586
SITE2         0.13892774 -0.32054693 0.5984024  0.2249821 0.541560368
SITE3        -0.26507141 -0.72454609 0.1944033  0.2249821 0.247982767
SITE4        -0.03030096 -0.48977564 0.4291737  0.2249821 0.893763109
SITE5        -0.20066007 -0.66013474 0.2588146  0.2249821 0.379548166

Q1-6. Using the output above, complete the following ANOVA table, and summarize the estimated mean for summer and the effect size of season.
Source of variation    df    Mean Sq    F-ratio    P-value
SEASON
SITE
Residuals   

Estimate    Mean    Lower 95% CI    Upper 95% CI
Summer
Effect size (Winter-Summer)

Normally, we are not interested in formally testing the effect of the nested factor. To get the correct F test for the nested factor (SITE), examine a representation of the ANOVA table of the fitted linear model that assumes all factors are fixed (HINT).
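A minimal sketch of what that amounts to (assuming the curdies data frame from above): refit the model with aov() and no Error() term, so that every term - including SITE - is tested against the overall residual.

> summary(aov(S4DUGES ~ SEASON + SITE, data = curdies))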

Q1-7. What are your conclusions (statistical and biological)?

Q1-8. Now analyse these data using a linear mixed effects model (REML).
Show code - lme

> library(nlme)

> curdies.lme <- lme(S4DUGES ~ SEASON, random = ~1 | SITE, data = curdies)

> summary(curdies.lme)

Linear mixed-effects model fit by REML
 Data: curdies 
       AIC     BIC    logLik
  46.42966 52.5351 -19.21483
Random effects:
 Formula: ~1 | SITE
        (Intercept)  Residual
StdDev:  0.04008598 0.3896803
Fixed effects: S4DUGES ~ SEASON 
                 Value Std.Error DF  t-value p-value
(Intercept)  0.3041353 0.0947195 30 3.210905  0.0031
SEASONWINTER 0.7867589 0.1339536  4 5.873369  0.0042
 Correlation: 
             (Intr)
SEASONWINTER -0.707
Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.2064782 -0.7680513 -0.2480590  0.3531126  2.3209354 
Number of Observations: 36
Number of Groups: 6 


#Calculate the confidence interval for the effect size of the main effect of season

> intervals(curdies.lme)

Approximate 95% confidence intervals
 Fixed effects:
                 lower      est.     upper
(Intercept)  0.1106923 0.3041353 0.4975783
SEASONWINTER 0.4148441 0.7867589 1.1586737
attr(,"label")
[1] "Fixed effects:"
 Random Effects:
  Level: SITE 
                       lower       est.    upper
sd((Intercept)) 1.837163e-07 0.04008598 8746.562
 Within-group standard error:
    lower      est.     upper 
0.3025499 0.3896803 0.5019033 


#Examine the ANOVA table with Type I (sequential) SS

> anova(curdies.lme)

            numDF denDF   F-value p-value
(Intercept)     1    30 108.45711  <.0001
SEASON          1     4  34.49647  0.0042

Show code - lmer

> library(lme4)

> curdies.lmer <- lmer(S4DUGES ~ SEASON + (1 | SITE), data = curdies)

> summary(curdies.lmer)

Linear mixed model fit by REML 
Formula: S4DUGES ~ SEASON + (1 | SITE) 
   Data: curdies 
   AIC   BIC logLik deviance REMLdev
 46.44 52.77 -19.22    32.52   38.44
Random effects:
 Groups   Name        Variance   Std.Dev.  
 SITE     (Intercept) 1.1647e-13 3.4127e-07
 Residual             1.5299e-01 3.9113e-01
Number of obs: 36, groups: SITE, 6
Fixed effects:
             Estimate Std. Error t value
(Intercept)   0.30414    0.09219   3.299
SEASONWINTER  0.78676    0.13038   6.034
Correlation of Fixed Effects:
            (Intr)
SEASONWINTE -0.707


#Calculate the confidence interval for the effect size of the main effect of season

> library(languageR)

> pvals.fnc(curdies.lmer, withMCMC = FALSE, addPlot = F)

$fixed
             Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)    0.3041   0.3059     0.0791     0.5443 0.0176   0.0023
SEASONWINTER   0.7868   0.7848     0.4724     1.1253 0.0014   0.0000
$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1     SITE (Intercept)   0.0000     0.0562   0.0827     0.0000     0.2652
2 Residual               0.3911     0.3878   0.3928     0.3032     0.4922


#Examine the effect of season via a likelihood ratio test

> curdies.lmer1 <- lmer(S4DUGES ~ SEASON + (1 | SITE), data = curdies, REML = F)

> curdies.lmer2 <- lmer(S4DUGES ~ 1 + (1 | SITE), data = curdies, REML = F)

> anova(curdies.lmer1, curdies.lmer2, test = "F")

Data: curdies
Models:
curdies.lmer2: S4DUGES ~ 1 + (1 | SITE)
curdies.lmer1: S4DUGES ~ SEASON + (1 | SITE)
              Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)    
curdies.lmer2  3 51.831 56.581 -22.916                             
curdies.lmer1  4 40.519 46.853 -16.259 13.312      1  0.0002637 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Alternatively, look into ezStats in the ez package.
Show code - ezStats

> library(ez)

> ezStats(dv = .(S4DUGES), between = .(SEASON), wid = .(SITE), data = curdies)

  SEASON N      Mean        SD      FLSD
1 SUMMER 3 0.3041353 0.1081703 0.3719147
2 WINTER 3 1.0908942 0.2052556 0.3719147

Show code - ezAnova

> library(ez)

> ezANOVA(dv = .(S4DUGES), between = .(SEASON), wid = .(SITE), data = curdies, type = 3)

$ANOVA
  Effect DFn DFd        F           p p<.05       ges
1 SEASON   1   4 34.49649 0.004197673     * 0.8960944
$`Levene's Test for Homogeneity of Variance`
  DFn DFd         SSn        SSd         F         p p<.05
1   1   4 0.006891131 0.05167755 0.5333945 0.5056576      

Plot the predicted mean fourth-root transformed number of Dugesia (and confidence interval) for each season
Show code

> library(gplots)

> library(AICcmodavg)

> curdies.predict <- predictSE.mer(curdies.lmer, newdata = data.frame(SEASON = c("SUMMER", "WINTER")))

> par(mar = c(5, 5, 1, 1))

> plot(1:2, curdies.predict$fit, type = "n", axes = F, ann = F, ylim = range(0, 1.5), xlim = c(0.5, 2.5))

> plotCI(curdies.predict$fit, uiw = curdies.predict$se.fit * 2, las = 1, add = T)

> axis(1, at = c(1, 2), lab = levels(curdies$SEASON))

> axis(2, las = 1)

> mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)

> mtext("Season", 1, line = 3, cex = 1.25)

> box(bty = "l")

Or the more flexible way, which requires more code and thought:
Show code

> newdata <- expand.grid(SEASON = c("SUMMER", "WINTER"), S4DUGES = 0)

> mod.mat <- model.matrix(terms(curdies.lmer), newdata)

> newdata$S4DUGES = mod.mat %*% fixef(curdies.lmer)

> pvar1 <- (diag(mod.mat %*% tcrossprod(vcov(curdies.lmer), mod.mat)))

> tvar1 <- pvar1 + VarCorr(curdies.lmer)$SITE[1]

> newdata <- data.frame(newdata, plo = newdata$S4DUGES - 2 * sqrt(pvar1), phi = newdata$S4DUGES + 2 * sqrt(pvar1), tlo = newdata$S4DUGES - 2 * sqrt(tvar1), thi = newdata$S4DUGES + 2 * sqrt(tvar1))

> par(mar = c(5, 5, 1, 1))

> plot(S4DUGES ~ as.numeric(SEASON), newdata, type = "n", axes = F, ann = F, ylim = range(0, newdata$phi), xlim = c(0.5, 2.5))

> points(curdies$DUGESIA ~ jitter(as.numeric(curdies$SEASON)), pch = 16, col = "grey")

> with(newdata, arrows(as.numeric(SEASON), plo, as.numeric(SEASON), phi, ang = 90, length = 0.05, code = 3))

> with(newdata, points(as.numeric(SEASON), S4DUGES, pch = 21, bg = "black", col = "black"))

> axis(1, at = c(1, 2), lab = newdata$SEASON)

> axis(2, las = 1)

> mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)

> mtext("Season", 1, line = 3, cex = 1.25)

> box(bty = "l")

Plot the predicted mean number of Dugesia back-transformed to the original scale (and confidence interval) for each season. Note that these data were modelled with a Gaussian distribution (which might not have been appropriate)
Show code

> newdata[, -1] <- newdata[, -1]^4

> par(mar = c(5, 5, 1, 1))

> plot(S4DUGES ~ as.numeric(SEASON), newdata, type = "n", axes = F, ann = F, ylim = range(newdata$plo, newdata$phi), xlim = c(0.5, 2.5))

> points(curdies$DUGESIA ~ jitter(as.numeric(curdies$SEASON)), pch = 16, col = "grey")

> with(newdata, arrows(as.numeric(SEASON), plo, as.numeric(SEASON), phi, ang = 90, length = 0.05, code = 3))

> with(newdata, points(as.numeric(SEASON), S4DUGES, pch = 21, bg = "black", col = "black"))

> axis(1, at = c(1, 2), lab = newdata$SEASON)

> axis(2, las = 1)

> mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)

> mtext("Season", 1, line = 3, cex = 1.25)

> box(bty = "l")

Note that back-transforming the standard errors has the reverse of the desired effect (since they are less than 1).
Show code

> library(gplots)

> library(AICcmodavg)

> curdies.predict <- predictSE.mer(curdies.lmer, newdata = data.frame(SEASON = c("SUMMER", "WINTER")))

> curdies.predict$fit <- curdies.predict$fit^2

> curdies.predict$se.fit <- curdies.predict$se.fit^2

> par(mar = c(5, 5, 1, 1))

> plot(1:2, curdies.predict$fit, type = "n", axes = F, ann = F, ylim = range(0, 1.5), xlim = c(0.5, 2.5))

> plotCI(curdies.predict$fit, uiw = curdies.predict$se.fit * 2, las = 1, add = T)

> axis(1, at = c(1, 2), lab = levels(curdies$SEASON))

> axis(2, las = 1)

> mtext("Number of Dugesia per stone", 2, line = 3, cex = 1.25)

> mtext("Season", 1, line = 3, cex = 1.25)

> box(bty = "l")
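An alternative sketch (my own suggestion, not part of the original workshop): back-transform the interval limits rather than the standard errors, which sidesteps the problem noted above.

> curdies.predict <- predictSE.mer(curdies.lmer, newdata = data.frame(SEASON = c("SUMMER", "WINTER")))

#Back-transform (fourth power) the limits calculated on the fourth-root scale

> lwr <- (curdies.predict$fit - 2 * curdies.predict$se.fit)^4

> upr <- (curdies.predict$fit + 2 * curdies.predict$se.fit)^4

> cbind(fit = curdies.predict$fit^4, lwr, upr)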

Q1-9. Where is the major variation in numbers of flatworms? Between seasons, among sites or among stones?
Show code

> library(nlme)

> nlme:::VarCorr(lme(S4DUGES ~ 1, random = ~1 | SEASON/SITE, curdies))

            Variance     StdDev    
SEASON =    pdLogChol(1)           
(Intercept) 0.300522947  0.54819973
SITE =      pdLogChol(1)           
(Intercept) 0.001606859  0.04008565
Residual    0.151850793  0.38968037
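A rough follow-up sketch (using the variance components printed above): express each component as a percentage of the total.

> vc <- c(SEASON = 0.300522947, SITE = 0.001606859, Residual = 0.151850793)

> round(100 * vc/sum(vc), 1)

  SEASON     SITE Residual 
    66.2      0.4     33.4 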




Q1-10. How might this information influence the design of future experiments on Dugesia in terms of:
  1. What influences the abundance of Dugesia
  2. Where best to focus sampling effort to maximize statistical power? (see the sketch below)
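As a rough guide (a hand-rolled sketch; the se.season helper and the plugged-in variance components from Q1-9 are my own), compare the standard error of a season mean under different allocations of sites and stones:

> se.season <- function(n.site, n.stone, v.site = 0.0016, v.resid = 0.1519) sqrt(v.site/n.site + v.resid/(n.site * n.stone))

> round(se.season(n.site = 3, n.stone = 6), 4)   #the original design

[1] 0.0947

> round(se.season(n.site = 6, n.stone = 3), 4)   #more sites, fewer stones per site

[1] 0.0933

With almost all of the variation among stones, precision is driven mainly by the total number of stones; the site-level component is too small to matter much.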

Q1-11. Finally, construct an appropriate summary figure to accompany the above analyses. Note that this should use the correct replicates for depicting error.
Show code

> opar <- par(mar = c(5, 5, 1, 1))

> means <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, mean)

> sds <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, sd)

> lens <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, length)

> ses <- sds/sqrt(lens)

> xs <- barplot(means, beside = T, ann = F, axes = F, ylim = c(0, max(means + ses)), axisnames = F)

> arrows(xs, means - ses, xs, means + ses, code = 3, length = 0.05, ang = 90)

> axis(1, at = xs, lab = c("Summer", "Winter"))

> mtext("Season", 1, line = 3, cex = 1.25)

> axis(2, las = 1)

> mtext("Mean number of Dugesia per stone", 2, line = 3, cex = 1.25)

> box(bty = "l")

Randomized block

A plant pathologist wanted to examine the effects of two different strengths of tobacco virus on the number of lesions on tobacco leaves. She knew from pilot studies that leaves were inherently very variable in response to the virus. In an attempt to account for this leaf to leaf variability, both treatments were applied to each leaf. Eight individual leaves were divided in half, with half of each leaf inoculated with weak strength virus and the other half inoculated with strong virus. So the leaves were blocks and each treatment was represented once in each block. A completely randomised design would have had 16 leaves, with 8 whole leaves randomly allocated to each treatment.

Download Tobacco data set

Format of tobacco.csv data file
LEAF  TREATMENT  NUMBER
1     Strong     35.898
1     Weak       25.02
2     Strong     34.118
2     Weak       23.167
3     Strong     35.702
3     Weak       24.122
...   ...        ...

LEAF       The blocking factor - Factor B
TREATMENT  Categorical representation of the strength of the tobacco virus - the main factor of interest (Factor A)
NUMBER     Number of lesions on that part of the tobacco leaf - response variable

Open the tobacco data file.

Show code

> tobacco <- read.table("tobacco.csv", header = T, sep = ",", strip.white = T)

> head(tobacco)

  LEAF TREATMENT   NUMBER
1   L1    Strong 35.89776
2   L1      Weak 25.01984
3   L2    Strong 34.11786
4   L2      Weak 23.16740
5   L3    Strong 35.70215
6   L3      Weak 24.12191

Since each level of treatment was applied to each leaf, the data represents a randomized block design with leaves as blocks.

The variable LEAF contains a list of leaf identification numbers and is supposed to represent a categorical blocking variable. However, because the contents of this variable are numbers, R initially treats them as numbers, and therefore considers the variable to be numeric rather than categorical. To force R to treat this variable as a factor (categorical), it is necessary to first convert the numeric variable into a factor (HINT).
Show code

> tobacco$LEAF <- as.factor(tobacco$LEAF)

Q2-1. What are the main hypotheses being tested?
  1. H0 Factor A:
  2. H0 Factor B:
Q2-2. In the table below, list the assumptions of a randomized block design along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.
  1. Assumption    Diagnostic/Risk Minimization
    I.
    II.
    III.
    IV.
  2. Is the proposed model balanced?
    (Y or N)
  3. Show code

    > replications(NUMBER ~ LEAF + TREATMENT, data = tobacco)

         LEAF TREATMENT 
            2         8 

    > !is.list(replications(NUMBER ~ LEAF + TREATMENT, data = tobacco))

    [1] TRUE


Q2-3. There are a number of ways of diagnosing block by within block interactions.
  1. The simplest method is to plot the number of lesions for each treatment and leaf combination (i.e. an interaction plot) (HINT). Any evidence of an interaction? Note that we cannot formally test for an interaction because we don't have replicates for each treatment-leaf combination!
  2. Show code

    > library(car)

    > interaction.plot(tobacco$LEAF, tobacco$TREATMENT, tobacco$NUMBER)

  3. We can also examine the residual plot from the fitted linear model. A curvilinear pattern in which the residual values switch from positive to negative and back again (or vice versa) over the range of predicted values implies that the scale (magnitude but not polarity) of the main treatment effects differs substantially across the range of blocks (HINT). Any evidence of an interaction?
  4. Show code

    > residualPlots(lm(NUMBER ~ LEAF + TREATMENT, data = tobacco))

               Test stat Pr(>|t|)
    LEAF              NA       NA
    TREATMENT         NA       NA
    Tukey test     -0.18    0.858

  5. Perform a Tukey's test for non-additivity evaluated at α = 0.10 or even α = 0.25. This (curvilinearity) test formally tests for the presence of a quadratic trend in the relationship between residuals and predicted values. As such, it too is only appropriate for simple interactions of scale (HINT). Any evidence of an interaction?
  6. Show code

    > library(alr3)

    > tukey.nonadd.test(lm(NUMBER ~ LEAF + TREATMENT, data = tobacco))

          Test     Pvalue 
    -0.1795228  0.8575272 

  7. Examine a lattice graphic to determine the consistency of the effect across each of the blocks (HINT).
  8. Show code

    > library(lattice)

    > print(barchart(NUMBER ~ TREATMENT | LEAF, data = tobacco))


Q2-4. Analyse these data with a classic randomized block ANOVA
(HINT) to test the H0 that there is no difference in the number of lesions produced by the different strength viruses. Complete the table below.
Show code

> tobacco.aov <- aov(NUMBER ~ TREATMENT + Error(LEAF), data = tobacco)

> summary(tobacco.aov)

Error: LEAF
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  7 292.09  41.727               
Error: Within
          Df Sum Sq Mean Sq F value   Pr(>F)   
TREATMENT  1 248.34 248.339  17.172 0.004328 **
Residuals  7 101.23  14.462                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Source of variation    df    Mean Sq    F-ratio    P-value
LEAF (block)   
TREAT
Residuals   


Q2-5. Interpret the results.

Q2-6. Now analyse these data with a linear mixed effects model (REML).
Show code

> library(nlme)

> tobacco.lme <- lme(NUMBER ~ TREATMENT, random = ~1 | LEAF, data = tobacco, method = "ML")

> summary(tobacco.lme)

Linear mixed-effects model fit by maximum likelihood
 Data: tobacco 
       AIC      BIC    logLik
  102.4911 105.5814 -47.24553
Random effects:
 Formula: ~1 | LEAF
        (Intercept) Residual
StdDev:    3.453752 3.557307
Fixed effects: NUMBER ~ TREATMENT 
                 Value Std.Error DF   t-value p-value
(Intercept)   34.94013  1.873989  7 18.644793  0.0000
TREATMENTWeak -7.87938  1.901460  7 -4.143858  0.0043
 Correlation: 
              (Intr)
TREATMENTWeak -0.507
Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.73025327 -0.51797920  0.01211696  0.43723932  1.52636286 
Number of Observations: 16
Number of Groups: 8 

> library(gmodels)

> ci(tobacco.lme)

               Estimate  CI lower  CI upper Std. Error DF      p-value
(Intercept)   34.940130  30.50885 39.371409   1.873989  7 3.169056e-07
TREATMENTWeak -7.879381 -12.37562 -3.383142   1.901460  7 4.328219e-03

> anova(tobacco.lme)

            numDF denDF  F-value p-value
(Intercept)     1     7 368.4997  <.0001
TREATMENT       1     7  17.1716  0.0043
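A small sketch (this step is not in the original output): extract the leaf and within-leaf variance components with VarCorr to see how much variation the blocking absorbed. The variances below are simply the squares of the standard deviations reported in the summary above.

> nlme:::VarCorr(tobacco.lme)

LEAF = pdLogChol(1) 
            Variance StdDev  
(Intercept) 11.92840 3.453752
Residual    12.65443 3.557307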

Show code

> library(lme4)

> tobacco.lmer <- lmer(NUMBER ~ TREATMENT + (1 | LEAF), data = tobacco)

> summary(tobacco.lmer)

Linear mixed model fit by REML 
Formula: NUMBER ~ TREATMENT + (1 | LEAF) 
   Data: tobacco 
   AIC  BIC logLik deviance REMLdev
 96.71 99.8 -44.35    94.49   88.71
Random effects:
 Groups   Name        Variance Std.Dev.
 LEAF     (Intercept) 13.632   3.6922  
 Residual             14.462   3.8029  
Number of obs: 16, groups: LEAF, 8
Fixed effects:
              Estimate Std. Error t value
(Intercept)     34.940      1.874  18.645
TREATMENTWeak   -7.879      1.901  -4.144
Correlation of Fixed Effects:
            (Intr)
TREATMENTWk -0.507


#Calculate the confidence interval for the effect size of the main effect of treatment

> library(languageR)

> pvals.fnc(tobacco.lmer, withMCMC = FALSE, addPlot = F)

$fixed
              Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)     34.940   34.898      30.83     38.789 0.0001    0.000
TREATMENTWeak   -7.879   -7.826     -12.82     -2.224 0.0072    0.001
$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1     LEAF (Intercept)   3.6922     1.1625   1.3775     0.0000     3.8555
2 Residual               3.8029     5.0287   5.1814     3.2469     7.5238


#Examine the effect of treatment via a likelihood ratio test

> tobacco.lmer1 <- lmer(NUMBER ~ TREATMENT + (1 | LEAF), data = tobacco, REML = F)

> tobacco.lmer2 <- lmer(NUMBER ~ 1 + (1 | LEAF), data = tobacco, REML = F)

> anova(tobacco.lmer1, tobacco.lmer2, test = "F")

Data: tobacco
Models:
tobacco.lmer2: NUMBER ~ 1 + (1 | LEAF)
tobacco.lmer1: NUMBER ~ TREATMENT + (1 | LEAF)
              Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)   
tobacco.lmer2  3 110.47 112.79 -52.235                            
tobacco.lmer1  4 102.49 105.58 -47.246 9.9786      1   0.001584 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Show code - ezAnova

> library(ez)

> ezANOVA(dv = .(NUMBER), within = .(TREATMENT), wid = .(LEAF), data = tobacco, type = 3)

$ANOVA
     Effect DFn DFd        F           p p<.05       ges
2 TREATMENT   1   7 17.17156 0.004328219     * 0.3870228

Plot the predicted mean number of lesions (and confidence interval) for each treatment
Show code

> newdata <- expand.grid(TREATMENT = c("Strong", "Weak"), NUMBER = 0)

> mod.mat <- model.matrix(terms(tobacco.lmer), newdata)

> newdata$NUMBER = mod.mat %*% fixef(tobacco.lmer)

> pvar1 <- (diag(mod.mat %*% tcrossprod(vcov(tobacco.lmer), mod.mat)))

> tvar1 <- pvar1 + VarCorr(tobacco.lmer)$LEAF[1]

> newdata <- data.frame(newdata, plo = newdata$NUMBER - 2 * sqrt(pvar1), phi = newdata$NUMBER + 2 * sqrt(pvar1), tlo = newdata$NUMBER - 2 * sqrt(tvar1), thi = newdata$NUMBER + 2 * sqrt(tvar1))

> par(mar = c(5, 5, 1, 1))

> plot(NUMBER ~ as.numeric(TREATMENT), newdata, type = "n", axes = F, ann = F, ylim = range(newdata$plo, newdata$phi), xlim = c(0.5, 2.5))

> with(newdata, arrows(as.numeric(TREATMENT), plo, as.numeric(TREATMENT), phi, ang = 90, length = 0.05, code = 3))

> with(newdata, points(as.numeric(TREATMENT), NUMBER, pch = 21, bg = "black", col = "black"))

> axis(1, at = c(1, 2), lab = newdata$TREATMENT)

> axis(2, las = 1)

> mtext("Number of lesions", 2, line = 3, cex = 1.25)

> mtext("Season", 1, line = 3, cex = 1.25)

> box(bty = "l")

Split-plot

In an attempt to understand the effects on marine animals of short-term exposure to toxic substances, such as might occur following a spill or a major increase in storm water flows, it was decided to examine the toxicant in question, copper, as part of a field experiment in Hong Kong. The experiment consisted of small sources of Cu (small, hemispherical plaster blocks, impregnated with copper), which released the metal into sea water over 4 or 5 days. The organism whose response to Cu was being measured was a small polychaete worm, Hydroides, that attaches to hard surfaces in the sea and is one of the first species to colonize any surface that is submerged. The biological questions focused on whether the timing of exposure to Cu affects the overall abundance of these worms. The time period of interest was the first or second week after a surface became available.

The experimental setup consisted of sheets of black perspex (settlement plates), which provided good surfaces for these worms. Each plate had a plaster block bolted to its centre, and the dissolving block would create a gradient of [Cu] across the plate. Over the two weeks of the experiment, a given plate would have plain plaster blocks (Control), or a block containing copper in the first week followed by a plain block, or a plain block in the first week followed by a dose of copper in the second week. After two weeks in the water, plates were removed and the worms counted back in the laboratory. Without a clear idea of how sensitive these worms are to copper, an effect of the treatments might show up as an overall difference in the density of worms across a plate, or it could show up as a gradient in abundance across the plate, with a different gradient in different treatments. Therefore, on each plate, the density of worms (#/cm2) was recorded at each of four distances from the center of the plate.

Download Copper data set

Format of copper.csv data file
COPPER  PLATE  DIST  WORMS
...     ...    ...   ...

COPPER  Categorical listing of the copper treatment (control = no copper applied, week 2 = copper applied in the second week and week 1 = copper applied in the first week) applied to whole plates. Factor A (between plot factor).
PLATE   Substrate provided for polychaete worm colonization and the unit to which the copper treatment was applied. These are the plots (Factor B). Numbers in this column represent numerical labels given to each plate.
DIST    Categorical listing of the four concentric distances from the center of the plate (the source of the copper treatment), with 1 being the closest and 4 the furthest. Factor C (within plot factor).
WORMS   Density (#/cm2) of worms measured. Response variable.

Open the Copper data set
Show code

> copper <- read.table("copper.csv", header = T, sep = ",", strip.white = T)

> head(copper)

   COPPER PLATE DIST WORMS
1 control   200    4 11.50
2 control   200    3 13.00
3 control   200    2 13.50
4 control   200    1 12.00
5 control    39    4 17.75
6 control    39    3 13.75

Notice that both the PLATE variable and the DIST variable contain only numbers. Make sure that you define both of these as factors (HINT)
Show code

> copper$PLATE <- factor(copper$PLATE)

> copper$DIST <- factor(copper$DIST)

Q3-1. What are the main hypotheses being tested?
  1. H0 Main Effect 1 (Factor A):
  2. H0 Main Effect 2 (Factor C):
  3. H0 Main Effect 3 (A*C):

Q3-2. The usual ANOVA assumptions apply to split-plot designs, and these can be tested by constructing boxplots for each of the main effects. However, it is important to consider what data the boxplots should be based upon. For each of the main hypothesis tests, describe what data should be used to construct boxplots (remember that the assumptions of normality and homogeneity of variance apply to the residuals of each hypothesis test, and therefore the data used in the construction of boxplots for each hypothesis test should represent the respective residuals, or sources of unexplained variation).
  1. H0 Main Effect 1 (Factor A):
  2. H0 Main Effect 2 (Factor C):
  3. H0 Main Effect 3 (A*C):

Q3-3. For each of the hypothesis tests, indicate which Mean Square term should be used as the residual (denominator) in the F-ratio calculation. Note, COPPER and DIST are fixed factors and PLATE is a random factor.
  1. H0 Main Effect 1 (Factor A): F-ratio = MSCOPPER/MS
  2. H0 Main Effect 2 (Factor C): F-ratio = MSDIST/MS
  3. H0 Main Effect 3 (A*C): F-ratio = MSCOPPER:DIST/MS

Q3-4. Construct a boxplot to investigate the assumptions as they apply to the test of H0 Main Effect 1 (Factor A): This is done in two steps

  1. Aggregate the data set by the mean number of WORMS within each plate
    (HINT)
  2. Show code

    > library(plyr)

    > cu.ag <- ddply(copper, ~COPPER + PLATE, function(df) data.frame(WORMS = mean(df$WORMS)))

    #OR

    > library(plyr)

    > cu.ag <- ddply(copper, ~COPPER + PLATE, summarise, WORMS = mean(WORMS, na.rm = TRUE))

    #OR

    > cu.ag <- aggregate(copper, list(COPPER = copper$COPPER, PLATE = copper$PLATE), mean)

    #OR

    > library(nlme)

    > cu.ag <- gsummary(copper, form = ~COPPER/PLATE, base:::mean)

  3. Construct a boxplot of aggregated mean number of WORMS against COPPER treatment
    (HINT)
  4. Show code

    > boxplot(WORMS ~ COPPER, data = cu.ag)

  5. Any evidence of violations of the assumptions (y or n)?


Q3-5. Construct a boxplot to investigate the assumptions as they apply to the test of H0 Main Effect 2 (Factor C): Since Factor C is tested against the overall residual in this case, this is a relatively straightforward procedure (HINT).
Show code

> boxplot(WORMS ~ DIST, data = copper)

  1. Any evidence of violations of the assumptions (y or n)?


Q3-6. Construct a boxplot to investigate the assumptions as they apply to the test of H0 the main interaction effect (A:C): Since A:C is tested against the overall residual, this is a relatively straightforward procedure (HINT).
Show code

> boxplot(WORMS ~ COPPER:DIST, data = copper)

  1. Any evidence of violations of the assumptions (y or n)?


Q3-7. In addition to the above assumptions, the test of PLATE assumes that there is no PLATE by DIST interaction as this is the overall residual (the replicates). That is, the test assumes that the effect of DIST is consistent in all PLATEs. Construct an interaction plot to examine whether there is any evidence of an interaction between PLATE and DIST (HINT)
Show code

> library(car)

> with(copper, interaction.plot(PLATE, DIST, WORMS))

> library(alr3)

> tukey.nonadd.test(lm(WORMS ~ PLATE + DIST, data = copper))

         Test        Pvalue 
-4.122434e+00  3.748907e-05 

  1. Any evidence of an interaction (y or n)?
  2. Is the design (model) unbalanced? (HINT) (Y or N)
    Show code

    > !is.list(replications(WORMS ~ COPPER + Error(PLATE) + DIST + COPPER:DIST, data = copper))

    [1] TRUE

  3. Is there any evidence of issues with sphericity? (HINT) (Y or N)
  4. Show code

    > library(biology)

    > epsi.GG.HF(aov(WORMS ~ COPPER + Error(PLATE) + DIST + COPPER:DIST, data = copper))

    $GG.eps
    [1] 0.5203159
    $HF.eps
    [1] 0.584099
    $sigma
    [1] 1.806771


Q3-8. Write out the linear model



Q3-9. Perform a split-plot ANOVA
(HINT), and complete the following table (HINT). To obtain the hypothesis test for the random factor (Factor B: PLATE), run the model as if all factors were fixed, so that all terms are tested against the overall residuals (HINT)
Show code

> copper.aov <- aov(WORMS ~ COPPER + DIST + COPPER:DIST + Error(PLATE), data = copper)

> AnovaM(copper.aov, RM = T)

   Sphericity Epsilon Values 
-------------------------------
 Greenhouse.Geisser Huynh.Feldt
          0.5203159    0.584099
Anova Table (Type I tests)
Response: WORMS
Error: PLATE
          Df Sum Sq Mean Sq F value    Pr(>F)    
COPPER     2 783.73  391.87  128.02 8.051e-09 ***
Residuals 12  36.73    3.06                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
            Df  Sum Sq Mean Sq F value    Pr(>F)    
DIST         3 153.803  51.268 28.3753 1.356e-09 ***
COPPER:DIST  6  53.419   8.903  4.9276 0.0009028 ***
Residuals   36  65.044   1.807                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Greenhouse-Geisser corrected ANOVA table
Response: WORMS
Error: PLATE
               Df Sum Sq Mean Sq F value    Pr(>F)    
COPPER     1.0406 783.73  391.87  128.02 8.021e-08 ***
Residuals 12.0000  36.73    3.06                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
                 Df  Sum Sq Mean Sq F value    Pr(>F)    
DIST         1.5609 153.803  51.268 28.3753 2.692e-07 ***
COPPER:DIST  3.1219  53.419   8.903  4.9276  0.005222 ** 
Residuals   36.0000  65.044   1.807                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Huynh-Feldt corrected ANOVA table
Response: WORMS
Error: PLATE
               Df Sum Sq Mean Sq F value    Pr(>F)    
COPPER     1.1682 783.73  391.87  128.02 5.259e-08 ***
Residuals 12.0000  36.73    3.06                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
                 Df  Sum Sq Mean Sq F value    Pr(>F)    
DIST         1.7523 153.803  51.268 28.3753 1.126e-07 ***
COPPER:DIST  3.5046  53.419   8.903  4.9276   0.00397 ** 
Residuals   36.0000  65.044   1.807                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Source of variation    df    Mean Sq    F-ratio    P-value
COPPER
PLATE   
DIST
COPPER:DIST
Residuals   

Q3-10. Construct an interaction plot showing the density of worms against distance from the Cu source, with each treatment as a different line (or different bars). HINT
Show code

> interaction.plot(copper$DIST, copper$COPPER, copper$WORMS)


Q3-11. What conclusions would you draw from the analysis (and graph)?

Q3-12. In order to tease out the interaction, we could analyse the effects of the main factor (COPPER) separately for each Distance and/or investigate the effects of Distance separately for each of the three copper treatments. Recall that when performing such simple main effects, it is necessary to use the residual terms from the global analyses as these are likely to be better estimates (as they are based on a larger amount of data).
  1. Investigate the effects of copper separately for each of the four distances HINT
    Show code

    > AnovaM(mainEffects(copper.aov, at = DIST == 1), type = "III")

    Error: PLATE
              Df Sum Sq Mean Sq F value    Pr(>F)    
    INT        2 783.73  391.87  128.02 8.051e-09 ***
    Residuals 12  36.73    3.06                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)    
    INT        9 207.222 23.0247  12.743 8.382e-09 ***
    Residuals 36  65.044  1.8068                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


    > AnovaM(mainEffects(copper.aov, at = DIST == 2), type = "III")

    Error: PLATE
              Df Sum Sq Mean Sq F value    Pr(>F)    
    INT        2 783.73  391.87  128.02 8.051e-09 ***
    Residuals 12  36.73    3.06                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)    
    INT        9 207.222 23.0247  12.743 8.382e-09 ***
    Residuals 36  65.044  1.8068                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


    > AnovaM(mainEffects(copper.aov, at = DIST == 3), type = "III")

    Error: PLATE
              Df Sum Sq Mean Sq F value    Pr(>F)    
    INT        2 783.73  391.87  128.02 8.051e-09 ***
    Residuals 12  36.73    3.06                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)    
    INT        9 207.222 23.0247  12.743 8.382e-09 ***
    Residuals 36  65.044  1.8068                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


    What conclusions would you draw from these analyses?

  2. Investigate the effects of distance separately for each of the three copper treatments HINT
    Show code

    > AnovaM(mainEffects(copper.aov, at = COPPER == "control"), type = "III")

    Error: PLATE
              Df Sum Sq Mean Sq F value    Pr(>F)    
    INT        2 783.73  391.87  128.02 8.051e-09 ***
    Residuals 12  36.73    3.06                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)    
    INT        6 188.597 31.4328 17.3972 2.506e-09 ***
    DIST       3  18.625  6.2083  3.4361   0.02691 *  
    Residuals 36  65.044  1.8068                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


    > AnovaM(mainEffects(copper.aov, at = COPPER == "Week 1"), type = "III")

    Error: PLATE
              Df Sum Sq Mean Sq F value    Pr(>F)    
    INT        2 783.73  391.87  128.02 8.051e-09 ***
    Residuals 12  36.73    3.06                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)    
    INT        6 188.059 31.3432 17.3477 2.599e-09 ***
    DIST       3  19.163  6.3875  3.5353   0.02419 *  
    Residuals 36  65.044  1.8068                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


    > AnovaM(mainEffects(copper.aov, at = COPPER == "Week 2"), type = "III")

    Error: PLATE
              Df Sum Sq Mean Sq F value    Pr(>F)    
    INT        2 783.73  391.87  128.02 8.051e-09 ***
    Residuals 12  36.73    3.06                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)    
    INT        6  37.788   6.298  3.4857  0.008066 ** 
    DIST       3 169.434  56.478 31.2592 3.968e-10 ***
    Residuals 36  65.044   1.807                      
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

    What conclusions would you draw from these three analyses?

Mixed models

Show code

> copper.lme <- lme(WORMS ~ COPPER * DIST, random = ~1 | PLATE, data = copper, method = "ML")

> copper.lme1 <- lme(WORMS ~ COPPER * DIST, random = ~1 | PLATE, data = copper, method = "ML", correlation = corCompSymm(form = ~1 | PLATE))

> copper.lme2 <- lme(WORMS ~ COPPER * DIST, random = ~1 | PLATE, data = copper, method = "ML", correlation = corAR1(form = ~1 | PLATE))

> anova(copper.lme, copper.lme1)

            Model df      AIC      BIC    logLik   Test      L.Ratio p-value
copper.lme      1 14 228.2842 257.6050 -100.1421                            
copper.lme1     2 15 230.2842 261.6993 -100.1421 1 vs 2 1.024887e-10       1


Check whether a first order autoregressive correlation structure is more appropriate

> copper.lme2 <- update(copper.lme, correlation = corAR1(form = ~1 | PLATE))

> anova(copper.lme, copper.lme2)

            Model df      AIC      BIC     logLik   Test  L.Ratio p-value
copper.lme      1 14 228.2842 257.6050 -100.14209                        
copper.lme2     2 15 222.9325 254.3477  -96.46625 1 vs 2 7.351676  0.0067


# Now fit the 'best' model with REML

> copper.lme3 <- update(copper.lme2, method = "REML")

> anova.lme(copper.lme3)

            numDF denDF  F-value p-value
(Intercept)     1    36 980.6144  <.0001
COPPER          2    12  93.0577  <.0001
DIST            3    36  25.1142  <.0001
COPPER:DIST     6    36   4.2520  0.0025
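As a follow-up sketch (not shown in the original), the estimated first-order autocorrelation parameter (Phi) and the confidence intervals of the fixed effects for the REML fit can be extracted with intervals():

> intervals(copper.lme3)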

Show code

> detach(package:nlme)

> copper.lmer <- lmer(WORMS ~ COPPER * DIST + (1 | PLATE), data = copper)

Examine the fixed effects coefficients
Show code

> summary(copper.lmer)@coefs

> library(languageR)

> pvals.fnc(copper.lmer, withMCMC = FALSE, addPlot = F)

$fixed
                   Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)           10.85  10.8586     9.5809     12.184 0.0001   0.0000
COPPERWeek 1          -3.60  -3.6040    -5.4196     -1.744 0.0001   0.0003
COPPERWeek 2         -10.60 -10.6050   -12.4661     -8.789 0.0001   0.0000
DIST2                  1.15   1.1468    -0.7585      2.829 0.2074   0.1825
DIST3                  1.55   1.5459    -0.1836      3.360 0.0830   0.0745
DIST4                  2.70   2.6984     0.9173      4.550 0.0054   0.0026
COPPERWeek 1:DIST2    -0.05  -0.0453    -2.5593      2.476 0.9552   0.9670
COPPERWeek 2:DIST2     0.05   0.0540    -2.4253      2.541 0.9654   0.9670
COPPERWeek 1:DIST3    -0.30  -0.3022    -2.7646      2.198 0.8132   0.8040
COPPERWeek 2:DIST3     2.20   2.1969    -0.2649      4.787 0.0852   0.0735
COPPERWeek 1:DIST4     0.05   0.0448    -2.5683      2.504 0.9700   0.9670
COPPERWeek 2:DIST4     4.90   4.8992     2.3596      7.395 0.0010   0.0002
$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1    PLATE (Intercept)   0.5599     0.2627   0.2841     0.0000     0.7267
2 Residual               1.3442     1.4133   1.4279     1.1428     1.7589

Broad interpretation: In general, compared to the number of worms settling on the control plates, fewer worms settled in the Week 1 treatment and fewer still in the Week 2 treatment. The increase in the number of settled worms gets progressively larger at distances further from the source (DIST 1), and the trend between DIST 1 (the base level) and DIST 4 is significantly steeper for the Week 2 treatment than for the control.

Examine the model fit measures
Show code

> summary(copper.lmer)@AICtab

      AIC      BIC    logLik deviance  REMLdev
 218.2515 247.5723 -95.12575 200.2842 190.2515

Examine the variance components
Show code

> data.frame(summary(copper.lmer)@REmat)

    Groups        Name Variance Std.Dev.
1    PLATE (Intercept)  0.31354  0.55995
2 Residual              1.80677  1.34416

Examine the predicted means and confidence intervals
Show code

> newdata <- expand.grid(COPPER = levels(copper$COPPER), DIST = levels(copper$DIST), WORMS = 0)

> mod.mat <- model.matrix(terms(copper.lmer), newdata)

> newdata$WORMS = mod.mat %*% fixef(copper.lmer)

> pvar1 <- diag(mod.mat %*% tcrossprod(vcov(copper.lmer), mod.mat))

> tvar1 <- pvar1 + VarCorr(copper.lmer)$PLATE[1]

> newdata <- data.frame(newdata, plo = newdata$WORMS - 2 * sqrt(pvar1), phi = newdata$WORMS + 2 * sqrt(pvar1), tlo = newdata$WORMS - 2 * sqrt(tvar1), thi = newdata$WORMS + 2 * sqrt(tvar1))

> par(mar = c(5, 5, 1, 1))

> plot(WORMS ~ as.numeric(DIST), newdata, type = "n", axes = F, ann = F, ylim = range(newdata$plo, newdata$phi), xlim = c(0.5, 4.5))

> with(subset(newdata, COPPER == "control"), arrows(as.numeric(DIST),

> plo, as.numeric(DIST), phi, ang = 90, length = 0.05, code = 3))

> with(subset(newdata, COPPER == "control"), lines(as.numeric(DIST),

> WORMS, lty = 1))

> with(subset(newdata, COPPER == "control"), points(as.numeric(DIST),

> WORMS, pch = 21, bg = "black", col = "black"))

> with(subset(newdata, COPPER == "Week 1"), arrows(as.numeric(DIST),

> plo, as.numeric(DIST), phi, ang = 90, length = 0.05, code = 3))

> with(subset(newdata, COPPER == "Week 1"), lines(as.numeric(DIST),

> WORMS, lty = 2))

> with(subset(newdata, COPPER == "Week 1"), points(as.numeric(DIST),

> WORMS, pch = 21, bg = "grey", col = "black"))

> with(subset(newdata, COPPER == "Week 2"), arrows(as.numeric(DIST),

> plo, as.numeric(DIST), phi, ang = 90, length = 0.05, code = 3))

> with(subset(newdata, COPPER == "Week 2"), lines(as.numeric(DIST),

> WORMS, lty = 3))

> with(subset(newdata, COPPER == "Week 2"), points(as.numeric(DIST),

> WORMS, pch = 21, bg = "white", col = "black"))

> axis(1, at = as.numeric(newdata$DIST), lab = newdata$DIST)

> axis(2, las = 1)

> mtext("Number of settled worms", 2, line = 3, cex = 1.25)

> mtext("Distance from Copper", 1, line = 3, cex = 1.25)

> legend("bottomright", legend = c("Control", "Week 1", "Week 2"),

> pch = c(21, 21, 21), pt.bg = c("black", "grey", "white"),

> lty = c(1, 2, 3), bty = "n", title = "Treatment")

> box(bty = "l")

> library(AICcmodavg)

> data.frame(newdata, predictSE.mer(copper.lmer, newdata, print.matrix = T))

    COPPER DIST WORMS        plo       phi        tlo       thi   fit    se.fit
1  control    1 10.85  9.5476118 12.152388  9.1323308 12.567669 10.85 0.6511941
2   Week 1    1  7.25  5.9476118  8.552388  5.5323308  8.967669  7.25 0.6511941
3   Week 2    1  0.25 -1.0523882  1.552388 -1.4676692  1.967669  0.25 0.6511941
4  control    2 12.00 10.6976118 13.302388 10.2823308 13.717669 12.00 0.6511941
5   Week 1    2  8.35  7.0476118  9.652388  6.6323308 10.067669  8.35 0.6511941
6   Week 2    2  1.45  0.1476118  2.752388 -0.2676692  3.167669  1.45 0.6511941
7  control    3 12.40 11.0976118 13.702388 10.6823308 14.117669 12.40 0.6511941
8   Week 1    3  8.50  7.1976118  9.802388  6.7823308 10.217669  8.50 0.6511941
9   Week 2    3  4.00  2.6976118  5.302388  2.2823308  5.717669  4.00 0.6511941
10 control    4 13.55 12.2476118 14.852388 11.8323308 15.267669 13.55 0.6511941
11  Week 1    4 10.00  8.6976118 11.302388  8.2823308 11.717669 10.00 0.6511941
12  Week 2    4  7.85  6.5476118  9.152388  6.1323308  9.567669  7.85 0.6511941

Now the data are modelled against a Poisson distribution with polynomial contrasts for the distance variable.
Show code

> contrasts(copper$DIST) <- contr.poly

> copper.lmer <- lmer(WORMS ~ COPPER * DIST + (1 | PLATE), data = copper, family = "poisson")

Examine the fixed effects coefficients
Show code

> summary(copper.lmer)@coefs

                        Estimate Std. Error     z value     Pr(>|z|)
(Intercept)          2.498288400 0.06422083 38.90152634 0.000000e+00
COPPERWeek 1        -0.361810036 0.10032970 -3.60621071 3.107009e-04
COPPERWeek 2        -1.890274446 0.25973697 -7.27764880 3.396883e-13
DIST.L               0.156402669 0.12875085  1.21476998 2.244538e-01
DIST.Q              -0.006024973 0.12844166 -0.04690824 9.625864e-01
DIST.C               0.027693947 0.12813173  0.21613652 8.288813e-01
COPPERWeek 1:DIST.L  0.063303015 0.20090958  0.31508212 7.526993e-01
COPPERWeek 2:DIST.L  2.382701845 0.63043195  3.77947508 1.571593e-04
COPPERWeek 1:DIST.Q  0.016654058 0.20065940  0.08299665 9.338542e-01
COPPERWeek 2:DIST.Q -0.535807563 0.51947394 -1.03144262 3.023333e-01
COPPERWeek 1:DIST.C  0.032271051 0.20040891  0.16102603 8.720729e-01
COPPERWeek 2:DIST.C  0.062310438 0.37717623  0.16520245 8.687846e-01

Interpretation: Compared to the number of worms settling on control plates, fewer worms settle in the Week 1 treatment and fewer still in the Week 2 copper treatment. Settlement increases with increasing distance from the copper source, and in the Week 2 treatment this rate of increase is more than twice as great (a significantly steeper linear trend).
Show code - ezAnova

> library(ez)

> ezANOVA(dv = .(WORMS), between = .(COPPER), within = .(DIST), wid = .(PLATE), data = copper, type = 3)

$ANOVA
       Effect DFn DFd          F            p p<.05       ges
2      COPPER   2  12 128.021440 8.051223e-09     * 0.8850657
3        DIST   3  36  28.375324 1.356185e-09     * 0.6017852
4 COPPER:DIST   6  36   4.927645 9.028004e-04     * 0.3442068
$`Mauchly's Test for Sphericity`
       Effect         W           p p<.05
3        DIST 0.2311286 0.008004879     *
4 COPPER:DIST 0.2311286 0.008004879     *
$`Sphericity Corrections`
       Effect       GGe        p[GG] p[GG]<.05      HFe        p[HF] p[HF]<.05
3        DIST 0.5203159 6.355564e-06         * 0.584099 2.048030e-06         *
4 COPPER:DIST 0.5203159 1.021592e-02         * 0.584099 7.346469e-03         *

Repeated Measures

In an honours thesis (1992), Mullens was investigating the ways that cane toads (Bufo marinus) respond to conditions of hypoxia. Toads show two different kinds of breathing patterns, lung or buccal, requiring them to be treated separately in the experiment. Her aim was to expose toads to a range of O2 concentrations, and record their breathing patterns, including parameters such as the expired volume for individual breaths. It was desirable to have around 8 replicates to compare the responses of the two breathing types, and the complication is that animals are expensive, and different individuals are likely to have different O2 profiles (leading to possibly reduced power). There are two main design options for this experiment;

  • One animal per O2 treatment, 8 concentrations, 2 breathing types. With 8 replicates the experiment would require 128 animals, but it could be analysed as a completely randomized design
  • One O2 profile per animal, so that each animal would be used 8 times and only 16 animals are required (8 lung and 8 buccal breathers)
Mullens decided to use the second option so as to reduce the number of animals required (on financial and ethical grounds). By selecting this option, she did not have a set of independent measurements for each oxygen concentration, but rather repeated measurements on each animal across the 8 oxygen concentrations.
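The animal-count arithmetic behind the two options, as a quick sketch:

> 8 * 2 * 8   #O2 concentrations x breathing types x replicates (option 1)

[1] 128

> 2 * 8       #breathing types x replicates, with each animal reused for all 8 concentrations (option 2)

[1] 16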

Download Mullens data set

Format of mullens.csv data file
BREATH  TOAD  O2LEVEL  FREQBUC  SFREQBUC
lung    a     0        10.6     3.256
lung    a     5        18.8     4.336
lung    a     10       17.4     4.171
lung    a     15       16.6     4.074
...     ...   ...      ...      ...

BREATH    Categorical listing of the breathing type treatment (buccal = buccal breathing toads, lung = lung breathing toads). This is the between subjects (plots) effect and applies to whole toads (since a single toad can only be one breathing type - either lung or buccal). Equivalent to Factor A (between plots effect) in a split-plot design
TOAD      These are the subjects (equivalent to the plots in a split-plot design: Factor B). The letters in this variable represent the labels given to each individual toad.
O2LEVEL   0 through to 50 represent the different oxygen concentrations (0% to 50%). The different oxygen concentrations are equivalent to the within plot effects in a split-plot design (Factor C).
FREQBUC   The frequency of buccal breathing - the response variable
SFREQBUC  Square root transformed frequency of buccal breathing - the response variable

Open the mullens data file. HINT.
Show code

> mullens <- read.table("mullens.csv", header = T, sep = ",", strip.white = T)

> head(mullens)

  BREATH TOAD O2LEVEL FREQBUC SFREQBUC
1   lung    a       0    10.6 3.255764
2   lung    a       5    18.8 4.335897
3   lung    a      10    17.4 4.171331
4   lung    a      15    16.6 4.074310
5   lung    a      20     9.4 3.065942
6   lung    a      30    11.4 3.376389

Notice that the O2LEVEL variable contains only numbers. Make sure that you define it as a factor (HINT)

Show code

> mullens$O2LEVEL <- factor(mullens$O2LEVEL)

Q4-1. What are the main hypotheses being tested?
  1. H0 Main Effect 1 (Factor A):
  2. H0 Main Effect 2 (Factor C):
  3. H0 Main Effect 3 (A*C):

Q4-2. We will now address all of the assumptions.

  1. Construct a boxplot to investigate the assumptions as they apply to the test of H0 Main Effect 1 (Factor A): Start by aggregating the data set by TOAD
    (since the toads are the replicates for the between subject effect - BREATH) (HINT). Then construct a boxplot of aggregated mean FREQBUC against BREATH treatment
    (HINT). Any evidence of violations of the assumptions (y or n)?
    Show code

    > library(plyr)

    > mullens.ag <- ddply(mullens, ~BREATH + TOAD, function(df) data.frame(FREQBUC = mean(df$FREQBUC)))

    > boxplot(FREQBUC ~ BREATH, mullens.ag)


    Try a square-root transformation!
  2. Show code

    > library(plyr)

    > mullens.ag <- ddply(mullens, ~BREATH + TOAD, function(df) data.frame(SFREQBUC = mean(sqrt(df$FREQBUC))))

    > boxplot(SFREQBUC ~ BREATH, mullens.ag)

  3. Construct a boxplot to investigate the assumptions as they apply to the test of H0 Main Effect 2 (Factor C): Since Factor C is tested against the overall residual in this case, this is a relatively straightforward procedure (HINT). Any evidence of violations of the assumptions (y or n)?
  4. Show code

    > boxplot(SFREQBUC ~ O2LEVEL, mullens)

  5. Construct a boxplot to investigate the assumptions as they apply to the test of H0 the main interaction effect (A:C): Since A:C is tested against the overall residual, this is a relatively straightforward procedure (HINT). Any evidence of violations of the assumptions (y or n)?
  6. Show code

    > boxplot(SFREQBUC ~ BREATH * O2LEVEL, mullens)

  7. In addition to the above assumptions, the test of TOAD assumes that there is no TOAD by O2LEVEL interaction as this is the overall residual (the replicates). That is, the test assumes that the effect of O2LEVEL is consistent in all TOADs. Construct an interaction plot to examine whether there is any evidence of an interaction between TOAD and O2LEVEL (HINT). Any evidence of an interaction (y or n)?
  8. Show code

    > with(mullens, interaction.plot(TOAD, O2LEVEL, SFREQBUC))

    > library(alr3)

    > tukey.nonadd.test(lm(SFREQBUC ~ TOAD + O2LEVEL, data = mullens))

           Test      Pvalue 
    3.079699333 0.002072097 

  9. You must also check to see whether the proposed model is balanced.
    Show code

    > !is.list(replications(SFREQBUC ~ BREATH + Error(TOAD) + O2LEVEL + BREATH:O2LEVEL, data = mullens))

    [1] FALSE

    Is it? (Y or N)
  10. Finally, since the design is a repeated measures design, there is a good chance that the assumption of sphericity has been violated.
    Show code

    > library(biology)

    > epsi.GG.HF(aov(SFREQBUC ~ BREATH + O2LEVEL + BREATH:O2LEVEL + Error(TOAD), data = mullens))

    $GG.eps
    [1] 0.4281775
    $HF.eps
    [1] 0.5172755
    $sigma
    [1] 0.7531308

    Has it (HINT)? (Y or N)


Q4-3. Assume that the assumption of compound symmetry/sphericity will be violated and perform a split-plot ANOVA (repeated measures)
(HINT), and complete the following table with corrected p-values (HINT).
Show code

> mullens.aov <- aov(SFREQBUC ~ BREATH * O2LEVEL + Error(TOAD), data = mullens)

> library(biology)

> AnovaM(mullens.aov, RM = T)

   Sphericity Epsilon Values 
-------------------------------
 Greenhouse.Geisser Huynh.Feldt
          0.4281775   0.5172755
Anova Table (Type I tests)
Response: SFREQBUC
Error: TOAD
          Df  Sum Sq Mean Sq F value  Pr(>F)  
BREATH     1  39.921  39.921  5.7622 0.02678 *
Residuals 19 131.634   6.928                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
                Df  Sum Sq Mean Sq F value    Pr(>F)    
O2LEVEL          7  38.969  5.5670  7.3918 1.710e-07 ***
BREATH:O2LEVEL   7  56.372  8.0531 10.6928 1.228e-10 ***
Residuals      133 100.166  0.7531                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Greenhouse-Geisser corrected ANOVA table
Response: SFREQBUC
Error: TOAD
                Df  Sum Sq Mean Sq F value  Pr(>F)  
BREATH     0.42818  39.921  39.921  5.7622 0.04785 *
Residuals 19.00000 131.634   6.928                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
                     Df  Sum Sq Mean Sq F value    Pr(>F)    
O2LEVEL          2.9972  38.969  5.5670  7.3918  0.000129 ***
BREATH:O2LEVEL   2.9972  56.372  8.0531 10.6928 2.435e-06 ***
Residuals      133.0000 100.166  0.7531                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Huynh-Feldt corrected ANOVA table
Response: SFREQBUC
Error: TOAD
                Df  Sum Sq Mean Sq F value  Pr(>F)  
BREATH     0.51728  39.921  39.921  5.7622 0.04393 *
Residuals 19.00000 131.634   6.928                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Error: Within
                     Df  Sum Sq Mean Sq F value    Pr(>F)    
O2LEVEL          3.6209  38.969  5.5670  7.3918 4.095e-05 ***
BREATH:O2LEVEL   3.6209  56.372  8.0531 10.6928 4.223e-07 ***
Residuals      133.0000 100.166  0.7531                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Source of variation    df    Mean Sq    F-ratio    P-value    GG.P    HF.P
BREATH
TOAD
O2LEVEL
BREATH:O2LEVEL
Residuals

Q4-4. Construct an interaction plot
showing the frequency of buccal breathing against oxygen level, with each breathing type as different lines (or different bars). HINT
Show code

> with(mullens, interaction.plot(O2LEVEL, BREATH, FREQBUC))


Q4-5. What conclusions would you draw from the analysis (and graph)?

Mixed models

Show code

> library(lme4)

> mullens.lmer <- lmer(SFREQBUC ~ BREATH * O2LEVEL + (1 | TOAD),
+     data = mullens)

Examine the fixed effects coefficients
Show code

> summary(mullens.lmer)@coefs

> library(languageR)

> pvals.fnc(mullens.lmer, withMCMC = FALSE, addPlot = F)

$fixed
                     Estimate MCMCmean HPD95lower HPD95upper  pMCMC Pr(>|t|)
(Intercept)            4.9410   4.9384     4.3270     5.5424 0.0001   0.0000
BREATHlung            -3.3914  -3.3876    -4.3599    -2.3899 0.0001   0.0000
O2LEVEL5              -0.2174  -0.2189    -0.9361     0.5244 0.5640   0.5240
O2LEVEL10             -0.5476  -0.5484    -1.2543     0.1900 0.1394   0.1097
O2LEVEL15             -0.8687  -0.8702    -1.5651    -0.1471 0.0178   0.0117
O2LEVEL20             -1.5444  -1.5463    -2.2855    -0.8327 0.0002   0.0000
O2LEVEL30             -1.4644  -1.4656    -2.1589    -0.7132 0.0002   0.0000
O2LEVEL40             -2.2457  -2.2460    -2.9810    -1.5347 0.0001   0.0000
O2LEVEL50             -2.4600  -2.4599    -3.1459    -1.7343 0.0001   0.0000
BREATHlung:O2LEVEL5    1.0362   1.0376    -0.0997     2.2189 0.0766   0.0622
BREATHlung:O2LEVEL10   2.3327   2.3326     1.1724     3.5081 0.0001   0.0000
BREATHlung:O2LEVEL15   2.3729   2.3726     1.1888     3.5310 0.0001   0.0000
BREATHlung:O2LEVEL20   3.2985   3.2967     2.1247     4.4251 0.0001   0.0000
BREATHlung:O2LEVEL30   2.9767   2.9738     1.8095     4.1667 0.0001   0.0000
BREATHlung:O2LEVEL40   3.6168   3.6121     2.4003     4.7480 0.0001   0.0000
BREATHlung:O2LEVEL50   3.4669   3.4647     2.3277     4.6389 0.0001   0.0000
$random
    Groups        Name Std.Dev. MCMCmedian MCMCmean HPD95lower HPD95upper
1     TOAD (Intercept)   0.8786     0.6016   0.6089     0.4299     0.8014
2 Residual               0.8678     0.9305   0.9341     0.8218     1.0581

Welcome to the end of Workshop 7

  Reading data into R

Ensure that the working directory is pointing to the path containing the file to be imported before proceeding.
To import data into R, we read the contents of a file into a data frame. The general format of the command for reading data into a data frame is

> name <- read.table('filename.csv', header=T, sep=',', row.names=column)

where name is the name you wish the data frame to be referred to by, filename.csv is the name of the csv file that you created in Excel and column is the number of the column that had row labels (if there were any). The argument header=T indicates that the variable (vector) names will be created from the names supplied in the first row of the file. The argument sep=',' indicates that entries in the file are separated by a comma (hence a comma delimited file). If the data file does not contain row labels, or you are not sure whether it does, it is best to omit the row.names=column argument.

As an example

> phasmid <- read.table('phasmid.csv', header=T, sep=',', row.names=1)

End of instructions

  Nested ANOVA

A nested ANOVA is a simple extension of the single factor ANOVA. An additional factor (Factor B) is added that is nested within the main factor of interest (Factor A). Nested factors are incorporated in situations where researchers envisage potentially large amounts of variation between different sets of replicates within the main factor.
For example, a limnologist investigating the effect of flow (Factor A: high, medium and low) on macroinvertebrate density might collect samples from a number of high, medium and low flow streams. However, the limnologist might expect that many other factors (perhaps more important than stream flow) also influence the density of macroinvertebrates, and that even samples collected in close proximity to one another might be expected to vary considerably in macroinvertebrate density. Such variability increases the unexplained variability and therefore has the potential to make it very difficult to pick up the effect of the actual factor of interest (in this case flow).
In an attempt to combat this, the researcher decides to collect multiple samples from within each stream. If some of the unexplained variability can be explained by variation between samples within streams, then the test of flow will be more sensitive.
This is called a nested ANOVA; an additional factor (Factor B: stream) has been introduced within the main factor (Factor A: flow). Factor A is a fixed factor as we are only interested in the specific levels chosen (e.g. high, medium and low flow). Factor B, however, is a random factor as we are not just interested in the specific streams that were sampled, but in all the possible streams that they represent. In a nested ANOVA the categories of Factor B nested within each level of Factor A are different (e.g. the streams used for each flow type are different), and therefore it is not possible to test for an interaction between flow type and stream.

There are two null hypotheses being tested in a two factor nested ANOVA. H0 main effect: no effect of Factor A (flow); H0: μ1 = μ2 = ... = μi = μ (as for single factor ANOVA). H0 nesting effect: no effect of Factor B (stream) nested within Factor A (flow). Technically, it is that there is no added variance due to differences between all the possible levels of Factor B within any level of Factor A. E.g., that there is no added variance due to differences between all possible high flow streams, between all possible medium flow streams, or between all possible low flow streams. Diagram shows the layout of a nested design with three treatments (different pattern fills).
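As a concrete sketch, the limnologist's design described above could be fitted as follows (the data frame and variable names here are invented for illustration only):

# hypothetical data frame 'streams' with DENSITY (response), FLOW (fixed factor)
# and STREAM (random factor nested within FLOW)
> streams$STREAM <- as.factor(streams$STREAM)
# FLOW is tested against the variation among streams within flow types
> streams.aov <- aov(DENSITY ~ FLOW + Error(STREAM), data=streams)
> summary(streams.aov)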

End of instructions

  Convert numeric variable into a factor

> data$var <- as.factor(data$var)

where data is the name of the data set and var is the name of the variable to be made into a factor. While there is no visible evidence that the data are any different, R will now treat the variable as a factor rather than as a numeric variable.

End of instructions

  Nested ANOVA assumptions

In ANOVA, the assumptions of normality and variance homogeneity apply to the residuals for the test of interest. In this example, the nested factor (Factor B, SITE) is random. Therefore, the normality and homogeneity of variance assumptions for Factor A (SEASON) must be assessed using the means of Factor B within each level of Factor A as the observations, since the error term for Factor A (SEASON) is the variation in the response variable between categories of Factor B within each level of Factor A (MSSITE). Calculate the mean from each level of Factor B (SITE), and use these means to calculate the mean, variance and distribution of each level of Factor A (SEASON).

# aggregate the data set by calculating the means for each level of the nesting factor
> name.ag <- aggregate(dataset, list(Name1=FactorA, Name2=FactorB), mean)
# OR perhaps more efficiently
> library(nlme)
> name.ag <- gsummary(dataset, groups=FactorB)
# use the aggregated means to produce the boxplots
> boxplot(dv~FactorA, name.ag)

where name.ag is a name given to the new aggregated data set, dataset is the name of the original data set, Name1 and Name2 are names to appear in the aggregated data set that represent FactorA (the main factor) and FactorB (the nesting factor) respectively. Finally, dv is the name of the dependent variable.

This data set generates two boxplots, each constructed from only 3 values and thus of questionable value.

End of instructions

  Nested ANOVA assumptions

# fit the hierarchical linear model
> name.aov <- aov(dv~FactorA + Error(FactorB), data=data)
# print out the ANOVA strata
> summary(name.aov)

where name.aov is a name provided to store the fitted model, dv is the dependent variable, FactorA is the main fixed factor and FactorB is the random nested factor. data is the name of the data set. The 'Error()' argument defines additional error (residual) strata; hence the above template indicates that FactorA should be tested against FactorB.
For example:

# fit the hierarchical linear model
> curdies.aov <- aov(S4DUGES~SEASON + Error(SITE), data=curdies)
# print out the ANOVA strata
> summary(curdies.aov)

In recognition that FactorB is a random nested factor, and thus of little interest, the F-ratio is not provided. Since the effect of FactorB is tested against the overall residual (error) term, to obtain the F-ratio for the test of FactorB, fit the model without defining the additional error stratum. Note that when you do this, all terms are tested against the residual term, and therefore any fixed factors above the random factors in the hierarchy will be tested incorrectly.

> curdies.aov1 <- aov(S4DUGES~SEASON + SITE, data=curdies)
> summary(curdies.aov1)

End of instructions

  Two factor ANOVA

Statistical models that incorporate more than one categorical predictor variable are broadly referred to as factorial (multifactor) analysis of variance. There are two main reasons for the inclusion of multiple factors:

  1. To attempt to reduce the unexplained (or residual) variation in the response variable and therefore increase the sensitivity of the main effect test.
  2. To examine the interaction between factors. That is, whether the effect of one factor on the response variable depends on the level of the other factor(s), or is consistent across all levels of the other factor(s).

Fully factorial linear models are used when the design incorporates two or more factors (independent, categorical variables) that are crossed with each other. That is, all combinations of the factors are included in the design and every level (group) of each factor occurs in combination with every level of the other factor(s). Furthermore, each combination is replicated.

In fully factorial designs both factors are equally important (potentially), and since all factors are crossed, we can also test whether there is an interaction between the factors (does the effect of one factor depend on the level of the other factor(s)). Graphs above depict a) interaction, b) no interaction.


For example, Quinn (1988) investigated the effects of season (two levels, winter/spring and summer/autumn) and adult density (four levels: 8, 15, 30 and 45 animals per 225 cm²) on the number of limpet egg masses. As the design was fully crossed (all four densities were used in each season), he was also able to test for an interaction between season and density. That is, was the effect of density consistent across both seasons, and was the effect of season consistent across all densities.
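In R, Quinn's design corresponds to a fully factorial model along the following lines (a sketch only; the response and variable names here are assumed for illustration, not taken from the actual data set):

# hypothetical data frame 'quinn' with EGGMASSES (response), SEASON and DENSITY
> quinn$DENSITY <- as.factor(quinn$DENSITY)   # density levels are numbers, so force a factor
> quinn.aov <- aov(EGGMASSES ~ SEASON * DENSITY, data=quinn)
> summary(quinn.aov)   # tests SEASON, DENSITY and the SEASON:DENSITY interaction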

Diagram shows layout of 2 factor fully crossed design. The two factors (each with two levels) are color (black or gray) and pattern (solid or striped). There are three experimental units (replicates) per treatment combination.


End of instructions

  Two-factor ANOVA

> name.aov <- aov(dv~factor1 * factor2, data=data)

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables) and data is the name of the data frame (data set).

End of instructions

  Fully factorial ANOVA assumption checking

The assumptions of normality and homogeneity of variance apply to each of the factor level combinations, since it is the replicate observations of these that are the test residuals. If the design has two factors (IV1 and IV2) with three and four levels (groups) respectively within these factors, then a boxplot of each of the 3x4=12 combinations needs to be constructed. It is recommended that a variable (called GROUP) be setup to represent the combination of the two categorical (factor) variables.

Simply construct a boxplot with the dependent variable on the y-axis and GROUP on the x-axis. Visually assess whether the data from each group are normal (or at least that the groups are not all consistently skewed in the same direction), and whether the spread of data in each group is similar (or at least not related to the mean for that group). The GROUP variable can also assist in calculating the mean and variance for each group, to further assess variance homogeneity.
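For example, the GROUP variable can be generated with the interaction() function (a minimal sketch, assuming a data frame data with factors IV1 and IV2 and dependent variable dv):

# combine the two factors into a single grouping variable
> data$GROUP <- interaction(data$IV1, data$IV2)
# one boxplot per factor-level combination
> boxplot(dv ~ GROUP, data=data)
# group means and variances to further assess variance homogeneity
> tapply(data$dv, data$GROUP, mean)
> tapply(data$dv, data$GROUP, var)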

End of instructions

  Factorial boxplots

> boxplot(dv~factor1*factor2,data=data)

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables) and data is the name of the data frame (data set). The '~' in a formula indicates that the left hand side is modelled as a function of the right hand side.

End of instructions

  Plotting mean vs variance

# first calculate the means and variances
> means <- tapply(data$dv, list(data$factor1, data$factor2), mean)
> vars <- tapply(data$dv, list(data$factor1, data$factor2), var)
# then plot the mean against variance
> plot(means,vars)

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables) and data is the name of the data frame (data set).

End of instructions

  Model balance

Balanced models are those models that have equal sample sizes for each level of a treatment. Unequal sample sizes are generally problematic for statistical analyses as they can be the source of unequal variances. They are additionally problematic for multifactor designs because of the way that total variability is partitioned into the contributions of each of the main effects as well as the contributions of interactions (i.e. the calculations of sums of squares and degrees of freedom). Sums of squares (and degrees of freedom) are usually (Type I SS) partitioned additively. That is:
SSTotal=SSA+SSB+SSAB+SSResidual, and
dfTotal=dfA+dfB+dfAB+dfResidual.
In practice, these formulas are used in reverse. For example, if the model was entered as DV~A+B+AB, first SSResidual is calculated, followed by SSAB, then SSB. SSA is thus calculated as the total sums of squares minus what has already been partitioned off to the other terms (SSA = SSTotal - (SSB + SSAB + SSResidual)). If the model is balanced, then it doesn't matter in what order the model is constructed (DV~B+A+AB will be equivalent).

However, in unbalanced designs, SS cannot be partitioned additively; that is, the actual SS of each of the terms does not add up to SSTotal (SSTotal ≠ SSA + SSB + SSAB + SSResidual). Of course, when SS is partitioned it all must add up. Consequently, the SS of the last term to be calculated will not be correct. Therefore, in unbalanced designs, the SS values for the terms differ (and therefore so do the F-ratios etc.) depending on the order in which the terms are added - an undesirable situation.
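The order dependence is easy to demonstrate with a small invented unbalanced data set (all names here are arbitrary):

# toy unbalanced two-factor data set: one cell has two replicates, the others one
> set.seed(1)
> toy <- data.frame(A = factor(c('a1','a1','a1','a2','a2')),
+                   B = factor(c('b1','b2','b2','b1','b2')),
+                   DV = rnorm(5))
# Type I SS depend on the order in which the terms are entered
> anova(aov(DV ~ A + B, data=toy))
> anova(aov(DV ~ B + A, data=toy))   # SS for A and B differ from the line above
# the balance check described below returns FALSE for this design
> !is.list(replications(DV ~ A * B, data=toy))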

To account for this, there are alternative methods of calculating SS for the main effects (all provide same SS for interaction terms and residuals).

It turns out that it is relatively easy to determine whether or not the model is balanced - use the following syntax:

> !is.list(replications(formula,data))

where formula is the model formula (for example, MASS~SITUATION*MONTH) and data is the name of data set.
If this function returns a TRUE, your model is balanced. If it returns a FALSE then your model is not balanced and you should consider using Type II or III SS methods.

End of instructions

  Linear regression diagnostics

> plot(name.lm)

where name.lm is the name of a fitted linear model. You will be prompted to hit return to cycle through a series of four diagnostic plots. The first (and most important of these) is the Residual Plot.

End of instructions

  ANOVA table

#print a summary of the ANOVA table
> anova(name.aov)

where name.aov is the name of a fitted linear model.

End of instructions

  Interaction plot

Interaction plots display the degree of consistency (or lack thereof) of the effect of one factor across each level of another factor. Interaction plots can be either bar or line graphs; however, line graphs are more effective. The x-axis represents the levels of one factor, and a separate line is drawn for each level of the other factor. The following interaction plots represent two factors, A (with levels A1, A2, A3) and B (with levels B1, B2).


The parallel lines of the first plot clearly indicate no interaction. The effect of factor A is consistent for both levels of factor B and vice versa. The middle plot demonstrates a moderate interaction and the bottom plot illustrates a severe interaction. Clearly, whether or not there is an effect of factor B (e.g. B1 > B2 or B2 > B1) depends on the level of factor A. The trend is not consistent.

End of instructions

  R Interaction plot

> interaction.plot(dataset$Factor1, dataset$Factor2, dataset$dv)

where Factor1 is the name of the categorical variable (factor) with the most levels (this will be used for the x-axis), Factor2 is the name of the other factorial variable (this will be used to produce the different lines) and dv is the name of the dependent variable within the data set dataset.
For example:

> interaction.plot(copper$DIST, copper$COPPER, copper$WORMS)

End of instructions

  Specific comparisons of means

Following a significant ANOVA result, it is often desirable to specifically compare group means to determine which groups are significantly different. However, multiple comparisons lead to two statistical problems. Firstly, multiple significance tests increase the rate of Type I errors (α, the probability of falsely rejecting H0). E.g., testing for differences between 5 groups requires ten pairwise comparisons. If the α for each test is 0.05 (5%), then the probability of at least one Type I error for the family of 10 tests is 0.4 (40%). Secondly, the outcome of each test needs to be independent (orthogonality). E.g. if A>B and B>C then we already know the result of A vs. C.
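The 40% figure quoted above follows directly from the complement rule; in R:

# probability of at least one Type I error across 10 independent tests at alpha = 0.05
> 1 - (1 - 0.05)^10
[1] 0.4012631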

Post-hoc unplanned pairwise comparisons (e.g. Tukey's test) compare all possible pairs of group means and are useful in an exploratory fashion to reveal major differences. Tukey's test controls the family-wise Type I error rate to no more than 0.05. However, this reduces the power of each of the pairwise comparisons, and only very large differences are detected (a consequence that worsens with an increasing number of groups).

Planned comparisons are specific comparisons that are usually planned during the design stage of the experiment. No more than p-1 comparisons can be made (where p is the number of groups); however, each comparison (provided it is orthogonal to the others) can be tested at α = 0.05. Amongst all possible pairwise comparisons, specific comparisons are selected, while others that are meaningless (within the biological context of the investigation) are ignored.

Planned comparisons are defined using what are known as contrasts coefficients. These coefficients are a set of numbers that indicate which groups are to be compared, and what the relative contribution of each group is to the comparison. For example, if a factor has four groups (A, B, C and D) and you wish to specifically compare groups A and B, the contrast coefficients for this comparison would be 1, -1, 0,and 0 respectively. This indicates that groups A and B are to be compared (since their signs oppose) and that groups C and D are omitted from the specific comparison (their values are 0).

It is also possible to compare the average of two or more groups with the average of other groups. For example, to compare the average of groups A and B with the average of group C, the contrast coefficients would be 1, 1, -2, and 0 respectively. Note that 0.5, 0.5, -1 and 0 would be equivalent. A code sketch is given after the points below.

The following points relate to the specification of contrast coefficients:

  1. The sum of the coefficients should equal 0
  2. Groups to be compared should have opposing signs
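A minimal sketch of defining and testing a planned comparison in R follows (assuming a factor TREATMENT with four levels A, B, C and D in a data frame data, and a dependent variable dv; all names are illustrative):

# define the contrast coefficients: compare group A with group B
> contrasts(data$TREATMENT) <- cbind(c(1, -1, 0, 0))
> data.aov <- aov(dv ~ TREATMENT, data=data)
# partition the treatment sums of squares to test the planned comparison
> summary(data.aov, split=list(TREATMENT=list('A vs B'=1)))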

End of instructions

  Tukey's Test

# load the 'multcomp' package
> library(multcomp)
# perform Tukey's test
> summary(simtest(dv~factor1*factor2, whichf='factor', data=data, type="Tukey"))
# OR the more up to date method
> summary(glht(data.aov, linfct=mcp(factor="Tukey")))

where dv is the name of the numeric vector (dependent variable), factor1 and factor2 are the names of the factors (categorical or factor variables), and data is the name of the data frame (data set). The argument type="Tukey" indicates that Tukey's p-value adjustments should be made. When the linear model contains multiple categorical predictor variables, the argument whichf='' must be used to indicate which factor the Tukey's test should be performed on.

The coefficients list the specific comparisons made. Each line represents a specific hypothesis test including difference between groups, t-value, raw p value and Tukey's adjusted p value (the other values presented are not important for BIO3011).

End of instructions

  Simple Main Effects

> library(biology)
> quinn.aov1 <- aov(SQRTRECRUITS~DENSITY, data=quinn, subset=c(SEASON=='Summer'))
> AnovaM(lm(quinn.aov1), lm(quinn.aov),type="II")

To subset, type the name of the other categorical variable followed by == (two equals signs) and the name of one of the levels of this factor (surrounded by single quotation marks). For example, if you had a categorical variable called TEMPERATURE and it consisted of three levels (High, Medium and Low), to perform an ANOVA using only the High temperature data, the subset expression would be TEMPERATURE=='High'.
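Following the TEMPERATURE example above, the call would look like this (a sketch using the same invented names):

# ANOVA of the response against FACTOR using only the High temperature data
> name.aov <- aov(dv ~ FACTOR, data=data, subset=c(TEMPERATURE=='High'))
> summary(name.aov)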

End of instructions

  Randomized complete block (RCB) designs

These are designs where one factor is a blocking variable (Factor B) and the other variable (Factor A) is the main treatment factor of interest. These designs are used in situations where the environment is potentially heterogeneous enough to substantially increase the variability in the response variable and thus obscure the effects of the main factor. Experimental units are grouped into blocks (units of space or time that are likely to have less variable background conditions) and then each level of the treatment factor is applied to a single unit within each block. By grouping the experimental units together into blocks, some of the total variability in the response variable that is not explained by the main factor may be explained by the blocks. If so, the residual (unexplained) variation will be reduced resulting in more precise estimates of parameters and more powerful tests of treatments.

A plant ecologist was investigating the effects of leaf damage on plant chemical defenses. Since individual trees vary greatly in the degree of chemical defense, she used trees as blocks: within each of eight randomly selected trees, each of three artificial leaf damage treatments (heavy, medium and light damage) was applied to one of three branches. The diagram shows the layout of a blocking design. The main factor has three treatments (pattern fills), each of which is applied randomly within 3 randomly selected blocks. The decision to include a blocking factor (rather than a single factor, completely randomized design) depends on whether the experimental units (replicates) within a block are likely to be more similar to one another than those between blocks, and whether the reduction in MSResidual is likely to compensate for the loss of residual df.
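The plant ecologist's design could be fitted along these lines (a hypothetical sketch; the data frame and variable names are invented):

# DEFENSE: chemical defense response; DAMAGE: fixed treatment (heavy/medium/light)
# TREE: random blocking factor (the eight trees)
> trees.aov <- aov(DEFENSE ~ DAMAGE + Error(TREE), data=trees)
> summary(trees.aov)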

End of instructions

  Randomized blocks

> name.aov <- aov(dv~factorA + Error(factorB), data=data)

where dv is the name of the numeric vector (dependent variable), factorA and factorB represent the main factor of interest and the blocking factor respectively.

End of instructions

  Split-plot designs

Split-plot designs can be thought of as a combination of a randomized complete block (RCB) design and a nested design. In their simplest form, they represent a RCB design with a second factor applied to the whole blocks. One factor (Factor C) is applied to experimental units (usually randomly positioned) within each block (now referred to as plots, or Factor B); this is the within-plot factor. A second factor (the between-plots factor, or Factor A) is then applied to the whole plots, and these plots are replicated. Hence a nested design applies between Factor A and the plots, and a blocking design applies between Factor C and the plots. Factor A is also crossed with Factor C, and therefore there is a test of the interaction between the two main factors.


Note that if there are no replicate units of Factor C within each plot (that is, if each level of Factor C is applied only once to each plot), then the interaction of Factor C by Plots within Factor A represents the overall residual. However, since the plots are usually random factors, the effects of Factor A must be tested against the Plots within Factor A term.
If there are replicate units of Factor C within each plot, then the Factor A by Plots within Factor A by Factor C term is the overall residual.

Source    F-ratio (A, C fixed; B random)
A         MSA/MSB
B         MSB/MSResidual
C         MSC/MSB:C
A:C       MSA:C/MSB:C
B:C       MSB:C/MSResidual

For example, a marine biologist investigated the effects of estuary flow on the abundance of mussels and barnacles, and whether these effects varied with tidal height. A split-plot design was used whereby randomly selected estuary sites were the plots, six tidal heights (from 0 to 3.6 m) were applied within each site (the within-plot effect), and each site was of either high or low flow (the between-plots effect). Diagram illustrates the layout of a split-plot design. There are three treatments (levels of Factor C, fill patterns for squares) applied within each plot (ovals). One of two treatments (levels of Factor A, fill for ovals) was applied to whole plots. Note that treatments for both Factor A and C are replicated and fully crossed.
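As an illustrative sketch (invented names), the marine biologist's split-plot could be fitted as:

# FLOW: between-plots factor; TIDAL: within-plot factor; SITE: random plot
> mussels.aov <- aov(ABUND ~ FLOW * TIDAL + Error(SITE), data=mussels)
> summary(mussels.aov)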

End of instructions

  Aggregate a data set

> name.ag <- aggregate(dataset, list(NAME=FactorA, NAME1=FactorB), mean)
# OR
> library(nlme)
> name.ag <- gsummary(dataset, groups=dataset$FactorB)

where name.ag is a name you provide for the aggregated data set, dataset is the name of the data set, NAME is a name for the first factor in the resulting data set, FactorA is the name of the main factor, NAME1 is a name for the second factor in the resulting data set and FactorB is the name of the plot (blocking) factor.
For example:

> cu.ag <- aggregate(copper, list(COPPER=copper$COPPER, PLATE=copper$PLATE), mean)
# OR
> library(nlme)
> cu.ag <- gsummary(copper, groups=copper$PLATE)

End of instructions

  Boxplot from aggregated data set

Having generated an aggregated data set, it is possible to use the aggregated means to test the assumptions of normality and homogeneity of variance.

> boxplot(name.ag$dv~name.ag$FactorA)

where name.ag is a name of the aggregated data set, dv is the name of the dependent variable and FactorA is the name of the main factor.
For example:

> boxplot(cu.ag$WORMS ~ cu.ag$COPPER)

End of instructions

  R Compound symmetry/sphericity

# source Murray's syntax for dealing with sphericity
> library(biology)
# after fitting a randomized block, repeated measures or split-plot
> epsi.GG.HF(name.aov)

where name.aov is the name of the fitted randomized block, repeated measures or split-plot model. The epsi.GG.HF function returns a table of epsilon values corresponding to each of the model terms.

End of instructions

  R Split-plot

# Fit the linear model with the different error strata
> name.aov <- aov(dv ~ FactorA + FactorC + FactorA:FactorC + Error(FactorB), data=dataset)
> AnovaM(name.aov)
# alternatively, if sphericity is an issue
> AnovaM(name.aov, RM=T)

where name.aov is a name to provide the fitted linear model, dv is the name of the dependent variable, FactorA is the name of the between plots factor (Factor A), FactorB is the name of the block factor (random) and FactorC is the name of the within plots factor. dataset is the name of the data set.
For example:

> copper.aov <- aov(WORMS ~ COPPER + DIST + COPPER:DIST + Error(PLATE), data=copper)
> AnovaM(copper.aov, RM=T)

End of instructions

  Repeated Measures

Simple repeated measures designs are similar to randomized complete block (RCB) designs. In the same way that we could explain some of the residual variation by blocking, we can also reduce the residual variation by removing the between-replicate variation that exists at the start of the experiment. This is achieved by applying each treatment (the repeated measures) to the same subjects, rather than using different subjects for each treatment. In this way it is the same as a RCB design. Where it differs from a RCB design is that each of the treatments cannot be applied to the blocks (subjects) at the same time. For example, it is not possible to determine the effects of different drugs on the heart rate of rats if each of the drugs is applied at the same time! Each of the treatments should be applied in random order; however, when the repeated factor is time, this is not possible. The subjects (blocks or plots) are always a random factor. If a second factor is applied at the level of the whole subject and over each of the repeated measures (e.g. subjects are of one of two species), the analysis then resembles a split-plot design, where this second factor is the between-subjects effect.


An agricultural scientist was investigating the effects of fiber on the digestibility of pasture grasses by cows. Ten randomly selected cows of two different breeds (five of each) were each fed a random sequence of four food types (of different fiber levels). After a week of feeding on a single food type, the digestibility of the food was assessed. Individual cows represented the subjects. Breed of cow was the between-subjects effect and food fiber level was the within-subjects effect. Diagram illustrates the layout of a two-factor repeated measures design. Rectangles represent subjects. Squares represent the within-subject treatments (the random sequence represents the order in which each treatment is applied). Hence there are four repeated measures. There are two levels of the between-subjects effect (pattern fill of rectangles).
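The cow example could be analysed with the same split-plot template (a hypothetical sketch; the names are invented):

# BREED: between-subjects effect; FIBER: within-subjects (repeated) effect; COW: random subject
> library(biology)
> cows.aov <- aov(DIGEST ~ BREED * FIBER + Error(COW), data=cows)
> AnovaM(cows.aov, RM=T)   # sphericity-corrected tables, as elsewhere in this workshop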

End of instructions
