Workshop 6 - Factorial Analysis of Variance

5 July 2011

Topics

Nested ANOVA
- Exercise 1
Factorial ANOVA

Basic statistics references

Logan (2010) - Chpt 12
Quinn & Keough (2002) - Chpt 9

Very basic overview of Nested ANOVA

Nested ANOVA - one between factor

In an unusually detailed preparation for an Environmental Effects Statement for a proposed discharge of dairy wastes into the Curdies River, in western Victoria, a team of stream ecologists wanted to describe the basic patterns of variation in a stream invertebrate thought to be sensitive to nutrient enrichment. As an indicator species, they focused on a small flatworm, Dugesia, and started by sampling populations of this worm at a range of scales. They sampled in two seasons, representing different flow regimes of the river: Winter and Summer. Within each season, they sampled three randomly chosen sites (strictly, haphazardly chosen, because sites are nearly always selected to be close to road access). In total, six sites were visited, three in each season. At each site, they sampled six stones and counted the number of flatworms on each stone.

Download Curdies data set

Format of curdies.csv data files

SEASON	SITE	DUGESIA	S4DUGES
WINTER	1	0.648	0.897
..	..	..	..
WINTER	2	1.016	1.004
..	..	..	..
WINTER	3	0.689	0.991
..	..	..	..
SUMMER	4	0	0
..	..	..	..

Each row represents a different stone

SEASON	Season in which flatworms were counted - fixed factor
SITE	Site from where flatworms were counted - nested within SEASON (random factor)
DUGESIA	Number of flatworms counted on a particular stone
S4DUGES	4th root transformation of DUGESIA variable

Open the curdies data file.

> curdies <- read.table("curdies.csv", header = T, sep = ",", strip.white = T)

> head(curdies)

SEASON SITE DUGESIA S4DUGES

1 WINTER 1 0.6476829 0.8970995

2 WINTER 1 6.0961516 1.5713175

3 WINTER 1 1.3105639 1.0699526

4 WINTER 1 1.7252788 1.1460797

5 WINTER 1 1.4593867 1.0991136

6 WINTER 1 1.0575610 1.0140897

The SITE variable is supposed to represent a random factor (which site). However, because the contents of this variable are numbers, R initially treats the variable as numeric rather than categorical. To force R to treat it as a factor (categorical), first convert the numeric variable into a factor.

> curdies$SITE <- as.factor(curdies$SITE)

Notice that in the data set each level of the nested factor (SITE) is labelled differently: sites are numbered 1 to 6 rather than 1 to 3 within each season, because no site is replicated across seasons.

Q1-1. What are the main hypotheses being tested?

H₀ Effect 1:
H₀ Effect 2:

Q1-2. In the table below, list the assumptions of nested ANOVA along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption	Diagnostic/Risk Minimization
I.
II.
III.

Q1-3. Check these assumptions (HINT).

> library(nlme)

> curdies.ag <- gsummary(curdies, form = ~SEASON/SITE, mean)

Note that for the effects of SEASON (Factor A in a nested model) there are only three values for each of the two season types. Therefore, boxplots are of limited value! Is there, however, any evidence of violations of the assumptions?

> boxplot(DUGESIA ~ SEASON, curdies.ag)

(Y or N)
If so, assess whether a transformation will address the violations and then make the appropriate corrections.

> curdies$S4DUGES <- sqrt(sqrt(curdies$DUGESIA))

> curdies.ag <- gsummary(curdies, form = ~SEASON/SITE, mean)

> boxplot(S4DUGES ~ SEASON, curdies.ag)

Q1-4. For each of the tests, state which error (or residual) term (state as Mean Square) will be used as the numerator and denominator to calculate the F ratio. Also include the degrees of freedom associated with each term.

Effect	Numerator (Mean Sq, df)	Denominator (Mean Sq, df)
SEASON
SITE

Q1-5. If there is no evidence of violations, test the model
S4DUGES = SEASON + SITE + CONSTANT
using a nested ANOVA. Fill out the table below, making sure that you have treated SITE as a random factor when compiling the overall results.

> curdies.aov <- aov(S4DUGES ~ SEASON + Error(SITE), data = curdies)

> summary(curdies.aov)

Error: SITE

Df Sum Sq Mean Sq F value Pr(>F)

SEASON 1 5.5709 5.5709 34.496 0.004198 **

Residuals 4 0.6460 0.1615

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within

Df Sum Sq Mean Sq F value Pr(>F)

Residuals 30 4.5555 0.15185

> summary(lm(curdies.aov))

Call:

lm(formula = curdies.aov)

Residuals:

Min 1Q Median 3Q Max

-0.3811 -0.2618 -0.1381 0.1652 0.9023

Coefficients: (1 not defined because of singularities)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.3811 0.1591 2.396 0.02303 *

SEASONWINTER 0.7518 0.2250 3.342 0.00224 **

SITE2 0.1389 0.2250 0.618 0.54156

SITE3 -0.2651 0.2250 -1.178 0.24798

SITE4 -0.0303 0.2250 -0.135 0.89376

SITE5 -0.2007 0.2250 -0.892 0.37955

SITE6 NA NA NA NA

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3897 on 30 degrees of freedom

Multiple R-squared: 0.5771, Adjusted R-squared: 0.5066

F-statistic: 8.188 on 5 and 30 DF, p-value: 5.718e-05

> confint(lm(curdies.aov))

2.5 % 97.5 %

(Intercept) 0.05622464 0.7060200

SEASONWINTER 0.29234513 1.2112945

SITE2 -0.32054693 0.5984024

SITE3 -0.72454609 0.1944033

SITE4 -0.48977564 0.4291737

SITE5 -0.66013474 0.2588146

SITE6 NA NA

Q1-6. Using the output above, complete the ANOVA table and the table of estimates below.

Source of variation	df	Mean Sq	F-ratio	P-value
SEASON
SITE
Residuals

Estimate	Mean	Lower 95% CI	Upper 95% CI
Summer
Effect size (Winter-Summer)

Normally, we are not interested in formally testing the effect of the nested factor. However, to get the correct F test for the nested factor (SITE), examine the ANOVA table of the fitted linear model that assumes all factors are fixed.
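
As a rough illustration of the fixed-effects table referred to above, the following sketch fits SEASON and SITE as ordinary fixed terms to simulated data with the same layout as the curdies design (2 seasons, 3 sites per season, 6 stones per site). The simulated response `Y`, the effect sizes and the object names are assumptions made for illustration only; they are not the curdies data. The F ratio for SITE is read directly from the table, whereas the F ratio for SEASON is recomputed against the SITE mean square.

```r
set.seed(1)

# Simulated layout mimicking the curdies design (NOT the real data):
# sites 1-3 sampled in winter, sites 4-6 in summer, 6 stones per site
sim <- data.frame(
  SEASON = factor(rep(c("WINTER", "SUMMER"), each = 18)),
  SITE   = factor(rep(1:6, each = 6))
)
site_effect <- rnorm(6, sd = 0.2)          # hypothetical random site effects
sim$Y <- 0.4 + 0.7 * (sim$SEASON == "WINTER") +
  site_effect[as.integer(sim$SITE)] + rnorm(36, sd = 0.4)

# ANOVA table that treats both SEASON and SITE as fixed
fixed_tab <- anova(lm(Y ~ SEASON + SITE, data = sim))
print(fixed_tab)

# F test for the nested factor: MS(SITE) / MS(Residuals), as given in the table
f_site <- fixed_tab["SITE", "F value"]

# F test for SEASON in the nested design: MS(SEASON) / MS(SITE)
f_season <- fixed_tab["SEASON", "Mean Sq"] / fixed_tab["SITE", "Mean Sq"]
p_season <- pf(f_season, fixed_tab["SEASON", "Df"], fixed_tab["SITE", "Df"],
               lower.tail = FALSE)
print(c(F_site = f_site, F_season = f_season, p_season = p_season))
```

With the real data, the same two steps (`anova(lm(...))` followed by the ratio of mean squares) would be applied to `S4DUGES` in `curdies`.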

Q1-7. What are your conclusions (statistical and biological)?

Q1-8. Where is the major variation in numbers of flatworms? Between (seasons, sites or stones)?

> library(nlme)

> VarCorr(lme(S4DUGES ~ 1, random = ~1 | SEASON/SITE, curdies))

Variance StdDev

SEASON = pdLogChol(1)

(Intercept) 0.300522947 0.54819973

SITE = pdLogChol(1)

(Intercept) 0.001606859 0.04008565

Residual 0.151850793 0.38968037
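
The variance components above can also be approximated by hand from the mean squares in the nested ANOVA output (SEASON 5.5709, SITE 0.1615, Within 0.15185). The sketch below uses the usual method-of-moments formulas for a balanced design with 3 sites per season and 6 stones per site; the variable names are illustrative. For a balanced design like this one, the results should agree closely with the `VarCorr()` output, apart from rounding of the mean squares.

```r
# Mean squares copied from the nested ANOVA output above
ms_season <- 5.5709
ms_site   <- 0.1615
ms_resid  <- 0.15185

n_stones <- 6   # stones per site
n_sites  <- 3   # sites per season

# Method-of-moments estimates of the variance components
var_site   <- (ms_site - ms_resid) / n_stones
var_season <- (ms_season - ms_site) / (n_stones * n_sites)

vc  <- c(SEASON = var_season, SITE = var_site, Residual = ms_resid)
pct <- round(100 * vc / sum(vc), 1)   # each component as a % of the total
print(vc)
print(pct)
```

Expressing the components as percentages of their sum gives a direct view of the scale at which most of the variation occurs.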

Q1-9. How might this information influence the design of future experiments on Dugesia in terms of:

What influences the abundance of Dugesia
Where best to focus sampling effort to maximize statistical power?

Q1-10. Finally, construct an appropriate summary figure to accompany the above analyses. Note that this should use the correct replicates for depicting error.

> opar <- par(mar = c(5, 5, 1, 1))

> means <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, mean)

> sds <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, sd)

> lens <- tapply(curdies.ag$DUGESIA, curdies.ag$SEASON, length)

> ses <- sds/sqrt(lens)

> xs <- barplot(means, beside = T, ann = F, axes = F, ylim = c(0,
+     max(means + ses)), axisnames = F)

> arrows(xs, means - ses, xs, means + ses, code = 3, length = 0.05,
+     ang = 90)

> axis(1, at = xs, lab = c("Summer", "Winter"))

> mtext("Season", 1, line = 3, cex = 1.25)

> axis(2, las = 1)

> mtext("Mean number of Dugesia per stone", 2, line = 3, cex = 1.25)

> box(bty = "l")

> par(opar)

Exercise 2 - Two factor ANOVA

A biologist studying starlings wanted to know whether the mean mass of starlings differed according to different roosting situations. She was also interested in whether the mean mass of starlings altered over winter (Northern hemisphere) and whether the patterns amongst roosting situations were consistent throughout winter. Starlings were therefore captured at the start (November) and end (January) of winter. Ten starlings were captured from each of four roosting situations in each month, so in total 80 birds were captured and weighed.

Download Starling data set

Format of starling.csv data files

SITUATION	MONTH	MASS	GROUP
S1	November	78	S1Nov
..	..	..	..
S2	November	78	S2Nov
..	..	..	..
S3	November	79	S3Nov
..	..	..	..
S4	November	77	S4Nov
..	..	..	..
S1	January	85	S1Jan
..	..	..	..

SITUATION	Categorical listing of roosting situations
MONTH	Categorical listing of the month of sampling.
MASS	Mass (g) of starlings.
GROUP	Categorical listing of situation/month combinations - used for checking ANOVA assumptions

Open the starling data file.

> starling <- read.table("starling.csv", header = T, sep = ",",
+     strip.white = T)

> head(starling)

SITUATION MONTH MASS GROUP

1 S1 November 78 S1Nov

2 S1 November 88 S1Nov

3 S1 November 87 S1Nov

4 S1 November 88 S1Nov

5 S1 November 83 S1Nov

6 S1 November 82 S1Nov

Q2-1. List the 3 null hypotheses being tested

Q2-2. Test the assumptions by producing boxplots and a mean vs variance plot.

> boxplot(MASS ~ SITUATION * MONTH, data = starling)

> means <- with(starling, tapply(MASS, list(SITUATION, MONTH),
+     mean))

> vars <- with(starling, tapply(MASS, list(SITUATION, MONTH), var))

> plot(means, vars, pch = 16)

Is there any evidence that one or more of the assumptions are likely to be violated? (Y or N)
Is the proposed model balanced?
(Y or N)

> replications(MASS ~ SITUATION * MONTH, data = starling)

SITUATION MONTH SITUATION:MONTH

20 40 10

> !is.list(replications(MASS ~ SITUATION * MONTH, data = starling))

[1] TRUE
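
The balance check used above relies on the fact that `replications()` returns a plain vector when every term has equal replication and a list when it does not. The sketch below demonstrates this on two small hypothetical data frames (factors `A` and `B` and the helper `is_balanced()` are made up for illustration).

```r
# Hypothetical factors A (2 levels) and B (3 levels), 4 replicates per cell
balanced   <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2", "b3"), REP = 1:4)
unbalanced <- balanced[-1, ]   # drop one observation to break the balance

# TRUE when replications() returns a vector (balanced), FALSE when it returns a list
is_balanced <- function(formula, data) {
  !is.list(replications(formula, data = data))
}

print(replications(~ A * B, data = balanced))
print(is_balanced(~ A * B, balanced))
print(is_balanced(~ A * B, unbalanced))
```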

Q2-3. Now fit a two-factor ANOVA model and examine the residuals.

> starling.lm <- lm(MASS ~ SITUATION * MONTH, data = starling)

> par(mfrow = c(2, 1), oma = c(0, 0, 2, 0))

> plot(starling.lm, ask = F, which = 1:2)

Any evidence of skewness or unequal variances? Any outliers? Any evidence of violations? ('Y' or 'N')
Examine the ANOVA table

> summary(starling.lm)

Call:

lm(formula = MASS ~ SITUATION * MONTH, data = starling)

Residuals:

Min 1Q Median 3Q Max

-7.4 -3.2 -0.4 2.9 9.2

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 90.800 1.330 68.260 < 2e-16 ***

SITUATIONS2 -0.600 1.881 -0.319 0.750691

SITUATIONS3 -2.600 1.881 -1.382 0.171213

SITUATIONS4 -6.600 1.881 -3.508 0.000781 ***

MONTHNovember -7.200 1.881 -3.827 0.000274 ***

SITUATIONS2:MONTHNovember -3.600 2.660 -1.353 0.180233

SITUATIONS3:MONTHNovember -2.400 2.660 -0.902 0.370003

SITUATIONS4:MONTHNovember -1.600 2.660 -0.601 0.549455

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.206 on 72 degrees of freedom

Multiple R-squared: 0.64, Adjusted R-squared: 0.605

F-statistic: 18.28 on 7 and 72 DF, p-value: 9.546e-14

> anova(starling.lm)

Analysis of Variance Table

Response: MASS

Df Sum Sq Mean Sq F value Pr(>F)

SITUATION 3 574.4 191.47 10.8207 5.960e-06 ***

MONTH 1 1656.2 1656.20 93.6000 1.172e-14 ***

SITUATION:MONTH 3 34.2 11.40 0.6443 0.5891

Residuals 72 1274.0 17.69

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

and fill in the following table:

Source of Variation	SS	df	MS	F-ratio	Pvalue
SITUATION
MONTH
SITUATION : MONTH
Residual (within groups)

Q2-4. An interaction plot (plot of means) is useful for summarizing multi-way ANOVA models. Summarize the trends using an interaction plot.

> library(car)

> with(starling, interaction.plot(SITUATION, MONTH, MASS))

In the classical frequentist regime, many at this point would advocate dropping the interaction term from the model (the p-value for the interaction is greater than 0.25). This term explains very little of the otherwise unexplained variation and yet uses up 3 degrees of freedom. The figure also indicates that situation and month are likely to operate additively.
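
To see what dropping the interaction does, the sketch below pools the interaction and residual sums of squares from the ANOVA table above (34.2 on 3 df and 1274.0 on 72 df) into the residual of an additive model. The variable names are illustrative, and the calculation assumes the balanced design confirmed earlier (20 birds per roosting situation).

```r
# Values copied from the anova(starling.lm) table above
ss_int <- 34.2;   df_int <- 3     # SITUATION:MONTH
ss_res <- 1274.0; df_res <- 72    # Residuals
ms_situation <- 191.47            # Mean Sq for SITUATION

# Residual mean square of the additive model: interaction pooled into the residual
ms_pooled <- (ss_int + ss_res) / (df_int + df_res)

# Standard error of a difference between two situation means (20 birds each)
n_per_situation <- 20
se_diff <- sqrt(2 * ms_pooled / n_per_situation)

# F ratio for SITUATION tested against the pooled residual (3 and 75 df)
f_situation <- ms_situation / ms_pooled
p_situation <- pf(f_situation, 3, df_int + df_res, lower.tail = FALSE)

print(c(ms_pooled = ms_pooled, se_diff = se_diff,
        f_situation = f_situation, p_situation = p_situation))
```

The pooled standard error (about 1.32) is smaller than the 1.881 reported for the situation coefficients in the summary of the model with the interaction, because those coefficients compare situations within a single month (10 birds per group) rather than averaging over both months.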

Q2-5. In the absence of an interaction, we can examine the effects of each of the main effects in isolation. It is not necessary to examine the effect of MONTH any further, as there were only two groups. However, if we wished to know which roosting situations were significantly different from one another, we need to perform additional multiple comparisons. Since we don't know anything about the roosting situations, no one comparison is any more or less meaningful than any other. Therefore, a Tukey's test is most appropriate. Perform a Tukey's test and indicate which of the following comparisons were significant (put * in the box to indicate P < 0.05, ** to indicate P < 0.001, and NS to indicate not significant).

> library(multcomp)

> summary(glht(starling.lm, linfct = mcp(SITUATION = "Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = MASS ~ SITUATION * MONTH, data = starling)

Linear Hypotheses:

Estimate Std. Error t value Pr(>|t|)

S2 - S1 == 0 -0.600 1.881 -0.319 0.98868

S3 - S1 == 0 -2.600 1.881 -1.382 0.51453

S4 - S1 == 0 -6.600 1.881 -3.508 0.00407 **

S3 - S2 == 0 -2.000 1.881 -1.063 0.71289

S4 - S2 == 0 -6.000 1.881 -3.189 0.01119 *

S4 - S3 == 0 -4.000 1.881 -2.126 0.15469

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Adjusted p values reported -- single-step method)

> library(multcomp)

> summary(glht(lm(MASS ~ SITUATION + MONTH, data = starling), linfct = mcp(SITUATION = "Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = MASS ~ SITUATION + MONTH, data = starling)

Linear Hypotheses:

Estimate Std. Error t value Pr(>|t|)

S2 - S1 == 0 -2.400 1.321 -1.817 0.27350

S3 - S1 == 0 -3.800 1.321 -2.877 0.02614 *

S4 - S1 == 0 -7.400 1.321 -5.603 < 0.001 ***

S3 - S2 == 0 -1.400 1.321 -1.060 0.71471

S4 - S2 == 0 -5.000 1.321 -3.786 0.00182 **

S4 - S3 == 0 -3.600 1.321 -2.726 0.03900 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Adjusted p values reported -- single-step method)

> library(multcomp)

> confint(glht(lm(MASS ~ SITUATION + MONTH, data = starling), linfct = mcp(SITUATION = "Tukey")))

Simultaneous Confidence Intervals

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = MASS ~ SITUATION + MONTH, data = starling)

Quantile = 2.626

95% family-wise confidence level

Linear Hypotheses:

Estimate lwr upr

S2 - S1 == 0 -2.4000 -5.8681 1.0681

S3 - S1 == 0 -3.8000 -7.2681 -0.3319

S4 - S1 == 0 -7.4000 -10.8681 -3.9319

S3 - S2 == 0 -1.4000 -4.8681 2.0681

S4 - S2 == 0 -5.0000 -8.4681 -1.5319

S4 - S3 == 0 -3.6000 -7.0681 -0.1319

Comparison	Multiplicative				Additive
	Est	P	Lwr	Upr	Est	P	Lwr	Upr
Situation 2 vs Situation 1
Situation 3 vs Situation 1
Situation 4 vs Situation 1
Situation 3 vs Situation 2
Situation 4 vs Situation 2
Situation 4 vs Situation 3

Q2-6. Using the additive model, fill out the following table of effect sizes and confidence intervals.

> starling.lm2 <- lm(MASS ~ SITUATION + MONTH, data = starling)

> cbind(coef(starling.lm2), confint(starling.lm2))

2.5 % 97.5 %

(Intercept) 91.75 89.670025 93.829975

SITUATIONS2 -2.40 -5.030983 0.230983

SITUATIONS3 -3.80 -6.430983 -1.169017

SITUATIONS4 -7.40 -10.030983 -4.769017

MONTHNovember -9.10 -10.960386 -7.239614

Estimate	Mean	Lower 95% CI	Upper 95% CI
January Situation 1
Effect size (Jan:Sit2 - Jan:Sit1)
Effect size (Jan:Sit3 - Jan:Sit1)
Effect size (Jan:Sit4 - Jan:Sit1)
Effect size (Nov:Sit1 - Jan:Sit1)

Q2-7. Generate a bargraph to summarize the data.

> opar <- par(mar = c(5, 5, 1, 1))

> star.means <- with(starling, tapply(MASS, list(MONTH, SITUATION),
+     mean))

> star.sds <- with(starling, tapply(MASS, list(MONTH, SITUATION),
+     sd))

> star.len <- with(starling, tapply(MASS, list(MONTH, SITUATION),
+     length))

> star.ses <- star.sds/sqrt(star.len)

> xs <- barplot(star.means, ylim = range(starling$MASS), beside = T,
+     axes = F, ann = F, axisnames = F, xpd = F, axis.lty = 2,
+     col = c(0, 1))

> arrows(xs, star.means, xs, star.means + star.ses, code = 2, length = 0.05,
+     ang = 90)

> axis(2, las = 1)

> axis(1, at = apply(xs, 2, median), lab = c("Situation 1", "Situation 2",
+     "Situation 3", "Situation 4"))

> mtext(2, text = "Mass (g) of starlings", line = 3, cex = 1.25)

> legend("topright", leg = c("January", "November"), fill = c(0,
+     1), col = c(0, 1), bty = "n", cex = 1)

> box(bty = "l")

> par(opar)

Q2-8. Summarize your conclusions from the analysis.

Exercise 3 - Two factor ANOVA - Type III SS

Here is a modified example from Quinn and Keough (2002). Stehman and Meredith (1995) present data from an experiment that was set up to test the hypothesis that healthy spruce seedlings break bud sooner than diseased spruce seedlings. There were 2 factors: pH (3 levels: 3, 5.5, 7) and HEALTH (2 levels: healthy, diseased). The dependent variable was the average (from 5 buds) bud emergence rating (BRATING) on each seedling. The sample size varied for each combination of pH and health, ranging from 7 to 23 seedlings. With two factors, this experiment should be analyzed with a 2 factor (2 x 3) ANOVA.

Download Stehman data set

Format of stehman.csv data files

PH	HEALTH	GROUP	BRATING
3	D	D3	0.0
..	..	..	..
3	H	H3	0.8
..	..	..	..
5.5	D	D5.5	0.0
..	..	..	..
5.5	H	H5.5	0.0
..	..	..	..
7	D	D7	0.2
..	..	..	..

PH	Categorical listing of pH (note, however, that the levels are numbers and thus by default the variable is treated as a numeric variable rather than a factor - we need to correct for this)
HEALTH	Categorical listing of the health status of the seedlings, D = diseased, H = healthy
GROUP	Categorical listing of pH/health combinations - used for checking ANOVA assumptions
BRATING	Average bud emergence rating per seedling

Open the stehman data file.

> stehman <- read.table("stehman.csv", header = T, sep = ",", strip.white = T)

> head(stehman)

PH HEALTH GROUP BRATING

1 3 D D3 0.0

2 3 D D3 0.8

3 3 D D3 0.8

4 3 D D3 0.8

5 3 D D3 0.8

6 3 D D3 0.8

The variable PH contains a list of pH values and is supposed to represent a categorical variable (factor). However, because the contents of this variable are numbers, R initially treats the variable as numeric rather than categorical. To force R to treat it as a factor, first convert the numeric variable into a factor.

> stehman$PH <- as.factor(stehman$PH)

Q3-1. Test the assumptions by producing boxplots and a mean vs variance plot.

> boxplot(BRATING ~ HEALTH * PH, data = stehman)

> means <- with(stehman, tapply(BRATING, list(HEALTH, PH), mean))

> vars <- with(stehman, tapply(BRATING, list(HEALTH, PH), var))

> plot(means, vars, pch = 16)

Is there any evidence that one or more of the assumptions are likely to be violated? (Y or N)
Is the proposed model balanced?
(Y or N)

> replications(BRATING ~ HEALTH * PH, data = stehman)

$HEALTH

HEALTH

D H

67 28

$PH

3 5.5 7

34 30 31

$`HEALTH:PH`

HEALTH 3 5.5 7

D 23 23 21

H 11 7 10

> !is.list(replications(BRATING ~ HEALTH * PH, data = stehman))

[1] FALSE

As the model is not balanced, we will likely want to examine the ANOVA table based on Type III (marginal) Sums of Squares. In preparation for doing so, we must define something other than treatment contrasts for the factors.

> contrasts(stehman$HEALTH) <- contr.sum

> contrasts(stehman$PH) <- contr.sum
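
Why the choice of sums of squares matters for unbalanced data can be illustrated without any add-on packages. The sketch below simulates a response `Y` on a 2 x 3 layout with the same cell counts as the stehman design (an assumption for illustration; these are not the real data) and compares sequential (Type I) sums of squares from `anova()` with marginal sums of squares from `drop1()`. With sum-to-zero contrasts, `drop1(fit, scope = . ~ ., test = "F")` is a base R way of obtaining marginal tests of the Type III kind; it is shown here as an alternative to, not a replacement for, the `Anova()` call used in this exercise.

```r
set.seed(42)

# Simulated unbalanced 2 x 3 layout (cell counts mimic the stehman design)
cells <- expand.grid(HEALTH = c("D", "H"), PH = c("3", "5.5", "7"))
n_per_cell <- c(23, 11, 23, 7, 21, 10)
sim <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]
sim$HEALTH <- factor(sim$HEALTH)
sim$PH <- factor(sim$PH)
sim$Y <- rnorm(nrow(sim), mean = ifelse(sim$HEALTH == "H", 1.4, 1.0), sd = 0.5)

# Sum-to-zero contrasts, as required for meaningful marginal (Type III) tests
contrasts(sim$HEALTH) <- contr.sum
contrasts(sim$PH) <- contr.sum

# Same model, two term orders
fit_hp <- lm(Y ~ HEALTH * PH, data = sim)
fit_ph <- lm(Y ~ PH * HEALTH, data = sim)

# Sequential (Type I) SS depend on the order of the terms
seq_hp <- anova(fit_hp)
seq_ph <- anova(fit_ph)
print(c(HEALTH_first = seq_hp["HEALTH", "Sum Sq"],
        HEALTH_second = seq_ph["HEALTH", "Sum Sq"]))

# Marginal SS (each term dropped from the full model) do not
marg_hp <- drop1(fit_hp, scope = . ~ ., test = "F")
marg_ph <- drop1(fit_ph, scope = . ~ ., test = "F")
print(marg_hp)
```

The sequential sum of squares for HEALTH changes with the order of the terms, whereas the marginal one does not.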

Q3-2. Now fit a two-factor ANOVA model and examine the residuals.

Any evidence of skewness or unequal variances? Any outliers? Any evidence of violations? ('Y' or 'N'). As the model is not balanced, we will base hypothesis tests on Type III sums of squares. Produce an ANOVA table and fill in the following table:

> stehman.lm <- lm(BRATING ~ HEALTH * PH, data = stehman)

> library(car)

> Anova(stehman.lm, type = "III")

Anova Table (Type III tests)

Response: BRATING

Sum Sq Df F value Pr(>F)

(Intercept) 114.114 1 444.8737 < 2.2e-16 ***

HEALTH 2.412 1 9.4049 0.002866 **

PH 1.861 2 3.6285 0.030558 *

HEALTH:PH 0.191 2 0.3731 0.689691

Residuals 22.829 89

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Source of Variation	SS	df	MS	F-ratio	Pvalue
PH
HEALTH
PH : HEALTH
Residual (within groups)

Q3-3. Summarize these trends using an interaction plot.

> library(car)

> with(stehman, interaction.plot(PH, HEALTH, BRATING))

Q3-4. In the absence of an interaction, we can examine the effects of each of the main effects in isolation. It is not necessary to examine the effect of HEALTH any further, as there were only two groups. However, if we wished to know which pH levels were significantly different to one another, we need to perform additional multiple comparisons.

Since no one comparison is any more or less meaningful than any other, a Tukey's test is most appropriate. Perform a Tukey's test.

> library(multcomp)

> summary(glht(stehman.lm, linfct = mcp(PH = "Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = BRATING ~ HEALTH * PH, data = stehman)

Linear Hypotheses:

Estimate Std. Error t value Pr(>|t|)

5.5 - 3 == 0 -0.3861 0.1434 -2.692 0.0228 *

7 - 3 == 0 -0.1728 0.1345 -1.285 0.4068

7 - 5.5 == 0 0.2133 0.1463 1.457 0.3161

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Adjusted p values reported -- single-step method)

Indicate which of the following comparisons were significant (put * in the box to indicate P < 0.05, ** to indicate P < 0.001, and NS to indicate not significant).
pH 3 vs pH 5.5
pH 3 vs pH 7
pH 5.5 vs pH 7

Q3-5. Generate a bargraph to summarize the findings of the above Tukey's test.

The above analysis reflects a very classic approach to investigating the effects of PH and HEALTH via null hypothesis significance testing (NHST). Many argue that a more modern/valid approach is to:

Abandon hypothesis testing
Estimate effect sizes and associated confidence intervals
As PH is an ordered factor, this is arguably better modelled via polynomial contrasts
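
The polynomial contrasts mentioned in the last point can be inspected directly. The sketch below compares the default, equally spaced contrasts for a three-level factor with those built from the actual pH values via the `scores` argument of `contr.poly()`; the object names are illustrative.

```r
ph_scores <- c(3, 5.5, 7)   # the three pH levels used in the experiment

# Default polynomial contrasts assume equally spaced levels
equal_spacing <- contr.poly(3)

# Supplying scores respects the unequal spacing of the pH levels
actual_spacing <- contr.poly(3, scores = ph_scores)

print(round(equal_spacing, 3))
print(round(actual_spacing, 3))

# Each contrast column sums to (numerically) zero and the columns are orthonormal
print(round(colSums(actual_spacing), 10))
print(round(crossprod(actual_spacing), 10))

# The linear column is a rescaled version of the centred pH values
print(cor(actual_spacing[, ".L"], ph_scores))
```

Because the pH levels are not evenly spaced, the linear (`.L`) and quadratic (`.Q`) columns differ from the defaults, which is why the scores are passed explicitly when the model is fitted.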

> stehman.lm2 <- lm(BRATING ~ PH * HEALTH, data = stehman, contrasts = list(PH = contr.poly(3,
+     scores = c(3, 5.5, 7)), HEALTH = contr.treatment))

> summary(stehman.lm2)

Call:

lm(formula = BRATING ~ PH * HEALTH, data = stehman, contrasts = list(PH = contr.poly(3,

scores = c(3, 5.5, 7)), HEALTH = contr.treatment))

Residuals:

Min 1Q Median 3Q Max

-1.2286 -0.3238 -0.0087 0.3818 0.9913

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.04127 0.06193 16.813 < 2e-16 ***

PH.L -0.08793 0.10766 -0.817 0.41625

PH.Q 0.27510 0.10688 2.574 0.01171 *

HEALTH2 0.35431 0.11553 3.067 0.00287 **

PH.L:HEALTH2 -0.13598 0.18987 -0.716 0.47576

PH.Q:HEALTH2 -0.10075 0.20986 -0.480 0.63232

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5065 on 89 degrees of freedom

Multiple R-squared: 0.1955, Adjusted R-squared: 0.1503

F-statistic: 4.326 on 5 and 89 DF, p-value: 0.001435

> confint(stehman.lm2)

2.5 % 97.5 %

(Intercept) 0.91821292 1.1643268

PH.L -0.30183758 0.1259805

PH.Q 0.06273462 0.4874743

HEALTH2 0.12475025 0.5838789

PH.L:HEALTH2 -0.51324208 0.2412834

PH.Q:HEALTH2 -0.51773380 0.3162242

Healthy spruce trees have a higher bud emergence rating than diseased (ES=0.35 CI=0.12-0.58)
The bud emergence rating follows a quadratic change with PH

Q3-6. Summarize your biological conclusions from the analysis.

Q3-7. Why aren't the 5 buds from each tree true replicates? Given this, why bother observing 5 buds, why not just use one?

Exercise 4 - Two factor ANOVA

An ecologist studying a rocky shore at Phillip Island, in southeastern Australia, was interested in how clumps of intertidal mussels are maintained. In particular, he wanted to know how densities of adult mussels affected recruitment of young individuals from the plankton. As with most marine invertebrates, recruitment is highly patchy in time, so he expected to find seasonal variation, and the interaction between season and density - whether effects of adult mussel density vary across seasons - was the aspect of most interest.

The data were collected from four seasons, and with two densities of adult mussels. The experiment consisted of clumps of adult mussels attached to the rocks. These clumps were then brought back to the laboratory, and the number of baby mussels recorded. There were 3-6 replicate clumps for each density and season combination.

Download Quinn data set

Format of quinn.csv data files

SEASON	DENSITY	RECRUITS	SQRTRECRUITS	GROUP
Spring	Low	15	3.87	SpringLow
..	..	..	..	..
Spring	High	11	3.32	SpringHigh
..	..	..	..	..
Summer	Low	21	4.58	SummerLow
..	..	..	..	..
Summer	High	34	5.83	SummerHigh
..	..	..	..	..
Autumn	Low	14	3.74	AutumnLow
..	..	..	..	..

SEASON	Categorical listing of the season in which mussel clumps were collected - independent variable
DENSITY	Categorical listing of the density of mussels within each mussel clump - independent variable
RECRUITS	The number of mussel recruits - response variable
SQRTRECRUITS	Square root transformation of RECRUITS - needed to meet the test assumptions
GROUP	Categorical listing of Season/Density combinations - used for checking ANOVA assumptions

Open the quinn data file.

> quinn <- read.table("quinn.csv", header = T, sep = ",", strip.white = T)

> head(quinn)

SEASON DENSITY RECRUITS SQRTRECRUITS GROUP

1 Spring Low 15 3.872983 SpringLow

2 Spring Low 10 3.162278 SpringLow

3 Spring Low 13 3.605551 SpringLow

4 Spring Low 13 3.605551 SpringLow

5 Spring Low 5 2.236068 SpringLow

6 Spring High 11 3.316625 SpringHigh

Confirm the need for a square root transformation by examining boxplots and mean vs variance plots for both raw and transformed data. Note that a square root transformation was selected because the data were counts (count data often include zeros, and the log of zero is undefined).

> par(mfrow = c(2, 2))

> boxplot(RECRUITS ~ SEASON * DENSITY, data = quinn)

> means <- with(quinn, tapply(RECRUITS, list(SEASON, DENSITY),
+     mean))

> vars <- with(quinn, tapply(RECRUITS, list(SEASON, DENSITY), var))

> plot(means, vars, pch = 16)

> boxplot(SQRTRECRUITS ~ SEASON * DENSITY, data = quinn)

> means <- with(quinn, tapply(SQRTRECRUITS, list(SEASON, DENSITY),
+     mean))

> vars <- with(quinn, tapply(SQRTRECRUITS, list(SEASON, DENSITY),
+     var))

> plot(means, vars, pch = 16)

Also confirm that the design (model) is unbalanced and thus warrants the use of Type III sums of squares.

> !is.list(replications(sqrt(RECRUITS) ~ SEASON * DENSITY, data = quinn))

[1] FALSE

> replications(sqrt(RECRUITS) ~ SEASON * DENSITY, data = quinn)

$SEASON

SEASON

Autumn Spring Summer Winter

10 11 12 9

$DENSITY

DENSITY

High Low

24 18

$`SEASON:DENSITY`

DENSITY

SEASON High Low

Autumn 6 4

Spring 6 5

Summer 6 6

Winter 6 3

> contrasts(quinn$SEASON) <- contr.sum

> contrasts(quinn$DENSITY) <- contr.sum

Q4-1. Now fit a two-factor ANOVA model (using the square-root transformed data) and examine the residuals.

Any evidence of skewness or unequal variances? Any outliers? Any evidence of violations? ('Y' or 'N'). Produce an ANOVA table based on Type III SS and fill in the following table:

> quinn.lm <- lm(SQRTRECRUITS ~ SEASON * DENSITY, data = quinn)

> library(car)

> Anova(quinn.lm, type = "III")

Anova Table (Type III tests)

Response: SQRTRECRUITS

Sum Sq Df F value Pr(>F)

(Intercept) 539.72 1 529.0381 < 2.2e-16 ***

SEASON 90.64 3 29.6135 1.341e-09 ***

DENSITY 6.48 1 6.3510 0.01659 *

SEASON:DENSITY 11.35 3 3.7098 0.02068 *

Residuals 34.69 34

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Source of Variation	SS	df	MS	F-ratio	Pvalue
SEASON
DENSITY
SEASON : DENSITY
Residual (within groups)

Q4-2. Summarize these trends using an interaction plot.

Note that graphs do not place the restrictive assumptions on data sets that formal analyses do (since graphs are not statistical analyses). Therefore, if data transformations were used for the purpose of meeting test assumptions, it is usually better to display raw (untransformed) data in graphical presentations. This way readers can easily interpret actual values on a scale that they are more familiar with.

> library(car)

> with(quinn, interaction.plot(SEASON, DENSITY, SQRTRECRUITS))

Q4-3. The presence of a significant interaction means that we cannot make general statements about the effect of one factor (such as density) in isolation from the other factor (e.g. season). Whether there is an effect of density depends on which season you are considering (and vice versa). One way to clarify an interaction is to analyze subsets of the data. For example, you could examine the effect of density separately for each season (using four single-factor ANOVAs), or analyze the effect of season separately at each mussel density (using two single-factor ANOVAs).
For the current data set, the effect of density is of greatest interest, and thus the former option is the most interesting. Perform the simple main effects ANOVAs.
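
One common way to compute simple main effects in base R is sketched below on simulated data with the same cell counts as the quinn design (the response `Y` and all object names are assumptions for illustration; these are not the real data). The effect of DENSITY is estimated within each season, but tested against the residual mean square of the full two-factor model. Whether this matches the `mainEffects()` function used below in every detail is an assumption.

```r
set.seed(7)

# Simulated unbalanced 4 x 2 layout (cell counts mimic the quinn design)
cells <- expand.grid(DENSITY = c("High", "Low"),
                     SEASON = c("Autumn", "Spring", "Summer", "Winter"))
n_per_cell <- c(6, 4, 6, 5, 6, 6, 6, 3)
sim <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]
sim$Y <- rnorm(nrow(sim), mean = 4, sd = 1)

# Full model supplies the pooled residual mean square and its df
full <- lm(Y ~ SEASON * DENSITY, data = sim)
ms_res <- deviance(full) / df.residual(full)

# Effect of DENSITY within one season, tested against the pooled residual
simple_effect <- function(season) {
  sub <- anova(lm(Y ~ DENSITY, data = sim[sim$SEASON == season, ]))
  f <- sub["DENSITY", "Mean Sq"] / ms_res
  p <- pf(f, sub["DENSITY", "Df"], df.residual(full), lower.tail = FALSE)
  c(F = f, p = p)
}

simple_effects <- sapply(levels(sim$SEASON), simple_effect)
print(round(simple_effects, 4))
```

Using the pooled residual rather than the residual from each subset keeps the denominator degrees of freedom at those of the full model, which is 34 for a design with these cell counts.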

Download biology package

> library(biology)

> AnovaM(quinn.lm.summer <- mainEffects(quinn.lm, at = SEASON ==
+     "Summer"), type = "III")

Df Sum Sq Mean Sq F value Pr(>F)

INT 6 91.200 15.2000 14.899 2.848e-08 ***

DENSITY 1 14.697 14.6974 14.406 0.0005794 ***

Residuals 34 34.687 1.0202

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> library(biology)

> AnovaM(quinn.lm.autumn <- mainEffects(quinn.lm, at = SEASON == "Autumn"),
+     type = "III")

Df Sum Sq Mean Sq F value Pr(>F)

INT 6 105.895 17.6492 17.2998 4.684e-09 ***

DENSITY 1 0.002 0.0022 0.0021 0.9636

Residuals 34 34.687 1.0202

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> library(biology)

> AnovaM(quinn.lm.winter <- mainEffects(quinn.lm, at = SEASON == "Winter"),
+     type = "III")

Df Sum Sq Mean Sq F value Pr(>F)

INT 6 102.362 17.0603 16.7225 7.109e-09 ***

DENSITY 1 3.536 3.5356 3.4656 0.07132 .

Residuals 34 34.687 1.0202

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> library(biology)

> AnovaM(quinn.lm.spring <- mainEffects(quinn.lm, at = SEASON == "Spring"),
+     type = "III")

Df Sum Sq Mean Sq F value Pr(>F)

INT 6 105.689 17.6148 17.2660 4.799e-09 ***

DENSITY 1 0.209 0.2086 0.2044 0.654

Residuals 34 34.687 1.0202

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Was the effect of DENSITY on recruitment consistent across all levels of SEASON? (Y or N)
How would you interpret these results?

Welcome to the end of Workshop 6