Worksheet 8 - ANCOVA and BACI designs

Question 1 - Analysis of Covariance

To investigate the impacts of sexual activity on fruitfly longevity, Partridge and Farquhar (1981) measured the longevity of male fruitflies with access to either one virgin female (potential mate), eight virgin females, one pregnant female (not a potential mate), eight pregnant females or no females. The pool of available male fruitflies varied in size and, since size is known to impact longevity, the researchers also measured thorax length as a covariate.

Format of partridge1.csv data file
TREATMENT  THORAX  LONGEV
Preg8      0.64    35
Preg8      0.68    37
Preg8      0.68    49
...        ...     ...

TREATMENT  Categorical listing of the number and sort of female partners (None - no female partners, Preg1 - one pregnant female partner, Preg8 - eight pregnant partners, Virg1 - one virgin female, Virg8 - eight virgin females)
THORAX     Continuous covariate, the length of the thorax (mm)
LONGEV     Longevity of male fruitflies. Response variable.

Open the partridge1 data file. HINT.

Q1-1. In the table below, list the assumptions of ANCOVA along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption    Diagnostic/Risk Minimization
I.
II.
III.
IV.
V.

Q1-2. Check the assumptions of normality and homogeneity of variances (HINT). Is there any evidence of violations of the assumptions? (Y or N)
If so, assess whether a transformation will address the violations (HINT).

Q1-3. Check the assumptions of linearity and homogeneity of slopes GRAPHICALLY (HINT). Is there any evidence of violations of the assumptions? (Y or N)

Q1-4. Check the assumptions of linearity and homogeneity of slopes EMPIRICALLY (HINT). Is there any evidence of violations of the assumptions? (Y or N)

Q1-5. Finally, check that the ranges of the covariate are similar for each level of the categorical variable (HINT). Is there any evidence that the ranges of the covariate differ between the different levels of the categorical variable? (Y or N)
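
The covariate range check can be sketched in a few lines. The following is a minimal Python illustration, not the worksheet's own method; the function name `covariate_ranges` is hypothetical, a comma-separated layout for the file is assumed, and only the three rows shown in the format table above are used, so the output says nothing about the full data set.

```python
import csv
import io

# The three rows shown in the file-format table (comma-separated layout is assumed).
sample = """TREATMENT,THORAX,LONGEV
Preg8,0.64,35
Preg8,0.68,37
Preg8,0.68,49
"""

def covariate_ranges(rows, group_key, covariate_key):
    """Return {group: (min, max)} of the covariate for each level of the factor."""
    ranges = {}
    for row in rows:
        group = row[group_key]
        x = float(row[covariate_key])
        lo, hi = ranges.get(group, (x, x))
        ranges[group] = (min(lo, x), max(hi, x))
    return ranges

rows = list(csv.DictReader(io.StringIO(sample)))
ranges = covariate_ranges(rows, "TREATMENT", "THORAX")
for group, (lo, hi) in ranges.items():
    print(group, lo, hi)
```

With the full file, markedly different (min, max) pairs between treatment levels would suggest that the adjusted means rely on extrapolation.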

Fit the linear model and produce an ANOVA table to test the null hypothesis that there are no effects of treatment (female type) on the (log transformed) longevity of male fruitflies adjusted for thorax length. Note that as the design is inherently imbalanced (since there is a different series of thorax lengths within each treatment type), Type I sums of squares are inappropriate. Therefore Type III sums of squares will be used.

Q1-6. In addition to the global ANCOVA, the researchers are likely to have been interested in examining a set of specific planned comparisons. Two such contrasts could be:
  1. "pregnant versus virgin partners (to investigate the impacts of any sexual activity)?" (contrast coefficients: 0, .5, .5, -.5, -.5)
  2. "one virgin versus eight virgin partners (to investigate the impacts of sexual frequency)?" (contrast coefficients: 0, 0, 0, 1, -1)
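
Before defining these contrasts in the analysis software, it can help to confirm that each set of coefficients sums to zero and that the two contrasts are orthogonal. The sketch below does this in plain Python. The level order (None, Preg1, Preg8, Virg1, Virg8) is an assumption taken from the data description, and the helper names are hypothetical.

```python
# Assumed factor-level order: None, Preg1, Preg8, Virg1, Virg8
levels = ["None", "Preg1", "Preg8", "Virg1", "Virg8"]
preg_vs_virg = [0, 0.5, 0.5, -0.5, -0.5]
virg1_vs_virg8 = [0, 0, 0, 1, -1]

def is_contrast(coefs, tol=1e-12):
    """A valid contrast has coefficients that sum to zero."""
    return abs(sum(coefs)) < tol

def are_orthogonal(a, b, tol=1e-12):
    """Two contrasts are orthogonal when the sum of their pairwise products is zero."""
    return abs(sum(x * y for x, y in zip(a, b))) < tol

print(is_contrast(preg_vs_virg), is_contrast(virg1_vs_virg8))
print(are_orthogonal(preg_vs_virg, virg1_vs_virg8))
```

Both checks pass for the coefficients listed above, so the two comparisons can be tested independently of one another within the TREATMENT term.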

Q1-7. Before we fit the linear model (perform the ANOVA), we need to define the contrast coefficients (and thus comparisons) that we wish to perform in addition to the global ANOVA. Define the contrasts for the TREATMENT variable (HINT).

Q1-8. If there is no evidence that the assumptions have been violated and the contrasts were successfully defined, run the linear model, check residuals and examine the ANOVA table.

Q1-9. Present the results of the planned comparisons as part of the following ANOVA table:
Source of Variation    SS    df    MS    F-ratio    P-value
THORAX
TREATMENT
  Preg vs Virg
  1 Virg vs 8 Virg
Residual (within groups)   

Question 2 - Analysis of Covariance

Constable (1993) compared the inter-radial suture widths of urchins maintained on one of three food regimes (Initial: no additional food supplied above what was in the initial sample, low: food supplied periodically and high: food supplied ad libitum). In an attempt to control for substantial variability in urchin sizes, the initial body volume of each urchin was measured as a covariate.

Format of constable.csv data file
TREAT    IV   SUTW
Initial  3.5  0.010
Initial  5.0  0.020
Initial  8.0  0.061
...      ...  ...

TREAT  Categorical listing of the food treatment (Initial: no additional food supplied above what was in the initial sample, low: food supplied periodically and high: food supplied ad libitum)
IV     Continuous covariate, the initial volume of the urchin
SUTW   Width of the suture. Response variable.

Open the constable data file. HINT.

Q2-1. In the table below, list the assumptions of ANCOVA along with how violations of each assumption are diagnosed and/or the risks of violations are minimized.

Assumption    Diagnostic/Risk Minimization
I.
II.
III.
IV.
V.

Q2-2. Check the assumptions of normality and homogeneity of variances (HINT). Is there any evidence of violations of the assumptions? (Y or N)

Q2-3. Check the assumptions of linearity and homogeneity of slopes GRAPHICALLY (HINT). Is there any evidence of violations of the assumptions? (Y or N)

Q2-4. Check the assumptions of linearity and homogeneity of slopes EMPIRICALLY (HINT). Is there any evidence of violations of the assumptions? (Y or N)
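
One informal way to look at homogeneity of slopes is to fit a separate least-squares line within each treatment group and compare the slopes. The Python sketch below shows the calculation only. The function name `slopes_by_group` is hypothetical, and the demonstration uses just the three Initial rows from the format table above, so it is illustrative rather than an analysis of the data. The formal test remains the treatment by covariate interaction in the linear model.

```python
from collections import defaultdict

# (TREAT, IV, SUTW) rows shown in the file-format table; the full file has more rows.
rows = [
    ("Initial", 3.5, 0.010),
    ("Initial", 5.0, 0.020),
    ("Initial", 8.0, 0.061),
]

def slopes_by_group(rows):
    """Least-squares slope of response on covariate, computed separately per group."""
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))
    slopes = {}
    for group, points in groups.items():
        n = len(points)
        x_bar = sum(x for x, _ in points) / n
        y_bar = sum(y for _, y in points) / n
        s_xx = sum((x - x_bar) ** 2 for x, _ in points)
        s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
        slopes[group] = s_xy / s_xx
    return slopes

slopes = slopes_by_group(rows)
print(slopes)
```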

Q2-5. Finally, check that the ranges of the covariate are similar for each level of the categorical variable (HINT). Is there any evidence that the ranges of the covariate differ between the different levels of the categorical variable? (Y or N)

There is clear evidence that the relationship between suture width and initial volume differs among the three food regimes: the slopes are not parallel, and there is a significant interaction between food treatment and initial volume. Regular ANCOVA is therefore not appropriate.

Q2-6. Determine the regions of difference between each pair of food regimes using the Wilcox modification of the Johnson-Neyman procedure (with Games-Howell critical value approximation).

Q2-7. Present the results of the Wilcox modification of the Johnson-Neyman procedure in the following table:
Comparison         df    Critical value    lower    upper
Initial vs Low
Initial vs High
Low vs High

Question 3 - Simple BACI design

As part of assessing the impact of a nuclear power plant in southern California, data were collected from a location near the outfall pipe, which released heated seawater into the ocean, and at a control location a few km along the coast. The species of most concern when the plant was built was the Giant Kelp, a large brown seaweed that grows to a height of 20 m and provides a major habitat for a range of fish. The monitoring program collected data on the density of these algae by counting the number of plants in large (10 m x 10 m) quadrats on the sea floor of two rocky reefs, one near the discharge point (Impact) and one situated approximately 10 km along the coast (Control). Monitoring started in 1981, before the outfall had been constructed, providing 9 quarterly samples Before the thermal discharge began, and continued for a further 11 quarters After the discharge.

Format of Nuclear power plant data set

Before the discharge
Control  10.4  10.3   8.6  14.1  13.1  14.1   8.9   6.3   5.8
Impact    4.2   3.4   6.3   8.3   5.8   8.3   4.8   3.5   3.3

After the discharge
Control  16.5  36.9  43.0  41.4  38.3  35.5  28.2  12.1   9.3   7.6   6.6
Impact    2.0   1.8   1.5   1.6   2.8   2.9   2.5   0.9   0.8   0.7   0.6

Each column is one sampling quarter, and each value is the number of plants per 100 m2 in that quarter (9 quarters before and 11 quarters after the impact).



Q3-1. One way to determine whether there has been an impact of the nuclear power plant is to examine whether the mean difference in algae density between control and impact sites differs between the before and after treatment.
a. What sort of statistical test would be appropriate for such an analysis?

b. What are the variables?

c. What are the replicates?

d. What are the assumptions of this test?


Q3-2. Generate the appropriate data set. Here is a suggested set of procedures;
a. Generate a factorial variable listing the 'Before' and 'After' for the 9 quarters before and 11 after the impact respectively (HINT)

b. Generate a numeric variable containing the algae densities within control sites(HINT)

c. Generate a numeric variable containing the algae densities within impact sites(HINT)

d. Combine the BA variable and the control and impact variables into a single data frame (data set) (HINT)

e. Generate a numeric variable containing the difference in algae densities between each pair of control and impact sites(HINT)
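
The steps in Q3-2 can be sketched as follows. This is a plain-Python illustration of the data layout rather than the worksheet's own solution; the variable and key names (`ba`, `control`, `impact`, `C`, `I`, `CI`) are assumptions chosen to mirror the question text, and the values are those listed in the data table above.

```python
# Kelp densities (plants per 100 m2) from the worksheet's data table.
control_before = [10.4, 10.3, 8.6, 14.1, 13.1, 14.1, 8.9, 6.3, 5.8]
impact_before = [4.2, 3.4, 6.3, 8.3, 5.8, 8.3, 4.8, 3.5, 3.3]
control_after = [16.5, 36.9, 43.0, 41.4, 38.3, 35.5, 28.2, 12.1, 9.3, 7.6, 6.6]
impact_after = [2.0, 1.8, 1.5, 1.6, 2.8, 2.9, 2.5, 0.9, 0.8, 0.7, 0.6]

# a. Before/After label for each of the 9 + 11 quarters
ba = ["Before"] * len(control_before) + ["After"] * len(control_after)

# b, c. Densities at the control and impact sites, in quarter order
control = control_before + control_after
impact = impact_before + impact_after

# d, e. One record per quarter, including the Control - Impact difference
data = [
    {"BA": b, "C": c, "I": i, "CI": c - i}
    for b, c, i in zip(ba, control, impact)
]

for record in data[:3]:
    print(record)
```

Each record is one sampling quarter, which is the replicate for the Before versus After comparison of the differences.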

Q3-3. Recall the assumptions of a t-test or ANOVA, then test and comment on these assumptions. (HINT)


Q3-4. How could the data be made to conform to the test assumptions? Note, if transformations are required, it is probably because the original observations for both control and impact quarters were skewed. Therefore, transform the original variables (control and impact) and recreate the dependent variable used in the analysis (logCI).(HINT)
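
As a sketch of the transformation step, the snippet below log-transforms the control and impact densities and then recomputes the difference. Natural logarithms are an assumption (the worksheet does not name the base), and the list name `logCI` simply follows the question text.

```python
import math

# Kelp densities (plants per 100 m2) from the worksheet's data table, 9 Before then 11 After.
control = [10.4, 10.3, 8.6, 14.1, 13.1, 14.1, 8.9, 6.3, 5.8,
           16.5, 36.9, 43.0, 41.4, 38.3, 35.5, 28.2, 12.1, 9.3, 7.6, 6.6]
impact = [4.2, 3.4, 6.3, 8.3, 5.8, 8.3, 4.8, 3.5, 3.3,
          2.0, 1.8, 1.5, 1.6, 2.8, 2.9, 2.5, 0.9, 0.8, 0.7, 0.6]

# Transform the original variables first, then take the difference.
log_control = [math.log(c) for c in control]
log_impact = [math.log(i) for i in impact]
logCI = [lc - li for lc, li in zip(log_control, log_impact)]

print([round(v, 3) for v in logCI[:3]])
```

Because log(C) - log(I) equals log(C/I), the transformed response measures the ratio of control to impact densities rather than their absolute difference.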



Q3-5. Perform the single factor ANOVA (HINT) or t-test (HINT) using the transformed data. Check the diagnostics (primarily residual plot, HINT). If these summaries do not reveal any additional assumption violations, examine the hypothesis test output (HINT) and generate a bar graph. Note, since graphical displays do not have underlying distributional assumptions (c.f. ANOVA model), the untransformed data should be used in the construction of the bar graph. It is easier for us humans to interpret raw data.
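
The question offers either a single factor ANOVA or a t-test because, with two groups, the two give the same answer. The sketch below computes a pooled-variance t statistic and a one-way ANOVA F-ratio on the log-transformed differences and shows that F equals t squared. It assumes natural logs and equal variances, uses hypothetical helper names, and stops short of a P-value because the Python standard library has no t distribution.

```python
import math
from statistics import mean, variance

control = [10.4, 10.3, 8.6, 14.1, 13.1, 14.1, 8.9, 6.3, 5.8,
           16.5, 36.9, 43.0, 41.4, 38.3, 35.5, 28.2, 12.1, 9.3, 7.6, 6.6]
impact = [4.2, 3.4, 6.3, 8.3, 5.8, 8.3, 4.8, 3.5, 3.3,
          2.0, 1.8, 1.5, 1.6, 2.8, 2.9, 2.5, 0.9, 0.8, 0.7, 0.6]
log_ci = [math.log(c) - math.log(i) for c, i in zip(control, impact)]
before, after = log_ci[:9], log_ci[9:]  # 9 quarters Before, 11 After

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

def oneway_f(groups):
    """One-way ANOVA F-ratio with its numerator and denominator df."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

t, df_t = pooled_t(before, after)
f, df1, df2 = oneway_f([before, after])
print(round(t, 3), df_t)
print(round(f, 3), df1, df2)
```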


Q3-6. Based on the analysis and graph, did the power station have a significant impact on Giant Kelp plants?



Question 4 - Split-plot BACI design

In the previous question, we analysed the data from the nuclear power station example using either a t-test or a one-way ANOVA, with the dependent variable being the difference in kelp density between Control and Impact locations (i.e., a BACI analysis). Now, we will revisit that analysis, using instead the full ANOVA model. The advantage of the full model is that we can deal with situations in which there are multiple control and/or impact locations.

Format of songs.csv data file
BA      TIME  CI       KELP
Before  1     Control  10.4
Before  2     Control  10.3
Before  3     Control  8.6
Before  4     Control  14.1
Before  5     Control  13.1
Before  6     Control  14.1
...     ...   ...      ...

BA    Categorical listing of whether the sampling quarter was before (Before) or after (After) the impact (introduction of the power plant). Factor A (between plot factor).
TIME  Listing of the sampling quarters. These are the plots (Factor B) and are nested within the Before and After periods. Numbers in this column represent numerical labels given to each sampling quarter.
CI    Categorical listing for the location (Control = control site, Impact = impact site). Factor C (within plot factor).
KELP  Density of kelp (#/100 m2) measured. Response variable.

Open the songs data file. HINT. Notice that the TIME variable contains only numbers. Make sure that you define this variable as a factor (HINT)

Q4-1. This now represents a split-plot design. What are the null hypotheses being tested, and what are the correct MS terms to be used as the denominators in each of the F-ratio calculations?
a. H0 Main Effect 1 (Factor A):

F-ratio = MSBA/MS

b. H0 Nested effect (Factor B):

F-ratio = MSTIME/MS

c. H0 Main Effect 2 (Factor C):

F-ratio = MSCI/MS

d. H0 Interaction (Factor A:C):

F-ratio = MSBA:CI/MS


Q4-3. Of these hypotheses, which is of greatest interest to the aims of this study? (BA, TIME, CI, BA:CI)

Q4-4. What are the assumptions associated with testing this hypothesis? How might they be tested? If need be, transform the data.

Q4-5. What are the replicates for this hypothesis?


Q4-6. Perform a split-plot ANOVA (HINT), and complete the following table (HINT). To obtain the hypothesis test for the random factor (Factor B: TIME), examine the full ANOVA table as if all factors were fixed, so that all terms are tested against the overall residuals (HINT).
Source of variation    df    Mean Sq    F-ratio    P-value
BA
TIME  
CI
BA:CI
Residuals  

Note that the main test of interest is the interaction. You might not expect to find any difference in the density of kelp between control and impact sites before the impact (power station), but you might expect that there would be a difference after the impact - hence an interaction between before-after and control-impact. Note also that the test of this interaction gives the same degrees of freedom, F-ratio and P-value as is achieved via the simple ANOVA from Q3 above!
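
The equivalence noted above can be checked numerically. The sketch below, which is an illustration under stated assumptions rather than the worksheet's method, computes the interaction F-ratio two ways: from a one-way ANOVA on the Control minus Impact differences, and from the BA:CI and within-plot residual sums of squares built from cell means. It uses the untransformed densities from the data table; the identity holds for any scale provided both routes use the same one. All variable names are hypothetical.

```python
from math import isclose
from statistics import mean

# (Control, Impact) kelp densities per quarter, from the worksheet's data table.
quarters = {
    "Before": list(zip(
        [10.4, 10.3, 8.6, 14.1, 13.1, 14.1, 8.9, 6.3, 5.8],
        [4.2, 3.4, 6.3, 8.3, 5.8, 8.3, 4.8, 3.5, 3.3])),
    "After": list(zip(
        [16.5, 36.9, 43.0, 41.4, 38.3, 35.5, 28.2, 12.1, 9.3, 7.6, 6.6],
        [2.0, 1.8, 1.5, 1.6, 2.8, 2.9, 2.5, 0.9, 0.8, 0.7, 0.6])),
}
n_quarters = sum(len(pairs) for pairs in quarters.values())
df_resid = n_quarters - len(quarters)  # quarters within BA, times (2 - 1) locations

# Route 1: one-way ANOVA (Before vs After) on the Control - Impact differences.
diffs = {ba: [c - i for c, i in pairs] for ba, pairs in quarters.items()}
grand_d = mean(d for ds in diffs.values() for d in ds)
ss_between = sum(len(ds) * (mean(ds) - grand_d) ** 2 for ds in diffs.values())
ss_within = sum((d - mean(ds)) ** 2 for ds in diffs.values() for d in ds)
f_simple = ss_between / (ss_within / df_resid)

# Route 2: BA:CI and within-plot residual sums of squares from cell means.
grand_c = mean(c for pairs in quarters.values() for c, _ in pairs)
grand_i = mean(i for pairs in quarters.values() for _, i in pairs)
grand = (grand_c + grand_i) / 2
ss_int = 0.0
ss_res = 0.0
for ba, pairs in quarters.items():
    cell_c = mean(c for c, _ in pairs)
    cell_i = mean(i for _, i in pairs)
    ba_mean = (cell_c + cell_i) / 2
    # Interaction effect of each BA x CI cell, weighted by the number of quarters.
    for cell, location in ((cell_c, grand_c), (cell_i, grand_i)):
        ss_int += len(pairs) * (cell - ba_mean - location + grand) ** 2
    # Residual: observation minus quarter mean minus cell mean plus BA mean.
    for c, i in pairs:
        plot_mean = (c + i) / 2
        ss_res += (c - plot_mean - cell_c + ba_mean) ** 2
        ss_res += (i - plot_mean - cell_i + ba_mean) ** 2
f_split = ss_int / (ss_res / df_resid)

print(round(f_simple, 4), round(f_split, 4), df_resid)
```

Both sums of squares in the second route are exactly half of their counterparts in the first, so the factor cancels in the F-ratio and the degrees of freedom (1 and 18) are the same.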

Q4-7. Construct an interaction plot to accompany these results. HINT


Q4-8. What conclusions would you draw from the analysis (and graph)? Does it concur with the outcome from Question 3?



Q4-9. What options are available for increasing the power of this particular sampling program?

Welcome to the end of Worksheet8!