Here is an example from Fowler, Cohen and Jarvis (1998). An agriculturalist was interested in the effects of fertilizer load on the yield of grass. Grass seed was sown uniformly over an area and different quantities of commercial fertilizer were applied to each of ten randomly located 1 m² plots. Two months later the grass from each plot was harvested, dried and weighed. The data are in the file fertilizer.csv.
Open the fertilizer data file.
If there is no evidence that the assumptions of simple linear regression have been violated, fit the linear model YIELD = intercept + (SLOPE * FERTILIZER). At this stage ignore any output.
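As a rough illustration of what fitting this model involves, the sketch below computes the least-squares intercept and slope by hand. The function name and the data values are made up for illustration; they are not the contents of fertilizer.csv, and the workshop's own software may report the fit differently.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up placeholder values, NOT the real fertilizer data.
fertilizer = [1.0, 2.0, 3.0, 4.0, 5.0]
grass_yield = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(fertilizer, grass_yield)
print(f"YIELD = {intercept:.2f} + {slope:.2f} * FERTILIZER")
```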
Let’s use the example from Quinn and Keough (2002) to become familiar with fitting linear models and interpreting the output. Christensen et al. (1996) studied the relationships between coarse woody debris (CWD) and shoreline vegetation and lake development in a sample of 16 lakes. They defined CWD as debris greater than 5 cm in diameter and recorded, for a number of plots on each lake, the basal area (m² km⁻¹) of CWD in the nearshore water, and the density (no. km⁻¹) of riparian trees along the shore. The data are in the file christ.csv and the relevant variables are the response variable, CWDBASAL (coarse woody debris basal area, m² km⁻¹), and the predictor variable, RIPDENS (riparian tree density, trees km⁻¹).
Open the christ data file.
If there is no evidence that the assumptions of simple linear regression have been violated, fit the linear model CWDBASAL = intercept + (SLOPE * RIPDENS). At this stage ignore any output.
Here is a modified example from Quinn and Keough (2002). Peake and Quinn (1993) investigated the relationship between the number of individual invertebrates living amongst clumps of mussels on a rocky intertidal shore and the area of those mussel clumps.
Open the peakquinn data file.
The relationship between two continuous variables can be analyzed by simple linear regression, as was seen in question 1. Before performing the analysis we need to check the assumptions. To evaluate the assumptions of linearity, normality and homogeneity of variance, construct a scatterplot of INDIV against AREA (INDIV on y-axis, AREA on x-axis) including a lowess smoother and boxplots on the axes.
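The boxplots on the axes are mainly used to judge whether each variable is roughly symmetric. A minimal numeric stand-in, assuming only the Python standard library, is to compare the distances from the median to each quartile. The helper name and the values below are hypothetical and are not taken from the peakquinn data.

```python
from statistics import quantiles


def quartile_summary(values):
    """Return (q1, median, q3), the three cut points a boxplot is drawn from."""
    q1, median, q3 = quantiles(values, n=4)
    return q1, median, q3


# Made-up, right-skewed placeholder values (NOT the real INDIV data).
indiv = [1, 2, 2, 3, 4, 8, 20]

q1, median, q3 = quartile_summary(indiv)
lower_half_width = median - q1
upper_half_width = q3 - median

print(f"Q1={q1}, median={median}, Q3={q3}")
# An upper half of the box much wider than the lower half suggests right skew,
# which is the pattern a log transformation is often used to correct.
print("right-skewed" if upper_half_width > lower_half_width else "not right-skewed")
```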
To get an appreciation of what a residual plot would look like when there is some evidence that the assumption of homogeneity of variance has been violated, perform the appropriate linear regression (by fitting a linear model) purely for the purpose of examining the regression diagnostics (particularly the residual plot).
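A residual plot shows residuals against fitted values; a wedge (fan) shape means the spread grows with the fitted value. The sketch below is a crude numeric version of that check, comparing the residual spread in the lower and upper halves of the fitted values. The data are made up so that the scatter increases with x, and the half-versus-half comparison is an illustrative shortcut, not a formal test.

```python
from statistics import mean, pstdev


def ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Made-up placeholder values whose scatter grows with x (NOT the real data).
area = [1, 2, 3, 4, 5, 6, 7, 8]
indiv = [1.1, 1.9, 3.2, 3.8, 6.5, 4.5, 9.0, 6.0]

intercept, slope = ols(area, indiv)
fitted = [intercept + slope * xi for xi in area]
resid = [yi - fi for yi, fi in zip(indiv, fitted)]

# Sort residuals by fitted value, then compare spread in each half.
ordered = [r for _, r in sorted(zip(fitted, resid))]
half = len(ordered) // 2
lower_spread = pstdev(ordered[:half])
upper_spread = pstdev(ordered[half:])

print(f"residual spread, low fitted values:  {lower_spread:.2f}")
print(f"residual spread, high fitted values: {upper_spread:.2f}")
```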
Transform both variables to logs (base 10), replot the scatterplot using the transformed data, refit the linear model (again using transformed data) and examine the residual plot.
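The transform-and-refit step can be sketched as follows, again with the standard library only. The values are made up to follow an exact power law, so the log-log fit is a perfect straight line; the real data will not behave this cleanly.

```python
import math
from statistics import mean


def ols(x, y):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Made-up placeholder values following INDIV = 3 * AREA**2 (NOT the real data).
area = [1.0, 10.0, 100.0]
indiv = [3.0, 300.0, 30000.0]

# Transform both variables to logs (base 10), then refit on the transformed scale.
log_area = [math.log10(a) for a in area]
log_indiv = [math.log10(n) for n in indiv]

intercept, slope = ols(log_area, log_indiv)
print(f"log10(INDIV) = {intercept:.3f} + {slope:.3f} * log10(AREA)")
```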
If you are satisfied that the assumptions of the analysis are likely to have been met, perform the linear regression analysis (fit the linear model), examine the output, and use the information to construct the regression equation relating the number of individuals in a clump to the clump size (note that, because the estimates are based on model I OLS regression, they may be heavily biased).
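Assuming the model finally fitted is the one on log10-transformed variables from the previous step, the regression equation takes the form below. The symbols b_0 and b_1 are introduced here only to stand for the estimated intercept and slope read from the model output.

```latex
\log_{10}(\mathrm{INDIV}) = b_0 + b_1 \log_{10}(\mathrm{AREA})
\quad\Longleftrightarrow\quad
\mathrm{INDIV} = 10^{b_0} \cdot \mathrm{AREA}^{b_1}
```

The right-hand form expresses the same fit on the original scale, which is usually easier to interpret.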
Rademaker and Cerqueira (2006) compiled data from the literature on the reproductive traits of opossums (Didelphis) so as to investigate latitudinal trends in reproductive output. In particular, they were interested in whether there were any patterns in mean litter size across a latitudinal gradient from 44°N to 34°S. Analyses of data compiled from second-hand sources are called meta-analyses and are very useful for revealing overall trends across a range of studies.
Open the rademaker data file.
The main variables of interest in this data set are MLS (mean litter size) and LATITUDE. The other variables were included to show how metadata might be collected and collated from a number of other sources.
The relationship between two continuous variables can be analyzed by simple linear regression, as was seen in question 1. Before performing the analysis we need to check the assumptions. To evaluate the assumptions of linearity, normality and homogeneity of variance, construct a scatterplot of MLS against LATITUDE including a lowess smoother and boxplots on the axes.
To get an appreciation of what a residual plot would look like when there is some evidence that the linearity assumption has been violated, perform the simple linear regression (by fitting a linear model) purely for the purpose of examining the regression diagnostics (particularly the residual plot).
For this sort of trend that is clearly non-linear (yet the boxplots suggest normal data), transformations are of no use. Therefore, rather than attempt to model the data on a simple linear relationship (straight line), it is better to attempt to model the data on a curvilinear relationship (curved line). Note that it is important to distinguish between line (relationship) linearity and model linearity: a model can be linear in its parameters and still describe a curved line.
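One common curvilinear choice is a second-order polynomial, MLS = b0 + b1 * LATITUDE + b2 * LATITUDE². The line is curved, but the model is still linear in b0, b1 and b2, so ordinary least squares applies. The sketch below assumes that quadratic form (the workshop may intend a different curve) and uses made-up values, not the rademaker data.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Power sums of x and cross sums with y that make up the normal equations.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented 3x3 system: row i is [s[i], s[i+1], s[i+2] | t[i]].
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


# Made-up placeholder values lying exactly on MLS = 6 + 0.004 * LATITUDE**2.
latitude = [-30, -20, -10, 0, 10, 20, 30]
mls = [9.6, 7.6, 6.4, 6.0, 6.4, 7.6, 9.6]

b0, b1, b2 = fit_quadratic(latitude, mls)
print(f"MLS = {b0:.3f} + {b1:.3f} * LATITUDE + {b2:.4f} * LATITUDE^2")
```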
Recall from question 3 above the data set of Peake and Quinn (1993) that demonstrated the relationship between island (individual mussel clumps) size and the number of individuals supported on the islands. Peake and Quinn (1993) also investigated the relationship between the size of mussel clumps (m²) and the number of other invertebrate species supported.
Open the peake data file.