The little spotted kiwi (Apteryx owenii) is a very rare flightless bird that is extinct on mainland New Zealand and survives as 1000 individuals on Kapiti Island. In order to monitor the population, researchers in the recovery team systematically captured all of the individuals in the population over a two-week period. Each individual was weighed, banded, assessed and released. The file *.csv lists the weights of each individual male little spotted kiwi in the population.
Open the kiwi data file.
Generate a frequency histogram of male kiwi weights. This distribution represents the population (all possible observations) of male kiwi weights. Note that this is the statistical population and not a biological population - obviously a biological population entirely lacking in females would not last long!
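As a minimal sketch of what a frequency histogram tabulates, the following counts how many weights fall into each bin and prints a crude text histogram. The weights are made-up values in grams (not the course data file), and the function name, bin start and bin width are illustrative assumptions.

```python
# Hypothetical male kiwi weights in grams (made-up values, not the course data).
weights_g = [2050, 2210, 2340, 2380, 2420, 2460, 2510, 2590, 2680, 2950]


def frequency_table(values, start, width):
    """Return (lower, upper, count) for consecutive bins [lower, upper).

    Assumes integer values that are all >= start.
    """
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for value in values:
        counts[(value - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = frequency_table(weights_g, start=2000, width=250)
for lower, upper, count in table:
    # One '#' per bird gives a crude text histogram.
    print(f"{lower}-{upper} g | {'#' * count} ({count})")
```

Integer grams are used so that bin membership is decided by exact integer arithmetic rather than floating-point division.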
Since we have the weights of all male kiwi in the population, it is possible to calculate population parameters (such as the population mean and standard deviation) directly!
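The sketch below illustrates the distinction between population parameters and sample statistics. When every individual has been measured, the standard deviation divides by N rather than N - 1. The five weights are hypothetical stand-ins for the full population.

```python
from statistics import mean, pstdev, stdev

# Hypothetical weights (kg) standing in for the full population of male kiwi.
population = [2.0, 2.2, 2.4, 2.6, 2.8]

mu = mean(population)        # population mean
sigma = pstdev(population)   # population SD: divides by N
s = stdev(population)        # sample SD: divides by N - 1, shown for contrast

print(f"mu = {mu:.3f} kg, sigma = {sigma:.3f} kg "
      f"(the sample formula would give {s:.3f} kg)")
```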
Assuming the population is normally distributed, it is possible to calculate the probability that a randomly recaptured male kiwi will weigh more than a particular value, less than a particular value, or between two particular values. Each probability is just the area under the corresponding region of the normal distribution and can be calculated from normal probabilities.
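A short sketch of these three area calculations, using the cumulative distribution function of a normal distribution. The mean, standard deviation and cut-off weights are assumed values chosen for illustration, not the parameters of the kiwi population.

```python
from statistics import NormalDist

# Assumed population parameters (hypothetical, in kg).
kiwi = NormalDist(mu=2.4, sigma=0.3)

# cdf(x) is the area under the curve to the left of x.
p_less = kiwi.cdf(2.0)                      # P(weight < 2.0)
p_greater = 1 - kiwi.cdf(2.9)               # P(weight > 2.9)
p_between = kiwi.cdf(2.9) - kiwi.cdf(2.0)   # P(2.0 < weight < 2.9)

print(f"P(< 2.0) = {p_less:.4f}")
print(f"P(> 2.9) = {p_greater:.4f}")
print(f"P(2.0 to 2.9) = {p_between:.4f}")
```

The three regions partition the whole curve, so the three probabilities sum to 1.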
For data sets with large numbers of observations, the distribution of observations can be examined via a histogram - as demonstrated above. However, histograms are only meaningful for summarizing large data sets. For smaller data sets other exploratory tools (such as boxplots) are necessary. To appreciate the relationship between boxplots and the underlying distribution of data, construct a boxplot of male kiwi weights.
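To show what a boxplot actually draws, this sketch computes its ingredients: the quartiles, the whisker ends and any points flagged as outliers. It assumes the common 1.5 x IQR whisker rule, which the text does not specify, and uses made-up weights. Different software packages compute quartiles in slightly different ways, so the numbers may not match another program exactly.

```python
from statistics import quantiles

# Hypothetical weights in kg (made-up values).
weights = [1.9, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 3.6]

# Box: first quartile, median, third quartile.
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

# Assumed whisker rule: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [w for w in weights if lower_fence <= w <= upper_fence]
outliers = [w for w in weights if w < lower_fence or w > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1:.2f} | {q2:.2f} | {q3:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}, outliers: {outliers}")
```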
Here is a modified example from Quinn and Keough (2002). Lovett et al. (2000) studied the chemistry of forested watersheds in the Catskill Mountains in New York State. They had 38 sites and recorded the concentrations of ten chemical variables, averaged over three years. We will look at two of these variables, dissolved organic carbon (DOC) and hydrogen ions (H).
Open the lovett data file.
Before continuing, make sure you are clear on what the observations, variables and populations are.
Construct a boxplot of dissolved organic carbon (DOC) from the sample observations.
Provided the data were collected without bias (ideally at random) and with adequate replication, the sample should reflect the entire population. Therefore sample statistics should be good estimates of the population parameters.
The mean of a sample is considered to be a location characteristic of the sample. Along with the mean, it is often desirable to characterize the spread of data in a sample - that is to determine how variable the sample is.
For most purposes, the sample itself is of little interest - it is purely used to estimate the population. Therefore it is necessary to be able to estimate how well the sample mean estimates the true population mean. The standard error (SE) of the mean is a measure of the precision of the sample mean as an estimate of the population mean.
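A minimal sketch of the standard error calculation, SE = s / sqrt(n), where s is the sample standard deviation and n the sample size. The DOC values are hypothetical, not the Lovett et al. (2000) data.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical DOC concentrations (made-up values).
doc = [4.0, 5.0, 6.0, 7.0, 8.0]

n = len(doc)
sample_mean = mean(doc)
sample_sd = stdev(doc)                 # spread of the observations
standard_error = sample_sd / sqrt(n)   # precision of the sample mean

print(f"mean = {sample_mean:.2f}, SD = {sample_sd:.3f}, SE = {standard_error:.3f}")
```

Note that the standard deviation describes the spread of individual observations, whereas the standard error shrinks as the sample size grows.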
Following on from the idea of the precision of the mean is the concept of confidence intervals: an interval is calculated that we are 95% confident will contain the true population mean.
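As a sketch, a 95% confidence interval for the mean can be built as mean ± t x SE, where t is the two-tailed 5% critical value of the t distribution with n - 1 degrees of freedom. The standard library has no t distribution, so the critical value is passed in by hand. The value 2.262 below is the tabulated value for 9 degrees of freedom and matches only this hypothetical sample of 10 observations.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 10 observations (made-up values).
sample = [5.1, 6.3, 4.8, 7.2, 5.9, 6.1, 5.5, 6.8, 5.0, 6.3]

# Tabulated two-tailed 5% critical t value for df = n - 1 = 9.
T_CRIT_DF9 = 2.262


def confidence_interval(values, t_crit):
    """Return (lower, upper) limits of mean +/- t_crit * SE."""
    centre = mean(values)
    margin = t_crit * stdev(values) / sqrt(len(values))
    return centre - margin, centre + margin


lower, upper = confidence_interval(sample, T_CRIT_DF9)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

A sample of a different size needs the critical value for its own degrees of freedom.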
Construct a boxplot of hydrogen concentration (H) from the sample observations.
Many statistical analyses assume that the population from which the sample was collected is normally distributed. However, biological data is not always normally distributed. To normalize the data, try transforming to logs.
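The effect of a log transformation on a right-skewed sample can be sketched as follows. The values are made up, and the asymmetry measure used here, (mean - median) / SD, is just one simple choice made for illustration. A log transformation often reduces right skew but does not guarantee normality, so the transformed data should still be inspected.

```python
from math import log10
from statistics import mean, median, stdev


def nonparametric_skew(values):
    """(mean - median) / SD: near 0 for symmetric data, > 0 for right skew."""
    return (mean(values) - median(values)) / stdev(values)


# Hypothetical right-skewed concentrations (made-up values).
h = [0.5, 0.8, 1.0, 1.3, 2.0, 3.2, 5.0, 8.0, 20.0]
log_h = [log10(x) for x in h]   # requires strictly positive values

skew_raw = nonparametric_skew(h)
skew_log = nonparametric_skew(log_h)
print(f"raw: {skew_raw:.3f}, log10: {skew_log:.3f}")
```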
Earlier we identified the presence of an outlier in the DOC variable. To investigate the impact of this outlier on a range of summary statistics, calculate the following measures of location (mean and median) and spread (standard deviation and interquartile range) for DOC, with and without the outlying observation and complete the table below.
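The comparison can be sketched as below, using hypothetical values in which 200 plays the role of the outlier. These are not the Lovett et al. (2000) observations, and the helper name is illustrative.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical DOC values; 200 plays the role of the outlier.
doc = [30, 35, 40, 45, 50, 55, 60, 65, 200]
doc_trimmed = [x for x in doc if x != 200]


def summarise(values):
    """Return the location and spread measures for one set of values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "iqr": q3 - q1,
    }


with_outlier = summarise(doc)
without_outlier = summarise(doc_trimmed)

for name in ("mean", "median", "sd", "iqr"):
    print(f"{name:>6}: {with_outlier[name]:8.2f} {without_outlier[name]:8.2f}")
```

In this made-up example the mean and standard deviation shift far more than the median and interquartile range when the outlier is dropped.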
Sánchez-Piñero & Polis (2000) studied the effects of seabirds on tenebrionid beetles on islands in the Gulf of California. These beetles are the dominant consumers on these islands and it was envisaged that seabirds leaving guano and carrion would increase beetle productivity. They had a sample of 25 islands and recorded the beetle density, the type of bird colony (roosting, breeding, no birds), % cover of guano and % plant cover of annuals and perennials.
Open the sanchez data file.
Before proceeding, make sure you are familiar with the significance of normally distributed sample data and thus why it is necessary to examine the distribution of sample data as part of routine exploratory data analysis (EDA) prior to any formal data analysis.
Often it is necessary to examine the nature of the relationship or association between variables as part of routine exploratory data analysis (EDA) prior to any formal data analysis. The nature of relationships/associations between continuous data is explored using scatterplots.
Sánchez-Piñero & Polis (2000) measured a number of continuous variables (% cover of guano, % cover of plants and abundance of beetles). Therefore, they might be interested in exploring the relationships between each of these variables. That is, the relationship between guano and plants, guano and beetles, and beetles and plants. While it is possible to create separate scatterplots for each pair (in this case three separate scatterplots), a scatterplot matrix is usually more informative and efficient.
Many statistical hypothesis tests assume that populations are equally varied. For hypothesis tests that compare populations (such as t-tests - see Question 4), it is important that one of the populations is not substantially more or less variable than the other population(s). Thus, such tests assume homogeneity of variance.
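One simple numerical check, offered here as a sketch rather than a formal test, is the ratio of the larger sample variance to the smaller. A ratio close to 1 is consistent with equal variances. The two groups are made-up values, and how large a ratio is tolerable is a judgement that depends on the test and the sample sizes. Boxplots of each group give a graphical view of the same question.

```python
from statistics import variance

# Hypothetical observations from two groups (made-up values).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]


def variance_ratio(x, y):
    """Ratio of the larger sample variance to the smaller (always >= 1)."""
    var_x, var_y = variance(x), variance(y)
    return max(var_x, var_y) / min(var_x, var_y)


ratio = variance_ratio(group_a, group_b)
print(f"variance ratio = {ratio:.2f}")
```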
Furness & Bryant (1996) studied the energy budgets of breeding northern fulmars (Fulmarus glacialis) in Shetland. As part of their study, they recorded the body mass and metabolic rate of eight male and six female fulmars.
Open the furness data file.
The appropriate statistical test for testing the null hypothesis that the means of two independent populations are equal is a t-test.
Before proceeding, make sure you understand what is meant by normality and equal variance as well as the principles of hypothesis testing using a t-test.
Since most hypothesis tests follow the same basic procedure, confirm that you understand the basic steps of hypothesis tests.
We wish to investigate whether or not male and female fulmars have the same metabolic rates, and we intend to use a t-test to test the null hypothesis that the population mean metabolic rate of males is equal to the population mean metabolic rate of females. Having identified the important assumptions of a t-test, use the samples to evaluate whether the assumptions are likely to be violated and thus whether a t-test is likely to be reliable.
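For reference, the pooled-variance t statistic for two independent samples can be sketched as below. The metabolic rates are made-up values for eight males and six females, not the Furness & Bryant (1996) data, and the function name is illustrative. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom, which would then be compared against a tabulated critical value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (assumes equal variances).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se_diff, df


# Hypothetical metabolic rates (made-up values).
males = [1500, 1900, 2300, 1200, 1700, 2100, 1400, 1800]
females = [1300, 1600, 1100, 1900, 1500, 1400]

t_stat, df = pooled_t(males, females)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```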
Here is a modified example from Quinn and Keough (2002). Elgar et al. (1996) studied the effect of lighting on the web structure of an orb-spinning spider. They set up wooden frames with two different light regimes (controlled by black or white mosquito netting), light and dim. A total of 17 orb spiders were allowed to spin their webs in both a light frame and a dim frame, with six days "rest" between trials for each spider, and the vertical and horizontal diameter of each web was measured. Whether each spider was allocated to a light or dim frame first was randomized. The null hypotheses (H0) were that each of the two variables (vertical diameter and horizontal diameter of the orb web) was the same in dim and light conditions. Elgar et al. (1996) correctly treated these as paired comparisons because the same spider spun her web in a light frame and a dim frame.
Open the elgar data file.
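A paired comparison works on the within-individual differences rather than on the two columns separately. As a minimal sketch, the paired t statistic is the mean difference divided by its standard error, with n - 1 degrees of freedom. The web diameters below are made-up values for five spiders, not the Elgar et al. (1996) data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t-test on matched observations."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1


# Hypothetical web diameters for the same five spiders (made-up values).
dim = [210, 180, 250, 230, 200]
light = [190, 185, 220, 215, 170]

t_stat, df = paired_t(dim, light)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```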
We will now revisit the data set of Furness & Bryant (1996) that was used in Question 4 to investigate the effects of gender on the metabolic rates of breeding northern fulmars (Fulmarus glacialis). Furness & Bryant (1996) also recorded the body mass of the eight male and six female fulmars they captured.
Since the male and female fulmars were all independent of one another, a t-test would be appropriate to test the null hypothesis of no difference in mean body weight of male and female fulmars.
When the distributional assumptions are violated, parametric tests are unreliable. Under these circumstances, non-parametric tests can be very useful.
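The text does not name a particular non-parametric test. As one illustration, the Mann-Whitney U statistic is a widely used rank-based alternative to the independent-samples t-test. The sketch below computes U by direct pairwise comparison, with the function name and data chosen for illustration only. Judging significance would still require tables or a statistics package.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): counts of pairs won by each sample, ties split."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    # Every one of the len(x) * len(y) pairs is credited to one side.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical measurements from two independent groups (made-up values).
group_a = [12.1, 15.3, 11.8, 14.0]
group_b = [16.2, 17.5, 15.9, 18.1]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(f"U_a = {u_a}, U_b = {u_b}")
```

Because only the ordering of values matters, the statistic is unaffected by how extreme the largest or smallest observations are.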
The tuatara (Sphenodon punctatum) is the sole surviving member of a group of ancient reptiles that otherwise disappeared 100 million years ago and occurs only in New Zealand. Evolutionary biologists interested in studying rates of evolution in this ancient reptilian lineage measured the masses of 13 adult males from each of two isolated islets in the Cook Strait. The masses are in the file called tuatara.csv.
Open the tuatara data file.
We wish to investigate whether the population mean mass of adult male tuataras from Islet A is equal to that of Islet B, and we intend to use a t-test to test this null hypothesis. Having identified the important assumptions of a t-test, use the samples to evaluate whether the assumptions are likely to be violated and thus whether a t-test is likely to be reliable.