Before beginning this worksheet, make sure that you have either fully installed R (according to the instructions in section 1 of the R manual) or are running R from the Research Methods CD provided. Versions of R and Rcmdr obtained elsewhere have not been customized for Bio3011 and therefore may not offer all the procedures you will need.
Everything in R is an object. For example, a single number is an object, a variable is an object, output is an object, a data set is an object, etc. Furthermore, all objects have unique names (that you provide) to enable each object to be referred to.
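As a minimal sketch of this idea, the lines below create a few objects and then list them. The object names (a, leaf_len, summ) and the values are invented purely for illustration.

```r
# A single number stored as an object named 'a' (name chosen for illustration)
a <- 5

# A variable (a vector of several observations) stored as an object
leaf_len <- c(10.2, 11.5, 9.8)   # invented values

# Output from a function can also be stored as an object
summ <- summary(leaf_len)

# Typing an object's name prints its contents
a
leaf_len
summ

# ls() lists the names of the objects currently in the workspace
ls()
```

Assigning a new value to an existing name replaces the old object, which is why each object needs its own name.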
R consists of a base set of statistical and graphical functions that cover most of the common basic procedures for summarizing, plotting and analyzing data. Additional functionality is provided by an ever-expanding library of add-on packages. These are collections of methods and functions that perform very specific tasks and are freely available for download and use. Before the functions provided in an add-on package can be accessed, the package must first be loaded into memory. There are literally hundreds of available packages; however, at any one time only a small fraction are likely to be required. R therefore saves system resources by loading only what is needed.
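The following sketch shows what loading a package looks like at the R prompt. It uses the splines package only because that package ships with every R installation. The commented-out install.packages() line shows how an add-on package is normally downloaded, but for this course the customized versions described above are assumed to be installed already.

```r
# Downloading a package from the internet is a one-off step.
# It is commented out here because the course installation is assumed
# to already contain the packages required.
# install.packages("Rcmdr")

# Packages currently loaded and attached
search()

# Load a package into memory for the current session
library(splines)

# The newly loaded package now appears in the search list
search()
```

A package stays loaded only for the current R session, so library() must be called again each time R is restarted.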
R itself is a statistical and graphical environment that is operated through commands typed at a prompt (>). Command-driven applications are often difficult to learn and can be daunting for students new to statistics. Consequently, a member of the R community, John Fox, has put together an add-on package comprising a graphical user interface (GUI) that sits over the top of R and provides a way of interacting with R via a set of menus, dialog boxes, buttons and other common graphical interface elements. This project is called R Commander (Rcmdr).
Note that Rcmdr (R Commander) sits above R and uses input from the user to generate R commands (syntax), which it passes on to R for processing. The output from R is then passed back to R Commander, where it is displayed. All the statistical and graphical procedures of R Commander are actually carried out by R. In fact, R Commander does not display any graphs or plots itself: when graphs or plots are constructed in R Commander, they appear in R. As a result, you need to be comfortable with switching between Rcmdr (the application that constructs the R commands) and R (the application that actually performs the tasks).
Rarely is only a single biological variable collected. Data are usually collected in sets of variables reflecting tests of relationships, differences between groups, multiple characterizations, etc. Consequently, data sets are best organized into collections of variables (vectors). Such collections are called data frames in R. Data frames are generated by combining multiple vectors, with each vector becoming a separate column in the data frame. For a data frame to represent the data properly, the sequence in which observations appear must be the same in every vector (variable), and each vector should have the same number of observations. For example, the first observation in each of the vectors to be included in the data frame must have been collected from the same sampling unit.
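The way vectors combine into a data frame can be sketched as follows. The variable names (GROUP, MASS) and all values are hypothetical and chosen only to illustrate the structure.

```r
# Two vectors of equal length. The i-th entry of each vector must come
# from the same sampling unit. All values are invented for illustration.
group <- factor(c("A", "A", "A", "B", "B", "B"))
mass  <- c(12.1, 11.8, 12.6, 14.0, 13.7, 14.4)

# Combine the vectors; each vector becomes one column of the data frame
dat <- data.frame(GROUP = group, MASS = mass)

# Display the data frame and a compact description of its structure
dat
str(dat)

# Each row holds all the observations made on one sampling unit
dat[1, ]
```

If the vectors were of different lengths that are not multiples of one another, data.frame() would stop with an error, which is one safeguard against mismatched observations.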
To demonstrate the use of data frames in R, we will use fictitious data representing the areas of leaves of two species of Japanese Boxwood.
The data set can be viewed/edited at any time by clicking the button in the R Commander application. Note that the R syntax required to initiate the R Data Editor is displayed in the R Commander Script Window as well as in the R Commander Output Window (in red font). Hence the data set could have been created by entering that command directly into R.
R/Rcmdr does not actually save data in its own specific format (unlike most other software). Instead, R encourages the use of simple text files for all storage of data and output. The reason for this is that it is then possible to read, write and modify the files in almost any other software thereby offering universal sharing of information. Additionally, any corruption to a file will only result in loss of information for the specific part of the file that is affected rather than rendering the entire file unreadable.
This file can then be opened and examined in a number of other programs including Word, Excel, Notepad etc.
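For those working at the R prompt rather than through the menus, saving a data frame as a plain text file might look like the sketch below. The data frame, its values and the file name are all hypothetical, and the file is written to R's temporary directory so that nothing on your system is overwritten.

```r
# A small hypothetical data frame (values invented for illustration)
dat <- data.frame(GROUP = c("A", "A", "B", "B"),
                  MASS  = c(12.1, 11.8, 14.0, 13.7))

# Choose where to save the file; a temporary folder is used here,
# and in practice you would give a path in your own working folder
out_file <- file.path(tempdir(), "example_data.csv")

# Write the data frame as comma-separated text, without row numbers
write.csv(dat, file = out_file, row.names = FALSE)

# The result is ordinary text: one header line plus one line per row
readLines(out_file)
```

Because the file is plain comma-separated text, it opens directly in a spreadsheet or text editor.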
Although it is possible to generate a data set from scratch using the procedures demonstrated above, data sets are often better managed with spreadsheet software such as Microsoft Excel. R is not designed to be a spreadsheet, and thus it is necessary to import data into R. We will use the following small data set (in which the feeding metabolic rate of stick insects fed two different diets was recorded) to demonstrate how a data set is imported into R from Excel.
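One common route at the R prompt, sketched below, assumes the Excel sheet has first been saved as a comma-separated (CSV) text file. So that the example runs on its own, it first writes a stand-in file. The file name, the column names (DIET, RATE) and the values are invented and are not the worksheet's actual data.

```r
# Stand-in for a file exported from Excel via "Save As > CSV".
# File name, column names and values are hypothetical.
csv_file <- file.path(tempdir(), "stick_insects.csv")
writeLines(c("DIET,RATE",
             "A,1.2",
             "A,1.4",
             "B,2.1",
             "B,1.9"), csv_file)

# Import the file; the first line supplies the column names, and
# text columns are converted to factors (categorical variables)
insects <- read.csv(csv_file, header = TRUE, stringsAsFactors = TRUE)

# Check that the import produced the expected variables
str(insects)
insects
```

Checking the imported data with str() straight away is a quick way to confirm that numeric columns were read as numbers and grouping columns as factors.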