Worksheet 1 - R and Rcmdr

Before beginning this worksheet, make sure that you have either fully installed R (according to the instructions in section 1 of the R manual) or are running R from the Research Methods CD provided. Versions of R and Rcmdr obtained elsewhere have not been customized for Bio3011 and therefore may not offer all the procedures required.

Demonstration 1 - Running R and Rcmdr

Q1-1. Let's begin by running R.

[Figure: RGui]

The picture above shows RGui 2.0.0 running on a Windows system. RGui consists of a number of windows, although not all of them are necessarily on display at any one time.
  1. R Console - this window is the main R window. It accepts typed R commands and displays R results. When Rcmdr is running, the R Console window is rarely examined or used.
  2. R Graphics - this window displays all graphs produced by R (and Rcmdr).
  3. Data Editor - this window provides a very crude spreadsheet for entering and modifying data sets.
Note that neither the R Graphics window nor the Data Editor window is present initially; each appears only when required.

Everything in R is an object. For example, a single number, a variable, a piece of output and a data set are all objects. Each object is given a unique name (which you provide) so that it can be referred to.
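
As a minimal sketch of this idea at the R prompt, the `<-` operator assigns a value to a name, creating an object. The names `n.leaves` and `area` are purely illustrative; the values are the leaf areas used in Demonstration 2.

```r
# Assign a single number to an object named n.leaves (illustrative name)
n.leaves <- 6
# Assign a vector (a variable) of leaf areas to an object named area
area <- c(25, 22, 29, 15, 17, 20)
# Typing an object's name prints its contents
area
# ls() lists the names of the objects in the current workspace
ls()
# class() reports what kind of object something is
class(area)
```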

R consists of a base set of statistical and graphical functions that cover most of the common basic procedures for summarizing, plotting and analyzing data. Additional functionality is provided by an ever-expanding library of add-on packages. These are collections of methods and functions that perform very specific tasks and are freely available for download and use. Before the functions provided in an add-on package can be accessed, the package must first be loaded into memory. There are hundreds of available packages; however, at any one time only a small fraction are likely to be required, so R saves system resources by loading only what is needed.
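
As an illustration, `library()` loads an installed package into memory. The sketch below uses splines, a package distributed with R itself, so that it runs on any installation; the commented lines show the corresponding commands for Rcmdr, which in a standard R installation is obtained from CRAN rather than being part of base R.

```r
# splines ships with R, so it only needs to be loaded, not installed
library(splines)
# Loaded (attached) packages appear on R's search path
"package:splines" %in% search()
# Functions from the package, such as bs(), are now available
exists("bs")
# In a standard R installation, Rcmdr must be installed once from CRAN
# before it can be loaded (uncomment to run; loading it opens its window):
# install.packages("Rcmdr")
# library(Rcmdr)
```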

R itself is a statistical and graphical environment that is operated by typing commands at a prompt (>). Command-driven applications are often difficult to learn and can be daunting for students new to statistics. Consequently, John Fox, a member of the R community, developed an add-on package that provides a graphical user interface (GUI) sitting over the top of R, allowing users to interact with R via menus, dialog boxes, buttons and other common graphical interface elements. This package is called R Commander (Rcmdr).

Q1-2. Load the Rcmdr package.

[Figure: R Commander (Rcmdr)]

Note that Rcmdr (R Commander) sits above R and uses input from the user to generate R commands (syntax), which it passes on to R for processing. The output from R is then passed back to R Commander, where it is displayed. All the statistical and graphical procedures of R Commander are actually carried out by R. In fact, R Commander does not display any graphs/plots itself; when graphs/plots are constructed in R Commander, they appear in the R Graphics window. As a result, you need to be comfortable switching between Rcmdr (the application that constructs the R commands) and R (the application that actually performs the tasks).


Demonstration 2 - Data sets - Data frames (R)

Rarely is only a single biological variable collected. Data are usually collected in sets of variables reflecting tests of relationships, differences between groups, multiple characterizations etc. Consequently, data sets are best organized into collections of variables (vectors). Such collections are called data frames in R.
Data frames are generated by combining multiple vectors, with each vector becoming a separate column in the data frame. For a data frame to represent the data properly, each vector must contain the same number of observations, and the observations must appear in the same order in every vector. For example, the first observation in each of the vectors to be included in the data frame must come from the same sampling unit.

To demonstrate the use of data frames in R, we will use fictitious data representing the areas of leaves of two species of boxwood (Buxus).

Format of the fictitious data set

PLANT   SPECIES   AREA
P1      B.semp    25
P2      B.semp    22
P3      B.semp    29
P4      B.micro   15
P5      B.micro   17
P6      B.micro   20

PLANT - An identifier for each individual plant that was measured (a single leaf was measured from each plant)
SPECIES - Categorical listing of whether the individual plant was Buxus sempervirens (B.semp) or Buxus microphylla (B.micro)
AREA - The surface area (mm²) of the measured leaf (response variable)
[Figure: Leaves]
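
For comparison with the menu-driven steps in Q2-1, the sketch below builds the same data set by typing commands, illustrating how each vector becomes one column of a data frame. The name `boxwood` follows the suggestion in step 4 of Q2-1; storing SPECIES as a factor (R's type for categorical variables) is an assumption of this sketch.

```r
# One vector per variable; position i in every vector refers to the same plant
PLANT   <- c("P1", "P2", "P3", "P4", "P5", "P6")
SPECIES <- factor(c("B.semp", "B.semp", "B.semp",
                    "B.micro", "B.micro", "B.micro"))
AREA    <- c(25, 22, 29, 15, 17, 20)   # leaf surface area (mm2)

# data.frame() binds the vectors together, one column per vector
boxwood <- data.frame(PLANT, SPECIES, AREA)
boxwood
# str() confirms that AREA is numeric and SPECIES is a factor
str(boxwood)

# Vectors of unequal length are rejected, e.g. dropping one AREA value:
# data.frame(PLANT, SPECIES, AREA[1:5])
# gives an error: arguments imply differing number of rows
```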

Q2-1. Let's create the above data set in Rcmdr. This involves the following steps:
  1. Select the Data menu from R Commander
  2. Select the New data set... submenu
  3. The New Data Set dialog box will appear
  4. Provide a name for the new data set, then click OK. Any name will do, but it is good practice to use names that reflect the nature of the data set. In this case, since the data are about boxwood leaves, an appropriate name might be boxwood or leaves.
    You will be switched back to R (if not, switch to R manually) and the Data Editor window will be displayed. The Data Editor is a simplistic spreadsheet that enables small data sets to be entered and offers very limited data manipulation.
  5. In R, variables are entered in columns, with the name of each variable at the top of its column. To change the name of a column, click on the column name (var1, var2, ...) and enter an appropriate variable name. You also need to indicate whether the variable is numeric (purely numbers) or character (contains words). Changes are applied when you click the close button (X) in the top right-hand corner of the Variable editor dialog box or press the Enter (Return) key.

  6. Enter the values (observations) for each of the variables. Make sure that AREA has been defined as a numeric variable.

  7. The data set is constructed when you click the close button (X) in the top right-hand corner of the Data Editor window.

The data set can be viewed or edited at any time by clicking the View data set or Edit data set button in the R Commander window. Note that the R syntax required to open the R Data Editor is displayed in the R Commander Script Window as well as in the R Commander Output Window (in red font). Hence the data set could have been created by entering that command directly into R.

Although R can store objects in its own binary format (.RData files), R encourages the use of simple text files for storing data and output. Text files can be read, written and modified in almost any other software, which allows information to be shared universally. Additionally, corruption of a text file results in the loss of only the affected part of the file, rather than rendering the entire file unreadable.

Q2-2. To save the current data set, export it as a text file using the following sequence:
  1. From the R Commander application, select the Data menu
  2. Select the Active data set submenu
  3. Select the Export active data set... submenu
    The Export Active Data Set dialog box will be displayed
  4. Click OK
    The Save As dialog box will be displayed
  5. Provide a path and a filename for the file. Use the file extension .csv, which is the standard extension for a comma-delimited text file (a file in which items are separated by commas).
  6. Click Save

This file can then be opened and examined in a number of other programs, including Word, Excel and Notepad.
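
The command-line equivalent of Q2-2 is a call to `write.csv()`. In this sketch, the file is written to R's temporary directory (`tempdir()`) purely so that the example runs anywhere; in practice you would supply your own path and filename. The data frame is rebuilt first so the block stands alone.

```r
# Rebuild the boxwood data frame so this example is self-contained
boxwood <- data.frame(PLANT   = c("P1", "P2", "P3", "P4", "P5", "P6"),
                      SPECIES = c("B.semp", "B.semp", "B.semp",
                                  "B.micro", "B.micro", "B.micro"),
                      AREA    = c(25, 22, 29, 15, 17, 20))

# Illustrative location only; replace with your own path and filename
out.file <- file.path(tempdir(), "boxwood.csv")
# Write a comma-delimited text file; row.names = FALSE stops R adding
# an extra, unnamed column of row numbers
write.csv(boxwood, out.file, row.names = FALSE)

# The result is plain text: print its lines to see the comma-separated format
readLines(out.file)
```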

Demonstration 3 - Importing data and data files

Although it is possible to generate a data set from scratch using the procedures shown in Demonstration 2, data sets are often better managed with spreadsheet software such as Microsoft Excel. R is not designed to be a spreadsheet, so it is necessary to import data into R. We will use the following small data set (in which the feeding metabolic rate of stick insects fed two different diets was recorded) to demonstrate how a data set is imported into R from Excel.

Format of the fictitious data set

PHASMID   DIET    MET.RATE
P1        tough   1.25
P2        tough   1.22
P3        tough   1.29
P4        soft    1.51
P5        soft    1.55
P6        soft    1.48

PHASMID - An identifier for each individual stick insect (phasmid) that was measured
DIET - Categorical listing of whether the food consumed was considered to be tough or soft
MET.RATE - The feeding metabolic rate (mg O2/min/g) of phasmids (response variable)
[Figure: Leaves]

Q3-1. Importing data into R from Excel is a multistage process.

  1. Enter the above data set into Excel and save the sheet as a comma-delimited text file (CSV). Ensure that the column titles (variable names) are in the first row, and take note of where the file is saved. To see the format of this file, open it in Notepad (the Windows accessory program). Notice that it is just a plain text file, with no special formatting or encoding.
  2. Read (import) the data set into a data frame.
  3. To ensure that the data have been successfully imported, examine the data set in the R Data Editor.
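
Steps 2 and 3 of Q3-1 can also be carried out by typing commands. This sketch first writes an equivalent CSV file to R's temporary directory (standing in for the file saved from Excel; the name phasmids.csv is hypothetical) and then reads it with `read.csv()`.

```r
# Stand-in for the CSV file saved from Excel in step 1 (hypothetical name);
# with a real file, skip this and point read.csv() at your own path
csv.file <- file.path(tempdir(), "phasmids.csv")
writeLines(c("PHASMID,DIET,MET.RATE",
             "P1,tough,1.25", "P2,tough,1.22", "P3,tough,1.29",
             "P4,soft,1.51",  "P5,soft,1.55",  "P6,soft,1.48"),
           csv.file)

# Step 2: header = TRUE takes the variable names from the first row
phasmids <- read.csv(csv.file, header = TRUE)

# Step 3: check the import; fix(phasmids) would open it in the Data Editor
str(phasmids)
```
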
Q3-2. Copying and pasting data into R from Excel is a multistage process.

  1. Enter the above data set into Excel. Ensure that the column titles (variable names) are in the first row.
  2. Highlight the data to import (including the column titles) and press Ctrl+C (or, alternatively, select Copy from the Edit menu).
  3. Read (import) the data set from the clipboard into a data frame.
  4. To ensure that the data have been successfully imported, examine the data set in the R Data Editor.
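
When typing commands on Windows, copied Excel cells can be read from the clipboard with `read.table("clipboard", ...)`, because Excel places copied cells on the clipboard as tab-separated text. Since a clipboard is not always available, this sketch parses an equivalent tab-separated string instead. Note that the `text =` argument used here requires a more recent version of R than the 2.0.0 shown in the RGui picture.

```r
# On Windows, after copying the cells in Excel (Ctrl+C), you would run:
#   phasmids <- read.table("clipboard", header = TRUE, sep = "\t")
# Here an equivalent tab-separated string stands in for the clipboard
copied <- paste("PHASMID\tDIET\tMET.RATE",
                "P1\ttough\t1.25", "P2\ttough\t1.22", "P3\ttough\t1.29",
                "P4\tsoft\t1.51",  "P5\tsoft\t1.55",  "P6\tsoft\t1.48",
                sep = "\n")
# sep = "\t" splits on tabs; header = TRUE reads the column titles
phasmids <- read.table(text = copied, header = TRUE, sep = "\t")
str(phasmids)
```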

Welcome to the end of Worksheet 1