Traditional R Graphics

19 Oct 2011

Topics

High level plotting
- plot()
  - type
  - axes limits

This workshop has been thrown together a little hastily and is therefore not very well organized; sorry! Graphical features are demonstrated either via tables of properties or via example graphics accompanied by the R code that produces them.

High level plotting functions

Most graphics in R are produced by issuing a series of (one or more) graphical statements that sequentially add features to a graphical device. A graphical device is anything capable of receiving and interpreting graphical statements. Common examples include:

A window within R
A graphics file (such as a pdf, jpg, png etc)

I will cover graphical devices in more detail elsewhere.

The plot() function

The plot() function is a generic (overloaded) function whose output depends on the class of the object(s) supplied as input. That said, its most common use is to prepare a plotting device (define the axes limits, etc.) and to apply very basic plot characteristics (axes, points, labels, etc.) to the device.

> plot(BOD)
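Because plot() is generic, the function that actually runs is chosen from the class of its first argument (for BOD, a data frame, the data frame method is used). As a minimal sketch of this dispatch, the code below defines a hypothetical class "transect" and a hypothetical plot.transect() method; neither is part of R or any package.

```r
# Hypothetical "transect" class, used only to illustrate method dispatch
tr <- structure(list(dist = 1:5, count = c(2, 4, 3, 5, 6)), class = "transect")

# plot() dispatches to this method for any object of class "transect"
plot.transect <- function(x, ...) {
  plot(x$dist, x$count, type = "b", xlab = "Distance (m)", ylab = "Count", ...)
  invisible(x)
}

pdf(NULL)          # null device: nothing is written, so this runs anywhere
res <- plot(tr)    # calls plot.transect(tr)
invisible(dev.off())
```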

The type parameter

The type parameter controls how the data points are represented on the graph.
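The commonly used values are "p" (points), "l" (lines), "b" (both points and lines), "o" (points overplotted on lines), "h" (histogram-like vertical lines), "s" (stair steps) and "n" (nothing; only the axes are set up). A sketch that draws the BOD data once for each of these types in a grid of panels:

```r
# Plot types to compare; see ?plot.default for the full list
types <- c("p", "l", "b", "o", "h", "s", "n")

pdf(NULL)                               # null device: no file is written
opar <- par(mfrow = c(2, 4), mar = c(4, 4, 2, 1))
n_drawn <- 0
for (tp in types) {
  # one panel per type; the title records which type was used
  plot(BOD, type = tp, main = paste0("type = '", tp, "'"))
  n_drawn <- n_drawn + 1
}
par(opar)
invisible(dev.off())
```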

The xlim and ylim parameters

These parameters control the range or span of the axes.

# Same as the default

> plot(BOD, xlim = NULL, main = "xlim=NULL")

# Minimum of zero, maximum of 10

> plot(BOD, xlim = c(0, 10), main = "xlim=c(0,10)")

The xlab and ylab parameters

These define the axes titles.

# Blank - no axes title

> plot(BOD, xlab = "", main = "xlab=' '")

# Custom axis title

> plot(BOD, xlab = "Time (days)", main = "xlab='Time (days)'")

The axes and ann parameters

These are logical parameters that indicate whether (=TRUE) or not (=FALSE) to plot the axes and the axes titles, respectively.

# Suppress axes

> plot(BOD, axes = F, main = "axes=F")

# Suppress axes titles (including the main title)

> plot(BOD, ann = F, main = "ann=F")

The log parameter

The log parameter is a character string that indicates which (if any) of the axes should be plotted on a logarithmic scale: "x", "y" or "xy" (both).

# log x-axis

> plot(BOD, log = "x")

# log y-axis

> plot(BOD, log = "y")

# log both axes

> plot(BOD, log = "xy")

I now present a selection of commonly used high-level plotting functions. These functions typically provide quick and convenient graphical representations, primarily for data exploration and diagnostics. As such, the aesthetics of these graphics are of little concern.

The hist() function

For this example, we will use the rivers dataset which provides the lengths (in miles) of 141 'major' rivers in North America.

# Histogram

> hist(rivers)
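hist() both draws the histogram and (invisibly) returns the underlying bin breaks and counts. A sketch that asks for roughly 20 bins (breaks is only a suggestion, because R rounds the bin boundaries to "pretty" values) and inspects the result without drawing anything:

```r
h <- hist(rivers, breaks = 20, plot = FALSE)  # compute the bins only
length(h$counts)   # number of bins actually used
h$breaks[1:3]      # first few bin boundaries (in miles)
sum(h$counts)      # every river falls in exactly one bin
```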

The boxplot() function

# Boxplot of river lengths

> boxplot(rivers)

# Horizontal boxplot of river lengths

> boxplot(rivers, horizontal = T)

# Boxplot of the number of breaks against wool type

> boxplot(breaks ~ wool, data = warpbreaks)

The simple.violinplot() function

Violin plots are an alternative to boxplots. Arguably, these plots hide less of the underlying data than do boxplots.

# Violin plot of the number of breaks against wool type.

> library(UsingR)

> simple.violinplot(breaks ~ wool, data = warpbreaks, col = "grey",
+     bw = "SJ")

The scatterplot() function

As we have seen, the plot() function already creates scatterplots. In the spirit of exploratory data analysis, we will illustrate the scatterplot() function in the car package. In addition to plotting the raw data, the scatterplot() function also includes a number of useful regression diagnostics including marginal boxplots, the line of best fit (fitted regression line) and a lowess smoother.

# Scatterplot of the relationship between black cherry tree volume and height.

> library(car)

> scatterplot(Volume ~ Height, data = trees)

Scatterplot matrices

Scatterplot matrices are an extension of scatterplots in which each variable is plotted against each other variable in a gridded arrangement. They are useful for visually exploring the relationships amongst multiple variables simultaneously.

# Scatterplot matrix of various petal and sepal dimensions of iris flowers

> library(car)

> scatterplotMatrix(~Sepal.Length + Sepal.Width + Petal.Length +
+     Petal.Width + Species, data = iris)

Interaction plots

# Interaction plot of tooth length against vitamin dose and supplement delivery method

> with(ToothGrowth, interaction.plot(dose, supp, len))

# Plot of mean tooth length (with 95% confidence intervals) against vitamin dose and supplement delivery method

> library(gplots)

> plotmeans(len ~ interaction(supp, dose), data = ToothGrowth,
+     connect = list(c(1, 3, 5), c(2, 4, 6)))
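Both of these plots summarise the response by its mean in every combination of the two factors. As a quick check on what the traces represent, those cell means can be computed directly with tapply():

```r
# Mean tooth length for each dose (rows) by supplement (columns) combination
cell_means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))
round(cell_means, 2)
```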

Mosaic and association plots

Mosaic and association plots are both conditioning plots that represent contingency table frequencies as a matrix of rectangles. In a mosaic plot, the area of each rectangle is proportional to the observed frequency of that cross-classification. In an association plot, each rectangle has a (signed) height proportional to the Pearson residual and a width proportional to the square root of the expected frequency, so its area reflects the difference between observed and expected frequencies, and rectangles rising above or falling below the baseline indicate the polarity of that difference. In both plots, shading reflects the magnitudes of the Pearson residuals.

# Mosaic plot for the number of wool breaks tabulated according to wool type and level of tension classifiers
# Indicates (for example) that there were more breaks of wool type A under low tension and type B under medium tension than would be expected in the absence of an association between wool type and tension.

> library(vcd)

> wb.xtab <- xtabs(breaks ~ wool + tension, data = warpbreaks)

> strucplot(wb.xtab, gp = shading_max)

# Association plot for the number of wool breaks tabulated according to wool type and level of tension classifiers
# Indicates (for example) that there were more breaks of wool type A under low tension and type B under medium tension than would be expected in the absence of an association between wool type and tension.

> library(vcd)

> wb.xtab <- xtabs(breaks ~ wool + tension, data = warpbreaks)

> assoc(wb.xtab, gp = shading_max)
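The shading in both plots is driven by Pearson residuals, (observed - expected)/sqrt(expected), where the expected counts assume no association between wool type and tension. A sketch that extracts them with chisq.test() to confirm the pattern described in the comments above:

```r
wb.xtab <- xtabs(breaks ~ wool + tension, data = warpbreaks)
res <- chisq.test(wb.xtab)$residuals   # (observed - expected) / sqrt(expected)
round(res, 2)
```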

(Partial) effects plots

Graphical parameters - more control

Graphical parameters apply to an entire graphical device (they are global) and provide additional aesthetic control over many of the characteristics of all the high and low level plotting functions applied in that device. That is, rather than specifying a particular setting (such as font size) for each graphical function, the global parameters can be specified once and apply across all functions (although many of them can be overridden individually by any subsequent high or low level plotting function).

Graphical parameters can also control the layout, margins and spacing within a graphical device.

Global graphical parameters are specified with the par() function. When par() is used to alter a global graphical setting, it (invisibly) returns a list containing the previous values of the altered parameters (the settings that applied before the current change(s) were made). Passing this list as the argument to a subsequent par() call therefore restores the previous graphical parameters on the current device.

# set the plot margins of the current device to be four, five, one and one text lines
# from the bottom, left, top and right of the figure boundary.
# Then print out the original settings for the altered parameters.

> opar <- par(mar = c(4, 5, 1, 1))

> opar

$mar

[1] 5.1 4.1 4.1 2.1

# Restore the original plotting settings

> par(opar)
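Inside a function, a common idiom is to register the restoring par() call with on.exit(), so that the original settings come back even if plotting fails part way through. A sketch using a hypothetical helper, plot_narrow_margins():

```r
# Hypothetical helper: plots with narrow margins, then restores the old ones
plot_narrow_margins <- function(...) {
  opar <- par(mar = c(4, 5, 1, 1))
  on.exit(par(opar))   # runs when the function exits, normally or via an error
  plot(...)
}

pdf(NULL)                    # null device so the sketch runs anywhere
mar_before <- par("mar")
plot_narrow_margins(BOD)
mar_after <- par("mar")      # back to the values in force before the call
invisible(dev.off())
```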

Plot dimensions and layout parameters

Parameter	Value	Description
din,fin,pin	=c(width,height)	Dimensions (width and height) of the device, figure and plotting regions (in inches)
fig	=c(left,right,bottom,top)	Coordinates of the figure region within the device. Coordinates expressed as a fraction of the device region.
mai,mar	=c(bottom,left,top,right)	Size of each of the four figure margins in inches and lines of text (relative to current font size).
mfg	=c(row,column)	Position of the currently active figure within a grid of figures defined by either mfcol or mfrow.
mfcol,mfrow	=c(rows,columns)	Number of rows and columns in a multi-figure grid.
new	=TRUE or =FALSE	If TRUE, the next high level plotting function does not clear the current figure region first, so the new plot is drawn over the top of the previous plot; if FALSE (the default), the figure region is cleared first.
oma,omd,omi	=c(bottom,left,top,right)	Size of each of the four outer margins in lines of text (oma, relative to current font size) or in inches (omi). omd instead takes =c(left,right,bottom,top), giving the region inside the outer margins as fractions of the device region.
plt	=c(left,right,bottom,top)	Coordinates of the plotting region expressed as fractions of the current figure region.
pty	="s" or "m"	Type of plotting region within the figure region. Is the plotting region a square (="s") or is it maximized (="m") to fit within the shape of the figure region.
usr	=c(left,right,bottom,top)	Coordinates of the plotting region corresponding to the axes limits of the plot.

(Example figures: altered margins; multiple figures; figures within figures. Each showed a boxplot of the number of breaks against wool type.)
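The three layout ideas named above (altered margins, multiple figures and figures within figures) can be sketched with the mar, mfrow, fig and new parameters from the table. This is a minimal reconstruction, not the original example code; the margin sizes and inset coordinates are arbitrary choices.

```r
pdf(NULL)   # null device so the sketch runs anywhere

# Altered margins: a wider left margin (in lines of text) for a long axis title
opar <- par(mar = c(5, 8, 2, 1))
boxplot(breaks ~ wool, data = warpbreaks, ylab = "Number of warp breaks\nper loom")
par(opar)

# Multiple figures: a 1 x 2 grid of figures, filled row by row
opar <- par(mfrow = c(1, 2))
boxplot(breaks ~ wool, data = warpbreaks)
boxplot(breaks ~ tension, data = warpbreaks)
grid_dims <- par("mfrow")
par(opar)

# Figures within figures: an inset histogram in the top right of the device.
# fig must be set before new = TRUE so the inset is added to the current page.
boxplot(breaks ~ wool, data = warpbreaks)
opar <- par(fig = c(0.55, 0.95, 0.55, 0.95), new = TRUE, mar = c(2, 2, 1, 1))
hist(warpbreaks$breaks, main = "", xlab = "", ylab = "")
par(opar)

invisible(dev.off())
```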

More on layout

In addition to splitting a graphics device into a matrix of figures with the mfrow and mfcol graphical parameters, it is also possible to specify the size and arrangement of figures in a matrix with the layout() function. However, unlike mfrow/mfcol, layout() allows a single figure to occupy several adjacent cells of the matrix, so rows need not contain the same number of figures, and vice versa.

> nc <- matrix(c(1, 1, 2, 3), ncol = 2, byrow = T)

> lay <- layout(nc)

> layout.show(lay)

> par(mar = c(4, 4, 1, 1))

> nc <- matrix(c(1, 1, 2, 3), ncol = 2, byrow = T)

> lay <- layout(nc)

> plot(Sepal.Length ~ Petal.Length, data = iris)

> boxplot(Sepal.Length ~ Species, data = iris, ylab = "Sepal length",
+     xlab = "Species")

> boxplot(Petal.Length ~ Species, data = iris, ylab = "Petal length",
+     xlab = "Species")
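layout() also accepts widths and heights arguments that set the relative sizes of the columns and rows. A sketch that makes the top (spanning) figure twice as tall as the bottom row; layout() returns the number of figure regions it defined:

```r
pdf(NULL)
nc <- matrix(c(1, 1, 2, 3), ncol = 2, byrow = TRUE)
n_fig <- layout(nc, heights = c(2, 1))   # top row twice the height of the bottom
layout.show(n_fig)                       # outline and number each figure region
invisible(dev.off())
```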

Axes characteristics

Parameter	Value	Description
ann,axes	=T or =F	High level plotting parameters that specify whether or not titles (main, sub and axes) and axes should be plotted.
bty	="o","l","7","c","u" or "]"	Single character whose upper case letter resembles the sides of the box or axes to be included with the plot.
lab	=c(x,y,length)	Specifies the (approximate) number of tick marks on the x and y axes; length (the label length) is currently unimplemented.
las	=0, 1, 2 or 3	Specifies the style of the axes tick labels. 0 = parallel to axes, 1 = horizontal, 2 = perpendicular to axes, 3= vertical.
mgp	=c(title,labels,line)	Distance (in multiples of the height of a line of text) of the axis title, labels and line from the plot boundary.
tck,tcl	=length	The length of tick marks as a fraction of the plot dimensions (tck) and as a fraction of the height of a line of text (tcl)
xaxp,yaxp	=c(min,max,num)	Positions of the extreme tick marks (min, max) and the number of intervals between tick marks (num) on the x and y axes.
xaxs,yaxs	="r" or ="i"	Determines how the axes ranges are calculated. The "r" option results in ranges that extend 4% beyond the data ranges at each end, whereas the "i" option uses the raw data ranges.
xlog,ylog	=FALSE or =TRUE	Specifies whether or not the x and y axes are plotted on a (base 10) logarithmic scale.
xpd	=FALSE, =TRUE or =NA	Specifies whether plotting is clipped to the plotting (=FALSE), figure (=TRUE) or device (=NA) region.

Character sizes

Rather than specify the exact point size of each set of characters in a figure, R defines a base size (by default, 12pt), and thereafter, character sizes of elements are defined relative to this base size. For example, if you wanted a label to be in 6pt, this would be 0.5 (half) the base point size. If you wanted the font to be 18pt, this would be 1.5 times the base size. Hence, character sizes are defined via character expansion (cex) factors.

The advantage of this system is that font sizes are scalable. That is, if you later decide to increase the size of a figure and also want to increase the font sizes, you only need to alter the base point size for that device. I will discuss graphical devices in more detail elsewhere.

Parameter	Applies to
cex	All subsequent characters
cex.axis	Axis tick labels
cex.lab	Axes titles
cex.main	Main plot title
cex.sub	Plot sub-titles
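The arithmetic is simply base size x cex x element-specific multiplier. A sketch that fixes the base size at 12pt on a null pdf device (the pointsize argument of pdf()) and computes the resulting sizes of the axes titles and tick labels:

```r
pdf(NULL, pointsize = 12)     # base size: 12pt text on this device
base_ps <- par("ps")
opar <- par(cex.lab = 1.5, cex.axis = 0.75)
plot(BOD, main = "Biochemical oxygen demand", cex.main = 1.25)
# Effective point sizes = base size x cex x element-specific multiplier
sizes <- base_ps * par("cex") * c(lab = par("cex.lab"), axis = par("cex.axis"))
par(opar)
invisible(dev.off())
sizes   # axes titles at 18pt, tick labels at 9pt
```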


Line characteristics

Parameter	Description
lty	The type of line. Specified either as an integer from 0 to 6 (0=blank, 1=solid, 2=dashed, 3=dotted, 4=dotdash, 5=longdash, 6=twodash), as the corresponding name, or as a string of 2, 4, 6 or 8 non-zero hexadecimal digits that define the relative lengths of dashes and gaps within a repeated sequence.
lwd	The thickness of a line as a multiple of the default thickness (which is device specific).
lend	The line end style (round, butt or square).
ljoin	The line join style (round, mitre or bevel).
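A sketch drawing several line types, including hexadecimal dash/gap strings, followed by a thick polyline with butt ends and mitred joins. The particular strings ("44", "1343", "F8") are arbitrary examples.

```r
pdf(NULL)
plot(c(0, 10), c(0, 7), type = "n", axes = FALSE, ann = FALSE)
styles <- list(1, 2, 3, "44", "1343", "F8")   # integer codes and hex strings
for (i in seq_along(styles)) {
  segments(0, i, 10, i, lty = styles[[i]], lwd = 2)
}
# A thick polyline showing butt line ends and mitred line joins
lines(c(1, 5, 9), c(0.2, 0.8, 0.2), lwd = 8, lend = "butt", ljoin = "mitre")
# A hex string set globally is reported back by par() as the same string
opar <- par(lty = "44")
lty_now <- par("lty")
par(opar)
invisible(dev.off())
```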

Plotting character - pch

The plotting character (pch) can be of the following forms:

a number from 0 to 25 corresponding to one of the 26 basic plotting symbols
when used with font=5 (extended symbol font), Adobe symbol encoding can be specified. This encoding system uses integers between 1:128 and 160:254. In the Extended plotting characters figure below, the y-axis shows the first two digits of the Adobe symbol encoding, and the x-axis shows the third digit.
a quoted keyboard printing character (letter, number or punctuation)

(Figures: Basic plotting characters; Extended plotting characters, used with font=5.)
To plot the heart symbol: ..., pch=169, font=5, ...


The size of plotting symbols is controlled by the character expansion (cex) parameter, and the style of the lines that make up the plotting symbols is controlled by the other line characteristics (such as lwd). For symbols 21 to 25, the fill color is set by the bg parameter.
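A sketch that draws the 26 basic symbols with their codes underneath, a few keyboard characters, and the symbol-font heart (pch = 169 with font = 5). The grey fill only affects symbols 21 to 25.

```r
pdf(NULL)
plot(c(0, 26), c(0, 3), type = "n", axes = FALSE, ann = FALSE)
basic <- 0:25
points(basic, rep(2, 26), pch = basic, cex = 1.5, bg = "grey")  # bg fills 21:25
text(basic, rep(1.5, 26), labels = basic, cex = 0.6)            # symbol codes
points(c(2, 4, 6), rep(0.5, 3), pch = c("A", "7", "%"), cex = 1.5)  # characters
points(10, 0.5, pch = 169, font = 5, cex = 2)   # heart from the symbol font
invisible(dev.off())
```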

Fonts

The shape of text characters is controlled by the family (the typeface) and the font (the style of the typeface: 1 = plain, 2 = bold, 3 = italic, 4 = bold italic and 5 = symbol). The families supported vary between graphical devices, as do the names by which they are referred to.

To get a list of the available font families for a specific device on your system, issue a command whose name starts with the name of the device and ends with "Fonts". For example, to query the available fonts for a pdf device on your system:

> pdfFonts()

Different fonts can also be applied to each of the main plotting components (font.axis: axes labels, font.lab: axes titles, font.main: Main plot title and font.sub: plot sub-title).

> plot(rnorm(5, 0, 1), rnorm(5, 0, 1), pch = "A", family = "serif",
+     font = 4, xlab = "Predictor", ylab = "Response")

> plot(rnorm(5, 0, 1), rnorm(5, 0, 1), pch = "A", family = "serif",
+     font = 4, font.lab = 2, xlab = "Predictor", ylab = "Response")
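A sketch combining the two ideas: query the family names the pdf device knows about, then set a family globally with par() and override the style of individual components.

```r
fams <- names(pdfFonts())   # families known to the pdf() device

pdf(NULL)
opar <- par(family = "serif", font.lab = 3)   # serif text, italic axes titles
plot(BOD, main = "Serif family, bold italic main title", font.main = 4)
par(opar)
invisible(dev.off())
```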

Hershey (vector) fonts

R also supports Hershey (vector) fonts, which greatly extend the range of characters and symbols available. In contrast to bitmap fonts, which consist of a set of small images (one for each character of each style and size), vector fonts consist of the coordinates of each of the curves required to draw the character. That is, vector fonts store the information on how to draw each character rather than the character itself, so Hershey fonts can be scaled to any size without distortion. Unfortunately, however, Hershey fonts cannot be combined with regular fonts in a single plotting statement, and thus they cannot easily be incorporated into mathematical formulae.

Tables of the available Hershey fonts and characters can be viewed by running demo(Hershey).
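Hershey fonts are selected through the vfont argument as c(typeface, style). A sketch using two typefaces, plus strwidth(), which also accepts vfont, to measure a string drawn in a Hershey font:

```r
pdf(NULL)
plot(0:1, 0:1, type = "n", axes = FALSE, ann = FALSE)
text(0.5, 0.7, "Hershey serif italic", vfont = c("serif", "italic"), cex = 2)
text(0.5, 0.4, "Hershey script", vfont = c("script", "plain"), cex = 2)
w <- strwidth("Hershey", vfont = c("serif", "plain"))   # width in user units
invisible(dev.off())
```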

Text orientation and justification

Parameter	Description
adj	Specifies the justification of a text string relative to the coordinates of its origin. A single number between 0 and 1 specifies horizontal justification. A vector of two numbers (=c(x,y)) indicates justification in horizontal and vertical directions.
crt,srt	Specifies the amount of rotation (in degrees) of single characters (crt) and strings (srt)
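A common use of srt and adj is rotating category labels below a bar plot. In this sketch, adj = c(1, 0.5) right-aligns each rotated label on its anchor point, and xpd = TRUE lets the text extend into the margin.

```r
pdf(NULL)
opar <- par(mar = c(6, 4, 1, 1))
bp <- barplot(table(warpbreaks$tension), axisnames = FALSE, ylab = "Looms")
labs <- levels(warpbreaks$tension)
usr <- par("usr")
# Anchor labels just below the plotting region, rotated 45 degrees
text(bp, usr[3] - 0.03 * diff(usr[3:4]), labels = labs, srt = 45,
     adj = c(1, 0.5), xpd = TRUE)
par(opar)
invisible(dev.off())
```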

Colors

The color of most plotting elements is controlled by the col parameter. There are also separate parameters that control the color of each of the major components of a figure (col.axis: the axes tick labels, col.lab: the axes titles, col.main: the main plot title, col.sub: plot sub-titles) and when specified, take precedence over the col parameter. Two additional parameters, bg and fg can be used to control the color of the background and foreground (boxes and axes) respectively.

Here are a few of the ways in which colors can be specified

by an index (numbers 0-8) into a small palette of eight colors (0 indicates the background color). The colors in this palette can be reviewed with the palette() function

by name. The names of the 657 defined colors can be reviewed with the colors() function. The epitools package provides the colors.plot() function, which generates a graphic displaying a matrix of all the colors. When used with the locator=TRUE argument, a series of left mouse clicks on the color squares, terminated by a right mouse click, returns a matrix of the corresponding color names.


via one of the other built-in color palettes, which are essentially themed sets of colors. These functions return n colors, and the transparency/opacity of the colors is controlled via an alpha argument (values between 0 and 1, where 1 is completely opaque).
- rainbow(n) - Red->Violet
- heat.colors(n) - Red->Orange->Yellow
- terrain.colors(n) - Green->Yellow->Brown->White
- topo.colors(n) - Blue->Green->Yellow
- gray.colors(n) - Dark grey->Light grey
by direct specification of the red, green and blue components of the RGB spectrum as a character string in the form "#RRGGBB". This string consists of a # followed by a pair of hexadecimal digits in the range 00:FF for each component. For devices that support transparency, two additional digits can be appended to the hex code to indicate the degree of transparency/opacity (00: fully transparent, FF: fully opaque).
via the rgb(), hsv() and hcl() functions, which provide other ways to specify colors; col2rgb() performs the reverse conversion, returning the red, green and blue components of any color specification.
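A few conversions between these representations. rgb() builds "#RRGGBB" (or "#RRGGBBAA") strings from component values, and adjustcolor() is one convenient way to add transparency to an existing color:

```r
rgb(1, 0, 0)                          # "#FF0000": components on a 0-1 scale
rgb(0, 0, 255, maxColorValue = 255)   # "#0000FF": components on a 0-255 scale
rgb(0, 0, 1, alpha = 0.5)             # "#0000FF80": half transparent blue
col2rgb("darkgreen")                  # red, green and blue components (0-255)
adjustcolor("red", alpha.f = 0.5)     # named color with 50% opacity
```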

Enhancing and customizing plots with low-level plotting functions

Having set up a plotting device (typically by calling a high level plotting function), additional graphical elements can be manually added to a plot via specific low-level plotting functions. The most aesthetically pleasing graphics are typically produced by preparing a blank plotting device (essentially defining the size, layout and axes limits), and then manually building up the desired features via low-level plotting functions.

In addition to their specific parameters, each of the following functions accepts many of the graphical parameters. In the function definitions, this capability is represented by three consecutive dots (...). Technically, ... indicates that any supplied arguments that are not explicitly part of the definition of a function are passed on to the relevant underlying functions (in this case, they are interpreted as graphical parameters, as if they had been set with par()).

Adding points - `points()`

Points can be added to a plot using the points(x, y, pch, ...) function. This function draws a plotting character (specified by the pch parameter) at the coordinates specified by the vectors x and y. Alternatively, the coordinates can be passed as a formula of the form y ~ x.

# plot two series of random data

> opar <- par(mar = c(4, 5, 0, 0))

> set.seed(1)

> X <- seq(9, 12, length.out = 10)

> Y1 <- (1 * X + 2) + rnorm(10, 3, 1)

> Y2 <- (1.2 * X + 2) + rnorm(10, 3, 1)

> plot(c(Y1, Y2) ~ c(X, X), type = "n", axes = T, ann = F, bty = "l",
+     las = 1)

> points(Y1 ~ X, pch = 21, type = "b")

> points(Y2 ~ X, pch = 16, type = "b")

> par(opar)

Adding text - `text()`

The text() function adds text strings (labels parameter) to the plot at the supplied coordinates (x,y) and is defined as:

text(x, y = NULL, labels = seq_along(x), adj = NULL, pos = NULL, offset = 0.5, vfont = NULL, cex = 1, col = NULL, font = NULL, ...)

Descriptions of the arguments not already covered in the graphical parameters section are given in the table below.

Parameter	Description
pos	Simplified text justification that overrides the `adj` parameter. 1=below, 2=left, 3=above and 4=right.
offset	Offset used by `pos` as a fraction of the width of a character.
vfont	Provision for Hershey (vector) font specification (`vfont=c(typeface, style)`).

Constructing character strings - `paste()`

The paste() function concatenates vectors after converting each of their elements to character strings. This is particularly useful for making labels and is equally useful in non-graphical applications. paste() has two other optional parameters (sep and collapse), which define extra character strings to be placed between the strings being joined: sep operates on joins between paired vector elements (its default is a single space), whereas collapse operates on joins between the elements within a vector.

> temp <- c("H", "M", "L")

> temp

[1] "H" "M" "L"

> paste(temp, 1:3, sep = ":")

[1] "H:1" "M:2" "L:3"

> paste(temp, collapse = ":")

[1] "H:M:L"

> paste(temp, 1:3, sep = "-", collapse = ":")

[1] "H-1:M-2:L-3"

> opar <- par(mar = c(4, 5, 0, 0))

> set.seed(10)

> X <- rnorm(5, 10, 1)

> Y <- rnorm(5, 10, 1)

> plot(X, Y, type = "n", axes = T, ann = F, bty = "l", las = 1,
+     xlim = c(8, 11), ylim = c(8, 11))

> points(X, Y, col = "grey", pch = 16)

> text(X, Y, paste("Site", 1:5, sep = "-"), cex = 1, pos = 4)

> par(opar)
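For labels like these, paste0() is shorthand for paste(..., sep = ""), and sprintf() (a swapped-in alternative, not used in the example above) is handy when numbers need fixed formatting such as zero padding:

```r
paste("Site", 1:3, sep = "-")   # "Site-1" "Site-2" "Site-3"
paste0("Site", 1:3)             # "Site1"  "Site2"  "Site3"
sprintf("Site-%02d", 1:3)       # "Site-01" "Site-02" "Site-03"
```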

Adding text to plot margins - `mtext()`

The mtext() function adds text (text) to the plot margins and is typically used to create fancy or additional axes titles. The mtext() function is defined as:

mtext(text, side = 3, line = 0, outer = FALSE, at = NA, adj = NA, padj = NA, cex = NA, col = NA, font = NA, ...)

Descriptions of the arguments not already covered in the graphical parameters section are given in the following table.

Parameter	Description
side	Specifies which margin the title should be plotted in. 1=bottom, 2=left, 3=top and 4=right.
line	Number of text lines out from the plot region into the margin to plot the marginal text.
outer	For multi-figure plots, if `outer=TRUE`, put the marginal text in the outer margin (if there is one).
at	Position along the axis (in user coordinates) of the text
adj,padj	Adjustment (justification) of the position of the marginal text parallel (`adj`) and perpendicular (`padj`) to the axis. Justification depends on the orientation of the text string and the margin (axis).
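A sketch of two typical uses: a title shared by two panels, written in the outer margin (outer = TRUE), and a note placed at a specific position (at, in user coordinates) in the bottom margin of one panel.

```r
pdf(NULL)
opar <- par(mfrow = c(1, 2), oma = c(0, 0, 2, 0), mar = c(4, 4, 1, 1))
boxplot(breaks ~ wool, data = warpbreaks, xlab = "")
boxplot(breaks ~ tension, data = warpbreaks, xlab = "")
# One title shared by both panels, written in the outer top margin
mtext("Number of warp breaks per loom", side = 3, line = 0.5, outer = TRUE,
      font = 2)
# A note centred under the middle box of the last panel
mtext("L = low, M = medium, H = high", side = 1, line = 2.5, at = 2, cex = 0.8)
oma_used <- par("oma")
par(opar)
invisible(dev.off())
```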

Adding a legend - `legend()`

The legend() function brings together a rich collection of plotting functions to produce highly customizable figure legends in a single call. A sense of its rich functionality is given by the table below and the function definition:

legend(x, y = NULL, legend, fill = NULL, col = par("col"), lty, lwd, pch, angle = 45, density = NULL, bty = "o", bg = par("bg"), box.lwd = par("lwd"), box.lty = par("lty"), pt.bg = NA, cex = 1, pt.cex = cex, pt.lwd = lwd, xjust = 0, yjust = 1, x.intersp = 1, y.intersp = 1, adj = c(0, 0.5), text.width = NULL, text.col = par("col"), merge = do.lines && has.pch, trace = FALSE, plot = TRUE, ncol = 1, horiz = FALSE, title = NULL, inset = 0)

In addition to the usual methods of specifying the positioning coordinates, convenient keywords reflecting the four corners ("bottomleft", "bottomright", "topleft", "topright"), the boundaries ("bottom", "left", "top", "right") and the center ("center") of the plotting region can be specified instead.

Parameter	Description
legend	A vector of strings or expressions to comprise the labels of the legend.
title	A string or expression for a title at the top of the legend
bty, box.lty, box.lwd	The type (`"o"` or `"n"`), line style and line thickness of the box framing the legend.
bg, text.col	The colors used for the legend background and legend labels
horiz	Whether or not to produce a horizontal legend instead of a vertical legend.
ncol	The number of columns in which to arrange the legend labels.
cex	Character expansion for all elements of the legend relative to the plot `cex` graphical parameter.
Boxes	If any of the following parameters are set, the legend labels will be accompanied by boxes.
fill	Specifies the fill color of the boxes. A vector of colors will result in different fills.
angle, density	Specifies the angle and number of lines that make up the stripy fill of boxes. Negative density values result in solid fills.
Points	If any of the following parameters are set, the legend labels will be accompanied by points.
pch	Specifies the type of plotting character.
pt.cex, pt.lwd	Specifies the character expansion and line width of the plotting characters.
col, pt.bg	Specifies the foreground and background color of the plotting characters (and lines for `col`).
Lines	If any of the following parameters are set, the legend labels will be accompanied by lines.
lwd, lty	Specifies the width and type of lines.
merge	Whether or not to merge points and lines.
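A sketch that rebuilds the two-series plot from the points() example and adds a legend whose points and lines mirror the series. The legend labels and the "Treatment" title are placeholders. legend() invisibly returns a list describing the box and text positions.

```r
pdf(NULL)
set.seed(1)
X <- seq(9, 12, length.out = 10)
Y1 <- (1 * X + 2) + rnorm(10, 3, 1)
Y2 <- (1.2 * X + 2) + rnorm(10, 3, 1)
plot(c(Y1, Y2) ~ c(X, X), type = "n", ann = FALSE, bty = "l", las = 1)
points(Y1 ~ X, pch = 21, type = "b")
points(Y2 ~ X, pch = 16, type = "b", lty = 2)
# Points and lines in the legend match the series; bty = "n" drops the frame
leg <- legend("topleft", legend = c("Series 1", "Series 2"), pch = c(21, 16),
              lty = c(1, 2), bty = "n", title = "Treatment")
invisible(dev.off())
```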

More advanced text formatting

The text plotting functions described above (text(), mtext() and legend()) can also build plotting text from objects that constitute the R language itself. These are referred to as language objects and include:

names - the names of objects
expressions - unevaluated syntactically correct statements that could otherwise be evaluated at the command prompt
calls - specific expressions that consist of an unevaluated named function (complete with arguments)

Any language object passed as an argument to one of these text plotting functions (text(), mtext() and legend()) is coerced into an expression and rendered as a mathematical expression rather than as plain text. In so doing, the text plotting functions also apply TeX-like formatting where appropriate (the extensive range of which can be sampled by issuing the demo(plotmath) command).

Hence, advanced text construction, formatting and plotting is achieved by skilled use of a variety of functions (described below) that assist in creating language objects to pass to the text plotting functions.
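The three kinds of language object listed above can be created with quote() and expression(), and each is drawn as formatted mathematics when handed to text():

```r
nm <- quote(sigma)                          # a name (symbol)
cl <- quote(frac(1, sigma))                 # a call: unevaluated function + arguments
ex <- expression(mu == frac(sum(y[i]), n))  # an expression vector
c(class(nm), class(cl), class(ex))          # "name" "call" "expression"

pdf(NULL)
plot(0:1, 0:1, type = "n", axes = FALSE, ann = FALSE)
text(0.15, 0.5, nm, cex = 2)    # drawn as the Greek letter sigma
text(0.40, 0.5, cl, cex = 2)    # drawn as a fraction
text(0.75, 0.5, ex, cex = 1.5)  # drawn as the formula for the mean
invisible(dev.off())
```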

Complex expressions - `expression()`

The expression function is used to build complex expressions that incorporate TeX-like mathematical formatting. Hence, the expression function is typically nested within one of the text plotting functions to plot complex combinations of characters and symbols.

# Axes titles containing units, subscripts and superscripts

> opar <- par(mar = c(4, 6, 0, 0), cex = 1.5, cex.lab = 1.2)

> set.seed(10)

> X <- rnorm(5, 10, 1)

> Y <- rnorm(5, 10, 1)

> plot(X, Y, type = "p", axes = T, ann = F, bty = "l", las = 1)

> mtext(expression(Temperature ~ (degree * C)), 1, line = 3, cex = 1.5)

> mtext(expression(Respiration ~ (mL ~ O[2] ~ h^-1)), 2, line = 3.5,
+     cex = 1.5)

> par(opar)

# Annotate a plot with the normal probability density function

> opar <- par(mar = c(4, 6, 0, 0), cex = 1.5, cex.lab = 1.2)

> set.seed(10)

> X <- rnorm(5, 10, 1)

> Y <- rnorm(5, 10, 1)

> plot(X, Y, type = "p", axes = T, ann = F, bty = "l", las = 1)

> text(9.3, 10, expression(f(y) == frac(1, sqrt(2 * pi * sigma^2)) *
+     e^frac(-(y - mu)^2, 2 * sigma^2)), cex = 1.25)

> par(opar)

Complex expressions - `bquote()`

The bquote() function generates a language object from its argument after first evaluating any objects wrapped in `.()`. This provides a way to produce text that combines mathematical formatting with the output of statistical functions.

In the example below, note the required use of the tilde (~) character to insert a space between the words corr and coef. Alternatively, a space can be provided by the keyword phantom(char), where char is a character whose width is equal to the amount of space required. Had we put a plain space between the words corr and coef in the R code, we would have created a syntactically incorrect expression (not good).

# Combining strings and R objects into a text label

> opar <- par(mar = c(4, 5, 0, 0))

> set.seed(3)

> X <- rnorm(20, 0, 1)

> Y <- rnorm(20, 0, 1)

> cc <- cor(X, Y)

> plot(X, Y, type = "n", axes = T, ann = F, bty = "l", las = 1)

> points(X, Y, col = "grey", pch = 16)

> text(0, 0, bquote(corr ~ coef == .(round(cc, 2))), cex = 3)

> par(opar)
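The same approach works for any statistic. A sketch that labels a regression of black cherry tree volume on height with its R-squared value; the model itself is only an illustration:

```r
fit <- lm(Volume ~ Height, data = trees)
r2 <- round(summary(fit)$r.squared, 2)
lab <- bquote(italic(R)^2 == .(r2))   # .() splices in the value of r2

pdf(NULL)
plot(Volume ~ Height, data = trees)
abline(fit)
title(main = lab)                     # rendered as R-squared = value
invisible(dev.off())
```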

Complex expressions - `substitute()`

Alternatively, for situations in which substitutions are required within non-genuine mathematical expressions (such as straight character strings), the substitute() function is useful.

# Combining strings and R objects into a text label

> opar <- par(mar = c(4, 5, 0, 0))

> X <- c(2, 4, 6, 10, 14, 18, 24, 30, 36, 42)

> Y <- c(5, 8, 10, 11, 15, 18, 16, 15, 19, 16)

> n <- nls(Y ~ SSasymp(X, a, b, c))

> plot(Y ~ X, type = "p", ann = F)

> lines(1:40, predict(n, data.frame(X = 1:40)))

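# SSasymp() fits Y = Asym + (R0 - Asym) * exp(-exp(lrc) * X), here with
# a = Asym, b = R0 and c = lrc. Re-express the estimates in the form
# Y = a - b * e^(c * X) used by the label below.

> a <- round(summary(n)$coef[1, 1], 2)

> b <- round(summary(n)$coef[1, 1] - summary(n)$coef[2, 1], 2)

> c <- round(-exp(summary(n)$coef[3, 1]), 2)

> text(40, 8, substitute(y == a - b * e^{c * x}, list(y = "Nutrient uptake",
+     a = a, b = b, c = c, x = "Time")), cex = 1.25, pos = 2)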

> mtext("Time (min)", 1, line = 3)

> mtext(expression(Nutrient ~ uptake ~ (mu ~ mol ~ g^-1)), 2, line = 3)

> par(opar)
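For plain numeric substitutions the two approaches are interchangeable: substitute() takes the replacement values in a list, whereas bquote() takes them from the calling environment via .(). The value 17.6 below is a hypothetical estimate used only for illustration.

```r
a <- 17.6   # hypothetical parameter estimate
via_substitute <- substitute(y == a, list(a = a))
via_bquote <- bquote(y == .(a))
identical(via_substitute, via_bquote)   # both are the call y == 17.6
```

substitute() is the more natural choice when a name should be replaced by a character string, as with y = "Nutrient uptake" in the example above.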

Combinations of advanced text formatting functions

It is possible to produce virtually any text representation on an R plot; however, some representations require complex combinations of the above functions. Whilst these functions can be nested within one another, the combinations often appear to behave counter-intuitively, and a good understanding of the exact nuances of each function is required to master their combined effects. Nevertheless, the following scenarios should provide some appreciation of the value and uses of some of these combinations.

For example, consider the formula for calculating the mean of a sample:

$$\mu = \frac{\sum y_i}{n}$$

As an R mathematical expression, this is represented as mu == frac(sum(y[i]), n). What if, however, we wished to represent not only the formula applied to the data, but also the result of the formula?

$$\mu = \frac{\sum y_i}{n} = 10$$

To substitute the actual result, the bquote() function is appropriate. However, the following mathematical expression is not syntactically correct, because R's comparison operators cannot be chained, so a mathematical expression cannot contain two relational operators (==) in the one statement:

mu == frac(sum(y[i]), n) == .(meanY)

Building such an expression is achieved by combining the bquote() function with a paste() function, which wraps the first equation into a single term.

The more observant and discerning reader may have noticed that the y-axis label in the substitute() example above had a space between the μ and the word 'mol'. That space was introduced by the ~ operator. A more elegant solution would have been to employ an expression(paste()) combination.

# Display the formula for the mean alongside its value

> opar <- par(mar = c(4, 5, 0, 0))

> set.seed(1)

> Y <- rnorm(100, 10, 1)

> plot(density(Y), type = "l", axes = T, ann = F, bty = "l", las = 1,
+     col = "grey")

> text(10, 0.2, bquote(paste(mu == frac(sum(y[i]), n)) == .(round(mean(Y), 2))),
+     cex = 2)

> par(opar)

# Nutrient uptake curve with a y-axis title built with expression(paste())

> opar <- par(mar = c(4, 5, 0, 0))

> X <- c(2, 4, 6, 10, 14, 18, 24, 30, 36, 42)

> Y <- c(5, 8, 10, 11, 15, 18, 16, 15, 19, 16)

> n <- nls(Y ~ SSasymp(X, a, b, c))

> plot(Y ~ X, type = "p", ann = F)

> lines(1:40, predict(n, data.frame(X = 1:40)))

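# SSasymp() fits Y = Asym + (R0 - Asym) * exp(-exp(lrc) * X), here with
# a = Asym, b = R0 and c = lrc. Re-express the estimates in the form
# Y = a - b * e^(c * X) used by the label below.

> a <- round(summary(n)$coef[1, 1], 2)

> b <- round(summary(n)$coef[1, 1] - summary(n)$coef[2, 1], 2)

> c <- round(-exp(summary(n)$coef[3, 1]), 2)

> text(40, 8, substitute(y == a - b * e^{c * x}, list(y = "Nutrient uptake",
+     a = a, b = b, c = c, x = "Time")), cex = 1.25, pos = 2)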

> mtext("Time (min)", 1, line = 3)

> mtext(expression(paste("Nutrient uptake", " (", mu, "mol.", g^-1,
+     ")")), 2, line = 3)

> par(opar)

Adding axes - `axis()`

Although most of the high-level plotting functions provide some control over axes construction (typically via graphical parameters), finer control over the individual axes is achieved by constructing each axis separately with the axis() function. The axis() function is defined as:

axis(side, at = NULL, labels = TRUE, tick = TRUE, line = NA, pos = NA, outer = FALSE, font = NA, lty = "solid", lwd = 1, col = NULL, hadj = NA, padj = NA, ...)

# density plot with a custom x-axis marked in standard deviation units

> opar <- par(mar = c(4, 1, 0, 0))

> set.seed(1)

> X <- rnorm(200, 10, 1)

> m <- mean(X)

> s <- sd(X)

> plot(density(X), type = "l", axes = FALSE, ann = FALSE)

> axis(1, at = c(0, m, m + s, m - s, m + 2 * s, m - 2 * s, 100),

+ labels = expression(NA, mu, 1 * sigma, -1 * sigma, 2 * sigma,

+ -2 * sigma, NA), pos = 0, cex.axis = 2)

> par(opar)

Parameter	Description
side	Specifies which axis to construct: 1 = bottom, 2 = left, 3 = top and 4 = right.
at	Where the tick marks are to be drawn. The axis line spans from the minimum to the maximum of the values supplied.
labels	Either `TRUE` or `FALSE` (whether labels should be drawn), or a character or expression vector defining the text to appear at each tickmark specified by the `at` parameter.
tick	Specifies whether or not (`TRUE` or `FALSE`) the axis line and tickmarks should be drawn.
line	Specifies the number of text lines into the margin at which to place the axis (along with the tickmarks and labels).
pos	Specifies where along the perpendicular axis the current axis should be drawn.
outer	Specifies whether or not (`TRUE` or `FALSE`) the axis should be drawn in the outer margin.
font	The font used for the tickmark labels.
lwd, lty, col	Specify the line width, style and color of the axis line and tickmarks.
hadj, padj	Specify the parallel and perpendicular adjustment of tick labels relative to the axis. For labels parallel to the axis, `padj=0` gives right or top alignment and `padj=1` gives left or bottom alignment. Other values are multipliers of this justification.
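The sketch below shows several of these parameters working together. It uses simulated data and writes to a temporary PDF file rather than the screen. It rebuilds the standard-deviation axis from the earlier example with explicit `at` and `labels` vectors, which must be the same length. It also adds a left axis with horizontal labels (`las = 1`) and a grey top axis.

```r
set.seed(1)
x <- rnorm(200, 10, 1)
m <- mean(x)
s <- sd(x)

# tick positions at the mean and at +/- 1 and 2 standard deviations
ticks <- m + c(-2, -1, 0, 1, 2) * s
tick_labels <- expression(-2 * sigma, -1 * sigma, mu, 1 * sigma, 2 * sigma)

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(density(x), axes = FALSE, ann = FALSE)
axis(side = 1, at = ticks, labels = tick_labels)   # bottom axis, plotmath labels
axis(side = 2, las = 1)                            # left axis, default ticks, horizontal labels
axis(side = 3, at = pretty(x), col = "grey")       # top axis drawn in grey
invisible(dev.off())
```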

Adding lines and shapes to a plot

There are a number of low-level plotting functions for plotting lines and shapes. Individually and collectively, they provide the tools to construct any custom graphic.

The following demonstrations use a dataset by Christensen et al. (1996) that consists of coarse woody debris (CWD) measurements as well as a number of human impact/land use characteristics for riparian zones around freshwater lakes in North America.

Download Christensen data set

Straight lines - `abline()`

The low-level plotting function abline() is used to draw straight lines with a given intercept (a) and gradient (b), or horizontal (h) or vertical (v) lines at given values. The function can also be passed a fitted linear model (reg) or a coefficient vector (coef), from which it extracts the intercept and slope parameters.

The definition of the abline() function is:

abline(a = NULL, b = NULL, h = NULL, v = NULL, reg = NULL, coef = NULL, untf = FALSE, ...)

# scatterplot with a fitted regression line and a horizontal line at the mean

> opar <- par(mar = c(4, 5, 1, 1))

> plot(CWDBASAL ~ RIPDENS, data = christ1)

> abline(lm(CWDBASAL ~ RIPDENS, data = christ1))

> abline(h = mean(christ1$CWDBASAL), lty = 2)

> par(opar)
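The Christensen data may not be at hand, so the sketch below uses simulated data (an assumption). It shows each way of specifying a line to abline(): a fitted model, an explicit intercept and slope, horizontal and vertical reference lines, and a coefficient vector.

```r
set.seed(2)
x <- runif(30, 0, 100)
y <- 5 + 0.5 * x + rnorm(30, sd = 5)
fit <- lm(y ~ x)

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(y ~ x)
abline(fit)                            # intercept and slope taken from the model
abline(a = 5, b = 0.5, col = "grey")   # the simulated 'true' line: a = intercept, b = slope
abline(h = mean(y), lty = 2)           # horizontal line at the mean response
abline(v = mean(x), lty = 3)           # vertical line at the mean predictor
abline(coef = coef(fit), col = "red")  # same line as abline(fit), via a coefficient vector
invisible(dev.off())
```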

Lines joining a succession of points - `lines()`

The lines() function can be used to add lines between points and is particularly useful for adding multiple trends (or non-linear trends, see the section on smoothers) through a data cloud. As with the points() function, the lines() function is a generic function whose actions depend on the type of objects passed as arguments. Notably, for simple coordinate vectors, the points() and lines() functions are virtually interchangeable (except in the type of plot they default to: points versus lines). Consequently, a more complex example involving the predict() function (a function that predicts new values from fitted models) will be used to demonstrate the power of the lines() function.

Assessing departures from linearity and homogeneity of variance can be assisted by fitting a linear (least squares regression) line through the data cloud.

# this example also uses the cut() function to create a categorical variable by partitioning a continuous variable.

> opar <- par(mar = c(4, 5, 1, 1))

> plot(CWDDENS ~ RIPDENS, data = christ1, type = "p")

> area <- cut(christ1$AREA, 2, labels = c("small", "large"))

> lm.small <- lm(CWDDENS ~ RIPDENS, data = christ1, subset = area ==

+ "small")

> lm.large <- lm(CWDDENS ~ RIPDENS, data = christ1, subset = area ==

+ "large")

> lines(christ1$RIPDENS[area == "small"], predict(lm.small))

> lines(christ1$RIPDENS[area == "large"], predict(lm.large), lty = 2)

> legend("bottomright", title = "Area", legend = c("small", "large"),

+ lty = c(1, 2))

> par(opar)
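One caveat with lines() is that it joins the points in the order they are supplied. With straight-line fits, as above, the order does not matter because every predicted point lies on the same line. For a curved fit, however, unsorted x values produce a zig-zag. The sketch below uses simulated data and a quadratic model (both assumptions) to show two remedies: ordering the observations, or predicting over a regular grid.

```r
set.seed(3)
x <- runif(40, 0, 10)                       # deliberately unsorted
y <- 2 + 3 * x - 0.25 * x^2 + rnorm(40)
fit <- lm(y ~ x + I(x^2))                   # quadratic fit

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(y ~ x)
# Remedy 1: sort the observations before joining the fitted values
o <- order(x)
lines(x[o], fitted(fit)[o])
# Remedy 2: predict over a regular, already sorted grid of x values
xs <- seq(min(x), max(x), length.out = 100)
lines(xs, predict(fit, data.frame(x = xs)), lty = 2)
invisible(dev.off())
```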

Lines between pairs of points - `segments()`

The segments() function draws straight lines between pairs of points ((x0, y0) and (x1, y1)). When the coordinates are given as vectors, multiple lines are drawn.

segments(x0, y0, x1, y1, col = par("fg"), lty = par("lty"), lwd = par("lwd"), ...)

Assessing departures from linearity and homogeneity of variance can be further assisted by adding lines to represent the residuals (segments that join the observed and predicted responses for each predictor value). This example also uses the with() function, which evaluates any expression or call (in this case the segments() function) in the context of a particular data frame (christ1) or other environment.

# add the residuals as dashed segments joining observed and fitted values

> opar <- par(mar = c(4, 5, 1, 1))

> plot(CWDDENS ~ RIPDENS, data = christ1, type = "p")

> christ.lm <- lm(CWDDENS ~ RIPDENS, data = christ1)

> abline(christ.lm)

> with(christ1, segments(RIPDENS, CWDDENS, RIPDENS, predict(christ.lm),

+ lty = 2))

> par(opar)
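segments() is vectorised, so the single call above draws one segment per observation. The sketch below repeats the idea with simulated data (an assumption) and checks that the vertical length of each segment equals the absolute residual.

```r
set.seed(4)
x <- 1:10
y <- 3 + 2 * x + rnorm(10)
fit <- lm(y ~ x)

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(y ~ x)
abline(fit)
# One call, ten segments: x0, y0, x1 and y1 are matched element by element
segments(x0 = x, y0 = y, x1 = x, y1 = fitted(fit), lty = 2, col = "grey40")
invisible(dev.off())

# The vertical length of each segment is the absolute residual
seg_len <- abs(y - fitted(fit))
```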

Arrows and connectors - `arrows()`

The arrows() function builds on the segments function to add provisions for simple arrow heads. Furthermore, as the length, angle and end to which the arrow head applies are all controllable, the arrows() function is also particularly useful for annotating figures and creating flow diagrams. The function can also be useful for creating customized error bars (as demonstrated in the following example).

# this example also uses ci() function from the gmodels package to calculate confidence intervals

> opar <- par(mar = c(4, 5, 1, 1))

> area <- cut(christ1$AREA, 2, lab = c("small", "large"))

> library(gmodels)

> s <- tapply(christ1$CWDDENS, area, ci)

> plot(christ1$CWDDENS ~ area, border = "white", ylim = range(s))

> points(1, s$small["Estimate"])

> points(2, s$large["Estimate"])

> with(s, arrows(1, small["CI lower"], 1, small["CI upper"], length = 0.1,

+ angle = 90, code = 3))

> with(s, arrows(2, large["CI lower"], 2, large["CI upper"], length = 0.1,

+ angle = 90, code = 3))

> par(opar)
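If the gmodels package is not installed, the same error bars can be drawn from intervals computed in base R. In this sketch the data are simulated, and `ci95()` is a hypothetical helper that computes a t-based 95% confidence interval. The two groups are labelled to mirror the example above.

```r
set.seed(5)
grp <- factor(rep(c("small", "large"), each = 15), levels = c("small", "large"))
resp <- rnorm(30, mean = rep(c(20, 30), each = 15), sd = 5)

# hypothetical helper: mean and t-based 95% confidence limits
ci95 <- function(v) {
  est <- mean(v)
  half <- qt(0.975, df = length(v) - 1) * sd(v) / sqrt(length(v))
  c(Estimate = est, lower = est - half, upper = est + half)
}
s <- sapply(split(resp, grp), ci95)   # 3 x 2 matrix: rows = statistics, cols = groups

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(1:2, s["Estimate", ], xlim = c(0.5, 2.5), ylim = range(s),
     xaxt = "n", xlab = "Area", ylab = "Response", pch = 16)
axis(1, at = 1:2, labels = colnames(s))
# code = 3 puts heads at both ends; angle = 90 turns them into flat caps
arrows(x0 = 1:2, y0 = s["lower", ], x1 = 1:2, y1 = s["upper", ],
       length = 0.1, angle = 90, code = 3)
invisible(dev.off())
```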

Rectangles - `rect()`

The rect() function draws rectangles from left-bottom, right-top coordinates that can be filled with solid or striped patterns (according to the line type, width, angle, density and color):

rect(xleft, ybottom, xright, ytop, density = NULL, angle = 45, col = NA, border = NULL, lty = par("lty"), lwd = par("lwd"), ...)

The main use of rectangles is to produce frames for items within plots.

> opar <- par(mar = c(4, 5, 0, 0))

> set.seed(1)

> Y <- rnorm(200, 10, 1)

> plot(density(Y), type = "l", axes = TRUE, ann = FALSE, bty = "l", las = 1,

+ col = "grey")

> rect(7.5, 0.1, 12.5, 0.3, angle = 45, density = 20, col = "grey",

+ border = "black")

> text(10, 0.2, bquote(paste(mu == frac(sum(y[i]), n)) == .(mean(Y))),

+ cex = 2)

> par(opar)

Smoothers

Smoothing functions can be useful additions to scatterplots, particularly for assessing (non)linearity and the nature of underlying trends. There are many different types of smoothers, including loess and lowess (locally weighted smoothers), kernel smoothers and splines.

Smoothers are added to a plot by first fitting the smoothing function (loess(), ksmooth()) to the data before plotting the values predicted by this function across the span of the data.

# this example fits loess smoother and kernel smoothers through the data

> opar <- par(mar = c(4, 5, 1, 1))

> plot(CWDDENS ~ RIPDENS, data = christ1)

> christ.loess <- loess(CWDDENS ~ RIPDENS, data = christ1)

> xs <- sort(christ1$RIPDENS)

> lines(xs, predict(christ.loess, data.frame(RIPDENS = xs)))

> christ.kern <- ksmooth(christ1$RIPDENS, christ1$CWDDENS, "normal",

+ bandwidth = 200)

> lines(christ.kern, lty = 2)

> par(opar)
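The lowess smoother mentioned above is not used in that example. The sketch below adds one with lowess(), using simulated data (an assumption). lowess() returns a list with sorted `x` and smoothed `y` components, which lines() accepts directly. A smoothing spline from smooth.spline() can be drawn the same way, because its result also carries `x` and `y` components.

```r
set.seed(6)
x <- runif(50, 0, 10)
y <- sin(x) + rnorm(50, sd = 0.3)

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(y ~ x)
sm <- lowess(x, y)                      # locally weighted smoother, default span
lines(sm)                               # list(x = sorted x, y = smoothed values)
lines(smooth.spline(x, y), lty = 2)     # smoothing spline for comparison
invisible(dev.off())
```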

Confidence bands - `matlines()`

Confidence bands and ellipses can be added to a plot using the lines() function. However, the matlines() function, along with the similar matplot() and matpoints() functions, plots multiple columns of matrices against one another. This provides a convenient means to plot predicted trends and confidence intervals in a single statement.

Confidence bands are added by using the value(s) returned by a predict() function as the second argument to the matlines() function.

# add a fitted regression line with 95% confidence bands

> opar <- par(mar = c(4, 5, 1, 1))

> plot(CWDDENS ~ RIPDENS, data = christ1)

> christ.lm <- lm(CWDDENS ~ RIPDENS, data = christ1)

> xs <- with(christ1, seq(min(RIPDENS), max(RIPDENS), length.out = 1000))

> matlines(xs, predict(christ.lm, data.frame(RIPDENS = xs), interval = "confidence"),

+ lty = c(1, 2, 2), col = 1)

> par(opar)
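matlines() works here because predict() with `interval = "confidence"` returns a matrix rather than a vector. The sketch below uses simulated data (an assumption) to inspect that matrix. It also adds the wider prediction band with `interval = "prediction"`, dropping the `fit` column so that only the two limits are drawn.

```r
set.seed(7)
x <- runif(40, 0, 100)
y <- 10 + 0.3 * x + rnorm(40, sd = 4)
fit <- lm(y ~ x)

xs <- seq(min(x), max(x), length.out = 100)
pred <- predict(fit, data.frame(x = xs), interval = "confidence")
head(pred, 3)   # columns: fit, lwr, upr

f <- tempfile(fileext = ".pdf")
pdf(f)
plot(y ~ x)
matlines(xs, pred, lty = c(1, 2, 2), col = 1)   # one line per column
matlines(xs, predict(fit, data.frame(x = xs), interval = "prediction")[, -1],
         lty = 3, col = "grey40")               # prediction limits only
invisible(dev.off())
```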

Exporting graphics - graphical devices

Graphics can also be written to several graphical file formats via specific graphics devices, which oversee the conversion of graphical commands into actual graphical elements. To write graphics to a file, an appropriate graphics device must first be 'opened'. A graphics device is opened by issuing one of the device functions listed below, which establishes the device's global parameters and readies the device stream for input. Opening such a device also creates (or overwrites) the nominated file.

As graphical commands are issued, the input stream is evaluated and accumulated. The file is only guaranteed to be fully written to disk when the device is closed via the dev.off() (close device) function.

Note that as the capabilities and default global parameters of different devices differ substantially, some graphical elements may appear differently on different devices. This is particularly true of dimensions, locations, fonts and colors.

By default, R uses the windows() graphical device on Windows (X11() in UNIX/Linux and typically quartz() in Mac OS X), which displays graphics on the screen within the R application. However, it is often necessary to produce graphics that can be printed or used within other applications. This is achieved by starting an alternative device driver (such as one that writes a graphics file), redirecting graphical commands to this alternative device, and then closing the alternative device driver to complete the process. The device driver is responsible for converting the graphical command(s) into a format that is appropriate for that sort of device.

Most installations of R come with a number of alternative graphics devices, each of which has its own set of options. A list of the graphics devices available on your installation can be obtained by examining the Devices help file after issuing the following command:

> ?Devices

This will bring up a help file listing all the devices available on your system along with pointers to additional information about the capabilities of each device.

Device	Comments
Screen devices
X11 (Linux)	Device units are inches; a specific device type can be selected.
windows (Windows)	Device units are inches.
quartz (Mac OS X)	Device units are inches.
File devices
jpeg	Default dimensions are in pixels; dimension units can be "px", "in", "cm" or "mm". The quality setting controls compression.
png	Default dimensions are in pixels; dimension units can be "px", "in", "cm" or "mm". The resolution can also be set.
postscript	Device units are inches when used with `paper='special'`. Orientation (e.g. portrait) and font family can be set.
pdf	Device units are inches. Font family can be set.
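The open/plot/close sequence described above can be sketched with the pdf() and postscript() devices, both of which are built into R. The file names are hypothetical, and the files are written to a temporary directory rather than the working directory. jpeg() and png() follow the same pattern, except that their dimensions default to pixels.

```r
set.seed(1)
Y <- rnorm(100, 10, 1)

out_pdf <- file.path(tempdir(), "density.pdf")   # hypothetical file names
out_ps  <- file.path(tempdir(), "density.ps")

# 1. open the device (this creates or overwrites the file)
pdf(out_pdf, width = 6, height = 4)              # dimensions in inches
# 2. issue the plotting commands
plot(density(Y), main = "pdf device")
# 3. close the device so the file is completely written
invisible(dev.off())

# postscript: paper = "special" makes width and height (inches) apply directly
postscript(out_ps, width = 6, height = 4, paper = "special",
           horizontal = FALSE)                   # portrait orientation
plot(density(Y), main = "postscript device")
invisible(dev.off())
```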

Whilst there is a greater variety of devices and options than shown in the table above, the ones listed are the most commonly used. Files will be created in the current working directory unless a full path is given. The full capabilities (options) of a specific device on your system can be queried by entering the name of the device preceded by a question mark.

> ?pdf

Multiple graphical devices

It is possible to have multiple graphical devices (of the same or different type) open simultaneously, so that multiple graphics can be viewed and/or modified concurrently. However, only one device can be active (receptive to plotting commands) at a time. Once a device has been opened (see the previous section on exporting graphics), the device object is given an automatically incremented reference number in the range 1 to 63. Device 1 is always a null device that cannot accept plotting commands and is essentially just a placeholder for the device counter.

The set of functions for managing multiple devices are described in the following Table.

Function	Description	Example output
dev.list()	Returns the numbers of open devices (with device types as names).	X11 X11 2 3
dev.cur()	Returns the number (and name) of the currently active device.	X11 3
dev.next()	Returns the number (and name) of the next available device after the device specified by the `which=` argument (after the current device if `which=` is absent).	X11 2
dev.prev()	Returns the number (and name) of the previous available device before the device specified by the `which=` argument (before the current device if `which=` is absent).	X11 2
dev.set()	Makes the device specified by the `which=` argument the currently active device and returns the number (and name) of this device. If the `which=` argument is absent, the next device is made active.	X11 2
dev.copy(which=3)	Copies the graphic on the current device to device number 3 (the device specified by the `which=` argument).	X11 3
dev.copy(device=pdf,...)	Copies the graphic on the current device to a new device of the named type (specified by the `device=` argument). Other options can be supplied to control device size etc.	X11 3
dev.off()	Closes the device specified by the `which=` argument (or the current device if `which=` is absent), makes the next device active and returns the number (and name) of this device.	X11 3
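The functions in the table can be exercised without a screen by opening two PDF devices. The temporary files are an assumption, made so that the sketch runs anywhere. The second device opened becomes the active one, dev.set() switches back to the first, and each device is closed explicitly.

```r
f1 <- tempfile(fileext = ".pdf")
f2 <- tempfile(fileext = ".pdf")

pdf(f1)
d1 <- dev.cur()              # number of the first device
plot(1:10, main = "device one")

pdf(f2)
d2 <- dev.cur()              # opening a device makes it the active one
plot(10:1, main = "device two")

print(dev.list())            # both devices are listed

dev.set(d1)                  # make the first device active again
abline(h = 5, lty = 2)       # this line is added to the first plot
active <- dev.cur()

dev.off(d2)                  # close each device to finish writing its file
dev.off(d1)
```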

The latest version of an R installation binary (or source code) can be downloaded from one of the Comprehensive R Archive Network (or CRAN) mirrors. Having selected one of the (Australian) mirrors, follow one of the sets of instructions below (depending on your operating system).

Windows

From the list of precompiled binaries, select 'Windows'
Select the 'base' subdirectory
Click on 'Download R_2.XX.X for Windows' (where XX.X is the current version number and release).
Run (execute) the downloaded install binary (this applies to XP, Vista and Windows 7). If you are running Vista, execute this binary as Administrator. This ensures that R is installed in the system area (C:\Program Files) rather than in the user space, which can lead to issues.
Under Vista or Windows 7, when warned of either 'unidentified publisher' or 'unknown publisher' respectively, elect to allow or continue
The installer is fairly straightforward and will guide you through the installation. The default options are typically adequate.
Startup menus and a desktop icon will be generated

MacOSX

From the list of precompiled binaries, select 'MacOS X'
Click on 'R-2.XX.X.pkg' (where XX.X is the current version number and release) to download the installer package. Note, this is for Mac OS X 10.5 (Leopard) and 10.6 (Snow Leopard) only. For earlier versions of Mac OS X, you will need to scroll down to the link to 'old'.
Run the installer (double click on the image in the new finder window)
If you are not already logged in as Administrator, you will be prompted for the Administrator password
The installer is fairly straightforward and will guide you through the installation. The default options are typically adequate.

Linux

From the list of precompiled binaries, select 'Linux'
R is available in a range of pre-compiled binaries for Debian (Ubuntu), Red Hat and SuSE based distributions
Quite frankly, if you are a linux user, you do not require instructions!

Basic Syntax

The R environment and command line

A first look at R

Either double click on the RGui icon on the desktop, or:

Click the START button from the Windows task bar
Click Program Files
Click R2.XX.X (where XX.X is the version and release numbers)
Click RGui (Gui stands for Graphical User Interface)

Upon opening R, you are presented with the R Console along with the command prompt (>). R is a command driven application (as opposed to a 'point-and-click' application) and despite the steep learning curve, there are many very good reasons for this.

Commands that you type are evaluated once the Enter key has been pressed.

Enter the following command at the command prompt (>):

> 5 + 1

[1] 6

This evaluates the command five plus one and returns the result (six). The [1] before the 6 indicates that the object immediately to its right is the first element in the returned object. In this case there is only one object returned. However, when a large set of objects (e.g. numbers) is returned, each row will start with an index number, making it easier to count through the elements.

Important definitions

Object

As an object oriented language, everything in R is an object. Data, functions and even output are objects.

Vector

A collection of one or more objects of the same type (e.g. all numbers or all characters).

Function

A set of instructions carried out on one or more objects. Functions are typically wrappers for a sequence of instructions that perform specific and common tasks.

Parameter

The kind of information passed to a function.

Argument

The specific information passed to a function.

Operator

A symbol that has a pre-defined meaning. Familiar operators include + - * and /.

Assignment operators
<-	Assigning a name to an object
=	Used when defining and specifying function arguments
Logical operators (return `TRUE` or `FALSE`)
<	Less than
>	Greater than
<=	Less than or equal
>=	Greater than or equal
==	Is the left hand side equal to the right hand side (a query)
!=	Is the left hand side NOT equal to the right hand side (a query)
&&	Are BOTH left hand and right hand conditions TRUE
||	Are EITHER the left hand OR right hand conditions TRUE
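Here is a short sketch of the comparison operators applied to a pair of hypothetical vectors. The vectorised forms `&` and `|` (used later for indexing) compare element by element. By contrast, `&&` and `||` expect a single `TRUE` or `FALSE` on each side and are intended for conditions such as those in if(). Since R 4.3.0, giving them a longer vector is an error.

```r
x <- c(2, 7, 4, 9)
y <- c(5, 1, 8, 3)

x < 5              # element-wise: TRUE FALSE TRUE FALSE
x != 4             # element-wise: TRUE TRUE FALSE TRUE
x < 5 & y > 2      # both conditions, element by element
x < 5 | y > 2      # either condition, element by element

# && evaluates its right side only if the left side is TRUE,
# and each side must be a single value
n <- length(x)
if (n > 0 && x[1] > 1) message("first element exceeds 1")
```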

Expressions, Assignment and Arithmetic

Instead of evaluating a statement and printing the result directly to the console, the results of evaluations can be stored in an object via a process called 'Assignment'. Assignment assigns a name to an object and stores the result of an evaluation in that object. The contents of an object can be viewed (printed) by typing the name of the object at the command prompt and hitting Enter.

> VAR1 <- 2 + 3

> VAR1

[1] 5

A single command (statement) can spread over multiple lines. If the Enter key is pressed before R considers the statement complete, the next line in the console will begin with the prompt + indicating that the statement is not complete.

> VAR2 <- 2 +

+ 3

> VAR2

[1] 5

When the contents of an object are numbers, standard arithmetic applies;

> VAR2 - 1

[1] 4

> ANS1 <- VAR1 * VAR2

> ANS1

[1] 25

Objects can be concatenated (joined together) to create objects with multiple entries. Object concatenation is performed using the c() function.

> c(1, 2, 6)

[1] 1 2 6

> c(VAR1, ANS1)

[1] 5 25

Sessions and Workspaces

A number of objects have been created in the current session (a session encapsulates all the activity since the current instance of the R application was started). To review the names of all of the objects in the user's current workspace (the storage of user-created objects):

> ls()

[1] "a" "ANS1" "area"

[4] "b" "base" "c"

[7] "cc" "christ1" "christ.kern"

[10] "christ.lm" "christ.loess" "code"

[13] "ColumnSize" "draw.cell" "draw.sample.cell"

[16] "draw.title" "draw.vf.cell" "face"

[19] "fam" "fi" "ft"

[22] "get.c" "GetColorHexAndDecimal" "get.r"

[25] "HexAndDec" "HTMLCommand" "i"

[28] "ii" "index" "j"

[31] "jj" "k" "lay"

[34] "lm.large" "lm.small" "m"

[37] "make.table" "n" "nc"

[40] "nr" "oldpar" "oldparameters"

[43] "opar" "page" "PerColumn"

[46] "PerPage" "printVerbatim" "remaining"

[49] "Routput" "RweaveHTML1" "RweaveHTMLRuncode"

[52] "s" "save" "SetTextContrastColor"

[55] "ShowChars" "ShowPch" "SOUTH"

[58] "string" "SweaveSyntaxHTML" "temp"

[61] "TextContrastColor" "tf" "VAR1"

[64] "VAR2" "wb.xtab" "wd"

[67] "X" "xs" "Y"

[70] "Y1" "Y2"

From the above output, ignore the elements 'printVerbatim', 'Routput' and 'SweaveSyntaxHTML' - they are objects involved in the production of this web-page! Note the [4] indicating that the second row of output starts with the fourth element of the output.

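You can also restrict the ls() function to object names that match a pattern. The pattern is a regular expression; for example, "A*1" below matches any name containing a 1, since A* allows zero or more A characters before it: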

> ls(pat = "VAR")

[1] "VAR1" "VAR2"

> ls(pat = "A*1")

[1] "ANS1" "christ1" "RweaveHTML1" "VAR1" "Y1"

The longer a session runs, the more objects are created, resulting in a very cluttered workspace. Unneeded objects can be removed using the rm() function. The first command below removes two named objects, whereas the second removes every object in the workspace:

> rm(VAR1, VAR2)

> rm(list = ls())

Current working directory

The R working directory (the location from which files/data are read and to which they are written) is, by default, the location of the R executable (or the execution path in Linux). The current working directory can be reviewed and changed (for the session) using the getwd() and setwd() functions respectively.

> setwd("~/Documents/")

> getwd()


[1] "/home/murray/Documents"

Quitting elegantly

To quit R, issue the following command. Note that in Windows and Mac OS X, the application can also be closed using the standard exit methods.

> q()

You will then be asked whether or not you wish to save the current workspace. If you do, enter 'Y'; otherwise enter 'N'. Unless you have a very good reason to save the workspace, I would suggest that you do not. A workspace generated in a typical session will have numerous poorly named objects (objects created to temporarily store information whilst testing). The next time R starts, it could restore this workspace. You would then not only start with a cluttered workspace, but the saved objects could also cause confusion if you inadvertently refer to one stored during a previous session.

Getting help

There are numerous ways of seeking help on R syntax and functions (the following are all ways of finding information about a function that calculates the mean of a vector).

Providing the name of the function as an argument to the help() function
> help(mean)
Typing the name of the function preceded by a '?'
> ?mean
To run the examples within the standard help files, use the example() function
> example(mean)
Some packages include demonstrations that showcase their features and use cases. The demo() function provides a user-friendly way to access these demonstrations. For example, to respectively get an overview of the basic graphical procedures in R and get a list of available demonstrations;
> demo(graphics) #run the graphics demo
> demo() #list all demos available on your system
Note in the above example everything following the # (comment character) is ignored. This provides a way of including comments.
If you don't know the exact name of the function, the apropos() function is useful because it returns the names of all objects in the current search list that match a specific pattern (the output depends on which packages are loaded);
> apropos('mean')
[1] "colMeans"           "frameApply"         "influence.measures"
[4] "kmeans"             "mean"               "mean.data.frame"
[7] "mean.Date"          "mean.default"       "mean.difftime"
[10] "mean.POSIXct"       "mean.POSIXlt"       "plotmeans"
[13] "rowMeans"           "runmean"            "weighted.mean"
If you have no idea what the function is called, the help.search() function searches the installed help pages for specific terms, and the help.start() function opens the local HTML manuals in a web browser;
> help.search('mean') #search the local R help pages
> help.start() #open the local HTML R manuals

Functions

As a wrapper for a collection of commands used together to perform a task, functions provide a convenient way of interacting with all of these commands in sequence. Most functions require one or more inputs (arguments), and while a particular function can have multiple arguments, not all are necessarily required (some could have default values).

Consider the seq() function, which generates a sequence of values (a vector) according to the values of the arguments. This function has the following definition;

function (from=1, to=1, by=((to - from)/(length.out - 1)), length.out=NULL, along.with=NULL, ...)

If the function is called without any arguments (e.g. seq()), it will return a single number 1. Using the default arguments for the function, it returns a vector starting at 1 (from=1), going up to 1 (to=1) and thus having a length of 1.
We can alter this behavior by specifically providing values for the named arguments. The following generates a sequence of numbers from 2 to 10 incrementing by 1 (default);

> seq(from = 2, to = 10)

[1] 2 3 4 5 6 7 8 9 10

The following generates a sequence of numbers from 2 to 10 incrementing by 2;

> seq(from = 2, to = 10, by = 2)

[1] 2 4 6 8 10

Alternatively, instead of manipulating the increment space of the sequence, we could specify the desired length of the sequence;

> seq(from = 2, to = 10, length.out = 3)

[1] 2 6 10

Named arguments need not include the full name of the parameter, so long as it is unambiguous which parameter is being referred to. For example, length.out could be shortened to just l since there are no other parameters of this function that start with 'l';

> seq(from = 2, to = 10, l = 4)

[1] 2.000000 4.666667 7.333333 10.000000

Parameters can also be specified as unnamed arguments, provided they are in the order given in the function definition. For example, to generate a sequence of numbers from 2 to 10 incrementing by 2;

> seq(2, 10, 2)

[1] 2 4 6 8 10

Named and unnamed arguments can be mixed, just remember the above rules about parameter order and unambiguous names;

> seq(2, 10, l = 4)

[1] 2.000000 4.666667 7.333333 10.000000

Data Types

Vectors

Vectors are a collection of one or more entries (values) of the same type (class) and are the basic storage unit in R. Vectors are one-dimensional arrays (have a single dimension - length) and can be thought of as a single column of data. Each entry in a vector has a unique index (like a row number) to enable reference to particular entries in the vector.

The `c()` function

The c() function concatenates values together into a vector. To create a vector with the numbers 1, 4, 7, 21;

> c(1, 4, 7, 21)

[1] 1 4 7 21

As an example, we could store the temperature recorded at 10 sites;

> TEMPERATURE <- c(36.1, 30.6, 31, 36.3, 39.9, 6.5, 11.2, 12.8,

+ 9.7, 15.9)

> TEMPERATURE

[1] 36.1 30.6 31.0 36.3 39.9 6.5 11.2 12.8 9.7 15.9

To create a vector with the words 'Fish', 'Rock', 'Tree', 'Git';

> c("Fish", "Rock", "Tree", "Git")

[1] "Fish" "Rock" "Tree" "Git"

Regular or patterned sequences

We have already seen the use of the seq() function to create sequences of entries.

Sequences of repeated entries are supported with the rep() function;

> rep(4, 5)

[1] 4 4 4 4 4

> rep("Fish", 5)

[1] "Fish" "Fish" "Fish" "Fish" "Fish"
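The `times` and `each` arguments of rep() control how the repetition is done. The values in the sketch below are arbitrary illustrations.

```r
r1 <- rep(c("no", "full"), times = 3)         # repeat the whole vector
r2 <- rep(c("no", "full"), each = 3)          # repeat each element in turn
r3 <- rep(1:2, times = c(2, 4))               # a separate count for each element
r4 <- rep(c("A", "B"), each = 2, times = 2)   # each is applied before times
r1
r2
r3
r4
```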

The `paste()` function

To create a sequence of quadrat labels we could use the c() function as illustrated above, e.g.

> QUADRATS <- c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8",

+ "Q9", "Q10")

> QUADRATS

[1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q9" "Q10"

A more elegant way of doing this is to use the paste() function;

> QUADRATS <- paste("Q", 1:10, sep = "")

> QUADRATS

[1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q9" "Q10"

This can be useful for naming vector elements. For example, we could use the names() function to name the elements of the temperature variable according to the quadrat labels.

> names(TEMPERATURE) <- QUADRATS

> TEMPERATURE

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10

36.1 30.6 31.0 36.3 39.9 6.5 11.2 12.8 9.7 15.9

Vector Classes

Vector Class	Example
`integer` (whole numbers)	`> 2:4` `[1] 2 3 4` `> c(1L,3L,9L)` `[1] 1 3 9`
`numeric` (real numbers)	`> c(8.4, 2.1)` `[1] 8.4 2.1`
`character` (letters)	`> c('A', 'ABC')` `[1] "A" "ABC"`
`logical` (TRUE or FALSE)	`> c(2:4)==3` `[1] FALSE TRUE FALSE`
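The class of any vector can be checked with the class() function. As the sketch below shows, numbers typed without an `L` suffix are stored as doubles (class `numeric`). Only ranges such as `2:4` or values such as `1L` produce integers. Combining different types coerces everything to the most general type.

```r
class(2:4)            # "integer"
class(c(1, 3, 9))     # "numeric" (typed numbers are doubles)
class(c(1L, 3L, 9L))  # "integer" (the L suffix requests integers)
class(c(8.4, 2.1))    # "numeric"
class(c("A", "ABC"))  # "character"
class(2:4 == 3)       # "logical"
class(c(1, "A"))      # "character" (mixed types are coerced)
```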

Factors

Factors are more than a vector of characters. Factors have additional properties that are utilized during statistical analyses and graphical procedures. To illustrate the difference, we will create a vector to represent a categorical variable indicating the level of shading applied to 10 quadrats. Firstly, we will create a character vector;

> SHADE <- rep(c("no", "full"), each = 5)

> SHADE

[1] "no" "no" "no" "no" "no" "full" "full" "full" "full" "full"

Now we convert this into a factor;

> SHADE <- factor(SHADE)

> SHADE

[1] no no no no no full full full full full

Levels: full no

Notice the additional property (Levels) at the end of the output. Notice also that, unless specified otherwise, the levels are ordered alphabetically. Whilst this does not impact on analyses, it does affect interpretations and graphical displays. If the alphabetical ordering does not reflect the natural order of the data, it is best to reorder the levels whilst defining the factor;

> SHADE <- factor(SHADE, levels = c("no", "full"))

> SHADE

[1] no no no no no full full full full full

Levels: no full

A more convenient way to create a balanced (equal number of replicates) factor is to use the gl() function. To create the shading factor from above;

> SHADE <- gl(2, 5, 10, c("no", "full"))

> SHADE

[1] no no no no no full full full full full

Levels: no full
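The sketch below checks that gl() and factor() give equivalent factors here, and demonstrates a few functions for inspecting a factor. The object names are hypothetical.

```r
shade1 <- gl(2, 5, 10, labels = c("no", "full"))
shade2 <- factor(rep(c("no", "full"), each = 5), levels = c("no", "full"))

identical(levels(shade1), levels(shade2))  # same levels in the same order
all(shade1 == shade2)                      # same value in every position
nlevels(shade1)                            # number of levels
as.integer(shade1)                         # underlying integer codes: 1 = no, 2 = full
table(shade1)                              # number of replicates per level
```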

Matrices

Matrices have two dimensions (length and width). The entries (which must be all of the same type - class) are in rows and columns.

We could arrange the vector of shading into two columns;

> matrix(SHADE, nrow = 5)

[,1] [,2]

[1,] "no" "full"

[2,] "no" "full"

[3,] "no" "full"

[4,] "no" "full"

[5,] "no" "full"
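matrix() fills a matrix column by column unless `byrow = TRUE` is given. This is why the shading vector above ends up with one level per column. As the output also shows, matrix() applied to a factor stores the level labels as characters. The sketch below uses an arbitrary integer vector to compare the two fill orders.

```r
v <- 1:6
m_col <- matrix(v, nrow = 2)                # filled column by column (default)
m_row <- matrix(v, nrow = 2, byrow = TRUE)  # filled row by row
m_col
m_row
dim(m_col)                                  # number of rows, then columns
```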

As another example, we could store the X,Y coordinates for five quadrats within a grid. We start by generating separate vectors to represent the X and Y coordinates and then we bind them together using the cbind() function;

> X <- c(16.92, 24.03, 7.61, 15.49, 11.77)

> Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

> XY <- cbind(X, Y)

> XY

X Y

[1,] 16.92 8.37

[2,] 24.03 12.93

[3,] 7.61 16.65

[4,] 15.49 12.20

[5,] 11.77 13.12

We could even alter the row names;

> rownames(XY) <- LETTERS[1:5]

> XY

X Y

A 16.92 8.37

B 24.03 12.93

C 7.61 16.65

D 15.49 12.20

E 11.77 13.12

Lists

Lists provide a way to group together multiple objects of different types. For example, whilst the contents of any single vector or matrix must all be of the one type (e.g. all numeric or all character), a list can contain a vector of numerics and a matrix of characters. Furthermore, the objects contained in a list do not need to be of the same length (cf. data frames). The output of most analyses is stored as a list.

As an example, we could group together the previously created isolated vectors and matrices into a single object that encapsulates the entire experiment;

> EXPERIMENT <- list(QUADRATS = QUADRATS, COORDINATES = XY, SHADE = SHADE,

+ TEMPERATURE = TEMPERATURE)

> EXPERIMENT

$QUADRATS

[1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q9" "Q10"

$COORDINATES

X Y

A 16.92 8.37

B 24.03 12.93

C 7.61 16.65

D 15.49 12.20

E 11.77 13.12

$SHADE

[1] no no no no no full full full full full

Levels: no full

$TEMPERATURE

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10

36.1 30.6 31.0 36.3 39.9 6.5 11.2 12.8 9.7 15.9
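Components of a list are extracted by name with `$` or `[[ ]]`, whereas single brackets `[ ]` return a smaller list. The sketch below uses a hypothetical cut-down version of the EXPERIMENT list so that it runs on its own.

```r
# hypothetical cut-down version of the EXPERIMENT list
EXP <- list(QUADRATS = paste("Q", 1:3, sep = ""),
            TEMPERATURE = c(36.1, 30.6, 31.0))

EXP$TEMPERATURE        # the numeric vector itself
EXP[["QUADRATS"]]      # [[ ]] also returns the component itself
EXP["QUADRATS"]        # [ ] returns a list holding that one component
names(EXP)             # component names
EXP$TEMPERATURE[2]     # extraction and indexing can be chained
```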

Object Manipulation

Indexing

Indexing is the means by which data are filtered (subsetted) to include and exclude certain entries.

Vector indexing

Subsets of vectors are produced by appending an index vector (enclosed in square brackets []) to a vector name. There are four common forms of vector indexing used to extract a subset of a vector:

Vector of positive integers - a set of integers that indicate which elements of the vector should be included;

> TEMPERATURE[2]
  Q2
30.6
> TEMPERATURE[2:5]
  Q2   Q3   Q4   Q5
30.6 31.0 36.3 39.9
> TEMPERATURE[c(1, 5, 6, 9)]
  Q1   Q5   Q6   Q9
36.1 39.9  6.5  9.7

Vector of negative integers - a set of integers that indicate which elements of the vector should be excluded;

> TEMPERATURE[-2]
  Q1   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10
36.1 31.0 36.3 39.9  6.5 11.2 12.8  9.7 15.9
> TEMPERATURE[c(1, 5, 6, 9) * -1]
  Q2   Q3   Q4   Q7   Q8  Q10
30.6 31.0 36.3 11.2 12.8 15.9

Vector of character strings (referencing names) - for vectors whose elements have been named, a vector of names can be used to select elements to include;

> TEMPERATURE["Q1"]
  Q1
36.1
> TEMPERATURE[c("Q1", "Q4")]
  Q1   Q4
36.1 36.3

Vector of logical values - a vector of logical values (TRUE or FALSE) the same length as the vector being subsetted. Entries corresponding to TRUE are included and those corresponding to FALSE are excluded. Conditions can be combined with & (and) and | (or);

> TEMPERATURE[TEMPERATURE < 15]
  Q6   Q7   Q8   Q9
 6.5 11.2 12.8  9.7
> TEMPERATURE[SHADE == "no"]
  Q1   Q2   Q3   Q4   Q5
36.1 30.6 31.0 36.3 39.9
> TEMPERATURE[TEMPERATURE < 34 & SHADE == "no"]
  Q2   Q3
30.6 31.0
> TEMPERATURE[TEMPERATURE < 10 | SHADE == "no"]
  Q1   Q2   Q3   Q4   Q5   Q6   Q9
36.1 30.6 31.0 36.3 39.9  6.5  9.7
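
The four forms of index are interchangeable ways of describing the same selection. The following sketch (with TEMPERATURE rebuilt from the printed values above) picks quadrats Q2 to Q4 by position, by exclusion, by name and by a logical vector, and confirms the results are identical. The which() function, which returns the positions of the TRUE entries in a logical vector, links the logical and positive-integer forms.

```r
# Rebuild the named temperature vector from the values printed above
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")

# Quadrats Q2, Q3 and Q4 selected four ways
by.position <- TEMPERATURE[2:4]                                    # positive integers
by.exclusion <- TEMPERATURE[-c(1, 5:10)]                           # negative integers
by.name <- TEMPERATURE[c("Q2", "Q3", "Q4")]                        # names
by.logical <- TEMPERATURE[rep(c(FALSE, TRUE, FALSE), c(1, 3, 6))]  # logical, length 10

identical(by.position, by.exclusion)  # TRUE
identical(by.position, by.name)       # TRUE
identical(by.position, by.logical)    # TRUE

# which() converts a logical condition into positive integer positions (6 7 8 9)
which(TEMPERATURE < 15)
```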

Matrix indexing

Similar to vectors, matrices can be indexed using positive integers, negative integers, character strings and logical vectors. However, whereas vectors have a single dimension (length), matrices have two dimensions (rows and columns), so indexing must specify both. Matrix indexing takes the form [row.indices, col.indices], where row.indices and col.indices respectively represent sequences of row and column indices. If a row or column index sequence is omitted, all rows or all columns respectively are selected.

> XY[3, 2]
[1] 16.65
> XY[3, ]
    X     Y
 7.61 16.65
> XY[, -2]
    A     B     C     D     E
16.92 24.03  7.61 15.49 11.77
> XY["A", 1:2]
    X     Y
16.92  8.37
> XY[, "X"]
    A     B     C     D     E
16.92 24.03  7.61 15.49 11.77
> XY[XY[, "X"] > 12, ]
      X     Y
A 16.92  8.37
B 24.03 12.93
D 15.49 12.20
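
A short sketch of the [row, col] rules, with XY rebuilt from the values above. It shows that positions and names address the same cells, that an omitted index means all rows (or columns), and that a logical vector works as a row index. It also uses the drop argument of [, which keeps a single extracted row as a one-row matrix rather than simplifying it to a vector (as happened with XY[3, ] above).

```r
# Rebuild the coordinate matrix from the values above
XY <- cbind(X = c(16.92, 24.03, 7.61, 15.49, 11.77),
            Y = c(8.37, 12.93, 16.65, 12.2, 13.12))
rownames(XY) <- LETTERS[1:5]

# Row 3, column 2 by position and by name is the same cell
XY[3, 2] == XY["C", "Y"]             # TRUE

# Omitting the row index selects every row of the chosen column
identical(XY[, "X"], XY[1:5, 1])     # TRUE

# A logical vector (one value per row) used as the row index
big.x <- XY[, "X"] > 12
rownames(XY[big.x, ])                # "A" "B" "D"

# drop = FALSE keeps a single row as a 1 x 2 matrix
dim(XY[3, , drop = FALSE])           # 1 2
```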

Sorting

The sort() function sorts the entries of a vector (in increasing order by default).

> sort(TEMPERATURE)
  Q6   Q9   Q7   Q8  Q10   Q2   Q3   Q1   Q4   Q5
 6.5  9.7 11.2 12.8 15.9 30.6 31.0 36.1 36.3 39.9

The order() function returns the indices that would sort a vector: the first value is the position of the smallest entry, the second is the position of the next smallest, and so on.

> order(TEMPERATURE)
 [1]  6  9  7  8 10  2  3  1  4  5

The rank() function returns the rank of each entry, that is, the position it would occupy if the vector were sorted.

> rank(TEMPERATURE)
 Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9 Q10
  8   6   7   9  10   1   3   4   2   5
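
The three functions are closely related, as this sketch shows (TEMPERATURE rebuilt from the values above). Indexing a vector by its order() reproduces sort(); sort() accepts decreasing = TRUE for the reverse order; and, when there are no tied values, rank() is the same as applying order() twice.

```r
# Rebuild the named temperature vector from the values above
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")

# order() gives the indices that put the vector in sorted order
identical(TEMPERATURE[order(TEMPERATURE)], sort(TEMPERATURE))   # TRUE

# Largest value first (Q5 leads with 39.9)
sort(TEMPERATURE, decreasing = TRUE)

# With no tied values, rank() is the inverse permutation of order()
all(rank(TEMPERATURE) == order(order(TEMPERATURE)))            # TRUE
```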

Pivot tables

The apply() function applies a function to the margins (1 = rows, 2 = columns) of a matrix. It is one of a family of related functions that includes tapply(), described below.

Let's say we wanted to represent the abundance of three species of fish in three types of habitat;

> FISH <- cbind(SpA = c(25, 6, 3), SpB = c(12, 12, 3), SpC = c(7,
+     2, 19))
> rownames(FISH) <- paste("Habitat", 1:3, sep = ".")
> FISH
          SpA SpB SpC
Habitat.1  25  12   7
Habitat.2   6  12   2
Habitat.3   3   3  19

We could now calculate the column means (mean abundance of each species across habitats);

> apply(FISH, 2, mean)
      SpA       SpB       SpC
11.333333  9.000000  9.333333
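
Changing the MARGIN argument or the function gives other summaries. In this sketch (FISH rebuilt as above), MARGIN = 1 sums each row to give the total abundance per habitat, and max gives the largest abundance of each species. Base R also provides rowSums() and colMeans() for these common cases, which the final lines use as a check.

```r
# Rebuild the fish abundance matrix from the example above
FISH <- cbind(SpA = c(25, 6, 3), SpB = c(12, 12, 3), SpC = c(7, 2, 19))
rownames(FISH) <- paste("Habitat", 1:3, sep = ".")

# MARGIN = 1: total abundance in each habitat (44, 20, 25)
apply(FISH, 1, sum)

# MARGIN = 2 with max(): highest abundance of each species (25, 12, 19)
apply(FISH, 2, max)

# Dedicated helpers give the same results
all.equal(apply(FISH, 1, sum), rowSums(FISH))    # TRUE
all.equal(apply(FISH, 2, mean), colMeans(FISH))  # TRUE
```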

The tapply() function applies a function to a vector separately for each level of a factor. For example, to calculate the mean temperature for each level of the shade variable;

> tapply(TEMPERATURE, SHADE, mean)
   no  full
34.78 11.22
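
Conceptually, tapply() splits the vector by the levels of the factor and applies the function to each piece. The sketch below (with TEMPERATURE and SHADE rebuilt from the printed output earlier in this section, assumed to match the original definitions) reproduces the result with logical indexing, one level at a time, and then swaps mean for length to count the quadrats at each shade level.

```r
# Rebuilt from the printed output above (assumed to match the earlier definitions)
TEMPERATURE <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(TEMPERATURE) <- paste("Q", 1:10, sep = "")
SHADE <- factor(rep(c("no", "full"), each = 5), levels = c("no", "full"))

means <- tapply(TEMPERATURE, SHADE, mean)

# The same means by logical indexing, one shade level at a time
manual <- c(no = mean(TEMPERATURE[SHADE == "no"]),
            full = mean(TEMPERATURE[SHADE == "full"]))
all.equal(as.vector(means), as.vector(manual))   # TRUE

# Any function of a vector can be used, e.g. the number of quadrats per level
tapply(TEMPERATURE, SHADE, length)               # no: 5, full: 5
```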

R Editors

Notepad++

Features

Syntax highlighting - text colored according to syntax rules
Code folding - mainly useful when writing functions
Supports a huge range of languages (not just R)
Bracket matching
Submit code to R line-by-line or selected lines (F8 key)
Windows only

Installation and setup

Download Notepad++ from the Notepad++ website
Select 'Download the current version'
Select 'Notepad++ v5.9 Installer'
Click 'Run' and when prompted verify the publisher
Use all of the defaults while installing
Similarly, download and install NppToR (acts as a conduit between Notepad++ and R) from its project page
Start Notepad++ (and R) by right-clicking on the NppToR icon in the Windows system tray and selecting 'Start Notepad++'

RStudio

Features

Syntax highlighting - text colored according to syntax rules
Specifically designed for R
Bracket matching
Integrated console
Cross platform
Submit (Run) code line-by-line or selected lines (Ctrl-ENTER)
Submit (Run) all code (Ctrl-Shift-ENTER)
Auto-complete (Ctrl-SPACE)
Parameter prompting and integrated help (Ctrl-SPACE)
Live workspace
Live command history
Fully integrated file manager
Intuitive and user-friendly package manager
Integrated R help browser

Installation and setup

Download RStudio from the RStudio website
Select the installation package recommended for your system (e.g. 'RStudio Desktop 0.93.84 - Windows XP/Vista/7')
Click 'Run' and when prompted verify the publisher
Use all of the defaults while installing
Start RStudio from the Windows Start Menu

How-tos

Start a new script
1. File -> New -> R Script

Emacs

Mmm - nice choice... Emacs can do everything. It's a super editor. Actually, it is more than an editor; it is an operating system in its own right. To many it is THE editor - in fact, to many it is the Messiah. However, with power comes a learning curve (which I guess you are already experiencing with R). Should you want to explore Emacs as an R editor, perhaps come and see me (Murray).

Feature	Notepad++	RStudio	Emacs
Platform	Win only	Win, Mac, Linux	Win, Mac, Linux
Syntax highlighting	Yes	Yes	Yes
Bracket matching	Yes	Yes	Yes
Integrated console	No	Yes	Yes
Auto-complete	Yes	Yes	Yes
Parameter prompting and integrated help	No	Yes	No
Code folding	Yes	No	Yes
