Traditional R Graphics
19 Oct 2011
This Workshop has been thrown together a little hastily and is therefore not very well organized - sorry! Graphical features are demonstrated either via tables of properties or as clickable graphics that reveal the required R code. Click on a graphic to reveal/toggle the source code.
High level plotting functions
Most graphics in R are produced by issuing a series of (one or more) graphical statements that sequentially add features to a graphical device. A graphical device is any device capable of receiving and interpreting graphical statements. Common examples are:
- A window within R
- A graphics file (such as a pdf, jpg, png etc)
Graphical devices are covered in more detail in the section on exporting graphics.
The plot() function
The plot() function is a generic function, the output of which depends on the class of the object(s) taken as input. That said, the most common use of the plot() function is to prepare a plotting device (define the axes limits etc) and to apply very basic plot characteristics (axes, points, labels etc) to the device.
The type parameter
The type parameter controls how the data points are represented on the graph.
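As a minimal sketch of the type parameter, the following draws one panel for each of seven common values. The data are invented purely for illustration, and the choice of values ("p", "l", "b", "o", "h", "s" and "n") is an assumption about which types are worth showing.

```r
# Hypothetical data, invented for illustration only
set.seed(1)
x <- 1:10
y <- x + rnorm(10)

# One panel per value of the type parameter
types <- c("p", "l", "b", "o", "h", "s", "n")
old_par <- par(mfrow = c(2, 4))
for (tp in types) {
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)  # restore the previous layout
```

Here "p" draws points, "l" lines, "b" both, "o" both overplotted, "h" vertical histogram-like lines, "s" stair steps, and "n" sets up the axes without plotting any data.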
The xlim and ylim parameters
These parameters control the range or span of the axes.
Examples shown: limits the same as the default; a minimum of zero and a maximum of 10.
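The two cases can be sketched as follows. The vectors x and y are hypothetical, and passing range(x) and range(y) explicitly is simply one way of reproducing the default limits.

```r
# Hypothetical data for illustration
x <- 1:8
y <- c(2, 4, 3, 6, 5, 8, 7, 9)

old_par <- par(mfrow = c(1, 2))
# Limits equal to the data range (what plot() uses by default)
plot(x, y, xlim = range(x), ylim = range(y), main = "Default limits")
# Minimum of zero, maximum of 10 on both axes
plot(x, y, xlim = c(0, 10), ylim = c(0, 10), main = "Limits 0 to 10")
usr <- par("usr")  # axis limits actually used by the last plot
par(old_par)
```

Note that usr is slightly wider than the requested limits, because the default axis style ("r") extends each range by 4% at both ends.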
The xlab and ylab parameters
These define the axes titles.
Examples shown: a blank (no) axis title; a custom axis title.
The axes and ann parameters
These are logical parameters that indicate whether (=TRUE) or not (=FALSE) to plot the axes (axes) and the titles (ann) respectively.
Examples shown: axes suppressed; axes titles (including the main title) suppressed.
The log parameter
This is a character string that indicates which (if any) of the axes should be plotted on a logarithmic scale ("x", "y" or "xy").
Examples shown: log x-axis; log y-axis; both axes on a log scale.
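A minimal sketch of the three settings, using made-up values that span several orders of magnitude:

```r
# Hypothetical data spanning several orders of magnitude
x <- c(1, 10, 100, 1000)
y <- c(2, 20, 200, 2000)

old_par <- par(mfrow = c(1, 3))
plot(x, y, log = "x",  main = 'log = "x"')   # log x-axis
plot(x, y, log = "y",  main = 'log = "y"')   # log y-axis
plot(x, y, log = "xy", main = 'log = "xy"')  # both axes
log_flags <- c(par("xlog"), par("ylog"))     # state after the last plot
par(old_par)
```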
I now present a selection of commonly used high-level plotting functions. These functions typically provide quick and convenient graphical representations, primarily for data exploration and diagnostics. As such, the aesthetics of these graphics are of little concern.
The hist() function
For this example, we will use the rivers dataset which provides the lengths (in miles) of 141 'major' rivers
in North America.
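A minimal sketch of a histogram of this dataset follows. The number of breaks and the axis title are my own choices rather than anything prescribed by the workshop.

```r
# rivers is a built-in numeric vector of river lengths (miles)
h <- hist(rivers, breaks = 20,
          xlab = "Length (miles)",
          main = "Lengths of major North American rivers")
```

The object returned by hist() holds the bin boundaries (h$breaks) and counts (h$counts), which can be useful when building a customized version of the plot.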
The boxplot() function
Examples shown: boxplots of river lengths; a boxplot of the number of breaks against wool type.
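These boxplots might be produced along the following lines, using the built-in rivers and warpbreaks datasets. The horizontal variant in the second panel is an assumption about how the second river boxplot differed from the first.

```r
old_par <- par(mfrow = c(1, 3))
# Boxplot of river lengths
boxplot(rivers, ylab = "Length (miles)")
# The same data drawn horizontally (an illustrative variation)
boxplot(rivers, horizontal = TRUE, xlab = "Length (miles)")
# Boxplot of the number of breaks against wool type (formula interface)
bp <- boxplot(breaks ~ wool, data = warpbreaks,
              xlab = "Wool type", ylab = "Number of breaks")
par(old_par)
```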
The violin() function
Violin plots are an alternative to boxplots. Arguably, these plots hide less of the underlying data than do boxplots.
The scatterplot() function
As we have seen, the plot() function already creates scatterplots.
In the spirit of exploratory data analysis, we will illustrate the scatterplot() function in the
car package. In addition to plotting the raw data, the scatterplot() function also includes
a number of useful regression diagnostics including marginal boxplots, the line of best fit (fitted regression line)
and a lowess smoother.
Scatterplot matrices
Scatterplot matrices are an extension of scatterplots in which each variable is plotted against each other variable in a gridded arrangement. They are useful for visually exploring the relationships amongst multiple variables simultaneously.
Interaction plots
Example shown: an interaction plot of tooth length against vitamin dose and supplement delivery method.
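Such a plot can be sketched with interaction.plot() and the built-in ToothGrowth data (len = tooth length, dose = vitamin C dose, supp = delivery method). The labels are my own wording.

```r
# Mean tooth length for each dose, with one trace per delivery method
with(ToothGrowth,
     interaction.plot(x.factor = dose, trace.factor = supp, response = len,
                      xlab = "Vitamin C dose (mg/day)",
                      ylab = "Mean tooth length",
                      trace.label = "Supplement"))

# The cell means that the plot displays
cell_means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))
```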
Mosaic and association plots
Mosaic and association plots are both conditioning plots that represent contingency table frequencies as a matrix of rectangles, the dimensions of which are proportional to the observed frequencies of each cross-classification. Furthermore, shading reflects the magnitudes of the Pearson residuals. The main difference between mosaic and association plots is that the rectangles in association plots also indicate the polarity of the differences between observed and expected frequencies.
In the warpbreaks example, the plots indicate (for example) that there were more breaks of wool type A under low tension and type B under medium tension than would be expected in the absence of an association between wool type and tension.
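As a sketch, both plots can be drawn from a two-way table of total breaks built from the built-in warpbreaks data. Using xtabs() to build the table is an assumption about how the counts were tabulated.

```r
# Total number of breaks for each wool type by tension combination
tab <- xtabs(breaks ~ wool + tension, data = warpbreaks)

old_par <- par(mfrow = c(1, 2))
# Mosaic plot with shading based on Pearson residuals
mosaicplot(tab, shade = TRUE, main = "Mosaic plot")
# Association plot: bars above/below the baseline show the sign of the residuals
assocplot(tab, main = "Association plot")
par(old_par)
```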
(Partial) effects plots
Graphical parameters - more control
Graphical parameters apply to an entire graphical device (are global) and provide additional aesthetic control over many of the characteristics of all the high and low level plotting functions applied in that device. That is, rather than specify a particular setting (such as font size) for each graphical function, the global parameters can be specified once and apply across all functions (although they can be individually overridden by any subsequent high or low level plotting function).
Graphical parameters can also control the layout, margins and spacing within a graphical device.
Global graphical parameters are specified with the par() function. When par() is used to alter a global graphical setting, it returns a list containing the previous values of the altered parameters (the settings that applied before the current change(s) were made). Passing this list as an argument to a subsequent par() call restores the previous graphical parameters on the current device.
Example shown: set the margins (measured from the bottom, left, top and right of the figure boundary), then print out the original settings for the altered parameters.
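A minimal sketch of this save-and-restore pattern. The margin sizes used here are arbitrary illustrative values.

```r
# Set the margins to 6, 6, 2 and 1 lines from the bottom, left, top and right
# of the figure boundary, keeping the previous settings
old_par <- par(mar = c(6, 6, 2, 1))

# Print out the original settings for the altered parameters
print(old_par)

plot(1:10)     # drawn with the new margins
par(old_par)   # restore the original margins
```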
Plot dimensions and layout parameters
Parameter | Value | Description |
---|---|---|
din,fin,pin | =c(width,height) | Dimensions (width and height) of the device, figure and plotting regions (in inches) |
fig | =c(left,right,bottom,top) | Coordinates of the figure region within the device. Coordinates expressed as a fraction of the device region. |
mai,mar | =c(bottom,left,top,right) | Size of each of the four figure margins in inches (mai) and in lines of text relative to the current font size (mar). |
mfg | =c(row,column) | Position of the currently active figure within a grid of figures defined by either mfcol or mfrow. |
mfcol,mfrow | =c(rows,columns) | Number of rows and columns in a multi-figure grid. |
new | =TRUE or =FALSE | Indicates whether to treat the current figure region as a new frame and thus begin a new plot over the top of the previous plot (TRUE), or to allow a new high level plotting function to clear the figure region first (FALSE). |
oma,omi,omd | =c(bottom,left,top,right) for oma and omi, =c(left,right,bottom,top) for omd | Size of each of the four outer margins in lines of text relative to the current font size (oma) or in inches (omi), or the region inside the outer margins as a fraction of the device region (omd). |
plt | =c(left,right,bottom,top) | Coordinates of the plotting region expressed as a fraction of the current figure region. |
pty | ="s" or "m" | Type of plotting region within the figure region. Is the plotting region a square (="s") or is it maximized (="m") to fit within the shape of the figure region. |
usr | =c(left,right,bottom,top) | Coordinates of the plotting region corresponding to the axes limits of the plot. |
Examples shown: altered margins (a boxplot of the number of breaks against wool type with wider margins); multiple figures; figures within figures.
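The multiple-figure and figure-within-figure cases can be sketched as follows, using the built-in warpbreaks and cars data. The specific fig coordinates and margin sizes are arbitrary choices for illustration.

```r
# Multiple figures: a grid of one row and two columns, with wider left margins
old_par <- par(mfrow = c(1, 2), mar = c(5, 5, 2, 1))
boxplot(breaks ~ wool, data = warpbreaks,
        xlab = "Wool type", ylab = "Number of breaks")
boxplot(breaks ~ tension, data = warpbreaks,
        xlab = "Tension", ylab = "Number of breaks")
par(old_par)

# Figure within a figure: place a second figure region over the first plot
plot(dist ~ speed, data = cars)
inset_par <- par(fig = c(0.1, 0.5, 0.5, 0.95), new = TRUE, mar = c(2, 2, 1, 1))
hist(cars$dist, main = "", xlab = "", ylab = "")
par(inset_par)
```

Setting new=TRUE is what stops hist() from clearing the device before drawing the inset.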
More on layout
In addition to splitting a graphics device up into a matrix of figures with the mfrow and mfcol graphical parameters, it is also possible to specify the size and arrangement of figures in a matrix with the layout() function. However, unlike the mfrow/mfcol parameters, the layout() function does not force each row to have the same number of columns and vice versa.
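As a minimal sketch, the following layout has one wide figure on the top row and two figures on the bottom row. The matrix and relative heights are illustrative choices.

```r
# Figure 1 spans the whole top row; figures 2 and 3 share the bottom row
m <- matrix(c(1, 1,
              2, 3), nrow = 2, byrow = TRUE)
n_fig <- layout(m, heights = c(1, 2))  # bottom row twice as tall as the top

hist(rivers, main = "", xlab = "River length (miles)")   # figure 1
boxplot(breaks ~ wool, data = warpbreaks)                # figure 2
plot(dist ~ speed, data = cars)                          # figure 3

layout(1)  # return to a single figure per device
```

Calling layout.show(n_fig) immediately after layout() outlines the figure regions, which helps when designing more elaborate arrangements.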
Axes characteristics
Parameter | Value | Description |
---|---|---|
ann,axes | =T or =F | High level plotting parameters that specify whether or not titles (main, sub and axes) and axes should be plotted. |
bty | ="o","l","7","c","u", "]" or "n" | Single character whose upper case letter resembles the sides of the box or axes to be included with the plot ("n" suppresses the box). |
lab | =c(x,y,length) | Specifies the approximate number of tickmarks on the x and y axes and the label length (the length component is not implemented in R). |
las | =0, 1, 2 or 3 | Specifies the style of the axes tick labels. 0 = parallel to axes, 1 = horizontal, 2 = perpendicular to axes, 3 = vertical. |
mgp | =c(title,labels,line) | Distance (in multiples of the height of a line of text) of the axis title, labels and line from the plot boundary. |
tck,tcl | =length | The length of tick marks as a fraction of the plot dimensions (tck) and as a fraction of the height of a line of text (tcl) |
xaxp,yaxp | =c(min,max,num) | Positions of the minimum and maximum tick marks and the number of intervals between tick marks on the x and y axes |
xaxs,yaxs | ="r" or ="i" | Determines how the axes ranges are calculated. The "r" option results in ranges that extend 4% beyond the data ranges, whereas the "i" option uses the raw data ranges. |
xlog,ylog | =FALSE or =TRUE | Specifies whether or not the x and y axes should be plotted on a logarithmic scale. |
xpd | =FALSE, =TRUE or =NA | Specifies whether plotting is clipped to the plotting (=FALSE), figure (=TRUE) or device (=NA) region |
Character sizes
Rather than specify the exact point size of each set of characters in a figure, R defines a base size (by default, 12pt), and thereafter, character sizes of elements are defined relative to this base size. For example, if you wanted a label to be in 6pt, this would be 0.5 (half) the base point size. If you wanted the font to be 18pt, this would be 1.5 times the base size. Hence, character sizes are defined via character expansion (cex) factors.
The advantage of this system is that the font sizes are scalable. That is, if you later decide to increase the size of a figure and also want to increase the font sizes, you only need to alter the base point size for that device. Graphical devices are discussed further in the section on exporting graphics.
Parameter | Applies to |
---|---|
cex | All subsequent characters |
cex.axis | Axis tick labels |
cex.lab | Axes titles |
cex.main | Main plot title |
cex.sub | Plot sub-titles |
Examples shown: character expansion applied to axes titles, tick mark labels and the plotting character.
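A minimal sketch of the character expansion parameters, using the built-in cars data. The expansion factors are arbitrary.

```r
plot(dist ~ speed, data = cars,
     main = "Stopping distance",
     cex.main = 1.5,   # main title at 1.5 times the base size
     cex.lab  = 1.25,  # axes titles
     cex.axis = 0.75,  # tick mark labels
     cex      = 2)     # plotting characters

# Supplying cex parameters to plot() does not alter the global setting
base_cex <- par("cex")
```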
Line characteristics
Parameter | Description |
---|---|
lty | The type of line. Specified as either a single integer in the range of 0 to 6 (0 = blank, 1 to 6 = predefined line types) or as a string of 2, 4, 6 or 8 hexadecimal digits that define the lengths of alternating dashes and spaces within a repeated sequence. |
lwd | The thickness of a line as a multiple of the default thickness (which is device specific). |
lend | The line end style (round, butt or square). |
ljoin | The line join style (round, mitre or bevel). |
Plotting character - pch
The plotting character (pch) can be of the following forms:
- a number from 0 to 25 corresponding to one of the 26 basic plotting symbols
- when used with font=5 (extended symbol font), Adobe symbol encoding can be specified. This encoding system uses integers in the ranges 1:128 and 160:254. In the Extended plotting characters figure below, the y-axis shows the first two digits of the Adobe symbol encoding, and the x-axis shows the third digit.
- a quoted keyboard printing character (letter, number or punctuation)
Figures: basic plotting characters; extended plotting characters (used with font=5).
To plot the heart symbol: .., pch=169, font=5,..
The size of plotting symbols is controlled by the character expansion (cex) parameter, and the style of the lines that make up the plotting symbols is controlled by other line characteristics.
![](images/graphics-Ppch.png)
Fonts
The shape of text characters is controlled by the family (the typeface) and the font (the shape of the typeface). The families supported vary for each graphical device, as do the names by which they are referred to.
![](images/graphics-Fonts.png)
To get a list of the available font families for a specific device on your system,
issue a command whose name starts with the name of the device and ends with "Fonts".
For example, to query the available fonts for a pdf device on your system:
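For the pdf device, the corresponding function is pdfFonts(); a minimal sketch of the query:

```r
# Names of the font families known to the pdf device
pdf_families <- names(pdfFonts())
print(pdf_families)
```

The postscript device has an equivalent postscriptFonts() function.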
Different fonts can also be applied to each of the main plotting components (font.axis: axes tick labels, font.lab: axes titles, font.main: main plot title and font.sub: plot sub-title).
Hershey (vector) fonts
R also supports Hershey (vector) fonts that greatly extend the range of characters and symbols available. In contrast to regular (bitmap) fonts that consist of a set of small images (one for each character of each style and size), vector fonts consist of the coordinates of each of the curves required to create the character. That is, vector fonts store the information on how to draw the character rather than store the character itself. Hershey fonts can therefore be scaled to any size without distortion. Unfortunately however, Hershey fonts cannot be combined with regular fonts in a single plotting statement and thus they cannot be easily incorporated into mathematical formulae.
![](images/graphics-Hershey1.png)
![](images/graphics-HersheySymbol.png)
![](images/graphics-HersheySymbol2.png)
![](images/graphics-HersheySymbol3.png)
![](images/graphics-HersheySymbol4.png)
![](images/graphics-HersheySymbol5.png)
![](images/graphics-HersheySymbol6.png)
Text orientation and justification
Parameter | Description |
---|---|
adj | Specifies the justification of a text string relative to the coordinates of its origin. A single number between 0 and 1 specifies horizontal justification. A vector of two numbers (=c(x,y)) indicates justification in horizontal and vertical directions. |
crt,srt | Specifies the amount of rotation (in degrees) of single characters (crt) and strings (srt) |
Colors
The color of most plotting elements is controlled by the col parameter. There are also separate parameters that control the color of each of the major components of a figure (col.axis: the axes tick labels, col.lab: the axes titles, col.main: the main plot title, col.sub: plot sub-titles) and when specified, take precedence over the col parameter. Two additional parameters, bg and fg can be used to control the color of the background and foreground (boxes and axes) respectively.
Here are a few of the ways in which colors can be specified
- by an index (numbers 0-8) to a small palette of eight colors (0 indicates the background color). The colors in this palette can be reviewed with the palette() (color palette) function
- by name. The names of the 657 defined colors can be reviewed with the colors() function. The epitools package provides the colors.plot() (display palette) function which generates a graphic that displays a matrix of all the colors. When used with the locator=TRUE argument, a series of left mouse clicks on the color squares, terminated by a right mouse click, will result in a matrix of corresponding color names.
- via one of the other built-in color palettes that are essentially sets of colors within themes.
These palettes return n colors, and the color transparency/opacity
is controlled via an alpha parameter (values between 0 and 1, where 1 is completely opaque).
- rainbow(n) - Red->Violet
- heat.colors(n) - Red->Orange->Yellow->White
- terrain.colors(n) - Green->Yellow->Brown->White
- topo.colors(n) - Blue->Green->Yellow
- grey(level) - Black (level 0)->White (level 1); takes a vector of grey levels rather than a number of colors
- by direct specification of the red, green and blue components of the RGB spectrum as a character string in the form "#RRGGBB". This string consists of a # followed by a pair of hexadecimal digits in the range 00:FF for each component. For those devices supporting transparency, two additional digits can be added on the end of the hex code to indicate the degree of transparency/opacity (00: fully transparent, FF: fully opaque).
- via the rgb(), hsv() and hcl() functions, which build colors from their components. The col2rgb() function performs the reverse conversion, returning the red, green and blue components of a color.
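The different forms of color specification can be sketched as follows. The particular colors chosen are arbitrary.

```r
col_index <- 2                          # index into the current palette()
col_name  <- "steelblue"                # one of the names listed by colors()
pal       <- heat.colors(5, alpha = 0.5)  # five semi-transparent palette colors
col_hex   <- "#FF000080"                # red with roughly 50% opacity
col_rgb   <- rgb(1, 0, 0, alpha = 0.5)  # the same idea built from components

# Each form can be mixed freely within the col parameter
barplot(rep(1, 9),
        col = c(col_index, col_name, pal, col_hex, col_rgb),
        axes = FALSE)

# Convert a named color back to its red, green and blue components
steel_rgb <- col2rgb(col_name)
```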
![](images/graphics-Col.png)
![](images/graphics-Col21.png)
![](images/graphics-Col22.png)
![](images/graphics-Col23.png)
![](images/graphics-Col24.png)
![](images/graphics-Col25.png)
![](images/graphics-Col26.png)
![](images/graphics-Col27.png)
Enhancing and customizing plots with low-level plotting functions
Having set up a plotting device (typically by calling a high level plotting function), additional graphical elements can be manually added to a plot via specific low-level plotting functions. The most aesthetically pleasing graphics are typically produced by preparing a blank plotting device (essentially defining the size, layout and axes limits), and then manually building up the desired features via low-level plotting functions.
In addition to their specific parameters, each of the following functions accepts many of the graphical parameters. In the function definitions, these capabilities are represented by three consecutive dots (...). Technically, ... indicates that any supplied arguments that are not explicitly part of the definition of a function are passed on to the relevant underlying functions (in this case, par()).
Adding points - points()
Points can be added to a plot using the points(x, y, pch, ...) function. This function plots a plotting character (specified by the pch parameter) at the coordinates specified by the vectors x,y. Alternatively, the coordinates can be passed as a formula of the form y~x.
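A minimal sketch showing both the coordinate-vector and formula interfaces, using the built-in cars data. The subset of "fast" cars is a hypothetical grouping invented for the example.

```r
# Set up the axes without plotting any data
plot(dist ~ speed, data = cars, type = "n")

# Coordinates supplied as x and y vectors
points(cars$speed, cars$dist, pch = 16, col = "grey40")

# Coordinates supplied as a formula: highlight the faster cars
fast <- subset(cars, speed > 20)
points(dist ~ speed, data = fast, pch = 1, cex = 2, col = "red")
```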
Adding text - text()
The text() function adds text strings (labels parameter) to the plot at the supplied coordinates (x,y). Its main parameters are:
Parameter | Description |
---|---|
pos | Simplified text justification that overrides the adj parameter. 1=below, 2=left, 3=above and 4=right. |
offset | Offset used by pos as a fraction of the width of a character. |
vfont | Provision for Hershey (vector) font specification (vfont=c(typeface, style)). |
Constructing character strings - paste()
The paste() function concatenates vectors together after converting each of the elements to characters. This is particularly useful for making labels and is equally useful in non-graphical applications. paste() has two other optional parameters (sep and collapse) which define extra character strings to be placed between the strings being joined. sep operates on joins between paired vector elements whereas collapse operates on joins of elements within a vector.
Example output (sep only, collapse only, and both combined):
[1] "H:1" "M:2" "L:3"
[1] "H:M:L"
[1] "H-1:M-2:L-3"
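The output above can be reproduced with calls along the following lines. The variable names are my own; only the results come from the original examples.

```r
lev <- c("H", "M", "L")

# sep is placed between paired elements of the two vectors
a <- paste(lev, 1:3, sep = ":")
print(a)

# collapse joins the elements within a single vector
b <- paste(lev, collapse = ":")
print(b)

# sep is applied first, then collapse joins the results
ab <- paste(lev, 1:3, sep = "-", collapse = ":")
print(ab)
```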
![](images/graphics-pastePlot1.png)
Adding text to plot margins - mtext()
The mtext() function adds text (text) to the plot margins and is typically used to create fancy or additional axes titles. Its main parameters are:
Parameter | Description |
---|---|
side | Specifies which margin the title should be plotted in. 1=bottom, 2=left, 3=top and 4=right. |
line | Number of text lines out from the plot region into the margin to plot the marginal text. |
outer | For multi-plot figures, if outer=TRUE, put the marginal text in the outer margin (if there is one). |
at | Position along the axis (in user coordinates) of the text |
adj,padj | Adjustment (justification) of the position of the marginal text parallel (adj) and perpendicular (padj) to the axis. Justification depends on the orientation of the text string and the margin (axis). |
Adding a legend - legend()
The legend() function brings together a rich collection of plotting functions to produce highly customizable figure legends in a single call. A sense of the rich functionality of the legend function is reflected in the table below:
Parameter | Description |
---|---|
legend | A vector of strings or expressions to comprise the labels of the legend. |
title | A string or expression for a title at the top of the legend |
bty, box.lty, box.lwd | The type ("o" or "n"), line type and line width of the box framing the legend. |
bg, text.col | The colors used for the legend background and legend labels |
horiz | Whether or not to produce a horizontal legend instead of a vertical legend. |
ncol | The number of columns in which to arrange the legend labels. |
cex | Character expansion for all elements of the legend relative to the plot cex graphical parameter. |
Boxes | If any of the following parameters are set, the legend labels will be accompanied by boxes. |
fill | Specifies the fill color of the boxes. A vector of colors will result in different fills. |
angle, density | Specifies the angle and number of lines that make up the stripy fill of boxes. Negative density values result in solid fills. |
Points | If any of the following parameters are set, the legend labels will be accompanied by points. |
pch | Specifies the type of plotting character. |
pt.cex, pt.lwd | Specifies the character expansion and line width of the plotting characters. |
col, pt.bg | Specifies the foreground and background color of the plotting characters (and lines for col). |
Lines | If any of the following parameters are set, the legend labels will be accompanied by lines. |
lwd, lty | Specifies the width and type of lines. |
merge | Whether or not to merge points and lines. |
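A minimal sketch of a legend that combines points and lines, drawn on a scatterplot of the built-in cars data. The labels, position and styling are illustrative choices.

```r
plot(dist ~ speed, data = cars, pch = 16, col = "grey40")
fit <- lm(dist ~ speed, data = cars)
abline(fit, lty = 2)

# NA suppresses the point or line for an individual legend entry
lg <- legend("topleft",
             legend = c("Observed", "Linear fit"),
             title  = "Series",
             pch    = c(16, NA),
             lty    = c(NA, 2),
             col    = c("grey40", "black"),
             bty    = "n")
```

legend() invisibly returns the coordinates of the legend box and labels, which can be used to position further annotations.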
More advanced text formatting
The text plotting functions described above (text(), mtext() and legend()) can also build plotting text from objects that constitute the R language itself. These are referred to as language objects and include:
- names - the names of objects
- expressions - unevaluated syntactically correct statements that could otherwise be evaluated at the command prompt
- calls - these are specific expressions that consist of an unevaluated named function (complete with arguments)
Any language object passed as an argument to one of the text plotting functions described above (text(), mtext() and legend()) will be coerced into an expression and evaluated as a mathematical expression prior to plotting. In so doing, the text plotting functions will also apply TeX-like formatting (the extensive range of which can be sampled by issuing the demo(plotmath) command) where appropriate.
Hence, advanced text construction, formatting and plotting is achieved by skilled use of a variety of functions (described below) that assist in the creation of language objects for passing to the text plotting functions.
Complex expressions - expression()
The expression function is used to build complex expressions that incorporate TeX-like mathematical formatting. Hence, the expression function is typically nested within one of the text plotting functions to plot complex combinations of characters and symbols.
Example shown: a plot of two series of random data.
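A minimal sketch of expression() used for axis titles on a plot of two random series. The quantities and units in the labels are invented for illustration.

```r
# Two series of random data
set.seed(1)
x  <- 1:20
y1 <- rnorm(20, mean = 10)
y2 <- rnorm(20, mean = 12)

# Axis titles with a degree symbol, a Greek letter and a superscript
x_lab <- expression(Temperature ~ (degree * C))
y_lab <- expression(Concentration ~ (mu * mol ~ l^-1))

plot(x, y1, type = "l", ylim = range(y1, y2), xlab = x_lab, ylab = y_lab)
lines(x, y2, lty = 2)
```

Within an expression, ~ inserts a space between terms whereas * juxtaposes them without a space.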
Complex expressions - bquote()
The bquote() function generates a language object by converting the argument after first evaluating any objects wrapped in .(). This provides a way to produce text strings that combine mathematical formatting and the output of statistical functions.
In the example below, note the required use of the tilde (~) character to allow a space between the words corr and coef. Alternatively, a space can be provided by the keyword phantom(char), where char is a character whose width is equal to the amount of space required. Had we have put a space between
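A minimal sketch of this idea, annotating a scatterplot of the built-in cars data with its correlation coefficient. The data and label position are assumptions made for the example.

```r
# Correlation between speed and stopping distance, rounded for display
r <- round(cor(cars$speed, cars$dist), 2)

# .(r) is replaced by the value of r; ~ provides the space between words
lab <- bquote(corr ~ coef == .(r))

plot(dist ~ speed, data = cars)
text(5, 110, lab, pos = 4)
```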
Complex expressions - substitute()
Alternatively, for situations in which substitutions are required within non-genuine mathematical expressions (such as straight character strings), the substitute() function is useful.
Combinations of advanced text formatting functions
It is possible to produce virtually any text representation on an R plot; however, some representations require complex combinations of the above functions. Whilst these functions can be nested within one another, the combinations often appear to behave counter-intuitively. A good understanding of the exact nuances of each of the functions is required in order to successfully master their combined effects. Nevertheless, the following scenarios should provide some appreciation of the value and uses of some of these combinations.
For example, the formula for calculating the mean of a sample, together with its calculated value, can be written as:
mu == frac(sum(y[i]),n) == .(meanY)
Building such an expression is achieved by combining the bquote() function with a paste() function.
The more observant and discerning reader may have noticed the y-axis label in the substitute() example above had a space between the μ and the word 'mol'. Using just the expression() function, this was unavoidable. A more elegant solution would have been to employ an expression(paste()) combination.
Adding axes - axis()
Although most of the high-level plotting functions provide some control over axes construction (typically via graphical parameters), finer control over the individual axes is achieved by constructing each axis separately with the axis() function. Its main parameters are:
Parameter | Description |
---|---|
side | Specifies which axis to construct. 1=bottom, 2=left, 3=top and 4=right. |
at | Where the tick marks are to be drawn. Axis will span between minimum and maximum values supplied. |
labels | Specifies the labels to draw at each tickmark. |
tick | Specifies whether or not (TRUE or FALSE) the axis line and tickmarks should be drawn |
line | Specifies the number of text lines into the margin to place the axis (along with the tickmarks and labels). |
pos | Specifies where along the perpendicular axis, the current axis should be drawn. |
outer | Specifies whether or not (TRUE or FALSE) the axis should be drawn in the outer margin. |
font | The font used for the tickmark labels. |
lwd, lty, col | Specifies the line width, style and color of the axis line and tickmarks. |
hadj, padj | Specifies the parallel and perpendicular adjustment of tick labels to the axis. Units of movement (for example) are padj=0: right or top, padj=1: left or bottom. Other values are multipliers of this justification. |
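A minimal sketch of building both axes manually on a plot drawn without axes. The data and labels are hypothetical.

```r
# Hypothetical responses for five treatments
resp <- c(3, 5, 2, 6, 4)
plot(1:5, resp, axes = FALSE, xlab = "Treatment", ylab = "Response")

# Bottom axis with custom labels at each tick mark
axis(side = 1, at = 1:5, labels = LETTERS[1:5])

# Left axis with ticks every two units and horizontal labels
at_y <- axis(side = 2, at = seq(2, 6, by = 2), las = 1)

# Add an L-shaped box to join the two axes
box(bty = "l")
```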
Adding lines and shapes to a plot
There are a number of low-level plotting functions for plotting lines and shapes. Individually and collectively, they provide the tools to construct any custom graphic.
The following demonstrations will utilize a dataset by Christensen et al. (1996) that consists of coarse woody debris (CWD) measurements as well as a number of human impact/land use characteristics for riparian zones around freshwater lakes in North America.
Straight lines - abline()
The low-level plotting abline() function is used to draw straight lines with a given intercept (a) and gradient (b) or single values for horizontal (h) or vertical (v) lines. The function can also be passed a fitted linear model (reg) or coefficient vector from which it extracts the intercept and slope parameters.
The definition of the abline() function is:
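The definition can be printed with args(), and typical calls can be sketched as follows. As the Christensen data are not bundled with R, the built-in cars data are used here instead (an assumption made purely so that the example runs).

```r
# Print the arguments of abline()
print(args(abline))

plot(dist ~ speed, data = cars)

# Line from a fitted linear model
fit <- lm(dist ~ speed, data = cars)
abline(fit)

# Horizontal and vertical reference lines at the means
abline(h = mean(cars$dist), lty = 2)
abline(v = mean(cars$speed), lty = 3)

# Line with a given intercept (a) and gradient (b)
abline(a = 0, b = 2, col = "grey")
```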
Lines joining a succession of points - lines()
The lines() function can be used to add lines between points and is particularly useful for adding multiple trends (or non-linear trends, see section on smoothers) through a data cloud. As with the points() function, the lines() function is a generic function whose actions depend on the type of objects passed as arguments. Notably, for simple coordinate vectors, the points() and lines() functions are virtually interchangeable (except in the type of points they default to). Consequently, a more complex example involving the predict() function (a function that predicts new values from fitted models) will be used to demonstrate the power of the lines() function.
Assessing departures from linearity and homogeneity of variance can be assisted by fitting a linear (least squares regression) line through the data cloud.
Lines between pairs of points - segments()
The segments() function draws straight lines between points ((x0,y0) and (x1,y1)). When each of the coordinates is given as a vector, multiple lines are drawn.
Assessing departures from linearity and homogeneity of variance can also be further assisted by adding lines to represent the residuals (segments that join observed and predicted responses for each predictor). This example also makes use of the with() function, which evaluates any expression or call (in this case the segments() function) in the context of a particular data frame (christ) or other environment.
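Both ideas (a fitted line drawn with lines() and predict(), and residuals drawn with segments()) can be sketched as follows. The built-in cars data stand in for the Christensen data frame (christ), which is not bundled with R.

```r
fit <- lm(dist ~ speed, data = cars)
plot(dist ~ speed, data = cars)

# Predict the response over a fine grid spanning the predictor range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 100))
lines(grid$speed, predict(fit, newdata = grid))

# Residuals: segments joining each observed value to its fitted value
with(cars, segments(speed, dist, speed, fitted(fit), col = "grey"))
```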
Arrows and connectors - arrows()
The arrows() function builds on the segments function to add provisions for simple arrow heads. Furthermore, as the length, angle and end to which the arrow head applies are all controllable, the arrows() function is also particularly useful for annotating figures and creating flow diagrams. The function can also be useful for creating customized error bars (as demonstrated in the following example).
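A minimal sketch of error bars built with arrows(), showing group means with one standard error either side. The built-in warpbreaks data are used as a stand-in (an assumption).

```r
# Mean and standard error of the number of breaks for each tension level
means <- with(warpbreaks, tapply(breaks, tension, mean))
ses   <- with(warpbreaks,
              tapply(breaks, tension, function(v) sd(v) / sqrt(length(v))))
xs <- seq_along(means)

plot(xs, means, pch = 16, axes = FALSE,
     xlim = c(0.5, 3.5), ylim = range(means - ses, means + ses),
     xlab = "Tension", ylab = "Mean number of breaks")
axis(1, at = xs, labels = names(means))
axis(2)
box(bty = "l")

# Flat arrow heads (angle = 90) on both ends (code = 3) make error bars
arrows(xs, means - ses, xs, means + ses, angle = 90, code = 3, length = 0.05)
```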
Rectangles - rect()
The rect() function draws rectangles from left-bottom, right-top coordinates that can be filled with solid or striped patterns (according to the line type, width, angle, density and color):
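A minimal sketch of a solid and a striped rectangle on an empty plotting region. The coordinates and colors are arbitrary.

```r
# Empty plotting region spanning 0 to 10 on both axes
plot(c(0, 10), c(0, 10), type = "n", xlab = "", ylab = "")

# Solid fill: xleft, ybottom, xright, ytop
rect(1, 1, 4, 6, col = "lightblue", border = "blue")

# Striped fill: 10 lines per inch at 45 degrees, dashed and thicker lines
rect(5, 3, 9, 9, density = 10, angle = 45, lty = 2, lwd = 2)

usr <- par("usr")  # plotting region limits, for reference
```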
Smoothers
Smoothing functions can be useful additions to scatterplots, particularly for assessing (non)linearity and the nature of underlying trends. There are many different types of smoothers, including loess and lowess (locally weighted smoothers), kernel smoothers and splines.
Smoothers are added to a plot by first fitting the smoothing function (loess(), ksmooth()) to the data before plotting the values predicted by this function across the span of the data.
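This fit-then-plot pattern can be sketched with three smoothers on the built-in cars data (a stand-in dataset). The bandwidth is an arbitrary illustrative value.

```r
plot(dist ~ speed, data = cars)
xs <- seq(min(cars$speed), max(cars$speed), length.out = 100)

# Loess: fit the smoother, then plot its predictions across the data span
lo <- loess(dist ~ speed, data = cars)
lo_pred <- predict(lo, newdata = data.frame(speed = xs))
lines(xs, lo_pred)

# Kernel smoother: returns a list of x and y that lines() can plot directly
ks <- ksmooth(cars$speed, cars$dist, kernel = "normal", bandwidth = 5)
lines(ks, lty = 2)

# Lowess: also returns x and y coordinates
lines(lowess(cars$speed, cars$dist), lty = 3)
```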
Confidence ellipses - matlines()
Confidence bands and ellipses can be added to a plot using the lines function. However, the matlines() function, along with the similar matplot() and matpoints() functions plot multiple columns of matrices against one another, thereby providing a convenient means to plot predicted trends and confidence intervals in a single statement.
Confidence bands are added by using the value(s) returned by a predict() function as the second argument to the matlines() function.
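A minimal sketch of a fitted trend with 95% confidence bands, using a linear model of the built-in cars data (a stand-in dataset).

```r
fit <- lm(dist ~ speed, data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 100))

# A matrix with columns fit, lwr and upr
pred <- predict(fit, newdata = grid, interval = "confidence")

plot(dist ~ speed, data = cars)
# One line per column: solid for the trend, dashed for the limits
matlines(grid$speed, pred, lty = c(1, 2, 2), col = "black")
```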
Exporting graphics - graphical devices
Graphics can also be written to several graphical file formats via specific graphics devices which oversee the conversion of graphical commands into actual graphical elements. In order to write graphics to a file, an appropriate graphics device must first be 'opened'. A graphics device is opened by issuing one of the device functions listed below, which essentially establishes the device's global parameters and readies the device stream for input. Opening such a device also creates (or overwrites) the nominated file.
As graphical commands are issued, the input stream is evaluated and accumulated. The file is only guaranteed to be fully written to disk when the device is closed via the dev.off() (close device) function.
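The open, plot, close sequence can be sketched as follows. The file name and dimensions are arbitrary, and the file is written to a temporary directory here only to avoid cluttering the working directory.

```r
# Open a pdf device (dimensions in inches)
out_file <- file.path(tempdir(), "example-plot.pdf")
pdf(out_file, width = 6, height = 4)

# Plotting commands are now sent to the file rather than the screen
plot(dist ~ speed, data = cars)

# Close the device to finish writing the file
dev.off()

file.exists(out_file)
```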
Note that as the capabilities and default global parameters of different devices differ substantially, some graphical elements may appear differently on different devices. This is particularly true of dimensions, locations, fonts and colors.
By default, R uses the windows() graphical device on Windows (X11() in UNIX/Linux and typically quartz() in MacOSX), which provides a representation of graphics on the screen within the R application. However, it is often necessary to produce graphics that can be printed or used within other applications. This is achieved by starting an alternative device (such as a graphics file) driver, redirecting graphical commands to this alternative device, and finally completing the process by closing the alternative device driver. The device driver is responsible for converting the graphical command(s) into a format that is appropriate for that sort of device.
Most installations of R come complete with a number of alternative graphics devices, each of which has its own set of options. A list of graphics devices available on your installation can be obtained by examining the Devices help file, which is opened by entering ?Devices at the R prompt.
This will bring up a help file listing all the devices available on your system along with pointers to additional information about the capabilities of each device.
Device | Example of use | Comments |
---|---|---|
Screen devices | | |
X11 (Linux) | | device units are inches, specific device type. |
windows (Windows) | | device units are inches. |
quartz (Mac OSX) | | device units are inches. |
File devices | | |
jpeg | # default dimensions in pixels | dimension units can be "px", "in", "cm", "mm". Quality controls compression. |
png | # default dimensions in pixels | dimension units can be "px", "in", "cm", "mm". Resolution. |
postscript | | device units are inches when used with paper='special'. Portrait orientation. Font family. |
 | | device units are inches. Font family. |
Whilst there is a greater variety of devices and options than demonstrated in the table above, the ones listed are the most commonly used. Files will be created in the current working directory. The full capabilities (options) of a specific device on your system can be queried by entering the name of the device preceded by a question mark.
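The open, draw, close cycle described above might look like the following sketch, which uses the pdf() device. The file name is hypothetical, and the file is written to the temporary directory rather than the working directory so that nothing is overwritten.

```r
# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")

# Open the device: this creates (or overwrites) the file.
# For pdf() the width and height are in inches.
pdf(file = out_file, width = 6, height = 4)

# Graphical commands are now sent to the file rather than the screen
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")

# Close the device so that the file is completely written
dev.off()

file.exists(out_file)
```

Forgetting the dev.off() call is a common reason for a file that is empty or cannot be opened.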
Multiple graphical devices
It is possible to have multiple graphical devices (of the same or different type) open simultaneously, thereby enabling multiple graphics to be viewed and/or modified concurrently. However, only one device can be active (receptive to plotting commands) at a time. Once a device has been opened (see Exporting graphics above), the device object is given an automatically iterated reference number in the range of 1 to 63. Device 1 will always be a null device that cannot accept plotting commands and is essentially just a placeholder for the device counter.
The set of functions for managing multiple devices is described in the following table.
Function | Description | Example |
---|---|---|
dev.list() | Returns the numbers of open devices (with device types as column headings). | X11 X11 2 3 |
dev.cur() | Returns the number (and name) of the currently active device. | X11 3 |
dev.next() | Returns the number (and name) of the next available device after the device specified by the which= argument (after current if which= absent). | X11 2 |
dev.prev() | Returns the number (and name) of the previous available device before the device specified by the which= argument (before current if which= absent). | X11 2 |
dev.set() | Makes the device specified by the which= argument the currently active device and returns the number (and name) of this device. If which= argument absent, it is set to the next device. | X11 2 |
dev.copy(which=3) | Copies the graphic on one device to the third device (device specified by the which= argument) | X11 3 |
dev.copy(device=pdf,...) | Copies the graphic on one device to a named device type (device specified by the device= argument). Other options can be supplied to control device sizes etc. | X11 3 |
dev.off() | Closes the device specified by the which= argument (or current device if which= argument absent), makes the next device active and returns the number (and name) of this device. | X11 3 |
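A short sketch of how these functions fit together is given below. Two pdf() file devices (writing to temporary files) stand in for screen devices so that the example runs anywhere; the object names first and second are made up for illustration.

```r
# Open two devices; each new device becomes the active one
pdf(file = tempfile(fileext = ".pdf"))
first <- dev.cur()       # number (and name) of the device just opened
pdf(file = tempfile(fileext = ".pdf"))
second <- dev.cur()

dev.list()               # numbers of all open devices
dev.set(which = first)   # make the first device active again
plot(1:10)               # drawn on the first device

dev.off(which = second)  # close the second device
dev.off(which = first)   # close the first device
```

Storing the value of dev.cur() straight after opening a device avoids having to guess which number it was given.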
The latest version of an R installation binary (or source code) can be downloaded from one of the Comprehensive R Archive Network (or CRAN) mirrors. Having selected one of the (Australian) mirrors, follow one of the sets of instructions below (depending on your operating system).
Windows
- From the list of precompiled binaries, select 'Windows'
- Select the 'base' subdirectory
- Click on 'Download R_2.XX.X for Windows' (where XX.X is the current version number and release).
- Depending on the version of Windows you are running (XP, Vista or Windows 7), run (execute) this downloaded install binary. Note, if you are running Vista, you should execute this binary as Administrator. This ensures that R is installed in the system area (c:\Program Files) rather than in the user space which can lead to issues.
- Under Vista or Windows 7, when warned of either 'unidentified publisher' or 'unknown publisher' respectively, elect to allow or continue
- The installer is fairly straightforward and will guide you through the installation. The default options are typically adequate.
- Startup menus and a desktop icon will be generated
MacOSX
- From the list of precompiled binaries, select 'MacOS X'
- Click on 'R-2.XX.X.pkg' (where XX.X is the current version number and release) to download the installer package. Note, this is for MacOS X 10.5 (Leopard) and 10.6 (Snow Leopard) only. For earlier versions of MacOSX, you will need to scroll down to the link to 'old'
- Run the installer (double click on the image in the new finder window)
- If you are not already logged in as Administrator, you will be prompted for the Administrator password
- The installer is fairly straightforward and will guide you through the installation. The default options are typically adequate.
Linux
- From the list of precompiled binaries, select 'Linux'
- R is available in a range of pre-compiled binaries for Debian (Ubuntu), Red Hat and SuSE based distributions
- Quite frankly, if you are a Linux user, you do not require instructions!
Basic Syntax
The R environment and command line
A first look at R
- Double click on the RGui icon on the desktop
- Click the START button from the Windows task bar
- Click Program Files
- Click R2.XX.X (where XX.X is the version and release numbers)
- Click RGui (Gui stands for Graphical User Interface)
![](../images/Rgui.jpg)
Upon opening R, you are presented with the R Console along with the command prompt (>). R is a command driven application (as opposed to a 'point-and-click' application) and despite the steep learning curve, there are many very good reasons for this.
Commands that you type are evaluated once the Enter key has been pressed
Enter the following command at the command prompt (>);
> 5+1
[1] 6
This evaluates the command five plus one and returns the result (six). The [1] before the 6 indicates that the object immediately to its right is the first element in the returned object. In this case there is only one object returned. However, when a large set of objects (e.g. numbers) is returned, each row will start with an index number, thereby making it easier to count through the elements.
Important definitions
- Object
- As an object oriented language, everything in R is an object. Data, functions and even output are objects.
- Vector
- A collection of one or more objects of the same type (e.g. all numbers or all characters).
- Function
- A set of instructions carried out on one or more objects. Functions are typically wrappers for a sequence of instructions that perform specific and common tasks.
- Parameter
- The kind of information passed to a function.
- Argument
- The specific information passed to a function.
- Operator
- A symbol that has a pre-defined meaning. Familiar operators include + - * and /.
Assignment operators
- <- Assigning a name to an object
- = Used when defining and specifying function arguments
Logical operators (return TRUE or FALSE)
- < Less than
- > Greater than
- <= Less than or equal
- >= Greater than or equal
- == Is the left hand side equal to the right hand side (a query)
- != Is the left hand side NOT equal to the right hand side (a query)
- && Are BOTH left hand and right hand conditions TRUE
- || Are EITHER the left hand OR right hand conditions TRUE
Expressions, Assignment and Arithmetic
Instead of evaluating a statement and printing the result directly to the console, the results of evaluations can be stored in an object via a process called 'Assignment'. Assignment assigns a name to an object and stores the result of an evaluation in that object. The contents of an object can be viewed (printed) by typing the name of the object at the command prompt and hitting Enter.
A single command (statement) can spread over multiple lines. If the Enter key is pressed before R considers the statement complete, the next line in the console will begin with the prompt + indicating that the statement is not complete.
When the contents of an object are numbers, standard arithmetic applies;
Objects can be concatenated (joined together) to create objects with multiple entries. Object concatenation is performed using the c() function.
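The three ideas above (assignment, arithmetic on stored values and concatenation) can be sketched as follows. The object names VAR1, VAR2 and VAR3 are arbitrary names chosen for illustration.

```r
# Assign the result of an evaluation to an object named VAR1
VAR1 <- 5 + 1
VAR1                 # typing the name prints the contents

# Standard arithmetic applies to numeric objects
VAR2 <- VAR1 * 2 - 3

# Concatenate objects into a single object with multiple entries
VAR3 <- c(VAR1, VAR2, 100)
VAR3
```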
Sessions and Workspaces
A number of objects have been created in the current session (a session encapsulates all the activity since the current instance of the R application was started). The ls() function lists the names of all of the objects in the user's current workspace (storage of user created objects);
From the above output, ignore the elements 'printVerbatim', 'Routput' and 'SweaveSyntaxHTML' - they are objects that are involved in the production of this web-page! Note, the [5] indicating that the second row of output starts with the fifth element of the output.
You can also refine the scope of the ls() function to search for object names that match a pattern;
The longer the session is running, the more objects will be created resulting in a very cluttered workspace. Unneeded objects can be removed using the rm() function;
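A small sketch of reviewing and tidying a workspace follows. The objects tmp_a, tmp_b and other are throwaway names created only for this illustration.

```r
# Create a few throwaway objects
tmp_a <- 1
tmp_b <- "text"
other <- TRUE

ls()                     # names of all objects in the workspace
ls(pattern = "^tmp_")    # only the names that begin with tmp_
rm(tmp_b)                # remove an object that is no longer needed
ls(pattern = "^tmp_")    # tmp_b is no longer listed
```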
Current working directory
The R working directory (location from which files/data are read and written) is, by default, the location of the R executable (or execution path in Linux). The current working directory can be reviewed and changed (for the session) using the getwd() and setwd() functions respectively.
/home/murray/Documents
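As a brief sketch, the working directory can be stored, changed and then restored. The temporary directory is used here only because it is guaranteed to exist; in practice you would supply the path to your own project folder.

```r
old_dir <- getwd()     # remember the current working directory
setwd(tempdir())       # change to another directory for this session
getwd()                # confirm the change
setwd(old_dir)         # change back to the original directory
```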
Quitting elegantly
To quit R, issue the following command;
> q()
Note, in Windows and MacOSX, the application can also be terminated using the standard exiting protocols.
You will then be asked whether or not you wish to save the current workspace. If you do, enter 'Y', otherwise enter 'N'. Unless you have a very good reason to save the workspace, I would suggest that you do not. A workspace generated in a typical session will have numerous poorly named objects (objects created to temporarily store information whilst testing). Next time R starts, it could restore this workspace, thereby starting with a cluttered workspace and creating a potential source of confusion if you inadvertently refer to an object stored during a previous session.
Getting help
There are numerous ways of seeking help on R syntax and functions (the following are all ways of finding information about a function that calculates the mean of a vector).
- Providing the name of the function as an argument to the help() function
> help(mean)
- Typing the name of the function preceded by a '?'
> ?mean
- To run the examples within the standard help files, use the example() function
> example(mean)
- Some packages include demonstrations that showcase their features and use cases. The demo() function provides a user-friendly way to access these demonstrations. For example, to respectively get an overview of the basic graphical procedures in R and get a list of available demonstrations;
> demo(graphics) #run the graphics demo
Note in the above example everything following the # (comment character) is ignored. This provides a way of including comments.
> demo() #list all demos available on your system
- If you don't know the exact name of the function, the apropos() function is useful as it returns the name of all objects from the current search list that match a specific pattern;
> apropos("mea")
[1] "colMeans" "frameApply" "influence.measures"
[4] "kmeans" "mean" "mean.data.frame"
[7] "mean.Date" "mean.default" "mean.difftime"
[10] "mean.POSIXct" "mean.POSIXlt" "plotmeans"
[13] "rowMeans" "runmean" "weighted.mean"
- If you have no idea what the function is called, the help.search() and help.start() functions search through the regular manuals and the local HTML manuals (via a web browser) respectively for specific terms;
> help.search('mean') #search the local R manuals
> help.start() #search the local HTML R manuals
Functions
As a wrapper for a collection of commands used together to perform a task, functions provide a convenient way of interacting with all of these commands in sequence. Most functions require one or more inputs (arguments), and while a particular function can have multiple arguments, not all are necessarily required (some could have default values).
Consider the seq() function, which generates a sequence of values (a vector) according to the values of the arguments. This function has the following definition;
- If the function is called without any arguments (e.g. seq()), it will return a single number 1. Using the default arguments for the function, it returns a vector starting at 1 (from=1), going up to 1 (to=1) and thus having a length of 1.
- We can alter this behavior by specifically providing values for the named arguments. The following generates a sequence of numbers from 2 to 10 incrementing by 1 (default);
- The following generates a sequence of numbers from 2 to 10 incrementing by 2;
- Alternatively, instead of manipulating the increment space of the sequence, we could specify the desired length of the sequence;
- Named arguments need not include the full name of the parameter, so long as it is unambiguous which parameter is being referred to. For example, length.out could be shortened to just l since there are no other parameters of this function that start with 'l';
- Parameters can also be specified as unnamed arguments provided they are in the order specified in the function definition. For example to generate a sequence of numbers from 2 to 10 incrementing by 3;
- Named and unnamed arguments can be mixed, just remember the above rules about parameter order and unambiguous names;
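The calls described in the list above can be sketched as follows. The args() call is included only as a way of displaying the argument list of the default seq() method on your own installation.

```r
# Display the arguments (and their defaults) of the default seq() method
args(seq.default)

seq()                            # no arguments: a single value
seq(from = 2, to = 10)           # 2 to 10 in steps of 1 (the default)
seq(from = 2, to = 10, by = 2)   # 2 to 10 in steps of 2
seq(from = 2, to = 10, length.out = 5)  # specify the length instead of the step
seq(from = 2, to = 10, l = 5)    # length.out abbreviated to l
seq(2, 10, 3)                    # unnamed arguments, in the order from, to, by
seq(2, 10, length.out = 3)       # unnamed and named arguments mixed
```

Abbreviated argument names save typing at the console, but full names are easier to read in scripts that will be revisited later.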
Data Types
Vectors
Vectors are a collection of one or more entries (values) of the same type (class) and are the basic storage unit in R. Vectors are one-dimensional arrays (have a single dimension - length) and can be thought of as a single column of data. Each entry in a vector has a unique index (like a row number) to enable reference to particular entries in the vector.
The c() function
The c() function concatenates values together into a vector. To create a vector with the numbers 1, 4, 7, 21;
As an example, we could store the temperature recorded at 10 sites;
To create a vector with the words 'Fish', 'Rock', 'Tree', 'Git';
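The three vectors described above could be created as in the following sketch. The temperature values are invented for illustration, as are the object names.

```r
# A vector of the numbers 1, 4, 7 and 21
c(1, 4, 7, 21)

# Hypothetical temperatures recorded at 10 sites
temperature <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
temperature

# A vector of words (character strings must be quoted)
words <- c("Fish", "Rock", "Tree", "Git")
words
```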
Regular or patterned sequences
We have already seen the use of the seq() function to create sequences of entries.
Sequences of repeated entries are supported with the rep() function;
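A few illustrative calls to rep() are sketched below; the values being repeated are arbitrary.

```r
rep(4, 5)                      # the number 4 repeated five times
rep(c("a", "b"), times = 2)    # the whole vector repeated twice
rep(c("a", "b"), each = 2)     # each entry repeated twice in turn
```

The times= argument cycles through the whole vector, whereas each= repeats every entry before moving on to the next.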
The paste() function
To create a sequence of quadrat labels we could use the c() function as illustrated above, e.g.
A more elegant way of doing this is to use the paste() function;
This can be useful for naming vector elements. For example, we could use the names() function to name the elements of the temperature variable according to the quadrat labels.
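A sketch of both approaches follows. The label format (Q1 to Q10) and the temperature values are assumptions made for illustration.

```r
# The long way: typing every label
quadrats <- c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9", "Q10")

# The same labels generated with paste()
quadrats <- paste("Q", 1:10, sep = "")
quadrats

# Use the labels to name the elements of a (hypothetical) temperature vector
temperature <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
names(temperature) <- quadrats
temperature
```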
Vector Classes
Vector Class | Example |
---|---|
integer (whole numbers) | > 2:4 [1] 2 3 4 > c(1L,3L,9L) [1] 1 3 9 |
numeric (real numbers) | > c(8.4, 2.1) [1] 8.4 2.1 |
character (letters) | > c('A', 'ABC') [1] "A" "ABC" |
logical (TRUE or FALSE) | > c(2:4)==3 [1] FALSE TRUE FALSE |
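The class of any vector can be checked with the class() function, as in this short sketch. Note that whole numbers typed without the L suffix are stored as numeric rather than integer.

```r
class(2:4)              # a sequence created with : is integer
class(c(1L, 3L, 9L))    # the L suffix marks integer constants
class(c(1, 3, 9))       # without the suffix the values are numeric
class(c(8.4, 2.1))
class(c("A", "ABC"))
class(c(2:4) == 3)      # a comparison returns a logical vector
```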
Factors
Factors are more than a vector of characters. Factors have additional properties that are utilized during statistical analyses and graphical procedures. To illustrate the difference, we will create a vector to represent a categorical variable indicating the level of shading applied to 10 quadrats. Firstly, we will create a character vector;
Now we convert this into a factor;
Notice the additional property (Levels) at the end of the output. Notice also that unless specified otherwise, the levels are ordered alphabetically. Whilst this does not impact on analyses, it does affect interpretations and graphical displays. If the alphabetical ordering does not reflect the natural order of the data, it is best to reorder the levels whilst defining the factor;
A more convenient way to create a balanced (equal number of replicates) factor is to use the gl() function. To create the shading factor from above;
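The steps above can be sketched as follows. The level labels ("no" and "full") and the object names are assumptions made for illustration.

```r
# A character vector: five unshaded quadrats followed by five fully shaded
shade <- rep(c("no", "full"), each = 5)

# Convert to a factor; the levels default to alphabetical order
shade_f <- factor(shade)
levels(shade_f)

# Define the level order explicitly
shade_f <- factor(shade, levels = c("no", "full"))
levels(shade_f)

# The same balanced factor generated with gl():
# 2 levels, each replicated 5 times
shade_gl <- gl(2, 5, labels = c("no", "full"))
shade_gl
```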
Matrices
Matrices have two dimensions (length and width). The entries (which must be all of the same type - class) are in rows and columns.
We could arrange the vector of shading into two columns;
As another example, we could store the X,Y coordinates for five quadrats within a grid. We start by generating separate vectors to represent the X and Y coordinates and then we bind them together using the cbind() function;
We could even alter the row names;
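A sketch of these matrix operations follows. The shading labels, coordinate values and row names are invented for illustration.

```r
# Arrange a vector of 10 shading values into two columns (filled by column)
shade <- rep(c("no", "full"), each = 5)
matrix(shade, ncol = 2)

# Hypothetical X,Y coordinates of five quadrats
x <- c(16.9, 5.3, 12.5, 2.6, 8.1)
y <- c(8.4, 18.0, 3.2, 11.7, 14.9)

# Bind the two vectors together as the columns of a matrix
xy <- cbind(X = x, Y = y)

# Alter the row names
rownames(xy) <- LETTERS[1:5]
xy
```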
Lists
Lists provide a way to group together multiple objects of different type. For example, whilst the contents of any single vector or matrix must all be of the one type (e.g. all numeric or all character), a list can contain a vector of numerics and a matrix of characters. Furthermore, the objects contained in a list do not need to be of the same lengths (cf. data frames). The output of most analyses is stored as a list.
As an example, we could group together the previously created isolated vectors and matrices into a single object that encapsulates the entire experiment;
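A minimal sketch of such a list is given below. All of the component names and values are hypothetical stand-ins for the objects created earlier.

```r
# Hypothetical components of differing type and size
temperature <- c(36.1, 30.6, 31.0, 36.3, 39.9)
shade <- factor(c("no", "no", "full", "full", "full"))
coordinates <- cbind(X = c(16.9, 5.3, 12.5, 2.6, 8.1),
                     Y = c(8.4, 18.0, 3.2, 11.7, 14.9))

# Group them into a single object
experiment <- list(temperature = temperature,
                   shade = shade,
                   coordinates = coordinates)

names(experiment)         # the names of the components
experiment$temperature    # extract a single component by name
```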
Object Manipulation
Indexing
Indexing is the means by which data are filtered (subsetted) to include and exclude certain entries.
Vector indexing
Subsets of vectors are produced by appending an index vector (enclosed in square brackets []) to a vector name. There are four common forms of vector indexing used to extract a subset of vectors.
- Vector of positive integers - a set of integers that indicate which elements of the vector should be included;
- Vector of negative integers - a set of integers that indicate which elements of the vector should be excluded;
- Vector of character strings (referencing names) - for vectors whose elements have been named, a vector of names can be used to select elements to include;
- Vector of logical values - a vector of logical values (TRUE or FALSE) the same length as the vector being subsetted. Entries corresponding to a logical TRUE are included, FALSE are excluded;
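The four forms of vector indexing listed above can be sketched with a small named vector. The values and names are hypothetical.

```r
temperature <- c(Q1 = 36.1, Q2 = 30.6, Q3 = 31.0, Q4 = 36.3, Q5 = 39.9)

temperature[c(1, 3)]         # positive integers: include elements 1 and 3
temperature[-2]              # negative integers: exclude element 2
temperature[c("Q1", "Q4")]   # character strings: select by name
temperature[temperature > 35]  # logical values: keep entries greater than 35
```

In the last form, the comparison temperature > 35 itself produces the logical vector that does the selecting.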
Matrix indexing
Similar to vectors, matrices can be indexed using positive integers, negative integers, character strings and logical vectors. However, whereas vectors have a single dimension (length), matrices have two dimensions (length and width). Hence, indexing needs to reflect this. It is necessary to specify both the row and column number. Matrix indexing takes the form of [row.indices, col.indices] where row.indices and col.indices respectively represent sequences of row and column indices. If a row or column index sequence is omitted, it is interpreted as the entire row or column respectively.
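The [row.indices, col.indices] form can be sketched on a small matrix of hypothetical quadrat coordinates:

```r
xy <- cbind(X = c(16.9, 5.3, 12.5, 2.6, 8.1),
            Y = c(8.4, 18.0, 3.2, 11.7, 14.9))
rownames(xy) <- LETTERS[1:5]

xy[2, ]                 # second row, all columns
xy[, "Y"]               # all rows, the column named Y
xy[c(1, 3), 2]          # rows 1 and 3 of the second column
xy[-1, ]                # every row except the first
xy[xy[, "X"] > 10, ]    # rows in which X is greater than 10
```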
Sorting
The sort() function is used to sort vector entries in increasing order.
The order() function returns the indices that would arrange the entries of a vector in increasing order (the index of the smallest entry first).
The rank() function is used to get the ranking of each entry in a vector if it were sorted.
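The difference between the three functions is easiest to see on a small example (the values are arbitrary):

```r
v <- c(30, 10, 20, 50, 40)

sort(v)     # the values in increasing order: 10 20 30 40 50
order(v)    # the indices that sort v: 2 3 1 5 4
rank(v)     # the rank of each entry of v: 3 1 2 5 4

v[order(v)] # indexing by order() gives the same result as sort()
```

order() is particularly useful for sorting one vector according to the values of another.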
Pivot tables
The apply() function applies a function to the margins (1=row margins, 2=column margins) of a matrix.
Let's say we wanted to represent the abundance of three species of fish in three types of habitat;
We could now calculate the column means (mean abundance of each species across habitats);
The tapply() function applies a function to a vector separately for each level of a factorial variable. For example, if we wanted to calculate the mean temperature for each level of the shade variable;
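Both functions can be sketched as follows. The abundance values, habitat and species names, temperatures and shade levels are all invented for illustration.

```r
# Hypothetical abundances: habitats in rows, fish species in columns
fish <- matrix(c(5, 12,  3,
                 8,  2,  9,
                 4,  7, 11),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("Reef", "Seagrass", "Mangrove"),
                               c("Sp1", "Sp2", "Sp3")))

apply(fish, 2, mean)   # column means: mean abundance of each species
apply(fish, 1, sum)    # row sums: total abundance in each habitat

# Hypothetical temperatures for 10 quadrats, 5 per shade level
temperature <- c(36.1, 30.6, 31.0, 36.3, 39.9, 6.5, 11.2, 12.8, 9.7, 15.9)
shade <- gl(2, 5, labels = c("no", "full"))

# Mean temperature for each level of shade
shade_means <- tapply(temperature, shade, mean)
shade_means
```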
R Editors
Notepad++
Features
- Syntax highlighting - text colored according to syntax rules
- Code folding - mainly useful when writing functions
- Supports huge range of languages (not just R)
- Bracket matching
- Submit code to R line-by-line or selected lines (F8 key)
- Windows only
Installation and setup
- Download Notepad++ from here
- Select 'Download the current version'
- Select 'Notepad++ v5.9 Installer'
- Click 'Run' and when prompted verify the publisher
- Use all of the defaults while installing
- Similarly, download and install NppToR (acts as a conduit between Notepad++ and R) from here
- Start Notepad++ (and R) by right clicking on the NppToR icon in the Windows Task Tray and selecting 'Start Notepad++'
RStudio
Features
- Syntax highlighting - text colored according to syntax rules
- Specifically designed for R
- Bracket matching
- Console integrated
- Cross platform
- Submit (Run) code line-by-line or selected lines (Ctrl-ENTER)
- Submit (Run) all code (Ctrl-Shift-ENTER)
- Auto-complete (Ctrl-SPACE)
- Parameter prompting and integrated help (Ctrl-SPACE)
- Live workspace
- Live command history
- Fully integrated File manager
- Intuitive and user-friendly package manager
- Integrated R help browser
Installation and setup
- Download RStudio from here
- Select the installation package recommended for your system (e.g. 'RStudio Desktop 0.93.84 - Windows XP/Vista/7')
- Click 'Run' and when prompted verify the publisher
- Use all of the defaults while installing
- Start RStudio from the Windows Start Menu
HowTo's
- Start a new script
- File->New->Rscript
Emacs
Feature | Notepad++ | RStudio | Emacs |
---|---|---|---|
Platform | Win only | Win,Mac,Linux | Win,Mac,Linux |
Syntax Highlighting | Yes | Yes | Yes |
Bracket matching | Yes | Yes | |
Integrated Console | No | Yes | Yes |
Auto-complete | Yes | Yes | Yes |
Parameter prompting and integrated help | No | Yes | No |
Code folding | Yes | No | Yes |