ESSENTIALS OF R

1 Installing R

R download

2 Help in R:

There are several ways to access help in R.

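?function - opens the help page of a function, e.g. ?mean

help.search('keyword') - searches the help system for a keyword

find('mean') - gives the package where the function is located

apropos('lm') - gives all objects whose names contain the given string

ls() or objects() - list the objects in the current workspace

rm() - removes variables from the workspace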
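
The lookup and workspace functions listed above can be tried together. The following is a minimal sketch; the variable name tmp_value is hypothetical and used only for illustration.

```r
# Where is a function defined, and which objects share part of its name?
where_mean <- find("mean")      # search-path entries that define `mean`
lm_like    <- apropos("lm")     # names of objects containing "lm"
print(where_mean)
print(head(lm_like))

# Workspace housekeeping: create, list and remove a variable.
tmp_value <- 42                 # hypothetical variable, for illustration only
print("tmp_value" %in% ls())    # the new variable is listed
rm(tmp_value)
print("tmp_value" %in% ls())    # it is gone after rm()
```

Note that find() and apropos() take the name as a quoted string.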

R can do most basic mathematical operations because it has many built-in mathematical functions such as log, exp and factorial.
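
As a small illustration of the built-in mathematical functions, the sketch below collects a few results in a named vector. The input value 100 and the names of the vector elements are arbitrary choices made for this example.

```r
x <- 100                          # arbitrary input value for the example

results <- c(
  natural_log = log(x),           # log() is the natural logarithm
  log_base_10 = log10(x),         # base-10 logarithm
  exponential = exp(1),           # e raised to the power 1
  factorial_5 = factorial(5),     # 5! = 120
  square_root = sqrt(x)           # 10
)
print(results)
```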

3 Operator tokens:

Operator tokens are the symbols R uses for the basic operations on data: arithmetic, comparison, logic, model specification, assignment and indexing.


  +,-,*,/,%%,^          arithmetic
  >,>=,<,<=,==,!=       relational
  !,&,|                 logical
  ~                     model formulae
  <-,->,=               assignment
  $                     list indexing (the element name operator)
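
The operators in the table can be seen in action in the short sketch below. All variable names and the list person are hypothetical, chosen only for this example.

```r
remainder <- 7 %% 3                  # arithmetic: remainder of 7 divided by 3
power     <- 2 ^ 3                   # arithmetic: 2 to the power 3
is_larger <- power > remainder       # relational: compares two values
both_true <- is_larger & (remainder != 0)  # logical AND of two conditions

4 -> right_assigned                  # assignment can also point to the right

person <- list(name = "Ann", age = 30)  # hypothetical list
person_age <- person$age             # $ extracts an element by name

model <- y ~ x                       # ~ builds a model formula (not evaluated here)

print(c(remainder = remainder, power = power, right_assigned = right_assigned))
print(c(is_larger = is_larger, both_true = both_true))
print(person_age)
print(class(model))
```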

4 Data structures

There are several types of data structures in R. They are:

Vector - numeric, character, logical or complex; all elements share one type

Factor - derived from a vector (numeric or character); stores categories and shows their levels

Matrix - a vector with an additional attribute (dim)

Data Frame - a set of vectors of equal length, one per column

List - the most general kind; can hold elements of all types
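
One object of each kind can be created as follows. This is only a sketch; the object names and values are made up for illustration.

```r
v <- c(1.5, 2, 3.5)                        # numeric vector
f <- factor(c("low", "high", "low"))       # factor built from a character vector
m <- matrix(1:6, nrow = 2)                 # matrix: a vector with a dim attribute
d <- data.frame(id = 1:3, score = v)       # data frame: equal-length columns
l <- list(numbers = v, label = "demo", flag = TRUE)  # list: mixed types

print(levels(f))   # the distinct categories of the factor
print(dim(m))      # number of rows and columns
str(d)             # compact overview of the data frame
str(l)             # compact overview of the list
```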

5 Useful basic operations in R

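x=seq(2,20,3) ##creates a sequence from 2 to 20 in steps of 3
y=seq(1,19,along.with=x) ##sequence from 1 to 19 with the same length as x
z=seq(1,26,along.with=x)
pmin(x,y,z) ##gives the parallel (element-wise) minimum of several vectors of equal length
pmax(x,y,z) ##gives the parallel maximum
gl(4,3) ##generates factor levels: 4 levels, each repeated 3 times
sequence(5) ##same as 1:5
k=sequence(c(5,2,4)) ##joins 1:5, 1:2 and 1:4
sort(k)
price=rnorm(12,200,10) ##12 random values with mean 200 and standard deviation 10
ranks=rank(price)
sorted=sort(price)
ordered=order(price)
view=data.frame(price,ranks,sorted,ordered)
sample(y) ##shuffles the set of data
sample(y,replace=TRUE) ##samples with replacement, so values can repeat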
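
The difference between rank, sort and order is easy to confuse, so here is a small sketch of how the three relate. The call to set.seed is an assumption added only to make the random prices reproducible, and the check assumes there are no tied values.

```r
set.seed(1)                       # assumed seed, only for reproducibility
price   <- rnorm(12, 200, 10)

sorted  <- sort(price)            # the values themselves, smallest first
ordered <- order(price)           # the positions in price, smallest value first
ranks   <- rank(price)            # the rank of each value within price

# Indexing price by its order gives the sorted values.
print(identical(price[ordered], sorted))

# Indexing the sorted values by the ranks gives back the original values.
print(identical(sorted[ranks], price))
```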

6 Basic dataframe operations

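age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
village=data.frame(age,height) ##combines the two vectors into a data frame
remove(age,height) ##the columns now exist only inside village
village$age ##columns are accessed with the $ operator
village$height
attach(village) ##makes the columns available by name
plot(age,height)
res=lm(height~age) ##fits a linear regression of height on age
abline(res) ##adds the fitted line to the plot
detach(village)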
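
A common alternative to attach and detach is to pass the data frame directly to the function that needs it. The sketch below uses the same data as above; the data argument of lm and the with function are standard R, while the name mean_height is made up for this example.

```r
village <- data.frame(
  age    = 18:29,
  height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
)

# lm() looks up the variables of the formula inside `village`.
res <- lm(height ~ age, data = village)
print(coef(res))                  # intercept and slope of the fitted line

# with() evaluates an expression using the columns of the data frame.
mean_height <- with(village, mean(height))
print(mean_height)
```

This way no column names are left on the search path if the code stops before detach is reached.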

7 Loops

Loops are a major part of any programming language because they let us build complex operations from the available functions.

They also let us apply the same functions to several sets of data at a time.

for(i in 1:5) print(i^2)
j=k=0
for(i in 1:5){
  j=j+1
  k=k+i*j
  print(i+j+k)
}

Try to generate the following series (the Fibonacci numbers) using a loop:

1 1 2 3 5 8
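
One possible solution is sketched below. It assumes that at least three terms are wanted; the names n and fib are arbitrary.

```r
n   <- 6                 # number of terms to generate (assumed to be at least 3)
fib <- numeric(n)        # empty numeric vector of length n
fib[1:2] <- 1            # the series starts with 1 1

for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]   # each term is the sum of the previous two
}

print(fib)               # 1 1 2 3 5 8
```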

8 Disappearing graphics:

By default, each new plot in R overwrites the previous one in the graphics window. This is a problem when existing code that draws several graphs is run in one go, because only the last graph stays visible. Setting par(ask=TRUE) makes R wait for confirmation before it draws each new graph, so we can go through the graphs one at a time.

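par(ask=TRUE) ##asks for confirmation before each new plot is drawn
par("ylog") ##FALSE: the y axis is not logarithmic yet
plot(1:12, log = "y")
par("ylog") ##TRUE after a plot with a logarithmic y axis
plot(1:2, xaxs = "i") ##"i" fits the x axis exactly to the data range
par(c("usr", "xaxp")) ##queries several parameters at once
nr.prof = c(prof.pilots=16,lawyers=11,farmers=10,salesmen=9,physicians=9,mechanics=6,policemen=6,managers=6,engineers=5,teachers=4,housewives=3,students=3,armed.forces=1)
par(las = 3) ##draws all axis labels vertically
barplot(rbind(nr.prof))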
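
Settings changed with par stay in effect for the rest of the session. A minimal sketch of how the previous settings can be saved and restored is given below; it relies on par returning the old values of the parameters it changes, and the two plots are placeholders.

```r
old_par <- par(ask = TRUE)           # par() returns the previous setting invisibly

plot(1:10, main = "First plot")      # placeholder plots; in an interactive
plot(10:1, main = "Second plot")     # session R waits before drawing each one

par(old_par)                         # restore the setting saved above
```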

9 Pattern Matching

Pattern matching is an essential operation in statistics because we often need either to replace a particular piece of text or to get the position of a particular piece of text.

Functions such as gsub, sub and grep make this easy.

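text=c('arm','leg','head','foot','hand','hindleg','elbow')
gsub('h','H',text) ##replaces every 'h' with 'H'
sub('o','O',text) ##replaces only the first 'o' in each element
gsub('^.','O',text) ##replaces the first character of each element with 'O'
gsub('(\\w)(\\w*)','\\U\\1\\L\\2',text,perl=TRUE) ##capitalises the first letter of each word
grep('o',text) ##gives the positions of the elements that contain 'o'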
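
grep returns the positions of the matching elements. The sketch below shows a few related calls that return the matching elements themselves, a logical vector, or the position of the match inside each string. The variable names are chosen for this example only.

```r
text <- c("arm", "leg", "head", "foot", "hand", "hindleg", "elbow")

positions <- grep("o", text)                 # which elements contain "o": 4 7
matches   <- grep("o", text, value = TRUE)   # the elements themselves
starts_h  <- grepl("^h", text)               # TRUE where the element starts with "h"
first_o   <- as.integer(regexpr("o", text))  # position of the first "o", -1 if none

print(positions)
print(matches)
print(text[starts_h])
print(first_o)
```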

10 Testing and Coercing

Testing functions are always of the form is.type and return TRUE or FALSE.

Ex: is.array, is.character, ...

Coercing functions change an object from one type to another and are of the form as.type.

Ex: as.array, as.character, ...
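
A short sketch of testing and coercing together, using made-up values:

```r
x <- c("1", "2", "3")             # numbers stored as text

print(is.character(x))            # TRUE
print(is.numeric(x))              # FALSE

y <- as.numeric(x)                # coerce the text to numbers
print(is.numeric(y))              # TRUE
total <- sum(y)                   # arithmetic now works: 6
print(total)

back  <- as.character(y)          # and back to text again
flags <- as.logical(c(0, 1, 2))   # zero becomes FALSE, other numbers TRUE
print(back)
print(flags)
```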

11 Error bars

Most comparative data are shown as bar plots, and error bars are essential in most of these cases. One major inconvenience of R is that its base version does not have a dedicated function for error bars, so the function has to be written by hand using a loop inside a function.

Here is a data set from the book to try the error bar function (link):

  biomass clipping
  551 n25
  457 n25
  450 n25
  731 n25
  499 n25
  632 n25
  595 n50
  580 n50
  508 n50
  583 n50
  633 n50
  517 n50
  639 r5
  615 r5
  511 r5
  573 r5
  648 r5
  677 r5
  417 control
  449 control
  517 control
  438 control
  415 control
  555 control
  563 r10
  631 r10
  522 r10
  613 r10
  656 r10
  679 r10

Here is the function:

error.bars=function(yv,z,nn){
  ##yv: bar heights, z: lengths of the error bars, nn: bar labels
  xv=barplot(yv,ylim=c(0,(max(yv)+max(z))),names=nn,ylab=deparse(substitute(yv)))
  g=(max(xv)-min(xv))/50 ##half-width of the caps on the error bars
  for (i in 1:length(xv)) {
    lines(c(xv[i],xv[i]),c(yv[i]+z[i],yv[i]-z[i])) ##vertical bar
    lines(c(xv[i]-g,xv[i]+g),c(yv[i]+z[i], yv[i]+z[i])) ##top cap
    lines(c(xv[i]-g,xv[i]+g),c(yv[i]-z[i], yv[i]-z[i])) ##bottom cap
  }
}

Here is how the function can be used:

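comp<-read.table(yourfile,header=TRUE,stringsAsFactors=TRUE) ##yourfile is the path to the data above; clipping is read as a factor
attach(comp)
names(comp)
se<-rep(28.75,5) ##the same standard error for each of the five treatments
labels<-as.character(levels(clipping))
ybar<-as.vector(tapply(biomass,clipping,mean)) ##mean biomass per treatment
error.bars(ybar,se,labels)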
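
The usage above gives every bar the same standard error. As an alternative, the sketch below computes a separate standard error for each treatment from the data listed earlier. This is an assumption made for illustration, not the method used in the book; the data are typed in directly so that no file is needed, and the name se_group is made up.

```r
# Biomass values in the order of the table: n25, n50, r5, control, r10.
biomass <- c(551, 457, 450, 731, 499, 632,
             595, 580, 508, 583, 633, 517,
             639, 615, 511, 573, 648, 677,
             417, 449, 517, 438, 415, 555,
             563, 631, 522, 613, 656, 679)
clipping <- factor(rep(c("n25", "n50", "r5", "control", "r10"), each = 6))

# Mean biomass per treatment.
ybar <- tapply(biomass, clipping, mean)

# Standard error per treatment: standard deviation divided by sqrt(n).
se_group <- tapply(biomass, clipping, function(v) sd(v) / sqrt(length(v)))

print(round(ybar, 1))
print(round(se_group, 1))

# With error.bars defined as above, the plot could then be drawn with:
# error.bars(as.vector(ybar), as.vector(se_group), levels(clipping))
```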

Author: SRIHARI <shrihari at src>

Date: 2009-07-03 13:58:31 CDT
