Session 5

Download the file worms.txt from http://www.bio.ic.ac.uk/research/mjcraw/therbook/data

Table of Contents

  • 1 Assignment
  • 1 Data Input

    1.1 Scan Function

    x<-scan() #type a few numbers into a vector from the keyboard; press Enter on an empty line to finish

    x
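scan can also read its input from a character string, which is convenient in a script that runs without a keyboard. A minimal sketch using the text argument of scan; the numbers are made up for illustration:

```r
# scan() normally waits for keyboard input; the text argument supplies
# the input as a string instead (the values are illustrative only)
x <- scan(text = "3 5 8 13", quiet = TRUE)
x          # a numeric vector
length(x)  # number of values read
```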

    1.2 Data Input From Files

    In Excel, save your file as a tab-delimited text file (*.txt) to open it with the read.table function, or as a comma-delimited file (*.csv) to open it with the read.csv function
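The two file types can be tried without Excel. The sketch below writes a small made-up dataframe (toy, with hypothetical columns site and count) to temporary files in both formats and reads each one back:

```r
# Hypothetical toy data written to temporary files, then read back
toy <- data.frame(site = c("A", "B", "C"), count = c(4, 7, 1))
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

write.table(toy, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)  # tab-delimited
write.csv(toy, csv_file, row.names = FALSE)                               # comma-delimited

from_txt <- read.table(txt_file, header = TRUE, sep = "\t")
from_csv <- read.csv(csv_file)   # header = TRUE is the default for read.csv
all.equal(from_txt, from_csv)    # both routes should give the same dataframe
```

Note that read.table needs header = TRUE to treat the first line as column names, while read.csv assumes it.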

    1.3 Setting the Working Directory

    setwd("c:\\temp")

    To find out the name of the current working directory, call getwd()

    1.4 Checking Files from the Command Line

    file.exists("c:\\temp\\Decay.txt") #it can be useful to check whether a given filename exists in the path where you think it should be
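The check can be tried on any machine with a temporary file. A small sketch; tmp is a hypothetical path created only for this example:

```r
tmp <- tempfile(fileext = ".txt")   # a path that does not exist yet
before <- file.exists(tmp)          # FALSE: nothing has been written
writeLines("x", tmp)                # create the file
after <- file.exists(tmp)           # TRUE: the file is now there
c(before = before, after = after)

# file.path builds a path from its parts; forward slashes also work on Windows
file.path(getwd(), "worms.txt")
```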

    2 Dataframes

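    worms <- read.table("c:\\temp\\worms.txt", header=T, stringsAsFactors=TRUE) ##stringsAsFactors=TRUE keeps text columns as factors (no longer the default since R 4.0)
    names(worms)
    head(worms)
    summary(worms) ##summary statistics for every column (quartiles and mean for continuous variables)
    num<-sapply(worms,is.numeric) ##TRUE for the numeric columns
    by(worms[,num],worms$Vegetation,colMeans) ##means of the numeric columns for each level of Vegetation
    aggregate(worms[,num],by=list(Vegetation=worms$Vegetation),mean) ##the same summary returned as a dataframe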
    

    2.1 Subscripts and Indices

    worms[3,5]

    worms[14:19,7]

    worms[sample(1:20,8),] ##select a unique 8 of the 20 rows at random

    worms[order(worms$Slope),]

    worms[rev(order(worms$Slope)),]

    worms[order(worms$Vegetation, worms$Worm.density),]

    worms[order(worms$Vegetation,-worms$Worm.density),]

    unique(worms$Vegetation)

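    worms[!duplicated(worms$Vegetation),] ##first row for each vegetation type (using unique(worms$Vegetation) as the row index does not select by vegetation type)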

    Extract all records from worms with the greatest value of pH for each unique vegetation

    2.2 Using Logical Conditions to Select Rows from the Dataframe

    1. Select all data from damp fields
    2. Show the data where Worm.density was maximum
    3. Show the data where pH is larger than the median value of pH and the Slope is larger than 4
    4. Show the data for Vegetation Scrub
    5. Show all data except for Vegetation Scrub
    6. Show all numeric data from worms table
    7. Show all factor data from worms table
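The patterns needed for these tasks can be sketched on a made-up dataframe. The object fields and its columns name, type, ph and wet are hypothetical and are not the worms data:

```r
# Hypothetical toy dataframe (not the worms data)
fields <- data.frame(
  name = c("f1", "f2", "f3", "f4"),
  type = factor(c("Meadow", "Scrub", "Meadow", "Arable")),
  ph   = c(4.1, 5.2, 4.9, 4.3),
  wet  = c(TRUE, FALSE, FALSE, TRUE)
)

fields[fields$wet, ]                     # rows where a logical column is TRUE
fields[fields$ph == max(fields$ph), ]    # row(s) holding the maximum of a column
fields[fields$type != "Scrub", ]         # all rows except one factor level

# two conditions combined with & (both must hold)
both <- fields[fields$ph > median(fields$ph) & fields$type != "Scrub", ]
both

fields[, sapply(fields, is.numeric), drop = FALSE]   # numeric columns only
fields[, sapply(fields, is.factor), drop = FALSE]    # factor columns only
```

A logical vector in the row position keeps the rows where it is TRUE; the same idea applied through sapply in the column position selects columns by type.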

    2.3 Omitting Rows Containing Missing Values, NA

    mis<-read.table('c:\\temp\\worms.missing.txt',header=T)

    head(mis)

    na.omit(mis) ##drops every row that contains an NA; na.exclude(mis) selects the same rows

    mis[is.na(mis)] <- 0 ##replace all NAs with 0
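Before dropping or replacing missing values it can help to see where they are. A short sketch on a made-up dataframe (toy is hypothetical, not the worms.missing data):

```r
# Hypothetical toy data with one NA in each column
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

complete.cases(toy)          # TRUE for rows without any NA
kept <- na.omit(toy)         # keeps only the complete rows
kept
colSums(is.na(toy))          # number of NAs in each column
mean(toy$a, na.rm = TRUE)    # skip NAs inside a calculation instead of editing the data
```

Replacing NA with 0 changes every later mean or sum, so na.rm = TRUE is often the safer choice when the zeros would not be real measurements.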

    2.4 A Dataframe with Row Names instead of Row Numbers

    Make the first column (the field names) the row names of the dataframe worms:

    rownames(worms)<-worms[,1];worms<-worms[,-1]

    or set the row names when reading the file:

    worms<-read.table("c:\\temp\\worms.txt", header=T, row.names=1)

    2.5 Creating dataframe from another kind of object

    x<-runif(10)

    y<-letters[1:10]

    z<-sample(c(rep(T,5),rep(F,5)))

    new<-data.frame(x,y,z)

    2.6 Eliminating duplicate rows from the dataframe

    dups<-read.table("c:\\temp\\dups.txt", header=T)

    unique(dups)

    dups[duplicated(dups),]
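How unique and duplicated relate can be seen on a made-up dataframe (toy is hypothetical, not the dups.txt data):

```r
# Hypothetical toy data: rows 3 and 5 repeat rows 2 and 1
toy <- data.frame(id = c(1, 2, 2, 3, 1), grp = c("a", "b", "b", "c", "a"))

dup <- duplicated(toy)     # FALSE for first occurrences, TRUE for repeats
dup
toy[dup, ]                 # only the repeated rows
toy[!dup, ]                # the same rows that unique(toy) returns
```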

    2.7 Dates in Dataframes

    nums<-read.table("c:\\temp\\sortdata.txt",header=T)

    nums[order(nums$date),]

    dates<-strptime(nums$date,format ="%d/%m/%Y");dates

    nums<-cbind(nums,dates);head(nums)

    nums[order(as.character(dates)),1:4]
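Sorting on the raw date column orders the dates as text, which is why the conversion is needed. A minimal sketch with made-up dates; it uses as.Date rather than strptime, which is an alternative choice and not the method used above:

```r
# Hypothetical dates in day/month/year form
d <- c("25/08/2003", "21/02/2004", "31/05/2003")

sort(d)                                    # text order: compares the day digits first

parsed <- as.Date(d, format = "%d/%m/%Y")  # convert text to Date objects
chrono <- d[order(parsed)]                 # chronological order
chrono
```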

    3 Tables

    3.1 Summary Tables

    data<-read.table("c:\\temp\\daphnia.txt",header=T,stringsAsFactors=TRUE);head(data) ##stringsAsFactors=TRUE is needed since R 4.0 for the levels() calls below

    1. Calculate mean value of Growth rate for each detergent
    2. Calculate median value of Growth rate for each river (Water)
    3. Calculate sum of Growth rate for each clone (Daphnia)
    4. To calculate the mean of Growth rate for each clone in each Detergent, use the list function:

    tapply(data$Growth.rate,list(data$Daphnia, data$Detergent),mean)

    1. Calculate the standard deviation, using the sd function, for each clone in each detergent:
    2. Three-dimensional tables:

    tapply(data$Growth.rate,list(data$Daphnia, data$Detergent,data$Water),sd)

    tapply gives a stack of two-dimensional tables; use the ftable function to flatten them into one table:

    ftable(tapply(data$Growth.rate,list(data$Daphnia, data$Detergent,data$Water),sd))

    1. When the table contains missing values (NAs), you can pass an extra argument to tapply: na.rm=T
    2. Calculate the mean values of Soil.pH per Vegetation for the dataframe mis. What is the value for Grassland?
    3. You can use the tapply function to extract the levels for each calculated mean and create a dataframe containing 3 columns: mean, detergent, and daphnia:

    First you need to convert your factor to numeric: as.numeric(data$Detergent)

    tapply(as.numeric(data$Detergent),list(data$Detergent, data$Daphnia),mean)

    dets<-as.vector(tapply(as.numeric(data$Detergent),list(data$Detergent, data$Daphnia),mean))

    detergent<-levels(data$Detergent)[dets]

    clones<-as.vector(tapply(as.numeric(data$Daphnia),list(data$Detergent, data$Daphnia),mean))

    daphnia<-levels(data$Daphnia)[clones]

    means<-as.vector(tapply(data$Growth.rate,list(data$Detergent, data$Daphnia),mean))

    data.frame(means, detergent, daphnia)

    1. The same result can be obtained using as.data.frame.table function:

    as.data.frame.table(tapply(data$Growth.rate,list(data$Detergent, data$Daphnia),mean))

    1. Edit the names of the columns.
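The na.rm argument and the renaming step can both be sketched on a made-up dataframe. The object toy and its columns grp, trt and val are hypothetical and are not the daphnia data:

```r
# Hypothetical measurements with one missing value
toy <- data.frame(
  grp = factor(c("a", "a", "b", "b")),
  trt = factor(c("x", "y", "x", "y")),
  val = c(1, 3, NA, 5)
)

tapply(toy$val, toy$grp, mean)                       # group b gives NA
clean <- tapply(toy$val, toy$grp, mean, na.rm = TRUE)  # NA dropped before averaging
clean

long <- as.data.frame.table(tapply(toy$val, list(toy$grp, toy$trt), mean))
names(long)                                # default names: Var1, Var2, Freq
names(long) <- c("grp", "trt", "mean")     # replace them with meaningful names
long
```

Extra arguments given to tapply after the function name, such as na.rm = TRUE, are passed on to that function.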

    3.2 Tables of Counts

    1. Expanding a table into a dataframe:

    count.table<-read.table("c:\\temp\\tabledata.txt", header=T)

    Create a dataframe with a separate row for each case. Use the lapply function to apply the rep function to each variable in count.table, so that each row is repeated the number of times specified in the count column

    lapply(count.table, function(x)rep(x,count.table$count))

    Then convert this object from a list to a dataframe using as.data.frame:

    dbtable<-as.data.frame(lapply(count.table, function(x)rep(x,count.table$count)))

    Now remove the count column:

    1. Converting from a dataframe to a table

    table(dbtable)

    frame<-as.data.frame(table(dbtable))

    Rename the last column to 'count'
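The round trip from counts to cases and back can be sketched on a made-up table. The object ct and its columns sex, colour and count are hypothetical, and the expansion here repeats row indices instead of using lapply, which is an alternative to the approach above:

```r
# Hypothetical count table: one row per combination plus a count column
ct <- data.frame(sex = c("f", "m"), colour = c("red", "blue"), count = c(2, 3))

# repeat each row index count times and keep every column except count
cases <- ct[rep(seq_len(nrow(ct)), ct$count), c("sex", "colour")]
nrow(cases)                            # one row per case

back <- as.data.frame(table(cases))    # tabulate the cases again
names(back)[ncol(back)] <- "count"     # the last column is called Freq by default
back[back$count > 0, ]                 # table() also lists combinations that never occur
```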

    1. Calculating tables of proportion.

    counts<-matrix(c(2,2,4,3,1,4,2,0,1,5,3,3),nrow=4)

    Calculate proportions as fractions of the row totals:

    a<-prop.table(counts,1)

    rowSums(a)

    Calculate proportions as fractions of the column totals:

    b<-prop.table(counts,2)

    colSums(b)

    Calculate proportions as fractions of the grand total:

    c<-prop.table(counts)

    sum(c)
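What prop.table computes can be checked by doing the division by hand. A short sketch using the same counts matrix; by_row and by_col are names chosen for this example:

```r
counts <- matrix(c(2,2,4,3,1,4,2,0,1,5,3,3), nrow = 4)

by_row <- counts / rowSums(counts)                 # each row divided by its row total
by_col <- sweep(counts, 2, colSums(counts), "/")   # each column divided by its column total

all.equal(by_row, prop.table(counts, 1))           # same result as prop.table with margin 1
all.equal(by_col, prop.table(counts, 2))           # same result as prop.table with margin 2

round(100 * by_row, 1)                             # proportions expressed as percentages
```

Multiplying a table of proportions by 100 gives percentages, which is the form usually wanted for relative abundance.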

    1. The expand.grid function

    This function generates a table of all combinations of the factor levels:

    expand.grid(height=seq(60,80,5), weight=seq(100,300,50), sex=c('Male','Female'))
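The size of the result is the product of the numbers of levels. A quick check on the same call; grid is a name chosen for this example:

```r
grid <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
                    sex = c("Male", "Female"))

nrow(grid)      # 5 heights x 5 weights x 2 sexes
head(grid, 3)   # the first variable varies fastest
```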

    1 Assignment

    Use the file species to calculate the relative percent abundance of each species at each site.

    Author: ELVIRA <kurmaevaer@titan.sfasu.edu>

    Date: 2009-07-10 15:12:00 CDT
