Getting Started

If you are using this code with the video, note that some slides have been added post-recording and are not shown. See 4.06.walkthrough.R in the compressed data tarball for the code from the recording.

For this tutorial we will be working with data from the Stevison lab, the “Exp5_rawdata.csv” dataset. Make an R Notebook for this walk-through tutorial to save all the code you will be learning. We will cover:

- functions
- if-else statements

Set up workspace

You are working within an R project (check in the top right corner of RStudio - you should see the project name “R_Mini_Course”).
This means that the project directory will also be set as the working directory. The exception is in an R Notebook, where the working directory is the folder where the R Notebook is saved.
You should be saving your notebooks in the R Project and using ../.. to point at the main project directory.
Before starting, it may be helpful to have a chunk of code that does the following:
- clear your workspace: rm(list=ls())
- load your packages: library(<package-name>)
- check your session information: sessionInfo()
- list files in your working directory: list.files(getwd())
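The checklist above could be collected into a single setup chunk at the top of the notebook. This is only a sketch: stats stands in for whichever packages your analysis actually needs, and the object names session_details and files_here are illustrative.

```r
# Clear the workspace so objects from earlier sessions do not linger
rm(list = ls())

# Load packages; 'stats' is a placeholder that ships with every R installation
library(stats)

# Record the R version and attached packages for reproducibility
session_details <- sessionInfo()
print(session_details$R.version$version.string)

# List the files R can see from the current working directory
files_here <- list.files(getwd())
print(files_here)
```

If the file list does not look like your project folder, check where the notebook is saved before reading in any data.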

Read in datasets

Let’s use some data from my lab! These are raw data from an experiment where we measured the impact of temperature on meiotic recombination rate.

We recorded phenotype as 0’s or 1’s, so we will use R to categorize the data into crossover classes.

You will need to add path information to the raw_data directory once you have uncompressed the data tarball.

Read “Exp5_rawdata.csv” into an object called exp5:

#exp5 <- read.csv(file="Exp5_rawdata.csv", header = TRUE)
#View(exp5)
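As a sketch of what adding path information might look like, the chunk below builds the path with file.path() and checks that the file exists before reading it. The folder name "raw_data" relative to the working directory is an assumption; adjust raw_dir to wherever you uncompressed the tarball.

```r
# Assumption: the uncompressed tarball created a folder named "raw_data"
# inside the current working directory; adjust raw_dir to match your setup.
raw_dir <- "raw_data"
csv_path <- file.path(raw_dir, "Exp5_rawdata.csv")

if (file.exists(csv_path)) {
  exp5 <- read.csv(file = csv_path, header = TRUE)
  str(exp5)  # confirm the column names and types
} else {
  # Report where R looked, which usually reveals a working-directory mix-up
  message("Could not find ", csv_path, "; working directory is ", getwd())
}
```

Checking file.exists() first tends to give a clearer message than the error read.csv() raises on a missing file.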

You may also read in the previously made object:

exp5 <- readRDS(file = "data/Exp5_rawdata.rds")
head(exp5, 3)

##     Date ViaNumber Day numbMales ct sd y se Initials Notes
## 1 26-Oct        14   V         1  0  0 1  1      UHA      
## 2 26-Oct        14   V         2  1  1 1  1      UHA      
## 3 26-Oct        13   V         2  0  1 1  1      UHA

How can we use `ifelse`?

An ifelse statement can help us categorize these values, similar to how we used cut to break the quake magnitudes into multiple quantiles in module 3.

The basic syntax is:

ifelse(<condition>,<yes>,<no>)

Default Usage

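Let’s create a vector, x, and determine whether each element is greater than 5. Note that the labels are slightly loose: any element that is not greater than 5, including 5 itself, falls into the “no” branch and is labeled “lessthan5”.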

x <- c(0.5,0.75,1.1,1.9,1,2,3,4,5,6,7,8,9)

ifelse(x > 5, "greaterthan5", "lessthan5")

##  [1] "lessthan5"    "lessthan5"    "lessthan5"    "lessthan5"    "lessthan5"    "lessthan5"   
##  [7] "lessthan5"    "lessthan5"    "lessthan5"    "greaterthan5" "greaterthan5" "greaterthan5"
## [13] "greaterthan5"

For each element in vector x, R returned the statement we requested when the test was true or false.
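Two edge cases are worth seeing on a small made-up vector: a value exactly equal to the cutoff, and a missing value. The names x_demo and labels_demo are illustrative only.

```r
# A value below the cutoff, one equal to it, one above it, and a missing value
x_demo <- c(2, 5, 8, NA)

# The test is evaluated element by element
labels_demo <- ifelse(x_demo > 5, "greaterthan5", "notgreaterthan5")
print(labels_demo)

# 5 is not greater than 5, so it takes the "no" label.
# The NA stays NA because its test (NA > 5) is itself NA.
```

This matters for real data: a missing measurement is not silently pushed into either category.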

On your own, set up an ifelse conditional that subtracts 1 when x is greater than 5, and adds 1 when x is less than 5. Below, I have recreated the vector x by randomly sampling 5 values between 1 and 20:

x <- c(sample(1:20, size = 5))

#ifelse()
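If you want to compare your attempt against something, here is one possible answer. It uses a fixed vector, x_fixed (a made-up stand-in for the random x), so the result is the same every time it is run.

```r
# Fixed stand-in for the randomly sampled x, so the result is reproducible
x_fixed <- c(3, 5, 12, 18, 1)

# Subtract 1 when the value is greater than 5, otherwise add 1
shifted <- ifelse(x_fixed > 5, x_fixed - 1, x_fixed + 1)
print(shifted)
```

With a single ifelse, a value of exactly 5 lands in the “no” branch and gets 1 added. Leaving 5 unchanged would need a third outcome.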

Nested `ifelse` statements

Sometimes, we may want to have more than a binary categorization. We can use a nested ifelse statement to do this. The more nests, the more categories we will create.

ifelse(x>5, "greaterthan5", ifelse(x<2, "lessthan2", "between 5 and 2"))

## [1] "lessthan2"    "greaterthan5" "greaterthan5" "greaterthan5" "greaterthan5"

ifelse(x<=2, ifelse(x<1, "Low","Medium"), "High")

## [1] "Medium" "High"   "High"   "High"   "High"

Both lines include a single nested ifelse condition. What is the difference between them?

Both nests give you three categories, but are specified slightly differently.

In the first example, the additional ifelse statement runs when the first conditional is false. In the second example, the additional ifelse statement runs when the first conditional is true.
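Because x was sampled at random, your output may differ from what is printed here. The sketch below runs the same two nests on a fixed, made-up vector (x_fixed) so the position of the inner ifelse is easier to trace.

```r
# Fixed values chosen to hit every category in both nests
x_fixed <- c(0.5, 1.5, 3, 7)

# Inner ifelse sits in the "no" slot: it only runs when x_fixed > 5 is FALSE
nest_in_no <- ifelse(x_fixed > 5, "greaterthan5",
                     ifelse(x_fixed < 2, "lessthan2", "between 5 and 2"))

# Inner ifelse sits in the "yes" slot: it only runs when x_fixed <= 2 is TRUE
nest_in_yes <- ifelse(x_fixed <= 2,
                      ifelse(x_fixed < 1, "Low", "Medium"),
                      "High")

# Show the inputs next to both sets of labels
print(data.frame(x_fixed, nest_in_no, nest_in_yes))
```

The two nests use different cutoffs, so they are not interchangeable. Only the placement of the second ifelse is being compared.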

Now that we’ve converted a continuous numeric variable (i.e., the contents of x) into a categorical variable (i.e., “High”, “Medium”, “Low”), there are all sorts of things we can do, such as make box plots within each category.

But first, let’s return to our recombination data:

Let’s use this notation to define crossover classes, which will require several nested ifelse statements:

exp5$co_class <- ifelse(exp5$sd==exp5$y & exp5$y==exp5$se,"non_CO", 
                     ifelse(exp5$sd!=exp5$y & exp5$y==exp5$se,"single_CO_1",
                            ifelse(exp5$sd==exp5$y & exp5$y!=exp5$se,"single_CO_2",
                                   ifelse(exp5$sd!=exp5$y & exp5$y!=exp5$se,"double_CO",
                                          "error"))))

# Check the levels of our new co_class factor (category). 
levels(as.factor(exp5$co_class))

## [1] "double_CO"   "non_CO"      "single_CO_1" "single_CO_2"

We can see from the levels present in the column that no errors were generated!

What other syntax could you use to determine if any of the entries were categorized as “error” (hint: think about the subset functions)?

Answer

exp5[exp5$co_class=="error",]

##  [1] Date      ViaNumber Day       numbMales ct        sd        y         se        Initials 
## [10] Notes     co_class 
## <0 rows> (or 0-length row.names)
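One caveat worth checking: ifelse returns NA wherever its test is NA, so a row with a missing marker score would come out as NA rather than “error”. The sketch below uses a made-up data frame, toy_co, whose values are hypothetical and not taken from exp5, to show one way to look for both cases at once.

```r
# Hypothetical marker scores; the last row has a missing value in sd
toy_co <- data.frame(sd = c(0, 1, 0, 1, NA),
                     y  = c(0, 0, 0, 0, 1),
                     se = c(0, 0, 1, 1, 1))

# Same nested classification as used for exp5
toy_co$co_class <- ifelse(toy_co$sd == toy_co$y & toy_co$y == toy_co$se, "non_CO",
                   ifelse(toy_co$sd != toy_co$y & toy_co$y == toy_co$se, "single_CO_1",
                   ifelse(toy_co$sd == toy_co$y & toy_co$y != toy_co$se, "single_CO_2",
                   ifelse(toy_co$sd != toy_co$y & toy_co$y != toy_co$se, "double_CO",
                          "error"))))

# %in% never returns NA, so it is a safe test for the "error" label
has_error <- "error" %in% toy_co$co_class

# Count rows that could not be classified because of missing data
n_missing <- sum(is.na(toy_co$co_class))

# useNA = "ifany" adds an NA column to the counts when needed
print(table(toy_co$co_class, useNA = "ifany"))
```

levels() drops NA values, so a table with useNA = "ifany" gives a fuller picture than the levels alone.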

Let’s repeat, but define crossovers numerically. Here non-crossovers will count as zero crossovers, single crossovers will count as 1 CO, and double crossovers will count as 2 COs.

Since we only focused on males in our experiment, we will multiply by the male count to get the number of COs per row of data.

exp5$num_co <- ifelse(exp5$y==exp5$sd & exp5$y==exp5$se,0, 
                   ifelse(exp5$sd==exp5$y & exp5$y!=exp5$se,1*exp5$numbMales, 
                          ifelse(exp5$sd!=exp5$y & exp5$y==exp5$se,1*exp5$numbMales,  
                                 ifelse(exp5$sd!=exp5$y & exp5$y!=exp5$se,2*exp5$numbMales, 
                                        NA))))
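As a cross-check, the same count can be reached arithmetically, because each marker switch (sd to y, and y to se) contributes one crossover. This is an alternative to the nested ifelse above, not the approach the tutorial uses. The data frame toy_num and its values are made up for illustration.

```r
# Hypothetical rows covering all four crossover classes
toy_num <- data.frame(sd = c(0, 1, 0, 1),
                      y  = c(0, 0, 0, 0),
                      se = c(0, 0, 1, 1),
                      numbMales = c(3, 2, 4, 1))

# Nested ifelse, following the same logic as for exp5
nested <- ifelse(toy_num$y == toy_num$sd & toy_num$y == toy_num$se, 0,
          ifelse(toy_num$sd == toy_num$y & toy_num$y != toy_num$se, 1 * toy_num$numbMales,
          ifelse(toy_num$sd != toy_num$y & toy_num$y == toy_num$se, 1 * toy_num$numbMales,
          ifelse(toy_num$sd != toy_num$y & toy_num$y != toy_num$se, 2 * toy_num$numbMales,
                 NA))))

# Logical comparisons count as 0 or 1 when added together
switches <- (toy_num$sd != toy_num$y) + (toy_num$y != toy_num$se)
arithmetic <- switches * toy_num$numbMales

# Both approaches should agree row by row
print(data.frame(nested, arithmetic))
```

Getting the same answer two ways is a cheap test that the nested conditions say what you intended.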

Functions

Now we would like to calculate the percentage of crossovers, per vial, to obtain the recombination frequency. This seems like something we would want to do frequently, so let’s create a function!

The basic syntax of a function is:

<function-name> <- function(<input>) {

<function actions>

return(<output>)

}

In the example below, we store the new function in an object called percentage. We use the function, function, to create our custom function.

Within our function, we perform a simple calculation in a code block. We include a call to return() to specify the value the function hands back to the user.

percentage <- function(x,y) {
  p=(x/y)*100
  return(p)  
}
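Before pointing a new function at real data, it can help to try it on numbers where the answer is obvious. A small sketch, with made-up inputs:

```r
# Same definition as in the tutorial, repeated so this chunk runs on its own
percentage <- function(x, y) {
  p <- (x / y) * 100
  return(p)
}

# 1 out of 4 should be a quarter
quarter <- percentage(1, 4)
print(quarter)

# Arithmetic in R is vectorized, so the function also accepts vectors
by_pair <- percentage(c(1, 3), c(4, 4))
print(by_pair)
```

Because the function works on whole vectors, it can be applied to a column of counts without a loop.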

Using our function

To use our function, let’s calculate the total number of crossovers and the total number of individuals.

a=sum(exp5$num_co)
b=sum(exp5$numbMales)

Now, let’s use the function to calculate recombination frequency

percentage(a,b)

## [1] 77.87403

Here is another way to use the function that is more streamlined, but has lower readability.

percent=percentage(sum(exp5$num_co),sum(exp5$numbMales))
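The calls above pool every row into one overall frequency. To get a value per vial instead, one option is to total the counts within each vial with tapply() and then pass those totals to percentage. This is a sketch on made-up numbers: toy_vials is hypothetical, though its column names follow exp5.

```r
# Same definition as in the tutorial, repeated so this chunk runs on its own
percentage <- function(x, y) {
  p <- (x / y) * 100
  return(p)
}

# Hypothetical counts for two vials, two rows each
toy_vials <- data.frame(ViaNumber = c(13, 13, 14, 14),
                        numbMales = c(2, 2, 1, 3),
                        num_co    = c(0, 2, 1, 0))

# Sum crossovers and males separately within each vial
co_by_vial    <- tapply(toy_vials$num_co, toy_vials$ViaNumber, sum)
males_by_vial <- tapply(toy_vials$numbMales, toy_vials$ViaNumber, sum)

# One recombination frequency per vial
rf_by_vial <- percentage(co_by_vial, males_by_vial)
print(rf_by_vial)
```

Summing within each vial before dividing weights every individual equally, which is not the same as averaging row-level percentages.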

Make your own function

Every time you type name(), you are calling a function (e.g., sum()) that someone wrote so the same steps can be repeated reliably! Learn more about the functions you currently use by typing ?function-name, without the (), in your console.

Any time you begin to copy-paste a code block or generate a for-loop, you could write a function!

Based on your own work or interests, think of a calculation, conversion, or rearrangement you do commonly and create a function to do it for you.
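As one illustration, the crossover classification from earlier in this tutorial was written out in full, which makes it a reasonable candidate for a function. The sketch below is only one way to do it: the function name classify_co and its argument names are hypothetical, and the test values are made up.

```r
# Classify crossover class from three marker scores (vectors of equal length)
classify_co <- function(sd_marker, y_marker, se_marker) {
  # TRUE where the score changes between neighboring markers
  first_switch  <- sd_marker != y_marker
  second_switch <- y_marker != se_marker

  co_class <- ifelse(!first_switch & !second_switch, "non_CO",
              ifelse(first_switch & !second_switch, "single_CO_1",
              ifelse(!first_switch & second_switch, "single_CO_2",
                     "double_CO")))
  return(co_class)
}

# Try it on one made-up row per class
toy_classes <- classify_co(sd_marker = c(0, 1, 0, 1),
                           y_marker  = c(0, 0, 0, 0),
                           se_marker = c(0, 0, 1, 1))
print(toy_classes)
```

With the logic in one place, a change to the class definitions only has to be made once.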
