Laurie Stevison & Amanda Clark
If you are using this code with the video, note that some slides have been added post-recording and are not shown. See 4.06.walkthrough.R in the compressed data tarball for the code from the recording.
For this tutorial we will be working with data from the Stevison lab, the “Exp5_rawdata.csv” dataset. Make an R Notebook for this walk-through tutorial to save all the code you will be learning. We will cover ifelse statements, nested ifelse statements, and writing your own functions.
You are working within an R project (check the top right corner of RStudio; you should see the project name “R_Mini_Course”).
This means that the project directory is also set as the working directory. The exception is an R Notebook, where the working directory is the folder in which the notebook is saved.
You should be saving your notebooks in the R Project and using ../.. to point at the main project directory.
Before starting, it may be helpful to have a chunk of code that does the following:
rm(list=ls())
library(<package-name>)
sessionInfo()
list.files(getwd())
Let’s use some data from my lab! These are raw data from an experiment where we measured the impact of temperature on meiotic recombination rate.
We recorded phenotype as 0’s or 1’s, so we will use R to categorize the data into crossover classes.
You will need to add path information to the raw_data directory once you have uncompressed the data tarball.
Read “Exp5_rawdata.csv” into an object called exp5:
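The import chunk could look like the sketch below. The relative path is an assumption, so adjust it to wherever you uncompressed the tarball. The file.exists() check is an addition so that the chunk reports a wrong path instead of stopping with an error.

```r
# Hypothetical location of the file; adjust the path so it matches where
# raw_data sits relative to your notebook.
exp5_path <- file.path("raw_data", "Exp5_rawdata.csv")

if (file.exists(exp5_path)) {
  # Read the comma-separated file; the first row holds the column names
  exp5 <- read.csv(exp5_path, header = TRUE)
  # Look at the first three rows to confirm the import worked
  head(exp5, n = 3)
} else {
  message("File not found: ", exp5_path, " (check the path to raw_data)")
}
```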
You may also read in the previously made object:
##     Date ViaNumber Day numbMales ct sd y se Initials Notes
## 1 26-Oct        14   V         1  0  0 1  1      UHA
## 2 26-Oct        14   V         2  1  1 1  1      UHA
## 3 26-Oct        13   V         2  0  1 1  1      UHA
ifelse
An ifelse statement can help us categorize these values, similar to how we used cut to break the quake magnitudes into multiple quantiles in module 3.
The basic syntax is:
ifelse(<condition>,<yes>,<no>)
Let’s create a vector, x, and determine whether each element is greater than or less than 5:
## [1] "lessthan5" "lessthan5" "lessthan5" "lessthan5" "lessthan5" "lessthan5"
## [7] "lessthan5" "lessthan5" "lessthan5" "greaterthan5" "greaterthan5" "greaterthan5"
## [13] "greaterthan5"
For each element in vector x, R returned the statement we requested when the test was true or false.
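A minimal sketch of this kind of test is shown below. The vector is a hypothetical example, not the one that produced the output above.

```r
# Hypothetical example vector (not the one used in the tutorial)
x <- c(1, 3, 7, 10)

# Test each element: the "yes" value is returned where x > 5 is TRUE,
# the "no" value where it is FALSE
size_label <- ifelse(x > 5, "greaterthan5", "lessthan5")
size_label
## [1] "lessthan5"    "lessthan5"    "greaterthan5" "greaterthan5"
```

Note that with this condition a value of exactly 5 would fall into the "no" branch and be labeled "lessthan5".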
On your own, set up an ifelse conditional that subtracts 1 when x is greater than 5, and adds 1 when x is less than 5. Below, I have recreated the vector x by randomly sampling 5 values between 1 and 20:
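One way to draw such a vector is sketched here. The call to set.seed() and its seed value are additions for repeatability, and sampling without replacement is an assumption.

```r
# set.seed() is an addition so the draw is repeatable; the seed value is arbitrary
set.seed(42)

# Draw 5 values from the integers 1 to 20 (sample() does not replace by default)
x <- sample(1:20, size = 5)
x
```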
Nested ifelse statements
Sometimes we want more than a binary categorization. We can use a nested ifelse statement to do this. Each additional nest adds one more category.
## [1] "lessthan2" "greaterthan5" "greaterthan5" "greaterthan5" "greaterthan5"
## [1] "Medium" "High" "High" "High" "High"
Both lines include a single nested ifelse condition. What is the difference between them?
Both nests give you three categories, but they are specified slightly differently.
In the first example, we add an additional ifelse statement when the first conditional is false. In the second example, the additional ifelse statement is included when the first conditional is true.
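The two placements can be sketched as follows. The vector, the cutoffs, and the label "between2and5" are hypothetical choices for illustration and are not the code that produced the output above.

```r
# Hypothetical example vector
x <- c(1, 4, 8)

# Nest in the "no" position: the inner ifelse only runs where x > 5 is FALSE
nest_in_no <- ifelse(x > 5, "greaterthan5",
                     ifelse(x < 2, "lessthan2", "between2and5"))
nest_in_no
## [1] "lessthan2"    "between2and5" "greaterthan5"

# Nest in the "yes" position: the inner ifelse only runs where x > 1 is TRUE
nest_in_yes <- ifelse(x > 1,
                      ifelse(x > 5, "High", "Medium"),
                      "Low")
nest_in_yes
## [1] "Low"    "Medium" "High"
```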
Now that we’ve converted a continuous variable (i.e., the contents of x) into a categorical variable (i.e., “High”, “Medium”, “Low”), there are all sorts of things we can do, such as making box plots within each category.
Let’s use this notation to define crossover classes, where we will have several nested ifelse statements:
exp5$co_class <- ifelse(exp5$sd==exp5$y & exp5$y==exp5$se,"non_CO",
                 ifelse(exp5$sd!=exp5$y & exp5$y==exp5$se,"single_CO_1",
                 ifelse(exp5$sd==exp5$y & exp5$y!=exp5$se,"single_CO_2",
                 ifelse(exp5$sd!=exp5$y & exp5$y!=exp5$se,"double_CO",
                        "error"))))
# Check the levels of our new co_class factor (category).
levels(as.factor(exp5$co_class))
## [1] "double_CO" "non_CO" "single_CO_1" "single_CO_2"
We can see from the levels present in the column that no errors were generated!
What other syntax could you use to determine if any of the entries were categorized as “error” (hint: think about the subset functions)?
## [1] Date ViaNumber Day numbMales ct sd y se Initials
## [10] Notes co_class
## <0 rows> (or 0-length row.names)
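An empty result like the one above can come from several kinds of subsetting. The sketch below uses a small toy data frame with hypothetical values in place of exp5, so it runs without the data file.

```r
# Toy stand-in for exp5 (hypothetical values, same 0/1 coding for sd, y, se)
toy <- data.frame(sd = c(0, 1, 0, 1),
                  y  = c(0, 0, 0, 0),
                  se = c(0, 0, 1, 1))

toy$co_class <- ifelse(toy$sd == toy$y & toy$y == toy$se, "non_CO",
                ifelse(toy$sd != toy$y & toy$y == toy$se, "single_CO_1",
                ifelse(toy$sd == toy$y & toy$y != toy$se, "single_CO_2",
                ifelse(toy$sd != toy$y & toy$y != toy$se, "double_CO",
                       "error"))))

# Option 1: subset() keeps only the rows where the condition is TRUE
errors <- subset(toy, co_class == "error")
nrow(errors)   # 0 rows means nothing was classified as "error"

# Option 2: logical indexing with square brackets
toy[toy$co_class == "error", ]

# Option 3: count how many entries fall into each class
table(toy$co_class)
```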
Let’s repeat, but define crossovers numerically. Here non crossovers will count as zero crossovers, single crossovers will count as 1 CO, and double crossovers will count as 2 COs.
Since we only focused on males in our experiment, we will multiply by the male count to get the number of COs per row of data.
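A sketch of the numeric version is given below on toy data. It assumes that numbMales is the male count column to multiply by, and the new column names co_count and num_co are hypothetical.

```r
# Toy stand-in for exp5 (hypothetical values)
toy <- data.frame(numbMales = c(3, 2, 1, 4),
                  sd = c(0, 1, 0, 1),
                  y  = c(0, 0, 0, 0),
                  se = c(0, 0, 1, 1))

# 0 crossovers when all three markers match, 2 when both neighboring
# pairs differ, and 1 in the remaining (single crossover) cases
toy$co_count <- ifelse(toy$sd == toy$y & toy$y == toy$se, 0,
                ifelse(toy$sd != toy$y & toy$y != toy$se, 2, 1))

# Multiply by the male count to get the number of COs per row
toy$num_co <- toy$co_count * toy$numbMales
toy$num_co
## [1] 0 2 1 8
```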
Now we would like to calculate the percentage of crossovers, per vial, to obtain the recombination frequency. This seems like something we would want to do frequently, so let’s create a function!
The basic syntax of a function is:
<function-name> <- function(<input>) {
<function actions>
return(<output>)
}
Here, we are storing the new function into an object called percentage. We use the function, function, to create our custom function.
Within our function, we are performing a simple calculation in a code block. We include a return function to specify the output to the user from the function.
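A function matching this description might look like the sketch below. The argument names x and y and the exact calculation are assumptions; only the name percentage comes from the text.

```r
# Assumed definition: express x as a percentage of y
percentage <- function(x, y) {
  result <- (x / y) * 100   # the calculation performed in the code block
  return(result)            # the value handed back to the user
}

# Quick check with easy numbers
percentage(1, 4)
## [1] 25
```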
To use our function, let’s calculate the total number of crossovers and the total number of individuals.
Now, let’s use the function to calculate recombination frequency
## [1] 77.87403
Here is another way to use the function that is more streamlined, but has lower readability.
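Both styles can be compared side by side in a small sketch. The toy data frame, the column num_co, and the two-argument percentage() definition are assumptions for illustration; they are not the lab's data or code, so the result differs from the value shown above.

```r
# Assumed helper: express x as a percentage of y
percentage <- function(x, y) {
  return((x / y) * 100)
}

# Toy data (hypothetical): males per row and crossovers per row
toy <- data.frame(numbMales = c(4, 2, 1, 1),
                  num_co    = c(0, 2, 1, 2))

# Step-by-step version: store the totals first, then call the function
total_co    <- sum(toy$num_co)      # total number of crossovers
total_males <- sum(toy$numbMales)   # total number of individuals
rec_freq <- percentage(total_co, total_males)
rec_freq
## [1] 62.5

# Streamlined version: the same calculation in a single line
percentage(sum(toy$num_co), sum(toy$numbMales))
## [1] 62.5
```

The step-by-step version leaves named totals in the environment that can be inspected or reused, which is what makes it easier to read and debug.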
Every time you type name(), you are calling a function (e.g., sum()) that was created for reproducibility! Learn more about the functions you currently use by typing ?function-name, without the (), in your console.
Any time you begin to copy-paste a code block or generate a for-loop, you could write a function!
Based on your own work or interests, think of a calculation, conversion, or rearrangement you do commonly and create a function to do it for you.