R for Biology - Core Concepts
Overview
This module introduces you to the core concepts of the R language to help you build a solid foundation. Most importantly, it shows you how to think in terms of vectors and functions. You will also be introduced to several powerful packages, including tidyverse, stringr and rmarkdown.
Questions You Will Be Able to Answer after This Module
1) The temperature of an oven increases linearly from 30°C to 150°C in 10 minutes, stays the same for one hour and then decreases linearly back to 30°C in 20 minutes.
(a) Create a vector in R that represents the temperature over the 90 minutes.
(b) Display time vs temperature data for the oven.
2) Use R vectors to compute the following sums:
(i) 1 + 3 + 5 + 7 + … up to N terms.
(ii) 1x2x3 + 2x3x4 + 3x4x5 + … up to N terms.
3) In research, you often have to select a subset of data from a larger set for analysis. For example, you may have a column of numbers in which every odd entry contains data from one sample and every even entry contains data from another. In this exercise, you choose a subset from a larger set using R.
(a) You have data from a 96-well measurement and you want to extract every 12th number from it. Use R to perform this analysis.
(b) Display the results from the 96-well measurement and the selected subset as two histograms to compare.
4) Create a vector that selects every 3rd nucleotide from a given gene sequence.
5) Use R vectors and logical operations to check whether a given integer is prime.
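As a rough illustration of the kind of answer the oven and sum questions call for, the sketch below builds the oven temperature at one-minute resolution and computes both sums for a small N. The one-minute step, the variable names and the choice N = 5 are assumptions made for illustration and are not part of the exercises.

```r
# Oven question: temperature over 90 minutes, sampled once per minute
# (the one-minute step is an assumption; any resolution would work).
time_min <- 0:90                              # minutes 0, 1, ..., 90
rise <- seq(30, 150, length.out = 11)         # minutes 0-10: 30 C up to 150 C
hold <- rep(150, 59)                          # minutes 11-69: constant 150 C
fall <- seq(150, 30, length.out = 21)         # minutes 70-90: 150 C down to 30 C
temperature <- c(rise, hold, fall)

# Time vs temperature as a line plot
plot(time_min, temperature, type = "l",
     xlab = "Time (min)", ylab = "Temperature (C)")

# Sum question: first N terms of each series, with N chosen arbitrarily
N <- 5
k <- 1:N
sum_odd <- sum(2 * k - 1)                     # 1 + 3 + 5 + ...
sum_products <- sum(k * (k + 1) * (k + 2))    # 1x2x3 + 2x3x4 + ...
```

For any N, the first sum should equal N^2 and the second N(N+1)(N+2)(N+3)/4, which gives a quick check on the vectorized result.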
Prerequisites
None
Module Length
Three sessions of 2 hours each.
Topics
- Various ways to create vectors in R
- Functions in R
- magrittr pipe operator - ‘%>%’
- Simple data visualization
- Subsetting on vectors
- Logical operations
- ‘apply’ functions
- Factors and summary statistics
Sessions
First Session
- Various ways to create vectors in R
- Operations on vectors
- magrittr
- Sum operation
- Visualization of data - scatterplot
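The first-session topics fit together in a few lines. The sketch below is illustrative only: the numbers are made up, and the pipe is used only if the magrittr package happens to be installed, which is an assumption about your setup.

```r
# Several ways to create a vector
a <- c(2, 4, 6, 8)              # combine explicit values
b <- 1:4                        # integer sequence
d <- seq(0, 1, by = 0.25)       # regular steps: 0, 0.25, 0.5, 0.75, 1
e <- rep(c(0, 1), times = 3)    # repetition: 0 1 0 1 0 1

# Operations are applied element by element
a_plus_b  <- a + b              # 3 6 9 12
a_squared <- a^2                # 4 16 36 64

# Sum written as a nested call ...
total_nested <- sum(sqrt(a_squared))

# ... and, if magrittr is available, as a left-to-right pipeline
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  total_piped <- a_squared %>% sqrt() %>% sum()
}

# Scatterplot of one vector against another
plot(b, a_squared, xlab = "b", ylab = "a squared")
```

Both forms of the sum compute the same value; the pipeline simply reads in the order the operations are applied.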
Second Session
- Logical vectors and operations
- Subsetting of vectors
- Creating new functions
- ‘apply’ function in R
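A minimal sketch of how the second-session topics combine, using only base R. The 96-well values are simulated, "every 12th" is assumed to mean positions 12, 24, ..., 96, and `is_prime` is a hypothetical helper written for this illustration; `sapply()` stands in for the ‘apply’ family.

```r
# Simulated 96-well readings (made-up values, for illustration only)
set.seed(1)
wells <- round(runif(96, min = 0, max = 2), 2)

# Subsetting by position: every 12th value (wells 12, 24, ..., 96)
every_12th <- wells[seq(12, 96, by = 12)]

# Subsetting with a logical vector: odd entries vs even entries
is_odd_position <- seq_along(wells) %% 2 == 1
sample_one <- wells[is_odd_position]
sample_two <- wells[!is_odd_position]

# A new function built from vector and logical operations:
# n is prime if it is at least 2 and no integer in 2..floor(sqrt(n)) divides it
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)               # 2 and 3 are prime
  divisors <- 2:floor(sqrt(n))
  !any(n %% divisors == 0)
}

# sapply() calls the function on each element and returns a logical vector,
# which is then used to subset 1:20
primes_up_to_20 <- (1:20)[sapply(1:20, is_prime)]
```

Checking divisors only up to the square root is enough, because any factor larger than the square root must pair with one smaller than it.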
Third Session
- Vectors with random numbers
- Visualization of data - histogram
- Character vectors and biological data
- Factors
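The third-session topics can be sketched in base R as follows. The measurements and the gene sequence are made up for illustration, and "every 3rd nucleotide" is assumed to start at position 3.

```r
# Vector of random numbers and its histogram
set.seed(42)                                  # makes the draw reproducible
readings <- rnorm(96, mean = 10, sd = 2)      # made-up measurements
hist(readings, main = "Simulated readings", xlab = "Value")

# Character vector: split a made-up sequence into single nucleotides
gene <- "ATGGCCATTGTAATGGGCCGC"
nucleotides <- strsplit(gene, "")[[1]]

# Every 3rd nucleotide (positions 3, 6, 9, ...)
third_positions <- nucleotides[seq(3, length(nucleotides), by = 3)]

# Factor: treat the nucleotides as categories and count each level
nucleotide_factor <- factor(nucleotides, levels = c("A", "C", "G", "T"))
nucleotide_counts <- table(nucleotide_factor)

# Summary statistics for the numeric vector
summary(readings)
```

Declaring the factor levels explicitly keeps all four nucleotides in the count table even if one of them is absent from a particular sequence.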
Details
Biology has changed substantially with the advent of high-throughput next-generation sequencing (NGS) technologies. These technologies give researchers immense power to ask novel questions at the genomic scale. However, analyzing NGS data is not trivial, because it requires skills in both programming and statistics.
R has become an essential tool for biological data analysis. Many people find R difficult to learn, because they approach it in the wrong way. Unlike Python or Java, R is both a language and an analysis tool built on top of the language. New users often start to use the analysis tool without a good grasp of the language, and they suffer in the long term. Even some experienced programmers find R non-trivial, because it is very different from other languages.
With a proper foundation, R is actually quite easy to learn. This module introduces you to the core concepts of the R language to help you build that foundation. We will show you various ways to create vectors, important R functions, how to write your own functions, logical operations on vectors, the ‘apply’ family of functions and methods for subsetting vectors. We will also show simple statistical functions and plotting methods that are used frequently in data analysis.
Class Style
The classes will be conducted through online interactive chat sessions.
Testimonials
You can read the testimonials for our summer R classes here.
Cost
$125 (20% discount for premium members).
Register
Please sign up for the module at the following link. At present no payment is necessary to register.