Statistical Analysis in R
Overview
This module teaches statistical concepts and shows how to analyze biological data in R. You will create random datasets and learn about statistical distributions, linear regression, clustering, the t-test, and principal component analysis (PCA).
Questions You Will Be Able to Answer after This Module
- Simulate tossing three coins in R and estimate the probability of getting all heads. Does the result match what you expect from theory?
- Create a random AT-rich genome in R (A = 40%, T = 40%, G = 10%, C = 10%; genome size = 5 Mb). How many random proteins longer than 100 amino acids do you find?
- In the given Excel file, X represents time and Y represents the growth of a microbial colony. Use R to fit a straight line to Y as a function of X.
- The given CSV file contains measurements from two groups of patients, one of which received a new medicine. Use R to determine whether the new medicine had an effect on the measurements.
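The first question can be approached with a short simulation. The sketch below assumes fair coins and uses base R's `sample()`; the number of trials and the seed are arbitrary choices for illustration, not values from the module.

```r
# Simulate tossing three fair coins many times and estimate P(all heads).
set.seed(42)  # fixed seed so the run is reproducible

n_trials <- 100000
# Each row is one trial; each column is one coin ("H" or "T", equally likely)
tosses <- matrix(sample(c("H", "T"), 3 * n_trials, replace = TRUE),
                 ncol = 3)

# A trial is a success when all three coins in that row show heads
all_heads <- rowSums(tosses == "H") == 3
p_sim <- mean(all_heads)

# Theory: three independent fair coins give (1/2)^3 = 0.125
p_theory <- 0.5^3
cat("Simulated:", p_sim, " Theoretical:", p_theory, "\n")
```

Increasing `n_trials` brings the simulated value closer to 0.125, which is itself a useful thing to try.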
Prerequisites
Basic familiarity with the concepts of vectors and data frames is needed. This is covered in R for Biology - Core Concepts.
Module Length
Three sessions of 2 hours each.
Topics
- Generate and plot random data
- Statistical distributions (binomial, Poisson, normal/Gaussian)
- Preliminary statistical analysis
- Linear regression
- T-test
- ANOVA
- Clustering
- Visualization
- Principal component analysis
Sessions
First Session
[Figure: Gaussian distribution]
- Mean, median, mode and other summary statistics
- Random data: random vectors, coin tosses, dice throws, random DNA sequences
- Statistical distributions
- Plots
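A minimal sketch of several first-session exercises in base R follows. The helper `stat_mode()` is a hypothetical name defined here, because base R's `mode()` reports an object's storage mode rather than its most frequent value; the sample sizes and seed are arbitrary.

```r
set.seed(1)

# Throw a fair six-sided die 600 times
rolls <- sample(1:6, 600, replace = TRUE)
mean(rolls)    # should be close to the theoretical 3.5
median(rolls)

# Hypothetical helper: base R has no statistical-mode function, so this
# returns the most frequent value(s) of a numeric vector
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(rolls)

# Random DNA sequence of 1000 bases with an unequal base composition,
# here A = T = 40% and G = C = 10%
dna <- sample(c("A", "T", "G", "C"), 1000, replace = TRUE,
              prob = c(0.4, 0.4, 0.1, 0.1))
table(dna) / length(dna)          # observed base frequencies
paste(dna[1:30], collapse = "")   # first 30 bases as a single string
```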
Second Session
[Figure: Linear regression]
- Linear regression
- Clustering
- T-test
- Principal component analysis (PCA)
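The sketch below shows two second-session tools on simulated data: `lm()` for fitting a straight line and `t.test()` for comparing two groups. The data are made up for illustration and only stand in for the exercise files; real data would first be loaded, for example with `read.csv()` for a CSV file.

```r
set.seed(7)

# Hypothetical growth data: time x and colony growth y following a linear
# trend plus random noise
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.8)

fit <- lm(y ~ x)          # least-squares straight line y = a + b * x
coef(fit)                 # intercept (a) and slope (b)
summary(fit)$r.squared    # fraction of the variance in y explained by x

# Hypothetical measurements for two patient groups; the treated group's
# true mean is shifted by 1.5 units
control <- rnorm(30, mean = 10, sd = 2)
treated <- rnorm(30, mean = 11.5, sd = 2)

# Welch two-sample t-test (R's default; does not assume equal variances)
result <- t.test(treated, control)
result$p.value            # a small p-value suggests the group means differ
```

Plotting `plot(x, y)` followed by `abline(fit)` draws the fitted line over the data points.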
Third and Fourth Sessions
[Figure: Clustering and heatmap]
Three hands-on exercises:
- Nucleotide distribution
- Effect of a new medicine on cancer patients
- Clustering
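Clustering and PCA can be tried on simulated data before moving to real measurements. This sketch assumes a made-up matrix of 20 samples in two well-separated groups and uses base R's `kmeans()`, `hclust()` with `cutree()`, and `prcomp()`; the group sizes, the shift between groups and the number of clusters are illustrative choices.

```r
set.seed(3)

# Hypothetical data: 20 samples x 5 features from two groups whose
# feature means differ by 4 units
group_a <- matrix(rnorm(10 * 5, mean = 0), nrow = 10)
group_b <- matrix(rnorm(10 * 5, mean = 4), nrow = 10)
mat <- rbind(group_a, group_b)
true_group <- rep(c("A", "B"), each = 10)

# k-means with k = 2; nstart = 10 restarts reduce sensitivity to the
# randomly chosen starting centres
km <- kmeans(mat, centers = 2, nstart = 10)
table(true_group, km$cluster)

# Hierarchical clustering on Euclidean distances, cut into 2 clusters
hc <- hclust(dist(mat), method = "complete")
hc_groups <- cutree(hc, k = 2)
table(true_group, hc_groups)

# PCA on centred and scaled features; with groups this far apart,
# PC1 is expected to carry most of the separation
pca <- prcomp(mat, center = TRUE, scale. = TRUE)
summary(pca)$importance["Proportion of Variance", 1:2]
head(pca$x[, 1:2])
```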
Details
R has become an essential tool for the statistical analysis of biological data because it provides numerous functions for both simple and complex analysis tasks. Some of these functions come from recent research and are made freely available by the researchers who developed them.
R is also an excellent tool for learning statistics, because you can generate random data and visualize its statistical behavior directly. Many learners find this visual approach more intuitive than the textbook approach of deriving the same results algebraically.
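As an example of learning from random data, the sketch below simulates 10,000 runs of 10 fair coin tosses with `rbinom()` and compares the observed frequencies with the exact binomial probabilities from `dbinom()`. The repetition count and seed are arbitrary choices; the plot shows simulated frequencies as bars and exact probabilities as red points.

```r
set.seed(10)

# Number of heads in 10 fair coin tosses, repeated 10,000 times
heads <- rbinom(10000, size = 10, prob = 0.5)

# Observed proportion for each possible count 0..10 (factor levels keep
# counts that never occurred as zeros)
simulated <- as.numeric(table(factor(heads, levels = 0:10))) / 10000
exact <- dbinom(0:10, size = 10, prob = 0.5)
round(rbind(simulated, exact), 3)

# Visual check: bars are simulated proportions, points are exact values
bp <- barplot(simulated, names.arg = 0:10, xlab = "Number of heads",
              ylab = "Proportion", ylim = c(0, max(exact, simulated) * 1.1))
points(bp, exact, pch = 19, col = "red")
```

Repeating the run with fewer repetitions (say 100) makes the gap between simulation and theory easy to see.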
Class Style
The classes are conducted as online interactive chat sessions.
Testimonials
You can read the testimonials for our summer R classes here.
Cost
$195 (20% discount for premium members).
Register
Please sign up for the module at the following link. At present no payment is necessary to register.