Statistical Analysis in R

Overview

This module helps you learn statistical concepts and perform statistical analysis of biological data using R. You will create random datasets and learn about statistical distributions, linear regression, clustering analysis, the t-test, and principal component analysis (PCA).

Questions You Will Be Able to Answer after This Module

  1. Simulate a tossing experiment in R with three coins and find the probability of getting all heads. Does the result match what you expect from theory?

  2. Create an AT-rich genome using R (A=40%, T=40%, G=10%, C=10%, genome size=5Mb). How many random proteins of length > 100 amino acids do you find?

  3. In the given Excel file, X represents time and Y represents growth of a microbial colony. Use R to fit a straight line between X and Y.

  4. The given CSV file contains measurement data from two groups of patients, one of which was administered a new medicine. Use R to find whether the new medicine had any effect on the measurement profile.
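
As an illustration of the first question, the coin-tossing experiment can be sketched in a few lines of R. This is a minimal sketch, not the course's own solution: the number of trials, the seed, and the variable names are arbitrary choices made for the example, and the coins are assumed to be fair.

```r
# Simulate tossing three fair coins many times.
# n_trials and the seed are arbitrary choices for this sketch.
set.seed(42)
n_trials <- 100000

# Each row is one experiment; each column is one coin (1 = heads, 0 = tails).
tosses <- matrix(sample(c(0, 1), size = 3 * n_trials, replace = TRUE), ncol = 3)

# An experiment counts as "all heads" when the row sum equals 3.
all_heads <- rowSums(tosses) == 3
estimated <- mean(all_heads)

# Theory: three independent fair coins give (1/2)^3 = 0.125.
expected <- 0.5^3

cat("Estimated:", estimated, " Expected:", expected, "\n")
```

With a large number of trials the estimated probability should land close to the theoretical value of 0.125; rerunning with a smaller `n_trials` shows how much the estimate can fluctuate.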

Prerequisites

Basic familiarity with the concepts of vectors and data frames is needed. This is covered in R for Biology - Core Concepts.

Module Length

Three sessions of 2 hours each.

Topics

  1. Generate and plot random data,
  2. Statistical distributions (binomial, Poisson, normal/Gaussian),
  3. Preliminary statistical analysis,
  4. Linear regression,
  5. t-test,
  6. ANOVA,
  7. Clustering,
  8. Visualization,
  9. Principal component analysis.

Sessions

First Session

image (Gaussian distribution)

Mean, median, mode, etc.

random vector, coin toss, throwing dice, random DNA sequence

distributions

plots
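
The random DNA sequence topic above might look like the following in R. This is a hedged sketch only: it uses the AT-rich base frequencies from the questions listed earlier, but a much shorter sequence than the 5 Mb genome so that it runs instantly, and the variable names are made up for the example.

```r
# Generate a random AT-rich DNA sequence and check its base composition.
# genome_length is deliberately small here; the exercise itself uses 5 Mb.
set.seed(1)
bases <- c("A", "T", "G", "C")
probs <- c(0.4, 0.4, 0.1, 0.1)   # same order as `bases`
genome_length <- 10000

# Draw each position independently with the given base probabilities.
genome <- sample(bases, size = genome_length, replace = TRUE, prob = probs)

# Observed fraction of each base.
composition <- table(factor(genome, levels = bases)) / genome_length
print(round(composition, 3))

# Collapse the vector of letters into a single string and show the first 60 bases.
genome_string <- paste(genome, collapse = "")
cat(substr(genome_string, 1, 60), "\n")
```

Passing `composition` to `barplot()` gives a quick visual check that A and T dominate the sequence.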

Second Session

image (Linear regression)

linear regression

clustering

T-test

PCA
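
To give a flavour of the linear regression and t-test topics in this session, here is a minimal sketch on simulated data. The data are invented for the example (they are not the course's Excel or CSV files), and the slope, group means, and sample sizes are arbitrary assumptions.

```r
# Linear regression and a two-sample t-test on simulated (hypothetical) data.
set.seed(7)

# Hypothetical growth curve: x is time, y is colony size with random noise.
x <- 1:20
y <- 2.5 * x + 4 + rnorm(length(x), mean = 0, sd = 1.5)

# Fit a straight line y = intercept + slope * x.
fit <- lm(y ~ x)
print(coef(fit))

# Hypothetical measurements for an untreated and a treated group.
control <- rnorm(30, mean = 10, sd = 2)
treated <- rnorm(30, mean = 12, sd = 2)

# t.test() performs Welch's two-sample t-test by default.
result <- t.test(treated, control)
print(result$p.value)
```

The fitted slope should come out near the value used to simulate the data, and a small p-value indicates that the difference between the two group means is unlikely to be due to chance alone.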

Third and Fourth Sessions

image (clustering and heatmap)

Three hands-on exercises:

  1. Nucleotide distribution.

  2. New medicine on cancer patients.

  3. Clustering.
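
For orientation, a clustering exercise of this kind could start from something like the sketch below. It is an assumption-laden illustration rather than the actual exercise: it uses the `iris` dataset that ships with R instead of the course data, default complete-linkage hierarchical clustering, and an arbitrary choice of three clusters. A PCA step is included because it is commonly used to view clusters in two dimensions.

```r
# Hierarchical clustering and PCA on the built-in iris measurements.
# The dataset and the choice of k = 3 are assumptions made for this sketch.
measurements <- scale(iris[, 1:4])   # centre and scale the four numeric columns

# Build a dendrogram from Euclidean distances (complete linkage is the default).
tree <- hclust(dist(measurements))

# Cut the tree into three groups and compare them with the known species.
groups <- cutree(tree, k = 3)
print(table(groups, iris$Species))

# Principal component analysis of the same scaled measurements.
pca <- prcomp(measurements)
print(summary(pca)$importance["Proportion of Variance", 1:2])
```

Calling `plot(tree)` draws the dendrogram, and `heatmap(measurements)` produces a clustered heatmap of the same matrix.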

Details

R has become an essential tool for statistical analysis of biological data because it includes numerous functions for performing both simple and complex analysis tasks. Some of those functions come from the latest research, made freely available by the researchers.

R is also an excellent tool for learning statistics, because you can create random data and visualize the statistical behavior. Visual learning is more intuitive than the textbook approach of algebraic derivation of the same results.
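
As a small example of learning statistics from random data, the sketch below throws a simulated die repeatedly and tracks the running mean, which should settle near the theoretical value of 3.5. The number of throws and the seed are arbitrary choices, and the plot is only drawn when R is running interactively.

```r
# Watch the running mean of simulated die throws approach the expected value.
set.seed(123)
rolls <- sample(1:6, size = 5000, replace = TRUE)

# Mean of the first n throws, for every n from 1 to 5000.
running_mean <- cumsum(rolls) / seq_along(rolls)

# Expected value of a fair six-sided die: (1 + 2 + ... + 6) / 6 = 3.5.
expected <- mean(1:6)

# Draw the curve only in an interactive session.
if (interactive()) {
  plot(running_mean, type = "l",
       xlab = "Number of throws", ylab = "Running mean")
  abline(h = expected, lty = 2)
}

cat("Mean after", length(rolls), "throws:", tail(running_mean, 1), "\n")
```

The early part of the curve swings widely, while the later part hugs the dashed line, which is the law of large numbers made visible.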

Class Style

The classes will be conducted through online interactive chat sessions.

Testimonials

You can read the testimonials for our summer R classes here.

Cost

$195 (20% discount for premium members).

Register

Please sign up for the module at the following link. At present no payment is necessary to register.

Sign Up