This module shows you how to manipulate tables of data using R. On completion, you will be able to replace Excel or other spreadsheet programs with R and gain efficiency in your data analysis. You will also learn about the ‘data frame’, a data structure used extensively in almost all R applications, and ‘dplyr’, a powerful and versatile R library.
Questions One Can Answer after This Module
You have an Excel file with RNA-seq data from four samples, along with gene IDs, and another file with gene annotation details. i) Find the expression level of the gene Hox1a by merging the two tables. ii) Find the expression levels of all Hox genes. iii) Find the average expression level of all transcription factors. iv) Compare the expression levels of transcription factors and signaling genes by plotting them as histograms.
Pokemon are fictional cartoon and video-game characters. We will use them as an example to understand table processing, but the same methods work for any biological dataset. Load the pokemon.csv file from the given link and: i) Find the heaviest and lightest Pokemon. ii) Compare the weights and heights of all Pokemon. Do you see a pattern? (Hint: the pattern becomes clear if you take the log of one axis.) iii) Add the Pokemon types from the second table, and find the average weight of Pokemon of each type.
Load the csv file containing information on all bacterial genomes available from NCBI. i) Which bacterium has the largest genome? ii) Compare the GC contents of Proteobacteria, Firmicutes and Actinobacteria. Which group has the highest GC content?
Load the csv file containing the results of all international soccer matches since the 1870s. i) What were the Brazil team’s worst defeats? ii) Do teams win more often when they play at home than when they play away? Decide using the history of all of Brazil’s matches.
Duration: three sessions of 2 hours each, or two sessions of 3 hours each.
- From vector to data frame in R,
- dplyr library commands: select, filter, mutate, arrange,
- Reading and writing data in Excel and csv files,
- Joining multiple tables using dplyr.
Introduction to data frames.
Reading and writing external csv/Excel files.
Learning about built-in datasets in R.
Other data types: lists, matrices, factors.
Linear algebra in R.
Introduction to the dplyr library.
dplyr functions: select, filter, mutate, arrange.
Practice data analysis using dplyr.
Joining multiple tables using dplyr and extracting information.
Practice data analysis on multiple tables using dplyr.
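As a small base-R preview of the first topics in the outline above (building a data frame from vectors, matrices, factors, lists, a simple matrix product and a built-in dataset), here is a sketch that assumes nothing beyond a standard R installation. The vectors and values are made up for illustration.

```r
# Equal-length vectors (hypothetical values); each becomes one column
gene      <- c("g1", "g2", "g3")
length_bp <- c(1200L, 2500L, 1800L)
is_tf     <- c(TRUE, TRUE, FALSE)

# Combine the vectors column-wise into a data frame and inspect its structure
genes <- data.frame(gene, length_bp, is_tf)
str(genes)

# A matrix is filled column by column: here rows (1, 3, 5) and (2, 4, 6)
m  <- matrix(1:6, nrow = 2)
mm <- m %*% t(m)      # matrix product of m with its transpose, a 2 x 2 result

# A factor stores categories with an explicit level order
f <- factor(c("low", "high", "low"), levels = c("low", "high"))

# A list can hold objects of different types together
lst <- list(table = genes, matrix = m, levels = levels(f))

# mtcars is one of the datasets that ship with R
head(mtcars)
```

Each vector becomes a column, so all vectors passed to data.frame() must have the same length.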
Researchers with a biology background are often introduced to R through Bioconductor, while continuing to use Excel for everyday spreadsheet operations. This creates a disconnect and makes learning R harder.
R is far more powerful than Excel or other spreadsheet programs for manipulating data tables, especially when information from multiple spreadsheets must be combined for an analysis; in such cases, using R saves a significant amount of time. For example, genomics researchers often have gene expression data in one table and the annotations in another, and need to combine them. Such tasks are straightforward with the dplyr library in R.
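A minimal sketch of the expression-plus-annotation join described above, assuming the dplyr package is installed. The tables, gene IDs, gene names, numbers and the column names gene_id and category are all hypothetical, chosen only to mirror the RNA-seq questions at the top of this page.

```r
library(dplyr)

# Made-up expression table: one row per gene, one column per sample
expression <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4"),
  sample1 = c(10.2, 3.1, 7.8, 0.5),
  sample2 = c(11.0, 2.9, 8.1, 0.7),
  sample3 = c(9.8, 3.4, 7.5, 0.4),
  sample4 = c(10.5, 3.0, 8.0, 0.6)
)

# Made-up annotation table keyed by the same (hypothetical) gene IDs
annotation <- data.frame(
  gene_id   = c("g1", "g2", "g3", "g4"),
  gene_name = c("Hox1a", "Hox2b", "Sox9", "Wnt3"),
  category  = c("transcription factor", "transcription factor",
                "transcription factor", "signaling")
)

# Attach the annotation columns to every expression row with a matching gene_id
combined <- left_join(expression, annotation, by = "gene_id")

# Expression of one named gene
hox1a <- filter(combined, gene_name == "Hox1a")

# All genes whose name starts with "Hox"
hox_genes <- filter(combined, grepl("^Hox", gene_name))

# Per-gene mean over the four samples, then the average of those means per category
category_means <- combined %>%
  mutate(mean_expr = (sample1 + sample2 + sample3 + sample4) / 4) %>%
  group_by(category) %>%
  summarise(avg_expr = mean(mean_expr))

print(category_means)
```

left_join() keeps every row of the expression table even when a gene has no annotation (its annotation columns become NA); inner_join() would instead drop such genes.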
In this module, you will load data from existing “.csv” or “.xls” files, complete the analysis and then save the results in another spreadsheet.
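The load, analyze and save workflow can be sketched with base R's read.csv() and write.csv() together with the four dplyr verbs listed above. This assumes dplyr is installed; the table is a made-up stand-in for something like pokemon.csv, and its column names (weight_kg, height_m) are hypothetical, since the real file's columns are not given here. For Excel files, read_excel() from the readxl package is one commonly used option.

```r
library(dplyr)

# Hypothetical stand-in for a table like pokemon.csv (names and values are made up)
pokemon <- data.frame(
  name      = c("alpha", "beta", "gamma", "delta"),
  weight_kg = c(6.0, 460.0, 0.1, 210.0),
  height_m  = c(0.4, 2.1, 1.3, 8.8)
)

# Write it to a temporary csv file, then read it back as one would read a real file
csv_in <- tempfile(fileext = ".csv")
write.csv(pokemon, csv_in, row.names = FALSE)
pokemon <- read.csv(csv_in)

result <- pokemon %>%
  select(name, weight_kg) %>%                  # keep only the columns needed
  filter(weight_kg > 1) %>%                    # keep rows heavier than 1 kg
  mutate(log_weight = log10(weight_kg)) %>%    # add a derived column
  arrange(desc(weight_kg))                     # sort, heaviest first

# Save the processed table as a new csv file
csv_out <- tempfile(fileext = ".csv")
write.csv(result, csv_out, row.names = FALSE)
print(result)
```

Chaining the verbs with %>% keeps each step on its own line, which makes the analysis easy to read and rerun on a new file.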
Fee: $99 for premium members, $125 for others.
The classes will be conducted through interactive online chat sessions.
You can read the testimonials for our summer R classes here.
Please sign up for the module at the following link. At present no payment is necessary to register.