Move Seamlessly from Excel to R
Overview
This module is written for biology researchers who want to move their data analysis from Excel to R. Many biologists are used to exploring their research data in spreadsheet programs. They compare multiple columns of data, plot one or more columns, sort columns, create new columns from existing ones, and so on. R can do all of these tasks, and a lot more.
Spreadsheet programs are visual and well suited to small tables of data. This visual approach becomes cumbersome when a table grows to hundreds or thousands of rows, as is common in modern high-throughput biology. Excel users scroll up and down with the mouse to scan the entire table, which is highly inconvenient for large files. Operations like searching for patterns also become time-consuming. In addition, many analysis steps alter the original table, which users may want to preserve. To work around this, users often copy the same data into multiple spreadsheets, which makes files large and slow to load.
Data split across multiple tables is another major inconvenience in spreadsheet programs. This situation is unavoidable in data analysis today, because data often come from different sources. For example, a table of gene annotations may come from an annotation database, whereas a table of gene expression values may come directly from the experiment. Merging and extracting information from multiple tables is not straightforward in spreadsheet programs.
Skills You Acquire
- You will learn to use the dplyr package in R.
Class Style
These modules are video-assisted. The recorded videos explain the technical material step by step. In addition, each module includes text with all of the code, data, and explanations.
Prerequisites
R basics.
Lessons
- Lesson 1: Welcome
- Lesson 2: R - Basics
- Lesson 3: Processing Large Files
- Lesson 4: Vectors
- Lesson 5: Filtering Rows
- Lesson 6: Arrange and Select
- Lesson 7: Strings
- Lesson 8: Mutate
- Lesson 9: Join
- Lesson 10: Join Detail
- Lesson 11: Example - RNAseq Data
- Lesson 12: Example - BLAST Output
- Lesson 13: Writing Scripts
- Lesson 14: Pivot Table
- Lesson 15: Processing Dates