# A Minimalist R Cheatsheet for NGS Biology

While teaching R to biologists, a common complaint I hear is that “there are too many functions”. Therefore, I take a minimalist approach and do not introduce students to new functions unless necessary. Reusing existing functions has two benefits: (i) it keeps the brain free from too many function names, and (ii) it gives students more practice with the functions they already know.

Here is a minimalist cheatsheet for NGS data analysis. It will make the most sense if you join our remotely taught bioinformatics classes. I am actively working on the file, with the goal of removing material where possible. For statistical analysis and plots, this file includes only the simple functions; I plan to post two separate minimalist cheatsheets on those topics later.

## 1. Installing Packages

R Packages can be installed in (at least) three ways.

| Package Source | Method |
|---|---|
| CRAN | install.packages("tidyverse") |
| Bioconductor | install.packages("BiocManager"); BiocManager::install("Biostrings") |
| GitHub | install.packages("devtools"); library(devtools); install_github("homologus/rnaseq.work") |
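Since a package only needs to be installed once, it can help to check whether it is already there before installing. Here is a minimal sketch using base R's `requireNamespace()`; the helper name `is_installed` is my own invention, and the install commands are left as comments because they need an internet connection.

```r
# Hypothetical helper (not part of any package): TRUE if 'pkg' can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so this returns TRUE.
is_installed("stats")

# Typical use, commented out because it downloads from the internet:
# if (!is_installed("BiocManager")) install.packages("BiocManager")
# BiocManager::install("Biostrings")
```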

### Load Installed Packages

Use the “library” command to load an installed package. You install a package once, but you run “library” every time you start a new R session.

library("tidyverse")

## 2. R as a Calculator

### R as an Ordinary Calculator

| Name | Action | Example |
|---|---|---|
| “+” | Add | 5+2 |
| “-” | Subtract | 5-2 |
| “*” | Multiply | 5*2 |
| “/” | Divide | 5/2 |
| “^” | Power | 5^2 |
| “%/%” | Quotient in integer division | 5%/%2 |
| “%%” | Remainder in integer division | 5%%2 |
| “pi” | Constant “pi” | 2*pi |
| “exp(1)” | Constant “e” | exp(1) |
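The two integer-division operators are easy to mix up, so here is a small sketch with made-up numbers. The quotient and the remainder always recombine into the original number.

```r
a <- 17
b <- 5

a %/% b   # integer quotient: 3
a %% b    # remainder: 2

# Quotient times divisor plus remainder gives back the original number.
(a %/% b) * b + (a %% b) == a   # TRUE
```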

### R as a Scientific Calculator

Now that you have R installed, let us use the software for data analysis. This section covers simple mathematical operations so that R can replace your scientific calculator.

| Name | Action | Example |
|---|---|---|
| abs | Absolute value of a number | x=-7; abs(x) |
| sqrt | Square root of a number | sqrt(2) |
| exp | Exponential function | exp(2.7) |
| log | Natural logarithm | log(2.7) |
| log10 | Logarithm with base 10 | log10(2.7) |
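In NGS work, logarithms show up most often as log2 fold changes. The sketch below uses hypothetical read counts (not real data) and the base R functions `log2()` and `log()` with its `base` argument, neither of which is in the table above.

```r
# Hypothetical read counts for one gene in two conditions.
control <- 50
treated <- 400

fold_change <- treated / control   # 8
log2(fold_change)                  # 3, because 2^3 = 8

# log() accepts any base through its 'base' argument.
log(fold_change, base = 2)         # also 3
log10(1000)                        # 3
```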

## 3. Piping

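The piping operator ‘%>%’ comes from “tidyverse” (it is defined in the magrittr package, which tidyverse loads). Make sure you load the package.

| Name | Action | Example |
|---|---|---|
| %>% | Passes the value on its left to the function on its right, so nested parentheses are not needed | 2 %>% sqrt %>% log |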

The following two commands are equivalent.

sqrt(2)
## [1] 1.414214
2 %>% sqrt
## [1] 1.414214

Multistep piping -

22 %>% exp %>% log
## [1] 22

You can read it as ‘take 22 and do exponential and do log’. These two steps, one after another, should give you back the original number.
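When a function in the chain needs extra arguments, they go inside its parentheses, and the piped value becomes the first argument. A minimal sketch follows, assuming the magrittr package is installed (it is installed along with tidyverse); the variable names are mine.

```r
library(magrittr)   # provides %>%; loading tidyverse also attaches it

# Extra arguments go inside the parentheses of the receiving function.
piped  <- 16 %>% sqrt %>% log(base = 2)
nested <- log(sqrt(16), base = 2)

piped             # 2
piped == nested   # TRUE
```

Both lines compute the same thing; the piped version just reads left to right.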

## 4. Vectors

The R programming language is built on top of vectors. Here we show five ways to create them.

Different methods for creating vectors

| Method | Description |
|---|---|
| Function ‘c’ | Vector with given values |
| Function ‘:’ | Vector with a range of numbers |
| Function ‘seq’ | Vector with equal spacing |
| Function ‘rep’ | Vector with identical numbers |
| Function ‘sample’ | Random vector |

### (i) The function ‘c’

Use the function ‘c’ to create an arbitrary vector. After a vector is created, it can be accessed entirely or by position. Remember that the index of the first position is 1, not 0 as in many other programming languages (C, Java, Python).

x=c(22,33,44,54,1,2,97)
x[1]
## [1] 22
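Accessing "by positions" covers more than a single index. Here is a short sketch of the common forms, using the same vector as above.

```r
x <- c(22, 33, 44, 54, 1, 2, 97)

length(x)      # 7 elements
x[c(1, 3)]     # first and third elements: 22 44
x[2:4]         # positions 2 to 4: 33 44 54
x[-1]          # everything except the first element
x[x > 40]      # elements larger than 40: 44 54 97
```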

#### Character

c("John", "Juan", "Jason")
## [1] "John"  "Juan"  "Jason"

#### Logical Vector

c(TRUE, FALSE, TRUE)
## [1]  TRUE FALSE  TRUE

### (ii) The function ‘:’

Increasing integers -

2:10
## [1]  2  3  4  5  6  7  8  9 10

Decreasing integers -

7:3
## [1] 7 6 5 4 3

### (iii) The function ‘seq’

seq(3,10,2)
## [1] 3 5 7 9
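The three numbers given to ‘seq’ are the start, the end and the step. Naming the arguments makes the call easier to read, and `length.out` lets you ask for a number of points instead of a step size. A small sketch (the variable names are mine):

```r
s1 <- seq(3, 10, by = 2)          # same as seq(3,10,2): 3 5 7 9
s2 <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
s3 <- seq(10, 1, by = -3)         # counting down: 10 7 4 1
```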

### (iv) The function ‘rep’

rep(5,10)
##  [1] 5 5 5 5 5 5 5 5 5 5
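‘rep’ can also repeat a whole vector (`times`) or each element in turn (`each`). As a sketch, here is one way to build sample labels for two replicates of three tissues; the labels are made up for illustration, and `paste0()` is a base R function that glues text together.

```r
tissue    <- rep(c("heart", "kidney", "brain"), times = 2)
replicate <- rep(1:2, each = 3)

tissue      # "heart" "kidney" "brain" "heart" "kidney" "brain"
replicate   # 1 1 1 2 2 2

# Glue the two vectors together element by element.
labels <- paste0(tissue, replicate)
labels      # "heart1" "kidney1" "brain1" "heart2" "kidney2" "brain2"
```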

### (v) The function ‘sample’

sample(c('H','T'),10,replace=TRUE)
##  [1] "T" "T" "H" "H" "T" "T" "H" "T" "H" "H"

There are additional functions to create random vectors with binomial, Gaussian (bell curve) and other distributions.
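As a rough sketch of those functions: `rnorm()` draws from a Gaussian and `rbinom()` from a binomial distribution, both in base R. Calling `set.seed()` first makes the random numbers repeatable; the seed value and the parameters below are arbitrary choices of mine.

```r
set.seed(42)   # any fixed number makes the random draws repeatable

# A random 12-base DNA sequence drawn with 'sample'.
bases <- sample(c("A", "C", "G", "T"), 12, replace = TRUE)
dna <- paste(bases, collapse = "")
nchar(dna)     # 12

heights <- rnorm(5, mean = 170, sd = 10)     # five draws from a Gaussian
flips   <- rbinom(10, size = 1, prob = 0.5)  # ten coin flips coded as 0/1
```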

### Concatenating Vectors

You can also combine vectors generated by the above methods. The function ‘c’ merges existing vectors into one; if the vectors are of different types, R converts all elements to a common type.

v1=1:10
v2=c(1,2,7,11)

c(v1,v2)
##  [1]  1  2  3  4  5  6  7  8  9 10  1  2  7 11
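A vector can hold only one type of data, so when vectors of different types are merged, R converts everything to a common type. A small sketch of what that looks like (the variable names are mine):

```r
v_num <- c(1, 2, 3)
v_chr <- c("a", "b")
v_lgl <- c(TRUE, FALSE)

mixed1 <- c(v_num, v_chr)   # numbers become text: "1" "2" "3" "a" "b"
mixed2 <- c(v_num, v_lgl)   # TRUE/FALSE become 1/0: 1 2 3 1 0

class(mixed1)   # "character"
class(mixed2)   # "numeric"
```

This is worth remembering when a column of numbers read from a file suddenly behaves like text: a single stray character value is enough.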

## 5. Functions Operating on Vectors

A number of functions do not exist on scientific calculators because they apply only to vectors. The sum of a vector is a good example.

| Name | Action |
|---|---|
| head | First few elements of a vector |
| tail | Last few elements of a vector |
| sum | Sum of elements of a number vector |
| mean | Mean of elements of a number vector |
| median | Median |
| sd | Standard Deviation |
| var | Variance |
| summary | Summary statistics |
| table | Counts the elements of a vector |

vec = c(1, 22,33,44,54,1,2,97, 22)
vec %>% head
## [1]  1 22 33 44 54  1
vec %>% head(1)
## [1] 1
vec %>% tail(3)
## [1]  2 97 22
vec %>% sum
## [1] 276
vec %>% mean
## [1] 30.66667
vec %>% median
## [1] 22
vec %>% sd
## [1] 31.34486
vec %>% var
## [1] 982.5
vec %>% summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    2.00   22.00   30.67   44.00   97.00
vec %>% table
## .
##  1  2 22 33 44 54 97
##  2  1  2  1  1  1  1

## 6. Data Frames (Spreadsheets)

The following functions are described in this section -

| Name | Action | Example |
|---|---|---|
| data.frame | Combines vectors into a data frame | df=data.frame(age,name,weight) |
| colnames | Names of columns of data frame | colnames(df) |
| rownames | Names of rows of data frame | rownames(df) |
| dim | Dimension of data frame | dim(df) |
| head | First few lines of data frame | head(df) |
| tail | Last few lines of data frame | tail(df) |
| data | Check which in-built data sets are available | data() |
| data | Load an in-built data set | data("darwin") |

Spreadsheets are called ‘data frames’ in R. A data frame in R holds a table of data, where the data in different columns can be of different types. A data frame is not the same as a matrix in mathematics.

Usually you read your data from Excel or csv files into data frames. Here we manually create one to show the commands.

Each column of an R data frame is a vector. So, a data frame can be created by combining a group of equal-sized vectors with the command ‘data.frame’.

age=c(20,21,20,17,19)
name=c("Alex", "Ada", "Chen", "Kim", "John")
weight=c(80.2, 70.1, 92.3, 77.7, 68.2)

df=data.frame(name,weight,age)
df
##   name weight age
## 1 Alex   80.2  20
## 2  Ada   70.1  21
## 3 Chen   92.3  20
## 4  Kim   77.7  17
## 5 John   68.2  19

The size of the data frame can be obtained by using the ‘dim()’ command. The function ‘colnames()’ gives the names of the columns, and the function ‘head()’ provides a snapshot.

df %>% dim
## [1] 5 3
df %>% colnames
## [1] "name"   "weight" "age"
df %>% head
##   name weight age
## 1 Alex   80.2  20
## 2  Ada   70.1  21
## 3 Chen   92.3  20
## 4  Kim   77.7  17
## 5 John   68.2  19
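The "equal sized" requirement matters in practice. As a sketch, here is what happens when one vector is a value short; `tryCatch()` is base R and is used only to capture the error message instead of stopping the script. The shortened vectors are my own example.

```r
name   <- c("Alex", "Ada", "Chen")
weight <- c(80.2, 70.1)          # one value short on purpose

# tryCatch() captures the error instead of stopping the script.
result <- tryCatch(
  data.frame(name, weight),
  error = function(e) conditionMessage(e)
)
result   # an error message about differing numbers of rows

# With equal-sized vectors the data frame is created as expected.
ok <- data.frame(name, weight = c(80.2, 70.1, 92.3))
nrow(ok)   # 3
ncol(ok)   # 2
```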

It is possible to access the individual columns of a data frame in a number of ways.

df$weight
## [1] 80.2 70.1 92.3 77.7 68.2
df[2]
##   weight
## 1   80.2
## 2   70.1
## 3   92.3
## 4   77.7
## 5   68.2
df[,2]
## [1] 80.2 70.1 92.3 77.7 68.2

A row can be accessed using the following command.

df[3,]
##   name weight age
## 3 Chen   92.3  20

It is also possible to access multiple rows or columns.

df[2:4,]
##   name weight age
## 2  Ada   70.1  21
## 3 Chen   92.3  20
## 4  Kim   77.7  17

## 7. Data Visualization

Core R comes with a number of plotting functions, but here we cover only two - hist (to draw histograms) and plot (to draw scatterplots). For more extensive plotting tasks, we recommend that readers learn and use the powerful ggplot2 package.

| Name | Action |
|---|---|
| hist | Draw histogram |
| plot | Draw scatterplot |

### Using hist()

x=c(rep(1,10),rep(2,10),rep(3,10))
hist(x)

### Using plot()

The plot() function can be used to draw scatterplots. It takes two equal-sized vectors as input and draws each pair of corresponding elements as a point (x,y).

v1=c(1,3,8,9,12)
v2=c(3,4,5,1,2)
plot(v1,v2)

## 8. Dplyr Library

This package is used to find information from data frames. Functions discussed here -

| Name | Action |
|---|---|
| select | selects a subset of columns |
| mutate | creates new columns based on some rule |
| arrange | sorts rows by the values in a column |
| filter | picks rows based on some rule |

### Create Mock Spreadsheet with Data

Our simple data set has experimental data for ten genes from two sets of experiments on heart, kidney and brain. The columns in the spreadsheet are gene-ID, annotation, heart1, kidney1, brain1, heart2, kidney2, brain2. The spreadsheet has ten rows, each with information on one gene. Let us create a spreadsheet for that experiment with totally nonsensical data.
gene = c('gene1', 'gene2', 'gene3', 'gene4', 'gene5', 'gene6', 'gene7', 'gene8', 'gene9', 'gene10')
annot=c('transcription', 'metabolism', 'translation', 'transcription', 'cell-cycle', 'cell-cycle', 'translation', 'receptor', 'transcription', 'metabolism')
heart1=c(10,3,4,5,8,9,1,2,4,5)
kidney1=c(3,4,5,8,9,1,2,4,5,10)
brain1=c(4,5,8,9,1,2,4,5,3,2)
heart2=c(2,5,1,9,1,2,4,5,1,12)
kidney2=c(10,3,4,5,1,2,4,5,4,9)
brain2=c(8,2,7,2,1,2,4,5,3,2)
expt=data.frame(gene, annot, heart1,kidney1,brain1,heart2, kidney2, brain2)
expt
##      gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1   gene1 transcription     10       3      4      2      10      8
## 2   gene2    metabolism      3       4      5      5       3      2
## 3   gene3   translation      4       5      8      1       4      7
## 4   gene4 transcription      5       8      9      9       5      2
## 5   gene5    cell-cycle      8       9      1      1       1      1
## 6   gene6    cell-cycle      9       1      2      2       2      2
## 7   gene7   translation      1       2      4      4       4      4
## 8   gene8      receptor      2       4      5      5       5      5
## 9   gene9 transcription      4       5      3      1       4      3
## 10 gene10    metabolism      5      10      2     12       9      2

### Manipulating Columns (‘select’ and ‘mutate’)

You learn two functions - select() and mutate().

1. select() creates a second spreadsheet, where the columns of the original are removed/rearranged.
2. mutate() creates a new spreadsheet, where an extra column is added by combining existing columns.

Task 1. Create a New Spreadsheet by Rearranging Columns

expt %>% select(gene, brain1, brain2, heart1, heart2, kidney1, kidney2)
##      gene brain1 brain2 heart1 heart2 kidney1 kidney2
## 1   gene1      4      8     10      2       3      10
## 2   gene2      5      2      3      5       4       3
## 3   gene3      8      7      4      1       5       4
## 4   gene4      9      2      5      9       8       5
## 5   gene5      1      1      8      1       9       1
## 6   gene6      2      2      9      2       1       2
## 7   gene7      4      4      1      4       2       4
## 8   gene8      5      5      2      5       4       5
## 9   gene9      3      3      4      1       5       4
## 10 gene10      2      2      5     12      10       9

Note that the above command displays the new spreadsheet on the screen, but does not store it. To save it, create a new variable.

expt2=expt %>% select(gene, brain1, brain2, heart1, heart2, kidney1, kidney2)

Task 2. Create a New Spreadsheet by Choosing a Subset of Columns

expt %>% select(gene, brain1, heart1, kidney1)
##      gene brain1 heart1 kidney1
## 1   gene1      4     10       3
## 2   gene2      5      3       4
## 3   gene3      8      4       5
## 4   gene4      9      5       8
## 5   gene5      1      8       9
## 6   gene6      2      9       1
## 7   gene7      4      1       2
## 8   gene8      5      2       4
## 9   gene9      3      4       5
## 10 gene10      2      5      10
expt %>% select(gene, brain2, heart2, kidney2)
##      gene brain2 heart2 kidney2
## 1   gene1      8      2      10
## 2   gene2      2      5       3
## 3   gene3      7      1       4
## 4   gene4      2      9       5
## 5   gene5      1      1       1
## 6   gene6      2      2       2
## 7   gene7      4      4       4
## 8   gene8      5      5       5
## 9   gene9      3      1       4
## 10 gene10      2     12       9

Task 3. ‘starts_with’, ‘ends_with’

expt %>% select(gene, starts_with("heart"))
##      gene heart1 heart2
## 1   gene1     10      2
## 2   gene2      3      5
## 3   gene3      4      1
## 4   gene4      5      9
## 5   gene5      8      1
## 6   gene6      9      2
## 7   gene7      1      4
## 8   gene8      2      5
## 9   gene9      4      1
## 10 gene10      5     12
expt %>% select(gene, ends_with("2"))
##      gene heart2 kidney2 brain2
## 1   gene1      2      10      8
## 2   gene2      5       3      2
## 3   gene3      1       4      7
## 4   gene4      9       5      2
## 5   gene5      1       1      1
## 6   gene6      2       2      2
## 7   gene7      4       4      4
## 8   gene8      5       5      5
## 9   gene9      1       4      3
## 10 gene10     12       9      2

Task 4. Replace a Column by its Half

expt %>% mutate(brain1=brain1/2)
##      gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1   gene1 transcription     10       3    2.0      2      10      8
## 2   gene2    metabolism      3       4    2.5      5       3      2
## 3   gene3   translation      4       5    4.0      1       4      7
## 4   gene4 transcription      5       8    4.5      9       5      2
## 5   gene5    cell-cycle      8       9    0.5      1       1      1
## 6   gene6    cell-cycle      9       1    1.0      2       2      2
## 7   gene7   translation      1       2    2.0      4       4      4
## 8   gene8      receptor      2       4    2.5      5       5      5
## 9   gene9 transcription      4       5    1.5      1       4      3
## 10 gene10    metabolism      5      10    1.0     12       9      2

Task 5. Create New Columns by Combining Multiple Existing Columns

expt %>% mutate(sum1=brain1+kidney1+heart1, sum2=brain2+kidney2+heart2)
##      gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2 sum1
## 1   gene1 transcription     10       3      4      2      10      8   17
## 2   gene2    metabolism      3       4      5      5       3      2   12
## 3   gene3   translation      4       5      8      1       4      7   17
## 4   gene4 transcription      5       8      9      9       5      2   22
## 5   gene5    cell-cycle      8       9      1      1       1      1   18
## 6   gene6    cell-cycle      9       1      2      2       2      2   12
## 7   gene7   translation      1       2      4      4       4      4    7
## 8   gene8      receptor      2       4      5      5       5      5   11
## 9   gene9 transcription      4       5      3      1       4      3   12
## 10 gene10    metabolism      5      10      2     12       9      2   17
##    sum2
## 1    20
## 2    10
## 3    12
## 4    16
## 5     3
## 6     6
## 7    12
## 8    15
## 9     8
## 10   23

### Sort Data in Column (‘arrange’)

**Increasing**

expt %>% arrange(brain1)
##      gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1   gene5    cell-cycle      8       9      1      1       1      1
## 2   gene6    cell-cycle      9       1      2      2       2      2
## 3  gene10    metabolism      5      10      2     12       9      2
## 4   gene9 transcription      4       5      3      1       4      3
## 5   gene1 transcription     10       3      4      2      10      8
## 6   gene7   translation      1       2      4      4       4      4
## 7   gene2    metabolism      3       4      5      5       3      2
## 8   gene8      receptor      2       4      5      5       5      5
## 9   gene3   translation      4       5      8      1       4      7
## 10  gene4 transcription      5       8      9      9       5      2

**Decreasing**

expt %>% arrange(-brain1)
##      gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1   gene4 transcription      5       8      9      9       5      2
## 2   gene3   translation      4       5      8      1       4      7
## 3   gene2    metabolism      3       4      5      5       3      2
## 4   gene8      receptor      2       4      5      5       5      5
## 5   gene1 transcription     10       3      4      2      10      8
## 6   gene7   translation      1       2      4      4       4      4
## 7   gene9 transcription      4       5      3      1       4      3
## 8   gene6    cell-cycle      9       1      2      2       2      2
## 9  gene10    metabolism      5      10      2     12       9      2
## 10  gene5    cell-cycle      8       9      1      1       1      1

### Choose Subset of Rows (‘filter’)

Task 1. Extract Rows above a Cutoff

expt %>% filter(brain1>5)
##    gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1 gene3   translation      4       5      8      1       4      7
## 2 gene4 transcription      5       8      9      9       5      2

Task 2. Extract Rows with Certain Annotation

expt %>% filter(annot=="cell-cycle")
##    gene      annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1 gene5 cell-cycle      8       9      1      1       1      1
## 2 gene6 cell-cycle      9       1      2      2       2      2

Task 3. Extract Rows Based on a Cutoff Involving Multiple Columns

expt %>% filter(brain1+brain2>4)
##    gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2
## 1 gene1 transcription     10       3      4      2      10      8
## 2 gene2    metabolism      3       4      5      5       3      2
## 3 gene3   translation      4       5      8      1       4      7
## 4 gene4 transcription      5       8      9      9       5      2
## 5 gene7   translation      1       2      4      4       4      4
## 6 gene8      receptor      2       4      5      5       5      5
## 7 gene9 transcription      4       5      3      1       4      3

### Combine Multiple Tasks using ‘%>%’

**Example 1**

expt %>% filter(brain1>kidney1/2) %>% filter(brain2>kidney2/2) %>% select(gene,annot)
##    gene         annot
## 1 gene1 transcription
## 2 gene2    metabolism
## 3 gene3   translation
## 4 gene6    cell-cycle
## 5 gene7   translation
## 6 gene8      receptor
## 7 gene9 transcription

**Example 2**

expt %>% filter(brain1>kidney1/2) %>% filter(brain2>kidney2/2) %>% filter(annot=="transcription") %>% select(gene)
##    gene
## 1 gene1
## 2 gene9

### Joining Data from Tables

Earlier we created the data frame ‘expt’ by combining gene, annot and experimental data from six samples. In the data set, we had 10 genes, their annotations and their expression levels.

gene = c('gene1', 'gene2', 'gene3', 'gene4', 'gene5', 'gene6', 'gene7', 'gene8', 'gene9', 'gene10')
annot=c('transcription', 'metabolism', 'translation', 'transcription', 'cell-cycle', 'cell-cycle', 'translation', 'receptor', 'transcription', 'metabolism')
heart1=c(10,3,4,5,8,9,1,2,4,5)
kidney1=c(3,4,5,8,9,1,2,4,5,10)
brain1=c(4,5,8,9,1,2,4,5,3,2)
heart2=c(2,5,1,9,1,2,4,5,1,12)
kidney2=c(10,3,4,5,1,2,4,5,4,9)
brain2=c(8,2,7,2,1,2,4,5,3,2)
expt=data.frame(gene, annot, heart1,kidney1,brain1,heart2, kidney2, brain2)

Let’s say someone sends you another Excel spreadsheet with data from liver. What will you do? We will first create that new data frame with more mock data.
gene = c('gene1', 'gene2', 'gene3', 'gene4', 'gene5', 'gene6', 'gene7', 'gene8', 'gene9', 'gene10')
liver=c(7,10,1,4,5,7,8,9,1,11)
expt2 = data.frame(gene,liver)

We want a combined spreadsheet from expt and expt2. The dplyr package has a function named ‘inner_join’ to accomplish that task.

inner_join(expt,expt2)
## Joining, by = "gene"
##      gene         annot heart1 kidney1 brain1 heart2 kidney2 brain2 liver
## 1   gene1 transcription     10       3      4      2      10      8     7
## 2   gene2    metabolism      3       4      5      5       3      2    10
## 3   gene3   translation      4       5      8      1       4      7     1
## 4   gene4 transcription      5       8      9      9       5      2     4
## 5   gene5    cell-cycle      8       9      1      1       1      1     5
## 6   gene6    cell-cycle      9       1      2      2       2      2     7
## 7   gene7   translation      1       2      4      4       4      4     8
## 8   gene8      receptor      2       4      5      5       5      5     9
## 9   gene9 transcription      4       5      3      1       4      3     1
## 10 gene10    metabolism      5      10      2     12       9      2    11
combined=inner_join(expt,expt2)
## Joining, by = "gene"

The first command displays the combined data frame on the screen, whereas the second command saves it in the variable named ‘combined’. You can then use ‘select’, ‘filter’ etc. on the combined data frame to find patterns.

Please note -

1. Even if the two data frames have rows in different orders, the function matches the rows by gene name before joining the data.
2. If a gene is missing from one data frame or the other, ‘inner_join’ will remove all information for that gene from the combined data frame.
3. If you would like to keep information on the missing genes, dplyr has a number of other functions. For example, ‘full_join’ will keep all genes and fill the missing places with ‘NA’. Check the dplyr documentation for other options.

## 9. Reading Data from External Files

Functions discussed here -

| Name | Action |
|---|---|
| read_excel | Read excel file |
| read_csv | Read csv file |
| read_tsv | Read tsv file |
| write_csv | Write in csv format |
| getwd | Get current working directory |
| setwd | Change working directory |

Now that we have mastered doing simple calculations in R, let us take a step forward and start analyzing real data.
The first step is to import the data as a tibble from external files, typically stored in text (csv) or Excel (xls) format.

### Find Current Working Directory

R works in a directory (folder) of your operating system, so when you save a file, it is saved in that folder. To see where R will save your files, try ‘getwd()’. You can also change the working directory by using ‘setwd()’.

getwd()
## [1] "C:/Users/Student/Documents"
setwd("C:/Users/Student/Documents/R")

### Reading Excel Files

R provides at least two ways to read data from an Excel file. The first method uses an external library ‘readxl’. To use functions from the library, load it in R with the command ‘library(readxl)’.

library(readxl)

After that, the function ‘read_excel’ is used to read sheet 1 of the Excel file into a data frame (e.g. ‘mydata’). The concept of is explained below.

mydata <- read_excel("/full/path/to/excel/file/file.xls", sheet=1)

### Reading csv Files

The other approach is to use the Excel ‘save as’ function to save the spreadsheet in csv format. Then the csv file can be read directly into R using the ‘read_csv’ function. This function is already in tidyverse and does not require any additional library.

mydata <- read_csv("/full/path/to/csv/file/file.csv")

### Writing Data Frame or Tibble in a File

The command ‘write_csv’ will save a data frame or tibble in a file in csv format.

mydata %>% write_csv("file.csv")

## 10. Bioconductor for DNA Data

library(Biostrings)
dna = DNAString("TGCTAATCCTCT")
dna %>% reverseComplement
## 12-letter "DNAString" instance
## seq: AGAGGATTAGCA
dna %>% translate
## 4-letter "AAString" instance
## seq: C*SS
dna %>% subseq(3,7)
## 5-letter "DNAString" instance
## seq: CTAAT
pairwiseAlignment("ATCCCTTAAAAGGTTGGGT","ATCCCTAAAAGGTTGGTT")
## Global PairwiseAlignmentsSingleSubject (1 of 1)
## pattern: ATCCCTTAAAAGGTTGGGT
## subject: ATCCCT-AAAAGGTTGGTT
## score: 13.79057

### Reading FASTA Files

genome = readDNAStringSet("C:/Users/Manoj/Desktop/R/web-tables/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.toplevel.fa")
genome %>% subseq(2038152,2038823) %>% reverseComplement %>% translate %>% toString

### Extracting Sequence

library(GenomicRanges)   # provides GRanges
genome <- DNAStringSet( list(chr1=DNAString("AATGGTCCGTG"), chr2=DNAString("TGGGTGGGTGG")) )
# one range on the plus strand of chr1
gr <- GRanges("chr1", IRanges(start=3,end=5), strand="+")
BSgenome::getSeq(genome, gr)
# two ranges, one per chromosome, on opposite strands
gr <- GRanges(c("chr1","chr2"), IRanges(start=c(1,3),end=c(4,5)), strand=c("+","-"))
BSgenome::getSeq(genome, gr)

### Genomic Segment

ir = IRanges(start=c(1,100), width=c(10,10))
ir
#library(rtracklayer)
#genome <- readDNAStringSet("C://Users/Manoj/Desktop/R/web-tables/e")
#annot <- rtracklayer::import("C://Users/Manoj/Desktop/R/web-tables/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.39.chromosome.Chromosome.gff3")
#seq <- BSgenome::getSeq(genome, annot[100,])

## 11. Changing Data Format to Switch Between Bioconductor and Tidyverse

gene = c('gene1', 'gene2', 'gene3', 'gene4', 'gene5', 'gene6', 'gene7', 'gene8', 'gene9', 'gene10')
heart1=c(10,3,4,5,8,9,1,2,4,5)
kidney1=c(3,4,5,8,9,1,2,4,5,10)
brain1=c(4,5,8,9,1,2,4,5,3,2)
heart2=c(2,5,1,9,1,2,4,5,1,12)
kidney2=c(10,3,4,5,1,2,4,5,4,9)
brain2=c(8,2,7,2,1,2,4,5,3,2)
expt=data.frame(gene, heart1,kidney1,brain1,heart2, kidney2, brain2)

### Going from Tidyverse Style to Bioconductor Style

Tidyverse likes the above style, whereas Bioconductor wants the gene names as the names of the rows.

expt_bioc=expt %>% select(-gene) %>% as.matrix
row.names(expt_bioc)=expt$gene
expt_bioc
##        heart1 kidney1 brain1 heart2 kidney2 brain2
## gene1      10       3      4      2      10      8
## gene2       3       4      5      5       3      2
## gene3       4       5      8      1       4      7
## gene4       5       8      9      9       5      2
## gene5       8       9      1      1       1      1
## gene6       9       1      2      2       2      2
## gene7       1       2      4      4       4      4
## gene8       2       4      5      5       5      5
## gene9       4       5      3      1       4      3
## gene10      5      10      2     12       9      2
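One practical reason for this layout is that a numeric matrix with gene names as row names works directly with base R's matrix functions. The sketch below rebuilds a three-gene, two-sample corner of the table by hand so that it runs on its own; `rowSums()` and `colMeans()` are base R functions not covered elsewhere in this cheatsheet.

```r
# A small matrix in the same layout: genes as row names, samples as columns.
m <- matrix(c(10, 3, 4,
               3, 4, 5),
            nrow = 3,
            dimnames = list(c("gene1", "gene2", "gene3"),
                            c("heart1", "kidney1")))

rowSums(m)     # total per gene: gene1 13, gene2 7, gene3 9
colMeans(m)    # mean per sample: heart1 5.666667, kidney1 4
m["gene2", ]   # one gene, looked up by its row name
```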

### Going back from Bioconductor Style to Tidyverse Style

If you keep the Bioconductor format and apply tidyverse functions, your gene names will disappear. Therefore, you need to move them into a column first.

expt=expt_bioc %>% as.data.frame %>% rownames_to_column("gene")
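If tidyverse is not loaded, roughly the same conversion can be done in base R. This is only a sketch of an alternative, not what ‘rownames_to_column’ does internally; the small matrix `m` is rebuilt by hand so the example runs on its own.

```r
m <- matrix(c(10, 3, 4, 3, 4, 5), nrow = 3,
            dimnames = list(c("gene1", "gene2", "gene3"),
                            c("heart1", "kidney1")))

# Copy the row names into a 'gene' column; row.names = NULL resets the
# row labels to plain numbers.
df <- data.frame(gene = rownames(m), m, row.names = NULL)
df
colnames(df)   # "gene" "heart1" "kidney1"
```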

### Adding Row Number as Another Column

There are times you may also want to add the row number as a column. That task is simple, because you can apply “rownames_to_column” again.

expt=expt %>% rownames_to_column("id")