11 Lab 2: Quantification and differential gene expression by RNA-seq

Aims

Estimate transcript abundance from RNA-seq samples
Import count data in R
Perform differential expression analysis with DESeq2

Datasets

RNA-seq data was published in Nuño-Cabanes et al., Scientific Data 2020

Control RNA-seq @ 30 C:
- SRR9929263
- SRR9929264
- SRR9929273
- SRR9929282
Heat shock RNA-seq @ 39 C, 20 min: SRR2045248 & SRR2045249
- SRR9929271
- SRR9929265
- SRR9929280
- SRR9929274

11.1 Count mapped fragments over a set of gene annotations

Check the files in the Share/mapping directory.

Answer

bash

ls -l Share/mapping/RNAseq*bam

Locate the gene annotation file provided for the yeast genome.
How many annotations are there in the file?
What type of annotations are there?
How can you have more start codons than stop codons?

Answer

bash

wc -l Share/genome/R64-1-1.gtf 
head Share/genome/R64-1-1.gtf
cut -f3 Share/genome/R64-1-1.gtf | sort | uniq -c

Check featureCounts help to understand the different options available.
Which level of annotations should you use to count the number of mapped fragments over?
What are the featureCounts options specific to paired-end reads? And to stranded RNA-seq?
Use featureCounts to count the number of mapped fragments over the set of gene annotations.

Answer

bash

featureCounts --help
featureCounts \
    -a Share/genome/R64-1-1.gtf \
    -o RNAseq_counts.txt \
    -t transcript \
    -g gene_name \
    -s 2 \
    -p \
    --countReadPairs \
    -P -B -D 10000 -C \
    -T 2 \
    Share/mapping/RNAseq*R64-1-1.bam

11.2 Import counts into R

Create a tibble containing the following information for the four RNA-seq samples:
- Path to local bam file
- The biological sample it corresponds to (e.g. WT, HS)
- The biological replicate it corresponds to (e.g. rep1, rep2, …)

Answer

library(tidyverse)
samples <- tibble(
    bam = list.files('Share/mapping', pattern = 'RNAseq.*_R64-1-1.bam$', full.names = TRUE), 
    sample = basename(bam), 
) |> separate(sample, sep = '_', into = c(NA, 'condition', 'rep', NA)) |> 
    mutate(sample = paste(condition, rep, sep = '_'))

Import gene annotations for yeast.
Filter it out to only keep transcripts

Answer

library(rtracklayer)
genes <- import('Share/genome/R64-1-1.gtf')
genes <- genes[genes$type == 'transcript']
names(genes) <- genes$gene_name

Import the counts generated by featureCounts into R.

Answer

cnts <- read_tsv('RNAseq_counts.txt', col_names = TRUE, comment = "#")
cnts <- cnts |> 
    select(-c(2:6)) |> 
    column_to_rownames('Geneid') |> 
    as.matrix() 
colnames(cnts) <- samples$sample

Wrap the counts, the sample metadata and the gene annotations into a SummarizedExperiment object.

Answer

library(SummarizedExperiment)
se <- SummarizedExperiment(
    assays = list(counts = cnts), 
    rowRanges = genes, 
    colData = samples
)

11.3 Differential expression analysis

Read DESeq documentation (?DESeq). What type of object does it require? How can you create one?
Create a DESeqDataSet object from the RNA counts. Choose your design formula appropriately.
Run DESeq workflow

Answer

library(DESeq2)
dds <- DESeqDataSet(se, design = ~ condition)
dds <- DESeq(dds)

Check the documentation for the results function from DESeq2 package.
Extract the results of differential expression analysis, for the appropriate contrast.
Recover genes over-expressed in heat-shock vs. control growth (fold-change >= 2, p-value <= 0.01)

Answer

res <- results(dds, contrast = c('condition', 'HS', 'WT'))
upreg_genes <- res |> 
    as_tibble(rownames = 'gene') |> 
    filter(log2FoldChange > 1, padj <= 0.001) |> 
    pull(gene) |> writeLines()