17 Lab 4 - RNA-seq downstream analysis
Aims
- Estimate transcript abundance from RNA-seq samples (bam files)
- Perform differential expression analysis with DESeq2
- Identify gene ontology pathways over-represented in genes up-regulated upon heat shock treatment
Datasets
RNA-seq data was published in Nuño-Cabanes et al., Scientific Data 2020
Control RNA-seq @ 30 °C:
SRR9929263, SRR9929264, SRR9929273, SRR9929282
Heat shock RNA-seq @ 39 °C, 20 min:
SRR2045248 & SRR2045249, SRR9929271, SRR9929265, SRR9929280, SRR9929274
17.1 Counting mapped fragments over a set of gene annotations
Create a tibble containing the following information for the four RNA-seq samples (bam files provided in the shared Google Drive):
- Path to local bam file
- The biological sample it corresponds to (e.g. WT, HS)
- The biological replicate it corresponds to (e.g. rep1, rep2, …)
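The sample sheet described above could be built along these lines. This is only a sketch: the directory `data/bam`, the file naming scheme and the column names are assumptions, as is the split of the four bam files into two replicates each for WT and HS.

```r
library(tibble)

# Hypothetical directory: adjust to wherever the bam files from the
# shared drive were saved locally.
bam_dir <- "data/bam"

# One row per bam file. The 2 x 2 layout (WT/HS, rep1/rep2) and the
# "<sample>_<replicate>.bam" naming scheme are assumptions.
samples <- tibble(
  sample    = rep(c("WT", "HS"), each = 2),
  replicate = rep(c("rep1", "rep2"), times = 2),
  bam       = file.path(bam_dir, paste0(sample, "_", replicate, ".bam"))
)

samples
```

Running `file.exists(samples$bam)` afterwards is a quick way to check that every path points to an actual file.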
- Import gene annotations downloaded on Monday for yeast.
- Filter them to keep only transcripts.
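One possible way to import and filter the annotations is sketched below. The file path is hypothetical, and the code assumes a GTF-style file whose features carry a `type` column with a `"transcript"` level and a `gene_id` column; other annotation formats may use different names (e.g. `mRNA`).

```r
library(rtracklayer)

# Hypothetical path to the yeast annotation file downloaded earlier.
gtf_path <- "data/annotations/yeast.gtf"

# Keep only transcript features and name them by gene ID.
# Assumes `type` and `gene_id` metadata columns exist (GTF convention).
keep_transcripts <- function(gr) {
  gr <- gr[gr$type == "transcript"]
  names(gr) <- gr$gene_id
  gr
}

# Only attempt the import if the (hypothetical) file is actually there.
if (file.exists(gtf_path)) {
  transcripts <- keep_transcripts(import(gtf_path))
}
```

Naming the ranges matters later on, because the names become the row names of the count table.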
- Check the summarizeOverlaps() function from the GenomicAlignments package.
- Run it to compute transcript abundance for all transcripts across the four replicates of the two samples.
- Specify the BPPARAM argument to perform RNA counting over multiple CPUs.
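The counting step could look roughly like the sketch below. The wrapper `count_fragments()` and its arguments are hypothetical, and several settings are assumptions that depend on how the libraries were prepared: paired-end reads, an unstranded protocol (`ignore.strand = TRUE`) and the `"Union"` counting mode.

```r
library(GenomicAlignments)
library(Rsamtools)
library(BiocParallel)

# Hypothetical wrapper around summarizeOverlaps().
#   features:  GRanges of transcripts
#   bam_paths: character vector of paths to bam files
#   paired:    TRUE for paired-end libraries (assumption)
#   workers:   number of CPUs to use
count_fragments <- function(features, bam_paths, paired = TRUE, workers = 4) {
  # yieldSize limits how many records are read into memory at once.
  bams <- BamFileList(bam_paths, yieldSize = 1e6)
  summarizeOverlaps(
    features      = features,
    reads         = bams,
    mode          = "Union",
    singleEnd     = !paired,
    fragments     = paired,
    ignore.strand = TRUE,  # assumption: unstranded library
    BPPARAM       = MulticoreParam(workers = workers)
  )
}
```

Called with a GRanges of transcripts and a vector of bam paths, the function returns a RangedSummarizedExperiment whose count matrix is available through `assay()`. `MulticoreParam` relies on forking, so on Windows `SnowParam` is the usual substitute.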
17.2 Differential expression analysis
- Read the DESeq documentation (?DESeq). What type of object does it require? How can you create one?
- Create a DESeqDataSet object from the RNA counts. Choose your design formula appropriately.
- Run the DESeq workflow.
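The sketch below illustrates these steps on simulated counts so that it runs without the lab files; the counts are NOT the lab data. The column names, the `sample` variable and the `~ sample` design are assumptions chosen to mirror a WT vs. HS comparison.

```r
library(DESeq2)

# Simulated toy counts generated by DESeq2 itself (NOT the lab data).
set.seed(1)
toy <- makeExampleDESeqDataSet(n = 1000, m = 4)
count_mat <- counts(toy)
colnames(count_mat) <- c("WT_rep1", "WT_rep2", "HS_rep1", "HS_rep2")

# Sample table: WT is set as the reference level so that fold-changes
# are reported as HS relative to WT.
col_data <- data.frame(
  sample    = factor(c("WT", "WT", "HS", "HS"), levels = c("WT", "HS")),
  row.names = colnames(count_mat)
)

# DESeq() requires a DESeqDataSet; here it is built from a count matrix.
dds <- DESeqDataSetFromMatrix(
  countData = count_mat,
  colData   = col_data,
  design    = ~ sample
)

# Size factor estimation, dispersion estimation and model fitting.
dds <- DESeq(dds)
resultsNames(dds)
```

When the counts come from `summarizeOverlaps()`, the resulting SummarizedExperiment can instead be passed to `DESeqDataSet(se, design = ~ sample)`, provided its `colData` contains the variable used in the design.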
- Check the documentation for the results function from the DESeq2 package.
- Extract the results of the differential expression analysis for the appropriate contrast.
- Recover genes over-expressed in heat shock vs. control growth (fold-change >= 2, p-value <= 0.01).
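A minimal sketch of the extraction and filtering follows, again on simulated counts rather than the lab data. Two choices here are assumptions: the threshold is applied to the adjusted p-value (`padj`) rather than the raw one, and the condition levels are relabelled WT and HS for illustration. A fold-change of at least 2 corresponds to `log2FoldChange >= 1`.

```r
library(DESeq2)

# Simulated toy data with some true differences (NOT the lab data).
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 1000, m = 4, betaSD = 1)
levels(dds$condition) <- c("WT", "HS")  # relabel A/B for illustration
dds <- DESeq(dds)

# contrast = c(variable, numerator, denominator): HS relative to WT.
res <- results(dds, contrast = c("condition", "HS", "WT"))
res_df <- as.data.frame(res)

# Over-expressed in HS: fold-change >= 2 (log2 >= 1), padj <= 0.01.
# Genes with NA adjusted p-values are dropped explicitly.
is_up <- !is.na(res_df$padj) &
  res_df$log2FoldChange >= 1 &
  res_df$padj <= 0.01
up <- res_df[is_up, ]
up_genes <- rownames(up)

length(up_genes)
```

`summary(res)` gives a quick overview of how many genes go up or down at a given significance level, which helps to sanity-check the filtered list.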
17.3 Gene ontology over-enrichment analysis
- Perform gene ontology enrichment analysis using the gprofiler2 package.
- From the results, recover the terms from the KEGG database and extract the most enriched ones.
- Comment on the results.
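The enrichment step might be sketched as follows. The helper names `run_enrichment()` and `top_terms()` are hypothetical, ranking by smallest `p_value` is only one way to define "most enriched", and the query needs an internet connection because `gost()` contacts the g:Profiler server.

```r
library(gprofiler2)

# Query g:Profiler for budding yeast. Requires network access.
#   gene_ids: character vector of gene identifiers (e.g. up-regulated genes)
run_enrichment <- function(gene_ids) {
  gost(
    query    = gene_ids,
    organism = "scerevisiae",
    sources  = c("GO:BP", "KEGG")
  )
}

# Keep the terms of one source and return the n with smallest p-values.
#   result: the `result` data frame of a gost() output
top_terms <- function(result, source = "KEGG", n = 10) {
  cols <- c("term_id", "term_name", "p_value", "intersection_size")
  hits <- result[result$source == source, cols]
  head(hits[order(hits$p_value), ], n)
}
```

A typical call would be `enrich <- run_enrichment(gene_ids)` followed by `top_terms(enrich$result)`. Note that `gost()` returns `NULL` when nothing is significant, so that case is worth checking before indexing into the result.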