Introduction to Multiomics Data Integration and Visualisation
Introduction to Multiomics Data Integration and Visualisation
Prerequisites
Repository folder:
Relevant reading to prepare yourself to the course
- Reinke, Aaron W, Raymond Mak, Emily R Troemel, and Eric J Ben. 2017. “In Vivo Mapping of Tissue- and Subcellular-Specific Proteomes in Caenorhabditis Elegans.” Science Advances 3 (5): 1–12.
- Jänes, Jürgen, Yan Dong, Michael Schoof, Jacques Serizay, Alex Appert, Chiara Cerrato, Carson Woodbury, et al. 2018. “Chromatin accessibility dynamics across C. elegans development and ageing.” eLife, October. https://doi.org/10.7554/eLife.37344.
- Bioconductor (Huber et al. 2015) and GenomicRanges (Marc Carlson and Morgan 2019) documentations
Required sowftware
- R (3.5.2)
- org.Hs.eg.db_3.7.0
- clusterProfiler_3.10.1
- htmlwidgets_1.3
- rtracklayer_1.42.2
- ggplot2_3.1.1
- DOSE_3.8.2
- tidyverse_1.2.1
- gprofiler2_0.1.3
- biomaRt_2.38.0
Dependencies (which should have been installed with the main required packages)
- S4Vectors_0.20.1
- stringr_1.4.0
- readr_1.3.1
- AnnotationDbi_1.44.0
- GenomeInfoDb_1.18.2
- BiocGenerics_0.28.0
- dplyr_0.8.0.1
- tidyr_0.8.3
- Biobase_2.42.0
- IRanges_2.16.0
- forcats_0.4.0
- purrr_0.3.2
- tibble_2.1.1
- Prerequisites
- Day 1: Introduction
- Day 2: Proteomics
- Day 3: ATAC-seq
- Day 4: ChIP-seq
- SessionInfo
- References
Day 1: Introduction
Research in aging
Aging is a time-dependent functional decline that affects most living organisms. It is characterized by a progressive loss of physiological integrity, leading to impaired function and increased vulnerability to death.
The accumulation of cellular damage is widely considered to be the general cause of aging (López-Otı́n et al. 2013), which explains why the studies from the last three decades have mostly focused on the cellular aspects of aging.
Currently, the aging hallmarks are:
- Genomic instability;
- Telomere attrition;
- Epigenetic alterations;
- Loss of proteostasis;
- Deregulated nutrient sensing;
- Mitochondrial dysfunction;
- Cellular senescence;
- Stem cell exhaustion;
- Altered intercellular communication.
However, aging should not only be seen as a cellular disfunction but as an organismal process. All the cells within an organism are undergoing aging, and not all of them are aging simultaneously. Overall, different tissues are aging differently (Son et al. 2019). Thus, aging is a decline affecting not only cellular components but also organismal biological processes in general.
Aging and the worm
Aging research was inaugurated more than 30 years ago, following the isolation of the first long-lived strains in C. elegans (Johnson 2013). In the past three decades, the nematode played a fundamental role in aging research. It is a powerful model organism when it comes to studying aging:
- It has a short life cycle (~ 21 days in normal conditions);
- Mutants can be easily generated and phenotypically screened for aging defects;
- Longevity assays are easy to perform and very informative;
- Its genome has been fully sequenced and presents fundamental resemblances with that of more complex metazoans.
One of the most used phenotypic assays in worm is the longevity assay. Longevity assays are useful to understand whether a gene is involved in aging (Chen et al. 2015). RNAi is used to knock-down a gene of interest and the resulting longevity is measured. If the longevity increases, the gene function normally reduce it.
In worms, RNAi can be used in large-scale screens. RNAi screens have been performed and have led to the identification of tens of genes involved in aging and regulation of longevity in general. It has been instrumental to decipher the molecular pathways involved in aging (Hamilton et al. 2005).
The question
If RNAi has been instrumental to generally study the process of aging in worm, the readouts are generally phenotypic readouts, e.g. does the worm live longer? Or does it rapidly stop moving after knock-down? These readouts are poorly suited to understand the aging-related role of a protein coded by a given gene in specific tissues. For instance, slo-1 is a gene coding for a potassium channel. Its mutation or knock-down accelerates aging (Chen et al. 2013). Because we have some information about the nature and function of the coded protein, it is rather easy to speculate in each tissue it has an aging-related function (hint: the neurons!).
However, if we do not know anything about the gene (e.g. in which tissue it is expressed, which transcription factors regulate its expression, the type of protein encoded by it, …), it becomes challenging to draw conclusions on the molecular mechanisms involving the protein encoded by the gene.
Rather than relying on RNAi screens, one could leverage the large amount of “omics” data generated in C. elegans to better characterize aging processes at the organismal level. The goal of this study is to analyze aging-related perturbations in worm in a comprehensive manner to (1) identify the tissues undergoing important remodeling during aging and (2) understand the molecular networks involved in aging. More specifically, we would like to fulfill the following aims:
- Find tissue-specific proteins and genes deregulated during aging;
- Find their associations with known diseases;
- Find potential targets for drug engineering
The approach
RNAi has been extensively used to study aging in worm. However, next-generation techniques such as high-throughput sequencing or mass spectrometry now allow for more comprehensive multiomics studies of biological processes. We will rely on results from these techniques to:
- Day 2: Annotate sets of tissue-specific proteins (using proteomics results) and compare it with sets of tissue-specific genes (using transcriptomics results);
- Day 3: Find genomic regions regulating these genes (using epigenomics results);
- Day 4: Identify networks of transcription factors regulating these genes (using epigenomics results).
About this project (and the vignette)
Over the next three days, we will be conducting analysis of several types of datasets in R. This vignette is to be used as a guideline and lists important questions or technical challenges. The R code is only here to suggest one solution to the question(s) that are raised. Before looking at it, it is recommended to investigate the data by yourself and try out your own commands.
More generally, you are welcome to investigate things your own way! If you think of relevant questions that are not raised in this vignette, which you could try and answer using the provided data, don’t hesitate to go off the beaten path. Mentors will be around to supervise everybody - even if the investigations deviate from the original direction - and any personal initiative is welcome and encouraged! Finally, it is important to stress out that everyone is not expected to investigate all the points covered in this vignette and to fully complete every single task as instructed. Again, this vignette is more of a general guideline; if you feel like you don’t want to dedicate time solving a particular point (e.g. you already know everything about investigating pathologies-related lists of genes), feel free to skip parts.
Day 2: Proteomics
Today we will be focusing on analyzing the proteomics data generated in Reinke et al. (2017): “In vivo mapping of tissue- and subcellular-specific proteomes in Caenorhabditis elegans”.
Reading the abstract of the paper is recommended to understand what the authors have attempted to do. In short, the authors have used a protein proximity labelling technique to label cytoplasmic or nuclear proteins in four different tissues of the worm: intestine, epidermis, body wall muscle and pharyngeal muscle. Proteins enriched in each subcellular location or tissue were then identified by mass spectrometry.
Set up R environment
You will first need to make sure that you are working in the right folder. You can also load important packages at this point.
PROJECT_PATH <- getwd()
require(tidyverse)
## Loading required package: tidyverse
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.1 ✔ dplyr 0.8.0.1
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
require(rtracklayer)
## Loading required package: rtracklayer
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind,
## colMeans, colnames, colSums, dirname, do.call, duplicated,
## eval, evalq, Filter, Find, get, grep, grepl, intersect,
## is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
## paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
## Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which, which.max,
## which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:tidyr':
##
## expand
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:purrr':
##
## reduce
## Loading required package: GenomeInfoDb
require(biomaRt)
## Loading required package: biomaRtLoad and inspect the data
The data has been fetched from the original paper and reformatted for the purpose of the course. Start by loading it from data/1602426_TableS4.csv.
Questions (~15 min):
- How many proteins are studied?
- How many are in each tissue?
proteomics <- read.csv(
file.path(PROJECT_PATH, '../data/1602426_TableS4_processed.csv'),
na.strings = "NA"
)
# Number of proteins in the file
nrow(proteomics)
## [1] 3722
# Number of proteins detected in each tissue
colSums(proteomics[,38:45])
## Epidermis_cytoplasm Epidermis_nucleus
## 2442 1278
## Pharyngeal.muscle_cytoplasm Pharyngeal.muscle_nucleus
## 837 104
## Body.wall.muscle_cytoplasm Body.wall.muscle_nucleus
## 1375 609
## Intestine_cytoplasm Intestine_nucleus
## 2449 971Adding gene information
This file only contains the protein names. Eventually, we will be looking at genes as well. The simplest approach is to convert everything to unique gene IDs. For this, we will rely on BiomaRt.
Questions (~45 min):
- What is the difference between Swissprot and Trembl?
- Why is there less annotations from SwissProt?
- Why does the original dataset contain both types of IDs?
- Why are there some protein IDs that do not match any gene ID?
- Are there proteins coming from the same gene?
### You can get IDs of genes and associated proteins using biomaRt
# ensembl <- useDataset("celegans_gene_ensembl", mart = useMart("ensembl"))
# ids <- getBM(
# attributes = c("uniprotswissprot", "uniprotsptrembl", "wormbase_gene"),
# mart = ensembl
# )
# Otherwise, you can also get the ids table from the data folder:
ids <- read.table('../data/proteins_genes_IDs.txt', header = TRUE, sep = '\t')
# Merging SwissProt and Trembl IDs into one column will make your life easier
ids$protID <- paste0(ids$uniprotswissprot, ids$uniprotsptrembl)
# Then a simple match function can be used to translate the protein IDs into gene IDs
proteomics$WormBaseID <- ids$wormbase_gene[match(proteomics$UniProtKB, ids$protID)]
# Few proteins do not have any gene ID (why?). We will ignore these ones for the rest of the study.
proteomics <- proteomics[!is.na(proteomics$WormBaseID),]Separating into lists of tissue-specific proteins
Let’s figure out which proteins are specifically present in each tissue.
Questions (~30 min):
- How many proteins are present specifically in one tissue?
- For each set of proteins, are they more cytoplasmic or nuclear-enriched?
- Which ones are transcription factors? (Hint: there is a list of all transcription factors annotated in C. elegans in the data folder…)
- What do you think about this? Are transcription factors enriched in the sets of tissue-specific proteins?
# Let's retrieve the list of proteins specifically enriched in a single tissue
list_prots <- lapply(
levels(proteomics$Tissue.specific),
function(TISSUE) {
proteomics$UniProtKB %>%
'['(proteomics$Tissue.specific == TISSUE & !is.na(proteomics$Tissue.specific)) %>%
as.character()
}
) %>% setNames(levels(proteomics$Tissue.specific))
lengths(list_prots)
## Body wall muscle specific Epidermis specific
## 55 125
## Intestine specific Pharyngeal muscle specific
## 130 21
# Because we will need it later, you can also get the same list but with gene IDs
list_genes <- lapply(
levels(proteomics$Tissue.specific),
function(TISSUE) {
proteomics$WormBaseID %>%
'['(proteomics$Tissue.specific == TISSUE & !is.na(proteomics$Tissue.specific)) %>%
as.character()
}
) %>% setNames(levels(proteomics$Tissue.specific))
# A list of most of the transcription factors found in C. elegans genome can be found in data/TFs.txt
tfs <- readLines('../data/TFs.txt')
length(tfs)
## [1] 824
# Let's see which proteins enriched in tissues are transcription factors
tfs_in_tissue_specific_prots <- lapply(list_prots, function(prots) {
prots[proteomics$WormBaseID[match(prots, proteomics$UniProtKB)] %in% tfs]
})
# Let's see if transcription factors are enriched
n_prots <- nrow(ids)
n_tfs <- length(tfs)
n_tissue_spe_prots <- sum(lengths(list_prots))
n_tfs_in_tissue_specific_prots <- sum(lengths(tfs_in_tissue_specific_prots))
contingency_matrix <- matrix(
cbind(
c(n_tfs_in_tissue_specific_prots, n_tfs - n_tfs_in_tissue_specific_prots),
c(n_tissue_spe_prots - n_tfs_in_tissue_specific_prots, n_prots - n_tfs - n_tissue_spe_prots + n_tfs_in_tissue_specific_prots)
),
nrow = 2
)
fisher.test(contingency_matrix)
##
## Fisher's Exact Test for Count Data
##
## data: contingency_matrix
## p-value = 0.6
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3354 1.5753
## sample estimates:
## odds ratio
## 0.7862Compare it to existing tissue-specific datasets from RNA-seq
Let’s see if this set of proteins overlap with tissue-specific gene annotation. For this, we can use the data in data/gene_annotations.gff3. This is a .gff3 file, a format used to add specific information to gene annotations. This annotation file has not been published yet.
Questions (~30 min):
- Explain the fundamental differences between the 2 sets of data?
- Are the experiments from the same developmental stage?
- Are the two datasets consistent with to each other?
- What are some good ways to represent the intersection between these datasets?
- Can you comment on the list of proteins specifically present in pharyngeal muscles, when compared to the gene annotations from transcriptomics?
# Let's import the gene annotation information
genes <- import('../data/gene_annotations.gff3')
names(genes) <- genes$ID
# It will be easier if we convert the tissue-specific gene annotation to a vector
genes$which.tissues <- factor(genes$which.tissues, levels = c(
'Germline', 'Sperm', 'EarlyGermline', 'Neurons', 'Muscle', 'Hypod.', 'Intest.',
'Germline_Neurons', 'Germline_Muscle', 'Germline_Hypod.', 'Germline_Intest.', 'Neurons_Muscle', 'Neurons_Hypod.', 'Neurons_Intest.', 'Muscle_Hypod.', 'Muscle_Intest.', 'Hypod._Intest.',
'Germline_Neurons_Muscle', 'Germline_Neurons_Hypod.', 'Germline_Neurons_Intest.', 'Germline_Muscle_Hypod.', 'Germline_Muscle_Intest.', 'Germline_Hypod._Intest.', 'Neurons_Muscle_Hypod.', 'Neurons_Muscle_Intest.', 'Neurons_Hypod._Intest.', 'Muscle_Hypod._Intest.',
'Germline_Neurons_Muscle_Hypod.', 'Germline_Neurons_Muscle_Intest.', 'Germline_Neurons_Hypod._Intest.', 'Germline_Muscle_Hypod._Intest.', 'Neurons_Muscle_Hypod._Intest.',
'Germline_Neurons_Muscle_Hypod._Intest.',
'Soma', 'Ubiq.', 'Ubiq.-Reg', 'Unclassified', 'Low', 'non-prot-cod'
))
# We can compare the two different tissue-specific annotations
lapply(list_genes, function(g) {
table(genes[g]$which.tissues)
})
## $`Body wall muscle specific`
##
## Germline
## 0
## Sperm
## 0
## EarlyGermline
## 0
## Neurons
## 0
## Muscle
## 34
## Hypod.
## 0
## Intest.
## 0
## Germline_Neurons
## 0
## Germline_Muscle
## 0
## Germline_Hypod.
## 0
## Germline_Intest.
## 0
## Neurons_Muscle
## 3
## Neurons_Hypod.
## 0
## Neurons_Intest.
## 0
## Muscle_Hypod.
## 0
## Muscle_Intest.
## 0
## Hypod._Intest.
## 0
## Germline_Neurons_Muscle
## 0
## Germline_Neurons_Hypod.
## 0
## Germline_Neurons_Intest.
## 0
## Germline_Muscle_Hypod.
## 0
## Germline_Muscle_Intest.
## 0
## Germline_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod.
## 0
## Neurons_Muscle_Intest.
## 2
## Neurons_Hypod._Intest.
## 0
## Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod.
## 0
## Germline_Neurons_Muscle_Intest.
## 0
## Germline_Neurons_Hypod._Intest.
## 0
## Germline_Muscle_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod._Intest.
## 0
## Soma
## 3
## Ubiq.
## 4
## Ubiq.-Reg
## 0
## Unclassified
## 4
## Low
## 0
## non-prot-cod
## 0
##
## $`Epidermis specific`
##
## Germline
## 0
## Sperm
## 0
## EarlyGermline
## 0
## Neurons
## 0
## Muscle
## 0
## Hypod.
## 41
## Intest.
## 1
## Germline_Neurons
## 0
## Germline_Muscle
## 0
## Germline_Hypod.
## 1
## Germline_Intest.
## 0
## Neurons_Muscle
## 0
## Neurons_Hypod.
## 0
## Neurons_Intest.
## 1
## Muscle_Hypod.
## 0
## Muscle_Intest.
## 0
## Hypod._Intest.
## 0
## Germline_Neurons_Muscle
## 0
## Germline_Neurons_Hypod.
## 0
## Germline_Neurons_Intest.
## 0
## Germline_Muscle_Hypod.
## 0
## Germline_Muscle_Intest.
## 0
## Germline_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod.
## 1
## Neurons_Muscle_Intest.
## 0
## Neurons_Hypod._Intest.
## 0
## Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod.
## 5
## Germline_Neurons_Muscle_Intest.
## 0
## Germline_Neurons_Hypod._Intest.
## 0
## Germline_Muscle_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod._Intest.
## 0
## Soma
## 7
## Ubiq.
## 19
## Ubiq.-Reg
## 0
## Unclassified
## 10
## Low
## 4
## non-prot-cod
## 0
##
## $`Intestine specific`
##
## Germline
## 2
## Sperm
## 0
## EarlyGermline
## 0
## Neurons
## 1
## Muscle
## 0
## Hypod.
## 2
## Intest.
## 39
## Germline_Neurons
## 0
## Germline_Muscle
## 0
## Germline_Hypod.
## 0
## Germline_Intest.
## 0
## Neurons_Muscle
## 0
## Neurons_Hypod.
## 0
## Neurons_Intest.
## 0
## Muscle_Hypod.
## 0
## Muscle_Intest.
## 2
## Hypod._Intest.
## 2
## Germline_Neurons_Muscle
## 0
## Germline_Neurons_Hypod.
## 0
## Germline_Neurons_Intest.
## 0
## Germline_Muscle_Hypod.
## 0
## Germline_Muscle_Intest.
## 0
## Germline_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod.
## 0
## Neurons_Muscle_Intest.
## 11
## Neurons_Hypod._Intest.
## 0
## Muscle_Hypod._Intest.
## 1
## Germline_Neurons_Muscle_Hypod.
## 0
## Germline_Neurons_Muscle_Intest.
## 0
## Germline_Neurons_Hypod._Intest.
## 0
## Germline_Muscle_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod._Intest.
## 0
## Soma
## 5
## Ubiq.
## 15
## Ubiq.-Reg
## 0
## Unclassified
## 18
## Low
## 3
## non-prot-cod
## 0
##
## $`Pharyngeal muscle specific`
##
## Germline
## 0
## Sperm
## 0
## EarlyGermline
## 0
## Neurons
## 0
## Muscle
## 2
## Hypod.
## 0
## Intest.
## 0
## Germline_Neurons
## 0
## Germline_Muscle
## 0
## Germline_Hypod.
## 0
## Germline_Intest.
## 0
## Neurons_Muscle
## 1
## Neurons_Hypod.
## 0
## Neurons_Intest.
## 0
## Muscle_Hypod.
## 0
## Muscle_Intest.
## 0
## Hypod._Intest.
## 0
## Germline_Neurons_Muscle
## 1
## Germline_Neurons_Hypod.
## 0
## Germline_Neurons_Intest.
## 0
## Germline_Muscle_Hypod.
## 0
## Germline_Muscle_Intest.
## 0
## Germline_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod.
## 0
## Neurons_Muscle_Intest.
## 3
## Neurons_Hypod._Intest.
## 0
## Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod.
## 0
## Germline_Neurons_Muscle_Intest.
## 0
## Germline_Neurons_Hypod._Intest.
## 0
## Germline_Muscle_Hypod._Intest.
## 0
## Neurons_Muscle_Hypod._Intest.
## 0
## Germline_Neurons_Muscle_Hypod._Intest.
## 0
## Soma
## 1
## Ubiq.
## 0
## Ubiq.-Reg
## 0
## Unclassified
## 4
## Low
## 7
## non-prot-cod
## 0
# We can also plot the two different tissue-specific annotations as a heatmap
df <- data.frame(
transcriptome_annotations = genes[unlist(list_genes)]$which.tissues,
proteomics_annotations = proteomics[match(unlist(list_genes), proteomics$WormBaseID),]$Tissue.specific
) %>%
table() %>%
as.data.frame()
p <- ggplot(df, aes(y = 1, x = transcriptome_annotations)) +
geom_tile(aes(fill = Freq)) +
geom_text(aes(label = Freq)) +
scale_fill_gradientn(colours = c('white', 'orange', 'red')) +
theme_bw() +
facet_wrap(~proteomics_annotations) +
labs(y = '', x = 'Transcriptomics-based annotations') +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
ggsave('../results/comparison-transcriptomics-proteomics.pdf', width = 15, height = 4)Save your work!!
That’s it for today! Don’t forget to save your progress.
Day 3: ATAC-seq
Today you will be focusing on investigating the ATAC-seq data generated in Jänes et al. (2018): “Chromatin accessibility dynamics across C. elegans development and ageing”.
ATAC-seq is a method to profile genome-wide accessibility of the chromatin. Accessible (or “open”) chromatin regions are the loci responsible for the regulation of gene expression. They are frequently called “Regulatory Elements” (REs). In Jänes et al. (2018), whole worms have been used to perform ATAC-seq at different time points of their life cycle: at the young adult (YA) stage or after 3, 7, 10 or 13 days after the YA stage. Accessibility of all the known promoters (13,596) has been estimated throughout aging of the worm. A resulting matrix (13,596 promoters x 5 time points) has been generated and promoter accessibility changes have been computed. 1,800 promoters appeared to have significant accessibility variations during aging, and 8 clusters of promoters following similar changes have been annotated.
Set up R environment and load previous work
Once again, first make sure that you are working in the right folder. You can also load important packages at this point.
PROJECT_PATH <- getwd()
require(tidyverse)
require(rtracklayer)
require(htmlwidgets)
## Loading required package: htmlwidgets
require(gprofiler2)
## Loading required package: gprofiler2
require(DOSE)
## Loading required package: DOSE
##
## DOSE v3.8.2 For help: https://guangchuangyu.github.io/DOSE
##
## If you use DOSE in published research, please cite:
## Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609
require(clusterProfiler)
## Loading required package: clusterProfiler
## clusterProfiler v3.10.1 For help: https://guangchuangyu.github.io/software/clusterProfiler
##
## If you use clusterProfiler in published research, please cite:
## Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
##
## Attaching package: 'clusterProfiler'
## The following object is masked from 'package:purrr':
##
## simplify
require(org.Hs.eg.db)
## Loading required package: org.Hs.eg.db
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
##
## select
##
# The work done in Day 2 has been stored in an RData object.
load('../results/Day2.RData')Investigate the ATAC-seq aging data
The data has been fetched from the original paper and reformatted for the purpose of the course. Start by loading it from data/ATAC-seq_aging.gff3.
Questions (~30min):
- Which format is the
ATAC-seq_aging.gff3file? What does the file contain? - Focus on the cluster information (“ageing_prom_cluster_label” column): How many promoters does each cluster contain?
- Can you find a good way to visualize and compare variations of promoter accessibility during aging across clusters?
- Discuss the nature of the accessibility variations in the different clusters.
REs <- import('../data/ATAC-seq_aging.gff3')
proms <- REs[REs$ageing_prom_cluster_label != '.']
ATAC_aging <- mcols(proms)[, 8:12] %>%
data.frame() %>%
data.matrix() %>%
as.data.frame()
# How many promoters does each cluster contain?
table(proms$ageing_prom_cluster_label)
##
## I I+H [1] I+H [2] Mix1 Mix2 Mix3 Mix4 Mix5 stable
## 304 252 409 210 98 273 112 142 11796
# Comparison of accessibility changes across clusters
norm_changes_df <- apply(ATAC_aging, 1, function(ROW) {log2(ROW/mean(ROW))}) %>%
t() %>%
data.frame()
df <- data.frame(
locus = proms$locus,
norm_changes_df,
cluster = proms$ageing_prom_cluster_label
) %>%
gather(stage, expr, -cluster, -locus) %>%
mutate(stage = factor(stage, levels = colnames(norm_changes_df))) %>%
filter(cluster != '.')
p <- ggplot(df, aes(x = stage, y = expr, color = cluster, group = locus)) +
geom_line(alpha = 0.05) +
theme_bw() +
facet_wrap(~cluster) +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
theme(legend.position = 'none')
ggsave('../results/accessibility-changes_clusters.pdf', height = 6, width = 6.4)
## Warning: Removed 30 rows containing missing values (geom_path).Questions (~20min):
- In which tissue(s) are the associated genes (from each cluster) translated? Use the proteomics annotations for this.
- What does it suggest about the changes occurring during aging at the organismal level? Which tissues seem to undergo changes during aging?
# Check the association between proms and proteins
list_genes_aging <- lapply(levels(df$cluster), function(cluster) {
proms <- proms[proms$ageing_prom_cluster_label == cluster & proms$regulatory_class %in% c('bidirect-promoter', 'fwd-promoter', 'rev-promoter')]
genes_cluster <- proms$WormBaseID %>% as.character() %>% strsplit(',') %>% unlist()
return(genes_cluster)
}) %>% setNames(levels(df$cluster))
# We can also plot something
df <- data.frame(
cluster_annotations = unlist(lapply(seq_along(list_genes_aging), function(K) {rep(names(list_genes_aging)[K], length(list_genes_aging[[K]]))})),
proteomics_annotations = proteomics[match(unlist(list_genes_aging), proteomics$WormBaseID),]$Tissue.specific
) %>%
table() %>%
as.data.frame() %>%
filter(cluster_annotations != 'stable')
p <- ggplot(df, aes(y = 1, x = cluster_annotations)) +
geom_tile(aes(fill = Freq)) +
geom_text(aes(label = Freq)) +
scale_fill_gradientn(colours = c('white', 'orange', 'red')) +
theme_bw() +
facet_wrap(~proteomics_annotations) +
labs(y = '', x = 'Ageing cluster') +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
ggsave('../results/comparison-ATAC-clusters-proteomics.pdf', width = 8, height = 4)Investigate the genes associated to varying promoters during aging
Now that we know which genomic loci are varying during aging, we can retrieve the associated genes and investigate their function and their associated with diseases.
Questions (~20 min):
- What is the function of genes going up (down) during aging? Use gprofiler2 package to get some insights.
# What is the function of genes going up (down) during aging?
genes_down <- unique(unlist(list_genes_aging[1:4]))
genes_up <- unique(unlist(list_genes_aging[5:8]))
gos <- gost(
list(
up = genes_up,
down = genes_down
),
organism = 'celegans',
multi_query = TRUE,
user_threshold = 0.01,
source = c('GO:BP', 'GO:MF', 'GO:CC')
)
p <- gostplot(gos, capped = TRUE, interactive = TRUE)Questions (~30 min):
- What are the homologs of these genes in mammals? Use biomaRt for this.
- For the next step, you will need ENTREZ IDs rather than Ensembl IDs for the human homologues. Use the bitr function to translate the IDs.
# What are the homologs of these genes in mammals?
ensembl <- useDataset("celegans_gene_ensembl", mart = useMart("ensembl"))
#
human_homologues_down <- getBM(
filters = c("ensembl_gene_id"),
values = genes_down,
attributes = c('ensembl_gene_id', 'hsapiens_homolog_ensembl_gene'),
mart = ensembl
)
## Batch submitting query [================>---------] 67% eta: 6s Batch
## submitting query [==========================] 100% eta: 0s
human_homologues_down_ENTREZ <- human_homologues_down$hsapiens_homolog_ensembl_gene %>%
'['(nchar(.) > 0) %>%
bitr(fromType = "ENSEMBL", toType = c("ENTREZID"), OrgDb = org.Hs.eg.db) %>%
'['(,"ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(., fromType = "ENSEMBL", toType = c("ENTREZID"), OrgDb =
## org.Hs.eg.db): 1.75% of input gene IDs are fail to map...
#
human_homologues_up <- getBM(
filters = c("ensembl_gene_id"),
values = genes_up,
attributes = c('ensembl_gene_id', 'hsapiens_homolog_ensembl_gene'),
mart = ensembl
)
## Batch submitting query [==========================] 100% eta: 0s
human_homologues_up_ENTREZ <- human_homologues_up$hsapiens_homolog_ensembl_gene %>%
'['(nchar(.) > 0) %>%
bitr(fromType = "ENSEMBL", toType = c("ENTREZID"), OrgDb = org.Hs.eg.db) %>%
'['(,"ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(., fromType = "ENSEMBL", toType = c("ENTREZID"), OrgDb =
## org.Hs.eg.db): 1.96% of input gene IDs are fail to map...Questions (~20 min):
- What are the disease(s) associated to the human homologues of worm genes down-regulated during aging? Use the DOSE package for this.
- Are the disease(s) associated to these genes consistent with the tissues in which these genes produce proteins?
diseases_enrichment_down <- enrichDO(
gene = human_homologues_down_ENTREZ,
ont = "DO",
pvalueCutoff = 0.01,
pAdjustMethod = "BH",
readable = TRUE
)
head(diseases_enrichment_down@result[,1:7])
## ID Description GeneRatio BgRatio
## DOID:2978 DOID:2978 carbohydrate metabolic disorder 9/401 27/8007
## DOID:655 DOID:655 inherited metabolic disorder 31/401 331/8007
## DOID:589 DOID:589 congenital hemolytic anemia 5/401 16/8007
## DOID:9993 DOID:9993 hypoglycemia 4/401 10/8007
## DOID:14504 DOID:14504 Niemann-Pick disease 4/401 12/8007
## DOID:4330 DOID:4330 non-langerhans-cell histiocytosis 4/401 14/8007
## pvalue p.adjust qvalue
## DOID:2978 3.794e-06 0.002443 0.002443
## DOID:655 5.323e-04 0.164617 0.164617
## DOID:589 8.479e-04 0.164617 0.164617
## DOID:9993 1.022e-03 0.164617 0.164617
## DOID:14504 2.225e-03 0.286527 0.286527
## DOID:4330 4.154e-03 0.432135 0.432135[EXTRA] Using external visualization tools
Data visualization softwares are key to investigate. If you have some spare time, you can try to use Seqplots software (Stempor and Ahringer 2016) to get more insights about the promoters clustered by their accessibility changes. The software requires:
- A genome to be installed (ce11, the latest genome version of C. elegans, should already be installed);
- Track files (typically
bigWigfiles): for instance, genome-wide accessibility tracks generated during aging (see thedata/folder…) - Some feature tracks (typically
bedfiles): these files generally contain sets of genomic loci. For instance, you could generate 9 differentbedfiles, each containing the promoters from a single cluster.
# Export bed files of promoters from each cluster to visualize the accessibility profiles over each set using SeqPlots
for (cluster in levels(df$cluster)) {
export(granges(proms[proms$ageing_prom_cluster_label == cluster]), paste0('ATAC_cluster-', cluster, '.bed'))
}
seqplots::run()Once the bed files are generated, you can load the tracks and the bed files in Seqplots and try the different functionalities of the software.
Day 4: ChIP-seq
Today we try to understand which TFs are regulating the promoters which are varying during aging. For this, we will be investigating ChIP-seq data generated by the modENCODE/modERN consortium. This consortium has profiled (among other things) chromatin binding landscape for tens of transcription factors in worm, by expressing exogenous transgenes consisting of tagged transcription factors. We will try to identify which TFs are preferentially binding to each cluster of promoters co-regulated throughout aging.
[EXTRA] Get the ChIP-seq data by yourself from modENCODE platform
modENCODE/modERN data is accessible on the ENCODE project web portal, which hosts many more datasets from different model systems. Raw data and processed files can be downloaded directly, or a list of files can be retrieved and downloaded programatically. If you have some spare time, you can try to download the original processed files from there and start to wrangle them by yourself.
Set up R environment and load previous work
Once again, first make sure that you are working in the right folder. You can also load important packages at this point.
Investigate the TFs associated to varying promoters during aging
Because getting the right list of files, tyding the downloaded files, QC-ing and removing bad quality datasets, merging replicates, etc… always require a lot of data wrangling, a clean set of .bed files (one per transcription factor) is provided in the data/ folder to avoid these cumbersome steps. Each bed file contains the transcription factor ChIP-seq “peaks”, corresponding to the genomic loci bound by the transcription factor. Some TFs are binding to thousands of loci, some others are bound to a few dozens of them. Moreover, the quality of the ChIP-seq datasets can vary between samples.
Questions (~45 min):
- How to summarize modENCODE ChIP-seq data for our purpose?
- Import all the TF ChIP-seq data in R and integrate it to the
# Read all the TF CHIP-seq bed files
TFs <- list.files('../data/bed/') %>% gsub('.bed', '', .)
TFs_peaks <- lapply(TFs, function(TF) {
import(paste0('../data/bed/', TF, '.bed'))
}) %>% setNames(TFs)
# Compile a matrix summarizing which TFs are binding to which promoters
TFs_matrix <- matrix(
FALSE,
ncol = length(TFs),
nrow = length(proms)
)
colnames(TFs_matrix) <- TFs ; row.names(TFs_matrix) <- names(proms)
for (TF in TFs) {
hits <- subjectHits(findOverlaps(TFs_peaks[[TF]], proms))
TFs_matrix[hits, TF] <- TRUE
}
# Integrate this to the promoters GRanges object
proms$TFs <- apply(TFs_matrix, 1, function(ROW) {paste(colnames(TFs_matrix)[ROW], collapse = ',')})
proms$TF_nb <- rowSums(TFs_matrix)
proms$is_HOT <- ifelse(proms$TF_nb >= 18, TRUE, FALSE)
cold_proms <- proms[!proms$is_HOT]Questions (~1 h):
- Which metric should be calculated to see if a TF is enriched in a set of promoters?
- Can the unelegant nested
forloops be avoided? - How can you represent the enrichment of binding of each TF in each cluster?
clusters <- c("I", "I+H [1]", "I+H [2]", "Mix1", "Mix2", "Mix3", "Mix4", "Mix5")
fisher_matrix <- data.frame(matrix(0, nrow = length(clusters), ncol = length(TFs))) ; colnames(fisher_matrix) = TFs ; row.names(fisher_matrix) = clusters
pval_matrix <- data.frame(matrix(0, nrow = length(clusters), ncol = length(TFs))) ; colnames(pval_matrix) = TFs ; row.names(pval_matrix) = clusters
pct_matrix <- data.frame(matrix(0, nrow = length(clusters), ncol = length(TFs))) ; colnames(pct_matrix) = TFs ; row.names(pct_matrix) = clusters
for (CLUSTER in clusters) {
# Get the cold promoters that are in the cluster CLUSTER
cold_proms_in_cluster <- cold_proms[cold_proms$ageing_prom_cluster_label == CLUSTER]
for (TF in TFs) {
# Get the cold promoters bound by the transcription factor TF
cold_proms_withTF <- sum(grepl(TF, cold_proms$TFs))
# Get the cold promoters that are in the cluster CLUSTER and bound by the transcription factor TF
cold_proms_in_cluster_withTF <- sum(grepl(TF, cold_proms_in_cluster$TFs))
# Build a contingency matrix
contingency_matrix <- matrix(
c(
cold_proms_in_cluster_withTF,
length(cold_proms_in_cluster) - cold_proms_in_cluster_withTF,
cold_proms_withTF - cold_proms_in_cluster_withTF,
length(cold_proms) - cold_proms_withTF - length(cold_proms_in_cluster) + cold_proms_in_cluster_withTF
),
nrow = 2
)
# Run a Fisher test and populate the result data.frames
fisher <- fisher.test(contingency_matrix)
fisher_matrix[CLUSTER, TF] <- fisher$estimate
pval_matrix[CLUSTER, TF] <- fisher$p.val
pct_matrix[CLUSTER, TF] <- cold_proms_in_cluster_withTF/length(cold_proms_in_cluster)*100
}
}
# Filter the results to only see the significant ones
fisher_matrix_curated <- t(fisher_matrix)
fisher_matrix_curated[t(pval_matrix) > 0.01] <- 0 # Discard non-significant enrichments
fisher_matrix_curated <- fisher_matrix_curated[rowSums(fisher_matrix_curated > 2) > 1,] # Discard TFs that don't have more than 2-fold enrichment in at least 1 cluster
df <- fisher_matrix_curated %>%
data.frame() %>%
setNames(clusters) %>%
rownames_to_column('TF') %>%
gather('cluster', 'Odd_ratio', -TF)
p <- ggplot(df, aes(y = TF, x = cluster)) +
geom_tile(aes(fill = Odd_ratio)) +
scale_fill_gradientn(colours = c('white', 'orange', 'darkred')) +
theme_bw() +
labs(y = 'TFs', x = 'Promoter clusters in aging') +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
ggsave('../results/TFs-enrichments_promoter-clusters.pdf', width = 5, height = 7)Defining key TFs involved in aging?
Questions (~15 min):
- Is there a group of motifs functionally interacting? Use STRING to investigate this.
- What can you find in the literature about these factors? Look into the STRING output carefully…
If everything has been done correctly, you should obtain a graph like this in STRING:
[EXTRA] Finding motifs associated to each TFs
Most transcription factors recognize specific sequence motifs to bind to DNA. Using MEME web interface and the bed files located in data/, you can investigate which motifs are enriched in transcription factor binding sites (provided in the .bed files).
SessionInfo
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] org.Hs.eg.db_3.7.0 AnnotationDbi_1.44.0 Biobase_2.42.0
## [4] clusterProfiler_3.10.1 DOSE_3.8.2 gprofiler2_0.1.3
## [7] htmlwidgets_1.3 biomaRt_2.38.0 rtracklayer_1.42.2
## [10] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
## [13] S4Vectors_0.20.1 BiocGenerics_0.28.0 forcats_0.4.0
## [16] stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.2
## [19] readr_1.3.1 tidyr_0.8.3 tibble_2.1.1
## [22] ggplot2_3.1.1 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] fgsea_1.13.0 colorspace_1.4-1
## [3] ggridges_0.5.1 qvalue_2.14.1
## [5] XVector_0.22.0 rstudioapi_0.10
## [7] farver_1.1.0 urltools_1.7.2
## [9] ggrepel_0.8.0 bit64_0.9-7
## [11] lubridate_1.7.4 xml2_1.2.0
## [13] splines_3.5.2 GOSemSim_2.8.0
## [15] knitr_1.22 polyclip_1.10-0
## [17] jsonlite_1.6 Cairo_1.5-9
## [19] Rsamtools_1.34.1 broom_0.5.1
## [21] GO.db_3.7.0 shiny_1.2.0
## [23] ggforce_0.2.1 compiler_3.5.2
## [25] httr_1.4.0 rvcheck_0.1.3
## [27] backports_1.1.3 assertthat_0.2.1
## [29] Matrix_1.2-17 lazyeval_0.2.2
## [31] cli_1.1.0 later_0.8.0
## [33] tweenr_1.0.1 htmltools_0.3.6
## [35] prettyunits_1.0.2 tools_3.5.2
## [37] igraph_1.2.4 gtable_0.3.0
## [39] glue_1.3.1 GenomeInfoDbData_1.2.0
## [41] reshape2_1.4.3 DO.db_2.9
## [43] fastmatch_1.1-0 Rcpp_1.0.1
## [45] enrichplot_1.2.0 cellranger_1.1.0
## [47] Biostrings_2.50.2 nlme_3.1-137
## [49] crosstalk_1.0.0 ggraph_1.0.2
## [51] xfun_0.5 rvest_0.3.2
## [53] mime_0.6 XML_3.98-1.19
## [55] europepmc_0.3 zlibbioc_1.28.0
## [57] MASS_7.3-51.1 scales_1.0.0
## [59] promises_1.0.1 hms_0.4.2
## [61] SummarizedExperiment_1.12.0 RColorBrewer_1.1-2
## [63] curl_3.3 yaml_2.2.0
## [65] memoise_1.1.0 gridExtra_2.3
## [67] UpSetR_1.3.3 triebeard_0.3.0
## [69] stringi_1.3.1 RSQLite_2.1.1
## [71] BiocParallel_1.16.6 rlang_0.4.2
## [73] pkgconfig_2.0.2 matrixStats_0.54.0
## [75] bitops_1.0-6 evaluate_0.13
## [77] lattice_0.20-38 GenomicAlignments_1.18.1
## [79] labeling_0.3 cowplot_0.9.4
## [81] bit_1.1-14 tidyselect_0.2.5
## [83] plyr_1.8.4 magrittr_1.5
## [85] bookdown_0.9.2 R6_2.4.0
## [87] generics_0.0.2 DelayedArray_0.8.0
## [89] DBI_1.0.0 pillar_1.3.1
## [91] haven_2.1.0 withr_2.1.2
## [93] RCurl_1.95-4.12 modelr_0.1.4
## [95] crayon_1.3.4 plotly_4.8.0
## [97] rmarkdown_1.12.6 viridis_0.5.1
## [99] progress_1.2.0 grid_3.5.2
## [101] readxl_1.3.1 data.table_1.12.0
## [103] blob_1.1.1 rmdformats_0.3.6
## [105] digest_0.6.18 xtable_1.8-3
## [107] httpuv_1.5.0 gridGraphics_0.3-0
## [109] munsell_0.5.0 ggplotify_0.0.3
## [111] viridisLite_0.3.0
References
Chen, Albert Tzong-Yang, Chunfang Guo, Omar A. Itani, Breane G. Budaitis, Travis W. Williams, Christopher E. Hopkins, Richard C. McEachin, et al. 2015. “Longevity Genes Revealed by Integrative Analysis of Isoform-Specific daf-16/FoxO Mutants of Caenorhabditis elegans.” Genetics 201 (2): 613. https://doi.org/10.1534/genetics.115.177998.
Chen, Chun-Hao, Yen-Chih Chen, Hao-Ching Jiang, Chung-Kuan Chen, and Chun-Liang Pan. 2013. “Neuronal aging: learning from C. elegans.” Journal of Molecular Signaling 8 (0). https://doi.org/10.1186/1750-2187-8-14.
Hamilton, Benjamin, Yuqing Dong, Mami Shindo, Wenyu Liu, Ian Odell, Gary Ruvkun, and Siu Sylvia Lee. 2005. “A systematic RNAi screen for longevity genes in C. elegans.” Genes & Development 19 (13): 1544–55. https://doi.org/10.1101/gad.1308205.
Huber, Wolfgang, Vincent J. Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S. Carvalho, Hector Corrada Bravo, et al. 2015. “Orchestrating high-throughput genomic analysis with Bioconductor.” Nature Methods 12 (2): 115–21. https://doi.org/10.1038/nmeth.3252.
Jänes, Jürgen, Yan Dong, Michael Schoof, Jacques Serizay, Alex Appert, Chiara Cerrato, Carson Woodbury, et al. 2018. “Chromatin accessibility dynamics across C. elegans development and ageing.” eLife, October. https://doi.org/10.7554/eLife.37344.
Johnson, Thomas E. 2013. “25 Years after age-1: Genes, Interventions and the Revolution in Aging Research.” Experimental Gerontology 48 (7): 640. https://doi.org/10.1016/j.exger.2013.02.023.
López-Otı́n, Carlos, Maria A. Blasco, Linda Partridge, Manuel Serrano, and Guido Kroemer. 2013. “The Hallmarks of Aging.” Cell 153 (6): 1194–1217. https://doi.org/10.1016/j.cell.2013.05.039.
Marc Carlson, Patrick Aboyoun, and Martin Morgan. 2019. “An Introduction to the GenomicRanges Package.” https://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.html.
Reinke, Aaron W, Raymond Mak, Emily R Troemel, and Eric J Ben. 2017. “In Vivo Mapping of Tissue- and Subcellular-Specific Proteomes in Caenorhabditis Elegans.” Science Advances 3 (5): 1–12.
Son, Heehwa G., Ozlem Altintas, Eun Ji E. Kim, Sujeong Kwon, and Seung-Jae V. Lee. 2019. “Age-dependent changes and biomarkers of aging in Caenorhabditis elegans.” Aging Cell 18 (2): e12853. https://doi.org/10.1111/acel.12853.
Stempor, Przemyslaw, and Julie Ahringer. 2016. “SeqPlots - Interactive software for exploratory data analyses, pattern discovery and visualization in genomics.” Wellcome Open Research 1 (November): 14. https://doi.org/10.12688/wellcomeopenres.10004.1.