14  Lab 5: Comparing sets of ChIP-seq peaks

Aims
  • Import a dozen of ChIP-seq peak sets in R
  • Check distribution of peaks compared to genomic features
  • Check peak occurrence over tissue-specific regulatory elements
Datasets
  • modENCODE/modERN TF ChIP-seq database, available here

14.1 Download peaks from ENCODE database

Check out the modENCODE/modERN TF ChIP-seq database.

Inspect the available datasets, by filtering e.g. using the following criteria:

  • Assay title: TF ChIP-seq
  • Organism: C. elegans
  • Genome assembly: ce11
  • Project: modENCODE

Peak files for 12 ChIP-seq datasets have been already downloaded from this database.

14.2 Import all peaks in R

  • In R, list all the bed files available for the peaks.
  • Import each file as a GRanges object.
r
library(rtracklayer)
library(tidyverse)
bedfiles <- list.files('Share/peaks/modENCODE/', pattern = '*bed')
peaks <- map(bedfiles, ~ import(file.path('Share/peaks/modENCODE/', .x)))
names(peaks) <- str_replace(bedfiles, '.bed', '')

14.3 Define genomic features in ce11

Genomic features can be easily annotated from a set of gene features. ChIPseeker facilitates the annotation of ChIP-seq peaks using gene annotations directly provided in R by TxDb gene annotation packages, e.g. TxDb.Celegans.UCSC.ce11.ensGene

  • Install the TxDb.Celegans.UCSC.ce11.ensGene package in R
  • Install ChIPseeker
r
library(ChIPseeker)
library(TxDb.Celegans.UCSC.ce11.ensGene)

14.4 ChIP-seq peak overlaps with genomic features

  • Use ChIPseeker to annotate ChIP-seq peaks for a single ChIP-seq experiment.
  • Extract
r
ceh48 <- peaks[[1]]
annots_ceh48 <- annotatePeak(ceh48, TxDb = TxDb.Celegans.UCSC.ce11.ensGene)
getAnnoStat(annots_ceh48)
  • Iterate over all the peak sets to compile their annotations across C. elegans genome. To iterate over each set of peaks in the peaks list and return an aggregate data.frame, use the imap_dfr.
r
annots <- imap_dfr(peaks, ~ annotatePeak(.x, TxDb = TxDb.Celegans.UCSC.ce11.ensGene) |> 
    getAnnoStat() |> 
    mutate(TF = .y)
)
annots
  • Generate a barplot to represent the % of peaks in each type of genomic features, for the 12 TFs.
r
library(ggplot2)
p <- ggplot(annots, aes(x = TF, y = Frequency, fill = Feature)) + 
    geom_col() + 
    guides(x =  guide_axis(angle = 90))
  • Comment on the distribution of the 12 sets of peaks over genomic features.

14.5 Peaks occurrence over tissue-specific regulatory elements

  • Import regulatory elements annotated in C. elegans in R, as seen in previous Lab.
R
library(readxl)
download.file('https://genome.cshlp.org/content/suppl/2020/11/16/gr.265934.120.DC1/Supplemental_Table_S2.xlsx', 'WBcel235_REs.xlsx')
REs <- read_xlsx('WBcel235_REs.xlsx', skip = 2, col_names = TRUE)
REs <- makeGRangesFromDataFrame(
    REs, seqnames.field = 'chrom_ce11', 
    start.field = 'start_ce11', end.field = 'end_ce11',
    keep.extra.columns = TRUE
)
REs
  • Check for overlap of each regulatory element with xnd-1 peaks.
  • Check the overlap of germline-specific regulatory element with xnd-1 peaks.
  • Check the enrichment of germline-specific regulatory element over xnd-1 peaks.
R
table(REs %over% peaks[['xnd-1']])
table(REs %over% peaks[['xnd-1']], REs$Annotation)
table(REs %over% peaks[['xnd-1']], REs$Annotation == 'Germline') 
table(REs %over% peaks[['xnd-1']], REs$Annotation == 'Germline') |> fisher.test()
  • For each tissue (Germline, Neurons, Muscle, Hypod. and Intest.), check whether the tissue-specific REs are enriched in xnd-1 peak.
Tip
  • You can iterate over each tissue and return an aggregated data.frame using map_dfr.
  • For each iteration, you can transform the result of fisher.test() into a tibble with the glance function from broom package.
R
tissues <- c("Germline", "Neurons", "Muscle", "Hypod.", "Intest.")
df <- map_dfr(tissues, function(tissue) {
    fisher.test(
        REs %over% peaks[['xnd-1']], 
        REs$Annotation == tissue
    ) |> 
        broom::glance() |> 
        mutate(tissue = tissue, TF = 'xnd-1') |> 
        dplyr::select(tissue, TF, estimate, p.value)
})
df
  • Perform the same operation by iterating over each peak set in the list of imported peaks.
Tip
  • You can use two nested map_dfr, iterating first over each TF then over each tissue.
R
TFs <- names(peaks)
tissues <- c("Germline", "Neurons", "Muscle", "Hypod.", "Intest.")
df <- map_dfr(TFs, function(TF) {
    map_dfr(tissues, function(tissue) {
        fisher.test(
            REs %over% peaks[[TF]], 
            REs$Annotation == tissue
        ) |> 
            broom::glance() |> 
            mutate(tissue = tissue, TF = TF) |> 
            dplyr::select(tissue, TF, estimate, p.value)
    })
})
df
  • For each TF, filter the tissues in which it is preferentially enriched (odds ratio >= 2, p.value <= 0.05)
  • Find TFs enriched over Intestine REs
R
filter(df, estimate >= 2, p.value <= 0.05)
filter(df, estimate >= 2, p.value <= 0.05, tissue == "Intest.")
  • Check the STRING DB website to assess whether these TFs have been shown to interact together. Comment.
Back to top