15 Lab 6: Motif enrichment analysis from ChIP-seq data

Aims

Find motifs enriched in a set of ChIP-seq peaks
Scan genomic loci for de novo motifs
Compute statistical enrichment of motifs in sets of genomic annotations

Datasets

modENCODE/modERN TF ChIP-seq database, available here

15.1 Find de novo motifs enriched in `xnd-1` ChIP-seq

Check the meme website to identify which tool is best suited to identify a motif de novo in a set of peaks from a ChIP-seq experiment.
What do you need to run xstreme on a set of peaks?

15.1.1 Preparing `meme` input

Import xnd-1 peaks in R as a GRanges object.
Recover ce11 genome sequence using the BSgenome.Celegans.UCSC.ce11 package.
Extract sequence over xnd-1 peaks with the Biostrings package.
Export the sequences as a fasta file.

Answer

library(rtracklayer)
library(Biostrings)
library(BSgenome.Celegans.UCSC.ce11)
xnd1 <- import('Share/peaks/modENCODE/xnd-1.bed')
genome <- BSgenome.Celegans.UCSC.ce11
xnd1_seq <- getSeq(genome, xnd1)
names(xnd1_seq) <- as.character(xnd1)
writeXStringSet(xnd1_seq, 'xnd-1.fa')

15.1.2 Running `meme`

Identify motifs enriched in xnd-1 peaks with xstreme using the following options:
- Zero or one occurence per sequence at most for meme
- At most 3 motifs
- A min motif width of 6 for meme
- A max motif width of 15 for meme
- No streme motifs
- Over multiple processors in parallel

Answer

sh

xstreme \
    --p Share/peaks/modENCODE/xnd-1.fa \
    --oc meme_out/ \
    -minw 12 \
    -maxw 16 \
    --meme-nmotifs 3 \
    --meme-p 12 \
    --meme-mod zoops \
    --streme-nmotifs 0

Check the results. Do they make sense in the light of recent publications? (see here)

15.2 Scan `ce11` regulatory elements for de novo motifs

Import ce11 tissue-specific regulatory element annotations, as done in the previous lab.

Answer

library(rtracklayer)
library(GenomicRanges)
REs <- readxl::read_xlsx('Share/WBcel235_REs.xlsx', skip = 2, col_names = TRUE)
REs <- makeGRangesFromDataFrame(
    REs, seqnames.field = 'chrom_ce11', 
    start.field = 'start_ce11', end.field = 'end_ce11',
    keep.extra.columns = TRUE
)

Import the first xnd-1 motif found by XSTREME in R using universalmotif package.

Answer

library(universalmotif)
xnd1_motifs <- read_meme('meme_out/xstreme.txt')
names(xnd1_motifs) <- c('motif1', 'motif2', 'motif3')

Read motifmatchr::matchMotifs() documentation.
What class of objects the motifs should be stored into? Coerce xnd1- motifs to PFMatrixList class.
Map xnd-1 motifs to regulatory elements.
What is the class of the generated object? Its dimensions? What are the features and what are the columns?

Answer

library(motifmatchr)
library(BiocParallel)
register(MulticoreParam(workers = 2, progressbar = TRUE))
xnd1_motifs_pfm <- map(xnd1_motifs, convert_motifs, "TFBSTools-PFMatrix") |> do.call(PFMatrixList, args = _)
motif_mappings <- matchMotifs(xnd1_motifs_pfm, REs, genome = BSgenome.Celegans.UCSC.ce11::BSgenome.Celegans.UCSC.ce11)
class(motif_mappings)
dim(motif_mappings)

Compute a fisher test of enrichment for xnd-1 first motif over Intestine-specific regulatory elements.
Is xnd-1 motif enriched in Intestine-specific regulatory elements?

Answer

has_xnd1 <- motifMatches(motif_mappings)[, 'motif1']
is_intestine <- REs$Annotation == 'Intest.'
tab <- table(is_intestine, has_xnd1)
fisher_test <- fisher.test(tab)
fisher_test

Now do the same analysis for Germline specific regulatory elements.
Is xnd-1 motif enriched in Germline-specific regulatory elements?

Answer

has_xnd1 <- motifMatches(motif_mappings)[, 'motif1']
is_germline <- REs$Annotation == 'Germline'
tab <- table(is_germline, has_xnd1)
fisher_test <- fisher.test(tab)
fisher_test

For each motif, identify in which class of tissue-specific regulatory elements they are enriched.
Comment on the results.

Answer

library(purrr)
library(broom)
groups <- c("Hypod.", "Muscle", "Intest.", "Neurons",  "Germline", "Soma", "Ubiq.")
pvals <- map_dfr(names(xnd1_motifs_pfm), function(motif) {
    map_dfr(groups, function(group) {
        tab <- table(REs$Annotation == group, motifMatches(motif_mappings)[, motif])
        ftest <- fisher.test(tab) |> 
            tidy() |> 
            mutate(motif = motif, group = group) |> 
            select(group, motif, estimate, p.value)
    })
})
filter(pvals, p.value < 0.01, estimate > 2) |> print(n = 40)

15.1 Find de novo motifs enriched in xnd-1 ChIP-seq

15.1.1 Preparing meme input

15.1.2 Running meme

15.2 Scan ce11 regulatory elements for de novo motifs

15.1 Find de novo motifs enriched in `xnd-1` ChIP-seq

15.1.1 Preparing `meme` input

15.1.2 Running `meme`

15.2 Scan `ce11` regulatory elements for de novo motifs