13 Lab 3 - ChIP-seq downstream analysis
Aims
- Find motifs enriched in a set of ChIP-seq peaks
- Import a dozen ChIP-seq peak sets into R
- Check the distribution of peaks compared to genomic features
- Check peak occurrence over tissue-specific regulatory elements
Datasets
- modENCODE/modERN TF ChIP-seq database, available here
13.1 Download peaks from ENCODE database
Check out the modENCODE/modERN TF ChIP-seq database.
Inspect the available datasets, by filtering e.g. using the following criteria:
- Assay title: TF ChIP-seq
- Organism: C. elegans
- Genome assembly: ce11
- Project: modENCODE
Peak files for 12 ChIP-seq datasets have already been downloaded from this database.
13.2 Find motifs enriched in xnd-1 ChIP-seq
- Check the `meme` website to identify which tool is best suited to identify a motif de novo in a set of peaks from a ChIP-seq experiment.
- What do you need to run `xstreme` on a set of peaks?
13.2.1 Preparing meme input
- Import `xnd-1` peaks in R as a `GRanges` object.
- Recover the `ce11` genome sequence using the `BSgenome.Celegans.UCSC.ce11` package.
- Extract the sequence over `xnd-1` peaks with the `Biostrings` package.
- Export the sequences as a `fasta` file.
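The steps above can be sketched as follows. This is a minimal sketch and not the lab's reference solution: the `peaks` object is a toy stand-in with made-up coordinates, and the output file name is hypothetical. In the lab, `peaks` would come from importing the `xnd-1` peak file (for example with `rtracklayer::import()`, depending on the file format).

```r
library(GenomicRanges)
library(Biostrings)
library(BSgenome.Celegans.UCSC.ce11)

# Toy stand-in for the imported xnd-1 peaks (made-up coordinates, not real peaks)
peaks <- GRanges(
    seqnames = c("chrI", "chrII"),
    ranges = IRanges(start = c(10001, 20001), width = 100)
)

# ce11 genome sequence, provided as a BSgenome object
genome <- BSgenome.Celegans.UCSC.ce11

# Extract the DNA sequence under each peak (returns a DNAStringSet)
seqs <- getSeq(genome, peaks)

# Name each sequence after its genomic coordinates so FASTA headers are informative
names(seqs) <- as.character(peaks)

# Export as FASTA (hypothetical output path, written to a temporary directory here)
fasta <- file.path(tempdir(), "xnd1_peaks.fa")
writeXStringSet(seqs, fasta)
```

The resulting FASTA file is the kind of input that motif discovery tools from the MEME suite expect as primary sequences.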
13.2.2 Running meme
- Identify motifs enriched in `xnd-1` peaks with `xstreme` using the following options:
  - Zero or one occurrence per sequence for `meme`
  - At most 3 motifs
  - A min motif width of 6 for `meme`
  - A max motif width of 15 for `meme`
  - No `streme` motifs
  - Over multiple processors in parallel
- Check the results. Do they make sense in the light of recent publications? (see here)
13.3 Compare all peaks to genomic features
13.3.1 Import all peaks in R
- In R, list all the `bed` files available for the peaks.
- Import each file as a `GRanges` object.
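One possible way to do this is sketched below. To keep the example runnable on its own, it first writes two toy BED files (hypothetical TF names `tfA` and `tfB`, made-up coordinates) to a temporary directory. In the lab, `peak_dir` would instead point to the folder holding the 12 downloaded peak files, and the import call may need a `format` argument if the files are not plain BED.

```r
library(rtracklayer)

# Toy setup: write two small BED files to a temporary directory
peak_dir <- file.path(tempdir(), "peaks")
dir.create(peak_dir, showWarnings = FALSE)
writeLines("chrI\t100\t200", file.path(peak_dir, "tfA.bed"))
writeLines(c("chrII\t300\t400", "chrII\t500\t650"), file.path(peak_dir, "tfB.bed"))

# List all BED files, and name each path after its file name (without extension)
bed_files <- list.files(peak_dir, pattern = "\\.bed$", full.names = TRUE)
names(bed_files) <- sub("\\.bed$", "", basename(bed_files))

# Import each file as a GRanges object; lapply() keeps the names
peaks <- lapply(bed_files, import)

# Number of peaks per set
sapply(peaks, length)
```

Keeping the list named by TF makes later iteration easier, since functions such as `imap_dfr` pass both the element and its name.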
13.3.2 Define genomic features in ce11
Genomic features can be easily annotated from a set of gene features. `ChIPseeker` facilitates the annotation of ChIP-seq peaks using gene annotations directly provided in R by `TxDb` gene annotation packages, e.g. `TxDb.Celegans.UCSC.ce11.ensGene`.
- Install the `TxDb.Celegans.UCSC.ce11.ensGene` package in R
- Install `ChIPseeker`
13.3.3 ChIP-seq peak overlaps with genomic features
- Use `ChIPseeker` to annotate ChIP-seq peaks for a single ChIP-seq experiment.
- Extract
- Iterate over all the peak sets to compile their annotations across the C. elegans genome. To iterate over each set of peaks in the `peaks` list and return an aggregate `data.frame`, use `imap_dfr`.
- Generate a barplot to represent the percentage of peaks in each type of genomic feature, for the 12 TFs.
- Comment on the distribution of the 12 sets of peaks over genomic features.
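The annotation and plotting steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: `peaks` is a toy named list (hypothetical TF names, made-up coordinates), default `annotatePeak()` settings are used, and the per-feature percentages are read from the `annoStat` slot of the returned object.

```r
library(GenomicRanges)
library(ChIPseeker)
library(TxDb.Celegans.UCSC.ce11.ensGene)
library(purrr)
library(ggplot2)

txdb <- TxDb.Celegans.UCSC.ce11.ensGene

# Toy stand-in for the named list of imported peak sets (not real peaks)
peaks <- list(
    tfA = GRanges("chrI", IRanges(start = c(100001, 500001, 900001), width = 200)),
    tfB = GRanges("chrII", IRanges(start = c(200001, 700001), width = 200))
)

# For each peak set, annotate peaks and keep the percentage of peaks per feature
annots <- imap_dfr(peaks, function(gr, tf) {
    anno <- annotatePeak(gr, TxDb = txdb, verbose = FALSE)
    stats <- anno@annoStat  # data.frame with `Feature` and `Frequency` (in %)
    stats$TF <- tf          # record which TF these rows belong to
    stats
})

# Stacked barplot: one bar per TF, filled by genomic feature
ggplot(annots, aes(x = TF, y = Frequency, fill = Feature)) +
    geom_col() +
    labs(y = "% of peaks")
```

Because `imap_dfr` passes the list names as the second argument, the TF label travels with each block of rows, which is what allows a single aggregated `data.frame` to feed the plot.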
13.3.4 Peaks occurrence over tissue-specific regulatory elements
- Import regulatory elements annotated in C. elegans in R, as seen in the previous lab.
- Check for overlap of each regulatory element with `xnd-1` peaks.
- Check the overlap of germline-specific regulatory elements with `xnd-1` peaks.
- Check the enrichment of germline-specific regulatory elements over `xnd-1` peaks.
- For each tissue (`Germline`, `Neurons`, `Muscle`, `Hypod.` and `Intest.`), check whether the tissue-specific REs are enriched in `xnd-1` peaks.
Tip
- You can iterate over each `tissue` and return an aggregated `data.frame` using `map_dfr`.
- For each iteration, you can transform the result of `fisher.test()` into a `tibble` with the `glance` function from the `broom` package.
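The tip above can be sketched as follows, using toy data. Everything here is an assumption made for illustration: the regulatory elements, their `tissue` metadata column, and the peaks are made up, and the 2x2 table (RE in tissue or not, RE overlapping a peak or not) is one reasonable way to set up the test, not necessarily the one intended by the lab.

```r
library(GenomicRanges)
library(purrr)
library(broom)

# Toy regulatory elements: 20 REs, 4 per tissue (hypothetical `tissue` column)
REs <- GRanges(
    "chrI", IRanges(start = seq(1, 1901, by = 100), width = 50),
    tissue = rep(c("Germline", "Neurons", "Muscle", "Hypod.", "Intest."), each = 4)
)

# Toy peaks: cover the 4 Germline REs and 1 Neurons RE
peaks <- GRanges("chrI", IRanges(start = c(1, 101, 201, 301, 401), width = 50))

# Does each RE overlap at least one peak?
REs$overlaps_peak <- REs %over% peaks

# One Fisher's exact test per tissue, aggregated into a single tibble
enrich <- map_dfr(unique(REs$tissue), function(tis) {
    tab <- table(
        in_tissue = factor(REs$tissue == tis, levels = c(TRUE, FALSE)),
        overlaps = factor(REs$overlaps_peak, levels = c(TRUE, FALSE))
    )
    res <- glance(fisher.test(tab))  # estimate = odds ratio, plus p.value, CI
    res$tissue <- tis
    res
})

enrich[, c("tissue", "estimate", "p.value")]
```

Fixing the factor levels to `c(TRUE, FALSE)` keeps the table 2x2 even when a category is empty, and keeps the odds ratio oriented so that values above 1 mean enrichment.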
- Perform the same operation by iterating over each peak set in the list of imported peaks.
Tip
- You can use two nested `map_dfr`, iterating first over each TF then over each tissue.
- For each TF, filter the tissues in which it is preferentially enriched (odds ratio >= 2, p.value <= 0.05)
- Find TFs enriched over Intestine REs
- Check the STRING DB website to assess whether these TFs have been shown to interact together. Comment.
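The nested iteration over TFs and tissues, followed by the filtering step, could be sketched as below. This is a toy example under assumptions: the regulatory elements, the `tissue` column, the TF names `tfA` and `tfB` and all coordinates are made up, and `imap_dfr` is used for the outer loop so that the TF name is available inside the function.

```r
library(GenomicRanges)
library(purrr)
library(broom)

# Toy regulatory elements: 20 REs, 4 per tissue (hypothetical `tissue` column)
REs <- GRanges(
    "chrI", IRanges(start = seq(1, 1901, by = 100), width = 50),
    tissue = rep(c("Germline", "Neurons", "Muscle", "Hypod.", "Intest."), each = 4)
)

# Toy peak sets: tfA is biased towards Germline REs, tfB towards Intest. REs
peaks <- list(
    tfA = GRanges("chrI", IRanges(start = c(1, 101, 201, 301, 401), width = 50)),
    tfB = GRanges("chrI", IRanges(start = c(1601, 1701, 1801, 1901, 801), width = 50))
)

# Outer loop over TFs, inner loop over tissues, one Fisher's exact test each
enrich <- imap_dfr(peaks, function(gr, tf) {
    map_dfr(unique(REs$tissue), function(tis) {
        tab <- table(
            in_tissue = factor(REs$tissue == tis, levels = c(TRUE, FALSE)),
            overlaps = factor(REs %over% gr, levels = c(TRUE, FALSE))
        )
        res <- glance(fisher.test(tab))
        res$TF <- tf
        res$tissue <- tis
        res
    })
})

# Keep TF/tissue pairs with odds ratio >= 2 and p.value <= 0.05
enriched <- dplyr::filter(enrich, estimate >= 2, p.value <= 0.05)

# TFs enriched over intestine REs
enriched$TF[enriched$tissue == "Intest."]
```

With real data, testing 12 TFs against 5 tissues means 60 tests, so it may be worth considering a multiple-testing correction (for example `p.adjust()`) before interpreting the filtered table.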