12  Demo 3 - Processing of ChIP-seq data

Aims
  • Manually process Scc1 ChIP-seq reads
  • Generate IP/input ratios with bamCoverage
  • Call peaks and inspect them visually
Datasets

Cohesin (Scc1) ChIP-seq data was published in Verzijlbergen et al., eLife 2014

  • Scc1 ChIP-seq IP: SRR1103930
  • Scc1 ChIP-seq input: SRR1103928

12.1 Getting data

12.1.1 Downloading reads from internet

We can download the single-end reads directly from the internet.

sh
cd ~/EpigenomicsDataAnalysis_2023
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/000/SRR1103930/SRR1103930.fastq.gz -o data/Scc1_IP.fq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR110/008/SRR1103928/SRR1103928.fastq.gz -o data/Scc1_Inp.fq.gz

12.1.2 FastQC reads

fastqc program will run quick QCs on each fastq file separately.

sh
fastqc \
    --outdir data/fastqc \
    --noextract \
    --threads 12 \
    --adapters /home/rsg/repos/autobcl2fastq/adapters.txt \
    data/Scc1*fq.gz 1>&2

12.2 Align reads to a genome reference

12.2.1 Mapping single-end IP and input reads

sh
for FILE in IP Inp
do
    bowtie2 \
        --threads 16 \
        -x data/genome/R64-1-1 \
        -U data/Scc1_"${FILE}".fq.gz \
        > data/mapping/Scc1_"${FILE}"_R64-1-1.sam
done

12.2.2 Filtering mapped fragments

sh
SAMTOOLS_OPTIONS="-@ 12 --output-fmt bam"

for FILE in IP Inp
do
    samtools sort "${SAMTOOLS_OPTIONS}" data/mapping/Scc1_"${FILE}"_R64-1-1.sam | \
        samtools markdup "${SAMTOOLS_OPTIONS}" -r - - | \
        samtools view "${SAMTOOLS_OPTIONS}" -q 20 --fast -b - | \
        samtools sort "${SAMTOOLS_OPTIONS}" -l 9 -o data/mapping/Scc1_"${FILE}"_R64-1-1.bam
    samtools index -@ 12 data/mapping/Scc1_"${FILE}"_R64-1-1.bam
done

12.2.3 Create coverage track

sh
for FILE in IP Inp
do
    bamCoverage \
        --bam data/mapping/Scc1_"${FILE}"_R64-1-1.bam \
        --outFileName data/tracks/Scc1_"${FILE}"_R64-1-1.CPM.bw \
        --binSize 1 \
        --numberOfProcessors 16 \
        --normalizeUsing CPM \
        --skipNonCoveredRegions \
        --extendReads 220
done

12.2.4 Create IP/inp track

sh
bamCompare \
    -b1 data/mapping/Scc1_IP_R64-1-1.bam \
    -b2 data/mapping/Scc1_Inp_R64-1-1.bam \
    --outFileName data/tracks/Scc1_IP-vs-Inp.log2.bw \
    --scaleFactorsMethod readCount \
    --operation log2 \
    --skipZeroOverZero \
    --skipNonCoveredRegions \
    --skipNAs \
    --numberOfProcessors 16 \
    --binSize 1 \
    --extendReads 220 

12.2.5 Call peaks with macs2

sh
macs2 callpeak \
    -t data/mapping/Scc1_IP_R64-1-1.bam \
    -c data/mapping/Scc1_Inp_R64-1-1.bam \
    --format BAM \
    --nomodel --extsize 220 \
    --gsize 13000000 \
    --outdir data/peaks/ \
    --name Scc1_IP-vs-Inp