3 Demo 1 - Processing of MNase-seq data
Aims
- Fetching an MNase-seq dataset from GEO
- Indexing a genome with
bowtie2 - Map paired-end reads with
bowtie2 - Generate sequencing-depth normalized track
- Generate nucleosomes track
Datasets
3.1 Getting data
3.1.1 Downloading reads from internet
We can download the paired-end reads (R1 and R2 fastq files) directly from the internet.
3.1.2 FastQC reads
fastqc program will run quick QCs on each fastq file separately.
3.2 Pre-process reads
3.2.1 Trimming reads with trim_galore
3.2.2 FastQC on trimmed reads
3.3 Align reads to a genome reference
3.3.1 Indexing sacCer3 genome
Genome references for model systems can be fetched from iGenomes.
sh
## - Fetching data from iGenomes
mkdir data/genome
curl -L http://igenomes.illumina.com.s3-website-us-east-1.amazonaws.com/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Saccharomyces_cerevisiae_Ensembl_R64-1-1.tar.gz -o data/genome/R64-1-1.tar.gz
## - Unpacking data
tar -C data/genome/ -xzf data/genome/R64-1-1.tar.gz
cp data/genome/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa data/genome/R64-1-1.fa
cp data/genome/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf data/genome/R64-1-1.gtf
## - Indexing genome
bowtie2-build data/genome/R64-1-1.fa data/genome/R64-1-13.3.2 Mapping paired-end trimmed reads
3.3.3 Filtering mapped fragments
sh
## - Fixing mates
#| -m: "Add mate score tag"
samtools fixmate \
-@ 16 --output-fmt bam \
-m \
data/mapping/MNase_R64-1-1.sam data/mapping/MNase_R64-1-1.bam
## - Sorting read pairs
samtools sort \
-@ 16 --output-fmt bam \
data/mapping/MNase_R64-1-1.bam \
-o data/mapping/MNase_R64-1-1_sorted.bam
## - Removing PCR & optical duplicates
#| -s: "Report stats"
#| -r: "Remove duplicate reads"
samtools markdup \
-@ 16 --output-fmt bam \
-s -r \
data/mapping/MNase_R64-1-1_sorted.bam \
data/mapping/MNase_R64-1-1_sorted_noDups.bam
## - Filter read pairs
#| -f 0x001: "Keep read paired"
#| -f 0x002: "Keep read mapped in proper pair"
#| -F 0x004: "Remove read unmapped"
#| -F 0x008: "Remove mate unmapped"
#| -q 10: "MAPQ > 20"
#| --fast: "Use fast bam compression"
samtools view \
-@ 16 --output-fmt bam \
-f 0x001 -f 0x002 -F 0x004 -F 0x008 -q 20 \
--fast \
data/mapping/MNase_R64-1-1_sorted_noDups.bam \
-o data/mapping/MNase_R64-1-1_sorted_noDups_filtered.bam
## - Sorting read pairs
#| -l 9: "Use best compression "
samtools sort \
-@ 16 --output-fmt bam \
-l 9 \
data/mapping/MNase_R64-1-1_sorted_noDups_filtered.bam \
-o data/mapping/MNase_R64-1-1_sorted_noDups_filtered_sorted.bam
## - Indexing bam file
samtools index -@ 16 data/mapping/MNase_R64-1-1_sorted_noDups_filtered_sorted.bam3.3.4 Create coverage track
sh
## - Generate coverage
mkdir data/tracks
bamCoverage \
--bam data/mapping/MNase_R64-1-1_sorted_noDups_filtered_sorted.bam \
--outFileName data/tracks/MNase_R64-1-1_sorted_noDups_filtered_sorted.CPM.bw \
--binSize 1 \
--numberOfProcessors 16 \
--normalizeUsing CPM \
--skipNonCoveredRegions \
--extendReads3.3.5 Create nucleosome track
sh
bamCoverage \
--bam data/mapping/MNase_R64-1-1_sorted_noDups_filtered_sorted.bam \
--outFileName data/tracks/MNase_R64-1-1_sorted_noDups_filtered_sorted.135-160bp.CPM.bw \
--binSize 1 \
--numberOfProcessors 16 \
--normalizeUsing CPM \
--skipNonCoveredRegions \
--extendReads 40 \
--centerReads \
--minFragmentLength 130 \
--maxFragmentLength 165