bash
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR221/072/SRR22130072/SRR22130072_1.fastq.gz -o Share/reads/HiC_R1.fq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR221/072/SRR22130072/SRR22130072_2.fastq.gz -o Share/reads/HiC_R2.fq.gzWe will process a data from the Koszul lab, generated in 2024 and published in Science.
bwa
fastq files have already been downloaded from SRA, using the following command:
bash
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR221/072/SRR22130072/SRR22130072_1.fastq.gz -o Share/reads/HiC_R1.fq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR221/072/SRR22130072/SRR22130072_2.fastq.gz -o Share/reads/HiC_R2.fq.gzbwa-mem2 allows faster mapping than bowtie2. We will use it for Hi-C mapping.
bwa-mem2 index help to understand how to use it.bwa-mem2.bash
bwa-mem2
bwa-mem2 index -p genomes/R64-1-1 Share/genome/R64-1-1.fabwa-mem2 mem help to understand the different options available.bwa-mem2 mem , and the -SP option.HiC.sam file with samtools stats.bash
bwa-mem2 mem
bwa-mem2 mem -SP5M -t 2 \
genomes/R64-1-1 \
Share/reads/HiC_R1.fq.gz \
Share/reads/HiC_R2.fq.gz > HiC.sam
samtools stats HiC.sam | grep ^SNsam file into pairspairtools parse documentationpairtools parse to convert the sam file into a pairs file. Use the --drop-seq and --drop-sam options to lighten the output file. Also use --output-stats test.stats to generate a summary of the parsing.pairs file with head.pairs file?test.stats file. What is the average mapping quality of the pairs?bash
pairtools parse \
-c ~/Share/genome/R64-1-1.chrom.sizes \
-o HiC.pairs \
--drop-seq --drop-sam \
--output-stats test.stats \
--nproc-in 2 --nproc-out 2 \
Share/mapping/HiC.sam
head -n 50 HiC.pairs
wc -l HiC.pairs
cat test.statsThe following filtering steps should be done to the pairs file:
Sort the pairs file
Remove duplicates
Select only pairs with both ends mapped (ie pair_type=="UU")
Process to pairs filtering with pairtools subcommands.
bash
pairtools sort --nproc 2 -o HiC.sorted.pairs Share/pairs/HiC.pairs
pairtools dedup \
--max-mismatch 3 \
--mark-dups \
--output HiC.sorted.dedup.pairs \
HiC.sorted.pairs
pairtools select 'pair_type=="UU"' HiC.sorted.dedup.pairs -o HiC_filtered.pairscooler cload help to understand how to bin pairs into a contact map.cooler cload to generate a contact map from the pairs file. Bin the data at 1kb resolution.bash
cooler cload pairs \
-c1 2 -p1 3 -c2 4 -p2 5 \
~/Share/genome/R64-1-1.chrom.sizes:1000 \
Share/pairs/HiC_filtered.pairs \
HiC.coolcool file structure with cooler tree.HiC.cool metadata with cooler info.HiC.cool file? How many non-zero values are there? Comment.bash
cooler tree ~/Share/maps/HiC.cool
cooler info ~/Share/maps/HiC.coolcooler zoomify help to understand how to generate a multi-resolution contact map.cooler zoomify to generate a multi-resolution contact map. Don’t forget to include normalization in the output maps.HiC.cool file with cooler info and cooler tree.bash
cooler zoomify -r 1000N --balance -o HiC.mcool ~/Share/maps/HiC.cool
cooler info HiC.mcool::/resolutions/1000
cooler info HiC.mcool::/resolutions/5000
cooler tree HiC.mcool