HiCool::HiCool() automatically processes paired-end HiC sequencing files
by performing the following steps:
Automatically setting up an appropriate conda environment using basilisk;
Mapping the reads to the provided genome reference using hicstuff and filtering out irrelevant pairs;
Filtering the resulting pairs file to remove unwanted chromosomes (e.g. chrM);
Binning the filtered pairs into a cool file at a chosen resolution;
Generating a multi-resolution mcool file;
Normalizing matrices at each resolution by iterative correction using cooler.
The filtering strategy used by hicstuff is described in Cournac et al., BMC Genomics 2012.
HiCool(
r1 = "~/repos/tinyMapper/tests/testHiC_R1.fq.gz",
r2 = "~/repos/tinyMapper/tests/testHiC_R2.fq.gz",
genome = "R64-1-1",
restriction = "DpnII,HinfI",
binning = NULL,
iterative = TRUE,
balancing_args = " --min-nnz 10 --mad-max 5 ",
threads = 1L,
exclude_chr = "Mito|chrM|MT",
output = "HiCool",
keep_bam = FALSE,
build_report = TRUE,
scratch = tempdir()
)
importHiCoolFolder(output, hash, resolution = NULL)
getHiCoolArgs(log)
getHicStats(log)
Path to fastq file (R1 read)
Path to fastq file (R2 read)
Genome to map the reads to, provided either
as a fasta file (in which case the bowtie2 index will be generated
automatically) or as a prefix to a bowtie2 index (e.g. mm10 for
mm10.*.bt2 files). It can also be one of the following reference
IDs: hg38, mm10, dm6, R64-1-1, GRZc10, WBcel235,
Galgal4.
Restriction enzyme(s) used in HiC (Default: "DpnII,HinfI")
First resolution used to bin the final mcool file
(Default: 10000 for hg38 and mm10, 1000 for dm6, R64-1-1, ...)
Should the read mapping be performed iteratively? (Default: TRUE)
Balancing arguments passed to cooler.
See the cooler documentation
for a list of all available balancing arguments.
These defaults match those used by the 4DN consortium.
Number of CPUs used for parallelization. (Default: 1)
Chromosomes excluded from the final .mcool file. This will not affect the pairs file. (Default: "Mito|chrM|MT")
Output folder used by HiCool.
Should the bam files be kept? (Default: FALSE)
Should an automated report be computed? (Default: TRUE)
Path to temporary directory where processing will take place.
(Default: tempdir())
Unique 6-letter ID used to identify files from a specific HiCool processing run.
Resolution used to import the mcool file
Path to log file generated by hicstuff/hicool
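As a rough illustration of the exclude_chr argument described above, the sketch below applies the default pattern to a made-up vector of chromosome names. It assumes the value is interpreted as a regular expression matched against chromosome names; the names in `chroms` are hypothetical, and this is not necessarily how HiCool applies the filter internally.

```r
# Hypothetical chromosome names, for illustration only
chroms <- c("I", "II", "III", "Mito", "chrM", "MT", "chr1")

# Default value of exclude_chr, as listed in the usage section
exclude_chr <- "Mito|chrM|MT"

# Assumption: the pattern is treated as a regular expression on names
excluded <- grepl(exclude_chr, chroms)
kept <- chroms[!excluded]

print(kept)
```

Because the pattern is unanchored, any name containing "MT" (for instance a hypothetical "chrMT") would also match; anchoring with `^` and `$` would be the usual way to restrict it.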
A CoolFile object with prefilled pairsFile and metadata slots.
importHiCoolFolder(output, hash) automatically finds the different processed files
associated with a specific HiCool::HiCool() processing hash ID.
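The idea of locating files by their run hash can be sketched in base R. The snippet below builds a throwaway folder whose layout (matrices/ and pairs/ subfolders, file names ending in `^<hash>.<ext>`) mimics the paths shown in the example output of this page, then lists the files carrying a given hash. The folder name, sample names and the second hash are hypothetical, and this is only an approximation of what importHiCoolFolder() might do, not its actual implementation.

```r
# Build a small, disposable folder that mimics a HiCool output folder
demo_dir <- file.path(tempdir(), "HiCool_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(file.path(demo_dir, "matrices"), recursive = TRUE)
dir.create(file.path(demo_dir, "pairs"))
invisible(file.create(
    file.path(demo_dir, "matrices", "sampleA^mapped-R64-1-1^WL4DIE.mcool"),
    file.path(demo_dir, "pairs", "sampleA^mapped-R64-1-1^WL4DIE.pairs"),
    # A file from another (hypothetical) run that should not be picked up
    file.path(demo_dir, "matrices", "sampleB^mapped-R64-1-1^QXZRTA.mcool")
))

# Keep only files whose name ends with "^<hash>.<extension>"
hash <- "WL4DIE"
hits <- list.files(
    demo_dir,
    pattern = paste0("\\^", hash, "\\."),
    recursive = TRUE
)

print(hits)
```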
getHiCoolArgs() parses the log file generated by HiCool::HiCool() during processing to recover which arguments were used.
getHicStats() parses the log file generated by HiCool::HiCool() during processing to recover pre-computed stats about pair numbers, filtering thresholds, etc.
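The counts returned by getHicStats() can be cross-checked against each other. The sketch below hard-codes the values printed in the example run on this page and verifies that they add up. Reading nDangling, nSelf and nDumped as the "Uncuts", "Loops" and "Weirds" counts of the hicstuff log is inferred from the matching numbers, not from the package documentation.

```r
# Values copied from the getHicStats() output of the example run
stats <- list(
    nPairs = 64761, nDangling = 9266, nSelf = 1910, nDumped = 32,
    nFiltered = 53553, nDups = 613, nUnique = 52940
)

# Pairs discarded by the filtering step (uncuts + loops + weirds)
discarded <- stats$nDangling + stats$nSelf + stats$nDumped

# Pairs kept after filtering, then after PCR duplicate removal
kept_after_filtering <- stats$nPairs - discarded
kept_after_dedup <- kept_after_filtering - stats$nDups

# Share of mapped pairs surviving the filtering step, in percent
pct_kept <- round(100 * stats$nFiltered / stats$nPairs, 2)

print(c(discarded = discarded, filtered = kept_after_filtering,
        unique = kept_after_dedup, pct_kept = pct_kept))
```

With these inputs, the discarded total (11208), the filtered count (53553), the deduplicated count (52940) and the kept percentage (82.69) agree with the corresponding lines of the hicstuff log.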
r1 <- HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R1')
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
r2 <- HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R2')
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
hcf <- HiCool(r1, r2, genome = 'R64-1-1', output = './HiCool/')
#> HiCool :: Recovering bowtie2 genome index from AWS iGenomes...
#> HiCool :: Initializing processing of fastq files [tmp folder: /tmp/Rtmpb2CCp7/WL4DIE]...
#> HiCool :: Mapping fastq files...
#> HiCool :: Tidying up everything for you...
#> HiCool :: .fastq to .mcool processing done!
#> HiCool :: Check ./HiCool/folder to find the generated files
#> HiCool :: Generating HiCool report. This might take a while.
#> HiCool :: Report generated and available @ /__w/HiCool/HiCool/docs/reference/HiCool/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.html
#> HiCool :: All processing successfully achieved. Congrats!
hcf
#> CoolFile object
#> .mcool file: ./HiCool//matrices/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.mcool
#> resolution: 1000
#> pairs file: ./HiCool//pairs/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.pairs
#> metadata(3): log args stats
getHiCoolArgs(metadata(hcf)$log)
#> $r1
#> [1] "/github/home/.cache/R/ExperimentHub/8df4ecb67ee_7833"
#>
#> $r2
#> [1] "/github/home/.cache/R/ExperimentHub/8df408f067f_7834"
#>
#> $genome
#> [1] "/tmp/Rtmpb2CCp7/R64-1-1"
#>
#> $binning
#> [1] "1000"
#>
#> $restriction
#> [1] "DpnII,HinfI"
#>
#> $iterative
#> [1] TRUE
#>
#> $balancing_args
#> [1] " --min-nnz 10 --mad-max 5 "
#>
#> $threads
#> [1] 1
#>
#> $output
#> [1] "./HiCool/"
#>
#> $exclude_chr
#> [1] "Mito|chrM|MT"
#>
#> $keep_bam
#> [1] FALSE
#>
#> $scratch
#> [1] "/tmp/Rtmpb2CCp7"
#>
#> $wd
#> [1] "/__w/HiCool/HiCool/docs/reference"
#>
getHicStats(metadata(hcf)$log)
#> $nFragments
#> [1] 1e+05
#>
#> $nPairs
#> [1] 64761
#>
#> $nDangling
#> [1] 9266
#>
#> $nSelf
#> [1] 1910
#>
#> $nDumped
#> [1] 32
#>
#> $nFiltered
#> [1] 53553
#>
#> $nDups
#> [1] 613
#>
#> $nUnique
#> [1] 52940
#>
#> $threshold_uncut
#> [1] 7
#>
#> $threshold_self
#> [1] 7
#>
readLines(metadata(hcf)$log)
#> [1] "HiCool working directory ::: /__w/HiCool/HiCool/docs/reference"
#> [2] "HiCool argument ::: r1: /github/home/.cache/R/ExperimentHub/8df4ecb67ee_7833"
#> [3] "HiCool argument ::: r2: /github/home/.cache/R/ExperimentHub/8df408f067f_7834"
#> [4] "HiCool argument ::: genome: /tmp/Rtmpb2CCp7/R64-1-1"
#> [5] "HiCool argument ::: binning: 1000"
#> [6] "HiCool argument ::: restriction: DpnII,HinfI"
#> [7] "HiCool argument ::: iterative: TRUE"
#> [8] "HiCool argument ::: balancing_args: --min-nnz 10 --mad-max 5 "
#> [9] "HiCool argument ::: threads: 1"
#> [10] "HiCool argument ::: output: ./HiCool/"
#> [11] "HiCool argument ::: exclude_chr: Mito|chrM|MT"
#> [12] "HiCool argument ::: keep_bam: FALSE"
#> [13] "HiCool argument ::: scratch: /tmp/Rtmpb2CCp7"
#> [14] "----------------"
#> [15] "## hicstuff: v3.2.4 log file"
#> [16] "## date: 2026-03-13 12:51:45"
#> [17] "## enzyme: DpnII,HinfI"
#> [18] "## input1: /github/home/.cache/R/ExperimentHub/8df4ecb67ee_7833 "
#> [19] "## input2: /github/home/.cache/R/ExperimentHub/8df408f067f_7834"
#> [20] "## ref: /tmp/Rtmpb2CCp7/R64-1-1"
#> [21] "---"
#> [22] "2026-03-13,12:51:45 :: INFO :: The default output format is now `.cool`. The Hi-C matrix will be generated with cooler v0.10.3 (Abdennur & Mirny, Bioinformatics 2020)."
#> [23] "2026-03-13,12:51:48 :: INFO :: Checking content of fastq files."
#> [24] "2026-03-13,12:51:48 :: INFO :: 100000 reads found in each fastq file."
#> [25] "2026-03-13,12:51:48 :: INFO :: 100000 reads to parse"
#> [26] "2026-03-13,12:51:48 :: INFO :: Truncating unaligned reads to 20bp and mapping."
#> [27] "2026-03-13,12:51:49 :: INFO :: 100000 reads left to map."
#> [28] "2026-03-13,12:51:49 :: INFO :: Trying to map unaligned reads at full length (35bp)."
#> [29] "2026-03-13,12:51:52 :: INFO :: 23337 reads left to map."
#> [30] "2026-03-13,12:51:52 :: INFO :: 76663 reads aligned / 100000 total reads."
#> [31] "2026-03-13,12:51:53 :: INFO :: 100000 reads to parse"
#> [32] "2026-03-13,12:51:53 :: INFO :: Truncating unaligned reads to 20bp and mapping."
#> [33] "2026-03-13,12:51:53 :: INFO :: 100000 reads left to map."
#> [34] "2026-03-13,12:51:54 :: INFO :: Trying to map unaligned reads at full length (35bp)."
#> [35] "2026-03-13,12:51:57 :: INFO :: 24859 reads left to map."
#> [36] "2026-03-13,12:51:57 :: INFO :: 75141 reads aligned / 100000 total reads."
#> [37] "2026-03-13,12:52:01 :: INFO :: 76% reads (single ends) mapped with Q >= 30 (151804/200000)"
#> [38] "2026-03-13,12:52:02 :: INFO :: 64761 pairs successfully mapped (64.76%)"
#> [39] "2026-03-13,12:52:02 :: INFO :: Filtering with thresholds: uncuts=7 loops=7"
#> [40] "2026-03-13,12:52:03 :: INFO :: Proportion of inter contacts: 21.89% (intra: 41829, inter: 11724)"
#> [41] "2026-03-13,12:52:03 :: INFO :: 11208 pairs discarded: Loops: 1910, Uncuts: 9266, Weirds: 32"
#> [42] "2026-03-13,12:52:03 :: INFO :: 53553 pairs kept (82.69%)"
#> [43] "2026-03-13,12:52:05 :: INFO :: 1% PCR duplicates have been filtered out (613 / 53553 pairs) "
#> [44] "2026-03-13,12:52:05 :: INFO :: 52940 pairs remaining after removing PCR duplicates"
#> [45] "2026-03-13,12:52:05 :: INFO :: Generating matrix from pairs file /tmp/Rtmpb2CCp7/WL4DIE/tmp/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.valid_idx_pcrfree.pairs (52940 pairs in the file) "
#> [46] "2026-03-13,12:52:15 :: INFO :: Fetching mapping and pairing stats"
#> [47] "2026-03-13,12:52:15 :: INFO :: {'Sample': '8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE', 'Total read pairs': 100000, 'Mapped reads': 151804, 'Unmapped reads': 48196, 'Recovered contacts': 64761, 'Final contacts': 52940, 'Removed contacts': 11821, 'Filtered out': 11208, 'Loops': 1910, 'Uncuts': 9266, 'Weirds': 32, 'PCR duplicates': 613}"
#> [48] "2026-03-13,12:52:16 :: INFO :: Contact map generated after 0h 0m 30s"
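The argument lines at the top of the log shown above follow a `HiCool argument ::: name: value` pattern, so they can be recovered with a few base R string operations. The sketch below works on a handful of lines copied from that log; it is a minimal approximation for illustration, not the code used by getHiCoolArgs(), and it returns every value as a character string.

```r
# A few lines copied from the log printed above
log_lines <- c(
    "HiCool working directory ::: /__w/HiCool/HiCool/docs/reference",
    "HiCool argument ::: binning: 1000",
    "HiCool argument ::: restriction: DpnII,HinfI",
    "HiCool argument ::: exclude_chr: Mito|chrM|MT",
    "----------------",
    "## hicstuff: v3.2.4 log file"
)

# Keep only the argument lines and strip their common prefix
prefix <- "^HiCool argument ::: "
arg_lines <- grep(prefix, log_lines, value = TRUE)
key_value <- sub(prefix, "", arg_lines)

# Split each "name: value" entry at the first colon
keys <- sub(":.*$", "", key_value)
values <- trimws(sub("^[^:]*:", "", key_value))  # note: trims padding spaces

parsed <- setNames(as.list(values), keys)
str(parsed)
```

Note that `trimws()` would also strip the leading and trailing spaces of a value such as balancing_args, so a stricter parser may want to remove only the single space following the colon.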