HiCool::HiCool() automatically processes paired-end HiC sequencing files by performing the following steps:

  1. Automatically setting up an appropriate conda environment using basilisk;

  2. Mapping the reads to the provided genome reference using hicstuff and filtering out irrelevant pairs;

  3. Filtering the resulting pairs file to remove unwanted chromosomes (e.g. chrM);

  4. Binning the filtered pairs into a cool file at a chosen resolution;

  5. Generating a multi-resolution mcool file;

  6. Normalizing matrices at each resolution by iterative correction using cooler.

The filtering strategy used by hicstuff is described in Cournac et al., BMC Genomics 2012.

HiCool(
  r1 = "~/repos/tinyMapper/tests/testHiC_R1.fq.gz",
  r2 = "~/repos/tinyMapper/tests/testHiC_R2.fq.gz",
  genome = "R64-1-1",
  restriction = "DpnII,HinfI",
  binning = NULL,
  iterative = TRUE,
  balancing_args = " --min-nnz 10 --mad-max 5 ",
  threads = 1L,
  exclude_chr = "Mito|chrM|MT",
  output = "HiCool",
  keep_bam = FALSE,
  build_report = TRUE,
  scratch = tempdir()
)

importHiCoolFolder(output, hash, resolution = NULL)

getHiCoolArgs(log)

getHicStats(log)

Arguments

r1

Path to fastq file (R1 read)

r2

Path to fastq file (R2 read)

genome

Genome to map the reads to, provided either as a fasta file (in which case the bowtie2 index will be automatically generated), or as a prefix to a bowtie2 index (e.g. mm10 for mm10.*.bt2 files). Genome can also be a unique ID for one of the following references: hg38, mm10, dm6, R64-1-1, GRZc10, WBcel235, Galgal4.

restriction

Restriction enzyme(s) used in HiC (Default: "DpnII,HinfI")

binning

First resolution used to bin the final mcool file (Default: 10000 for hg38 and mm10, 1000 for dm6, R64-1-1, ...)

iterative

Should the read mapping be performed iteratively? (Default: TRUE)

balancing_args

Balancing arguments passed to cooler. See the cooler documentation for a list of all available balancing arguments. The defaults (" --min-nnz 10 --mad-max 5 ") match those used by the 4DN consortium.

threads

Number of CPUs used for parallelization. (Default: 1)

exclude_chr

Chromosomes excluded from the final .mcool file. This will not affect the pairs file. (Default: "Mito|chrM|MT")

output

Output folder used by HiCool.

keep_bam

Should the bam files be kept? (Default: FALSE)

build_report

Should an automated report be computed? (Default: TRUE)

scratch

Path to temporary directory where processing will take place. (Default: tempdir())

hash

Unique 6-letter ID used to identify files from a specific HiCool processing run.

resolution

Resolution used to import the mcool file

log

Path to log file generated by hicstuff/HiCool
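
The hash argument appears to correspond to the last `^`-separated field of the file names written by HiCool (WL4DIE in the Examples below). The following is a minimal sketch, assuming this naming pattern holds for every run, of recovering the hash from an mcool path and passing it to importHiCoolFolder(). The helper extract_hash() is hypothetical and not part of HiCool; the import call is skipped unless the package and the output folder are available.

```r
# Hypothetical helper (not part of HiCool): pull the run ID out of a file
# name shaped like "<prefix>^mapped-<genome>^<hash>.<ext>".
# The naming pattern is an assumption inferred from the Examples output.
extract_hash <- function(path) {
  stem <- sub("\\.[^.]+$", "", basename(path))      # drop the file extension
  fields <- strsplit(stem, "^", fixed = TRUE)[[1]]  # split on literal "^"
  fields[length(fields)]                            # last field is the hash
}

mcool <- "./HiCool/matrices/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.mcool"
hash <- extract_hash(mcool)
hash

# Re-import a finished run only if HiCool and the output folder exist.
if (requireNamespace("HiCool", quietly = TRUE) && dir.exists("./HiCool/")) {
  hcf <- HiCool::importHiCoolFolder(
    output = "./HiCool/", hash = hash, resolution = 1000
  )
}
```

Splitting on the literal `^` rather than matching a fixed-width pattern keeps the sketch working if the prefix or genome name changes.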

Value

A CoolFile object with prefilled pairsFile and metadata slots.

HiCool utils

  • importHiCoolFolder(output, hash) automatically finds the processed files associated with a specific HiCool::HiCool() processing hash ID.

  • getHiCoolArgs() parses the log file generated by HiCool::HiCool() during processing to recover which arguments were used.

  • getHicStats() parses the log file generated by HiCool::HiCool() during processing to recover pre-computed stats about pair numbers, filtering thresholds, etc.
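
As an illustration of how the statistics returned by getHicStats() relate to one another, the sketch below re-derives the filtered and unique pair counts from the values printed in the Examples section. The mapping of nDangling, nSelf and nDumped to the "Uncuts", "Loops" and "Weirds" counts of the hicstuff log is an assumption based on the matching numbers, not on documented behaviour.

```r
# Values copied from the getHicStats() output in the Examples section.
stats <- list(
  nPairs = 64761, nDangling = 9266, nSelf = 1910, nDumped = 32,
  nFiltered = 53553, nDups = 613, nUnique = 52940
)

# Pairs discarded by the filtering step
# (assumed: dangling = uncuts, self = loops, dumped = weirds).
n_discarded <- stats$nDangling + stats$nSelf + stats$nDumped

# Pairs kept after filtering, then after PCR duplicate removal.
stopifnot(stats$nPairs - n_discarded == stats$nFiltered)
stopifnot(stats$nFiltered - stats$nDups == stats$nUnique)

# Percentage of mapped pairs kept by the filtering step.
pct_kept <- round(100 * stats$nFiltered / stats$nPairs, 2)
pct_kept
```

The discarded total (11208) and the kept percentage agree with the "pairs discarded" and "pairs kept" lines of the log shown in the Examples.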

Examples

r1 <- HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R1')
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
r2 <- HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R2')
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
hcf <- HiCool(r1, r2, genome = 'R64-1-1', output = './HiCool/')
#> HiCool :: Recovering bowtie2 genome index from AWS iGenomes...
#> HiCool :: Initializing processing of fastq files [tmp folder: /tmp/Rtmpb2CCp7/WL4DIE]...
#> HiCool :: Mapping fastq files...
#> HiCool :: Tidying up everything for you...
#> HiCool :: .fastq to .mcool processing done!
#> HiCool :: Check ./HiCool/folder to find the generated files
#> HiCool :: Generating HiCool report. This might take a while.
#> HiCool :: Report generated and available @ /__w/HiCool/HiCool/docs/reference/HiCool/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.html
#> HiCool :: All processing successfully achieved. Congrats!
hcf
#> CoolFile object
#> .mcool file: ./HiCool//matrices/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.mcool 
#> resolution: 1000 
#> pairs file: ./HiCool//pairs/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.pairs 
#> metadata(3): log args stats
getHiCoolArgs(metadata(hcf)$log)
#> $r1
#> [1] "/github/home/.cache/R/ExperimentHub/8df4ecb67ee_7833"
#> 
#> $r2
#> [1] "/github/home/.cache/R/ExperimentHub/8df408f067f_7834"
#> 
#> $genome
#> [1] "/tmp/Rtmpb2CCp7/R64-1-1"
#> 
#> $binning
#> [1] "1000"
#> 
#> $restriction
#> [1] "DpnII,HinfI"
#> 
#> $iterative
#> [1] TRUE
#> 
#> $balancing_args
#> [1] " --min-nnz 10 --mad-max 5 "
#> 
#> $threads
#> [1] 1
#> 
#> $output
#> [1] "./HiCool/"
#> 
#> $exclude_chr
#> [1] "Mito|chrM|MT"
#> 
#> $keep_bam
#> [1] FALSE
#> 
#> $scratch
#> [1] "/tmp/Rtmpb2CCp7"
#> 
#> $wd
#> [1] "/__w/HiCool/HiCool/docs/reference"
#> 
getHicStats(metadata(hcf)$log)
#> $nFragments
#> [1] 1e+05
#> 
#> $nPairs
#> [1] 64761
#> 
#> $nDangling
#> [1] 9266
#> 
#> $nSelf
#> [1] 1910
#> 
#> $nDumped
#> [1] 32
#> 
#> $nFiltered
#> [1] 53553
#> 
#> $nDups
#> [1] 613
#> 
#> $nUnique
#> [1] 52940
#> 
#> $threshold_uncut
#> [1] 7
#> 
#> $threshold_self
#> [1] 7
#> 
readLines(metadata(hcf)$log)
#>  [1] "HiCool working directory ::: /__w/HiCool/HiCool/docs/reference"                                                                                                                                                                                                                                                                              
#>  [2] "HiCool argument ::: r1: /github/home/.cache/R/ExperimentHub/8df4ecb67ee_7833"                                                                                                                                                                                                                                                                
#>  [3] "HiCool argument ::: r2: /github/home/.cache/R/ExperimentHub/8df408f067f_7834"                                                                                                                                                                                                                                                                
#>  [4] "HiCool argument ::: genome: /tmp/Rtmpb2CCp7/R64-1-1"                                                                                                                                                                                                                                                                                         
#>  [5] "HiCool argument ::: binning: 1000"                                                                                                                                                                                                                                                                                                           
#>  [6] "HiCool argument ::: restriction: DpnII,HinfI"                                                                                                                                                                                                                                                                                                
#>  [7] "HiCool argument ::: iterative: TRUE"                                                                                                                                                                                                                                                                                                         
#>  [8] "HiCool argument ::: balancing_args:  --min-nnz 10 --mad-max 5 "                                                                                                                                                                                                                                                                              
#>  [9] "HiCool argument ::: threads: 1"                                                                                                                                                                                                                                                                                                              
#> [10] "HiCool argument ::: output: ./HiCool/"                                                                                                                                                                                                                                                                                                       
#> [11] "HiCool argument ::: exclude_chr: Mito|chrM|MT"                                                                                                                                                                                                                                                                                               
#> [12] "HiCool argument ::: keep_bam: FALSE"                                                                                                                                                                                                                                                                                                         
#> [13] "HiCool argument ::: scratch: /tmp/Rtmpb2CCp7"                                                                                                                                                                                                                                                                                                
#> [14] "----------------"                                                                                                                                                                                                                                                                                                                            
#> [15] "## hicstuff: v3.2.4 log file"                                                                                                                                                                                                                                                                                                                
#> [16] "## date: 2026-03-13 12:51:45"                                                                                                                                                                                                                                                                                                                
#> [17] "## enzyme: DpnII,HinfI"                                                                                                                                                                                                                                                                                                                      
#> [18] "## input1: /github/home/.cache/R/ExperimentHub/8df4ecb67ee_7833 "                                                                                                                                                                                                                                                                            
#> [19] "## input2: /github/home/.cache/R/ExperimentHub/8df408f067f_7834"                                                                                                                                                                                                                                                                             
#> [20] "## ref: /tmp/Rtmpb2CCp7/R64-1-1"                                                                                                                                                                                                                                                                                                             
#> [21] "---"                                                                                                                                                                                                                                                                                                                                         
#> [22] "2026-03-13,12:51:45 :: INFO :: The default output format is now `.cool`. The Hi-C matrix will be generated with cooler v0.10.3 (Abdennur & Mirny, Bioinformatics 2020)."                                                                                                                                                                     
#> [23] "2026-03-13,12:51:48 :: INFO :: Checking content of fastq files."                                                                                                                                                                                                                                                                             
#> [24] "2026-03-13,12:51:48 :: INFO :: 100000 reads found in each fastq file."                                                                                                                                                                                                                                                                       
#> [25] "2026-03-13,12:51:48 :: INFO :: 100000 reads to parse"                                                                                                                                                                                                                                                                                        
#> [26] "2026-03-13,12:51:48 :: INFO :: Truncating unaligned reads to 20bp and mapping."                                                                                                                                                                                                                                                              
#> [27] "2026-03-13,12:51:49 :: INFO :: 100000 reads left to map."                                                                                                                                                                                                                                                                                    
#> [28] "2026-03-13,12:51:49 :: INFO :: Trying to map unaligned reads at full length (35bp)."                                                                                                                                                                                                                                                         
#> [29] "2026-03-13,12:51:52 :: INFO :: 23337 reads left to map."                                                                                                                                                                                                                                                                                     
#> [30] "2026-03-13,12:51:52 :: INFO :: 76663 reads aligned / 100000 total reads."                                                                                                                                                                                                                                                                    
#> [31] "2026-03-13,12:51:53 :: INFO :: 100000 reads to parse"                                                                                                                                                                                                                                                                                        
#> [32] "2026-03-13,12:51:53 :: INFO :: Truncating unaligned reads to 20bp and mapping."                                                                                                                                                                                                                                                              
#> [33] "2026-03-13,12:51:53 :: INFO :: 100000 reads left to map."                                                                                                                                                                                                                                                                                    
#> [34] "2026-03-13,12:51:54 :: INFO :: Trying to map unaligned reads at full length (35bp)."                                                                                                                                                                                                                                                         
#> [35] "2026-03-13,12:51:57 :: INFO :: 24859 reads left to map."                                                                                                                                                                                                                                                                                     
#> [36] "2026-03-13,12:51:57 :: INFO :: 75141 reads aligned / 100000 total reads."                                                                                                                                                                                                                                                                    
#> [37] "2026-03-13,12:52:01 :: INFO :: 76% reads (single ends) mapped with Q >= 30 (151804/200000)"                                                                                                                                                                                                                                                  
#> [38] "2026-03-13,12:52:02 :: INFO :: 64761 pairs successfully mapped (64.76%)"                                                                                                                                                                                                                                                                     
#> [39] "2026-03-13,12:52:02 :: INFO :: Filtering with thresholds: uncuts=7 loops=7"                                                                                                                                                                                                                                                                  
#> [40] "2026-03-13,12:52:03 :: INFO :: Proportion of inter contacts: 21.89% (intra: 41829, inter: 11724)"                                                                                                                                                                                                                                            
#> [41] "2026-03-13,12:52:03 :: INFO :: 11208 pairs discarded: Loops: 1910, Uncuts: 9266, Weirds: 32"                                                                                                                                                                                                                                                 
#> [42] "2026-03-13,12:52:03 :: INFO :: 53553 pairs kept (82.69%)"                                                                                                                                                                                                                                                                                    
#> [43] "2026-03-13,12:52:05 :: INFO :: 1% PCR duplicates have been filtered out (613 / 53553 pairs) "                                                                                                                                                                                                                                                
#> [44] "2026-03-13,12:52:05 :: INFO :: 52940 pairs remaining after removing PCR duplicates"                                                                                                                                                                                                                                                          
#> [45] "2026-03-13,12:52:05 :: INFO :: Generating matrix from pairs file /tmp/Rtmpb2CCp7/WL4DIE/tmp/8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE.valid_idx_pcrfree.pairs (52940 pairs in the file) "                                                                                                                                                       
#> [46] "2026-03-13,12:52:15 :: INFO :: Fetching mapping and pairing stats"                                                                                                                                                                                                                                                                           
#> [47] "2026-03-13,12:52:15 :: INFO :: {'Sample': '8df4ecb67ee_7833^mapped-R64-1-1^WL4DIE', 'Total read pairs': 100000, 'Mapped reads': 151804, 'Unmapped reads': 48196, 'Recovered contacts': 64761, 'Final contacts': 52940, 'Removed contacts': 11821, 'Filtered out': 11208, 'Loops': 1910, 'Uncuts': 9266, 'Weirds': 32, 'PCR duplicates': 613}"
#> [48] "2026-03-13,12:52:16 :: INFO :: Contact map generated after 0h 0m 30s"
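
The first lines of the log store each argument as "HiCool argument ::: name: value". The sketch below shows one way such lines could be turned back into a named list using base R only. It is not the implementation of getHiCoolArgs(), and every value is kept as a character string.

```r
# A few lines copied from the log shown above.
log_lines <- c(
  "HiCool working directory ::: /__w/HiCool/HiCool/docs/reference",
  "HiCool argument ::: genome: /tmp/Rtmpb2CCp7/R64-1-1",
  "HiCool argument ::: binning: 1000",
  "HiCool argument ::: restriction: DpnII,HinfI",
  "HiCool argument ::: exclude_chr: Mito|chrM|MT",
  "----------------"
)

# Keep only the argument lines and strip their common prefix.
prefix <- "HiCool argument ::: "
arg_lines <- log_lines[startsWith(log_lines, prefix)]
entries <- substring(arg_lines, nchar(prefix) + 1)

# Split each entry at the first ": " only, so values may contain colons.
keys <- sub(": .*$", "", entries)
values <- sub("^[^:]*: ", "", entries)

parsed_args <- as.list(setNames(values, keys))
str(parsed_args)
```

In practice getHiCoolArgs(metadata(hcf)$log) should be preferred, since it also restores non-character types such as the logical iterative and keep_bam values.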