HiCool::HiCool() automatically processes paired-end HiC sequencing files by performing the following steps:

  1. Automatically setting up an appropriate conda environment using basilisk;

  2. Mapping the reads to the provided genome reference using hicstuff and filtering of irrelevant pairs;

  3. Filtering the resulting pairs file to remove unwanted chromosomes (e.g. chrM);

  4. Binning the filtered pairs into a cool file at a chosen resolution;

  5. Generating a multi-resolution mcool file;

  6. Normalizing matrices at each resolution by iterative corretion using cooler.

The filtering strategy used by hicstuff is described in Cournac et al., BMC Genomics 2012.

HiCool(
  r1,
  r2,
  genome,
  restriction = "DpnII,HinfI",
  resolutions = NULL,
  iterative = TRUE,
  balancing_args = " --min-nnz 10 --mad-max 5 ",
  threads = 1L,
  exclude_chr = "Mito|chrM|MT",
  output = "HiCool",
  keep_bam = FALSE,
  build_report = TRUE,
  scratch = tempdir()
)

importHiCoolFolder(output, hash, resolution = NULL)

getHiCoolArgs(log)

getHicStats(log)

Arguments

r1

Path to fastq file (R1 read)

r2

Path to fastq file (R2 read)

genome

Genome used to map the reads on, provided either as a fasta file (in which case the bowtie2 index will be automatically generated), or as a prefix to a bowtie2 index (e.g. mm10 for mm10.*.bt2 files). Genome can also be a unique ID for the following references: hg38, mm10, dm6, R64-1-1, GRZc10, WBcel235, Galgal4.

restriction

Restriction enzyme(s) used in HiC (Default: "DpnII,HinfI")

resolutions

Resolutions used to bin the final mcool file (Default: 5 levels of resolution automatically inferred according to genome size)

iterative

Should the read mapping be performed iteratively? (Default: TRUE)

balancing_args

Balancing arguments for cooler. See cooler documentation here for a list of all available balancing arguments. These defaults match those used by the 4DN consortium.

threads

Number of CPUs used for parallelization. (Default: 1)

exclude_chr

Chromosomes excluded from the final .mcool file. This will not affect the pairs file. (Default: "Mito|chrM|MT")

output

Output folder used by HiCool.

keep_bam

Should the bam files be kept? (Default: FALSE)

build_report

Should an automated report be computed? (Default: TRUE)

scratch

Path to temporary directory where processing will take place. (Default: tempdir())

hash

Unique 6-letter ID used to identify files from a specific HiCool processing run.

resolution

Resolution used to import the mcool file

log

Path to log file generated by hicstuff/hicool

Value

A CoolFile object with prefilled pairsFile and metadata slots.

HiCool utils

  • importHiCoolFolder(folder, hash) automatically finds the different processed files associated with a specific HiCool::HiCool() processing hash ID.

  • getHiCoolArgs() parses the log file generated by HiCool::HiCool() during processing to recover which arguments were used.

  • getHicStats() parses the log file generated by HiCool::HiCool() during processing to recover pre-computed stats about pair numbers, filtering thresholds, etc.

Examples

r1 <- HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R1')
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
r2 <- HiContactsData::HiContactsData(sample = 'yeast_wt', format = 'fastq_R2')
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
hcf <- HiCool(r1, r2, genome = 'R64-1-1', output = './HiCool/')
#> HiCool :: Recovering bowtie2 genome index from AWS iGenomes...
#> HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpLG33tZ/DF8U6P]...
#> HiCool :: Mapping fastq files...
#> HiCool :: Best-suited minimum resolution automatically inferred: 1000
#> HiCool :: Removing unwanted chromosomes...
#> HiCool :: Parsing pairs into .cool file...
#> HiCool :: Generating multi-resolution .mcool file...
#> HiCool :: Balancing .mcool file...
#> HiCool :: Tidying up everything for you...
#> HiCool :: .fastq to .mcool processing done!
#> HiCool :: Check ./HiCool/folder to find the generated files
#> HiCool :: Generating HiCool report. This might take a while.
#> HiCool :: Report generated and available @ /__w/HiCool/HiCool/docs/reference/HiCool/1fe971273e9d_7833^mapped-R64-1-1^DF8U6P.html
#> HiCool :: All processing successfully achieved. Congrats!
hcf
#> CoolFile object
#> .mcool file: ./HiCool//matrices/1fe971273e9d_7833^mapped-R64-1-1^DF8U6P.mcool 
#> resolution: 1000 
#> pairs file: ./HiCool//pairs/1fe971273e9d_7833^mapped-R64-1-1^DF8U6P.pairs 
#> metadata(3): log args stats
getHiCoolArgs(S4Vectors::metadata(hcf)$log)
#> $r1
#> [1] "/github/home/.cache/R/ExperimentHub/1fe971273e9d_7833"
#> 
#> $r2
#> [1] "/github/home/.cache/R/ExperimentHub/1fe923fbe457_7834"
#> 
#> $genome
#> [1] "/tmp/RtmpLG33tZ/R64-1-1"
#> 
#> $resolutions
#> [1] NA
#> 
#> $restriction
#> [1] "DpnII,HinfI"
#> 
#> $iterative
#> [1] TRUE
#> 
#> $balancing_args
#> [1] " --min-nnz 10 --mad-max 5 "
#> 
#> $threads
#> [1] 1
#> 
#> $output
#> [1] "./HiCool/"
#> 
#> $exclude_chr
#> [1] "Mito|chrM|MT"
#> 
#> $keep_bam
#> [1] FALSE
#> 
#> $scratch
#> [1] "/tmp/RtmpLG33tZ"
#> 
#> $wd
#> [1] "/__w/HiCool/HiCool/docs/reference"
#> 
getHicStats(S4Vectors::metadata(hcf)$log)
#> $nFragments
#> [1] 1e+05
#> 
#> $nPairs
#> [1] 73993
#> 
#> $nDangling
#> [1] 10027
#> 
#> $nSelf
#> [1] 2205
#> 
#> $nDumped
#> [1] 83
#> 
#> $nFiltered
#> [1] 61678
#> 
#> $nDups
#> [1] 719
#> 
#> $nUnique
#> [1] 60959
#> 
#> $threshold_uncut
#> [1] 7
#> 
#> $threshold_self
#> [1] 7
#> 
readLines(S4Vectors::metadata(hcf)$log)
#>  [1] "HiCool working directory ::: /__w/HiCool/HiCool/docs/reference"                                  
#>  [2] "HiCool argument ::: r1: /github/home/.cache/R/ExperimentHub/1fe971273e9d_7833"                   
#>  [3] "HiCool argument ::: r2: /github/home/.cache/R/ExperimentHub/1fe923fbe457_7834"                   
#>  [4] "HiCool argument ::: genome: /tmp/RtmpLG33tZ/R64-1-1"                                             
#>  [5] "HiCool argument ::: resolutions: "                                                               
#>  [6] "HiCool argument ::: restriction: DpnII,HinfI"                                                    
#>  [7] "HiCool argument ::: iterative: TRUE"                                                             
#>  [8] "HiCool argument ::: balancing_args:  --min-nnz 10 --mad-max 5 "                                  
#>  [9] "HiCool argument ::: threads: 1"                                                                  
#> [10] "HiCool argument ::: output: ./HiCool/"                                                           
#> [11] "HiCool argument ::: exclude_chr: Mito|chrM|MT"                                                   
#> [12] "HiCool argument ::: keep_bam: FALSE"                                                             
#> [13] "HiCool argument ::: scratch: /tmp/RtmpLG33tZ"                                                    
#> [14] "----------------"                                                                                
#> [15] "## hicstuff: v3.1.5 log file"                                                                    
#> [16] "## date: 2023-05-22 12:14:05"                                                                    
#> [17] "## enzyme: DpnII,HinfI"                                                                          
#> [18] "## input1: /github/home/.cache/R/ExperimentHub/1fe971273e9d_7833 "                               
#> [19] "## input2: /github/home/.cache/R/ExperimentHub/1fe923fbe457_7834"                                
#> [20] "## ref: /tmp/RtmpLG33tZ/R64-1-1"                                                                 
#> [21] "---"                                                                                             
#> [22] "2023-05-22,12:14:08 :: INFO :: 100000 reads to parse"                                            
#> [23] "2023-05-22,12:14:08 :: INFO :: Truncating unaligned reads to 20bp and mapping."                  
#> [24] "2023-05-22,12:14:11 :: INFO :: 19256 reads left to map."                                         
#> [25] "2023-05-22,12:14:11 :: INFO :: Trying to map unaligned reads at full length (35bp)."             
#> [26] "2023-05-22,12:14:12 :: INFO :: 18387 reads left to map."                                         
#> [27] "2023-05-22,12:14:12 :: INFO :: 81613 reads aligned / 100000 total reads."                        
#> [28] "2023-05-22,12:14:13 :: INFO :: 100000 reads to parse"                                            
#> [29] "2023-05-22,12:14:13 :: INFO :: Truncating unaligned reads to 20bp and mapping."                  
#> [30] "2023-05-22,12:14:16 :: INFO :: 20617 reads left to map."                                         
#> [31] "2023-05-22,12:14:16 :: INFO :: Trying to map unaligned reads at full length (35bp)."             
#> [32] "2023-05-22,12:14:16 :: INFO :: 19373 reads left to map."                                         
#> [33] "2023-05-22,12:14:17 :: INFO :: 80627 reads aligned / 100000 total reads."                        
#> [34] "2023-05-22,12:14:21 :: INFO :: 81% reads (single ends) mapped with Q >= 10.0 (162240/200000)"    
#> [35] "2023-05-22,12:14:24 :: INFO :: Filtering with thresholds: uncuts=7 loops=7"                      
#> [36] "2023-05-22,12:14:25 :: INFO :: Proportion of inter contacts: 22.03% (intra: 48089, inter: 13589)"
#> [37] "2023-05-22,12:14:25 :: INFO :: 12315 pairs discarded: Loops: 2205, Uncuts: 10027, Weirds: 83"    
#> [38] "2023-05-22,12:14:25 :: INFO :: 61678 pairs kept (83.36%)"                                        
#> [39] "2023-05-22,12:14:27 :: INFO :: 1% PCR duplicates have been filtered out (719 / 61678 pairs) "    
#> [40] "2023-05-22,12:14:29 :: INFO :: Contact map generated after 0h 0m 23s"