ATAC-seq data may be obtained in isolation using a single-cell ATAC-seq protocol (e.g. 10X scATACseq) or in combination with gene expression data using a single-cell multiome protocole (e.g. 10X multiome and SHARE-seq).
Several packages are currently avaialble to process scATAC-seq data in R. These include Signac and ArchR. This lab will closely follow the processing steps outlined in Signac, which interfaces well with Seurat for single-cell analysis.
In this lab, we will process a PBMC single-cell ATAC-seq (scATAC-seq) dataset and perform preliminary analysis to assess quality of these data. The data for this lab comes from 10X Genomics. The dataset contains roughly ~ 10,000 cells.
Overarching goals
Import cells from human PBMC scATACseq dataset
Perform scATACseq quality controls and checks
Filter, normalize and plot PBMC scATACseq dataset
Compute gene activity scores, check known markers and annotate scATAC clusters
Perform differential peak accessibility analysis
15.1 Process human PBMC dataset
15.1.1 Download data
Download the files related to scATACseq of all human PBMC cells.
Notice how the count matrix is in a .h5 format. We have already encountered this format in Lab3. Back then, we imported it with DropletUtils::read10xCounts.
This works because 10X Genomics make sure to distribute files in .h5 format that are consistent accross single-cell sequencing methodologies. However, the SingleCellExperiment obtained with this approach is not the most convenient, as it cannot natively leverage fragments file (see below).
Instead, we can create a Signac object, a flavour of Seurat objects.
Import counts matrix and feature annotations using an import function provided by Seurat.
The next step is to aggregate counts and features into a ChromatinAssay, a scATAC-seq flavour of Seurat standard Assays. The documentation for ?CreateChromatinAssay indicates that the user can provide:
A fragments file, corresponding to the full list of all unique fragments mapped across all single cells.
Genomic annotations to the ChromatinAssay, corresponding to gene annotations, promoter positions, etc. Such annotations can be generated from Ensembl.
Generate human annotations from Ensembl using a parsing function from Seurat.
R
## - Get human gene annotations (hg19/GRCh37) to feed it into the future `ChromatinAssay`BiocManager::install('EnsDb.Hsapiens.v75')annotations<-GetGRangesFromEnsDb(ensdb =EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75)seqlevelsStyle(annotations)<-'UCSC'
Create a ChromatinAssay using counts, features, fragments and annotations.
Which analysis are the fragments required for, exactly?
Could we still perform normalization/clustering/annotation without them? And motif enrichment analysis?
Since we do have the fragments file at hand, most of the QC steps are available (e.g. TSSEnrichment, NucleosomeSignal or fragment size distribution). Let’s go through them one by one.
R
# compute nucleosome signal score per cellPBMC<-NucleosomeSignal(object =PBMC)# compute TSS enrichment score per cellPBMC<-Signac::TSSEnrichment(object =PBMC, fast =FALSE)
The TSSPlot function from Signac can be used to plot the fragment count per peak ~ TSS enrichment.
What can you observe in the UMAP projection of the dataset? Comment on the separation of some cell types into different spatially-resolved clusters.
15.2 Compute gene activity scores
Signac’s GeneActivity() function require scATACseq fragment information. Since we have them, we can estimate a gene activity score for each gene in the annotations.
R
gene.activities<-GeneActivity(PBMC)
We can now save this new object as an Assay in the PBMC object and normalize it.
R
PBMC[['RNA']]<-CreateAssayObject(counts =gene.activities)# - Normalize the new RNA assay, this time with `SCTransform`PBMC<-SCTransform( object =PBMC, assay ='RNA',)PBMC
One can now visualize expression of individual markers across clusters to infer cluster identity.