15  Lab 8 - Single-cell ATAC-seq analysis workflow

ATAC-seq data may be obtained in isolation using a single-cell ATAC-seq protocol (e.g. 10X scATACseq) or in combination with gene expression data using a single-cell multiome protocole (e.g. 10X multiome and SHARE-seq).

Several packages are currently avaialble to process scATAC-seq data in R. These include Signac and ArchR. This lab will closely follow the processing steps outlined in Signac, which interfaces well with Seurat for single-cell analysis.

In this lab, we will process a PBMC single-cell ATAC-seq (scATAC-seq) dataset and perform preliminary analysis to assess quality of these data. The data for this lab comes from 10X Genomics. The dataset contains roughly ~ 10,000 cells.

Overarching goals
  1. Import cells from human PBMC scATACseq dataset
  2. Perform scATACseq quality controls and checks
  3. Filter, normalize and plot PBMC scATACseq dataset
  4. Compute gene activity scores, check known markers and annotate scATAC clusters
  5. Perform differential peak accessibility analysis

15.1 Process human PBMC dataset

15.1.1 Download data

  • Download the files related to scATACseq of all human PBMC cells.

Data comes from 10X Genomics.

Direct download links are provided below

R
dir.create('scATAC')
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5", "scATAC/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_peaks.bed", "scATAC/atac_v1_pbmc_10k_peaks.bed")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_singlecell.csv", "scATAC/atac_v1_pbmc_10k_singlecell.csv")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz", "scATAC/atac_v1_pbmc_10k_fragments.tsv.gz")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz.tbi", "scATAC/atac_v1_pbmc_10k_fragments.tsv.gz.tbi")

15.1.2 Import data

Notice how the count matrix is in a .h5 format. We have already encountered this format in Lab3. Back then, we imported it with DropletUtils::read10xCounts.

  • Does this function work here?
R
DropletUtils::read10xCounts("scATAC/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")

This works because 10X Genomics make sure to distribute files in .h5 format that are consistent accross single-cell sequencing methodologies. However, the SingleCellExperiment obtained with this approach is not the most convenient, as it cannot natively leverage fragments file (see below).

Instead, we can create a Signac object, a flavour of Seurat objects.

  • Import counts matrix and feature annotations using an import function provided by Seurat.
R
library(Seurat)
library(Signac)
library(rtracklayer)
library(stringr)
cnts <- Read10X_h5('scATAC/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5')
features <- import('scATAC/atac_v1_pbmc_10k_peaks.bed')
features$peak <- as.character(features) |> str_replace(':', '-')
metadata <- read.csv(
  file = "scATAC/atac_v1_pbmc_10k_singlecell.csv",
  header = TRUE,
  row.names = 1
)
  • How many accessible genomic segments were found in this dataset?
R
features
length(features)

15.1.3 Create a Seurat object

The next step is to aggregate counts and features into a ChromatinAssay, a scATAC-seq flavour of Seurat standard Assays. The documentation for ?CreateChromatinAssay indicates that the user can provide:

  1. A fragments file, corresponding to the full list of all unique fragments mapped across all single cells.
  2. Genomic annotations to the ChromatinAssay, corresponding to gene annotations, promoter positions, etc. Such annotations can be generated from Ensembl.
  • Generate human annotations from Ensembl using a parsing function from Seurat.
R
## - Get human gene annotations (hg19/GRCh37) to feed it into the future `ChromatinAssay`
BiocManager::install('EnsDb.Hsapiens.v75')
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75)
seqlevelsStyle(annotations) <- 'UCSC'
  • Create a ChromatinAssay using counts, features, fragments and annotations.
R
## - Create Chromatin Assay
assay <- Signac::CreateChromatinAssay(
    counts = cnts,
    ranges = features,
    fragments = "scATAC/atac_v1_pbmc_10k_fragments.tsv.gz",
    annotation = annotations,
    genome = "hg19",
    min.cells = 10, 
    min.features = 10
)
assay
  • What are the dimensions of this object? Are they comparable to the count matrix? Comment.
R
dim(cnts)
dim(assay)

It’s finally time to wrap the ChromatinAssay into a Seurat standard object. This is done using the CreateSeuratObject, as already covered in Lab6

  • Create a PBMC Seurat object.
R
## - Create Seurat object
PBMC <- Seurat::CreateSeuratObject(
    counts = assay,
    assay = 'ATAC', 
    meta.data = metadata[metadata$is__cell_barcode == 1, ]
)
PBMC

PBMC[['ATAC']]
granges(PBMC)
Annotation(PBMC)

15.1.4 Check QCs

15.1.4.1 Cell-based QCs

The fraction of reads in peaks (FRiP) is a good indicator of how well each cell was handled during scATACseq processing.

R
PBMC$FRiP <- PBMC$peak_region_fragments / PBMC$passed_filters
PBMC$nCount_ATAC <- colSums(GetAssayData(PBMC, slot = "counts"))
PBMC$nFeature_ATAC <- colSums(GetAssayData(PBMC, slot = "counts") > 0)

quantile(PBMC$FRiP, seq(0, 1, 0.1))
quantile(PBMC$nCount_ATAC, seq(0, 1, 0.1))
quantile(PBMC$nFeature_ATAC, seq(0, 1, 0.1))

15.1.4.2 Peaks-based QCs

  • Which analysis are the fragments required for, exactly?
  • Could we still perform normalization/clustering/annotation without them? And motif enrichment analysis?

Since we do have the fragments file at hand, most of the QC steps are available (e.g. TSSEnrichment, NucleosomeSignal or fragment size distribution). Let’s go through them one by one.

R
# compute nucleosome signal score per cell
PBMC <- NucleosomeSignal(object = PBMC)

# compute TSS enrichment score per cell
PBMC <- Signac::TSSEnrichment(object = PBMC, fast = FALSE)

The TSSPlot function from Signac can be used to plot the fragment count per peak ~ TSS enrichment.

R
PBMC$high.tss <- ifelse(PBMC$TSS.enrichment > 3.5, 'High', 'Low')
TSSPlot(PBMC, group.by = 'high.tss') + NoLegend()
PBMC$high.tss <- ifelse(PBMC$TSS.enrichment > 2.5, 'High', 'Low')
TSSPlot(PBMC, group.by = 'high.tss') + NoLegend()

The FragmentHistogram function from Signac can be used to plot the fragment size distribution in peaks with different nucleosome signals.

R
PBMC$nucleosome_group <- ifelse(PBMC$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
FragmentHistogram(object = PBMC, group.by = 'nucleosome_group')

15.1.5 Filter cells and features

  • Filter the Seurat object (cells and features) as deemed appropriate.
R
## - Filter data 
PBMC <- subset(PBMC, subset = nCount_ATAC > 3000 & nCount_ATAC < 100000) 
PBMC <- subset(PBMC, subset = nFeature_ATAC > 1000 & nFeature_ATAC < 20000) 
PBMC <- subset(PBMC, subset = FRiP > 0.30) 
## - Remove peaks with low coverage
PBMC <- PBMC[rowSums(GetAssayData(PBMC, slot = "counts")) > 10, ]
PBMC <- PBMC[rowSums(GetAssayData(PBMC, slot = "counts") > 0) > 10, ]

15.1.6 Dimensionality reduction and clustering

  • Now that the dataset is filtered, normalize (by using TF-IDF approach) then further reduce the dimensionality for visualization purposes.
R
## - Normalize data 
PBMC <- RunTFIDF(PBMC) 

## - Reduce dimensionality
PBMC <- FindTopFeatures(PBMC, min.cutoff = 'q50') 
PBMC <- RunSVD(PBMC) 

## - Embed in 2D UMAP
PBMC <- RunUMAP(object = PBMC, reduction = 'lsi', dims = 2:30)

## - Cluster cells
PBMC <- FindNeighbors(object = PBMC, reduction = 'lsi', dims = 2:30)
PBMC <- FindClusters(object = PBMC, verbose = FALSE, algorithm = 3)

## - Visualize data 
p <- DimPlot(PBMC, label = TRUE) + NoLegend()
  • What can you observe in the UMAP projection of the dataset? Comment on the separation of some cell types into different spatially-resolved clusters.

15.2 Compute gene activity scores

Signac’s GeneActivity() function require scATACseq fragment information. Since we have them, we can estimate a gene activity score for each gene in the annotations.

R
gene.activities <- GeneActivity(PBMC)

We can now save this new object as an Assay in the PBMC object and normalize it.

R
PBMC[['RNA']] <- CreateAssayObject(counts = gene.activities)

# - Normalize the new RNA assay, this time with `SCTransform`
PBMC <- SCTransform(
  object = PBMC,
  assay = 'RNA',
)

PBMC

One can now visualize expression of individual markers across clusters to infer cluster identity.

R
p <- VlnPlot(
  object = PBMC,
  features = c(
    'MS4A1', # B-cell
    'CD3D', # T-cell
    'NKG7', # NK cell
    'TREM1' # Monocytes
  ), 
  assay = "SCT"
)
PBMC$annotation <- dplyr::case_when(
  Idents(PBMC) %in% c(0, 5, 9, 12) ~ "Monocytes", 
  Idents(PBMC) %in% c(1, 2, 3, 4, 7, 10) ~ "T-cells", 
  Idents(PBMC) %in% c(6) ~ "NK-cells", 
  Idents(PBMC) %in% c(8, 11) ~ "B-cells", 
  Idents(PBMC) %in% c(13, 14) ~ "DC"
)
p <- DimPlot(PBMC, group.by = 'annotation', label = TRUE) + NoLegend()

15.3 Find peaks differentially accessible across clusters

R
Idents(PBMC) <- PBMC$annotation
da_peaks <- FindMarkers(
  object = PBMC,
  ident.1 = "Monocytes",
  ident.2 = "B-cells",
  test.use = 'LR',
  latent.vars = 'nCount_ATAC'
)
p <- VlnPlot(
  object = PBMC,
  features = rownames(da_peaks)[1],
  pt.size = 0.1,
  idents = c("Monocytes","B-cells")
)