13 Lab 7: Single-cell ATAC-seq analysis workflow

Notes

The estimated time for this lab is around 1h10.

Aims

Import cells from human PBMC scATACseq dataset
Perform scATACseq quality controls and checks
Filter, normalize and plot PBMC scATACseq dataset
Compute gene activity scores, check known markers and annotate scATAC clusters
Perform differential peak accessibility analysis

ATAC-seq data may be obtained in isolation using a single-cell ATAC-seq protocol (e.g. 10X scATACseq) or in combination with gene expression data using a single-cell multiome protocole (e.g. 10X multiome and SHARE-seq).

Several packages are currently avaialble to process scATAC-seq data in R. These include Signac and ArchR. This lab will closely follow the processing steps outlined in Signac, which interfaces well with Seurat for single-cell analysis.

In this lab, we will process a PBMC single-cell ATAC-seq (scATAC-seq) dataset and perform preliminary analysis to assess quality of these data. The data for this lab comes from 10X Genomics. The dataset contains roughly ~ 10,000 cells.

13.1 Process human PBMC dataset

13.1.1 Download data

Data comes from 10X Genomics. Direct download links are provided below.

Download the files related to scATACseq of all human PBMC cells.

dir.create('scATAC', showWarnings = FALSE)
options(timeout=1000)
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5", "scATAC/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_peaks.bed", "scATAC/atac_v1_pbmc_10k_peaks.bed")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_singlecell.csv", "scATAC/atac_v1_pbmc_10k_singlecell.csv")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz", "scATAC/atac_v1_pbmc_10k_fragments.tsv.gz")
download.file("https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz.tbi", "scATAC/atac_v1_pbmc_10k_fragments.tsv.gz.tbi")

trying URL 'https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5'
Content type 'binary/octet-stream' length 79995014 bytes (76.3 MB)
==================================================
downloaded 76.3 MB

trying URL 'https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_peaks.bed'
Content type 'binary/octet-stream' length 2145757 bytes (2.0 MB)
==================================================
downloaded 2.0 MB

trying URL 'https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_singlecell.csv'
Content type 'text/csv' length 35207776 bytes (33.6 MB)
==================================================
downloaded 33.6 MB

trying URL 'https://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz'
Content type 'text/tab-separated-values' length 1960680306 bytes (1869.9 MB)
==================================================
downloaded 1869.9 MB

13.1.2 Import data

Notice how the count matrix is in a .h5 format. We have already encountered this format in Lab3. Back then, we imported it with DropletUtils::read10xCounts.

Question

Does this function work here?

Answer

DropletUtils::read10xCounts("scATAC/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")

class: SingleCellExperiment 
dim: 89796 8728 
metadata(1): Samples
assays(1): counts
rownames(89796): chr1:565107-565550 chr1:569174-569639 ... chrY:59001782-59002175 chrY:59017143-59017246
rowData names(3): ID Symbol Type
colnames: NULL
colData names(2): Sample Barcode
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

This works because 10X Genomics make sure to distribute files in .h5 format that are consistent accross single-cell sequencing methodologies.

However, the SingleCellExperiment obtained with this approach is not the most convenient, as it cannot natively leverage fragments file (see below).

Instead, we can create a Signac object, a flavour of Seurat objects.

Question

Import counts matrix and feature annotations using an import function provided by Seurat.

Answer

library(Seurat)

Loading required package: SeuratObject

Loading required package: sp

'SeuratObject' was built under R 4.4.0 but the current version is 4.4.1; it is recomended that you reinstall 'SeuratObject' as the ABI for R may have changed

'SeuratObject' was built with package 'Matrix' 1.7.0 but the current version is 1.7.1; it is recomended that you reinstall 'SeuratObject' as the ABI for 'Matrix' may have changed


Attaching package: 'SeuratObject'

The following objects are masked from 'package:base':

    intersect, t

library(Signac)
library(rtracklayer)

Loading required package: GenomicRanges

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'

The following object is masked from 'package:SeuratObject':

    intersect

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, table, tapply, union, unique,
    unsplit, which.max, which.min

Loading required package: S4Vectors


Attaching package: 'S4Vectors'

The following object is masked from 'package:utils':

    findMatches

The following objects are masked from 'package:base':

    expand.grid, I, unname

Loading required package: IRanges


Attaching package: 'IRanges'

The following object is masked from 'package:sp':

    %over%

Loading required package: GenomeInfoDb

library(stringr)
cnts <- Read10X_h5('scATAC/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5')
features <- import('scATAC/atac_v1_pbmc_10k_peaks.bed')
features$peak <- as.character(features) |> str_replace(':', '-')
metadata <- read.csv(
  file = "scATAC/atac_v1_pbmc_10k_singlecell.csv",
  header = TRUE,
  row.names = 1
)

How many accessible genomic segments were found in this dataset?

Answer

features

GRanges object with 89796 ranges and 1 metadata column:
          seqnames            ranges strand |                   peak
             <Rle>         <IRanges>  <Rle> |            <character>
      [1]     chr1     565108-565550      * |     chr1-565108-565550
      [2]     chr1     569175-569639      * |     chr1-569175-569639
      [3]     chr1     713461-714823      * |     chr1-713461-714823
      [4]     chr1     752423-753038      * |     chr1-752423-753038
      [5]     chr1     762107-763359      * |     chr1-762107-763359
      ...      ...               ...    ... .                    ...
  [89792]     chrY 58993393-58993760      * | chrY-58993393-58993760
  [89793]     chrY 58994572-58994823      * | chrY-58994572-58994823
  [89794]     chrY 58996353-58997331      * | chrY-58996353-58997331
  [89795]     chrY 59001783-59002175      * | chrY-59001783-59002175
  [89796]     chrY 59017144-59017246      * | chrY-59017144-59017246
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

length(features)

[1] 89796

13.1.3 Create a `Seurat` object

The next step is to aggregate counts and features into a ChromatinAssay, a scATAC-seq flavour of Seurat standard Assays. The documentation for ?CreateChromatinAssay indicates that the user can provide:

A fragments file, corresponding to the full list of all unique fragments mapped across all single cells.
Genomic annotations to the ChromatinAssay, corresponding to gene annotations, promoter positions, etc. Such annotations can be generated from Ensembl.

Question

Generate human annotations from Ensembl using a parsing function from Seurat.

Answer

## - Get human gene annotations (hg19/GRCh37) to feed it into the future `ChromatinAssay`
BiocManager::install('EnsDb.Hsapiens.v75')

'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cloud.r-project.org

Bioconductor version 3.19 (BiocManager 1.30.25), R 4.4.1 (2024-06-14)

Warning: package(s) not installed when version(s) same as or greater than current; use `force = TRUE` to re-install: 'EnsDb.Hsapiens.v75'

Old packages: 'boot', 'curl', 'foreign', 'fs', 'httr2', 'MASS', 'mvtnorm', 'nlme', 'pak', 'progressr', 'R.oo', 'Rcpp', 'rliger', 'rmarkdown', 'spatstat.univar', 'spatstat.utils', 'survival',
  'tinytex', 'waldo', 'xfun'

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75)

Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)

seqlevelsStyle(annotations) <- 'UCSC'

Create a ChromatinAssay using counts, features, fragments and annotations.

Answer

## - Create Chromatin Assay
assay <- Signac::CreateChromatinAssay(
    counts = cnts,
    ranges = features,
    fragments = "scATAC/atac_v1_pbmc_10k_fragments.tsv.gz",
    annotation = annotations,
    genome = "hg19",
    min.cells = 10, 
    min.features = 10
)

Computing hash

assay

ChromatinAssay data with 87561 features for 8728 cells
Variable features: 0 
Genome: hg19 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 1

What are the dimensions of this object? Are they comparable to the count matrix? Comment.

Answer

dim(cnts)

[1] 89796  8728

dim(assay)

[1] 87561  8728

It’s finally time to wrap the ChromatinAssay into a Seurat standard object! This is done using the CreateSeuratObject, as already covered in Lab6

Question

Create a Seurat object from the ChromatinAssay object.

Answer

## - Create Seurat object
PBMC <- Seurat::CreateSeuratObject(
    counts = assay,
    assay = 'ATAC', 
    meta.data = metadata[metadata$is__cell_barcode == 1, ]
)
PBMC

An object of class Seurat 
87561 features across 8728 samples within 1 assay 
Active assay: ATAC (87561 features, 0 variable features)
 2 layers present: counts, data

PBMC[['ATAC']]

ChromatinAssay data with 87561 features for 8728 cells
Variable features: 0 
Genome: hg19 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 1

granges(PBMC)

GRanges object with 87561 ranges and 1 metadata column:
          seqnames            ranges strand |                   peak
             <Rle>         <IRanges>  <Rle> |            <character>
      [1]     chr1     565108-565550      * |     chr1-565108-565550
      [2]     chr1     569175-569639      * |     chr1-569175-569639
      [3]     chr1     713461-714823      * |     chr1-713461-714823
      [4]     chr1     752423-753038      * |     chr1-752423-753038
      [5]     chr1     762107-763359      * |     chr1-762107-763359
      ...      ...               ...    ... .                    ...
  [87557]     chrY 58993393-58993760      * | chrY-58993393-58993760
  [87558]     chrY 58994572-58994823      * | chrY-58994572-58994823
  [87559]     chrY 58996353-58997331      * | chrY-58996353-58997331
  [87560]     chrY 59001783-59002175      * | chrY-59001783-59002175
  [87561]     chrY 59017144-59017246      * | chrY-59017144-59017246
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

Annotation(PBMC)

GRanges object with 3072120 ranges and 5 metadata columns:
                  seqnames        ranges strand |           tx_id   gene_name         gene_id   gene_biotype     type
                     <Rle>     <IRanges>  <Rle> |     <character> <character>     <character>    <character> <factor>
  ENSE00001489430     chrX 192989-193061      + | ENST00000399012      PLCXD1 ENSG00000182378 protein_coding     exon
  ENSE00001536003     chrX 192991-193061      + | ENST00000484611      PLCXD1 ENSG00000182378 protein_coding     exon
  ENSE00002160563     chrX 193020-193061      + | ENST00000430923      PLCXD1 ENSG00000182378 protein_coding     exon
  ENSE00001750899     chrX 197722-197788      + | ENST00000445062      PLCXD1 ENSG00000182378 protein_coding     exon
  ENSE00001489388     chrX 197859-198351      + | ENST00000381657      PLCXD1 ENSG00000182378 protein_coding     exon
              ...      ...           ...    ... .             ...         ...             ...            ...      ...
  ENST00000361739    chrMT     7586-8269      + | ENST00000361739      MT-CO2 ENSG00000198712 protein_coding      cds
  ENST00000361789    chrMT   14747-15887      + | ENST00000361789      MT-CYB ENSG00000198727 protein_coding      cds
  ENST00000361851    chrMT     8366-8572      + | ENST00000361851     MT-ATP8 ENSG00000228253 protein_coding      cds
  ENST00000361899    chrMT     8527-9207      + | ENST00000361899     MT-ATP6 ENSG00000198899 protein_coding      cds
  ENST00000362079    chrMT     9207-9990      + | ENST00000362079      MT-CO3 ENSG00000198938 protein_coding      cds
  -------
  seqinfo: 25 sequences (1 circular) from hg19 genome

13.1.4 Check QCs

13.1.4.1 Cell-based QCs

The fraction of reads in peaks (FRiP) is a good indicator of how well each cell was handled during scATACseq processing.

PBMC$FRiP <- PBMC$peak_region_fragments / PBMC$passed_filters
PBMC$nCount_ATAC <- colSums(GetAssayData(PBMC, layer = "counts"))
PBMC$nFeature_ATAC <- colSums(GetAssayData(PBMC, layer = "counts") > 0)

quantile(PBMC$FRiP, seq(0, 1, 0.1))

      0%      10%      20%      30%      40%      50%      60%      70%      80%      90%     100% 
0.027703 0.552667 0.608955 0.640861 0.665830 0.685208 0.701947 0.719937 0.738010 0.760047 0.903831

quantile(PBMC$nCount_ATAC, seq(0, 1, 0.1))

      0%      10%      20%      30%      40%      50%      60%      70%      80%      90%     100% 
  1710.0   5079.1   7619.8   9522.5  11300.8  13367.5  15754.6  19005.5  23502.6  31316.2 249799.0

quantile(PBMC$nFeature_ATAC, seq(0, 1, 0.1))

     0%     10%     20%     30%     40%     50%     60%     70%     80%     90%    100% 
  654.0  2308.4  3349.4  4108.1  4765.0  5447.0  6234.2  7100.9  8241.6  9983.4 30313.0

13.1.4.2 Peaks-based QCs

Question

Which analysis are the fragments required for, exactly?

Could we still perform normalization/clustering/annotation without them? And motif enrichment analysis?

Since we do have the fragments file at hand, most of the QC steps are available (e.g. TSSEnrichment, NucleosomeSignal or fragment size distribution). Let’s go through them one by one.

Answer

# compute nucleosome signal score per cell
PBMC <- NucleosomeSignal(object = PBMC)
PBMC

An object of class Seurat 
87561 features across 8728 samples within 1 assay 
Active assay: ATAC (87561 features, 0 variable features)
 2 layers present: counts, data

# compute TSS enrichment score per cell
PBMC <- Signac::TSSEnrichment(object = PBMC, fast = FALSE)

Extracting TSS positions

Finding + strand cut sites

Finding - strand cut sites

Computing mean insertion frequency in flanking regions

Normalizing TSS score

PBMC

An object of class Seurat 
87561 features across 8728 samples within 1 assay 
Active assay: ATAC (87561 features, 0 variable features)
 2 layers present: counts, data

The TSSPlot function from Signac can be used to plot the fragment count per peak ~ TSS enrichment.

PBMC$high.tss <- ifelse(PBMC$TSS.enrichment > 3.5, 'High', 'Low')
TSSPlot(PBMC, group.by = 'high.tss') + NoLegend()

PBMC$high.tss <- ifelse(PBMC$TSS.enrichment > 2.5, 'High', 'Low')
TSSPlot(PBMC, group.by = 'high.tss') + NoLegend()

The FragmentHistogram function from Signac can be used to plot the fragment size distribution in peaks with different nucleosome signals.

Question

Group genomic loci by their nucleosome signal (> 4 or < 4) and plot the fragment size distribution.

Answer

PBMC$nucleosome_group <- ifelse(PBMC$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
FragmentHistogram(object = PBMC, group.by = 'nucleosome_group')

Warning: Removed 92 rows containing non-finite outside the scale range (`stat_bin()`).

Warning: Removed 4 rows containing missing values or values outside the scale range (`geom_bar()`).

13.1.5 Filter cells and features

Just like standard scRNAseq data, scATACseq data needs to be filtered to remove low-quality cells and peaks.

Question

Filter the Seurat object (cells and features) as deemed appropriate.

Answer

## - Filter data 
PBMC <- subset(PBMC, subset = nCount_ATAC > 3000 & nCount_ATAC < 100000) 
PBMC <- subset(PBMC, subset = nFeature_ATAC > 1000 & nFeature_ATAC < 20000) 
PBMC <- subset(PBMC, subset = FRiP > 0.30) 
PBMC

An object of class Seurat 
87561 features across 8231 samples within 1 assay 
Active assay: ATAC (87561 features, 0 variable features)
 2 layers present: counts, data

## - Remove peaks with low coverage
PBMC <- PBMC[rowSums(GetAssayData(PBMC, layer = "counts")) > 10, ]
PBMC <- PBMC[rowSums(GetAssayData(PBMC, layer = "counts") > 0) > 10, ]
PBMC

An object of class Seurat 
87212 features across 8231 samples within 1 assay 
Active assay: ATAC (87212 features, 0 variable features)
 2 layers present: counts, data

13.1.6 Dimensionality reduction and clustering

Now that the dataset is filtered, we can proceed to data normalization, dimensionality reduction and clustering. Compared to scRNAseq, scATACseq data is much sparser and has a higher dimensionality. This has important implications for the choice of normalization and dimensionality reduction methods. scATACseq data are generally normalized using a TF-IDF normalization, which is implemented in the RunTFIDF function in Signac, and dimensionality reduction is performed using LSI (Latent Semantic Indexing) implemented in the RunSVD function.

Question

Normalize then further reduce the dimensionality for visualization purposes.

Answer

## - Normalize data 
PBMC <- RunTFIDF(PBMC)

Performing TF-IDF normalization

## - Reduce dimensionality
PBMC <- FindTopFeatures(PBMC, min.cutoff = 'q50') 
PBMC <- RunSVD(PBMC)

Running SVD

Scaling cell embeddings

## - Embed in 2D UMAP
PBMC <- RunUMAP(object = PBMC, reduction = 'lsi', dims = 2:30)

Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session

23:26:55 UMAP embedding parameters a = 0.9922 b = 1.112

23:26:55 Read 8231 rows and found 29 numeric columns

23:26:55 Using Annoy for neighbor search, n_neighbors = 30

23:26:55 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%

[----|----|----|----|----|----|----|----|----|----|

**************************************************|
23:26:55 Writing NN index file to temp file /var/folders/dg/mbw146s905lgqgswn6w4ghk80000gn/T//Rtmp1eF4yc/file9ef7ad6087
23:26:55 Searching Annoy index using 1 thread, search_k = 3000
23:26:57 Annoy recall = 100%
23:26:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
23:27:00 Initializing from normalized Laplacian + noise (using RSpectra)
23:27:00 Commencing optimization for 500 epochs, with 316686 positive edges
23:27:08 Optimization finished

## - Cluster cells
PBMC <- FindNeighbors(object = PBMC, reduction = 'lsi', dims = 2:30)

Computing nearest neighbor graph
Computing SNN

PBMC <- FindClusters(object = PBMC, verbose = FALSE, algorithm = 3)

## - Visualize data 
DimPlot(PBMC, label = TRUE) + NoLegend()

What can you observe in the UMAP projection of the dataset? Comment on the separation of some cell types into different spatially-resolved clusters.

13.2 Compute gene activity scores

Signac’s GeneActivity() function require scATACseq fragment information. Since we have them, we can estimate a gene activity score for each gene in the annotations.

gene.activities <- GeneActivity(PBMC)

Extracting gene coordinates

Warning in SingleFeatureMatrix(fragment = fragments[[x]], features = features, : 13 features are on seqnames not present in the fragment file. These will be removed.

Extracting reads overlapping genomic regions

We can now save this new object as an Assay in the PBMC object and normalize it.

PBMC[['RNA']] <- CreateAssayObject(counts = gene.activities)

Warning: Non-unique features (rownames) present in the input matrix, making unique

# - Normalize the new RNA assay.
PBMC <- NormalizeData(object = PBMC, assay = 'RNA', normalization.method = 'LogNormalize', scale.factor = median(PBMC$nCount_RNA))
PBMC

An object of class Seurat 
107222 features across 8231 samples within 2 assays 
Active assay: ATAC (87212 features, 43610 variable features)
 2 layers present: counts, data
 1 other assay present: RNA
 2 dimensional reductions calculated: lsi, umap

One can now visualize expression of individual markers across clusters to infer cluster identity.

Question

Assess the expression of known markers across clusters, projecting the cells in a 2D UMAP.

Answer

VlnPlot(
  object = PBMC,
  features = c(
    'MS4A1', # B-cell
    'CD3D', # T-cell
    'NKG7', # NK cell
    'TREM1' # Monocytes
  ), 
  assay = "RNA"
)

PBMC$annotation <- dplyr::case_when(
  Idents(PBMC) %in% c(0, 5, 9, 12) ~ "Monocytes", 
  Idents(PBMC) %in% c(1, 2, 3, 4, 7, 10) ~ "T-cells", 
  Idents(PBMC) %in% c(6) ~ "NK-cells", 
  Idents(PBMC) %in% c(8, 11) ~ "B-cells", 
  Idents(PBMC) %in% c(13, 14) ~ "DC"
)
DimPlot(PBMC, group.by = 'annotation', label = TRUE) + NoLegend()

13.3 Find peaks differentially accessible across clusters

Finally, we can perform differential peak accessibility analysis to identify peaks that are differentially accessible between clusters. This is done using the FindMarkers function in Signac, which is similar to FindAllMarkers in Seurat.

Idents(PBMC) <- PBMC$annotation
da_peaks <- FindMarkers(
  object = PBMC,
  ident.1 = "Monocytes",
  ident.2 = "B-cells",
  test.use = 'LR',
  latent.vars = 'nCount_ATAC'
)
VlnPlot(
  object = PBMC,
  features = rownames(da_peaks)[1],
  pt.size = 0.1,
  idents = c("Monocytes","B-cells")
)

13.4 Session info

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       macOS Sonoma 14.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Paris
 date     2024-11-07
 pandoc   2.19.2 @ /opt/homebrew/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version   date (UTC) lib source
 abind                  1.4-8     2024-09-12 [1] CRAN (R 4.4.1)
 AnnotationDbi          1.66.0    2024-06-16 [1] Bioconduc~
 AnnotationFilter       1.28.0    2024-04-30 [1] Bioconduc~
 backports              1.5.0     2024-05-23 [1] CRAN (R 4.4.0)
 base64enc              0.1-3     2015-07-28 [1] CRAN (R 4.4.0)
 beachmat               2.20.0    2024-05-06 [1] Bioconduc~
 beeswarm               0.4.0     2021-06-01 [1] CRAN (R 4.4.0)
 Biobase                2.64.0    2024-04-30 [1] Bioconduc~
 BiocGenerics         * 0.50.0    2024-04-30 [1] Bioconduc~
 BiocIO                 1.14.0    2024-04-30 [1] Bioconduc~
 BiocManager            1.30.25   2024-08-28 [1] CRAN (R 4.4.1)
 BiocParallel           1.38.0    2024-04-30 [1] Bioconduc~
 Biostrings             2.72.1    2024-06-02 [1] Bioconduc~
 biovizBase             1.52.0    2024-04-30 [1] Bioconduc~
 bit                    4.5.0     2024-09-20 [1] CRAN (R 4.4.1)
 bit64                  4.5.2     2024-09-22 [1] CRAN (R 4.4.1)
 bitops                 1.0-9     2024-10-03 [1] CRAN (R 4.4.1)
 blob                   1.2.4     2023-03-17 [1] CRAN (R 4.4.0)
 BSgenome               1.72.0    2024-04-30 [1] Bioconduc~
 cachem                 1.1.0     2024-05-16 [1] CRAN (R 4.4.0)
 checkmate              2.3.2     2024-07-29 [1] CRAN (R 4.4.0)
 cli                    3.6.3     2024-06-21 [1] CRAN (R 4.4.0)
 cluster                2.1.6     2023-12-01 [1] CRAN (R 4.4.1)
 codetools              0.2-20    2024-03-31 [1] CRAN (R 4.4.1)
 colorspace             2.1-1     2024-07-26 [1] CRAN (R 4.4.0)
 cowplot                1.1.3     2024-01-22 [1] CRAN (R 4.4.0)
 crayon                 1.5.3     2024-06-20 [1] CRAN (R 4.4.0)
 curl                   5.2.3     2024-09-20 [1] CRAN (R 4.4.1)
 data.table             1.16.2    2024-10-10 [1] CRAN (R 4.4.1)
 DBI                    1.2.3     2024-06-02 [1] CRAN (R 4.4.0)
 DelayedArray           0.30.1    2024-05-30 [1] Bioconduc~
 DelayedMatrixStats     1.26.0    2024-04-30 [1] Bioconduc~
 deldir                 2.0-4     2024-02-28 [1] CRAN (R 4.4.0)
 devtools               2.4.5     2022-10-11 [1] CRAN (R 4.4.0)
 dichromat              2.0-0.1   2022-05-02 [1] CRAN (R 4.4.0)
 digest                 0.6.37    2024-08-19 [1] CRAN (R 4.4.1)
 dotCall64              1.2       2024-10-04 [1] CRAN (R 4.4.1)
 dplyr                  1.1.4     2023-11-17 [1] CRAN (R 4.4.0)
 dqrng                  0.4.1     2024-05-28 [1] CRAN (R 4.4.0)
 DropletUtils           1.24.0    2024-04-30 [1] Bioconduc~
 edgeR                  4.2.2     2024-10-13 [1] Bioconduc~
 ellipsis               0.3.2     2021-04-29 [1] CRAN (R 4.4.0)
 EnsDb.Hsapiens.v75     2.99.0    2024-10-30 [1] Bioconductor
 ensembldb              2.28.1    2024-08-21 [1] Bioconduc~
 evaluate               1.0.1     2024-10-10 [1] CRAN (R 4.4.1)
 fansi                  1.0.6     2023-12-08 [1] CRAN (R 4.4.0)
 farver                 2.1.2     2024-05-13 [1] CRAN (R 4.4.0)
 fastDummies            1.7.4     2024-08-16 [1] CRAN (R 4.4.0)
 fastmap                1.2.0     2024-05-15 [1] CRAN (R 4.4.0)
 fastmatch              1.1-4     2023-08-18 [1] CRAN (R 4.4.0)
 fitdistrplus           1.2-1     2024-07-12 [1] CRAN (R 4.4.0)
 foreign                0.8-86    2023-11-28 [1] CRAN (R 4.4.1)
 Formula                1.2-5     2023-02-24 [1] CRAN (R 4.4.0)
 fs                     1.6.4     2024-04-25 [1] CRAN (R 4.4.0)
 future                 1.34.0    2024-07-29 [1] CRAN (R 4.4.0)
 future.apply           1.11.3    2024-10-27 [1] CRAN (R 4.4.1)
 generics               0.1.3     2022-07-05 [1] CRAN (R 4.4.0)
 GenomeInfoDb         * 1.40.1    2024-06-16 [1] Bioconduc~
 GenomeInfoDbData       1.2.12    2024-10-29 [1] Bioconductor
 GenomicAlignments      1.40.0    2024-04-30 [1] Bioconduc~
 GenomicFeatures        1.56.0    2024-04-30 [1] Bioconduc~
 GenomicRanges        * 1.56.2    2024-10-09 [1] Bioconduc~
 ggbeeswarm             0.7.2     2023-04-29 [1] CRAN (R 4.4.0)
 ggplot2                3.5.1     2024-04-23 [1] CRAN (R 4.4.0)
 ggrastr                1.0.2     2023-06-01 [1] CRAN (R 4.4.0)
 ggrepel                0.9.6     2024-09-07 [1] CRAN (R 4.4.1)
 ggridges               0.5.6     2024-01-23 [1] CRAN (R 4.4.0)
 globals                0.16.3    2024-03-08 [1] CRAN (R 4.4.0)
 glue                   1.8.0     2024-09-30 [1] CRAN (R 4.4.1)
 goftest                1.2-3     2021-10-07 [1] CRAN (R 4.4.0)
 gridExtra              2.3       2017-09-09 [1] CRAN (R 4.4.0)
 gtable                 0.3.6     2024-10-25 [1] CRAN (R 4.4.1)
 HDF5Array              1.32.1    2024-08-11 [1] Bioconduc~
 hdf5r                  1.3.11    2024-07-07 [1] CRAN (R 4.4.0)
 Hmisc                  5.2-0     2024-10-28 [1] CRAN (R 4.4.1)
 htmlTable              2.4.3     2024-07-21 [1] CRAN (R 4.4.0)
 htmltools              0.5.8.1   2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets            1.6.4     2023-12-06 [1] CRAN (R 4.4.0)
 httpuv                 1.6.15    2024-03-26 [1] CRAN (R 4.4.0)
 httr                   1.4.7     2023-08-15 [1] CRAN (R 4.4.0)
 ica                    1.0-3     2022-07-08 [1] CRAN (R 4.4.0)
 igraph                 2.1.1     2024-10-19 [1] CRAN (R 4.4.1)
 IRanges              * 2.38.1    2024-07-03 [1] Bioconduc~
 irlba                  2.3.5.1   2022-10-03 [1] CRAN (R 4.4.0)
 jsonlite               1.8.9     2024-09-20 [1] CRAN (R 4.4.1)
 KEGGREST               1.44.1    2024-06-19 [1] Bioconduc~
 KernSmooth             2.23-24   2024-05-17 [1] CRAN (R 4.4.1)
 knitr                  1.48      2024-07-07 [1] CRAN (R 4.4.0)
 labeling               0.4.3     2023-08-29 [1] CRAN (R 4.4.0)
 later                  1.3.2     2023-12-06 [1] CRAN (R 4.4.0)
 lattice                0.22-6    2024-03-20 [1] CRAN (R 4.4.1)
 lazyeval               0.2.2     2019-03-15 [1] CRAN (R 4.4.0)
 leiden                 0.4.3.1   2023-11-17 [1] CRAN (R 4.4.0)
 lifecycle              1.0.4     2023-11-07 [1] CRAN (R 4.4.0)
 limma                  3.60.6    2024-10-02 [1] Bioconduc~
 listenv                0.9.1     2024-01-29 [1] CRAN (R 4.4.0)
 lmtest                 0.9-40    2022-03-21 [1] CRAN (R 4.4.0)
 locfit                 1.5-9.10  2024-06-24 [1] CRAN (R 4.4.0)
 magrittr               2.0.3     2022-03-30 [1] CRAN (R 4.4.0)
 MASS                   7.3-60.2  2024-04-26 [1] CRAN (R 4.4.1)
 Matrix                 1.7-1     2024-10-18 [1] CRAN (R 4.4.1)
 MatrixGenerics         1.16.0    2024-04-30 [1] Bioconduc~
 matrixStats            1.4.1     2024-09-08 [1] CRAN (R 4.4.1)
 memoise                2.0.1     2021-11-26 [1] CRAN (R 4.4.0)
 mime                   0.12      2021-09-28 [1] CRAN (R 4.4.0)
 miniUI                 0.1.1.1   2018-05-18 [1] CRAN (R 4.4.0)
 munsell                0.5.1     2024-04-01 [1] CRAN (R 4.4.0)
 nlme                   3.1-164   2023-11-27 [1] CRAN (R 4.4.1)
 nnet                   7.3-19    2023-05-03 [1] CRAN (R 4.4.1)
 parallelly             1.38.0    2024-07-27 [1] CRAN (R 4.4.0)
 patchwork              1.3.0     2024-09-16 [1] CRAN (R 4.4.1)
 pbapply                1.7-2     2023-06-27 [1] CRAN (R 4.4.0)
 pillar                 1.9.0     2023-03-22 [1] CRAN (R 4.4.0)
 pkgbuild               1.4.5     2024-10-28 [1] CRAN (R 4.4.1)
 pkgconfig              2.0.3     2019-09-22 [1] CRAN (R 4.4.0)
 pkgload                1.4.0     2024-06-28 [1] CRAN (R 4.4.0)
 plotly                 4.10.4    2024-01-13 [1] CRAN (R 4.4.0)
 plyr                   1.8.9     2023-10-02 [1] CRAN (R 4.4.0)
 png                    0.1-8     2022-11-29 [1] CRAN (R 4.4.0)
 polyclip               1.10-7    2024-07-23 [1] CRAN (R 4.4.0)
 profvis                0.4.0     2024-09-20 [1] CRAN (R 4.4.1)
 progressr              0.14.0    2023-08-10 [1] CRAN (R 4.4.0)
 promises               1.3.0     2024-04-05 [1] CRAN (R 4.4.0)
 ProtGenerics           1.36.0    2024-04-30 [1] Bioconduc~
 purrr                  1.0.2     2023-08-10 [1] CRAN (R 4.4.0)
 R.methodsS3            1.8.2     2022-06-13 [1] CRAN (R 4.4.0)
 R.oo                   1.26.0    2024-01-24 [1] CRAN (R 4.4.0)
 R.utils                2.12.3    2023-11-18 [1] CRAN (R 4.4.0)
 R6                     2.5.1     2021-08-19 [1] CRAN (R 4.4.0)
 RANN                   2.6.2     2024-08-25 [1] CRAN (R 4.4.1)
 RColorBrewer           1.1-3     2022-04-03 [1] CRAN (R 4.4.0)
 Rcpp                   1.0.13    2024-07-17 [1] CRAN (R 4.4.0)
 RcppAnnoy              0.0.22    2024-01-23 [1] CRAN (R 4.4.0)
 RcppHNSW               0.6.0     2024-02-04 [1] CRAN (R 4.4.0)
 RcppRoll               0.3.1     2024-07-07 [1] CRAN (R 4.4.0)
 RCurl                  1.98-1.16 2024-07-11 [1] CRAN (R 4.4.0)
 remotes                2.5.0     2024-03-17 [1] CRAN (R 4.4.0)
 reshape2               1.4.4     2020-04-09 [1] CRAN (R 4.4.0)
 restfulr               0.0.15    2022-06-16 [1] CRAN (R 4.4.0)
 reticulate             1.39.0    2024-09-05 [1] CRAN (R 4.4.1)
 rhdf5                  2.48.0    2024-04-30 [1] Bioconduc~
 rhdf5filters           1.16.0    2024-04-30 [1] Bioconduc~
 Rhdf5lib               1.26.0    2024-04-30 [1] Bioconduc~
 rjson                  0.2.23    2024-09-16 [1] CRAN (R 4.4.1)
 rlang                  1.1.4     2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown              2.28      2024-08-17 [1] CRAN (R 4.4.0)
 ROCR                   1.0-11    2020-05-02 [1] CRAN (R 4.4.0)
 rpart                  4.1.23    2023-12-05 [1] CRAN (R 4.4.1)
 Rsamtools              2.20.0    2024-04-30 [1] Bioconduc~
 RSpectra               0.16-2    2024-07-18 [1] CRAN (R 4.4.0)
 RSQLite                2.3.7     2024-05-27 [1] CRAN (R 4.4.0)
 rstudioapi             0.17.1    2024-10-22 [1] CRAN (R 4.4.1)
 rtracklayer          * 1.64.0    2024-05-06 [1] Bioconduc~
 Rtsne                  0.17      2023-12-07 [1] CRAN (R 4.4.0)
 S4Arrays               1.4.1     2024-05-30 [1] Bioconduc~
 S4Vectors            * 0.42.1    2024-07-03 [1] Bioconduc~
 scales                 1.3.0     2023-11-28 [1] CRAN (R 4.4.0)
 scattermore            1.2       2023-06-12 [1] CRAN (R 4.4.0)
 sctransform            0.4.1     2023-10-19 [1] CRAN (R 4.4.0)
 scuttle                1.14.0    2024-04-30 [1] Bioconduc~
 sessioninfo            1.2.2     2021-12-06 [1] CRAN (R 4.4.0)
 Seurat               * 5.1.0     2024-05-10 [1] CRAN (R 4.4.0)
 SeuratObject         * 5.0.2     2024-05-08 [1] CRAN (R 4.4.0)
 shiny                  1.9.1     2024-08-01 [1] CRAN (R 4.4.0)
 Signac               * 1.14.0    2024-08-21 [1] CRAN (R 4.4.1)
 SingleCellExperiment   1.26.0    2024-04-30 [1] Bioconduc~
 sp                   * 2.1-4     2024-04-30 [1] CRAN (R 4.4.0)
 spam                   2.11-0    2024-10-03 [1] CRAN (R 4.4.1)
 SparseArray            1.4.8     2024-05-30 [1] Bioconduc~
 sparseMatrixStats      1.16.0    2024-04-30 [1] Bioconduc~
 spatstat.data          3.1-2     2024-06-21 [1] CRAN (R 4.4.0)
 spatstat.explore       3.3-3     2024-10-22 [1] CRAN (R 4.4.1)
 spatstat.geom          3.3-3     2024-09-18 [1] CRAN (R 4.4.1)
 spatstat.random        3.3-2     2024-09-18 [1] CRAN (R 4.4.1)
 spatstat.sparse        3.1-0     2024-06-21 [1] CRAN (R 4.4.0)
 spatstat.univar        3.0-1     2024-09-05 [1] CRAN (R 4.4.1)
 spatstat.utils         3.1-0     2024-08-17 [1] CRAN (R 4.4.0)
 statmod                1.5.0     2023-01-06 [1] CRAN (R 4.4.0)
 stringi                1.8.4     2024-05-06 [1] CRAN (R 4.4.0)
 stringr              * 1.5.1     2023-11-14 [1] CRAN (R 4.4.0)
 SummarizedExperiment   1.34.0    2024-04-30 [1] Bioconduc~
 survival               3.6-4     2024-04-24 [1] CRAN (R 4.4.1)
 tensor                 1.5       2012-05-05 [1] CRAN (R 4.4.0)
 tibble                 3.2.1     2023-03-20 [1] CRAN (R 4.4.0)
 tidyr                  1.3.1     2024-01-24 [1] CRAN (R 4.4.0)
 tidyselect             1.2.1     2024-03-11 [1] CRAN (R 4.4.0)
 UCSC.utils             1.0.0     2024-05-06 [1] Bioconduc~
 urlchecker             1.0.1     2021-11-30 [1] CRAN (R 4.4.0)
 usethis                3.0.0     2024-07-29 [1] CRAN (R 4.4.0)
 utf8                   1.2.4     2023-10-22 [1] CRAN (R 4.4.0)
 uwot                   0.2.2     2024-04-21 [1] CRAN (R 4.4.0)
 VariantAnnotation      1.50.0    2024-04-30 [1] Bioconduc~
 vctrs                  0.6.5     2023-12-01 [1] CRAN (R 4.4.0)
 vipor                  0.4.7     2023-12-18 [1] CRAN (R 4.4.0)
 viridisLite            0.4.2     2023-05-02 [1] CRAN (R 4.4.0)
 withr                  3.0.2     2024-10-28 [1] CRAN (R 4.4.1)
 xfun                   0.48      2024-10-03 [1] CRAN (R 4.4.1)
 XML                    3.99-0.17 2024-06-25 [1] CRAN (R 4.4.0)
 xtable                 1.8-4     2019-04-21 [1] CRAN (R 4.4.0)
 XVector                0.44.0    2024-04-30 [1] Bioconduc~
 yaml                   2.3.10    2024-07-26 [1] CRAN (R 4.4.0)
 zlibbioc               1.50.0    2024-04-30 [1] Bioconduc~
 zoo                    1.8-12    2023-04-13 [1] CRAN (R 4.4.0)

 [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

13.1 Process human PBMC dataset

13.1.1 Download data

13.1.2 Import data

13.1.3 Create a Seurat object

13.1.4 Check QCs

13.1.4.1 Cell-based QCs

13.1.4.2 Peaks-based QCs

13.1.5 Filter cells and features

13.1.6 Dimensionality reduction and clustering

13.2 Compute gene activity scores

13.3 Find peaks differentially accessible across clusters

13.4 Session info

13.1.3 Create a `Seurat` object