15 Lab 8: Pseudotime analyses

Notes

The estimated time for this lab is around 1h30.

Aims

Understand the requirements for RNA velocity computation
Perform pseudotime and lineage analysis
Model and visualize gene expression along pseudotime
Compute RNA velocity and use it to orientate lineages

15.1 Process testis data in R

15.1.1 Import testis data from GSE112013 and pre-process it

We will work on a testis dataset, from the “The adult human testis transcriptional cell atlas” study (Guo et al., 2018). This dataset contains single-cell RNAseq data from human testis. It is available on the GEO database under the accession number GSE112013.

The pre-processing of the testis dataset has already been done for you. We have applied the different analysis steps we have learned over the past three days to the count matrix GSE112013_Combined_UMI_table.txt.gz. See the code below for the details of pre-processing steps.

👀 Data pre-processing 👇

Check the GEO page for supplementary processed data, and download the GSE112013_Combined_UMI_table.txt.gz table.

With this data, we will:

Import the dataset in R as a SingleCellExperiment object,
Remove doublets,
Filter genes,
Normalize counts,
Correct for batch effect,
Cluster cells.

library(SingleCellExperiment)
library(tidyverse)

# Download raw file
dir.create('Guo_testis')
download.file(
    'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE112nnn/GSE112013/suppl/GSE112013_Combined_UMI_table.txt.gz', 
    'Guo_testis/GSE112013_Combined_UMI_table.txt.gz'
)
system('gunzip Guo_testis/GSE112013_Combined_UMI_table.txt.gz')

# Create SingleCellExperiment object
x <- vroom::vroom('Guo_testis/GSE112013_Combined_UMI_table.txt')
counts <- as.matrix(x[, -1])
counts <- as(counts, 'dgCMatrix')
genes <- as.data.frame(x[, 1])
cells <- data.frame(cellid = colnames(x[, -1]))
testis <- SingleCellExperiment(
    assays = list(counts = counts), 
    colData = cells, 
    rowData = genes
)
testis$Barcode <- str_replace(testis$cellid, 'Donor.-', '') |> str_replace('-.', '')
testis <- testis[, !duplicated(testis$Barcode)]
testis$donor <- str_replace(testis$cellid, '-.*', '')
testis$replicate <- str_replace(testis$cellid, '.*-', '')
rownames(testis) <- rowData(testis)$Gene
set.seed(1000)

# Remove doublets
testis <- scDblFinder::scDblFinder(testis)
testis <- testis[, testis$scDblFinder.class == 'singlet']

# Recover human genomic, protein-coding gene informations
library(plyranges)
ah <- AnnotationHub::AnnotationHub()
AnnotationHub::query(ah, c('gene annotation', 'ensembl', '102', 'homo_sapiens', 'GRCh38'))
gtf <- AnnotationHub::query(ah, c('Homo_sapiens.GRCh38.102.chr.gtf'))[[1]]
genes <- gtf |> 
    filter(type == 'gene') |> 
    filter(gene_biotype == 'protein_coding') |> 
    filter(gene_source == 'ensembl_havana') 
genes <- genes[!duplicated(genes$gene_name)]

# Annotate genes in testis dataset and filter out non-relevant genes
testis <- testis[genes$gene_name[genes$gene_name %in% rownames(testis)], ]
rowRanges(testis) <- genes[match(rownames(testis), genes$gene_name)]
rowData(testis) <- rowData(testis)[, c('gene_name', 'gene_id')]
rownames(testis) <- scuttle::uniquifyFeatureNames(rowData(testis)$gene_id, rowData(testis)$gene_name)

# Get preliminary QCs per cell and per gene
testis <- scuttle::addPerCellQCMetrics(testis)
testis <- scuttle::addPerFeatureQCMetrics(testis)

# Filter out genes not expressed in at least 10 cells
testis <- testis[rowSums(assay(testis, 'counts') > 0) >= 10, ]

# Normalize counts using VST
cnts <- as(assay(testis, 'counts'), 'dgCMatrix')
colnames(cnts) <- testis$cellid
rownames(cnts) <- rownames(testis)
testis_vst <- sctransform::vst(cnts, return_cell_attr = TRUE)
corrected_cnts <- sctransform::correct(testis_vst)
assay(testis, 'corrected_counts', withDimnames = FALSE) <- corrected_cnts
assay(testis, 'logcounts', withDimnames = FALSE) <- log1p(corrected_cnts)

# Computing biological variance of each gene
testis_variance <- scran::modelGeneVar(testis)
HVGs <- scran::getTopHVGs(testis_variance, prop = 0.1)
rowData(testis)$isHVG <- rownames(testis) %in% HVGs

# Embedding dataset in PCA space, correcting for batch effect
mergedBatches <- batchelor::fastMNN(
    testis, 
    batch = testis$donor, 
    subset.row = HVGs, 
    BPPARAM = BiocParallel::MulticoreParam(workers = 12)
)
reducedDim(testis, 'corrected') <- reducedDim(mergedBatches, 'corrected')

# Embedding dataset in shared k-nearest neighbors graph for clustering 
graph <- scran::buildSNNGraph(testis, use.dimred = 'corrected')

# Cluster cells using Louvain community finding algorithm
testis_clust <- igraph::cluster_louvain(graph)$membership
table(testis_clust)
testis$cluster <- factor(testis_clust)

# Embedding dataset in TSNE space for visualization
set.seed(10)
testis <- scater::runTSNE(testis, dimred = 'corrected')

# Visualize embeddings
p <- cowplot::plot_grid(
    scater::plotReducedDim(testis, 'corrected', colour_by = 'donor'),
    scater::plotReducedDim(testis, 'corrected', colour_by = 'cluster'),
    scater::plotReducedDim(testis, 'TSNE', colour_by = 'donor'),
    scater::plotReducedDim(testis, 'TSNE', colour_by = 'cluster')
)
p
saveRDS(testis, '~/Share/Guo_testis.rds')

15.1.2 Annotate cells using HPA resources

The Human Protein Atlas (HPA) is a valuable resource for cell type annotations. It provides single-cell RNAseq data for a large number of cell types.

Question

Download HPA scRNAseq atlas (combined per cell type) from here, and import it in R as a SummarizedExperiment object.

Answer

library(SingleCellExperiment)

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: 'MatrixGenerics'

The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps,
    colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates,
    colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse,
    rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans,
    rowWeightedMedians, rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, table, tapply, union, unique,
    unsplit, which.max, which.min

Loading required package: S4Vectors


Attaching package: 'S4Vectors'

The following object is masked from 'package:utils':

    findMatches

The following objects are masked from 'package:base':

    expand.grid, I, unname

Loading required package: IRanges

Loading required package: GenomeInfoDb

Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'Biobase'

The following object is masked from 'package:MatrixGenerics':

    rowMedians

The following objects are masked from 'package:matrixStats':

    anyMissing, rowMedians

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2

── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::%within%() masks IRanges::%within%()
✖ dplyr::collapse()     masks IRanges::collapse()
✖ dplyr::combine()      masks Biobase::combine(), BiocGenerics::combine()
✖ dplyr::count()        masks matrixStats::count()
✖ dplyr::desc()         masks IRanges::desc()
✖ tidyr::expand()       masks S4Vectors::expand()
✖ dplyr::filter()       masks stats::filter()
✖ dplyr::first()        masks S4Vectors::first()
✖ dplyr::lag()          masks stats::lag()
✖ ggplot2::Position()   masks BiocGenerics::Position(), base::Position()
✖ purrr::reduce()       masks GenomicRanges::reduce(), IRanges::reduce()
✖ dplyr::rename()       masks S4Vectors::rename()
✖ lubridate::second()   masks S4Vectors::second()
✖ lubridate::second<-() masks S4Vectors::second<-()
✖ dplyr::slice()        masks IRanges::slice()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

HPA <- read_tsv('~/Share/rna_single_cell_type.tsv.zip') |> 
    pivot_wider(names_from = `Cell type`, values_from = 'nTPM') |> 
    dplyr::select(-Gene) |> 
    distinct(`Gene name`, .keep_all = TRUE) |> 
    column_to_rownames('Gene name')

Rows: 1626642 Columns: 4
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): Gene, Gene name, Cell type
dbl (1): nTPM

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

HPA_se <- SummarizedExperiment::SummarizedExperiment(HPA, colData = tibble(cell_type = colnames(HPA)), assays = list('logcounts' = HPA))

How many and what cell types are profiled?

Answer

length(HPA_se$cell_type)

[1] 81

HPA_se$cell_type

 [1] "Adipocytes"                      "Alveolar cells type 1"           "Alveolar cells type 2"           "Astrocytes"                      "B-cells"                        
 [6] "Basal keratinocytes"             "Basal prostatic cells"           "Basal respiratory cells"         "Basal squamous epithelial cells" "Bipolar cells"                  
[11] "Breast glandular cells"          "Breast myoepithelial cells"      "Cardiomyocytes"                  "Cholangiocytes"                  "Ciliated cells"                 
[16] "Club cells"                      "Collecting duct cells"           "Cone photoreceptor cells"        "Cytotrophoblasts"                "dendritic cells"                
[21] "Distal enterocytes"              "Distal tubular cells"            "Ductal cells"                    "Early spermatids"                "Endometrial stromal cells"      
[26] "Endothelial cells"               "Enteroendocrine cells"           "Erythroid cells"                 "Excitatory neurons"              "Exocrine glandular cells"       
[31] "Extravillous trophoblasts"       "Fibroblasts"                     "Gastric mucus-secreting cells"   "Glandular and luminal cells"     "granulocytes"                   
[36] "Granulosa cells"                 "Hepatocytes"                     "Hofbauer cells"                  "Horizontal cells"                "Inhibitory neurons"             
[41] "Intestinal goblet cells"         "Ionocytes"                       "Kupffer cells"                   "Langerhans cells"                "Late spermatids"                
[46] "Leydig cells"                    "Lymphatic endothelial cells"     "Macrophages"                     "Melanocytes"                     "Mesothelial cells"              
[51] "Microglial cells"                "monocytes"                       "Mucus glandular cells"           "Muller glia cells"               "NK-cells"                       
[56] "Oligodendrocyte precursor cells" "Oligodendrocytes"                "Oocytes"                         "Ovarian stromal cells"           "Pancreatic endocrine cells"     
[61] "Paneth cells"                    "Peritubular cells"               "Plasma cells"                    "Prostatic glandular cells"       "Proximal enterocytes"           
[66] "Proximal tubular cells"          "Rod photoreceptor cells"         "Salivary duct cells"             "Schwann cells"                   "Secretory cells"                
[71] "Serous glandular cells"          "Sertoli cells"                   "Skeletal myocytes"               "Smooth muscle cells"             "Spermatocytes"                  
[76] "Spermatogonia"                   "Squamous epithelial cells"       "Suprabasal keratinocytes"        "Syncytiotrophoblasts"            "T-cells"                        
[81] "Undifferentiated cells"

We can use the HPA reference to annotate cell types in the testis dataset.

Question

Use SingleR to annotate cell types in the testis dataset using HPA reference.

Answer

testis <- readRDS('~/Share/Guo_testis.rds')
predictions <- SingleR::SingleR(
    test = testis, 
    ref = HPA_se, 
    labels = colData(HPA_se)$cell_type, 
    clusters = testis$cluster
)
testis$annotation <- predictions$labels[testis$cluster]
cowplot::plot_grid(
    scater::plotReducedDim(testis, dimred = 'corrected', colour_by = 'cluster', text_by = 'cluster') + ggtitle('Testis data, PCA, graph-based clusters'), 
    scater::plotReducedDim(testis, dimred = 'corrected', colour_by = 'annotation', text_by = 'annotation') + ggtitle('PCA, Annotations transferred from HPA'),
    scater::plotReducedDim(testis, dimred = 'TSNE', colour_by = 'cluster', text_by = 'cluster') + ggtitle('Testis data, tSNE, graph-based clusters'), 
    scater::plotReducedDim(testis, dimred = 'TSNE', colour_by = 'annotation', text_by = 'annotation') + ggtitle('tSNE, Annotations transferred from HPA')
)

15.1.3 Filter the testis dataset to only germinal cells.

In this notebook, we will focus on germinal cells only.

Question

Filter the testis dataset to only retain germinal cells.

Answer

germcells <- testis[
    , 
    testis$annotation %in% c("Spermatogonia", "Spermatocytes", "Early spermatids", "Late spermatids")
]

15.2 Trajectory inference (TI) in scRNAseq

An important question in scRNAseq field of research is: how to identify a cell trajectory from high-dimensional expression data and map individual cells onto it? A large number of methods have currently emerged, each one with their own specificities, assumptions, and strengths. A nice breakdown (from 2019, so already very outdated!) is available from Saelens et al., Nat. Biotech. 2018 (doi: 10.1038/s41587-019-0071-9):

15.2.1 Trajectory

Slingshot is perhaps one of the most widely used algorithms for users who want to focus on R-based approaches.

Question

Read Slingshot documentation to understand how to identify lineages in a scRNAseq dataset in R.

Why is it recommended to infer lineages from PCA space rather than t-SNE or UMAP space, even though these spaces do “reveal” an obvious trajectory in 2D?

Infer lineages, using cluster annotations as information to build the MST. Note that you will first need to remove the 50th PCA dimension for slingshot to work (bug reported).

Answer

reducedDim(germcells, 'corrected_2') <- reducedDim(germcells, 'corrected')[, 1:49]
germcells_slingshot <- slingshot::slingshot(
    germcells, 
    reducedDim = 'corrected_2', 
    clusterLabels = germcells$annotation
)
slingshot::slingLineages(germcells_slingshot)

$Lineage1
[1] "Late spermatids"  "Early spermatids" "Spermatocytes"    "Spermatogonia"

15.2.2 Pseudotime

slingshot automatically deduces a pseudotime for each cell, based on the inferred lineage. It can be useful to visualize the trajectory of cells in a 2D space, and to order cells along the trajectory.

Question

Recover pseudotime values and principal curves from slingshot output.

Answer

# Fetching pseudotime
germcells$pseudotime <- slingshot::slingPseudotime(germcells_slingshot)[, 'Lineage1']

# Fetching principal curve in PCA space
pca_curve <- slingshot::slingCurves(germcells_slingshot, as.df = TRUE)
colnames(pca_curve) <- paste0('PC', 1:ncol(pca_curve))
head(pca_curve)

      PC1     PC2      PC3      PC4      PC5       PC6     PC7        PC8      PC9      PC10     PC11        PC12      PC13      PC14      PC15     PC16       PC17      PC18       PC19      PC20
1 0.53255 0.12770 -0.33383 0.036078 -0.28674 -0.042046 0.13734 -0.0099950 0.026943 -0.050426 0.029891  0.00523859 -0.018911 -0.025575 -0.010748 0.020062 -0.0040499 0.0059456 -0.0089460 -0.016504
2 0.53314 0.12574 -0.32473 0.034858 -0.27366 -0.040134 0.13099 -0.0091373 0.025707 -0.047675 0.027900  0.00369063 -0.019211 -0.024697 -0.011852 0.021232 -0.0056933 0.0081235 -0.0086916 -0.018824
3 0.53373 0.12378 -0.31562 0.033638 -0.26059 -0.038223 0.12463 -0.0082795 0.024471 -0.044924 0.025909  0.00214267 -0.019511 -0.023819 -0.012955 0.022401 -0.0073367 0.0103014 -0.0084373 -0.021144
4 0.53432 0.12182 -0.30652 0.032417 -0.24752 -0.036311 0.11828 -0.0074218 0.023234 -0.042174 0.023918  0.00059497 -0.019812 -0.022941 -0.014059 0.023570 -0.0089799 0.0124791 -0.0081829 -0.023463
5 0.53491 0.11986 -0.29741 0.031197 -0.23445 -0.034396 0.11193 -0.0065652 0.022000 -0.039433 0.021936 -0.00094697 -0.020113 -0.022066 -0.015167 0.024740 -0.0106199 0.0146528 -0.0079287 -0.025777
6 0.53549 0.11789 -0.28827 0.029976 -0.22137 -0.032474 0.10560 -0.0057110 0.020771 -0.036713 0.019972 -0.00247587 -0.020416 -0.021198 -0.016285 0.025909 -0.0122525 0.0168175 -0.0076749 -0.028076
         PC21      PC22       PC23      PC24        PC25      PC26      PC27       PC28       PC29       PC30      PC31       PC32       PC33      PC34        PC35      PC36       PC37       PC38
1 -0.00292927 -0.017404 -0.0087785 0.0074640  5.4695e-05 0.0100871 -0.010260 -0.0035944 -0.0062462 -0.0068927 0.0078310 -0.0015015 -0.0060110 0.0087654 -5.8788e-04 0.0083160 -0.0041018 -0.0025883
2 -0.00236729 -0.017099 -0.0093209 0.0071747 -6.9630e-04 0.0100163 -0.010516 -0.0036699 -0.0069872 -0.0065120 0.0080722 -0.0016012 -0.0054229 0.0086525 -2.6930e-04 0.0082659 -0.0044179 -0.0028582
3 -0.00180531 -0.016794 -0.0098634 0.0068854 -1.4473e-03 0.0099455 -0.010772 -0.0037454 -0.0077282 -0.0061313 0.0083134 -0.0017009 -0.0048349 0.0085396  4.9288e-05 0.0082158 -0.0047340 -0.0031281
4 -0.00124336 -0.016489 -0.0104058 0.0065961 -2.1982e-03 0.0098747 -0.011028 -0.0038209 -0.0084691 -0.0057504 0.0085546 -0.0018006 -0.0042468 0.0084266  3.6783e-04 0.0081658 -0.0050501 -0.0033981
5 -0.00068232 -0.016182 -0.0109479 0.0063063 -2.9476e-03 0.0098036 -0.011285 -0.0038952 -0.0092075 -0.0053672 0.0087948 -0.0019006 -0.0036577 0.0083132  6.8528e-04 0.0081161 -0.0053667 -0.0036683
6 -0.00012305 -0.015872 -0.0114893 0.0060156 -3.6933e-03 0.0097315 -0.011544 -0.0039673 -0.0099406 -0.0049789 0.0090328 -0.0020012 -0.0030666 0.0081991  1.0003e-03 0.0080672 -0.0056841 -0.0039391
       PC39      PC40        PC41      PC42       PC43       PC44       PC45      PC46       PC47        PC48        PC49 PC50 PC51
1 0.0057976 0.0010192 -1.1903e-03 0.0024878 -0.0027841 0.00064003 0.00340531 0.0053154 -0.0028912 -0.00019255 -0.00221046    1    1
2 0.0060588 0.0015296 -9.6893e-04 0.0026406 -0.0026553 0.00071086 0.00282910 0.0048515 -0.0025377 -0.00027362 -0.00173842    2    1
3 0.0063200 0.0020400 -7.4761e-04 0.0027933 -0.0025265 0.00078169 0.00225288 0.0043876 -0.0021843 -0.00035469 -0.00126638    3    1
4 0.0065813 0.0025503 -5.2628e-04 0.0029460 -0.0023976 0.00085250 0.00167667 0.0039237 -0.0018308 -0.00043577 -0.00079433    4    1
5 0.0068431 0.0030609 -3.0488e-04 0.0030985 -0.0022684 0.00092308 0.00110042 0.0034599 -0.0014773 -0.00051683 -0.00032216    5    1
6 0.0071058 0.0035716 -8.3373e-05 0.0032503 -0.0021385 0.00099312 0.00052435 0.0029962 -0.0011239 -0.00059804  0.00015009    6    1

We would like to visualize the trajectory of cells in our TSNE space, but the principal curve is only available in PCA space. We can use slingshot to embed it in the TSNE space!

Answer

tsne_curve <- slingshot::embedCurves(germcells_slingshot, 'TSNE', smoother = 'loess', span = 0.1) |> 
    slingshot::slingCurves(as.df = TRUE)
tsne_curve <- tsne_curve[order(tsne_curve$Order), ]
head(tsne_curve)

                           TSNE1  TSNE2 Order Lineage
Donor3-AACTCCCGTCTGGTCG-2 -25.23 11.729     1       1
Donor1-GGGATGACATTCTCAT-2 -25.23 11.729     2       1
Donor1-CGCTTCATCTAACTGG-1 -25.23 11.729     3       1
Donor1-GGGTCTGAGCTGATAA-2 -25.23 11.729     4       1
Donor1-CGGACACTCAGCAACT-1 -25.23 11.729     5       1
Donor1-GGATTACAGTCAATAG-2 -25.23 11.729     6       1

Plot PCA and tSNE embeddings, coloring cells by either their annotation or their pseudotime value. Overlay the principal curves to each embedding.

Answer

df <- tibble(
    PC1 = reducedDim(germcells, 'corrected')[,1], 
    PC2 = reducedDim(germcells, 'corrected')[,2], 
    TSNE1 = reducedDim(germcells, 'TSNE')[,1], 
    TSNE2 = reducedDim(germcells, 'TSNE')[,2], 
    annotation = germcells$annotation, 
    pseudotime = germcells$pseudotime
)
cowplot::plot_grid(
    df |> 
        ggplot() + 
        geom_point(aes(PC1, PC2, col = annotation)) + 
        geom_path(data = pca_curve, aes(x = PC1, y = PC2)) + 
        theme_bw() + 
        coord_fixed(),
    df |> 
        ggplot() + 
        geom_point(aes(TSNE1, TSNE2, col = annotation)) + 
        geom_path(data = tsne_curve, aes(x = TSNE1, y = TSNE2)) + 
        theme_bw() + 
        coord_fixed(),
    df |> 
        ggplot() + 
        geom_point(aes(PC1, PC2, col = pseudotime)) + 
        geom_path(data = pca_curve, aes(x = PC1, y = PC2)) + 
        theme_bw() + 
        coord_fixed(),
    df |> 
        ggplot() + 
        geom_point(aes(TSNE1, TSNE2, col = pseudotime)) + 
        geom_path(data = tsne_curve, aes(x = TSNE1, y = TSNE2)) + 
        theme_bw() + 
        coord_fixed()
)

Question

Check the distribution of pseudotime values across the different cell clusters.

What do you observe? Where you expecting this?

Answer

tibble(
    annotation = factor(germcells$annotation, c("Spermatogonia", "Spermatocytes", "Early spermatids", "Late spermatids")), 
    pseudotime = germcells$pseudotime
) |> 
    ggplot(aes(x = annotation, y = pseudotime, fill = annotation)) + 
    geom_violin(scale = 'width') + 
    geom_boxplot(outlier.shape = NULL, width = 0.1, fill = 'white') + 
    theme_bw() + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

Correct pseudotime values as you would expect it to be

germcells$pseudotime <- scales::rescale((-1 * slingshot::slingPseudotime(germcells_slingshot)[, 'Lineage1']), c(0, 1))

15.2.3 BONUS: Daunting snippet but that makes a cool figure for a paper: modeling pseudotime-dependent gene expression

Using pseudotime / cell, one can model gene expression along the differentiation process. This alleviates the need to study gene expression per cell, and allows one to focus on process-related effects (e.g. gene expression during a developmental trajectory).

Question

Try to do so for few markers of spermatogonia, spermatocytes and spermatids.

Answer

genes <- c('ID4', 'SYCP3', 'DMC1', 'ACR', 'PRM1', 'PGK2')
fitExprs <- logcounts(germcells[genes, ]) |> # ----------------------------------- Get norm. counts for genes of interest
    as.matrix() |> 
    t() |> 
    as_tibble() |> 
    mutate(  # ----------------------------------------------------------------- Add information for each cell
        cellID = colnames(germcells), 
        annotation = factor(germcells$annotation, c("Spermatogonia", "Spermatocytes", "Early spermatids", "Late spermatids")), 
        pseudotime = germcells$pseudotime
    ) |> 
    pivot_longer(contains(genes), names_to = 'gene', values_to = 'obs_expr') |> # - Pivot in "long" tidy format 
    mutate(gene = factor(gene, genes)) |> 
    group_by(gene) |> # ------------------------------------------------------- Group rows by genes
    nest(.key = 'data') |> # -------------------------------------------------- For each gene, extract the subtable into a column named data
    mutate(
        gamModel = map(data, ~mgcv::gam(obs_expr ~ s(pseudotime, bs = "cs"), data = .)), 
        gamFitted_expr = map(gamModel, predict) # ------------------------------ For each gene, fit the expression values ~ pseudotime with a GAM
    ) |> 
    dplyr::select(-ends_with('Model')) |> 
    unnest(c(data, ends_with('_expr'))) # -------------------------------------- Unnest all the modelled expressions
ggplot(fitExprs) + 
    geom_point(aes(x = pseudotime, y = obs_expr, col = annotation), alpha = 0.5) + 
    geom_line(aes(x = pseudotime, y = gamFitted_expr), col = 'white', size = 2, alpha = 0.5) + 
    geom_line(aes(x = pseudotime, y = gamFitted_expr), col = '#af2d0c', size = 1) +
    theme_bw() + 
    facet_grid(gene~., scales = 'free') + 
    labs(y = 'logcounts') + 
    ggtitle('Fitted models of pseudotime-dependent gene expression')

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

15.3 Ordering trajectory with RNA velocity

As we saw earlier, TI does not necessarily know which direction is right for a given trajectory. This orientation can be sometimes refined using RNA velocity. Let’s see whether RNA velocity helps orientating our spermatocyte differentiation lineage trajectory here.

Question

Read velociraptor documentation. What do you need to compute RNA velocity scores in R?
Import spliced and unspliced counts computed with velocyto in R.
Try and compute RNA velocity.

Answer

## - Import entire GSE112013 dataset with spliced/unspliced counts 
full_GSE112013_counts <- LoomExperiment::import('~/Share/Guo_testis/Guo_testis_full-counts.loom')
full_GSE112013_counts

## - Filter `germcells` genes and cells to only retain those present in `full_GSE112013_counts`
germcells <- germcells[
    rowData(germcells)$gene_id %in% rowData(full_GSE112013_counts)$Accession, 
    germcells$Barcode %in% full_GSE112013_counts$Barcode
]

## - Reorder rows of `full_GSE112013_counts_germcells` to match those of `germcells`
full_GSE112013_counts_germcells <- full_GSE112013_counts[match(rowData(germcells)$gene_id, rowData(full_GSE112013_counts)$Accession), match(germcells$Barcode, full_GSE112013_counts$Barcode)]
dim(germcells)
dim(full_GSE112013_counts_germcells)

## - Add spliced/unspliced counts to germcells
assay(germcells, 'spliced', withDimnames=FALSE) <- as(assay(full_GSE112013_counts_germcells, 'spliced'), 'dgCMatrix')
assay(germcells, 'unspliced', withDimnames=FALSE) <- as(assay(full_GSE112013_counts_germcells, 'unspliced'), 'dgCMatrix')
germcells

## - Run velociraptor
velo_out <- velociraptor::scvelo(
    germcells, 
    assay.X = "counts", 
    subset.row = rowData(germcells)$isHVG, 
    use.dimred = "corrected"
)
velo_out

Embed the velocity field in tSNE scRNAseq embedding and plot the RNA velocity field on top of tSNE projection. Conclude.

embedded_velo <- velociraptor::embedVelocity(reducedDim(germcells, "TSNE"), velo_out)
head(embedded_velo)
grid_df <- velociraptor::gridVectors(reducedDim(germcells, "TSNE"), embedded_velo, resolution = 30, scale = TRUE)
head(grid_df)
scater::plotReducedDim(germcells, 'TSNE', colour_by = "annotation", point_alpha = 0.5) +
    geom_segment(
        data = grid_df, 
        mapping = aes(x = start.1, y = start.2, xend = end.1, yend = end.2), 
        arrow = arrow(length = unit(0.05, "inches"), type = "closed")
    )

15.4 Session info

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       macOS Sonoma 14.6.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Paris
 date     2024-11-03
 pandoc   2.19.2 @ /opt/homebrew/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version date (UTC) lib source
 abind                  1.4-8   2024-09-12 [1] CRAN (R 4.4.1)
 beachmat               2.20.0  2024-05-06 [1] Bioconduc~
 beeswarm               0.4.0   2021-06-01 [1] CRAN (R 4.4.0)
 Biobase              * 2.64.0  2024-04-30 [1] Bioconduc~
 BiocGenerics         * 0.50.0  2024-04-30 [1] Bioconduc~
 BiocNeighbors          1.22.0  2024-04-30 [1] Bioconduc~
 BiocParallel           1.38.0  2024-04-30 [1] Bioconduc~
 BiocSingular           1.20.0  2024-04-30 [1] Bioconduc~
 bit                    4.5.0   2024-09-20 [1] CRAN (R 4.4.1)
 bit64                  4.5.2   2024-09-22 [1] CRAN (R 4.4.1)
 cachem                 1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
 cli                    3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
 codetools              0.2-20  2024-03-31 [1] CRAN (R 4.4.1)
 colorspace             2.1-1   2024-07-26 [1] CRAN (R 4.4.0)
 cowplot                1.1.3   2024-01-22 [1] CRAN (R 4.4.0)
 crayon                 1.5.3   2024-06-20 [1] CRAN (R 4.4.0)
 DelayedArray           0.30.1  2024-05-30 [1] Bioconduc~
 DelayedMatrixStats     1.26.0  2024-04-30 [1] Bioconduc~
 devtools               2.4.5   2022-10-11 [1] CRAN (R 4.4.0)
 digest                 0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
 dplyr                * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
 ellipsis               0.3.2   2021-04-29 [1] CRAN (R 4.4.0)
 evaluate               1.0.1   2024-10-10 [1] CRAN (R 4.4.1)
 fansi                  1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
 farver                 2.1.2   2024-05-13 [1] CRAN (R 4.4.0)
 fastmap                1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
 forcats              * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
 fs                     1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
 generics               0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
 GenomeInfoDb         * 1.40.1  2024-06-16 [1] Bioconduc~
 GenomeInfoDbData       1.2.12  2024-10-29 [1] Bioconductor
 GenomicRanges        * 1.56.2  2024-10-09 [1] Bioconduc~
 ggbeeswarm             0.7.2   2023-04-29 [1] CRAN (R 4.4.0)
 ggplot2              * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
 ggrepel                0.9.6   2024-09-07 [1] CRAN (R 4.4.1)
 glue                   1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
 gridExtra              2.3     2017-09-09 [1] CRAN (R 4.4.0)
 gtable                 0.3.6   2024-10-25 [1] CRAN (R 4.4.1)
 hms                    1.1.3   2023-03-21 [1] CRAN (R 4.4.0)
 htmltools              0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
 htmlwidgets            1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
 httpuv                 1.6.15  2024-03-26 [1] CRAN (R 4.4.0)
 httr                   1.4.7   2023-08-15 [1] CRAN (R 4.4.0)
 igraph                 2.1.1   2024-10-19 [1] CRAN (R 4.4.1)
 IRanges              * 2.38.1  2024-07-03 [1] Bioconduc~
 irlba                  2.3.5.1 2022-10-03 [1] CRAN (R 4.4.0)
 jsonlite               1.8.9   2024-09-20 [1] CRAN (R 4.4.1)
 knitr                  1.48    2024-07-07 [1] CRAN (R 4.4.0)
 labeling               0.4.3   2023-08-29 [1] CRAN (R 4.4.0)
 later                  1.3.2   2023-12-06 [1] CRAN (R 4.4.0)
 lattice                0.22-6  2024-03-20 [1] CRAN (R 4.4.1)
 lifecycle              1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
 lubridate            * 1.9.3   2023-09-27 [1] CRAN (R 4.4.0)
 magrittr               2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
 Matrix                 1.7-1   2024-10-18 [1] CRAN (R 4.4.1)
 MatrixGenerics       * 1.16.0  2024-04-30 [1] Bioconduc~
 matrixStats          * 1.4.1   2024-09-08 [1] CRAN (R 4.4.1)
 memoise                2.0.1   2021-11-26 [1] CRAN (R 4.4.0)
 mgcv                   1.9-1   2023-12-21 [1] CRAN (R 4.4.1)
 mime                   0.12    2021-09-28 [1] CRAN (R 4.4.0)
 miniUI                 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
 munsell                0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
 nlme                   3.1-164 2023-11-27 [1] CRAN (R 4.4.1)
 pillar                 1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
 pkgbuild               1.4.5   2024-10-28 [1] CRAN (R 4.4.1)
 pkgconfig              2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
 pkgload                1.4.0   2024-06-28 [1] CRAN (R 4.4.0)
 princurve              2.1.6   2021-01-18 [1] CRAN (R 4.4.0)
 profvis                0.4.0   2024-09-20 [1] CRAN (R 4.4.1)
 promises               1.3.0   2024-04-05 [1] CRAN (R 4.4.0)
 purrr                * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
 R6                     2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
 Rcpp                   1.0.13  2024-07-17 [1] CRAN (R 4.4.0)
 readr                * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
 remotes                2.5.0   2024-03-17 [1] CRAN (R 4.4.0)
 rlang                  1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown              2.28    2024-08-17 [1] CRAN (R 4.4.0)
 rsvd                   1.0.5   2021-04-16 [1] CRAN (R 4.4.0)
 S4Arrays               1.4.1   2024-05-30 [1] Bioconduc~
 S4Vectors            * 0.42.1  2024-07-03 [1] Bioconduc~
 ScaledMatrix           1.12.0  2024-04-30 [1] Bioconduc~
 scales                 1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
 scater                 1.32.1  2024-07-21 [1] Bioconduc~
 scuttle                1.14.0  2024-04-30 [1] Bioconduc~
 sessioninfo            1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
 shiny                  1.9.1   2024-08-01 [1] CRAN (R 4.4.0)
 SingleCellExperiment * 1.26.0  2024-04-30 [1] Bioconduc~
 SingleR                2.6.0   2024-04-30 [1] Bioconduc~
 slingshot              2.12.0  2024-04-30 [1] Bioconduc~
 SparseArray            1.4.8   2024-05-30 [1] Bioconduc~
 sparseMatrixStats      1.16.0  2024-04-30 [1] Bioconduc~
 stringi                1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
 stringr              * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
 SummarizedExperiment * 1.34.0  2024-04-30 [1] Bioconduc~
 tibble               * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
 tidyr                * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
 tidyselect             1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
 tidyverse            * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
 timechange             0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
 TrajectoryUtils        1.12.0  2024-04-30 [1] Bioconduc~
 tzdb                   0.4.0   2023-05-12 [1] CRAN (R 4.4.0)
 UCSC.utils             1.0.0   2024-05-06 [1] Bioconduc~
 urlchecker             1.0.1   2021-11-30 [1] CRAN (R 4.4.0)
 usethis                3.0.0   2024-07-29 [1] CRAN (R 4.4.0)
 utf8                   1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
 vctrs                  0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
 vipor                  0.4.7   2023-12-18 [1] CRAN (R 4.4.0)
 viridis                0.6.5   2024-01-29 [1] CRAN (R 4.4.0)
 viridisLite            0.4.2   2023-05-02 [1] CRAN (R 4.4.0)
 vroom                  1.6.5   2023-12-05 [1] CRAN (R 4.4.0)
 withr                  3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
 xfun                   0.48    2024-10-03 [1] CRAN (R 4.4.1)
 xtable                 1.8-4   2019-04-21 [1] CRAN (R 4.4.0)
 XVector                0.44.0  2024-04-30 [1] Bioconduc~
 zlibbioc               1.50.0  2024-04-30 [1] Bioconduc~

 [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────