6  Lab 3: Introduction to R/Bioconductor

Notes

The estimated time for this lab is around 1h20.

Aims
  • Install and load R packages from different sources.
  • Understand the basic classes used in R and Bioconductor.
  • Load and manipulate single-cell RNA-seq data in R and Bioconductor.
  • Compare Bioconductor and Seurat approaches to scRNAseq data analysis.
  • Read scRNAseq data from local files.

6.1 Installing packages in R

“Hey, I’ve heard so many good things about this piece of software, it’s called ‘slingshot’? Heard of it? I really want to try it out on my dataset!!”

Or, in other words: “how do I install this or that brand new cutting-edge fancy package?”

R works with packages, available from different sources:

  • CRAN, the official package repository maintained by the R developer team: CRAN (which can probably win the title of “the worst webpage ever designed that survived until 2023”).
  • Bioconductor, another package provider, with a primary focus on genomic-related packages: Bioconductor.
  • Other unofficial sources, such as GitHub.

Let’s start by going over package installation.

Question

Install the mgcv, HCAData and Revelio packages. Each of these three packages is available from a different source:

  • mgcv is a CRAN package
  • HCAData is a Bioconductor package
  • Revelio is a GitHub package
R
install.packages('mgcv')
Installing package into ‘/home/rsg/R/x86_64-pc-linux-gnu-library/4.4’
(as ‘lib’ is unspecified)

trying URL 'https://cloud.r-project.org/src/contrib/mgcv_1.9-1.tar.gz'
Content type 'application/x-gzip' length 1083217 bytes (1.0 MB)
==================================================
downloaded 1.0 MB

* installing *source* package ‘mgcv’ ...
** package ‘mgcv’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
using C compiler: ‘gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
gcc -I"/usr/share/R/include" -DNDEBUG      -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-4dpK2T/r-base-4.4.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2  -c coxph.c -o coxph.o

...

installing to /home/rsg/R/x86_64-pc-linux-gnu-library/4.4/00LOCK-mgcv/00new/mgcv/libs
** R
** data
** inst
** byte-compile and prepare package for lazy loading

** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (mgcv)

The downloaded source packages are in
        ‘/tmp/RtmpHmbVYD/downloaded_packages’
R
BiocManager::install('HCAData')
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cloud.r-project.org
Bioconductor version 3.19 (BiocManager 1.30.23), R 4.4.1 (2024-06-14)
Installing package(s) 'HCAData'
trying URL 'https://bioconductor.org/packages/3.19/data/experiment/src/contrib/HCAData_1.20.0.tar.gz'
Content type 'application/x-gzip' length 1542758 bytes (1.5 MB)
==================================================
downloaded 1.5 MB

* installing *source* package ‘HCAData’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading

...

** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (HCAData)

The downloaded source packages are in
        ‘/tmp/RtmpHmbVYD/downloaded_packages’
R
devtools::install_github('danielschw188/Revelio')
Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
Downloading GitHub repo danielschw188/Revelio@HEAD
These packages have more recent versions available.
It is recommended to update all of them.

...

** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (Revelio)
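
Re-running an installer in every session is slow, so installation calls are often guarded by a check. The sketch below is one possible pattern and is not part of the original lab: `install_if_missing()` is a hypothetical helper that only calls an installer when `requireNamespace()` cannot find the package. The `installer` and `spec` arguments are assumptions made for illustration.

```r
# Hypothetical helper: install `pkg` only when it cannot already be loaded.
# `installer` is any function taking a package specification, e.g.
# utils::install.packages, BiocManager::install or devtools::install_github.
# `spec` is what gets passed to the installer; it differs from the package
# name for GitHub repositories such as 'danielschw188/Revelio'.
install_if_missing <- function(pkg, installer = utils::install.packages, spec = pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        installer(spec)
    }
    # Report whether the package is available after the attempt
    invisible(requireNamespace(pkg, quietly = TRUE))
}

# 'stats' ships with R, so nothing is installed and TRUE is returned
ok <- install_if_missing('stats')
print(ok)

# For the three packages of this lab, the calls could look like this
# (commented out because they need network access):
# install_if_missing('mgcv')
# install_if_missing('HCAData', installer = BiocManager::install)
# install_if_missing('Revelio', installer = devtools::install_github,
#                    spec = 'danielschw188/Revelio')
```

Passing the installer as an argument keeps the same guard usable for CRAN, Bioconductor and GitHub sources.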

Package help pages are available at different places, depending on their source. That being said, there is a place I like to go to easily find information related to most packages:

https://rdrr.io/

For instance, check out the Revelio package help pages.

  • What is this package designed for?
  • What are its main functions? What type of input does it require?

6.2 Basic R and Bioconductor classes

While CRAN is a repository of general-purpose packages, Bioconductor is the greatest source of analytical tools, data and workflows dedicated to genomic projects in R. Read more about Bioconductor to fully understand how it builds on top of R’s general features, especially through the specific classes it introduces.

The two main concepts behind Bioconductor’s success are the non-redundant classes of objects it provides and their interoperability. Huber et al., Nat. Methods 2015 summarizes it well.

6.2.1 Important R concepts:

6.2.1.1 tibble tables:

tibbles are built on the fundamental data.frame objects. They follow “tidy” concepts, all gathered in a common tidyverse. This set of key concepts helps general data investigation and data visualization through a set of associated packages such as ggplot2.

R
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
dat <- tibble(
    x = 1:5, 
    y = 1, 
    z = x ^ 2 + y, 
    class = c('a', 'a', 'b', 'b', 'c')
)
dat
# A tibble: 5 × 4
      x     y     z class
  <int> <dbl> <dbl> <chr>
1     1     1     2 a    
2     2     1     5 a    
3     3     1    10 b    
4     4     1    17 b    
5     5     1    26 c    
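
To make the relationship with data.frame more concrete, here is a small sketch (assuming the tibble package is installed; the object names `df` and `tb` are made up for illustration). It shows that a tibble is still a data.frame, but that single-column subsetting behaves differently.

```r
library(tibble)

df <- data.frame(x = 1:3, class = c('a', 'b', 'c'))
tb <- as_tibble(df)

# A tibble is still a data.frame...
is.data.frame(tb)

# ...but single-column subsetting with `[` keeps a tibble,
# whereas a data.frame silently drops to a plain vector
class(df[, 'x'])
class(tb[, 'x'])

# Columns are created sequentially, so `z` can use `x` and `y`
tb2 <- tibble(x = 1:3, y = 1, z = x^2 + y)
tb2
```
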
  • Import a text file into tibbles

tibbles can be created from text files using the readr package (part of tidyverse); Excel files can be read with the companion readxl package.

R
genes <- read_tsv('~/Share/GSM4486714_AXH009_genes.tsv', col_names = c('ID', 'Symbol'))
Rows: 32738 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): ID, Symbol

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
genes
# A tibble: 32,738 × 2
   ID              Symbol       
   <chr>           <chr>        
 1 ENSG00000243485 MIR1302-10   
 2 ENSG00000237613 FAM138A      
 3 ENSG00000186092 OR4F5        
 4 ENSG00000238009 RP11-34P13.7 
 5 ENSG00000239945 RP11-34P13.8 
 6 ENSG00000237683 AL627309.1   
 7 ENSG00000239906 RP11-34P13.14
 8 ENSG00000241599 RP11-34P13.9 
 9 ENSG00000228463 AP006222.2   
10 ENSG00000237094 RP4-669L17.10
# ℹ 32,728 more rows
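
If the shared file is not at hand, the same call can be tried on inline text. This is a minimal sketch assuming readr version 2.0 or later, where `I()` marks a string as literal data; the two lines of text are copied from the output above, and `col_types = 'cc'` is an optional addition that declares both columns as character and silences the column specification message.

```r
library(readr)

# Two tab-separated lines, with no header row (as in the genes.tsv file)
txt <- "ENSG00000243485\tMIR1302-10\nENSG00000237613\tFAM138A\n"

toy_genes <- read_tsv(
    I(txt),                          # I() marks the string as literal data
    col_names = c('ID', 'Symbol'),   # the file has no header line
    col_types = 'cc'                 # both columns are character
)
toy_genes
```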

6.2.1.2 Handling of tibbles:

tibbles can be readily “sliced” (i.e. selecting rows by position), “filtered” (i.e. selecting rows by condition) and columns can be “selected”. All these operations are performed using verbs (most of them provided by the dplyr package, part of tidyverse).

R
slice(genes, 1:4)
# A tibble: 4 × 2
  ID              Symbol      
  <chr>           <chr>       
1 ENSG00000243485 MIR1302-10  
2 ENSG00000237613 FAM138A     
3 ENSG00000186092 OR4F5       
4 ENSG00000238009 RP11-34P13.7
filter(genes, Symbol == 'CCDC67')
# A tibble: 1 × 2
  ID              Symbol
  <chr>           <chr> 
1 ENSG00000165325 CCDC67
filter(genes, grepl('^CCDC.*', Symbol))
# A tibble: 159 × 2
   ID              Symbol  
   <chr>           <chr>   
 1 ENSG00000162592 CCDC27  
 2 ENSG00000160050 CCDC28B 
 3 ENSG00000186409 CCDC30  
 4 ENSG00000177868 CCDC23  
 5 ENSG00000159214 CCDC24  
 6 ENSG00000236624 CCDC163P
 7 ENSG00000159588 CCDC17  
 8 ENSG00000122483 CCDC18  
 9 ENSG00000213085 CCDC19  
10 ENSG00000117477 CCDC181 
# ℹ 149 more rows
filter(genes, grepl('^CCDC.*', Symbol), grepl('.*5$', Symbol))
# A tibble: 9 × 2
  ID              Symbol 
  <chr>           <chr>  
1 ENSG00000136710 CCDC115
2 ENSG00000183323 CCDC125
3 ENSG00000147419 CCDC25 
4 ENSG00000149548 CCDC15 
5 ENSG00000139537 CCDC65 
6 ENSG00000151838 CCDC175
7 ENSG00000159625 CCDC135
8 ENSG00000160994 CCDC105
9 ENSG00000161609 CCDC155
select(genes, 1)
# A tibble: 32,738 × 1
   ID             
   <chr>          
 1 ENSG00000243485
 2 ENSG00000237613
 3 ENSG00000186092
 4 ENSG00000238009
 5 ENSG00000239945
 6 ENSG00000237683
 7 ENSG00000239906
 8 ENSG00000241599
 9 ENSG00000228463
10 ENSG00000237094
# ℹ 32,728 more rows
select(genes, ID)
# A tibble: 32,738 × 1
   ID             
   <chr>          
 1 ENSG00000243485
 2 ENSG00000237613
 3 ENSG00000186092
 4 ENSG00000238009
 5 ENSG00000239945
 6 ENSG00000237683
 7 ENSG00000239906
 8 ENSG00000241599
 9 ENSG00000228463
10 ENSG00000237094
# ℹ 32,728 more rows
select(genes, matches('Sym.*'))
# A tibble: 32,738 × 1
   Symbol       
   <chr>        
 1 MIR1302-10   
 2 FAM138A      
 3 OR4F5        
 4 RP11-34P13.7 
 5 RP11-34P13.8 
 6 AL627309.1   
 7 RP11-34P13.14
 8 RP11-34P13.9 
 9 AP006222.2   
10 RP4-669L17.10
# ℹ 32,728 more rows

Columns can also be quickly added/modified using the mutate verb.

R
mutate(genes, chr = sample(1:22, n(), replace = TRUE))
# A tibble: 32,738 × 3
   ID              Symbol          chr
   <chr>           <chr>         <int>
 1 ENSG00000243485 MIR1302-10       15
 2 ENSG00000237613 FAM138A          18
 3 ENSG00000186092 OR4F5            21
 4 ENSG00000238009 RP11-34P13.7      1
 5 ENSG00000239945 RP11-34P13.8      9
 6 ENSG00000237683 AL627309.1       20
 7 ENSG00000239906 RP11-34P13.14    15
 8 ENSG00000241599 RP11-34P13.9     15
 9 ENSG00000228463 AP006222.2        4
10 ENSG00000237094 RP4-669L17.10    10
# ℹ 32,728 more rows
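
Because `sample()` draws random values, the `chr` column above changes at every run. The sketch below, on a made-up four-gene tibble (`toy` is hypothetical, not part of the lab data), shows the same verbs on a table small enough to check by eye, and how `set.seed()` can make the random column reproducible.

```r
library(dplyr)
library(tibble)

toy <- tibble(
    ID = c('G1', 'G2', 'G3', 'G4'),
    Symbol = c('CCDC15', 'CCDC27', 'OR4F5', 'CCDC25')
)

# Symbols starting with CCDC and ending with 5
ccdc5 <- toy |>
    filter(grepl('^CCDC', Symbol), grepl('5$', Symbol))
ccdc5

# sample() is random: fixing the seed makes the new column reproducible
set.seed(42)
run1 <- mutate(toy, chr = sample(1:22, n(), replace = TRUE))
set.seed(42)
run2 <- mutate(toy, chr = sample(1:22, n(), replace = TRUE))
identical(run1$chr, run2$chr)
```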

6.2.1.3 |> pipe:

Actions on tibbles can be piped as a chain with |>, just like | pipes the stdout of one command into the stdin of the next in bash. In this case, the first argument of each function is always the output of the previous function and is omitted. Because tidyverse functions generally return a modified version of their input, piping works remarkably well in this context.

R
read_tsv('~/Share/GSM4486714_AXH009_genes.tsv', col_names = c('ID', 'Symbol')) |> 
    mutate(chr = sample(1:22, n(), replace = TRUE)) |> 
    filter(chr == 2, grepl('^CCDC.*', Symbol)) |> 
    select(ID) |> 
    slice_head(n = 3)
Rows: 32738 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): ID, Symbol

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 3 × 1
  ID             
  <chr>          
1 ENSG00000117477
2 ENSG00000173421
3 ENSG00000229140
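
The pipe is purely syntactic: `x |> f(y)` is parsed as `f(x, y)`. The short base R sketch below (assuming R 4.1 or later, where the native pipe was introduced) compares a nested call with its piped equivalent.

```r
x <- c(4, 9, 16)

# Nested calls are read inside-out...
nested <- round(sum(sqrt(x)), 1)

# ...while the piped version is read left to right
piped <- x |> sqrt() |> sum() |> round(1)

identical(nested, piped)

# The parser rewrites the pipe into a regular call
quote(x |> f(y))
```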

6.2.2 Important Bioconductor concepts:

6.2.2.1 SummarizedExperiment class:

SummarizedExperiment is the most fundamental class used to hold the content of large-scale quantitative analyses, such as counts from RNA-seq, high-throughput cytometry or proteomics experiments.

Make sure you understand the structure of objects from this class. A dedicated workshop that I would recommend quickly going over is available here. Generally speaking, a SummarizedExperiment object contains matrix-like objects (the assays), with each row representing a feature (e.g. gene, transcript, …) and each column representing a sample. Information specific to genes and samples is stored in “parallel” data frames: for example gene locations, tissue of expression or biotypes (for genes), and batch, generation date or machine ID (for samples). On top of that, metadata are also stored in the object (e.g. a description of the project).

An important difference with S3 list-like objects usually used in R is that most of the underlying data (organized in precisely structured "slots") is accessed using getter functions, rather than the familiar $ or [. Here are some important getters:

  • assay(), assays(): Extract a matrix-like object or a list of matrix-like objects of identical dimensions. Since the objects are matrix-like, dim(), dimnames(), and 2-dimensional [, [<- methods are available.
  • colData(): Annotations on each column (as a DataFrame): usually, description of each sample
  • rowData(): Annotations on each row (as a DataFrame): usually, description of each gene
  • metadata(): List of unstructured metadata describing the overall content of the object.
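
The getters listed above can be tried on a tiny object built by hand. This is a sketch under assumptions: the 3 x 2 count matrix, the `biotype` and `treatment` columns and the `project` metadata entry are all invented for illustration.

```r
library(SummarizedExperiment)

# A made-up 3 genes x 2 samples count matrix
count_mat <- matrix(
    1:6, nrow = 3,
    dimnames = list(c('g1', 'g2', 'g3'), c('s1', 's2'))
)

se <- SummarizedExperiment(
    assays = list(counts = count_mat),
    rowData = DataFrame(biotype = c('protein_coding', 'lincRNA', 'protein_coding')),
    colData = DataFrame(treatment = c('untrt', 'trt'), row.names = c('s1', 's2')),
    metadata = list(project = 'toy example')
)

assay(se, 'counts')     # the count matrix
rowData(se)             # one row per gene
colData(se)             # one row per sample
metadata(se)$project    # free-form metadata
se$treatment            # shortcut for colData(se)$treatment

# Two-dimensional subsetting keeps assays and annotations in sync
sub <- se[rowData(se)$biotype == 'protein_coding', se$treatment == 'untrt']
dim(sub)
```

Subsetting the object once, rather than the matrix and each annotation table separately, is the main practical benefit of the class.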

Let’s dig into an example (you may need to install the airway package from Bioconductor…)

R
library(SummarizedExperiment)
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: 'matrixStats'
The following object is masked from 'package:dplyr':

    count

Attaching package: 'MatrixGenerics'
The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps,
    colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates,
    colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse,
    rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans,
    rowWeightedMedians, rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'
The following objects are masked from 'package:lubridate':

    intersect, setdiff, union
The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union
The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, table, tapply, union, unique,
    unsplit, which.max, which.min
Loading required package: S4Vectors

Attaching package: 'S4Vectors'
The following objects are masked from 'package:lubridate':

    second, second<-
The following objects are masked from 'package:dplyr':

    first, rename
The following object is masked from 'package:tidyr':

    expand
The following object is masked from 'package:utils':

    findMatches
The following objects are masked from 'package:base':

    expand.grid, I, unname
Loading required package: IRanges

Attaching package: 'IRanges'
The following object is masked from 'package:lubridate':

    %within%
The following objects are masked from 'package:dplyr':

    collapse, desc, slice
The following object is masked from 'package:purrr':

    reduce
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: 'Biobase'
The following object is masked from 'package:MatrixGenerics':

    rowMedians
The following objects are masked from 'package:matrixStats':

    anyMissing, rowMedians
library(airway)
data(airway)
airway
class: RangedSummarizedExperiment 
dim: 63677 8 
metadata(1): ''
assays(1): counts
rownames(63677): ENSG00000000003 ENSG00000000005 ... ENSG00000273492 ENSG00000273493
rowData names(10): gene_id gene_name ... seq_coord_system symbol
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(9): SampleName cell ... Sample BioSample
Question

What are the dimensions of the dataset? What type of quantitative data is stored? Which features are assessed?

R
dim(airway)
[1] 63677     8
rowData(airway)
DataFrame with 63677 rows and 10 columns
                        gene_id     gene_name  entrezid   gene_biotype gene_seq_start gene_seq_end              seq_name seq_strand seq_coord_system        symbol
                    <character>   <character> <integer>    <character>      <integer>    <integer>           <character>  <integer>        <integer>   <character>
ENSG00000000003 ENSG00000000003        TSPAN6        NA protein_coding       99883667     99894988                     X         -1               NA        TSPAN6
ENSG00000000005 ENSG00000000005          TNMD        NA protein_coding       99839799     99854882                     X          1               NA          TNMD
ENSG00000000419 ENSG00000000419          DPM1        NA protein_coding       49551404     49575092                    20         -1               NA          DPM1
ENSG00000000457 ENSG00000000457         SCYL3        NA protein_coding      169818772    169863408                     1         -1               NA         SCYL3
ENSG00000000460 ENSG00000000460      C1orf112        NA protein_coding      169631245    169823221                     1          1               NA      C1orf112
...                         ...           ...       ...            ...            ...          ...                   ...        ...              ...           ...
ENSG00000273489 ENSG00000273489 RP11-180C16.1        NA      antisense      131178723    131182453                     7         -1               NA RP11-180C16.1
ENSG00000273490 ENSG00000273490        TSEN34        NA protein_coding       54693789     54697585 HSCHR19LRC_LRC_J_CTG1          1               NA        TSEN34
ENSG00000273491 ENSG00000273491  RP11-138A9.2        NA        lincRNA      130600118    130603315          HG1308_PATCH          1               NA  RP11-138A9.2
ENSG00000273492 ENSG00000273492    AP000230.1        NA        lincRNA       27543189     27589700                    21          1               NA    AP000230.1
ENSG00000273493 ENSG00000273493  RP11-80H18.4        NA        lincRNA       58315692     58315845                     3          1               NA  RP11-80H18.4
colData(airway)
DataFrame with 8 rows and 9 columns
           SampleName     cell      dex    albut        Run avgLength Experiment    Sample    BioSample
             <factor> <factor> <factor> <factor>   <factor> <integer>   <factor>  <factor>     <factor>
SRR1039508 GSM1275862  N61311     untrt    untrt SRR1039508       126  SRX384345 SRS508568 SAMN02422669
SRR1039509 GSM1275863  N61311     trt      untrt SRR1039509       126  SRX384346 SRS508567 SAMN02422675
SRR1039512 GSM1275866  N052611    untrt    untrt SRR1039512       126  SRX384349 SRS508571 SAMN02422678
SRR1039513 GSM1275867  N052611    trt      untrt SRR1039513        87  SRX384350 SRS508572 SAMN02422670
SRR1039516 GSM1275870  N080611    untrt    untrt SRR1039516       120  SRX384353 SRS508575 SAMN02422682
SRR1039517 GSM1275871  N080611    trt      untrt SRR1039517       126  SRX384354 SRS508576 SAMN02422673
SRR1039520 GSM1275874  N061011    untrt    untrt SRR1039520       101  SRX384357 SRS508579 SAMN02422683
SRR1039521 GSM1275875  N061011    trt      untrt SRR1039521        98  SRX384358 SRS508580 SAMN02422677
Question

Can you create a subset of the data corresponding to LRG genes in untreated samples?

R
untreated_LRG <- airway[grepl('^LRG_', rownames(airway)), airway$dex == 'untrt']
untreated_LRG
class: RangedSummarizedExperiment 
dim: 0 4 
metadata(1): ''
assays(1): counts
rownames(0):
rowData names(10): gene_id gene_name ... seq_coord_system symbol
colnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520
colData names(9): SampleName cell ... Sample BioSample
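
The subset above has 0 rows: no row name matched the `^LRG_` pattern in this version of the dataset (the row names displayed earlier are Ensembl gene IDs). A quick way to catch this is to count the matches before subsetting. The sketch below also filters on a `rowData` column instead; the choice of `gene_biotype == 'lincRNA'` is only an illustration, picked because that value appears in the `rowData(airway)` output shown earlier.

```r
library(SummarizedExperiment)
library(airway)
data(airway)

# How many row names match the pattern? (0 means an empty subset)
n_lrg <- sum(grepl('^LRG_', rownames(airway)))
n_lrg

# Alternative: select features through a rowData column.
# which() guards against possible NA values in the annotation.
untreated <- airway[, airway$dex == 'untrt']
lincRNA_untreated <- untreated[which(rowData(untreated)$gene_biotype == 'lincRNA'), ]
dim(lincRNA_untreated)
```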

6.2.2.2 GenomicRanges class (a.k.a. GRanges):

GRanges objects build on IRanges (integer ranges) and are useful to describe genomic intervals. Each entry in a GRanges object has a sequence name (seqnames()), start and end coordinates (start(), end()), a strand (strand()), as well as associated metadata (mcols()). They can be built from scratch using tibbles converted with makeGRangesFromDataFrame().

R
library(GenomicRanges)
gr <- read_tsv('~/Share/GSM4486714_AXH009_genes.tsv', col_names = c('ID', 'Symbol')) |> 
    mutate(
        chr = sample(1:22, n(), replace = TRUE), 
        start = sample(1:1000, n(), replace = TRUE),
        end = sample(10000:20000, n(), replace = TRUE),
        strand = sample(c('-', '+'), n(), replace = TRUE)
    ) |> 
    makeGRangesFromDataFrame(keep.extra.columns = TRUE)
Rows: 32738 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): ID, Symbol

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gr
GRanges object with 32738 ranges and 2 metadata columns:
          seqnames    ranges strand |              ID       Symbol
             <Rle> <IRanges>  <Rle> |     <character>  <character>
      [1]       12 789-11554      - | ENSG00000243485   MIR1302-10
      [2]        2 483-10455      - | ENSG00000237613      FAM138A
      [3]        1 928-18540      + | ENSG00000186092        OR4F5
      [4]       17 583-13775      + | ENSG00000238009 RP11-34P13.7
      [5]       20 556-16032      - | ENSG00000239945 RP11-34P13.8
      ...      ...       ...    ... .             ...          ...
  [32734]       18 105-16111      - | ENSG00000215635   AC145205.1
  [32735]       12  20-11980      - | ENSG00000268590        BAGE5
  [32736]        7 560-10165      - | ENSG00000251180   CU459201.1
  [32737]       11  57-13558      + | ENSG00000215616   AC002321.2
  [32738]        4 880-19993      - | ENSG00000215611   AC002321.1
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths
mcols(gr)
DataFrame with 32738 rows and 2 columns
                   ID       Symbol
          <character>  <character>
1     ENSG00000243485   MIR1302-10
2     ENSG00000237613      FAM138A
3     ENSG00000186092        OR4F5
4     ENSG00000238009 RP11-34P13.7
5     ENSG00000239945 RP11-34P13.8
...               ...          ...
32734 ENSG00000215635   AC145205.1
32735 ENSG00000268590        BAGE5
32736 ENSG00000251180   CU459201.1
32737 ENSG00000215616   AC002321.2
32738 ENSG00000215611   AC002321.1
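
GRanges can also be built directly with the `GRanges()` constructor, which makes the role of each accessor easier to see. A minimal sketch with three invented intervals (the coordinates and the `Symbol` values are made up):

```r
library(GenomicRanges)

gr_toy <- GRanges(
    seqnames = c('chr1', 'chr1', 'chr2'),
    ranges = IRanges(start = c(100, 500, 200), end = c(300, 900, 400)),
    strand = c('+', '-', '+'),
    Symbol = c('geneA', 'geneB', 'geneC')   # extra arguments become metadata columns
)

seqnames(gr_toy)   # chromosome of each range
start(gr_toy)      # 1-based, inclusive start
end(gr_toy)        # inclusive end
width(gr_toy)      # end - start + 1
strand(gr_toy)
mcols(gr_toy)      # metadata columns as a DataFrame
```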

Just as the tidyverse provides verbs for tibbles, the plyranges package provides tidy functions for GRanges.

R
library(plyranges)

Attaching package: 'plyranges'
The following object is masked from 'package:IRanges':

    slice
The following objects are masked from 'package:dplyr':

    between, n, n_distinct
The following object is masked from 'package:stats':

    filter
gr |> 
    filter(start < 400, end > 12000, end < 15000) |> 
    seqnames() |> 
    table()

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22 
185 172 201 181 173 166 165 183 169 167 211 188 191 194 162 202 180 205 173 167 196 168 
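
For comparison, the same kind of filter can be written without plyranges, using the GenomicRanges accessors and logical subsetting. A sketch on four invented ranges (not the lab data):

```r
library(GenomicRanges)

gr_toy <- GRanges(
    seqnames = c('1', '1', '2', '2'),
    ranges = IRanges(
        start = c(100, 450, 50, 399),
        end = c(13000, 14000, 11000, 14999)
    ),
    strand = '*'
)

# Same conditions as in the plyranges filter() call above
keep <- start(gr_toy) < 400 & end(gr_toy) > 12000 & end(gr_toy) < 15000
kept <- gr_toy[keep]

# Count the remaining ranges per chromosome
table(as.character(seqnames(kept)))
```
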
Question

Can you find a way to easily read common input files such as bed files into GRanges?

R
library(rtracklayer)
genes2 <- import('~/Share/GRCm39_genes.bed')
genes2
GRanges object with 53700 ranges and 2 metadata columns:
          seqnames            ranges strand |               name     score
             <Rle>         <IRanges>  <Rle> |        <character> <numeric>
      [1]        1   3143476-3144545      + | ENSMUSG00000102693         0
      [2]        1   3172239-3172348      + | ENSMUSG00000064842         0
      [3]        1   3276124-3741721      - | ENSMUSG00000051951         0
      [4]        1   3322980-3323459      + | ENSMUSG00000102851         0
      [5]        1   3435954-3438772      - | ENSMUSG00000103377         0
      ...      ...               ...    ... .                ...       ...
  [53696]        Y 90763696-90766736      - | ENSMUSG00000095366         0
  [53697]        Y 90764326-90774754      + | ENSMUSG00000095134         0
  [53698]        Y 90796007-90827734      + | ENSMUSG00000096768         0
  [53699]        Y 90848682-90855309      + | ENSMUSG00000099871         0
  [53700]        Y 90850138-90850446      - | ENSMUSG00000096850         0
  -------
  seqinfo: 36 sequences from an unspecified genome; no seqlengths
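
A round trip through a temporary file is a simple way to see what `import()` and `export()` do with coordinates. This sketch assumes rtracklayer is installed; the two ranges reuse the first two genes printed above, and `bed_file` is a throwaway path created with `tempfile()`.

```r
library(GenomicRanges)
library(rtracklayer)

gr_toy <- GRanges(
    seqnames = c('1', '1'),
    ranges = IRanges(start = c(3143476, 3172239), end = c(3144545, 3172348)),
    strand = c('+', '+'),
    name = c('ENSMUSG00000102693', 'ENSMUSG00000064842')
)

# Write the ranges to a temporary BED file (format inferred from the extension)
bed_file <- tempfile(fileext = '.bed')
export(gr_toy, bed_file)

# Raw text of the file: BED start coordinates are 0-based
readLines(bed_file)

# Reading it back restores the 1-based GRanges coordinates
roundtrip <- import(bed_file)
start(roundtrip)
```
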
Question

How would you have proceeded without rtracklayer? Check the start coordinates: what do you see? Comment on the interest of using Bioconductor.

R
library(GenomicRanges) # provides makeGRangesFromDataFrame(); rtracklayer is deliberately not used here
genes2_manual <- read_tsv('~/Share/GRCm39_genes.bed', col_names = FALSE) |> 
    drop_na() |>
    purrr::set_names(c('chr', 'start', 'stop', 'id', 'score', 'strand')) |> 
    makeGRangesFromDataFrame(keep.extra.columns = TRUE)
Rows: 53700 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): X1, X4, X6
dbl (3): X2, X3, X5

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
genes2_manual
GRanges object with 53700 ranges and 2 metadata columns:
          seqnames            ranges strand |                 id     score
             <Rle>         <IRanges>  <Rle> |        <character> <numeric>
      [1]        1   3143475-3144545      + | ENSMUSG00000102693         0
      [2]        1   3172238-3172348      + | ENSMUSG00000064842         0
      [3]        1   3276123-3741721      - | ENSMUSG00000051951         0
      [4]        1   3322979-3323459      + | ENSMUSG00000102851         0
      [5]        1   3435953-3438772      - | ENSMUSG00000103377         0
      ...      ...               ...    ... .                ...       ...
  [53696]        Y 90763695-90766736      - | ENSMUSG00000095366         0
  [53697]        Y 90764325-90774754      + | ENSMUSG00000095134         0
  [53698]        Y 90796006-90827734      + | ENSMUSG00000096768         0
  [53699]        Y 90848681-90855309      + | ENSMUSG00000099871         0
  [53700]        Y 90850137-90850446      - | ENSMUSG00000096850         0
  -------
  seqinfo: 36 sequences from an unspecified genome; no seqlengths
head(start(genes2))
[1] 3143476 3172239 3276124 3322980 3435954 3445779
head(start(genes2_manual))
[1] 3143475 3172238 3276123 3322979 3435953 3445778
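
The start coordinates differ by one because BED files store 0-based starts, while GRanges uses 1-based coordinates; `import()` handles the conversion. If the file has to be read manually, `makeGRangesFromDataFrame()` has a `starts.in.df.are.0based` argument for this purpose. A minimal sketch, using a two-row data frame that reproduces the first two lines read manually above (`bed` is a made-up name):

```r
library(GenomicRanges)

# First two genes, with 0-based starts as stored in the BED file
bed <- data.frame(
    chr = c('1', '1'),
    start = c(3143475, 3172238),
    stop = c(3144545, 3172348),
    id = c('ENSMUSG00000102693', 'ENSMUSG00000064842'),
    score = 0,
    strand = c('+', '+')
)

gr_bed <- makeGRangesFromDataFrame(
    bed,
    keep.extra.columns = TRUE,
    starts.in.df.are.0based = TRUE   # shift starts by +1 on conversion
)

# Starts now match those obtained with rtracklayer::import()
start(gr_bed)
```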

6.3 CRAN & Bioconductor approaches to scRNAseq

6.3.1 scRNAseq in Bioconductor

For single-cell RNA-seq projects, Bioconductor has been introducing new classes and standards very rapidly in the past few years. Notably, several packages are increasingly becoming central for single-cell analysis:

  • SingleCellExperiment
  • scater
  • scran
  • scuttle
  • batchelor
  • SingleR
  • bluster
  • DropletUtils
  • slingshot
  • tradeSeq

SingleCellExperiment is the fundamental class designed to contain single-cell (RNA-seq) data in the Bioconductor ecosystem. It extends the SummarizedExperiment class, so most of the getters/setters are shared with this class.

R
library(SingleCellExperiment)
source('~/Share/bin/prepare_Nestorowa.R') # Adapted from Nestorowa et al., Blood 2016 (doi: 10.1182/blood-2016-05-716480)
loading from cache
sce
class: SingleCellExperiment 
dim: 500 1920 
metadata(0):
assays(2): counts logcounts
rownames(500): ENSMUSG00000076609 ENSMUSG00000021250 ... ENSMUSG00000026000 ENSMUSG00000005982
rowData names(0):
colnames(1920): HSPC_007 HSPC_013 ... Prog_852 Prog_810
colData names(11): gate broad ... label sizeFactor
reducedDimNames(1): diffusion
mainExpName: endogenous
altExpNames(0):
class(sce)
[1] "SingleCellExperiment"
attr(,"package")
[1] "SingleCellExperiment"
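
The link with SummarizedExperiment can be checked on a small object built by hand. This is a sketch with simulated counts (4 genes x 5 cells, all invented); the PCA stored with `reducedDim()` only illustrates the extra slot that SingleCellExperiment adds and is not a recommended analysis.

```r
library(SingleCellExperiment)

set.seed(1)
mat <- matrix(
    rpois(20, lambda = 5), nrow = 4,
    dimnames = list(paste0('gene', 1:4), paste0('cell', 1:5))
)

sce_toy <- SingleCellExperiment(assays = list(counts = mat))

# A SingleCellExperiment is also a SummarizedExperiment
is(sce_toy, 'SummarizedExperiment')

# counts() is a shortcut for assay(x, 'counts')
identical(counts(sce_toy), assay(sce_toy, 'counts'))

# Low-dimensional embeddings are stored per cell (one row per column of the object)
reducedDim(sce_toy, 'PCA') <- prcomp(t(mat))$x[, 1:2]
reducedDimNames(sce_toy)
```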

Several slots can be accessed in a SingleCellExperiment object, just like the SummarizedExperiment object it’s been adapted from:

R
colData(sce)
DataFrame with 1920 rows and 11 columns
                gate       broad   broad.mpp        fine    fine.mpp     ESLAM      HSC1 projected             metrics       label sizeFactor
         <character> <character> <character> <character> <character> <logical> <logical> <logical>         <DataFrame> <character>  <numeric>
HSPC_007        HSPC          NA          NA          NA          NA     FALSE     FALSE     FALSE 194829:  5022:0:...        HSPC  0.0272469
HSPC_013        HSPC        LMPP          NA        LMPP          NA     FALSE     FALSE     FALSE 110530: 15271:0:...        HSPC  0.0904215
HSPC_019        HSPC        LMPP          NA          NA          NA     FALSE     FALSE     FALSE  86825:  2708:0:...        HSPC  0.0199211
HSPC_025        HSPC         MPP        MPP1          NA          NA     FALSE     FALSE     FALSE 212206:107278:0:...        HSPC  0.5217920
HSPC_031        HSPC         MPP       STHSC          NA          NA     FALSE     FALSE     FALSE 690411:227480:0:...        HSPC  1.1062306
...              ...         ...         ...         ...         ...       ...       ...       ...                 ...         ...        ...
Prog_834        Prog         CMP          NA          NA          NA     FALSE     FALSE     FALSE 471273:296060:0:...        Prog   1.668383
Prog_840        Prog         GMP          NA         GMP          NA     FALSE     FALSE     FALSE 421195:317394:0:...        Prog   1.728596
Prog_846        Prog         GMP          NA          NA          NA     FALSE     FALSE     FALSE 337564:253387:0:...        Prog   1.282827
Prog_852        Prog         MEP          NA         MEP          NA     FALSE     FALSE     FALSE 200193:136478:0:...        Prog   0.643172
Prog_810        Prog         CMP          NA         CMP          NA     FALSE     FALSE     FALSE 857257:396247:0:...        Prog   2.614351
rowData(sce)
DataFrame with 500 rows and 0 columns
dim(sce)
[1]  500 1920
assays(sce)
List of length 2
names(2): counts logcounts

Quantitative metrics for scRNAseq studies can also be stored in assays:

R
assays(sce)
List of length 2
names(2): counts logcounts
assay(sce, 'counts')[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
  [[ suppressing 10 column names 'HSPC_007', 'HSPC_013', 'HSPC_019' ... ]]
                                                            
ENSMUSG00000076609 30  16  7   17  19   11 1359   13  15   6
ENSMUSG00000021250  3   2  4    .   1    3  118    2  69  41
ENSMUSG00000076617 40  54 13   29 298  417 1107    9  16  15
ENSMUSG00000075602  3 248  7  537 640    7  300 1530 324 971
ENSMUSG00000006389  3   4  1 1171   6    2  271 5497 293 192
ENSMUSG00000041481  3   .  1    3   1    3  177    5   1 192
ENSMUSG00000024190  1  29  1 1733   1    5   20    3   1   1
ENSMUSG00000003949  4  15  3  175 915 1131   41 1465 330 258
ENSMUSG00000052684  1   4  2  144  82    5  142  578   .  78
ENSMUSG00000026358  .  27  6  344   3    9    3 3612 185   .
assay(sce, 'logcounts')[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
  [[ suppressing 10 column names 'HSPC_007', 'HSPC_013', 'HSPC_019' ... ]]
                                                                                                 
ENSMUSG00000076609 10.1060  7.4753 8.4610  5.0695 4.18392  4.4846 13.0580  2.79276 4.9075  4.2566
ENSMUSG00000021250  6.7958  4.5310 7.6567  .      0.92901  2.7725  9.5341  0.93526 7.0711  6.9632
ENSMUSG00000076617 10.5207  9.2245 9.3522  5.8222 8.07886  9.6650 12.7621  2.35193 4.9976  5.5325
ENSMUSG00000075602  6.7958 11.4219 8.4610 10.0086 9.17877  3.8689 10.8791  9.44886 9.2939 11.5179
ENSMUSG00000006389  6.7958  5.4994 5.6780 11.1326 2.68343  2.2894 10.7325 11.29248 9.1491  9.1815
ENSMUSG00000041481  6.7958  .      5.6780  2.7548 0.92901  2.7725 10.1184  1.71395 1.5530  9.1815
ENSMUSG00000024190  5.2365  8.3297 5.6780 11.6979 0.92901  3.4225  6.9829  1.24388 1.5530  2.0068
ENSMUSG00000003949  7.2076  7.3828 7.2441  8.3940 9.69372 11.1033  8.0126  9.38632 9.3203  9.6071
ENSMUSG00000052684  5.2365  5.4994 6.6639  8.1136 6.23123  3.4225  9.8009  8.04786 .       7.8856
ENSMUSG00000026358  .       8.2269 8.2393  9.3669 1.89216  4.2094  4.3091 10.68693 8.4872  .     
counts(sce)[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
  [[ suppressing 10 column names 'HSPC_007', 'HSPC_013', 'HSPC_019' ... ]]
                                                            
ENSMUSG00000076609 30  16  7   17  19   11 1359   13  15   6
ENSMUSG00000021250  3   2  4    .   1    3  118    2  69  41
ENSMUSG00000076617 40  54 13   29 298  417 1107    9  16  15
ENSMUSG00000075602  3 248  7  537 640    7  300 1530 324 971
ENSMUSG00000006389  3   4  1 1171   6    2  271 5497 293 192
ENSMUSG00000041481  3   .  1    3   1    3  177    5   1 192
ENSMUSG00000024190  1  29  1 1733   1    5   20    3   1   1
ENSMUSG00000003949  4  15  3  175 915 1131   41 1465 330 258
ENSMUSG00000052684  1   4  2  144  82    5  142  578   .  78
ENSMUSG00000026358  .  27  6  344   3    9    3 3612 185   .
logcounts(sce)[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
  [[ suppressing 10 column names 'HSPC_007', 'HSPC_013', 'HSPC_019' ... ]]
                                                                                                 
ENSMUSG00000076609 10.1060  7.4753 8.4610  5.0695 4.18392  4.4846 13.0580  2.79276 4.9075  4.2566
ENSMUSG00000021250  6.7958  4.5310 7.6567  .      0.92901  2.7725  9.5341  0.93526 7.0711  6.9632
ENSMUSG00000076617 10.5207  9.2245 9.3522  5.8222 8.07886  9.6650 12.7621  2.35193 4.9976  5.5325
ENSMUSG00000075602  6.7958 11.4219 8.4610 10.0086 9.17877  3.8689 10.8791  9.44886 9.2939 11.5179
ENSMUSG00000006389  6.7958  5.4994 5.6780 11.1326 2.68343  2.2894 10.7325 11.29248 9.1491  9.1815
ENSMUSG00000041481  6.7958  .      5.6780  2.7548 0.92901  2.7725 10.1184  1.71395 1.5530  9.1815
ENSMUSG00000024190  5.2365  8.3297 5.6780 11.6979 0.92901  3.4225  6.9829  1.24388 1.5530  2.0068
ENSMUSG00000003949  7.2076  7.3828 7.2441  8.3940 9.69372 11.1033  8.0126  9.38632 9.3203  9.6071
ENSMUSG00000052684  5.2365  5.4994 6.6639  8.1136 6.23123  3.4225  9.8009  8.04786 .       7.8856
ENSMUSG00000026358  .       8.2269 8.2393  9.3669 1.89216  4.2094  4.3091 10.68693 8.4872  .     
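The two assays are related: the printed values are consistent with `logcounts` being `log2(counts / sizeFactor + 1)`. The sketch below checks this by hand for the first cell (HSPC_007), using only numbers copied from the outputs shown in this lab (a `.` in a sparse matrix stands for 0). The variable names are illustrative, and this is an observation about the printed values rather than a statement about how the dataset was originally processed.

```r
# Values for cell HSPC_007, copied from the printed counts / logcounts matrices
# (first 10 genes) and from the sizeFactor column of colData(sce).
counts_hspc_007    <- c(30, 3, 40, 3, 3, 3, 1, 4, 1, 0)
logcounts_hspc_007 <- c(10.1060, 6.7958, 10.5207, 6.7958, 6.7958,
                        6.7958, 5.2365, 7.2076, 5.2365, 0)
size_factor_hspc_007 <- 0.0272469

# Scale the raw counts by the cell's size factor, add a pseudocount of 1,
# then take the log2.
recomputed <- log2(counts_hspc_007 / size_factor_hspc_007 + 1)

# Largest absolute difference between recomputed and printed values
# (only rounding error is expected).
max_diff <- max(abs(recomputed - logcounts_hspc_007))
max_diff
```

On the real object, the same check could be written as `log2(t(t(counts(sce)) / sizeFactors(sce)) + 1)` and compared with `logcounts(sce)`.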
Question

Check the colData() output of the sce object. What information is stored there? How can you access the different objects stored in colData?

R
colData(sce)
DataFrame with 1920 rows and 11 columns
                gate       broad   broad.mpp        fine    fine.mpp     ESLAM      HSC1 projected             metrics       label sizeFactor
         <character> <character> <character> <character> <character> <logical> <logical> <logical>         <DataFrame> <character>  <numeric>
HSPC_007        HSPC          NA          NA          NA          NA     FALSE     FALSE     FALSE 194829:  5022:0:...        HSPC  0.0272469
HSPC_013        HSPC        LMPP          NA        LMPP          NA     FALSE     FALSE     FALSE 110530: 15271:0:...        HSPC  0.0904215
HSPC_019        HSPC        LMPP          NA          NA          NA     FALSE     FALSE     FALSE  86825:  2708:0:...        HSPC  0.0199211
HSPC_025        HSPC         MPP        MPP1          NA          NA     FALSE     FALSE     FALSE 212206:107278:0:...        HSPC  0.5217920
HSPC_031        HSPC         MPP       STHSC          NA          NA     FALSE     FALSE     FALSE 690411:227480:0:...        HSPC  1.1062306
...              ...         ...         ...         ...         ...       ...       ...       ...                 ...         ...        ...
Prog_834        Prog         CMP          NA          NA          NA     FALSE     FALSE     FALSE 471273:296060:0:...        Prog   1.668383
Prog_840        Prog         GMP          NA         GMP          NA     FALSE     FALSE     FALSE 421195:317394:0:...        Prog   1.728596
Prog_846        Prog         GMP          NA          NA          NA     FALSE     FALSE     FALSE 337564:253387:0:...        Prog   1.282827
Prog_852        Prog         MEP          NA         MEP          NA     FALSE     FALSE     FALSE 200193:136478:0:...        Prog   0.643172
Prog_810        Prog         CMP          NA         CMP          NA     FALSE     FALSE     FALSE 857257:396247:0:...        Prog   2.614351
lapply(colData(sce), class)
$gate
[1] "character"

$broad
[1] "character"

$broad.mpp
[1] "character"

$fine
[1] "character"

$fine.mpp
[1] "character"

$ESLAM
[1] "logical"

$HSC1
[1] "logical"

$projected
[1] "logical"

$metrics
[1] "DFrame"
attr(,"package")
[1] "S4Vectors"

$label
[1] "character"

$sizeFactor
[1] "numeric"
head(colData(sce)[[1]])
[1] "HSPC" "HSPC" "HSPC" "HSPC" "HSPC" "HSPC"
head(colData(sce)[['gate']])
[1] "HSPC" "HSPC" "HSPC" "HSPC" "HSPC" "HSPC"
head(sce$sizeFactor)
HSPC_007 HSPC_013 HSPC_019 HSPC_025 HSPC_031 HSPC_037 
0.027247 0.090422 0.019921 0.521792 1.106231 0.514315 
Question

Are there any reduced-dimensionality representations of the data stored in the sce object? How can we run a PCA using normalized counts?

R
reducedDims(sce)
List of length 1
names(1): diffusion
pca <- prcomp(t(logcounts(sce)))
names(pca)
[1] "sdev"     "rotation" "center"   "scale"    "x"       
dim(pca$x)
[1] 1920  500
head(pca$x[, 1:50])
              PC1      PC2      PC3      PC4     PC5      PC6        PC7     PC8     PC9      PC10     PC11    PC12     PC13      PC14     PC15    PC16      PC17     PC18      PC19     PC20      PC21
HSPC_007   5.5957 -3.38018   9.9597 13.13023  3.7096 10.32141   3.764118  3.7285  8.0435 -10.48526  1.15326  4.1454  -5.2693 -1.813323 -0.97617  4.7550   3.11405 -1.74971  2.479306  6.36876  3.063672
HSPC_013 -12.1968 -5.40485   1.5587 13.77344 -1.3253 -0.94266   0.426347 -1.2937  4.5126  -8.51455  4.07508  3.2265  -2.3242 -6.277261  0.83638 -1.1571  -2.58356  2.27125 -7.971672  2.78289 -1.501527
HSPC_019  -3.5459  7.74575   5.3837 -0.69904  3.9597  0.25072   5.827758  5.6821  4.6550  -6.72713  5.06253  1.6937 -10.8387 -0.068593 -6.44997 -8.0882  -1.97497  0.23654  0.067473 -0.22922  1.049034
HSPC_025 -25.0520  5.25948   4.2798 13.99371 -1.7324 -5.80749   0.067956 -5.3545 -2.5282  -4.97378 -0.77335 -2.2739   6.4097 -3.363907  2.04061  5.8590 -10.45179  9.55541  2.684936  2.26075  4.727528
HSPC_031 -16.1440  8.09702 -10.0759  4.31911  2.7599 -6.63687  -8.269004  1.9340  2.4052  -0.78640  5.34394 12.1510  -6.0860  5.866753 -0.35564 -2.5135  -1.12345  9.87570 -5.632063 -5.99195  0.055566
HSPC_037 -20.4713 -0.66065  -7.4788 14.13894  7.8887  0.82878 -10.215540 -1.1051  3.3203   0.89116 10.82329 -1.3852  -6.6037 -2.795674  3.17142  9.6886  -0.79144  2.22416  3.350303 10.48825  7.977354
            PC22     PC23     PC24    PC25      PC26    PC27     PC28      PC29     PC30    PC31     PC32      PC33    PC34      PC35     PC36     PC37    PC38     PC39    PC40    PC41     PC42
HSPC_007  3.0668 -4.98611 -2.44344 -4.4294 -4.793706 -3.7329  2.63928   0.10687 -4.45481 -5.4965 -1.38617  -0.47128  7.0242 -3.064532 -6.12037 -6.24617 -0.6804  -3.3962 -1.5903  4.1173  6.51825
HSPC_013  4.9239 -0.80967 -2.46145 -3.2220 -0.050257  2.9270  2.57239  -3.69495 -3.27098  1.1131  1.38702   4.07027 -5.2938 -4.267308  2.19268 -0.89853  1.7735   2.0599 -2.7113  1.5901 -4.90143
HSPC_019  4.0552  0.50168 -0.40174 -7.7278  5.796156  5.3452  5.36809 -10.29223  8.39879 -2.8710  3.38587  -4.43033  6.4016 -0.261125  5.56004 -0.21594  5.5926  -2.6249  4.4587  2.6632  4.37120
HSPC_025  2.8838  0.75559 -2.31164  5.6351  0.201399 -1.5091 -5.13768   3.25790  0.61794 -5.3367 -3.22935   1.31144  4.7864  2.222136  1.38136  0.49663  6.6661  -2.1598 -4.0360 -1.3179  0.76347
HSPC_031 -1.0387 -5.81508  6.23710 -1.9101 10.224710  4.6615  2.47881   1.67633 -4.07955 -7.6437 -6.53825  -4.78450  1.4944 14.043554  0.43717 -2.46710 14.2073 -11.3719 -9.6034 -9.2047 -4.13460
HSPC_037  1.5760 -3.89167 -3.24755 -3.5053 -8.775323  1.7835  0.44296   0.85686  0.13623 -1.8063  0.82291 -14.66814  1.0963  0.069053  2.66947  0.51324  5.0497   3.7888 -6.9892 -2.2889  0.13915
              PC43   PC44    PC45     PC46     PC47     PC48     PC49     PC50
HSPC_007 -3.642617 2.3962  4.7195  0.64813  2.23608  0.11513 -3.67102 -3.04973
HSPC_013 -5.056999 1.8504 -1.4505  2.78728 -2.25562 -0.49966 -4.66937 -1.93109
HSPC_019 -3.252537 3.4987  1.2089  3.95118  1.14975 -0.66164  2.54210 -9.56988
HSPC_025 -2.772840 1.0145 -3.5630 -0.70085 -8.45979  0.23048 -0.27531  6.04259
HSPC_031  0.087917 1.1393  7.9606  3.08551  0.87608 -2.49711  8.00412  0.62299
HSPC_037  3.347148 7.4915  1.6934  7.40074  6.66899  1.43225 -8.25343 -2.52982
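Two details of the `prcomp()` call are worth spelling out: the matrix is transposed because `prcomp()` expects observations (here, cells) in rows, and the `sdev` element can be used to see how much variance each component captures. Here is a minimal sketch on a small simulated matrix standing in for `logcounts(sce)`; the object names and the 80% threshold are illustrative assumptions.

```r
# Simulated stand-in for logcounts(sce): 20 genes (rows) x 100 cells (columns).
set.seed(1)
toy_logcounts <- matrix(
    rnorm(20 * 100), nrow = 20,
    dimnames = list(paste0("gene", 1:20), paste0("cell", 1:100))
)

# prcomp() treats rows as observations, so cells must be in rows: transpose.
toy_pca <- prcomp(t(toy_logcounts))

# One row per cell, one column per principal component.
dim(toy_pca$x)

# Proportion of variance explained by each PC, from the PC standard deviations.
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
round(head(var_explained, 5), 3)

# Smallest number of PCs whose cumulative variance reaches 80% of the total.
n_pcs <- which(cumsum(var_explained) >= 0.8)[1]
n_pcs
```

Applied to the `pca` object above, the same two lines would show how quickly the variance drops across the 500 components, which helps when deciding how many PCs to keep for downstream steps.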
Question

Now, let’s compute a UMAP embedding from this PCA and compare it to the PCA embedding.

R
umap <- uwot::umap(pca$x)
colnames(umap) <- c('UMAP1', 'UMAP2')
plot(pca$x[,1], pca$x[,2])

plot(umap[,1], umap[,2])

We will see more advanced ways of reducing scRNAseq data dimensionality in the coming lectures and labs.

6.3.2 scRNAseq in R

Seurat is another very popular ecosystem for investigating scRNAseq data. It is primarily developed and maintained by the Satija Lab. It originally began as a single package aiming to encompass “all” (most) aspects of scRNAseq analysis. However, it rapidly evolved into a much larger project, and now operates alongside other “wrappers” and extensions. It also benefits from extensive support from the lab group. All in all, it provides a (somewhat) simple workflow to start investigating scRNAseq data.

It is important to choose one standard that you feel comfortable with yourself. Which standard provides the most intuitive approach for you? Do you prefer an “all-in-one, plug-n-play” workflow (Seurat-style), or a modular approach (Bioconductor-style)? Which documentation is easier for you to read: a central full-featured website with extensive examples (Seurat-style), or “programmatic”-style vignettes (Bioconductor-style)?

This course will mostly rely on Bioconductor-based methods, but will sometimes use Seurat-based methods. In the absence of coordination of data structures, the next best solution is to write functions to coerce an object from one class to another (e.g. Seurat to SingleCellExperiment, or vice versa). Luckily, this is quite straightforward in R for these two data classes:

R
sce_seurat <- Seurat::as.Seurat(sce)
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from DC to DC_
sce
class: SingleCellExperiment 
dim: 500 1920 
metadata(0):
assays(2): counts logcounts
rownames(500): ENSMUSG00000076609 ENSMUSG00000021250 ... ENSMUSG00000026000 ENSMUSG00000005982
rowData names(0):
colnames(1920): HSPC_007 HSPC_013 ... Prog_852 Prog_810
colData names(11): gate broad ... label sizeFactor
reducedDimNames(1): diffusion
mainExpName: endogenous
altExpNames(0):
sce_seurat
An object of class Seurat 
500 features across 1920 samples within 1 assay 
Active assay: endogenous (500 features, 0 variable features)
 2 layers present: counts, data
 1 dimensional reduction calculated: diffusion
sce2 <- Seurat::as.SingleCellExperiment(sce_seurat)
Question

Do you see any change between sce and the corresponding, “back-converted”, sce2 objects? Explain these differences.

R
sce
class: SingleCellExperiment 
dim: 500 1920 
metadata(0):
assays(2): counts logcounts
rownames(500): ENSMUSG00000076609 ENSMUSG00000021250 ... ENSMUSG00000026000 ENSMUSG00000005982
rowData names(0):
colnames(1920): HSPC_007 HSPC_013 ... Prog_852 Prog_810
colData names(11): gate broad ... label sizeFactor
reducedDimNames(1): diffusion
mainExpName: endogenous
altExpNames(0):
sce2
class: SingleCellExperiment 
dim: 500 1920 
metadata(0):
assays(2): counts logcounts
rownames(500): ENSMUSG00000076609 ENSMUSG00000021250 ... ENSMUSG00000026000 ENSMUSG00000005982
rowData names(0):
colnames(1920): HSPC_007 HSPC_013 ... Prog_852 Prog_810
colData names(19): orig.ident nCount_endogenous ... sizeFactor ident
reducedDimNames(1): DIFFUSION
mainExpName: endogenous
altExpNames(0):
#
colData(sce)
DataFrame with 1920 rows and 11 columns
                gate       broad   broad.mpp        fine    fine.mpp     ESLAM      HSC1 projected             metrics       label sizeFactor
         <character> <character> <character> <character> <character> <logical> <logical> <logical>         <DataFrame> <character>  <numeric>
HSPC_007        HSPC          NA          NA          NA          NA     FALSE     FALSE     FALSE 194829:  5022:0:...        HSPC  0.0272469
HSPC_013        HSPC        LMPP          NA        LMPP          NA     FALSE     FALSE     FALSE 110530: 15271:0:...        HSPC  0.0904215
HSPC_019        HSPC        LMPP          NA          NA          NA     FALSE     FALSE     FALSE  86825:  2708:0:...        HSPC  0.0199211
HSPC_025        HSPC         MPP        MPP1          NA          NA     FALSE     FALSE     FALSE 212206:107278:0:...        HSPC  0.5217920
HSPC_031        HSPC         MPP       STHSC          NA          NA     FALSE     FALSE     FALSE 690411:227480:0:...        HSPC  1.1062306
...              ...         ...         ...         ...         ...       ...       ...       ...                 ...         ...        ...
Prog_834        Prog         CMP          NA          NA          NA     FALSE     FALSE     FALSE 471273:296060:0:...        Prog   1.668383
Prog_840        Prog         GMP          NA         GMP          NA     FALSE     FALSE     FALSE 421195:317394:0:...        Prog   1.728596
Prog_846        Prog         GMP          NA          NA          NA     FALSE     FALSE     FALSE 337564:253387:0:...        Prog   1.282827
Prog_852        Prog         MEP          NA         MEP          NA     FALSE     FALSE     FALSE 200193:136478:0:...        Prog   0.643172
Prog_810        Prog         CMP          NA         CMP          NA     FALSE     FALSE     FALSE 857257:396247:0:...        Prog   2.614351
colData(sce2)
DataFrame with 1920 rows and 19 columns
         orig.ident nCount_endogenous nFeature_endogenous        gate       broad   broad.mpp        fine    fine.mpp     ESLAM      HSC1 projected metrics.X__no_feature metrics.X__ambiguous
           <factor>         <numeric>           <integer> <character> <character> <character> <character> <character> <logical> <logical> <logical>             <numeric>            <numeric>
HSPC_007       HSPC              1262                 381        HSPC          NA          NA          NA          NA     FALSE     FALSE     FALSE                194829                 5022
HSPC_013       HSPC              5903                 409        HSPC        LMPP          NA        LMPP          NA     FALSE     FALSE     FALSE                110530                15271
HSPC_019       HSPC              1002                 355        HSPC        LMPP          NA          NA          NA     FALSE     FALSE     FALSE                 86825                 2708
HSPC_025       HSPC             57990                 433        HSPC         MPP        MPP1          NA          NA     FALSE     FALSE     FALSE                212206               107278
HSPC_031       HSPC            118570                 435        HSPC         MPP       STHSC          NA          NA     FALSE     FALSE     FALSE                690411               227480
...             ...               ...                 ...         ...         ...         ...         ...         ...       ...       ...       ...                   ...                  ...
Prog_834       Prog            185729                 476        Prog         CMP          NA          NA          NA     FALSE     FALSE     FALSE                471273               296060
Prog_840       Prog            245357                 484        Prog         GMP          NA         GMP          NA     FALSE     FALSE     FALSE                421195               317394
Prog_846       Prog            198660                 483        Prog         GMP          NA          NA          NA     FALSE     FALSE     FALSE                337564               253387
Prog_852       Prog             56540                 463        Prog         MEP          NA         MEP          NA     FALSE     FALSE     FALSE                200193               136478
Prog_810       Prog            266950                 480        Prog         CMP          NA         CMP          NA     FALSE     FALSE     FALSE                857257               396247
         metrics.X__too_low_aQual metrics.X__not_aligned metrics.X__alignment_not_unique       label sizeFactor                ident
                        <numeric>              <numeric>                       <numeric> <character>  <numeric>             <factor>
HSPC_007                        0                5820455                               0        HSPC  0.0272469 SingleCellExperiment
HSPC_013                        0                1562724                               0        HSPC  0.0904215 SingleCellExperiment
HSPC_019                        0                1407254                               0        HSPC  0.0199211 SingleCellExperiment
HSPC_025                        0                1810368                               0        HSPC  0.5217920 SingleCellExperiment
HSPC_031                        0                6097116                               0        HSPC  1.1062306 SingleCellExperiment
...                           ...                    ...                             ...         ...        ...                  ...
Prog_834                        0                4312094                               0        Prog   1.668383 SingleCellExperiment
Prog_840                        0                3770447                               0        Prog   1.728596 SingleCellExperiment
Prog_846                        0                2549458                               0        Prog   1.282827 SingleCellExperiment
Prog_852                        0                1442745                               0        Prog   0.643172 SingleCellExperiment
Prog_810                        0                4518455                               0        Prog   2.614351 SingleCellExperiment
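The differences between the two `colData` tables can be listed programmatically rather than by eye. The sketch below uses the column names exactly as printed above, typed in as plain character vectors so that it runs without the objects; the vector names are illustrative. The last line illustrates one plausible origin of the `X__` prefix, which is an assumption, since the inner column names of `metrics` are not shown in the original `sce`.

```r
# Column names as printed by colData(sce) and colData(sce2) above.
sce_cols <- c("gate", "broad", "broad.mpp", "fine", "fine.mpp", "ESLAM",
              "HSC1", "projected", "metrics", "label", "sizeFactor")
sce2_cols <- c("orig.ident", "nCount_endogenous", "nFeature_endogenous",
               "gate", "broad", "broad.mpp", "fine", "fine.mpp", "ESLAM",
               "HSC1", "projected", "metrics.X__no_feature",
               "metrics.X__ambiguous", "metrics.X__too_low_aQual",
               "metrics.X__not_aligned", "metrics.X__alignment_not_unique",
               "label", "sizeFactor", "ident")

# Columns gained in the round trip: Seurat bookkeeping columns
# (orig.ident, nCount_*, nFeature_*, ident) and the flattened metrics columns.
added <- setdiff(sce2_cols, sce_cols)
added

# Columns lost in the round trip: the nested 'metrics' DataFrame no longer
# exists as a single column.
lost <- setdiff(sce_cols, sce2_cols)
lost

# Assumption: if the nested columns were originally named e.g. "__no_feature",
# base R's make.names() would turn them into syntactically valid names.
make.names("__no_feature")
```

With the real objects, the same comparison would read `setdiff(colnames(colData(sce2)), colnames(colData(sce)))`. Note also that the reduced dimension was renamed from `diffusion` to `DIFFUSION` along the way.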
Question

Try to access the underlying raw or normalized data from the sce_seurat object. How does it compare to data access from a SingleCellExperiment object?

R
colnames(sce_seurat)
   [1] "HSPC_007"   "HSPC_013"   "HSPC_019"   "HSPC_025"   "HSPC_031"   "HSPC_037"   "LT-HSC_001" "HSPC_001"   "HSPC_008"   "HSPC_014"   "HSPC_020"   "HSPC_026"   "HSPC_032"   "HSPC_038"  
  [15] "LT-HSC_002" "HSPC_002"   "HSPC_009"   "HSPC_015"   "HSPC_021"   "HSPC_027"   "HSPC_033"   "HSPC_039"   "LT-HSC_003" "HSPC_003"   "HSPC_010"   "HSPC_016"   "HSPC_022"   "HSPC_028"  
  [29] "HSPC_034"   "HSPC_040"   "LT-HSC_004" "HSPC_004"   "HSPC_011"   "HSPC_017"   "HSPC_023"   "HSPC_029"   "HSPC_035"   "HSPC_041"   "LT-HSC_005" "HSPC_005"   "HSPC_012"   "HSPC_018"  
  [43] "HSPC_024"   "HSPC_030"   "HSPC_036"   "HSPC_042"   "LT-HSC_006" "HSPC_006"   "Prog_007"   "Prog_013"   "Prog_019"   "Prog_025"   "Prog_031"   "Prog_037"   "LT-HSC_007" "Prog_001"  
  [57] "Prog_008"   "Prog_014"   "Prog_020"   "Prog_026"   "Prog_032"   "Prog_038"   "LT-HSC_008" "Prog_002"   "Prog_009"   "Prog_015"   "Prog_021"   "Prog_027"   "Prog_033"   "Prog_039"  
  [71] "LT-HSC_009" "Prog_003"   "Prog_010"   "Prog_016"   "Prog_022"   "Prog_028"   "Prog_034"   "Prog_040"   "LT-HSC_010" "Prog_004"   "Prog_011"   "Prog_017"   "Prog_023"   "Prog_029"  
  [85] "Prog_035"   "Prog_041"   "LT-HSC_011" "Prog_005"   "Prog_012"   "Prog_018"   "Prog_024"   "Prog_030"   "Prog_036"   "Prog_042"   "LT-HSC_012" "Prog_006"   "HSPC_049"   "HSPC_055"  
  [99] "HSPC_061"   "HSPC_067"   "HSPC_073"   "HSPC_079"   "LT-HSC_013" "HSPC_043"   "HSPC_050"   "HSPC_056"   "HSPC_062"   "HSPC_068"   "HSPC_074"   "HSPC_080"   "LT-HSC_014" "HSPC_044"  
 [113] "HSPC_051"   "HSPC_057"   "HSPC_063"   "HSPC_069"   "HSPC_075"   "HSPC_081"   "LT-HSC_015" "HSPC_045"   "HSPC_052"   "HSPC_058"   "HSPC_064"   "HSPC_070"   "HSPC_076"   "HSPC_082"  
 [127] "LT-HSC_016" "HSPC_046"   "HSPC_053"   "HSPC_059"   "HSPC_065"   "HSPC_071"   "HSPC_077"   "HSPC_083"   "LT-HSC_017" "HSPC_047"   "HSPC_054"   "HSPC_060"   "HSPC_066"   "HSPC_072"  
 [141] "HSPC_078"   "HSPC_084"   "LT-HSC_018" "HSPC_048"   "Prog_049"   "Prog_055"   "Prog_061"   "Prog_067"   "Prog_073"   "Prog_079"   "LT-HSC_019" "Prog_043"   "Prog_050"   "Prog_056"  
 [155] "Prog_062"   "Prog_068"   "Prog_074"   "Prog_080"   "LT-HSC_020" "Prog_044"   "Prog_051"   "Prog_057"   "Prog_063"   "Prog_069"   "Prog_075"   "Prog_081"   "LT-HSC_021" "Prog_045"  
 [169] "Prog_052"   "Prog_058"   "Prog_064"   "Prog_070"   "Prog_076"   "Prog_082"   "LT-HSC_022" "Prog_046"   "Prog_053"   "Prog_059"   "Prog_065"   "Prog_071"   "Prog_077"   "Prog_083"  
 [183] "LT-HSC_023" "Prog_047"   "Prog_054"   "Prog_060"   "Prog_066"   "Prog_072"   "Prog_078"   "Prog_084"   "LT-HSC_024" "Prog_048"   "HSPC_091"   "HSPC_097"   "HSPC_103"   "HSPC_109"  
 [197] "HSPC_115"   "HSPC_121"   "LT-HSC_025" "HSPC_085"   "HSPC_092"   "HSPC_098"   "HSPC_104"   "HSPC_110"   "HSPC_116"   "HSPC_122"   "LT-HSC_026" "HSPC_086"   "HSPC_093"   "HSPC_099"  
 [211] "HSPC_105"   "HSPC_111"   "HSPC_117"   "HSPC_123"   "LT-HSC_027" "HSPC_087"   "HSPC_094"   "HSPC_100"   "HSPC_106"   "HSPC_112"   "HSPC_118"   "HSPC_124"   "LT-HSC_028" "HSPC_088"  
 [225] "HSPC_095"   "HSPC_101"   "HSPC_107"   "HSPC_113"   "HSPC_119"   "HSPC_125"   "LT-HSC_029" "HSPC_089"   "HSPC_096"   "HSPC_102"   "HSPC_108"   "HSPC_114"   "HSPC_120"   "HSPC_126"  
 [239] "LT-HSC_030" "HSPC_090"   "Prog_091"   "Prog_097"   "Prog_103"   "Prog_109"   "Prog_115"   "Prog_121"   "LT-HSC_031" "Prog_085"   "Prog_092"   "Prog_098"   "Prog_104"   "Prog_110"  
 [253] "Prog_116"   "Prog_122"   "LT-HSC_032" "Prog_086"   "Prog_093"   "Prog_099"   "Prog_105"   "Prog_111"   "Prog_117"   "Prog_123"   "LT-HSC_033" "Prog_087"   "Prog_094"   "Prog_100"  
 [267] "Prog_106"   "Prog_112"   "Prog_118"   "Prog_124"   "LT-HSC_034" "Prog_088"   "Prog_095"   "Prog_101"   "Prog_107"   "Prog_113"   "Prog_119"   "Prog_125"   "LT-HSC_035" "Prog_089"  
 [281] "Prog_096"   "Prog_102"   "Prog_108"   "Prog_114"   "Prog_120"   "Prog_126"   "LT-HSC_036" "Prog_090"   "HSPC_133"   "HSPC_139"   "HSPC_145"   "HSPC_151"   "HSPC_157"   "HSPC_163"  
 [295] "LT-HSC_037" "HSPC_127"   "HSPC_134"   "HSPC_140"   "HSPC_146"   "HSPC_152"   "HSPC_158"   "HSPC_164"   "LT-HSC_038" "HSPC_128"   "HSPC_135"   "HSPC_141"   "HSPC_147"   "HSPC_153"  
 [309] "HSPC_159"   "HSPC_165"   "LT-HSC_039" "HSPC_129"   "HSPC_136"   "HSPC_142"   "HSPC_148"   "HSPC_154"   "HSPC_160"   "HSPC_166"   "LT-HSC_040" "HSPC_130"   "HSPC_137"   "HSPC_143"  
 [323] "HSPC_149"   "HSPC_155"   "HSPC_161"   "HSPC_167"   "LT-HSC_041" "HSPC_131"   "HSPC_138"   "HSPC_144"   "HSPC_150"   "HSPC_156"   "HSPC_162"   "HSPC_168"   "LT-HSC_042" "HSPC_132"  
 [337] "Prog_133"   "Prog_139"   "Prog_145"   "Prog_151"   "Prog_157"   "Prog_163"   "LT-HSC_043" "Prog_127"   "Prog_134"   "Prog_140"   "Prog_146"   "Prog_152"   "Prog_158"   "Prog_164"  
 [351] "LT-HSC_044" "Prog_128"   "Prog_135"   "Prog_141"   "Prog_147"   "Prog_153"   "Prog_159"   "Prog_165"   "LT-HSC_045" "Prog_129"   "Prog_136"   "Prog_142"   "Prog_148"   "Prog_154"  
 [365] "Prog_160"   "Prog_166"   "LT-HSC_046" "Prog_130"   "Prog_137"   "Prog_143"   "Prog_149"   "Prog_155"   "Prog_161"   "Prog_167"   "LT-HSC_047" "Prog_131"   "Prog_138"   "Prog_144"  
 [379] "Prog_150"   "Prog_156"   "Prog_162"   "Prog_168"   "LT-HSC_048" "Prog_132"   "HSPC_175"   "HSPC_181"   "HSPC_187"   "HSPC_193"   "HSPC_199"   "HSPC_205"   "LT-HSC_049" "HSPC_169"  
 [393] "HSPC_176"   "HSPC_182"   "HSPC_188"   "HSPC_194"   "HSPC_200"   "HSPC_206"   "LT-HSC_050" "HSPC_170"   "HSPC_177"   "HSPC_183"   "HSPC_189"   "HSPC_195"   "HSPC_201"   "HSPC_207"  
 [407] "LT-HSC_051" "HSPC_171"   "HSPC_178"   "HSPC_184"   "HSPC_190"   "HSPC_196"   "HSPC_202"   "HSPC_208"   "LT-HSC_052" "HSPC_172"   "HSPC_179"   "HSPC_185"   "HSPC_191"   "HSPC_197"  
 [421] "HSPC_203"   "HSPC_209"   "LT-HSC_053" "HSPC_173"   "HSPC_180"   "HSPC_186"   "HSPC_192"   "HSPC_198"   "HSPC_204"   "HSPC_210"   "LT-HSC_054" "HSPC_174"   "Prog_175"   "Prog_181"  
 [435] "Prog_187"   "Prog_193"   "Prog_199"   "Prog_205"   "LT-HSC_055" "Prog_169"   "Prog_176"   "Prog_182"   "Prog_188"   "Prog_194"   "Prog_200"   "Prog_206"   "LT-HSC_056" "Prog_170"  
 [449] "Prog_177"   "Prog_183"   "Prog_189"   "Prog_195"   "Prog_201"   "Prog_207"   "LT-HSC_057" "Prog_171"   "Prog_178"   "Prog_184"   "Prog_190"   "Prog_196"   "Prog_202"   "Prog_208"  
 [463] "LT-HSC_058" "Prog_172"   "Prog_179"   "Prog_185"   "Prog_191"   "Prog_197"   "Prog_203"   "Prog_209"   "LT-HSC_059" "Prog_173"   "Prog_180"   "Prog_186"   "Prog_192"   "Prog_198"  
 [477] "Prog_204"   "Prog_210"   "LT-HSC_060" "Prog_174"   "HSPC_217"   "HSPC_223"   "HSPC_229"   "HSPC_235"   "HSPC_241"   "HSPC_247"   "LT-HSC_061" "HSPC_211"   "HSPC_218"   "HSPC_224"  
 [491] "HSPC_230"   "HSPC_236"   "HSPC_242"   "HSPC_248"   "LT-HSC_062" "HSPC_212"   "HSPC_219"   "HSPC_225"   "HSPC_231"   "HSPC_237"   "HSPC_243"   "HSPC_249"   "LT-HSC_063" "HSPC_213"  
 [505] "HSPC_220"   "HSPC_226"   "HSPC_232"   "HSPC_238"   "HSPC_244"   "HSPC_250"   "LT-HSC_064" "HSPC_214"   "HSPC_221"   "HSPC_227"   "HSPC_233"   "HSPC_239"   "HSPC_245"   "HSPC_251"  
 [519] "LT-HSC_065" "HSPC_215"   "HSPC_222"   "HSPC_228"   "HSPC_234"   "HSPC_240"   "HSPC_246"   "HSPC_252"   "LT-HSC_066" "HSPC_216"   "Prog_217"   "Prog_223"   "Prog_229"   "Prog_235"  
 [533] "Prog_241"   "Prog_247"   "LT-HSC_067" "Prog_211"   "Prog_218"   "Prog_224"   "Prog_230"   "Prog_236"   "Prog_242"   "Prog_248"   "LT-HSC_068" "Prog_212"   "Prog_219"   "Prog_225"  
 [547] "Prog_231"   "Prog_237"   "Prog_243"   "Prog_249"   "LT-HSC_069" "Prog_213"   "Prog_220"   "Prog_226"   "Prog_232"   "Prog_238"   "Prog_244"   "Prog_250"   "LT-HSC_070" "Prog_214"  
 [561] "Prog_221"   "Prog_227"   "Prog_233"   "Prog_239"   "Prog_245"   "Prog_251"   "LT-HSC_071" "Prog_215"   "Prog_222"   "Prog_228"   "Prog_234"   "Prog_240"   "Prog_246"   "Prog_252"  
 [575] "LT-HSC_072" "Prog_216"   "HSPC_259"   "HSPC_265"   "HSPC_271"   "HSPC_277"   "HSPC_283"   "HSPC_289"   "LT-HSC_073" "HSPC_253"   "HSPC_260"   "HSPC_266"   "HSPC_272"   "HSPC_278"  
 [589] "HSPC_284"   "HSPC_290"   "LT-HSC_074" "HSPC_254"   "HSPC_261"   "HSPC_267"   "HSPC_273"   "HSPC_279"   "HSPC_285"   "HSPC_291"   "LT-HSC_075" "HSPC_255"   "HSPC_262"   "HSPC_268"  
 [603] "HSPC_274"   "HSPC_280"   "HSPC_286"   "HSPC_292"   "LT-HSC_076" "HSPC_256"   "HSPC_263"   "HSPC_269"   "HSPC_275"   "HSPC_281"   "HSPC_287"   "HSPC_293"   "LT-HSC_077" "HSPC_257"  
 [617] "HSPC_264"   "HSPC_270"   "HSPC_276"   "HSPC_282"   "HSPC_288"   "HSPC_294"   "LT-HSC_078" "HSPC_258"   "Prog_259"   "Prog_265"   "Prog_271"   "Prog_277"   "Prog_283"   "Prog_289"  
 [631] "LT-HSC_079" "Prog_253"   "Prog_260"   "Prog_266"   "Prog_272"   "Prog_278"   "Prog_284"   "Prog_290"   "LT-HSC_080" "Prog_254"   "Prog_261"   "Prog_267"   "Prog_273"   "Prog_279"  
 [645] "Prog_285"   "Prog_291"   "LT-HSC_081" "Prog_255"   "Prog_262"   "Prog_268"   "Prog_274"   "Prog_280"   "Prog_286"   "Prog_292"   "LT-HSC_082" "Prog_256"   "Prog_263"   "Prog_269"  
 [659] "Prog_275"   "Prog_281"   "Prog_287"   "Prog_293"   "LT-HSC_083" "Prog_257"   "Prog_264"   "Prog_270"   "Prog_276"   "Prog_282"   "Prog_288"   "Prog_294"   "LT-HSC_084" "Prog_258"  
 [673] "HSPC_301"   "HSPC_307"   "HSPC_313"   "HSPC_319"   "HSPC_325"   "HSPC_331"   "LT-HSC_085" "HSPC_295"   "HSPC_302"   "HSPC_308"   "HSPC_314"   "HSPC_320"   "HSPC_326"   "HSPC_332"  
 [687] "LT-HSC_086" "HSPC_296"   "HSPC_303"   "HSPC_309"   "HSPC_315"   "HSPC_321"   "HSPC_327"   "HSPC_333"   "LT-HSC_087" "HSPC_297"   "HSPC_304"   "HSPC_310"   "HSPC_316"   "HSPC_322"  
 [701] "HSPC_328"   "HSPC_334"   "LT-HSC_088" "HSPC_298"   "HSPC_305"   "HSPC_311"   "HSPC_317"   "HSPC_323"   "HSPC_329"   "HSPC_335"   "LT-HSC_089" "HSPC_299"   "HSPC_306"   "HSPC_312"  
 [715] "HSPC_318"   "HSPC_324"   "HSPC_330"   "HSPC_336"   "LT-HSC_090" "HSPC_300"   "Prog_301"   "Prog_307"   "Prog_313"   "Prog_319"   "Prog_325"   "Prog_331"   "LT-HSC_091" "Prog_295"  
 [729] "Prog_302"   "Prog_308"   "Prog_314"   "Prog_320"   "Prog_326"   "Prog_332"   "LT-HSC_092" "Prog_296"   "Prog_303"   "Prog_309"   "Prog_315"   "Prog_321"   "Prog_327"   "Prog_333"  
 [743] "LT-HSC_093" "Prog_297"   "Prog_304"   "Prog_310"   "Prog_316"   "Prog_322"   "Prog_328"   "Prog_334"   "LT-HSC_094" "Prog_298"   "Prog_305"   "Prog_311"   "Prog_317"   "Prog_323"  
 [757] "Prog_329"   "Prog_335"   "LT-HSC_095" "Prog_299"   "Prog_306"   "Prog_312"   "Prog_318"   "Prog_324"   "Prog_330"   "Prog_336"   "LT-HSC_096" "Prog_300"   "HSPC_343"   "HSPC_349"  
 [771] "HSPC_355"   "HSPC_361"   "HSPC_367"   "HSPC_373"   "LT-HSC_097" "HSPC_337"   "HSPC_344"   "HSPC_350"   "HSPC_356"   "HSPC_362"   "HSPC_368"   "HSPC_374"   "LT-HSC_098" "HSPC_338"  
 [785] "HSPC_345"   "HSPC_351"   "HSPC_357"   "HSPC_363"   "HSPC_369"   "HSPC_375"   "LT-HSC_099" "HSPC_339"   "HSPC_346"   "HSPC_352"   "HSPC_358"   "HSPC_364"   "HSPC_370"   "HSPC_376"  
 [799] "LT-HSC_100" "HSPC_340"   "HSPC_347"   "HSPC_353"   "HSPC_359"   "HSPC_365"   "HSPC_371"   "HSPC_377"   "LT-HSC_101" "HSPC_341"   "HSPC_348"   "HSPC_354"   "HSPC_360"   "HSPC_366"  
 [813] "HSPC_372"   "HSPC_378"   "LT-HSC_102" "HSPC_342"   "Prog_343"   "Prog_349"   "Prog_355"   "Prog_361"   "Prog_367"   "Prog_373"   "LT-HSC_103" "Prog_337"   "Prog_344"   "Prog_350"  
 [827] "Prog_356"   "Prog_362"   "Prog_368"   "Prog_374"   "LT-HSC_104" "Prog_338"   "Prog_345"   "Prog_351"   "Prog_357"   "Prog_363"   "Prog_369"   "Prog_375"   "LT-HSC_105" "Prog_339"  
 [841] "Prog_346"   "Prog_352"   "Prog_358"   "Prog_364"   "Prog_370"   "Prog_376"   "LT-HSC_106" "Prog_340"   "Prog_347"   "Prog_353"   "Prog_359"   "Prog_365"   "Prog_371"   "Prog_377"  
 [855] "LT-HSC_107" "Prog_341"   "Prog_348"   "Prog_354"   "Prog_360"   "Prog_366"   "Prog_372"   "Prog_378"   "LT-HSC_108" "Prog_342"   "HSPC_385"   "HSPC_391"   "HSPC_397"   "HSPC_403"  
 [869] "HSPC_409"   "HSPC_415"   "HSPC_421"   "HSPC_379"   "HSPC_386"   "HSPC_392"   "HSPC_398"   "HSPC_404"   "HSPC_410"   "HSPC_416"   "HSPC_422"   "HSPC_380"   "HSPC_387"   "HSPC_393"  
 [883] "HSPC_399"   "HSPC_405"   "HSPC_411"   "HSPC_417"   "HSPC_423"   "HSPC_381"   "HSPC_388"   "HSPC_394"   "HSPC_400"   "HSPC_406"   "HSPC_412"   "HSPC_418"   "HSPC_424"   "HSPC_382"  
 [897] "HSPC_389"   "HSPC_395"   "HSPC_401"   "HSPC_407"   "HSPC_413"   "HSPC_419"   "HSPC_425"   "HSPC_383"   "HSPC_390"   "HSPC_396"   "HSPC_402"   "HSPC_408"   "HSPC_414"   "HSPC_420"  
 [911] "HSPC_426"   "HSPC_384"   "Prog_385"   "Prog_391"   "Prog_397"   "Prog_403"   "Prog_409"   "Prog_415"   "Prog_421"   "Prog_379"   "Prog_386"   "Prog_392"   "Prog_398"   "Prog_404"  
 [925] "Prog_410"   "Prog_416"   "Prog_422"   "Prog_380"   "Prog_387"   "Prog_393"   "Prog_399"   "Prog_405"   "Prog_411"   "Prog_417"   "Prog_423"   "Prog_381"   "Prog_388"   "Prog_394"  
 [939] "Prog_400"   "Prog_406"   "Prog_412"   "Prog_418"   "Prog_424"   "Prog_382"   "Prog_389"   "Prog_395"   "Prog_401"   "Prog_407"   "Prog_413"   "Prog_419"   "Prog_425"   "Prog_383"  
 [953] "Prog_390"   "Prog_396"   "Prog_402"   "Prog_408"   "Prog_414"   "Prog_420"   "Prog_426"   "Prog_384"   "HSPC_433"   "HSPC_439"   "HSPC_445"   "HSPC_451"   "HSPC_457"   "HSPC_463"  
 [967] "LT-HSC_109" "HSPC_427"   "HSPC_434"   "HSPC_440"   "HSPC_446"   "HSPC_452"   "HSPC_458"   "HSPC_464"   "LT-HSC_110" "HSPC_428"   "HSPC_435"   "HSPC_441"   "HSPC_447"   "HSPC_453"  
 [981] "HSPC_459"   "HSPC_465"   "LT-HSC_111" "HSPC_429"   "HSPC_436"   "HSPC_442"   "HSPC_448"   "HSPC_454"   "HSPC_460"   "HSPC_466"   "LT-HSC_112" "HSPC_430"   "HSPC_437"   "HSPC_443"  
 [995] "HSPC_449"   "HSPC_455"   "HSPC_461"   "HSPC_467"   "LT-HSC_113" "HSPC_431"   "HSPC_438"   "HSPC_444"   "HSPC_450"   "HSPC_456"   "HSPC_462"   "HSPC_468"   "LT-HSC_114" "HSPC_432"  
[1009] "Prog_433"   "Prog_439"   "Prog_445"   "Prog_451"   "Prog_457"   "Prog_463"   "LT-HSC_115" "Prog_427"   "Prog_434"   "Prog_440"   "Prog_446"   "Prog_452"   "Prog_458"   "Prog_464"  
[1023] "LT-HSC_116" "Prog_428"   "Prog_435"   "Prog_441"   "Prog_447"   "Prog_453"   "Prog_459"   "Prog_465"   "LT-HSC_117" "Prog_429"   "Prog_436"   "Prog_442"   "Prog_448"   "Prog_454"  
[1037] "Prog_460"   "Prog_466"   "LT-HSC_118" "Prog_430"   "Prog_437"   "Prog_443"   "Prog_449"   "Prog_455"   "Prog_461"   "Prog_467"   "LT-HSC_119" "Prog_431"   "Prog_438"   "Prog_444"  
[1051] "Prog_450"   "Prog_456"   "Prog_462"   "Prog_468"   "LT-HSC_120" "Prog_432"   "HSPC_475"   "HSPC_481"   "HSPC_487"   "HSPC_493"   "HSPC_499"   "HSPC_505"   "LT-HSC_121" "HSPC_469"  
[1065] "HSPC_476"   "HSPC_482"   "HSPC_488"   "HSPC_494"   "HSPC_500"   "HSPC_506"   "LT-HSC_122" "HSPC_470"   "HSPC_477"   "HSPC_483"   "HSPC_489"   "HSPC_495"   "HSPC_501"   "HSPC_507"  
[1079] "LT-HSC_123" "HSPC_471"   "HSPC_478"   "HSPC_484"   "HSPC_490"   "HSPC_496"   "HSPC_502"   "HSPC_508"   "LT-HSC_124" "HSPC_472"   "HSPC_479"   "HSPC_485"   "HSPC_491"   "HSPC_497"  
[1093] "HSPC_503"   "HSPC_509"   "LT-HSC_125" "HSPC_473"   "HSPC_480"   "HSPC_486"   "HSPC_492"   "HSPC_498"   "HSPC_504"   "HSPC_510"   "LT-HSC_126" "HSPC_474"   "Prog_475"   "Prog_481"  
[1107] "Prog_487"   "Prog_493"   "Prog_499"   "Prog_505"   "LT-HSC_127" "Prog_469"   "Prog_476"   "Prog_482"   "Prog_488"   "Prog_494"   "Prog_500"   "Prog_506"   "LT-HSC_128" "Prog_470"  
[1121] "Prog_477"   "Prog_483"   "Prog_489"   "Prog_495"   "Prog_501"   "Prog_507"   "LT-HSC_129" "Prog_471"   "Prog_478"   "Prog_484"   "Prog_490"   "Prog_496"   "Prog_502"   "Prog_508"  
[1135] "LT-HSC_130" "Prog_472"   "Prog_479"   "Prog_485"   "Prog_491"   "Prog_497"   "Prog_503"   "Prog_509"   "LT-HSC_131" "Prog_473"   "Prog_480"   "Prog_486"   "Prog_492"   "Prog_498"  
[1149] "Prog_504"   "Prog_510"   "LT-HSC_132" "Prog_474"   "HSPC_517"   "HSPC_523"   "HSPC_529"   "HSPC_535"   "HSPC_541"   "HSPC_547"   "LT-HSC_133" "HSPC_511"   "HSPC_518"   "HSPC_524"  
[1163] "HSPC_530"   "HSPC_536"   "HSPC_542"   "HSPC_548"   "LT-HSC_134" "HSPC_512"   "HSPC_519"   "HSPC_525"   "HSPC_531"   "HSPC_537"   "HSPC_543"   "HSPC_549"   "LT-HSC_135" "HSPC_513"  
[1177] "HSPC_520"   "HSPC_526"   "HSPC_532"   "HSPC_538"   "HSPC_544"   "HSPC_550"   "LT-HSC_136" "HSPC_514"   "HSPC_521"   "HSPC_527"   "HSPC_533"   "HSPC_539"   "HSPC_545"   "HSPC_551"  
[1191] "LT-HSC_137" "HSPC_515"   "HSPC_522"   "HSPC_528"   "HSPC_534"   "HSPC_540"   "HSPC_546"   "HSPC_552"   "LT-HSC_138" "HSPC_516"   "Prog_517"   "Prog_523"   "Prog_529"   "Prog_535"  
[1205] "Prog_541"   "Prog_547"   "LT-HSC_139" "Prog_511"   "Prog_518"   "Prog_524"   "Prog_530"   "Prog_536"   "Prog_542"   "Prog_548"   "LT-HSC_140" "Prog_512"   "Prog_519"   "Prog_525"  
[1219] "Prog_531"   "Prog_537"   "Prog_543"   "Prog_549"   "LT-HSC_141" "Prog_513"   "Prog_520"   "Prog_526"   "Prog_532"   "Prog_538"   "Prog_544"   "Prog_550"   "LT-HSC_142" "Prog_514"  
[1233] "Prog_521"   "Prog_527"   "Prog_533"   "Prog_539"   "Prog_545"   "Prog_551"   "LT-HSC_143" "Prog_515"   "Prog_522"   "Prog_528"   "Prog_534"   "Prog_540"   "Prog_546"   "Prog_552"  
[1247] "LT-HSC_144" "Prog_516"   "HSPC_559"   "HSPC_565"   "HSPC_571"   "HSPC_577"   "HSPC_583"   "HSPC_589"   "LT-HSC_145" "HSPC_553"   "HSPC_560"   "HSPC_566"   "HSPC_572"   "HSPC_578"  
[1261] "HSPC_584"   "HSPC_590"   "LT-HSC_146" "HSPC_554"   "HSPC_561"   "HSPC_567"   "HSPC_573"   "HSPC_579"   "HSPC_585"   "HSPC_591"   "LT-HSC_147" "HSPC_555"   "HSPC_562"   "HSPC_568"  
[1275] "HSPC_574"   "HSPC_580"   "HSPC_586"   "HSPC_592"   "LT-HSC_148" "HSPC_556"   "HSPC_563"   "HSPC_569"   "HSPC_575"   "HSPC_581"   "HSPC_587"   "HSPC_593"   "LT-HSC_149" "HSPC_557"  
[1289] "HSPC_564"   "HSPC_570"   "HSPC_576"   "HSPC_582"   "HSPC_588"   "HSPC_594"   "LT-HSC_150" "HSPC_558"   "Prog_559"   "Prog_565"   "Prog_571"   "Prog_577"   "Prog_583"   "Prog_589"  
[1303] "LT-HSC_151" "Prog_553"   "Prog_560"   "Prog_566"   "Prog_572"   "Prog_578"   "Prog_584"   "Prog_590"   "LT-HSC_152" "Prog_554"   "Prog_561"   "Prog_567"   "Prog_573"   "Prog_579"  
[1317] "Prog_585"   "Prog_591"   "LT-HSC_153" "Prog_555"   "Prog_562"   "Prog_568"   "Prog_574"   "Prog_580"   "Prog_586"   "Prog_592"   "LT-HSC_154" "Prog_556"   "Prog_563"   "Prog_569"  
[1331] "Prog_575"   "Prog_581"   "Prog_587"   "Prog_593"   "LT-HSC_155" "Prog_557"   "Prog_564"   "Prog_570"   "Prog_576"   "Prog_582"   "Prog_588"   "Prog_594"   "LT-HSC_156" "Prog_558"  
[1345] "HSPC_601"   "HSPC_607"   "HSPC_613"   "HSPC_619"   "HSPC_625"   "HSPC_631"   "LT-HSC_157" "HSPC_595"   "HSPC_602"   "HSPC_608"   "HSPC_614"   "HSPC_620"   "HSPC_626"   "HSPC_632"  
[1359] "LT-HSC_158" "HSPC_596"   "HSPC_603"   "HSPC_609"   "HSPC_615"   "HSPC_621"   "HSPC_627"   "HSPC_633"   "LT-HSC_159" "HSPC_597"   "HSPC_604"   "HSPC_610"   "HSPC_616"   "HSPC_622"  
[1373] "HSPC_628"   "HSPC_634"   "LT-HSC_160" "HSPC_598"   "HSPC_605"   "HSPC_611"   "HSPC_617"   "HSPC_623"   "HSPC_629"   "HSPC_635"   "LT-HSC_161" "HSPC_599"   "HSPC_606"   "HSPC_612"  
[1387] "HSPC_618"   "HSPC_624"   "HSPC_630"   "HSPC_636"   "LT-HSC_162" "HSPC_600"   "Prog_601"   "Prog_607"   "Prog_613"   "Prog_619"   "Prog_625"   "Prog_631"   "LT-HSC_163" "Prog_595"  
[1401] "Prog_602"   "Prog_608"   "Prog_614"   "Prog_620"   "Prog_626"   "Prog_632"   "LT-HSC_164" "Prog_596"   "Prog_603"   "Prog_609"   "Prog_615"   "Prog_621"   "Prog_627"   "Prog_633"  
[1415] "LT-HSC_165" "Prog_597"   "Prog_604"   "Prog_610"   "Prog_616"   "Prog_622"   "Prog_628"   "Prog_634"   "LT-HSC_166" "Prog_598"   "Prog_605"   "Prog_611"   "Prog_617"   "Prog_623"  
[1429] "Prog_629"   "Prog_635"   "LT-HSC_167" "Prog_599"   "Prog_606"   "Prog_612"   "Prog_618"   "Prog_624"   "Prog_630"   "Prog_636"   "LT-HSC_168" "Prog_600"   "HSPC_643"   "HSPC_649"  
[1443] "HSPC_655"   "HSPC_661"   "HSPC_667"   "HSPC_673"   "LT-HSC_169" "HSPC_637"   "HSPC_644"   "HSPC_650"   "HSPC_656"   "HSPC_662"   "HSPC_668"   "HSPC_674"   "LT-HSC_170" "HSPC_638"  
[1457] "HSPC_645"   "HSPC_651"   "HSPC_657"   "HSPC_663"   "HSPC_669"   "HSPC_675"   "LT-HSC_171" "HSPC_639"   "HSPC_646"   "HSPC_652"   "HSPC_658"   "HSPC_664"   "HSPC_670"   "HSPC_676"  
[1471] "LT-HSC_172" "HSPC_640"   "HSPC_647"   "HSPC_653"   "HSPC_659"   "HSPC_665"   "HSPC_671"   "HSPC_677"   "LT-HSC_173" "HSPC_641"   "HSPC_648"   "HSPC_654"   "HSPC_660"   "HSPC_666"  
[1485] "HSPC_672"   "HSPC_678"   "LT-HSC_174" "HSPC_642"   "Prog_643"   "Prog_649"   "Prog_655"   "Prog_661"   "Prog_667"   "Prog_673"   "LT-HSC_175" "Prog_637"   "Prog_644"   "Prog_650"  
[1499] "Prog_656"   "Prog_662"   "Prog_668"   "Prog_674"   "LT-HSC_176" "Prog_638"   "Prog_645"   "Prog_651"   "Prog_657"   "Prog_663"   "Prog_669"   "Prog_675"   "LT-HSC_177" "Prog_639"  
[1513] "Prog_646"   "Prog_652"   "Prog_658"   "Prog_664"   "Prog_670"   "Prog_676"   "LT-HSC_178" "Prog_640"   "Prog_647"   "Prog_653"   "Prog_659"   "Prog_665"   "Prog_671"   "Prog_677"  
[1527] "LT-HSC_179" "Prog_641"   "Prog_648"   "Prog_654"   "Prog_660"   "Prog_666"   "Prog_672"   "Prog_678"   "LT-HSC_180" "Prog_642"   "HSPC_685"   "HSPC_691"   "HSPC_697"   "HSPC_703"  
[1541] "HSPC_709"   "HSPC_715"   "LT-HSC_181" "HSPC_679"   "HSPC_686"   "HSPC_692"   "HSPC_698"   "HSPC_704"   "HSPC_710"   "HSPC_716"   "LT-HSC_182" "HSPC_680"   "HSPC_687"   "HSPC_693"  
[1555] "HSPC_699"   "HSPC_705"   "HSPC_711"   "HSPC_717"   "LT-HSC_183" "HSPC_681"   "HSPC_688"   "HSPC_694"   "HSPC_700"   "HSPC_706"   "HSPC_712"   "HSPC_718"   "LT-HSC_184" "HSPC_682"  
[1569] "HSPC_689"   "HSPC_695"   "HSPC_701"   "HSPC_707"   "HSPC_713"   "HSPC_719"   "LT-HSC_185" "HSPC_683"   "HSPC_690"   "HSPC_696"   "HSPC_702"   "HSPC_708"   "HSPC_714"   "HSPC_720"  
[1583] "LT-HSC_186" "HSPC_684"   "Prog_685"   "Prog_691"   "Prog_697"   "Prog_703"   "Prog_709"   "Prog_715"   "LT-HSC_187" "Prog_679"   "Prog_686"   "Prog_692"   "Prog_698"   "Prog_704"  
[1597] "Prog_710"   "Prog_716"   "LT-HSC_188" "Prog_680"   "Prog_687"   "Prog_693"   "Prog_699"   "Prog_705"   "Prog_711"   "Prog_717"   "LT-HSC_189" "Prog_681"   "Prog_688"   "Prog_694"  
[1611] "Prog_700"   "Prog_706"   "Prog_712"   "Prog_718"   "LT-HSC_190" "Prog_682"   "Prog_689"   "Prog_695"   "Prog_701"   "Prog_707"   "Prog_713"   "Prog_719"   "LT-HSC_191" "Prog_683"  
[1625] "Prog_690"   "Prog_696"   "Prog_702"   "Prog_708"   "Prog_714"   "Prog_720"   "LT-HSC_192" "Prog_684"   "HSPC_727"   "HSPC_733"   "HSPC_739"   "HSPC_745"   "HSPC_751"   "HSPC_757"  
[1639] "LT-HSC_193" "HSPC_721"   "HSPC_728"   "HSPC_734"   "HSPC_740"   "HSPC_746"   "HSPC_752"   "HSPC_758"   "LT-HSC_194" "HSPC_722"   "HSPC_729"   "HSPC_735"   "HSPC_741"   "HSPC_747"  
[1653] "HSPC_753"   "HSPC_759"   "LT-HSC_195" "HSPC_723"   "HSPC_730"   "HSPC_736"   "HSPC_742"   "HSPC_748"   "HSPC_754"   "HSPC_760"   "LT-HSC_196" "HSPC_724"   "HSPC_731"   "HSPC_737"  
[1667] "HSPC_743"   "HSPC_749"   "HSPC_755"   "HSPC_761"   "LT-HSC_197" "HSPC_725"   "HSPC_732"   "HSPC_738"   "HSPC_744"   "HSPC_750"   "HSPC_756"   "HSPC_762"   "LT-HSC_198" "HSPC_726"  
[1681] "Prog_727"   "Prog_733"   "Prog_739"   "Prog_745"   "Prog_751"   "Prog_757"   "LT-HSC_199" "Prog_721"   "Prog_728"   "Prog_734"   "Prog_740"   "Prog_746"   "Prog_752"   "Prog_758"  
[1695] "LT-HSC_200" "Prog_722"   "Prog_729"   "Prog_735"   "Prog_741"   "Prog_747"   "Prog_753"   "Prog_759"   "LT-HSC_201" "Prog_723"   "Prog_730"   "Prog_736"   "Prog_742"   "Prog_748"  
[1709] "Prog_754"   "Prog_760"   "LT-HSC_202" "Prog_724"   "Prog_731"   "Prog_737"   "Prog_743"   "Prog_749"   "Prog_755"   "Prog_761"   "LT-HSC_203" "Prog_725"   "Prog_732"   "Prog_738"  
[1723] "Prog_744"   "Prog_750"   "Prog_756"   "Prog_762"   "LT-HSC_204" "Prog_726"   "HSPC_769"   "HSPC_775"   "HSPC_781"   "HSPC_787"   "HSPC_793"   "HSPC_799"   "LT-HSC_205" "HSPC_763"  
[1737] "HSPC_770"   "HSPC_776"   "HSPC_782"   "HSPC_788"   "HSPC_794"   "HSPC_800"   "LT-HSC_206" "HSPC_764"   "HSPC_771"   "HSPC_777"   "HSPC_783"   "HSPC_789"   "HSPC_795"   "HSPC_801"  
[1751] "LT-HSC_207" "HSPC_765"   "HSPC_772"   "HSPC_778"   "HSPC_784"   "HSPC_790"   "HSPC_796"   "HSPC_802"   "LT-HSC_208" "HSPC_766"   "HSPC_773"   "HSPC_779"   "HSPC_785"   "HSPC_791"  
[1765] "HSPC_797"   "HSPC_803"   "LT-HSC_209" "HSPC_767"   "HSPC_774"   "HSPC_780"   "HSPC_786"   "HSPC_792"   "HSPC_798"   "HSPC_804"   "LT-HSC_210" "HSPC_768"   "Prog_769"   "Prog_775"  
[1779] "Prog_781"   "Prog_787"   "Prog_793"   "Prog_799"   "LT-HSC_211" "Prog_763"   "Prog_770"   "Prog_776"   "Prog_782"   "Prog_788"   "Prog_794"   "Prog_800"   "LT-HSC_212" "Prog_764"  
[1793] "Prog_771"   "Prog_777"   "Prog_783"   "Prog_789"   "Prog_795"   "Prog_801"   "LT-HSC_213" "Prog_765"   "Prog_772"   "Prog_778"   "Prog_784"   "Prog_790"   "Prog_796"   "Prog_802"  
[1807] "LT-HSC_214" "Prog_766"   "Prog_773"   "Prog_779"   "Prog_785"   "Prog_791"   "Prog_797"   "Prog_803"   "LT-HSC_215" "Prog_767"   "Prog_774"   "Prog_780"   "Prog_786"   "Prog_792"  
[1821] "Prog_798"   "Prog_804"   "LT-HSC_216" "Prog_768"   "HSPC_811"   "HSPC_817"   "HSPC_823"   "HSPC_829"   "HSPC_835"   "HSPC_841"   "HSPC_847"   "HSPC_805"   "HSPC_812"   "HSPC_818"  
[1835] "HSPC_824"   "HSPC_830"   "HSPC_836"   "HSPC_842"   "HSPC_848"   "HSPC_806"   "HSPC_813"   "HSPC_819"   "HSPC_825"   "HSPC_831"   "HSPC_837"   "HSPC_843"   "HSPC_849"   "HSPC_807"  
[1849] "HSPC_814"   "HSPC_820"   "HSPC_826"   "HSPC_832"   "HSPC_838"   "HSPC_844"   "HSPC_850"   "HSPC_808"   "HSPC_815"   "HSPC_821"   "HSPC_827"   "HSPC_833"   "HSPC_839"   "HSPC_845"  
[1863] "HSPC_851"   "HSPC_809"   "HSPC_816"   "HSPC_822"   "HSPC_828"   "HSPC_834"   "HSPC_840"   "HSPC_846"   "HSPC_852"   "HSPC_810"   "Prog_811"   "Prog_817"   "Prog_823"   "Prog_829"  
[1877] "Prog_835"   "Prog_841"   "Prog_847"   "Prog_805"   "Prog_812"   "Prog_818"   "Prog_824"   "Prog_830"   "Prog_836"   "Prog_842"   "Prog_848"   "Prog_806"   "Prog_813"   "Prog_819"  
[1891] "Prog_825"   "Prog_831"   "Prog_837"   "Prog_843"   "Prog_849"   "Prog_807"   "Prog_814"   "Prog_820"   "Prog_826"   "Prog_832"   "Prog_838"   "Prog_844"   "Prog_850"   "Prog_808"  
[1905] "Prog_815"   "Prog_821"   "Prog_827"   "Prog_833"   "Prog_839"   "Prog_845"   "Prog_851"   "Prog_809"   "Prog_816"   "Prog_822"   "Prog_828"   "Prog_834"   "Prog_840"   "Prog_846"  
[1919] "Prog_852"   "Prog_810"  
ncol(sce_seurat)
[1] 1920
nrow(sce_seurat)
[1] 500
# cells and features access
head(Seurat::Cells(sce_seurat))
[1] "HSPC_007" "HSPC_013" "HSPC_019" "HSPC_025" "HSPC_031" "HSPC_037"
head(rownames(sce_seurat))
[1] "ENSMUSG00000076609" "ENSMUSG00000021250" "ENSMUSG00000076617" "ENSMUSG00000075602" "ENSMUSG00000006389" "ENSMUSG00000041481"
# cell data access
head(sce_seurat[[]])
         orig.ident nCount_endogenous nFeature_endogenous gate broad broad.mpp fine fine.mpp ESLAM  HSC1 projected metrics.X__no_feature metrics.X__ambiguous metrics.X__too_low_aQual
HSPC_007       HSPC              1262                 381 HSPC  <NA>      <NA> <NA>     <NA> FALSE FALSE     FALSE                194829                 5022                        0
HSPC_013       HSPC              5903                 409 HSPC  LMPP      <NA> LMPP     <NA> FALSE FALSE     FALSE                110530                15271                        0
HSPC_019       HSPC              1002                 355 HSPC  LMPP      <NA> <NA>     <NA> FALSE FALSE     FALSE                 86825                 2708                        0
HSPC_025       HSPC             57990                 433 HSPC   MPP      MPP1 <NA>     <NA> FALSE FALSE     FALSE                212206               107278                        0
HSPC_031       HSPC            118570                 435 HSPC   MPP     STHSC <NA>     <NA> FALSE FALSE     FALSE                690411               227480                        0
HSPC_037       HSPC             53925                 445 HSPC   MPP     STHSC <NA>     <NA> FALSE FALSE     FALSE                242472               126874                        0
         metrics.X__not_aligned metrics.X__alignment_not_unique label sizeFactor
HSPC_007                5820455                               0  HSPC   0.027247
HSPC_013                1562724                               0  HSPC   0.090422
HSPC_019                1407254                               0  HSPC   0.019921
HSPC_025                1810368                               0  HSPC   0.521792
HSPC_031                6097116                               0  HSPC   1.106231
HSPC_037                2267894                               0  HSPC   0.514315
head(sce_seurat$label)
HSPC_007 HSPC_013 HSPC_019 HSPC_025 HSPC_031 HSPC_037 
  "HSPC"   "HSPC"   "HSPC"   "HSPC"   "HSPC"   "HSPC" 
# Counts access
Seurat::GetAssayData(object = sce_seurat, layer = "counts")[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
  [[ suppressing 10 column names 'HSPC_007', 'HSPC_013', 'HSPC_019' ... ]]
                                                            
ENSMUSG00000076609 30  16  7   17  19   11 1359   13  15   6
ENSMUSG00000021250  3   2  4    .   1    3  118    2  69  41
ENSMUSG00000076617 40  54 13   29 298  417 1107    9  16  15
ENSMUSG00000075602  3 248  7  537 640    7  300 1530 324 971
ENSMUSG00000006389  3   4  1 1171   6    2  271 5497 293 192
ENSMUSG00000041481  3   .  1    3   1    3  177    5   1 192
ENSMUSG00000024190  1  29  1 1733   1    5   20    3   1   1
ENSMUSG00000003949  4  15  3  175 915 1131   41 1465 330 258
ENSMUSG00000052684  1   4  2  144  82    5  142  578   .  78
ENSMUSG00000026358  .  27  6  344   3    9    3 3612 185   .
Seurat::GetAssayData(object = sce_seurat, layer = "data")[1:10, 1:10]
10 x 10 sparse Matrix of class "dgCMatrix"
  [[ suppressing 10 column names 'HSPC_007', 'HSPC_013', 'HSPC_019' ... ]]
                                                                                                 
ENSMUSG00000076609 10.1060  7.4753 8.4610  5.0695 4.18392  4.4846 13.0580  2.79276 4.9075  4.2566
ENSMUSG00000021250  6.7958  4.5310 7.6567  .      0.92901  2.7725  9.5341  0.93526 7.0711  6.9632
ENSMUSG00000076617 10.5207  9.2245 9.3522  5.8222 8.07886  9.6650 12.7621  2.35193 4.9976  5.5325
ENSMUSG00000075602  6.7958 11.4219 8.4610 10.0086 9.17877  3.8689 10.8791  9.44886 9.2939 11.5179
ENSMUSG00000006389  6.7958  5.4994 5.6780 11.1326 2.68343  2.2894 10.7325 11.29248 9.1491  9.1815
ENSMUSG00000041481  6.7958  .      5.6780  2.7548 0.92901  2.7725 10.1184  1.71395 1.5530  9.1815
ENSMUSG00000024190  5.2365  8.3297 5.6780 11.6979 0.92901  3.4225  6.9829  1.24388 1.5530  2.0068
ENSMUSG00000003949  7.2076  7.3828 7.2441  8.3940 9.69372 11.1033  8.0126  9.38632 9.3203  9.6071
ENSMUSG00000052684  5.2365  5.4994 6.6639  8.1136 6.23123  3.4225  9.8009  8.04786 .       7.8856
ENSMUSG00000026358  .       8.2269 8.2393  9.3669 1.89216  4.2094  4.3091 10.68693 8.4872  .     
# Embeddings 
head(Seurat::Embeddings(object = sce_seurat, reduction = "diffusion"))
              DC_1       DC_2     DC_3
HSPC_007        NA         NA       NA
HSPC_013        NA         NA       NA
HSPC_019        NA         NA       NA
HSPC_025 -0.011016 -0.0014016 0.016446
HSPC_031 -0.013784 -0.0110409 0.013672
HSPC_037 -0.013780 -0.0033031 0.033327
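
For comparison, the same kinds of accessors on the Bioconductor side can be sketched on a small toy object. This is only an illustration: the object `toy`, its gene and cell names, and its values are made up for this sketch and are not the lab's dataset. Only standard SingleCellExperiment accessors are used.

```r
library(SingleCellExperiment)

# Toy object (hypothetical values), standing in for a real SingleCellExperiment
set.seed(1)
m <- matrix(rpois(12, lambda = 5), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
dc <- matrix(rnorm(8), nrow = 4,
             dimnames = list(colnames(m), c("DC_1", "DC_2")))
toy <- SingleCellExperiment(
    assays = list(counts = m, logcounts = log2(m + 1)),
    colData = DataFrame(label = c("HSPC", "HSPC", "Prog", "Prog")),
    reducedDims = list(diffusion = dc)
)

# Dimensions: features are rows, cells are columns
ncol(toy)
nrow(toy)

# Cells and features access
colnames(toy)
rownames(toy)

# Cell data access
colData(toy)
toy$label

# Counts and normalized values access
counts(toy)
logcounts(toy)

# Embeddings
reducedDim(toy, "diffusion")
```

Roughly, `colData()` plays the role of `sce_seurat[[]]`, `counts()` and `logcounts()` that of `GetAssayData()` with the `counts` and `data` layers, and `reducedDim()` that of `Embeddings()`.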

6.4 Reading scRNAseq data

Question

Try to load the raw 10X single-cell RNA-seq data downloaded yesterday (from Lier et al.) into a SingleCellExperiment object using the DropletUtils package.

R
library(SingleCellExperiment)
sce <- DropletUtils::read10xCounts('~/Share/data_wrangling/counts/outs/filtered_feature_bc_matrix.h5')
sce
class: SingleCellExperiment 
dim: 32285 686 
metadata(1): Samples
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ... ENSMUSG00000095019 ENSMUSG00000095041
rowData names(3): ID Symbol Type
colnames: NULL
colData names(2): Sample Barcode
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
colData(sce)
DataFrame with 686 rows and 2 columns
                    Sample            Barcode
               <character>        <character>
1   ~/Share/data_wrangli.. AAACCTGAGGGTCTCC-1
2   ~/Share/data_wrangli.. AAACCTGCACGGCGTT-1
3   ~/Share/data_wrangli.. AAACGGGTCTGCAAGT-1
4   ~/Share/data_wrangli.. AAAGATGGTTCGAATC-1
5   ~/Share/data_wrangli.. AAAGCAAAGGTGGGTT-1
...                    ...                ...
682 ~/Share/data_wrangli.. TTTCCTCAGACTCGGA-1
683 ~/Share/data_wrangli.. TTTCCTCTCGGAGGTA-1
684 ~/Share/data_wrangli.. TTTGTCAAGTGTCCCG-1
685 ~/Share/data_wrangli.. TTTGTCACATGGAATA-1
686 ~/Share/data_wrangli.. TTTGTCAGTCGAGATG-1
rowData(sce)
DataFrame with 32285 rows and 3 columns
                                   ID      Symbol            Type
                          <character> <character>     <character>
ENSMUSG00000051951 ENSMUSG00000051951        Xkr4 Gene Expression
ENSMUSG00000089699 ENSMUSG00000089699      Gm1992 Gene Expression
ENSMUSG00000102331 ENSMUSG00000102331     Gm19938 Gene Expression
ENSMUSG00000102343 ENSMUSG00000102343     Gm37381 Gene Expression
ENSMUSG00000025900 ENSMUSG00000025900         Rp1 Gene Expression
...                               ...         ...             ...
ENSMUSG00000095523 ENSMUSG00000095523  AC124606.1 Gene Expression
ENSMUSG00000095475 ENSMUSG00000095475  AC133095.2 Gene Expression
ENSMUSG00000094855 ENSMUSG00000094855  AC133095.1 Gene Expression
ENSMUSG00000095019 ENSMUSG00000095019  AC234645.1 Gene Expression
ENSMUSG00000095041 ENSMUSG00000095041  AC149090.1 Gene Expression
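
Note that the object printed above has `colnames: NULL`: the cell barcodes are only stored in the `Barcode` column of `colData`. A minimal sketch of how one might use them as column names is shown below, on a toy object (`toy`, with made-up counts) rather than on the real file. As far as I know, `read10xCounts()` also accepts `col.names = TRUE` to do this at load time, but that is not what the lab's code does.

```r
library(SingleCellExperiment)

# Toy stand-in for the output of read10xCounts(): counts without column names
# (hypothetical values; gene IDs and barcodes borrowed from the output above)
counts <- matrix(
    c(0L, 3L, 1L, 0L, 2L, 5L), nrow = 2,
    dimnames = list(c("ENSMUSG00000051951", "ENSMUSG00000089699"), NULL)
)
toy <- SingleCellExperiment(
    assays = list(counts = counts),
    colData = DataFrame(
        Barcode = c("AAACCTGAGGGTCTCC-1", "AAACCTGCACGGCGTT-1", "AAACGGGTCTGCAAGT-1")
    )
)

# No column names yet
colnames(toy)

# Use the barcodes as column names, so that cells can be subset by name
colnames(toy) <- toy$Barcode
toy[, "AAACCTGCACGGCGTT-1"]
```

Barcodes are only unique within a sample, so this is safe for a single 10X run but would need a sample prefix when several runs are combined.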

Public single-cell RNA-seq data can be retrieved from within R directly, thanks to several data packages, for instance scRNAseq or HCAData.

Question

Check out the He et al., Genome Biol. 2020 paper. Can you find a way to load the scRNAseq data from this paper without having to leave the R console?

R
organs <- scRNAseq::HeOrganAtlasData(ensembl = TRUE)
loading from cache
Warning: Unable to map 1056 of 12021 requested IDs.
organs
class: SingleCellExperiment 
dim: 10965 84363 
metadata(0):
assays(1): counts
rownames(10965): ENSG00000225880 ENSG00000188976 ... ENSG00000273748 ENSG00000271254
rowData names(1): originalName
colnames(84363): AAACCTGAGACACTAA-1 AAACCTGAGCATGGCA-1 ... TTTGTCATCTCTAGGA-1 TTTGTCATCTGCGACG-1
colData names(10): Tissue nCount_RNA ... reclustered.broad reclustered.fine
reducedDimNames(1): TSNE
mainExpName: NULL
altExpNames(0):

The advantage of this approach is that one recovers a full-fledged SingleCellExperiment, often provided directly by the authors of the corresponding study. This means that a lot of information, such as batch ID, clustering, and cell annotation, may be readily available.
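
To give an idea of how such per-cell annotations can be summarised, here is a minimal base R sketch. The data frame `cells` and its column `cell_type` are hypothetical stand-ins for a `colData` table, and the values are invented, not taken from He et al.

```r
# Toy stand-in for a colData table (hypothetical cells and annotations)
cells <- data.frame(
    Tissue = c("Bladder", "Bladder", "Blood", "Blood", "Blood", "Liver"),
    cell_type = c("Myeloid", "CD4", "Myeloid", "Myeloid", "CD8", "Myeloid")
)

# Number of cells per tissue
per_tissue <- table(cells$Tissue)
per_tissue

# Cell types (rows) against tissues (columns)
by_type <- table(cells$cell_type, cells$Tissue)
by_type

# Fraction of each tissue made up by each cell type (each column sums to 1)
frac <- prop.table(by_type, margin = 2)
round(frac, 2)
```

The same calls work on the columns of a real `colData`, since `$` on a SingleCellExperiment returns the corresponding per-cell vector.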

Question

Check the data available for cells/features in the dataset from He et al.

R
colData(organs)
DataFrame with 84363 rows and 10 columns
                        Tissue nCount_RNA nFeature_RNA percent.mito RNA_snn_res.orig seurat_clusters Cell_type_in_each_tissue Cell_type_in_merged_data reclustered.broad   reclustered.fine
                   <character>  <integer>    <integer>    <numeric>        <integer>       <integer>              <character>              <character>       <character>        <character>
AAACCTGAGACACTAA-1     Bladder       1152          610    0.0789931                7              15         Monocyte Bladder                 Monocyte           Myeloid Classical_Mon_AQP9
AAACCTGAGCATGGCA-1     Bladder       3551         1415    0.0568854               14               8   Macrophage HLA-DPB1_..          Macrophage C1QB           Myeloid               cDC2
AAACCTGAGCTGAACG-1     Bladder       1842          632    0.0266015                4               1   T Cell IL7R_high Bla..              T Cell IL7R               CD4           TCM_KLF2
AAACCTGCAAGCTGGA-1     Bladder       1599          890    0.0268918               13              13   Fibroblast APOD_high..         Fibroblast PTGDS            FibSmo    Fib_CHI3L1_high
AAACCTGCATTACCTT-1     Bladder       1347          613    0.0541945                1               2   T Cell CCL5_high Bla..              T Cell XCL1                NA                 NA
...                        ...        ...          ...          ...              ...             ...                      ...                      ...               ...                ...
TTTGTCATCAGTTGAC-1     Trachea       3334         1461    0.0422915               13              36           B Cell Trachea             B Cell MS4A1      B_and_plasma       Memory_B_TNF
TTTGTCATCATCATTC-1     Trachea       6634         2271    0.0611999                7               8   Macrophage/Monocyte ..          Macrophage C1QB           Myeloid           Mac_SPP1
TTTGTCATCTACGAGT-1     Trachea       2255          769    0.2119734               13               2           B Cell Trachea              T Cell XCL1      B_and_plasma    Plasma_IGKV3-20
TTTGTCATCTCTAGGA-1     Trachea       1568          832    0.0554847                0              21   CD8 T Cell CCL5_high..              T Cell CCL5               CD8           TRM_PRR4
TTTGTCATCTGCGACG-1     Trachea       1907          999    0.0167803               18              20        NK/T Cell Trachea           NK/T Cell GNLY                NA                 NA
table(organs$Tissue)

         Bladder            Blood Common.bile.duct        Esophagus            Heart            Liver       Lymph.node           Marrow           Muscle           Rectum             Skin 
            7572             1407             3160             9117             7881             2839             7771             3230             5732             6280             7710 
 Small.intestine           Spleen          Stomach          Trachea 
            4312             4512             5318             7522 
table(organs$reclustered.fine, organs$Tissue)
                                               
                                                Bladder Blood Common.bile.duct Esophagus Heart Liver Lymph.node Marrow Muscle Rectum Skin Small.intestine Spleen Stomach Trachea
  Absorptive Cell                                     0     0                0         0     0     0          0      0      0      0    0             170      0       0       0
  Basal Epithelial Cell APOE_high                     4     0                0         3     0     0          0      0      0      0  504               1      0       0       1
  Basal Epithelial Cell IFITM1_high                   0     0                1         2     0     0          0      0      0      0    0               0      0       0     441
  Basal Epithelial Cell MMP10_high                    0     0                0         0     0     0          0      0      0      0    0               0      0       0    1747
  Basal Epithelial Cell POSTN_high                    0     0                0         0     0     0          0      0      0      0  545               0      0       0       0
  Basal Epithelial Cell SERPINB3_high                 0     0                0         0     0     0          0      0      0      0    0               0      0       0     839
  BEC_ACKR1                                         105     0                2       177   589     2          0      0    179     38   60               0      0       3      55
  BEC_APOC1                                           0     0                0         0     3     2          0      0     49      0    0               0      0       0       0
  BEC_CA4                                            13     0                1        13   824     9          0      0     53      6    3               0      0       1       7
  BEC_CTSC                                           38     0                0        66    38     1          0      0     12      1 1058               0      0       0      70
  BEC_FABP4                                           3     0                0         1    22     0          0      0    740      1    1               0      0       0       2
  BEC_IGFBP3                                         19     0               13        63   201    27          0      0     41     18  245               0      0       7      56
  BEC_PHLDA1                                         80     0                6       449    36     1          0      0      2      3   56               0      0       1      29
  BEC_PRSS23                                          5     0               74         0     0     3          0      0      0      2    2               0      0       1       5
  BEC_RTEL1-TNFRSF6B                                 15     0                2        25    14     3          0      0      3     10   11               0      0       7     642
  BEC_TIMP1                                           0     0                0         0     0   157          0      0      0      0    0               0      0       1       0
  BEC_TNFRSF4                                         0     0                0         4    22     3          0      0     65      0    3               0      0       0       1
  cDC1                                               36     0                4         1     1     0         12      0      0      2    1               0      1       0       7
  cDC2                                              122     0               12       121   117     3          0      2      4     62   13               1      3       5      18
  Cholangiocyte FXYD2_high                            0     0              625         0     0     6          0      0      0      0    0               0      0       0       0
  Cholangiocyte HIST1H2AM_high                        0     0               83         0     0     0          0      0      0      0    0               0      0       0       0
  Classical_Mon_AQP9                                217     0               22        11    21   277          6      1     24     38   13               3     91      37     106
  Classical_Mon_HCAR3                                13     0              307         2     3    23          3      0      0      4    3               0      0       2      33
  Classical_Mon_S100A12                              26     1               12         3     9   130          1      0      7      3    9               0     81      21      11
  Classical_Mon_S100A8                               11    69               67         0     1    86          0     45      1      3    3               0      9       2      14
  Enterocyte APOA1_high                               0     0                0         0     0     0          0      0      0      0    0             654      0       0       3
  Enterocyte APOB_high                                0     0                0         0     0     0          0      0      0      0    0             586      0       0       4
  Enterocyte PCK1_high                                0     0                0         0     0     0          0      0      0      0    0             665      0       0       4
  Enterocyte PRAP1_high                               0     0                0         0     0     0          0      0      0      0    0             556      0       0       1
  Enterocyte RBP2_high                                0     0                0         0     0     0          0      0      0      0    0             412      0       0       1
  Epithelial Cell DCD_high                            1     0                2        41     0     0          0      0      0      0   73               0      0       1       3
  Epithelial Cell FABP4_high                        116     0                1         0     0     0          0      0      0      0    0               0      0       7       5
  Fib_ANGPTL7_high                                   82     0               30        86    31     0          0      0      4      2   12               0      0       0      10
  Fib_APOD_high                                       0     0                0         4     0     0          0      0    850      0    0               0      0       0       0
  Fib_C1QTNF3_high                                    7     0                1      1291     2     0          0      0      5      3    2               0      0       4       0
  Fib_CHI3L1_high                                   167     0               40        76     6     0          0      0      1     44    0               0      0       9     329
  Fib_CRABP1_high                                     0     0                0         1     0     0          0      0      0      0    0               0      0       0     281
  Fib_CXCL14_high                                    11     0                0         4     4     0          0      0      4      3  687               0      0       0       4
  Fib_IGFBP3_high                                   635     0                0         2     0     0          0      0      0      6    0               0      0       2       0
  Fib_IGFBP6_high                                     2     0                0       573     0     0          0      0      0      2    0               0      0       8       0
  Fib_MT_high                                        22     0                1        74     4     0          0      0      0     21    8               0      0      10      13
  Fib_PCOLCE2_high                                    4     0                0         0  1251     0          0      0      1      2    2               0      0       3       6
  Fib_PTGDS_high                                      1     0                2       781     1     0          0      0      0      2    0               0      0       0       1
  Fib_PTN_high                                        1     0                0        10  2467     1          0      0      1      4    6               0      0       2       7
  Fib_SFRP2_high                                      5     0                0         0     0     0          0      0      0    328    0               0      0      10       0
  Fib_SLIT3_high                                   1110     0                7       137    20     0          0      0      0     16    2               0      0      23       3
  FibSmo_ADAMDEC1_high                                1     0                0         1     0     0          0      0      0     80    0               0      0      30       1
  FibSmo_FOXF1_high                                 199     0                0         0     0     0          0      0      0      0    0               0      0       0       0
  FibSmo_LINC01082_high                             922     0                0         0     0     0          0      0      0     10    0               0      0       2       0
  FibSmo_SLC14A1_high                               841     0                1         0     0     0          0      0      0      0    0               0      0       1       0
  FibSmo_THBS4_high                                   1     0                0         2     0     0          0      0      0    731    0               0      0       4       1
  Follicular Epithelial Cell                          0     0                0         1     0     0          0      0      0      0  645               0      0       0       0
  Goblet Cell                                         0     0                0         0     0     0          0      0      0    205    0              47      0       9      22
  Granular Epithelial Cell                            0     0                0         0     0     0          0      0      0      0  224               0      0       0       0
  Hepatic Oval Cell                                   0     0                4         0     0    48          0      0      0      0    0               0      0       0       0
  High Proliferation Epithelial Cell TOP2A_high       0     0                0         0     0     0          0      0      0      0  229               1      0       0       0
  High Proliferation Epithelial Cell UBE2C_high       0     0                1       416     0     0          0      0      0      0    6               0      0       0       2
  IEL_TMIGD2                                          0     0                0         0     0     0          0      0      0      4    0               1      0     443       0
  IEL_TRBV7-3                                         0     0                1         0     0     0          0      0      0      0    0               0      0       0     243
  Intermediate_Mon_CCL20                              0     0                0         0     0     0          0      0      0    232    0               0      0       1       3
  Intermediate_Mon_FN1                               33     0                2         1     9     7          0      0      4    432    0               4      1       4      50
  Langerhans                                          0     0                0         0     0     0          0      0      0      0   72               0      0       0       0
  LEC_CCL21                                          14     0                0        48     3    13          0      0      0      5   26               0      0       0      13
  LEC_FCN3                                            0     0                0         0     0   129          0      0      0      0    0               0      0       0       0
  Mac_APOE                                            5     0               59         0     0     3          0      0      0      0    1               0      0       1       5
  Mac_FTL                                            15     2                4         2     3     5          0      2      1      1    0               2      4       0       5
  Mac_HIST1H4C                                       10     0                2        12    11     0          0      0      1    110    2               0      0       0       1
  Mac_IGFBP7                                          2     0                1         3    44     0          0      0     12      0    0               0      0       0       0
  Mac_RNASE1                                        153     0                5       130   284     4          1      0     12     54   24               0      0       1      10
  Mac_SDC3                                           23     0                2         3     3   385          3      6     11      1    5               0     99       2       5
  Mac_SPARCL1                                         3     0                0         4    90     2          0      0     23      1    1               0      0       0       2
  Mac_SPP1                                           28     0                3         1    83    12          0      0      2      9    0               0      0       0      31
  MAIT_SLC4A10                                        4     0                7         0     0    53         17      0      0      7    1               7    126      23       2
  Memory_B_AC079767.4                                 0     3                0         1     0     1         36     11      0      6    0               7    292      27       3
  Memory_B_CRIP1                                      3     5                0         0     0     1         12      9      0    462    0              64    134     680      38
  Memory_B_HSPA8                                     13     1                0         0     0    25       1579     11      0     17    0               2     86      14     104
  Memory_B_LTB                                        1   173                0         0     0     5         43    376      0      4    0               3    127      49       4
  Memory_B_TNF                                       12     0                0         0     0     1        826      2      0      8    0               1     15       7      38
  Naive_B_ly_TCL1A                                    7     3                0         0     0    14       1992     10      0     12    0               3    162      30      20
  Naive_B_TCL1A                                       1    12                0         0     0     0         59     44      0     22    0              11    343     221       1
  Non_Classicial_Mon_FCGR3A                          57    11                3         4    13   152          0     85     68      5    6               0     20      11      12
  Pit Mucosal Epithelial Cell                         1     0                0         0     0     0          0      0      0      0    0               0      0     446       0
  Plasma_HIST1H4C                                     0    17                0         0     0     0          5     20      0      1    0               0    155       8       1
  Plasma_IGHG1                                        1     7                0         2     0     2          2     12      0      3    0               0    122      14       2
  Plasma_IGKV1-39                                     0     0                0         3     0     0          2      2      0     72    0               1      6      59       2
  Plasma_IGKV3-15                                     0     1                0         0     0     0          1      1      0      1    0               2      4      71       2
  Plasma_IGKV3-20                                     1     4                0        55     0     9         47      5      0    268    0              20     64     162      20
  Plasma_IGLC2                                        0     1                0        36     0     1          5      2      0    249    0               3      6      54       5
  Plasma_IGLV3-1                                      0     0                0         0     0     2          3      0      0      3    0               7      6     147       2
  Secretory Epithelial Cell                           0     0                0        13     0     0          0      0      0      0    0               0      0       0     311
  Simple Epithelial Cell                              0     0                1         0     0     0          0      0      0    535    0               0      0       1       1
  Smo_CCL21_high                                    122     0               15       109    96     1          0      0      9      8  298               0      0       5      94
  Smo_CREM_high                                      11     0               36         3     4    22          0      0      0    110    3               0      0       5       4
  Smo_FABP4_high                                     21     0                2         6   150     0          0      0    460      0    1               0      0       0       0
  Smo_MYH11_high                                    154     0                8       232   650     7          0      0    122     44  247               0      0       1     105
  Spinous Epithelial Cell DMKN_high                   0     0                0         0     0     0          0      0      0      0  887               1      0       0       0
  Spinous Epithelial Cell KRT1_high                   0     0                0         0     0     0          0      0      0      0  464               0      0       0       1
  Spinous Epithelial Cell KRTDAP_high                 0     0                0         0     0     0          0      0      0      0  785               0      0       0       0
  Squamous Epithelial Cell DST_high                   0     0                0       983     0     0          0      0      0      0    0               0      0       0       0
  Squamous Epithelial Cell FABP5_high                 0     0                0       667     0     0          0      0      0      0    0               0      0       0       0
  Squamous Epithelial Cell HSPA1A_high                0     0                0       422     0     0          0      0      0      0    2               0      0       0       0
  Squamous Epithelial Cell KRT13_high                 0     0                0       964     0     0          0      0      0      0    0               0      0       0       0
  Squamous Epithelial Cell KRT4_high                  0     0                0       653     0     0          0      0      0      0    0               0      0       0       0
  Stem Cell                                           0     0                1         0     0     0          0      0      0      0   39              62      0       0       0
  TCM_KLF2                                            8     6                0         0     4     3         12      8      9    181    1               7    705     170      14
  TCM_STMN1                                          18     0                6         1     0    14       1341      1      0      2    0               0     42       1      36
  TEFF_GNLY                                           0    20                0         0     0    12          0    196      8      1    1               0      6       1       3
  TEFF_MT1E                                          37     0               33         3     1    38          3      1     95     24    3               1     21       5      41
  TEFF_TRBV4-2                                        4     3                6         1     1    48          0    186      8     11    1               0    167      11       2
  TEM_GIMAP4                                          3    43                9         0     2    59          9    654      3      2    0               3     26       2       2
  TEM_GZMK                                           10    69                7         0     8    50         37    246      5    119    7               2    495     121      14
  TEM_INFG                                           85     0               47         0    11   329        329      1      6      1    2               0     35       8      46
  Th1_NKG7                                           35     8               24         2     1    29          2     45     61      2    4               0     38       0      31
  TN_LINC00861                                        0   346                0         0     0    11         22    186      2      1    0               2     40       3       1
  TN_SELL                                             0   210                0         0     0     3          9    109      0      0    0               0     15       0       0
  TN/CM_GADD45B                                       1     0                9         0     0    11        179      0      0      0    0               0      2       2       3
  TN/CM_ITGB1                                         8   283               13         0     7    11         17    225      1      3    0               1     40       3       5
  TN/CM_KLF2                                          0     0                0         0     0     1         10      1      2     33    1               0    260       8       1
  TN/CM_LEF1                                          8     0                2         0     0    12        582      0      0      0    0               0     15       0       7
  Treg_CTLA4                                          3     1                5         0     1     3         64      0      1      9    1               1     27     139      19
  TRM_GZMB                                            1     0                1         0     6     3          0      0      0     78    0              23      6     875       3
  TRM_H2AFZ                                         882     2               32         7    12    16         88      1      3      6    5               0      9       6      44
  TRM_HSPA1A                                         59     0             1072         6    11    18          4      2      2      9    3               5      0      10      43
  TRM_LMNA                                            1     0                0         0     1     4          1      0      0    374    0              13     15     113       9
  TRM_MT1E                                           42     1                1         1     0    20          0      0     22    189    0               0      2       2       3
  TRM_MT1X                                            8     0                1         0     0     1          0      0      0    300    0               4      5      51       0
  TRM_NABP1                                           1     0               54         0     0     0          1      0      0      0    2               0      0       0       2
  TRM_PRR4                                           13     0                1        19     5     3          2      0      4      0    2               0      2       7     894
  TRM_RGS1                                            0     0                1         0     0     1          1      0      0     10    0             565      8      91      18
  TRM_TNF                                           362     8              202        12    14    89        290      0     22     11   18               0     46       2     171
  TRM_TYMS                                            1     0                1         6     2    46          1      3      1      3    0               4      9       1       1
  Tuft Cell                                           0     0                0         0     0     0          0      0      0      2    0             221      0       5       0

6.5 Bonus

To compare the two approaches, try preparing both a SingleCellExperiment object and a Seurat object from scratch, using the matrix files generated in the previous lab. Read the documentation of the two packages to understand how to do this.
The next lab covers this in detail for everyone.
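
As a starting point, here is a minimal sketch of one possible way to do it. It assumes the matrix files follow the common Matrix Market layout (`matrix.mtx`, `features.tsv`, `barcodes.tsv`); those file names and the `data_dir` location are assumptions, not necessarily what the previous lab produced. To keep the example runnable on its own, it first writes a small toy dataset to a temporary folder; in practice, point `data_dir` at your own files and skip that step.

```r
library(Matrix)
library(SingleCellExperiment)
library(Seurat)

# --- Toy input (hypothetical stand-in for the previous lab's matrix files) ---
data_dir <- file.path(tempdir(), "toy_matrix")
dir.create(data_dir, showWarnings = FALSE)
set.seed(1)
toy <- Matrix(rpois(50 * 20, lambda = 1), nrow = 50, ncol = 20, sparse = TRUE)
writeMM(toy, file.path(data_dir, "matrix.mtx"))
writeLines(paste0("gene", 1:50), file.path(data_dir, "features.tsv"))
writeLines(paste0("cell", 1:20), file.path(data_dir, "barcodes.tsv"))

# --- Read the count matrix and attach gene / cell names ---
counts <- as(readMM(file.path(data_dir, "matrix.mtx")), "CsparseMatrix")
rownames(counts) <- readLines(file.path(data_dir, "features.tsv"))
colnames(counts) <- readLines(file.path(data_dir, "barcodes.tsv"))

# --- Bioconductor approach: counts stored as an assay of the object ---
sce <- SingleCellExperiment(assays = list(counts = counts))

# --- Seurat approach: the same matrix passed to the Seurat constructor ---
seu <- CreateSeuratObject(counts = counts)

# Both objects should report the same genes x cells dimensions
dim(sce)
dim(seu)
```

Both constructors take the same genes-by-cells sparse matrix, so the comparison between the two approaches starts from identical data. Real feature files often contain several tab-separated columns (gene ID, symbol, type); in that case, read them with `read.delim()` and pick the column you want as row names.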

6.6 Session info