bash
cat ~/Share/data_wrangling/cellranger-tiny-bcl-simple-1.2.0.csvLane,Sample,Index
1,test_sample,SI-GA-E3
The estimated time for this lab is around 1h15.
cellranger mkfastq.cellranger count.cellranger mkfastq
Navigate to your terminal in RStudio on AWS.
Go to the cellranger mkfastq page and read the Overview.
For this workshop, we are going to look at a toy dataset provided by 10x Genomics. The dataset is cellranger-tiny-bcl-1.2.0, provided in the ~/Share/data_wrangling/ folder. With the cellranger-tiny-bcl-1.2.0 dataset, we have a samplesheet file cellranger-tiny-bcl-simple-1.2.0.csv that contains the sample information.
Go to the Terminal tab in your RStudio and take a look at the 10x samplesheet .csv file
bash
cat ~/Share/data_wrangling/cellranger-tiny-bcl-simple-1.2.0.csvLane,Sample,Index
1,test_sample,SI-GA-E3
Next, explore the contents of the sequencing directory:
bash
ls -l ~/Share/data_wrangling/cellranger-tiny-bcl-1.2.0total 28
drwxr-xr-x 2 rsg rsg 4096 Oct 29 2024 Config
drwxr-xr-x 3 rsg rsg 4096 Oct 29 2024 Data
drwxr-xr-x 2 rsg rsg 4096 Oct 29 2024 InterOp
-rw-r--r-- 1 rsg rsg 48 Oct 29 2024 RTAComplete.txt
-rw-r--r-- 1 rsg rsg 673 Oct 29 2024 RunInfo.xml
-rw-r--r-- 1 rsg rsg 5366 Oct 29 2024 runParameters.xml
Now we can demultiplex our bcl files by running the following command in the terminal. To do that, we will use the cellranger mkfastq command.
bash
cellranger mkfastq --help/opt/cellranger/bin
cellranger mkfastq (cellranger-5.0.1)
Copyright (c) 2020 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Run Illumina demultiplexer on sample sheets that contain 10x-specific sample
index sets, and generate 10x-specific quality metrics after the demultiplex.
Any bcl2fastq argument will work (except a few that are set by the pipeline
to ensure proper trimming and sample indexing). The FASTQ output generated
will be the same as when running bcl2fastq directly.
These bcl2fastq arguments are overridden by this pipeline:
--fastq-cluster-count
--minimum-trimmed-read-length
--mask-short-adapter-reads
Usage:
cellranger mkfastq --run=PATH [options]
cellranger mkfastq -h | --help | --version
Required:
--run=PATH Path of Illumina BCL run folder.
Optional:
# Sample Sheet
--id=NAME Name of the folder created by mkfastq. If not supplied,
will default to the name of the flowcell referred to
by the --run argument.
--csv=PATH
--samplesheet=PATH
--sample-sheet=PATH
Path to the sample sheet. The sample sheet can either be
a simple CSV with lane, sample and index columns, or
an Illumina Experiment Manager-compatible sample
sheet. Sample sheet indexes can refer to 10x sample
index set names (e.g., SI-GA-A12).
--simple-csv=PATH Deprecated. Same meaning as --csv.
--qc Deprecated: will be removed in a future version.
Calculate both sequencing and 10x-specific metrics,
including per-sample barcode matching rate. Will not
be performed unless this flag is specified.
--force-single-index
If 10x-supplied i7/i5 paired indices are specified,
but the flowcell was run with only one sample
index, allow the demultiplex to proceed using
the i7 half of the sample index pair.
--filter-single-index
Only demultiplex samples identified
by an i7-only sample index, ignoring dual-indexed
samples. Dual-indexed samples will not be
demultiplexed.
--filter-dual-index
Only demultiplex samples identified
by i7/i5 dual-indices (e.g., SI-TT-A6), ignoring single-
index samples. Single-index samples will not be
demultiplexed.
--rc-i2-override=BOOL
Indicates if the bases in the I2 read are emitted as
reverse complement by the sequencing workflow.
Set to 'true' for Workflow A / NovaSeq Reagent Kit v1.5
or greater. Set to 'false' for Wokflow B / older NovaSeq
Reagent Kits. NOTE: this parameter is autodetected based
and should only be passed in special circumstances.
# bcl2fastq Pass-Through
--lanes=NUMS Comma-delimited series of lanes to demultiplex. Shortcut
for the --tiles argument.
--use-bases-mask=MASK
Same as bcl2fastq; override the read lengths as
specified in RunInfo.xml. See Illumina bcl2fastq
documentation for more information.
--delete-undetermined
Delete the Undetermined FASTQ files left by bcl2fastq
Useful if your sample sheet is only expected to
match a subset of the flowcell.
--output-dir=PATH Same as in bcl2fastq. Folder where FASTQs, reports and
stats will be generated.
--project=NAME Custom project name, to override the samplesheet or to
use in conjunction with the --csv argument.
# Martian Runtime
--jobmode=MODE Job manager to use. Valid options: local (default), sge,
lsf, or a .template file
--localcores=NUM Set max cores the pipeline may request at one time. Only
applies to local jobs.
--localmem=NUM Set max GB the pipeline may request at one time. Only
applies to local jobs.
--localvmem=NUM Set max virtual address space in GB for the pipeline.
Only applies to local jobs.
--mempercore=NUM Reserve enough threads for each job to ensure enough
memory will be available, assuming each core on your
cluster has at least this much memory available. Only
applies in cluster jobmodes.
--maxjobs=NUM Set max jobs submitted to cluster at one time. Only
applies in cluster jobmodes.
--jobinterval=NUM Set delay between submitting jobs to cluster, in ms.
Only applies in cluster jobmodes.
--overrides=PATH The path to a JSON file that specifies stage-level
overrides for cores and memory. Finer-grained
than --localcores, --mempercore and --localmem.
Consult the 10x support website for an example
override file.
--uiport=PORT Serve web UI at http://localhost:PORT
--disable-ui Do not serve the UI.
--noexit Keep web UI running after pipestance completes or fails.
--nopreflight Skip preflight checks.
-h --help Show this message.
--version Show version.
If you demultiplexed with 'cellranger mkfastq' or directly with Illumina
bcl2fastq, then set --fastqs to the project folder containing FASTQ files. In
addition, set --sample to the name prefixed to the FASTQ files comprising
your sample. For example, if your FASTQs are named:
subject1_S1_L001_R1_001.fastq.gz
then set --sample=subject1
What is the purpose of --id, --run, and --csv arguments ?
Choose these arguments wisely and execute the command to demultiplex the sequencing data.
bash
cellranger mkfastq --id tiny-bcl --run ~/Share/data_wrangling/cellranger-tiny-bcl-1.2.0 --csv ~/Share/data_wrangling/cellranger-tiny-bcl-simple-1.2.0.csv/opt/gensoft/exe/cellranger/7.1.0/libexec/bin
cellranger mkfastq (cellranger-7.1.0)
Copyright (c) 2021 10x Genomics, Inc. All rights reserved.
-------------------------------------------------------------------------------
Martian Runtime - v4.0.10
2025-11-13 18:54:39 [runtime] Reattaching in local mode.
Serving UI at http://maestro-submit:33167?auth=UT6bG4dNCAFoyY5KH9jUeLvbk0g-HjGeeSnZgLYGHok
2025-11-13 18:54:39 [runtime] (reset-partial) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_FASTQS_PREFLIGHT_LOCAL.fork0.chnk0
2025-11-13 18:54:39 [runtime] Found orphaned local stage: ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MAKE_FASTQS_PREFLIGHT_LOCAL
Checking run folder...
Checking RunInfo.xml...
Checking system environment...
Emitting run information...
Checking read specification...
Checking samplesheet specs...
2025-11-13 18:54:42 [runtime] (ready) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2025-11-13 18:54:42 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET.fork0.chnk0.main
2025-11-13 18:54:44 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.PREPARE_SAMPLESHEET
2025-11-13 18:54:44 [runtime] (ready) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2025-11-13 18:54:44 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.split
2025-11-13 18:54:45 [runtime] (split_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2025-11-13 18:54:45 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.chnk0.main
2025-11-13 18:54:49 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2025-11-13 18:54:49 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET.fork0.join
2025-11-13 18:54:50 [runtime] (join_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.BCL2FASTQ_WITH_SAMPLESHEET
2025-11-13 18:54:50 [runtime] (ready) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
2025-11-13 18:54:50 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE.fork0.split
2025-11-13 18:54:52 [runtime] (split_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
2025-11-13 18:54:52 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE.fork0.chnk0.main
2025-11-13 18:54:54 [runtime] (chunks_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
2025-11-13 18:54:54 [runtime] (run:local) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE.fork0.join
2025-11-13 18:54:56 [runtime] (join_complete) ID.tiny-bcl.MAKE_FASTQS_CS.MAKE_FASTQS.MERGE_FASTQS_BY_LANE_SAMPLE
Outputs:
- FASTQ output folder: /pasteur/appa/homes/jaseriza/tiny-bcl/outs/fastq_path
- Interop output folder: /pasteur/appa/homes/jaseriza/tiny-bcl/outs/interop_path
- Input samplesheet: /pasteur/appa/homes/jaseriza/tiny-bcl/outs/input_samplesheet.csv
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
2024-10-29 11:43:06 Shutting down.
Saving pipestance info to "tiny-bcl/tiny-bcl.mri.tgz"
The output folders can be viewed by running the ls command:
bash
ls -l ~/Share/data_wrangling/tiny-bcl/outs/fastq_path/H35KCBCXY/test_sampletotal 48156
-rw-r--r-- 1 rsg rsg 3930346 Oct 29 2024 test_sample_S1_L001_I1_001.fastq.gz
-rw-r--r-- 1 rsg rsg 11093755 Oct 29 2024 test_sample_S1_L001_R1_001.fastq.gz
-rw-r--r-- 1 rsg rsg 34273874 Oct 29 2024 test_sample_S1_L001_R2_001.fastq.gz
Look at the first reads listed in index read (I1), read 1 (R1), and read (R2) files.
What is the purpose of the index read? What does it contain?
bash
zcat ~/Share/data_wrangling/tiny-bcl/outs/fastq_path/H35KCBCXY/test_sample/test_sample_S1_L001_I1_001.fastq.gz | head -n 12@D00547:905:H35KCBCXY:1:1101:1284:1973 1:N:0:AGGTATTG
AGGTATTG
+
GGGGGIII
@D00547:905:H35KCBCXY:1:1101:1729:1981 1:N:0:AGGTATTG
AGGTATTG
+
GAGGGIIG
@D00547:905:H35KCBCXY:1:1101:2005:1944 1:N:0:AGGTATTG
AGGTATTG
+
AA<G.A.G
@D00547:905:H35KCBCXY:1:1101:1284:1973 1:N:0:AGGTATTG
AGGTATTG
+
GGGGGIII
@D00547:905:H35KCBCXY:1:1101:1729:1981 1:N:0:AGGTATTG
AGGTATTG
+
GAGGGIIG
@D00547:905:H35KCBCXY:1:1101:2005:1944 1:N:0:AGGTATTG
AGGTATTG
+
AA<G.A.G
Open the html file tiny-bcl/outs/fastq_path/Reports/html/index.html by navigating to the file in RStudio, using the Files Tab. When you click on the file, select the option to View in Web Browser. Take some time to explore the demultiplexed outputs.
cellranger count
Go to the cellranger count algorithm overview and read the section on Alignment (Read Trimming, Genome Alignment, MAPQ adjustment, Transcriptome Alignment, UMI Counting).
In bash, the cellranger count command is used to generate gene count matrices from the demultiplexed reads.
bash
cellranger count --helpcellranger-count
Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well
USAGE:
cellranger count [FLAGS] [OPTIONS] --id <ID> --transcriptome <PATH>
FLAGS:
--no-target-umi-filter Turn off the target UMI filtering subpipeline
--no-bam Do not generate a bam file
--nosecondary Disable secondary analysis, e.g. clustering. Optional
--include-introns Include intronic reads in count
--no-libraries Proceed with processing using a --feature-ref but no Feature Barcode libraries specified with the 'libraries' flag
--dry Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
--disable-ui Do not serve the web UI
--noexit Keep web UI running after pipestance completes or fails
--nopreflight Skip preflight checks
-h, --help Prints help information
OPTIONS:
--id <ID> A unique run id and output folder name [a-zA-Z0-9_-]+
--description <TEXT> Sample description to embed in output files
--transcriptome <PATH> Path of folder containing 10x-compatible transcriptome reference
--fastqs <PATH>... Path to input FASTQ data
--project <TEXT> Name of the project folder within a mkfastq or bcl2fastq-generated folder to pick FASTQs from
--sample <PREFIX>... Prefix of the filenames of FASTQs to select
--lanes <NUMS>... Only use FASTQs from selected lanes
--libraries <CSV> CSV file declaring input library data sources
--feature-ref <CSV> Feature reference CSV file, declaring Feature Barcode constructs and associated barcodes
--target-panel <CSV> The target panel CSV file declaring the target panel used, if any
--expect-cells <NUM> Expected number of recovered cells
--force-cells <NUM> Force pipeline to use this number of cells, bypassing cell detection
--r1-length <NUM> Hard trim the input Read 1 to this length before analysis
--r2-length <NUM> Hard trim the input Read 2 to this length before analysis
--chemistry <CHEM> Assay configuration. NOTE: by default the assay configuration is detected automatically, which is the recommened mode. You usually will not
need to specify a chemistry. Options are: 'auto' for autodetection, 'threeprime' for Single Cell 3', 'fiveprime' for Single Cell 5',
'SC3Pv1' or 'SC3Pv2' or 'SC3Pv3' for Single Cell 3' v1/v2/v3, 'SC5P-PE' or 'SC5P-R2' for Single Cell 5', paired-end/R2-only, 'SC-FB' for
Single Cell Antibody-only 3' v2 or 5' [default: auto]
--jobmode <MODE> Job manager to use. Valid options: local (default), sge, lsf, slurm or a .template file. Search for help on "Cluster Mode" at
support.10xgenomics.com for more details on configuring the pipeline to use a compute cluster [default: local]
--localcores <NUM> Set max cores the pipeline may request at one time. Only applies to local jobs
--localmem <NUM> Set max GB the pipeline may request at one time. Only applies to local jobs
--localvmem <NUM> Set max virtual address space in GB for the pipeline. Only applies to local jobs
--mempercore <NUM> Reserve enough threads for each job to ensure enough memory will be available, assuming each core on your cluster has at least this much
memory available. Only applies in cluster jobmodes
--maxjobs <NUM> Set max jobs submitted to cluster at one time. Only applies in cluster jobmodes
--jobinterval <NUM> Set delay between submitting jobs to cluster, in ms. Only applies in cluster jobmodes
--overrides <PATH> The path to a JSON file that specifies stage-level overrides for cores and memory. Finer-grained than --localcores, --mempercore and
--localmem. Consult the 10x support website for an example override file
--uiport <PORT> Serve web UI at http://localhost:PORT
Now, we will generate a gene count matrix for the test_sample reads using cellranger count and the transcriptome reference provided in ~/Share/refdata-gex-mm10-2020-A/
What is the purpose of the --id, --transcriptome, --fastqs, --sample, and --create-bam arguments?
bash
cellranger count --id counts --transcriptome ~/Share/refdata-gex-mm10-2020-A/ --fastqs tiny-bcl/outs/fastq_path/H35KCBCXY/test_sample --sample test_sample --create-bam trueMartian Runtime - v4.0.10
2025-11-13 19:17:49 [runtime] Reattaching in local mode.
Serving UI at http://maestro-submit:44641?auth=o3OFtia-5rPZ9nn37zNcvPNwtZ0XLSszXeGPSzNA2Ds
2025-11-13 19:17:49 [runtime] (reset-partial) ID.counts.SC_RNA_COUNTER_CS.CELLRANGER_PREFLIGHT.fork0.chnk0
2025-11-13 19:17:49 [runtime] (reset-partial) ID.counts.SC_RNA_COUNTER_CS.CELLRANGER_PREFLIGHT_LOCAL.fork0.chnk0
2025-11-13 19:17:49 [runtime] Found orphaned local stage: ID.counts.SC_RNA_COUNTER_CS.CELLRANGER_PREFLIGHT_LOCAL
2025-11-13 19:17:49 [runtime] Found orphaned local stage: ID.counts.SC_RNA_COUNTER_CS.CELLRANGER_PREFLIGHT
Checking sample info...
Checking FASTQ folder...
Checking reference...
Checking reference_path (/pasteur/appa/scratch/jaseriza/Share/refdata-gex-mm10-2020-A) on maestro-submit...
Checking optional arguments...
mro: v4.0.10
mrp: v4.0.10
Anaconda: Python 3.9.7
numpy: 1.22.3
scipy: 1.9.1
pysam: 0.17.0
h5py: 3.6.0
pandas: 1.3.5
STAR: 2.7.2a
samtools: samtools 1.12
2025-11-13 19:17:53 [runtime] (ready) ID.counts.SC_RNA_COUNTER_CS.WRITE_GENE_INDEX
2025-11-13 19:17:53 [runtime] (run:local) ID.counts.SC_RNA_COUNTER_CS.WRITE_GENE_INDEX.fork0.chnk0.main
2025-11-13 19:17:53 [runtime] (ready) ID.counts.SC_RNA_COUNTER_CS.SC_MULTI_CORE.MAKE_FULL_CONFIG._MAKE_VDJ_CONFIG
...
Outputs:
- Run summary HTML: /home/rsg/counts/outs/web_summary.html
- Run summary CSV: /home/rsg/counts/outs/metrics_summary.csv
- BAM: /home/rsg/counts/outs/possorted_genome_bam.bam
- BAM BAI index: /home/rsg/counts/outs/possorted_genome_bam.bam.bai
- BAM CSI index: null
- Filtered feature-barcode matrices MEX: /home/rsg/counts/outs/filtered_feature_bc_matrix
- Filtered feature-barcode matrices HDF5: /home/rsg/counts/outs/filtered_feature_bc_matrix.h5
- Unfiltered feature-barcode matrices MEX: /home/rsg/counts/outs/raw_feature_bc_matrix
- Unfiltered feature-barcode matrices HDF5: /home/rsg/counts/outs/raw_feature_bc_matrix.h5
- Secondary analysis output CSV: /home/rsg/counts/outs/analysis
- Per-molecule read information: /home/rsg/counts/outs/molecule_info.h5
- CRISPR-specific analysis: null
- Antibody aggregate barcodes: null
- Loupe Browser file: /home/rsg/counts/outs/cloupe.cloupe
- Feature Reference: null
- Target Panel File: null
- Probe Set File: null
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!
2025-11-13 19:19:53 Shutting down.```
:::
:::
While the count command is running, read about the [format of the feature-barcode matrices](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices).
Once the count command is finished running, the pipeline outputs can be viewed with `ls`:
bash
ls -l ~/Share/data_wrangling/counts/outstotal 56920
drwxr-xr-x 7 rsg rsg 4096 Oct 29 2024 analysis
-rw-r--r-- 1 rsg rsg 2782024 Oct 29 2024 cloupe.cloupe
drwxr-xr-x 2 rsg rsg 4096 Oct 29 2024 filtered_feature_bc_matrix
-rw-r--r-- 1 rsg rsg 506767 Oct 29 2024 filtered_feature_bc_matrix.h5
-rw-r--r-- 1 rsg rsg 627 Oct 29 2024 metrics_summary.csv
-rw-r--r-- 1 rsg rsg 1806311 Oct 29 2024 molecule_info.h5
-rw-r--r-- 1 rsg rsg 47960294 Oct 29 2024 possorted_genome_bam.bam
-rw-r--r-- 1 rsg rsg 1744976 Oct 29 2024 possorted_genome_bam.bam.bai
drwxr-xr-x 2 rsg rsg 4096 Oct 29 2024 raw_feature_bc_matrix
-rw-r--r-- 1 rsg rsg 921409 Oct 29 2024 raw_feature_bc_matrix.h5
-rw-r--r-- 1 rsg rsg 2513806 Oct 29 2024 web_summary.html
Can you locate the feature-barcode matrices? What is the difference between the raw_feature_bc_matrix and filtered_feature_bc_matrix data types?
bash
ls -l ~/Share/data_wrangling/counts/outs/raw_feature_bc_matrix.h5
ls -l ~/Share/data_wrangling/counts/outs/filtered_feature_bc_matrix.h5How many cells and genes are stored in each matrix? Comment.
bash
# Number of genes
zcat ~/Share/data_wrangling/counts/outs/filtered_feature_bc_matrix/features.tsv.gz | wc -l
zcat ~/Share/data_wrangling/counts/outs/raw_feature_bc_matrix/features.tsv.gz | wc -l
# Number of cells
zcat ~/Share/data_wrangling/counts/outs/filtered_feature_bc_matrix/barcodes.tsv.gz | wc -l
zcat ~/Share/data_wrangling/counts/outs/raw_feature_bc_matrix/barcodes.tsv.gz | wc -lNow open the html file counts/outs/web_summary.html by navigating to the file in RStudio, using the Files Tab. When you click on the file, select the option to View in Web Browser. Take some time to explore the gene expression matrix outputs.