
Analysis of Length Biases in Single-Cell RNA Sequencing of Unspliced mRNA by Markov Modeling

Gorin, Gennady and Pachter, Lior (2021) Analysis of Length Biases in Single-Cell RNA Sequencing of Unspliced mRNA by Markov Modeling. Biophysical Journal, 120 (3). 81A. ISSN 0006-3495. doi:10.1016/j.bpj.2020.11.706.

Full text is not posted in this repository.

Recent experimental advances in single-cell RNA sequencing (scRNA-seq) have enabled the quantification of transcriptomes with single-molecule resolution. Thus far, however, the stochastic modeling of transcription has been kept separate from the statistics of the sequencing process, leading to simplifications that may obfuscate transcriptional dynamics and technical artifacts in the assays. For example, imputation, normalization, and smoothing, which are used to correct for stochastic sequencing phenomena, make experimental molecule count data incompatible with a discrete representation, and thus render the data uninterpretable in the context of conventional Chemical Master Equation (CME) models. Models of gene expression, such as the negative binomial count model, are used with limited physical justification, whereas models for multimodal data are under-explored. Conversely, more detailed CME descriptions of gene expression do not directly address the complexities of the sequencing process. We demonstrate that modeling both phenomena reveals a pervasive gene length-based effect in the detection of unspliced mRNA: long genes are substantially more likely to have higher average unspliced mRNA expression. To explain this effect, we build a stochastic model that accounts for both physiological and experimental events, and we jointly infer hundreds of gene-specific parameters as well as transcriptome-wide parameters. Specifically, we extend a joint model of mRNA processing described by Singh and Bokes (Biophys. J., 2012) to incorporate downstream Poisson sampling, which represents cDNA library construction and sequencing. The explicit inclusion of sampling yields mechanistically interpretable results for the gene expression parameters and suggests extensions to more complex models.
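
The coupling of a transcription model to downstream Poisson sampling can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' inference procedure. It assumes bursty transcription with geometrically distributed burst sizes and first-order splicing, for which the steady-state unspliced count is negative binomial. It further assumes that the per-molecule capture rate grows linearly with gene length. The function names, the parameter values, and the linear length dependence are hypothetical choices made for illustration.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small-to-moderate means used in this sketch.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def observed_unspliced_counts(gene_length, n_cells, rng,
                              burst_size=10.0,        # assumed mean burst size
                              bursts_per_splice=0.5,  # assumed burst frequency / splicing rate
                              capture_per_base=1e-5): # assumed capture rate per molecule per base
    """Simulate observed unspliced counts for one gene across cells."""
    counts = []
    for _ in range(n_cells):
        # Biological layer: a negative binomial count, drawn as a gamma-Poisson
        # mixture with shape bursts_per_splice and scale burst_size.
        intensity = rng.gammavariate(bursts_per_splice, burst_size)
        molecules = sample_poisson(intensity, rng)
        # Technical layer: Poisson sampling whose rate is proportional to the
        # number of molecules present and (by assumption) to gene length.
        sampling_mean = capture_per_base * gene_length * molecules
        counts.append(sample_poisson(sampling_mean, rng))
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    for length in (10_000, 100_000):
        counts = observed_unspliced_counts(length, 20_000, rng)
        print(length, sum(counts) / len(counts))
```

Both simulated genes share identical biological parameters, so any difference in their average observed counts comes from the sampling layer alone. Under these assumptions the expected observed mean is the capture rate times the gene length times the mean molecule count, which is why the longer gene appears more highly expressed.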

Item Type:Article
Gorin, Gennady (ORCID 0000-0001-6097-2029)
Pachter, Lior (ORCID 0000-0002-9164-6231)
Additional Information:© 2021 Biophysical Society. Available online 12 February 2021.
Issue or Number:3
Record Number:CaltechAUTHORS:20210503-100056268
Official Citation:Gennady Gorin, Lior Pachter, Analysis of Length Biases in Single-Cell RNA Sequencing of Unspliced mRNA by Markov Modeling, Biophysical Journal, Volume 120, Issue 3, Supplement 1, 2021, Page 81a, ISSN 0006-3495, (
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:108918
Deposited By: Tony Diaz
Deposited On:03 May 2021 17:56
Last Modified:03 May 2021 17:56