A Caltech Library Service

Length Biases in Single-Cell RNA Sequencing of pre-mRNA

Gorin, Gennady and Pachter, Lior (2021) Length Biases in Single-Cell RNA Sequencing of pre-mRNA. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution.

Archive (ZIP) - Supplemental Material
Creative Commons Attribution.


Single-molecule pre-mRNA and mRNA sequencing data can be modeled and analyzed using the Markov chain formalism to yield genome-wide insights into transcription. However, quantitative inference with such data requires careful assessment and understanding of noise sources. We find that long pre-mRNA transcripts are over-represented in sequencing data, and explore the mechanistic implications. A biological explanation for this phenomenon within our modeling framework requires unrealistic transcriptional parameters, leading us to posit a length-based model of capture bias. We provide solutions for this model, and use them to find concordant and mechanistically plausible parameter trends across data from multiple single-cell RNA-seq experiments in several species.

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URLURL TypeDescription Paper ItemCode ItemData ItemData
Gorin, Gennady (ORCID: 0000-0001-6097-2029)
Pachter, Lior (ORCID: 0000-0002-9164-6231)
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Posted July 31, 2021. G.G. and L.P. are partially funded by NIH U19MH114830. The DNA and RNA illustrations used in Figures 1 and 2 are derived from the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0. Data and code availability: the code repository contains a Python notebook that can be used to reproduce the figures, as well as a sample notebook that applies the computational pipeline to a 10X PBMC dataset. The same repository contains all scripts used to make references, download datasets, quantify transcripts, and process the resulting loom files through the inference pipeline. The raw loom files and all search results are deposited in the CaltechDATA repository [62, 63].
Record Number: CaltechAUTHORS:20210802-221611892
Persistent URL:
Official Citation: Length Biases in Single-Cell RNA Sequencing of pre-mRNA. Gennady Gorin, Lior Pachter. bioRxiv 2021.07.30.454514; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 110118
Deposited By: Tony Diaz
Deposited On: 02 Aug 2021 22:24
Last Modified: 16 Nov 2021 19:39
