A Caltech Library Service

One read per cell per gene is optimal for single-cell RNA-Seq

Zhang, M. J. and Ntranos, V. and Tse, D. (2018) One read per cell per gene is optimal for single-cell RNA-Seq. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution Non-commercial No Derivatives.


An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? A mathematical framework reveals that, for estimating many important gene properties, the optimal allocation is to sequence at a depth of one read per cell per gene. Interestingly, the corresponding optimal estimator is not the widely used plug-in estimator but one developed via empirical Bayes.
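The budget trade-off described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper or from the sceb package: the function name `allocate_budget`, the parameter `gene_fraction`, and the assumptions that reads are spread evenly across cells and that a gene of interest receives a fixed average fraction of each cell's reads are all hypothetical simplifications. Under those assumptions, it shows how a target of one read per cell per gene fixes the per-cell depth and therefore the number of cells a budget can cover.

```python
def allocate_budget(total_reads, gene_fraction, target_reads_per_cell_per_gene=1.0):
    """Split a fixed read budget into (number of cells, reads per cell).

    Illustrative assumptions (not from the paper):
      - every cell receives the same number of reads;
      - the gene of interest captures `gene_fraction` of a cell's reads on average.
    """
    if not 0 < gene_fraction <= 1:
        raise ValueError("gene_fraction must be in (0, 1]")
    if total_reads <= 0 or target_reads_per_cell_per_gene <= 0:
        raise ValueError("total_reads and the target depth must be positive")

    # Per-cell depth needed so the gene averages the target number of reads.
    reads_per_cell = target_reads_per_cell_per_gene / gene_fraction
    # Number of whole cells the budget can cover at that depth.
    n_cells = int(total_reads // reads_per_cell)
    return n_cells, reads_per_cell


if __name__ == "__main__":
    budget = 1_000_000_000  # hypothetical total read budget
    fraction = 1e-4         # hypothetical share of reads going to the gene of interest
    for target in (1.0, 10.0):
        cells, depth = allocate_budget(budget, fraction, target)
        print(f"target {target} read(s)/cell/gene: {cells} cells at {depth:.0f} reads per cell")
```

With the budget held fixed, raising the per-gene target by some factor cuts the number of cells by the same factor, which is the deep-versus-shallow choice the abstract refers to.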

Item Type:Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
Paper
ORCID:Ntranos, V. 0000-0002-2477-0670
Additional Information:The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

This research was in part motivated by discussions on the experimental design question at the Human Cell Atlas First Annual Jamboree meeting. We would like to thank Lior Pachter for his valuable input and constructive suggestions throughout the course of this study, and Jase Gehring, Wenying Pan, and Taibo Li for their helpful feedback. Thanks also to Patrick Marks for very useful feedback on an earlier version of the paper. MZ is partially supported by a Stanford Graduate Fellowship.

Author Contributions: All authors contributed extensively to the work presented in this paper.

Code Availability: We developed the Python package sceb (single-cell empirical Bayes) for the EB estimators used in this paper (available on PyPI). The code to reproduce all experiments and generate the figures presented in this paper can be found at

Data Availability: The datasets that we use were generated with 10x Genomics’ v2 chemistry [17]. pbmc_4k and pbmc_8k contain peripheral blood mononuclear cells (PBMCs) from the same healthy donor. brain_1k, brain_2k, brain_9k, and brain_1.3m contain cells from the combined cortex, hippocampus, and subventricular zone of an E18 mouse. The pair 293T_1k, 3T3_1k contains a 1:1 mixture of fresh frozen human (HEK293T) and mouse (NIH3T3) cells, as do the pairs 293T_6k, 3T3_6k and 293T_12k, 3T3_12k. Links to the datasets: pbmc_4k: pbmc_8k: brain_1k: brain_2k: brain_9k: brain_1.3m: 293T_1k, 3T3_1k: 293T_6k, 3T3_6k: 293T_12k, 3T3_12k:

The authors declare that they have no competing financial interests.
Funding Agency | Grant Number
Stanford University | UNSPECIFIED
Record Number:CaltechAUTHORS:20180927-114223910
Persistent URL:
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:90001
Deposited By: George Porter
Deposited On:27 Sep 2018 22:57
Last Modified:16 Nov 2021 00:40
