Determining sequencing depth in a single-cell RNA-seq experiment

Zhang, Martin Jinye and Ntranos, Vasilis and Tse, David (2020) Determining sequencing depth in a single-cell RNA-seq experiment. Nature Communications, 11. Art. No. 774. ISSN 2041-1723. PMCID PMC7005864. doi:10.1038/s41467-020-14482-y.

PDF - Published Version
Creative Commons Attribution.

PDF - Supplemental Material
Creative Commons Attribution.

PDF (Reporting Summary) - Supplemental Material
Creative Commons Attribution.


An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? Here we present a mathematical framework which reveals that, for estimating many important gene properties, the optimal allocation is to sequence at a depth of around one read per cell per gene. Interestingly, the corresponding optimal estimator is not the widely-used plug-in estimator, but one developed via empirical Bayes.
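
The trade-off described in the abstract can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' method: it assumes the total budget is a fixed number of reads, and it reads "one read per cell per gene" as a mean of about one read per cell for a gene of interest. The function name `allocate_budget` and the parameter `gene_fraction` (the assumed share of a cell's reads that map to that gene) are hypothetical names introduced only for illustration.

```python
def allocate_budget(total_reads, gene_fraction, target_depth=1.0):
    """Split a fixed read budget into (number of cells, reads per cell).

    total_reads   -- total sequencing budget, in reads
    gene_fraction -- ASSUMPTION: share of a cell's reads mapping to the
                     gene of interest (hypothetical input, not from the record)
    target_depth  -- desired mean reads per cell for that gene; the abstract
                     suggests a value of around one
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0.0 < gene_fraction <= 1.0:
        raise ValueError("gene_fraction must be in (0, 1]")
    if target_depth <= 0:
        raise ValueError("target_depth must be positive")

    # Reads each cell needs so the gene of interest averages target_depth reads.
    reads_per_cell = target_depth / gene_fraction
    # Whatever budget remains determines how many cells can be sequenced.
    n_cells = int(total_reads // reads_per_cell)
    return n_cells, reads_per_cell


if __name__ == "__main__":
    shallow = allocate_budget(400_000_000, 1e-4, target_depth=1.0)
    deep = allocate_budget(400_000_000, 1e-4, target_depth=10.0)
    print("shallow (cells, reads per cell):", shallow)
    print("deep    (cells, reads per cell):", deep)
```

Under the same budget, raising `target_depth` tenfold cuts the number of cells roughly tenfold, which is the deep-versus-shallow choice the abstract refers to. The sketch only does the bookkeeping; it does not implement the empirical Bayes estimator.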

Item Type:Article
ORCID:
Zhang, Martin Jinye: 0000-0003-0006-2466
Ntranos, Vasilis: 0000-0002-2477-0670
Additional Information:© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Received: 6 September 2018; Accepted: 13 December 2019; Published online: 7 February 2020.
Acknowledgements: This research was in part motivated by discussions on the experimental design question at the Human Cell Atlas First Annual Jamboree meeting. We thank Lior Pachter for his valuable input and constructive suggestions throughout the course of this study; Jase Gehring, Wenying Pan, and Taibo Li for their helpful feedback; and Dominic Grün for providing the smFISH data corresponding to the CEL-seq data. Thanks also to Patrick Marks for very useful feedback on an earlier version of the paper. D.T. and M.J.Z. are supported in part by the Center for Science of Information, an NSF Science and Technology Center, under grant agreement CCF-0939370, and in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG008164. M.J.Z. is also supported by a Stanford Graduate Fellowship (Inventec Fellow). V.N. is supported in part by the Center for Science of Information and in part by a gift from Qualcomm Inc.
These authors contributed equally: Martin Jinye Zhang, Vasilis Ntranos.
Author Contributions: M.J.Z. and V.N. conceived the idea and performed the empirical experiments. M.J.Z. performed the theoretical analysis. M.J.Z., V.N. and D.T. wrote the manuscript. D.T. supervised the research. All authors reviewed the manuscript.
Data availability: The 10x datasets were generated with 10x Genomics’ v2 chemistry [22]. They are publicly available under the following dataset names: pbmc_4k; pbmc_8k; brain_1k; brain_2k; brain_9k; brain_1.3m; 293T_1k and 3T3_1k; 293T_6k and 3T3_6k; 293T_12k and 3T3_12k. We note that pbmc_4k and pbmc_8k are from the same donor; brain_1k and brain_9k are also from the same donor. Also, the following pairs of datasets are sequenced together: 293T_1k and 3T3_1k, 293T_6k and 3T3_6k, 293T_12k and 3T3_12k. These six datasets are from the same biological sample. The Drop-seq dataset and the corresponding smFISH data can be found in the original paper [15] or in a recent paper that analyzed the dataset [16]. The CEL-seq data can be found in the original paper [27]. The smFISH data accompanying the CEL-seq data can be obtained by contacting the author. The three ERCC datasets (Zheng, Klein, Svensson) can be found in a recent paper that analyzed the dataset [16], where we have used the 2 × (control RNA + ERCC) data in the Svensson et al. [52] paper. The Klein dataset with the pure RNA controls (the Klein ERCC dataset being part of it) can be found in the original paper [24]. The data for sensitivity analysis (Supplementary Figs. 18–19) can be found in the original paper [53].
Code availability: We developed the Python package sceb (single-cell empirical Bayes) for the EB estimators used in this paper (available on PyPI). The code to reproduce all experiments and generate the figures presented in this paper is publicly available.
Competing interests: The authors declare no competing interests.
Peer review information: Nature Communications thanks Jay West and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Funding Agency | Grant Number
Stanford University | UNSPECIFIED
PubMed Central ID:PMC7005864
Record Number:CaltechAUTHORS:20200218-151249832
Official Citation:Zhang, M.J., Ntranos, V. & Tse, D. Determining sequencing depth in a single-cell RNA-seq experiment. Nat Commun 11, 774 (2020).
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:101347
Deposited By: George Porter
Deposited On:18 Feb 2020 23:43
Last Modified:16 Nov 2021 18:01
