Svensson, Valentine and da Veiga Beltrame, Eduardo and Pachter, Lior (2019) Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20190910-074005263
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20190910-074005263
Abstract
The allocation of a sequencing budget when designing single-cell RNA-seq experiments requires consideration of the tradeoff between the number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell, the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/.
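To make the tradeoff concrete, the following minimal sketch enumerates how a fixed read budget could be split between read depth and cell number. It is not taken from the paper's workflow: the function name `allocations`, the total budget, and the candidate depths are illustrative assumptions, and only the 15,000 reads-per-cell figure comes from the abstract. The sketch also ignores per-cell library preparation costs, which a real budget would include.

```python
def allocations(total_reads, depths):
    """Return (reads_per_cell, n_cells) pairs that fit within a fixed read budget.

    Hypothetical helper: assumes the budget is spent on sequencing reads only.
    """
    return [(depth, total_reads // depth) for depth in depths]


# Assumed total budget of 300 million reads (illustrative, not from the paper).
budget = 300_000_000

# Candidate depths; 15,000 reads per cell is the threshold named in the abstract.
for depth, cells in allocations(budget, [5_000, 15_000, 50_000]):
    print(f"{depth:>6} reads/cell -> {cells:>6} cells")
```

Under these assumptions, halving the depth doubles the number of cells that can be sequenced, which is the exchange the abstract's error analysis is meant to evaluate.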
| Item Type: | Report or Paper (Discussion Paper) |
|---|---|
| Additional Information: | The copyright holder has placed this preprint in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors. bioRxiv preprint first posted online Sep. 9, 2019. Code availability: A Snakemake [20] file used to subsample and process the data, together with Python notebooks used for downstream analyses are available on GitHub at https://github.com/pachterlab/SBP_2019/. Scripts and notebooks used to create the figures and results, together with gene count matrices outputted by kallisto bus and H5AD files with the UMI counts for all the subsampled read depths are available on CaltechDATA (https://doi.org/10.22002/d1.1276). Author contributions: V.S. designed the evaluation metric and performed statistical analysis. E.V.B. performed data processing and subsampling. V.S., E.V.B., and L.P. interpreted results and wrote the manuscript. The authors want to thank Romain Lopez for helpful feedback on the manuscript. V.S. and L.P. were funded in part by NIH U19MH114830. |
| DOI: | 10.1101/762773 |
| Record Number: | CaltechAUTHORS:20190910-074005263 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20190910-074005263 |
| Official Citation: | Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq. Valentine Svensson, Eduardo da Veiga Beltrame, Lior Pachter. bioRxiv 762773; doi: https://doi.org/10.1101/762773 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 98536 |
| Collection: | CaltechAUTHORS |
| Deposited By: | Tony Diaz |
| Deposited On: | 10 Sep 2019 16:26 |
| Last Modified: | 16 Nov 2021 17:39 |