CaltechAUTHORS
  A Caltech Library Service

scvi-tools: a library for deep probabilistic analysis of single-cell omics data

Gayoso, Adam and Lopez, Romain and Xing, Galen and Boyeau, Pierre and Wu, Katherine and Jayasuriya, Michael and Mehlman, Edouard and Langevin, Maxime and Liu, Yining and Samaran, Jules and Misrachi, Gabriel and Nazaret, Achille and Clivio, Oscar and Xu, Chenling and Ashuach, Tal and Lotfollahi, Mohammad and Svensson, Valentine and da Veiga Beltrame, Eduardo and Talavera-López, Carlos and Pachter, Lior and Theis, Fabian J. and Streets, Aaron and Jordan, Michael I. and Regier, Jeffrey and Yosef, Nir (2021) scvi-tools: a library for deep probabilistic analysis of single-cell omics data. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20210503-142332959

PDF - Submitted Version (10MB)
Creative Commons Attribution.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210503-142332959

Abstract

Probabilistic models have provided the underpinnings for state-of-the-art performance in many single-cell omics data analysis tasks, including dimensionality reduction, clustering, differential expression, annotation, removal of unwanted variation, and integration across modalities. Many of the models being deployed are amenable to scalable stochastic inference techniques, and accordingly they are able to process single-cell datasets of realistic and growing sizes. However, the community-wide adoption of probabilistic approaches is hindered by a fractured software ecosystem resulting in an array of packages with distinct and often complex interfaces. To address this issue, we developed scvi-tools (https://scvi-tools.org), a Python package that implements a variety of leading probabilistic methods. These methods, which cover many fundamental analysis tasks, are accessible through a standardized, easy-to-use interface with direct links to Scanpy, Seurat, and Bioconductor workflows. By standardizing the implementations, we were able to develop and reuse novel functionalities across different models, such as support for complex study designs through nonlinear removal of unwanted variation due to multiple covariates and reference-query integration via scArches. The extensible software building blocks that underlie scvi-tools also enable a developer environment in which new probabilistic models for single-cell omics can be efficiently developed, benchmarked, and deployed. We demonstrate this through a code-efficient reimplementation of Stereoscope for deconvolution of spatial transcriptomics profiles. By catering to both the end user and developer audiences, we expect scvi-tools to become an essential software dependency and serve to formulate a community standard for probabilistic modeling of single-cell omics.


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
https://doi.org/10.1101/2021.04.28.441833 | DOI | Discussion Paper
ORCID:
Author | ORCID
Gayoso, Adam | 0000-0001-9537-0845
Lopez, Romain | 0000-0003-0495-738X
Xing, Galen | 0000-0001-7376-6312
Boyeau, Pierre | 0000-0003-4549-3972
Wu, Katherine | 0000-0001-7562-4545
Jayasuriya, Michael | 0000-0003-2366-841X
Mehlman, Edouard | 0000-0001-6351-2220
Langevin, Maxime | 0000-0002-5498-4661
Liu, Yining | 0000-0002-8779-2906
Samaran, Jules | 0000-0001-7317-8190
Misrachi, Gabriel | 0000-0002-6020-4641
Nazaret, Achille | 0000-0002-5428-9810
Clivio, Oscar | 0000-0001-8668-4535
Xu, Chenling | 0000-0001-9610-7627
Ashuach, Tal | 0000-0003-1939-0865
Lotfollahi, Mohammad | 0000-0001-6858-7985
Svensson, Valentine | 0000-0002-9217-2330
da Veiga Beltrame, Eduardo | 0000-0002-1529-9207
Talavera-López, Carlos | 0000-0001-8590-2393
Pachter, Lior | 0000-0002-9164-6231
Theis, Fabian J. | 0000-0002-2419-1943
Streets, Aaron | 0000-0002-3909-8389
Jordan, Michael I. | 0000-0001-8935-817X
Regier, Jeffrey | 0000-0002-1472-5235
Yosef, Nir | 0000-0001-9004-1225
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version was posted April 29, 2021.
We acknowledge members of the Streets and Yosef laboratories for general feedback. We thank all the GitHub users who contributed code to scvi-tools over the years. We thank Nicholas Everetts for help with the analysis of the Drosophila data. We thank David Kelley and Nick Bernstein for help with implementing Solo. We thank Marco Wagenstetter and Sergei Rybakov for help with the transition of the scGen package to use scvi-tools as well as feedback on the scArches implementation. We thank Hector Roux de Bézieux for insightful discussions about the R ecosystem. We thank Kieran Campbell and Allen Zhang for clarifying aspects of the original CellAssign implementation. We thank the Pyro team, including Eli Bingham, Martin Jankowiak, and Fritz Obermeyer, for help with integrating Pyro in scvi-tools.
Research reported in this manuscript was supported by the NIGMS of the National Institutes of Health under award number R35GM124916 and by the Chan-Zuckerberg Foundation Network under grant number 2019-02452. A.G. is supported by NIH Training Grant 5T32HG000047-19. A.S. and N.Y. are Chan Zuckerberg Biohub investigators.
Author contributions: A.G., R.L., and G.X. contributed equally. A.G. designed the scvi-tools application programming interface with input from G.X. and R.L. G.X. and A.G. led development of scvi-tools with input from R.L. G.X. reimplemented scVI, totalVI, AutoZI, and scANVI with input from A.G. R.L. implemented Stereoscope with input from A.G. Data analysis in this manuscript was led by A.G., R.L., and G.X. with input from N.Y. A.G., R.L., P.B., E.M., M.L., Y.L., J.S., G.M., A.N., and O.C. worked on the initial version of the codebase (scvi package), with input from M.I.J., J.R., and N.Y. R.L., E.M., and C.X. contributed the scANVI model, with input from J.R. and N.Y. A.G. implemented totalVI with input from A.S. and N.Y. T.A. implemented peakVI with input from A.G. A.G. implemented scArches with input from M.L., F.T., and N.Y. V.S. made several contributions to the codebase, including the LDVAE model. P.B. contributed the differential expression programming interface. E.B. and C.T.L. provided tutorials on differential expression and deconvolution of spatial transcriptomics, with input from L.P. K.W. implemented CellAssign in the codebase with input from A.G. M.J. made general code contributions and helped maintain scvi-tools. N.Y. supervised all research. A.G., R.L., G.X., J.R., and N.Y. wrote the manuscript.
Competing Interest Statement: O.C. is supported by the EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning (EP/S023151/1) and Novo Nordisk. V.S. is a full-time employee of Serqet Therapeutics and has ownership interest in Serqet Therapeutics. F.J.T. reports receiving consulting fees from Roche Diagnostics GmbH and Cellarity Inc., and ownership interest in Cellarity, Inc.
Funders:
Funding Agency | Grant Number
NIH | R35GM124916
Chan-Zuckerberg Foundation | 2019-02452
NIH | 5T32HG000047-19
DOI: 10.1101/2021.04.28.441833
Record Number: CaltechAUTHORS:20210503-142332959
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210503-142332959
Official Citation: scvi-tools: a library for deep probabilistic analysis of single-cell omics data. Adam Gayoso, Romain Lopez, Galen Xing, Pierre Boyeau, Katherine Wu, Michael Jayasuriya, Edouard Mehlman, Maxime Langevin, Yining Liu, Jules Samaran, Gabriel Misrachi, Achille Nazaret, Oscar Clivio, Chenling Xu, Tal Ashuach, Mohammad Lotfollahi, Valentine Svensson, Eduardo da Veiga Beltrame, Carlos Talavera-López, Lior Pachter, Fabian J. Theis, Aaron Streets, Michael I. Jordan, Jeffrey Regier, Nir Yosef. bioRxiv 2021.04.28.441833; doi: https://doi.org/10.1101/2021.04.28.441833
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 108947
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 03 May 2021 22:26
Last Modified: 16 Nov 2021 19:33
