A Python library for probabilistic analysis of single-cell omics data

Gayoso, Adam and Lopez, Romain and Xing, Galen and Boyeau, Pierre and Amiri, Valeh Valiollah Pour and Hong, Justin and Wu, Katherine and Jayasuriya, Michael and Mehlman, Edouard and Langevin, Maxime and Liu, Yining and Samaran, Jules and Misrachi, Gabriel and Nazaret, Achille and Clivio, Oscar and Xu, Chenling and Ashuach, Tal and Gabitto, Mariano and Lotfollahi, Mohammad and Svensson, Valentine and da Veiga Beltrame, Eduardo and Kleshchevnikov, Vitalii and Talavera-López, Carlos and Pachter, Lior and Theis, Fabian J. and Streets, Aaron and Jordan, Michael I. and Regier, Jeffrey and Yosef, Nir (2022) A Python library for probabilistic analysis of single-cell omics data. Nature Biotechnology, 40 (2). pp. 163-166. ISSN 1087-0156. doi:10.1038/s41587-021-01206-w.

Files:
PDF, Submitted Version. Creative Commons Attribution.
PDF, Supplemental Material (Supplementary Figs. 1–8, Notes 1–5, Tables 1–3 and References). See Usage Policy.



Methods for analyzing single-cell data perform a core set of computational tasks. These tasks include dimensionality reduction, cell clustering, cell-state annotation, removal of unwanted variation, analysis of differential expression, identification of spatial patterns of gene expression, and joint analysis of multi-modal omics data. Many of these methods rely on likelihood-based models to represent variation in the data; we refer to these as ‘probabilistic models’. Probabilistic models provide principled ways to capture uncertainty in biological systems and are convenient for decomposing the many sources of variation that give rise to omics data.

Item Type: Article
Author | ORCID
Gayoso, Adam | 0000-0001-9537-0845
Lopez, Romain | 0000-0003-0495-738X
Xing, Galen | 0000-0001-7376-6312
Boyeau, Pierre | 0000-0003-4549-3972
Amiri, Valeh Valiollah Pour | 0000-0002-2008-5297
Hong, Justin | 0000-0003-2115-9101
Wu, Katherine | 0000-0001-7562-4545
Jayasuriya, Michael | 0000-0003-2366-841X
Mehlman, Edouard | 0000-0001-6351-2220
Langevin, Maxime | 0000-0002-5498-4661
Liu, Yining | 0000-0002-8779-2906
Samaran, Jules | 0000-0001-7317-8190
Misrachi, Gabriel | 0000-0002-6020-4641
Nazaret, Achille | 0000-0002-5428-9810
Clivio, Oscar | 0000-0001-8668-4535
Xu, Chenling | 0000-0001-9610-7627
Ashuach, Tal | 0000-0003-1939-0865
Gabitto, Mariano | 0000-0001-6911-344X
Lotfollahi, Mohammad | 0000-0001-6858-7985
Svensson, Valentine | 0000-0002-9217-2330
da Veiga Beltrame, Eduardo | 0000-0002-1529-9207
Kleshchevnikov, Vitalii | 0000-0001-9110-7441
Talavera-López, Carlos | 0000-0001-8590-2393
Pachter, Lior | 0000-0002-9164-6231
Theis, Fabian J. | 0000-0002-2419-1943
Streets, Aaron | 0000-0002-3909-8389
Jordan, Michael I. | 0000-0001-8935-817X
Regier, Jeffrey | 0000-0002-1472-5235
Yosef, Nir | 0000-0001-9004-1225
Alternate Title: scvi-tools: a library for deep probabilistic analysis of single-cell omics data
Additional Information: © 2022 Nature Publishing Group. Published 07 February 2022.
Acknowledgements: We acknowledge members of the Streets and Yosef laboratories for general feedback. We thank all the GitHub users who contributed code to scvi-tools over the years. We thank Nicholas Everetts for help with the analysis of the Drosophila data. We thank David Kelley and Nick Bernstein for help implementing Solo. We thank Marco Wagenstetter and Sergei Rybakov for help with the transition of the scGen package to use scvi-tools, as well as feedback on the scArches implementation. We thank Hector Roux de Bézieux for insightful discussions about the R ecosystem. We thank Kieran Campbell and Allen Zhang for clarifying aspects of the original CellAssign implementation. We thank the Pyro team, including Eli Bingham, Martin Jankowiak and Fritz Obermeyer, for help integrating Pyro in scvi-tools. Research reported in this manuscript was supported by the NIGMS of the National Institutes of Health under award number R35GM124916 and by the Chan-Zuckerberg Foundation Network under grant number 2019-02452. O.C. is supported by the EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning (EP/S023151/1, studentship 2420649). A.G. is supported by NIH Training Grant 5T32HG000047-19. A.S. and N.Y. are Chan Zuckerberg Biohub investigators.
Contributions: A.G., R.L. and G.X. contributed equally. A.G. designed the scvi-tools application programming interface with input from G.X. and R.L. G.X. and A.G. led development of scvi-tools with input from R.L. G.X. reimplemented scVI, totalVI, AutoZI and scANVI with input from A.G. R.L. implemented Stereoscope with input from A.G. Data analysis in this manuscript was led by A.G., R.L. and G.X., with input from N.Y. A.G., R.L., P.B., E.M., M. Langevin, Y.L., J.S., G.M., A.N. and O.C. worked on the initial version of the codebase (scvi package), with input from M.I.J., J.R. and N.Y. R.L., E.M. and C.X. contributed the scANVI model, with input from J.R. and N.Y. A.G. implemented totalVI with input from A.S. and N.Y. T.A. implemented peakVI with input from A.G. A.G. implemented scArches with input from M. Lotfollahi, F.J.T. and N.Y. V.S. made several contributions to the codebase, including the LDVAE model. P.B. contributed the differential expression programming interface. E.d.V.B. and C.T.-L. provided tutorials on differential expression and deconvolution of spatial transcriptomics, with input from L.P. K.W. implemented CellAssign in the codebase with input from A.G. V.V.P.A., J.H. and M.J. made general code contributions and helped maintain scvi-tools. J.H. implemented LDA. T.A. and M.G. implemented MultiVI. V.K. improved Pyro support in scvi-tools and ported Cell2Location to use scvi-tools. N.Y. supervised all research. A.G., R.L., G.X., J.R. and N.Y. wrote the manuscript.
Competing interests: V.S. is a full-time employee of Serqet Therapeutics and has ownership interest in Serqet Therapeutics. F.J.T. reports consulting fees from Roche Diagnostics GmbH and Cellarity Inc., and ownership interest in Cellarity, Inc. N.Y. is an advisor to and/or has equity in Cellarity, Celsius Therapeutics and Rheos Medicines. The remaining authors declare no competing interests.
Peer review information: Nature Biotechnology thanks Martin Hemberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Funding Agency | Grant Number
Chan-Zuckerberg Foundation | 2019-02452
Engineering and Physical Sciences Research Council (EPSRC) | EP/S023151/1
Engineering and Physical Sciences Research Council (EPSRC) | 2420649
NIH Predoctoral Fellowship | 5T32HG000047-19
Subject Keywords: Computational models; Machine learning; Software; Statistical methods
Issue or Number: 2
Record Number: CaltechAUTHORS:20210503-142332959
Official Citation: Gayoso, A., Lopez, R., Xing, G. et al. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol 40, 163–166 (2022).
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 108947
Deposited By: Tony Diaz
Deposited On: 03 May 2021 22:26
Last Modified: 02 Mar 2022 18:28
