A Caltech Library Service

Mechanistic modeling with a variational autoencoder for multimodal single-cell RNA sequencing data

Carilli, Maria and Gorin, Gennady and Choi, Yongin and Chari, Tara and Pachter, Lior (2023) Mechanistic modeling with a variational autoencoder for multimodal single-cell RNA sequencing data. (Unpublished)

PDF - Submitted Version (Creative Commons Attribution)
PDF - Supplemental Material (Creative Commons Attribution)


We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. In simulated benchmarking, biVI accurately recapitulates key properties of interest, including cell type structure, parameter values, and copy number distributions. In biological datasets, biVI provides a route for the identification of the biophysical mechanisms underlying differential expression. The analytical approach outlines a generalizable strategy for representing multimodal datasets generated by single-cell RNA sequencing.
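The abstract's central modeling idea is a bivariate distribution over nascent and mature RNA counts. As an illustration only, the sketch below evaluates the simplest such model, which is assumed here rather than taken from the paper: constitutive transcription at rate `k`, conversion of nascent to mature RNA at rate `beta`, and degradation of mature RNA at rate `gamma`. Under those assumptions the steady-state joint distribution is a product of two independent Poisson distributions with means `k/beta` and `k/gamma`. The function names and parameters are hypothetical; biVI's own bivariate models and their parameterization are defined in the paper and its repository, not here.

```python
import math


def poisson_logpmf(x: int, mean: float) -> float:
    """Log-probability of observing count x under a Poisson distribution."""
    if x < 0:
        return float("-inf")
    return x * math.log(mean) - mean - math.lgamma(x + 1)


def constitutive_logpmf(n: int, m: int, k: float, beta: float, gamma: float) -> float:
    """Joint log-probability of n nascent and m mature molecules at steady state.

    Hypothetical illustration, assuming a constitutive model:
      transcription  (0 -> N) at rate k
      processing     (N -> M) at rate beta per nascent molecule
      degradation    (M -> 0) at rate gamma per mature molecule
    The stationary distribution factorizes into independent Poissons with
    means k/beta (nascent) and k/gamma (mature).
    """
    if k <= 0 or beta <= 0 or gamma <= 0:
        raise ValueError("rates must be positive")
    return poisson_logpmf(n, k / beta) + poisson_logpmf(m, k / gamma)


if __name__ == "__main__":
    # Arbitrary example rates: mean nascent = 10/2 = 5, mean mature = 10/1 = 10.
    k, beta, gamma = 10.0, 2.0, 1.0
    # Summing the joint PMF over a generous grid should give approximately 1.
    total = sum(
        math.exp(constitutive_logpmf(n, m, k, beta, gamma))
        for n in range(60)
        for m in range(80)
    )
    print(f"total probability on grid: {total:.6f}")
```

In a variational autoencoder setting like the one the abstract describes, a log-likelihood of this kind would play the role of the reconstruction term, with a decoder supplying per-cell parameters; that wiring is not shown here.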

Item Type: Report or Paper (Discussion Paper)
Related URLs: Paper
ORCID:
Carilli, Maria: 0000-0002-8977-7224
Gorin, Gennady: 0000-0001-6097-2029
Choi, Yongin: 0000-0002-8996-8434
Chari, Tara: 0000-0002-6953-4313
Pachter, Lior: 0000-0002-9164-6231
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
M.C., G.G., T.C., and L.P. were partially funded by IGVF-1-UCI.IGVF and NIH U19MH114830. Y.C. was partially funded by T32 GM007377. G.G. thanks Drs. Ido Golding and Heng Xu for the inspiration leading to the explanatory model for the zero-inflated negative binomial distribution in Section S1.4. The RNA illustrations used in Figures 1, 2, S1, and S2 were derived from the DNA Twemoji by Twitter, Inc., used under the CC-BY 4.0 license. We thank the Caltech Bioinformatics Resource Center for GPU resources that helped in performing the analyses.
Data availability: Simulated datasets, the simulated parameters used to generate them, and Allen dataset B08 with its associated metadata are available in the Zenodo package 7497222. All analysis scripts and notebooks are available at The repository also contains a Google Colaboratory demonstration notebook applying the methods to a small human blood cell dataset.
The authors have declared no competing interest.
Funding Agency | Grant Number
Impact of Genomic Variation on Function (IGVF) Consortium | UNSPECIFIED
NIH Predoctoral Fellowship | T32 GM007377
Record Number: CaltechAUTHORS:20230316-182533000.39
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 120153
Deposited By: George Porter
Deposited On: 18 Mar 2023 02:31
Last Modified: 18 Mar 2023 02:31
