Studying stochastic systems biology of the cell with single-cell genomics data
- Creators
- Gorin, Gennady
- Vastola, John J.
- Pachter, Lior
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
Additional Information
This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
G.G. and L.P. were partially funded by NIH 5UM1HG012077-02 and NIH U19MH114830. J.V. was partially funded by NIH 1U19NS118246-01.
The RNA, DNA, and cDNA illustrations were derived from the DNA Twemoji by Twitter, Inc., used under the CC-BY 4.0 license.
The authors thank Dr. A. Sina Booeshaghi, Maria Carilli, Tara Chari, Taleen Dilanyan, Dr. Kristján Eldjárn Hjörleifsson, Meichen Fang, Catherine Felce, and Delaney Sullivan for fruitful discussions of co-regulation, contamination, transient behaviors, catalysis, fragmentation, genomic alignment, and a variety of other phenomena and processes. Part of this work was performed during G.G.'s Data Sciences Co-op with Celsius Therapeutics, Inc.
DATA AVAILABILITY. Notebooks that reproduce all of the results in the figures are hosted at https://github.com/pachterlab/GVP_2023. The raw data used to generate Figure 2b–c, as well as related supplementary figures, are hosted as the Zenodo package 7694182. The data and Monod fits reported in Figure 5d–e, originating from Gorin et al. (ref. 21), are hosted as the Zenodo package 7388133, and were originally generated using the notebooks and scripts at https://github.com/pachterlab/GP_2021_3/.
The authors have declared no competing interest.
Attached Files
Submitted - nihpp-2023.05.17.541250v2.pdf
Supplemental Material - media-1.xlsx
Supplemental Material - media-2.pdf
Additional details
- PMCID
- PMC10245677
- Eprint ID
- 122021
- Resolver ID
- CaltechAUTHORS:20230628-257070000.17
- NIH
- 5UM1HG012077-02
- NIH
- U19MH114830
- NIH
- 1U19NS118246-01
- Created
- 2023-06-30 (Created from EPrint's datestamp field)
- Updated
- 2023-06-30 (Created from EPrint's last_modified field)
- Caltech groups
- Tianqiao and Chrissy Chen Institute for Neuroscience, Division of Biology and Biological Engineering