The Specious Art of Single-Cell Genomics

Chari, Tara and Banerjee, Joeyta and Pachter, Lior (2021) The Specious Art of Single-Cell Genomics. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20210831-175013923

PDF (September 27, 2021) - Submitted Version (20MB)
Creative Commons Attribution Non-commercial.

PDF - Supplemental Material (20MB)
Creative Commons Attribution Non-commercial.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210831-175013923

Abstract

Dimensionality reduction is standard practice for filtering noise and identifying relevant dimensions in large-scale data analyses. In biology, single-cell expression studies almost always begin with reduction to two or three dimensions to produce 'all-in-one' visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative analysis of cell relationships. However, there is little theoretical support for this practice. We examine the theoretical and practical implications of low-dimensional embedding of single-cell data, and find extensive distortions incurred on the global and local properties of biological patterns relative to the high-dimensional, ambient space. In lieu of this, we propose semi-supervised dimension reduction to higher dimension, and show that such targeted reduction guided by the metadata associated with single-cell experiments provides useful latent space representations for hypothesis-driven biological discovery.


Item Type: Report or Paper (Discussion Paper)
Related URLs:
  URL | URL Type | Description
  https://doi.org/10.1101/2021.08.25.457696 | DOI | Discussion Paper
  https://github.com/pachterlab/CBP_2021 | Related Item | Code
  https://github.com/pachterlab/picasso | Related Item | Code
  https://github.com/pachterlab/MCML | Related Item | Code
ORCID:
  Author | ORCID
  Chari, Tara | 0000-0002-6953-4313
  Pachter, Lior | 0000-0002-9164-6231
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. Version 1: August 26, 2021; Version 2: September 21, 2021; Version 3: September 27, 2021.
Some of the computations presented here were conducted using machines in the Resnick High Performance Center, a facility supported by the Resnick Sustainability Institute at the California Institute of Technology. We thank Gennady Gorin and Benjamin Riviere for helpful discussions regarding the MCML and Picasso analyses, Sina Booeshaghi for helpful discussions regarding NCA and dimensionality reduction, Ingileif Hallgrimsdottir for valuable feedback on the manuscript, and Pall Melsted for useful insights regarding Theorem 1. The work was supported in part by NIH grant U19MH114830 and Joeyta Banerjee was supported in part by the Caltech Summer Undergraduate Research Fellowship (SURF).
Data Availability: Download links for the original data used to generate the figures and results in the paper are listed in Table 1. Processed and normalized versions of the count matrices are available on CaltechData, with links provided in Supplementary Table 1.
Code Availability: All analysis code used to generate the figures and results in the paper is available at https://github.com/pachterlab/CBP_2021 with Picasso and MCML analyses provided in notebooks which can be run on Google Colab. Picasso is also available at https://github.com/pachterlab/picasso. The MCML method as well as tools for quantitative analysis are available via a Python pip installable package from https://github.com/pachterlab/MCML.
Author Contributions: Conceived of the project: TC and LP. Wrote scripts for processing the data and code for the analysis: TC and JB. Developed the Google Colab notebooks: TC and JB. Analyzed and interpreted the data: TC and LP. Writing and editing the manuscript: TC and LP.
The authors declare no competing interests.
Group: Resnick Sustainability Institute
Funders:
  Funding Agency | Grant Number
  Resnick Sustainability Institute | UNSPECIFIED
  NIH | U19MH114830
  Caltech Summer Undergraduate Research Fellowship (SURF) | UNSPECIFIED
DOI: 10.1101/2021.08.25.457696
Record Number: CaltechAUTHORS:20210831-175013923
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210831-175013923
Official Citation: The Specious Art of Single-Cell Genomics. Tara Chari, Joeyta Banerjee, Lior Pachter. bioRxiv 2021.08.25.457696; doi: https://doi.org/10.1101/2021.08.25.457696
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 110638
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 31 Aug 2021 18:08
Last Modified: 16 Nov 2021 19:41
