CaltechAUTHORS
  A Caltech Library Service

Active feature selection discovers minimal gene-sets for classifying cell-types and disease states in single-cell mRNA-seq data

Chen, Xiaoqiao and Chen, Sisi and Thomson, Matt (2021) Active feature selection discovers minimal gene-sets for classifying cell-types and disease states in single-cell mRNA-seq data. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20210622-154854635

PDF - Submitted Version
Creative Commons Attribution Non-commercial No Derivatives.

14MB

Abstract

Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical tasks of interest. Here, we introduce an active learning framework that constructs compressed gene sets that enable high-accuracy classification of cell-types and physiological states while analyzing a minimal number of gene transcripts. Our active feature selection procedure constructs gene sets through an iterative cell-type classification task in which misclassified cells are examined at each round to identify maximally informative genes through an ‘active’ support vector machine (SVM) classifier. Our active SVM procedure automatically identifies gene sets that enable > 90% cell-type classification accuracy in the Tabula Muris mouse tissue survey, as well as a 40-gene set that enables classification of multiple myeloma patient samples with > 95% accuracy. Broadly, the discovery of compact but highly informative gene sets might enable drastic reductions in sequencing requirements for applications of single-cell mRNA-seq.
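
The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the GitHub repository listed under Related URLs). It assumes binary labels in {-1, +1}, a linear SVM trained by plain subgradient descent, and a simple scoring rule chosen here for illustration: each unselected gene is ranked by the magnitude of the hinge-loss gradient it would receive from the currently misclassified cells. The function names, hyperparameters, and synthetic data are all hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) by full-batch subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # cells violating the margin
        grad_w = reg * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def active_feature_selection(X, y, n_genes):
    """Greedily grow a gene set, scoring candidates only on misclassified cells."""
    n, d = X.shape
    selected, chosen = [], np.zeros(d, dtype=bool)
    w, b = np.zeros(0), 0.0
    for _ in range(n_genes):
        scores = X[:, selected] @ w + b if selected else np.zeros(n)
        wrong = y * scores <= 0             # every cell counts as wrong in round one
        if not wrong.any():
            break                           # current gene set already classifies all cells
        # Illustrative score (an assumption): |hinge-loss gradient| a gene would
        # receive from the misclassified cells if it were added with weight zero.
        gain = np.abs((y[wrong][:, None] * X[wrong]).sum(axis=0))
        gain[chosen] = -np.inf              # never pick the same gene twice
        best = int(np.argmax(gain))
        selected.append(best)
        chosen[best] = True
        w, b = train_linear_svm(X[:, selected], y)
    return selected, w, b

# Synthetic "expression matrix": 400 cells x 20 genes, where only
# genes 3 and 7 carry information about the (binary) cell type.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = np.where(X[:, 3] + X[:, 7] > 0, 1.0, -1.0)

selected, w, b = active_feature_selection(X, y, n_genes=2)
accuracy = float(np.mean(np.sign(X[:, selected] @ w + b) == y))
print("selected genes:", selected, "accuracy:", round(accuracy, 3))
```

In this toy setting, the cells misclassified after the first round are the ones whose label is driven by the gene not yet selected, which is why restricting the score to those cells steers the second pick toward the remaining informative gene rather than a noise gene.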


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
https://doi.org/10.1101/2021.06.15.448478 | DOI | Discussion Paper
http://support.10xgenomics.com/single-cell/datasets | Related Item | Data
https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_and_tissues_from_Mus_musculus_at_single_cell_resolution/27733 | Related Item | Data
https://figshare.com/articles/dataset/PopAlign_Data/11837097/3 | Related Item | Data
https://github.com/xqchen/Active-feature-selection-in-single-cell-mRNA-seq-data | Related Item | Code
ORCID:
Author | ORCID
Chen, Sisi | 0000-0001-9448-9713
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted June 16, 2021. Data Availability: The PBMC single-cell RNA-seq data have been deposited in the Short Read Archive under accession number SRP073767 by the authors of [13]. The data are also available at http://support.10xgenomics.com/single-cell/datasets. The original Tabula Muris dataset is available at https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_and_tissues_from_Mus_musculus_at_single_cell_resolution/27733. The original multiple myeloma PBMC dataset, containing 2 healthy donors and 4 multiple myeloma donors, is available at https://figshare.com/articles/dataset/PopAlign_Data/11837097/3. Code Availability: Example source code, including the algorithm and preprocessing code, is publicly available on GitHub at https://github.com/xqchen/Active-feature-selection-in-single-cell-mRNA-seq-data.
DOI: 10.1101/2021.06.15.448478
Record Number: CaltechAUTHORS:20210622-154854635
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210622-154854635
Official Citation: Active feature selection discovers minimal gene-sets for classifying cell-types and disease states in single-cell mRNA-seq data. Xiaoqiao Chen, Sisi Chen, Matt Thomson. bioRxiv 2021.06.15.448478; doi: https://doi.org/10.1101/2021.06.15.448478
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 109524
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 23 Jun 2021 19:39
Last Modified: 16 Nov 2021 19:36
