CaltechAUTHORS
  A Caltech Library Service

Deterministic column subset selection for single-cell RNA-Seq

McCurdy, Shannon R. and Ntranos, Vasilis and Pachter, Lior (2019) Deterministic column subset selection for single-cell RNA-Seq. PLoS ONE, 14 (1). Art. No. e0210571. ISSN 1932-6203. PMCID PMC6347249. https://resolver.caltech.edu/CaltechAUTHORS:20181029-133340286

PDF - Published Version (2541Kb)
Creative Commons Attribution.

PDF - Submitted Version (1298Kb)
Creative Commons Attribution Non-commercial No Derivatives.

PDF - Supplemental Material (496Kb)
Creative Commons Attribution.


Abstract

Analysis of single-cell RNA sequencing (scRNA-Seq) data often involves filtering out uninteresting or poorly measured genes and dimensionality reduction to reduce noise and simplify data visualization. However, techniques such as principal components analysis (PCA) fail to preserve non-negativity and sparsity structures present in the original matrices, and the coordinates of projected cells are not easily interpretable. Commonly used thresholding methods to filter genes avoid those pitfalls, but ignore collinearity and covariance in the original matrix. We show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of common thresholding methods and PCA, while avoiding pitfalls from both. We derive new spectral bounds for DCSS. We apply DCSS to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, and compare to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to clusters produced from the full set. The resulting clusters are informative for cell type.
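To make the idea of deterministic column subset selection concrete, here is a minimal sketch of one well-known variant: deterministic rank-k leverage-score selection in the style of Papailiopoulos, Kyrillidis and Boutsidis. It is an assumption that this matches the procedure in the article, and it is not the API of the dcss_single_cell package. The function names `leverage_scores` and `dcss` and the parameters `k` (target rank) and `theta` (leverage threshold) are hypothetical. Rows of `X` are cells and columns are genes. Genes are kept in descending order of leverage until the cumulative leverage exceeds `theta`. Unlike a per-gene variance threshold, each score depends on the whole matrix through its singular vectors, so collinearity between genes is taken into account.

```python
import numpy as np


def leverage_scores(X, k):
    """Hypothetical helper: rank-k column leverage scores of X (cells x genes)."""
    # Rows of Vt are right singular vectors, sorted by decreasing singular value;
    # columns of Vt index genes.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k, :]                  # top-k right singular vectors, shape (k, n_genes)
    return np.sum(Vk ** 2, axis=0)  # squared column norms; each <= 1, total = k


def dcss(X, k, theta):
    """Hypothetical sketch: keep the fewest highest-leverage gene columns
    whose cumulative leverage score exceeds theta (0 < theta <= k)."""
    scores = leverage_scores(X, k)
    order = np.argsort(scores)[::-1]      # gene indices, highest leverage first
    cumulative = np.cumsum(scores[order])
    # Number of leading entries with cumulative <= theta, plus one more column
    # to push the running sum past theta.
    n_keep = int(np.searchsorted(cumulative, theta, side="right")) + 1
    n_keep = min(n_keep, len(order))      # guard against round-off when theta == k
    return np.sort(order[:n_keep])        # selected gene (column) indices


# Usage: only genes 0 and 2 carry signal, so they absorb all rank-2 leverage.
X = np.zeros((5, 4))
X[:, 0] = [1, 2, 3, 4, 5]
X[:, 2] = [5, 4, 3, 2, 1]
selected = dcss(X, k=2, theta=1.5)
```

The threshold trades subset size against approximation quality. Because every leverage score is at most 1, exceeding a threshold close to `k` always requires at least `k` columns.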


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1371/journal.pone.0210571 | DOI | Article
https://doi.org/10.1101/159079 | DOI | Discussion Paper
https://doi.org/10.1371/journal.pone.0210571.s001 | DOI | Supplementary Material
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6347249/ | PubMed Central | Article
ORCID:
Author | ORCID
McCurdy, Shannon R. | 0000-0001-5555-4156
Ntranos, Vasilis | 0000-0002-2477-0670
Pachter, Lior | 0000-0002-9164-6231
Alternate Title: Column subset selection for single-cell RNA-Seq clustering
Additional Information: © 2019 McCurdy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Received: March 27, 2018; Accepted: December 26, 2018; Published: January 25, 2019. Data Availability: All the single-cell gene expression files are available from the NCBI Sequence Read Archive (mouse brain: accession number SRA SRP045452; mouse bone marrow: accession number SRA SRP063520). The Python package containing code to perform the methods described in the article can be found at https://github.com/srmcc/dcss_single_cell.git. The package also contains code to download the datasets used as examples in the article. SRM is funded by Award Number F32HG008713 from the National Human Genome Research Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors have declared that no competing interests exist. SRM would like to acknowledge Ilan Shomorony, Elaine Angelino, and Robert Tunney for useful comments. Author Contributions: Conceptualization: Shannon R. McCurdy, Vasilis Ntranos. Formal analysis: Shannon R. McCurdy. Methodology: Shannon R. McCurdy, Vasilis Ntranos. Software: Shannon R. McCurdy. Supervision: Lior Pachter. Validation: Shannon R. McCurdy, Vasilis Ntranos. Visualization: Shannon R. McCurdy. Writing – original draft: Shannon R. McCurdy. Writing – review & editing: Shannon R. McCurdy, Vasilis Ntranos.
Funders:
Funding Agency | Grant Number
NIH Postdoctoral Fellowship | F32HG008713
Issue or Number: 1
PubMed Central ID: PMC6347249
Record Number: CaltechAUTHORS:20181029-133340286
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20181029-133340286
Official Citation: McCurdy SR, Ntranos V, Pachter L (2019) Deterministic column subset selection for single-cell RNA-Seq. PLoS ONE 14(1): e0210571. https://doi.org/10.1371/journal.pone.0210571
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 90471
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 30 Oct 2018 01:58
Last Modified: 09 Mar 2020 13:19
