CaltechAUTHORS
  A Caltech Library Service

Cancer Classification from Healthy DNA using Machine Learning

Jain, Siddharth and Mazaheri, Bijan and Raviv, Netanel and Bruck, Jehoshua (2019) Cancer Classification from Healthy DNA using Machine Learning. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:20190114-074334836

PDF - Submitted Version (2817Kb)
Creative Commons Attribution Non-commercial No Derivatives.

Archive (ZIP) - Supplemental Material (339Kb)
Creative Commons Attribution Non-commercial No Derivatives.

Abstract

The genome is traditionally viewed as a time-independent source of information, a paradigm that drives researchers to seek correlations between the presence of certain genes and a patient's risk of disease. This analysis neglects temporal changes in the genome, which we believe to be a crucial signal for predicting an individual's susceptibility to cancer. We hypothesize that each individual's genome passes through an evolution channel that is controlled by hereditary, environmental and stochastic factors. (The term "channel" is motivated by the notion of a communication channel, introduced by Shannon in 1948, which started the field of information theory.) This channel differs among individuals, giving rise to varying predispositions to developing cancer. We introduce the concept of mutation profiles, which are computed without any comparative analysis: they are obtained by analyzing the short tandem repeat regions in a single healthy genome, and they capture information about the individual's evolution channel. Using machine learning on data from more than 5,000 TCGA cancer patients, we demonstrate that these mutation profiles can accurately distinguish between patients with various types of cancer. For example, the pairwise validation accuracy of the classifier between PAAD (pancreas) patients and GBM (brain) patients is 93%. Our results show that healthy unaffected cells still contain a cancer-specific signal, which opens the possibility of cancer prediction from a healthy genome.
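
The abstract says that mutation profiles are derived from the short tandem repeat regions of a single genome, but this record does not describe how those regions are located or how a profile is computed from them. As a rough illustration of the first idea only, the sketch below finds perfect tandem repeats in a DNA string with a regular expression. The function name `find_tandem_repeats` and the parameters `max_unit` and `min_copies` are assumptions made for this example; they are not taken from the paper, and the authors' repeat-finding step may work quite differently.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(
    seq: str, max_unit: int = 6, min_copies: int = 3
) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Hypothetical helper for illustration: a repeat is a unit of 1..max_unit
    bases that occurs at least min_copies times back to back.
    """
    # Lazy unit match so the shortest repeating unit is tried first;
    # the backreference then requires (min_copies - 1) further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence containing four back-to-back copies of "AC".
    for start, unit, copies in find_tandem_repeats("TTGACACACACGG"):
        print(start, unit, copies)
```

This only detects exact repeats; real short tandem repeat regions often contain interruptions and point mutations, which a practical pipeline would need to tolerate.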


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL                              URL Type   Description
https://doi.org/10.1101/517839   DOI        Discussion Paper
ORCID:
Author             ORCID
Jain, Siddharth    0000-0002-9164-6119
Raviv, Netanel     0000-0002-1686-1994
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. bioRxiv preprint first posted online Jan. 11, 2019. This work was supported in part by The Caltech Mead New Adventure Fund and a Caltech CI2 Fund. The authors would like to thank Eytan Ruppin for his valuable advice and feedback. Author contributions statement: S.J. analyzed the TCGA genomic data, implemented the repeat finding and history estimation steps in the pipeline, helped with the machine learning step in building pairwise and multiclass classifiers, and wrote the manuscript; B.M. implemented the machine learning pipeline; N.R. implemented the alignment algorithm; J.B. originated and guided the study. S.J., B.M., N.R. and J.B. participated in brainstorming the concepts and in discussions and revisions of the manuscript. The authors declare no competing interests. Ethics approval for the use of the TCGA data was granted by the Caltech Institutional Review Board.
Funders:
Funding Agency                        Grant Number
Caltech Mead New Adventure Fund       UNSPECIFIED
Caltech Innovation Initiative (CI2)   32070065
Record Number: CaltechAUTHORS:20190114-074334836
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20190114-074334836
Official Citation: Cancer Classification from Healthy DNA using Machine Learning. Siddharth Jain, Bijan Mazaheri, Netanel Raviv, Jehoshua Bruck. bioRxiv 517839; doi: https://doi.org/10.1101/517839
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 92237
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 14 Jan 2019 21:03
Last Modified: 14 Jan 2019 21:03
