
SVFX: a machine-learning framework to quantify the pathogenicity of structural variants

Kumar, Sushant and Harmanci, Arif and Vytheeswaran, Jagath and Gerstein, Mark B. (2019) SVFX: a machine-learning framework to quantify the pathogenicity of structural variants. (Unpublished)

PDF - Submitted Version (Creative Commons Attribution Non-commercial)
PDF (Supplemental Figures) - Supplemental Material (Creative Commons Attribution Non-commercial)
MS Excel (Supplemental Tables) - Supplemental Material (Creative Commons Attribution Non-commercial)


A rapid decline in sequencing cost has made large-scale genome sequencing studies feasible. One of the fundamental goals of these studies is to catalog all pathogenic variants. Numerous methods and tools have been developed to interpret point mutations and small insertions and deletions. However, approaches for identifying pathogenic genomic structural variations (SVs) are lacking, even though SVs are known to play a crucial role in many diseases by altering the sequence and three-dimensional structure of the genome. Previous studies have suggested a complex interplay of genomic and epigenomic features in the emergence and distribution of SVs, but the exact mechanism of pathogenesis for SVs in different diseases is not straightforward to decipher. Thus, we built an agnostic machine-learning-based workflow, called SVFX, to assign a pathogenicity score to somatic and germline SVs in various diseases. In particular, we generated somatic and germline training models, which included genomic, epigenomic, and conservation-based features for SV call sets in diseased and healthy individuals. We then applied SVFX to SVs in six different cancer cohorts and a cardiovascular disease (CVD) cohort. Overall, SVFX achieved high accuracy in identifying pathogenic SVs. Moreover, we found that predicted pathogenic SVs in cancer cohorts were enriched among known cancer genes and many cancer-related pathways (including Wnt signaling, Ras signaling, DNA repair, and ubiquitin-mediated proteolysis). Finally, we note that SVFX is flexible and can be easily extended to identify pathogenic SVs in additional disease cohorts.

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL / URL Type / Description: Paper
Kumar, Sushant (ORCID: 0000-0002-2294-3988)
Harmanci, Arif (ORCID: 0000-0002-9696-1118)
Gerstein, Mark B. (ORCID: 0000-0002-9746-3719)
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. bioRxiv preprint first posted online Aug. 19, 2019. We acknowledge support from the NIH and the AL Williams Professorship funds. We are thankful to the members of the PCAWG SV working group for generating the variant calls. We are also grateful to the Center for Common Disease and the Genome Sequencing Program consortium members for creating SV calls for the CVD cohort used in this study.
Funding Agency: A. L. Williams Professorship Funds (Grant Number: UNSPECIFIED)
Record Number: CaltechAUTHORS:20190819-105323235
Persistent URL:
Official Citation: SVFX: a machine-learning framework to quantify the pathogenicity of structural variants. Sushant Kumar, Arif Harmanci, Jagath Vytheeswaran, Mark Gerstein. bioRxiv 739474; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 97998
Deposited By: Tony Diaz
Deposited On: 19 Aug 2019 20:58
Last Modified: 31 Jan 2020 21:18
