
MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

Tareen, Ammar and Posfai, Anna and Ireland, William T. and McCandlish, David M. and Kinney, Justin B. (2020) MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20200716-073040475

PDF (December 14, 2020) - Submitted Version. Creative Commons Attribution. 2480Kb
PDF - Supplemental Material. Creative Commons Attribution. 805Kb

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20200716-073040475

Abstract

Multiplex assays of variant effect (MAVEs), which include massively parallel reporter assays (MPRAs) and deep mutational scanning (DMS) experiments, are being rapidly adopted in many areas of biology. However, inferring quantitative models of genotype-phenotype (G-P) maps from MAVE data remains challenging, and different inference approaches have been advocated in different MAVE contexts. Here we introduce a conceptually unified approach to the problem of learning G-P maps from MAVE data. Our strategy is grounded in concepts from information theory, and is based on the view of G-P maps as a form of information compression. We also introduce MAVE-NN, a Python package that implements this approach using a neural network backend. The capabilities and advantages of MAVE-NN are then demonstrated on three diverse DMS and MPRA datasets. MAVE-NN thus fills a major need in the computational analysis of MAVE data. Installation instructions, tutorials, and documentation are provided at https://mavenn.readthedocs.io.


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL                                          URL Type      Description
https://doi.org/10.1101/2020.07.14.201475    DOI           Discussion Paper
https://mavenn.readthedocs.io                Related Item  Data/Code
ORCID:
Author               ORCID
Ireland, William T.  0000-0003-0971-2904
Kinney, Justin B.    0000-0003-1897-3778
Alternate Title: MAVE-NN: Quantitative Modeling of Genotype-Phenotype Maps as Information Bottlenecks
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Version 1: July 14, 2020; Version 2: December 14, 2020. The authors thank Jesse Bloom and Peter Koo for providing valuable feedback on the manuscript. This work was supported by NIH grant 1R35GM133777 (awarded to JBK), NIH grant 1R35GM133613 (awarded to DMM), an Alfred P. Sloan Research Fellowship (awarded to DMM), a grant from the CSHL/Northwell Health partnership, and funding from the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. Availability of data and materials: • Project: mavenn; • Documentation: mavenn.readthedocs.io; • Programming language: Python; • Installation: pip install mavenn; • License: MIT; • Restrictions on use by non-academics: None. The authors declare that they have no competing interests. Authors' contributions: JBK, AT, and DMM conceived the project. AT and JBK wrote the software. AT tested the software and released it as a Python package on PyPI. AT, DMM, and JBK wrote the manuscript. WTI wrote a preliminary version of the software. AP performed the gauge fixing analysis. All authors contributed to aspects of the analyses.
Funders:
Funding Agency                 Grant Number
NIH                            1R35GM133777
NIH                            1R35GM133613
Alfred P. Sloan Foundation     UNSPECIFIED
Northwell Health               UNSPECIFIED
Cold Spring Harbor Laboratory  UNSPECIFIED
Subject Keywords: multiplex assay of variant effect; neural networks; deep mutational scanning; massively parallel reporter assay; global epistasis; mutual information
Record Number: CaltechAUTHORS:20200716-073040475
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20200716-073040475
Official Citation: MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Ammar Tareen, Anna Posfai, William Thornton Ireland, David Martin McCandlish, Justin Block Kinney. bioRxiv 2020.07.14.201475; doi: https://doi.org/10.1101/2020.07.14.201475
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 104394
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 16 Jul 2020 16:09
Last Modified: 15 Dec 2020 21:11
