CaltechAUTHORS
  A Caltech Library Service

MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

Tareen, Ammar and Kooshkbaghi, Mahdi and Posfai, Anna and Ireland, William T. and McCandlish, David M. and Kinney, Justin B. (2020) MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20200716-073040475

PDF (June 27, 2021) - Submitted Version, 3MB
Creative Commons Attribution.

PDF - Supplemental Material, 1MB
Creative Commons Attribution.

Abstract

Multiplex assays of variant effect (MAVEs) are diverse techniques that include deep mutational scanning (DMS) experiments on proteins and massively parallel reporter assays (MPRAs) on cis-regulatory sequences. MAVEs are being rapidly adopted in many areas of biology, but a general strategy for inferring quantitative models of genotype-phenotype (G-P) maps from MAVE data is lacking. Here we introduce a conceptually unified approach for learning G-P maps from MAVE datasets. Our strategy is grounded in concepts from information theory, and is based on the view of G-P maps as a form of information compression. We also introduce MAVE-NN, an easy-to-use Python package that implements this approach using a neural network backend. The ability of MAVE-NN to infer diverse G-P maps—including biophysically interpretable models—is demonstrated on DMS and MPRA data in a variety of biological contexts. MAVE-NN thus provides a unified solution to a major outstanding need in the MAVE community.


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
https://doi.org/10.1101/2020.07.14.201475 | DOI | Discussion Paper
https://mavenn.readthedocs.io | Related Item | Data/Code
ORCID:
Author | ORCID
Ireland, William T. | 0000-0003-0971-2904
Kinney, Justin B. | 0000-0003-1897-3778
Alternate Title: MAVE-NN: Quantitative Modeling of Genotype-Phenotype Maps as Information Bottlenecks
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Version 1: July 14, 2020; Version 2: December 14, 2020; Version 3: June 27, 2021. This work was supported by NIH grant 1R35GM133777 (awarded to JBK), NIH grant 1R35GM133613 (awarded to DMM), an Alfred P. Sloan Research Fellowship (awarded to DMM), a grant from the CSHL/Northwell Health partnership, and funding from the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. Author contributions: AT, WTI, DMM, and JBK conceived the project. AT and JBK wrote the software with assistance from AP and MK. WTI and JBK wrote a preliminary version of the software. AT, MK, and JBK performed the data analysis. AT, DMM, and JBK wrote the manuscript with contributions from MK and AP. The authors declare that they have no known conflicts of interest.
Funders:
Funding Agency | Grant Number
NIH | 1R35GM133777
NIH | 1R35GM133613
Alfred P. Sloan Foundation | UNSPECIFIED
Northwell Health | UNSPECIFIED
Cold Spring Harbor Laboratory | UNSPECIFIED
Subject Keywords: multiplex assay of variant effect; neural networks; deep mutational scanning; massively parallel reporter assay; global epistasis; mutual information
Record Number: CaltechAUTHORS:20200716-073040475
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20200716-073040475
Official Citation: MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Ammar Tareen, Mahdi Kooshkbaghi, Anna Posfai, William T. Ireland, David M. McCandlish, Justin B. Kinney. bioRxiv 2020.07.14.201475; doi: https://doi.org/10.1101/2020.07.14.201475
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 104394
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 16 Jul 2020 16:09
Last Modified: 06 Jul 2021 19:31