Dingle, Kamaludin and Novev, Javor K. and Ahnert, Sebastian E. and Louis, Ard A. (2022) Predicting phenotype transition probabilities via conditional algorithmic probability approximations. Journal of The Royal Society Interface, 19 (197). Art. No. 20220694. ISSN 1742-5662. PMCID PMC9748496. doi:10.1098/rsif.2022.0694. https://resolver.caltech.edu/CaltechAUTHORS:20230111-282624100.8
PDF (Published Version), Creative Commons Attribution, 919 kB
Abstract
Unravelling the structure of genotype–phenotype (GP) maps is an important problem in biology. Recently, arguments inspired by algorithmic information theory (AIT) and Kolmogorov complexity have been invoked to uncover simplicity bias in GP maps, an exponentially decaying upper bound on phenotype probability with increasing phenotype descriptional complexity. This means that phenotypes with many genotypes assigned via the GP map must be simple, while complex phenotypes must have few genotypes assigned. Here, we use similar arguments to bound the probability P(x → y) that phenotype x, upon random genetic mutation, transitions to phenotype y. The bound is P(x → y) ≲ 2^(−aK(y|x)−b), where K(y|x) is the estimated conditional complexity of y given x, quantifying how much extra information is required to make y given access to x. This upper bound is related to the conditional form of algorithmic probability from AIT. We demonstrate the practical applicability of our derived bound by predicting phenotype transition probabilities (and other related quantities) in simulations of RNA and protein secondary structures. Our work contributes to a general mathematical understanding of GP maps and may facilitate the prediction of transition probabilities directly from examining the phenotypes themselves, without utilizing detailed knowledge of the GP map.
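To make the two quantities in the bound concrete, here is a minimal sketch under stated assumptions. It (a) estimates P(x → y) empirically by taking every genotype that maps to phenotype x, applying every single-point mutation, and recording the fraction of mutants that land on each phenotype y, and (b) computes a compression-based stand-in for K(y|x). Everything here is hypothetical and for illustration only: the toy map `gp_map` (sorted bits) replaces the RNA and protein secondary-structure maps studied in the paper, zlib's compressed length is swapped in as a crude complexity proxy, K(y|x) is approximated as C(xy) − C(x), and the constants a and b are left for the user to fit. It is not the authors' code, which the record says is provided in the electronic supplementary material.

```python
import zlib
from collections import Counter
from itertools import product


def gp_map(genotype: str) -> str:
    # Hypothetical toy GP map (illustration only): the phenotype is the genotype
    # with its bits sorted, so it depends only on the number of 1s.
    return "".join(sorted(genotype))


def transition_probs(x: str, n: int) -> dict:
    """Empirical P(x -> y): over every length-n binary genotype mapping to x,
    apply every single-point mutation and return the fraction landing on each y."""
    counts = Counter()
    for bits in product("01", repeat=n):
        g = "".join(bits)
        if gp_map(g) != x:
            continue
        for i in range(n):
            flipped = "1" if g[i] == "0" else "0"
            counts[gp_map(g[:i] + flipped + g[i + 1:])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no length-{n} genotype maps to phenotype {x!r}")
    return {y: c / total for y, c in counts.items()}


def c_bits(s: str) -> int:
    # Crude complexity proxy (assumption): zlib-compressed length in bits.
    return 8 * len(zlib.compress(s.encode("ascii"), 9))


def cond_complexity(y: str, x: str) -> int:
    # Assumed estimate K(y|x) ~ C(xy) - C(x), clipped at zero.
    return max(c_bits(x + y) - c_bits(x), 0)


def upper_bound(y: str, x: str, a: float, b: float) -> float:
    # Bound 2^(-a K(y|x) - b); a and b must be fitted, no values are implied here.
    return 2.0 ** (-a * cond_complexity(y, x) - b)


if __name__ == "__main__":
    x = "000111"
    for y, p in sorted(transition_probs(x, 6).items()):
        print(y, p, cond_complexity(y, x))
```

For x = "000111" with n = 6, half of all single-point mutations add a 1 and half remove one, so both reachable phenotypes ("000011" and "001111") have empirical probability 0.5. Exhaustive enumeration is only feasible for very short genotypes; for realistic maps one would sample genotypes and mutations instead, and a short-string compressor like zlib adds large constant overhead, which is one reason the fitted a and b matter.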
Item Type: Article
Additional Information: © 2022 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited. This project was partially supported by Gulf University for Science and Technology under project code: ISG—Case (grant no. 263301) and a Summer Faculty Fellowship (both awarded to K.D.). This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant no. EP/T022159/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk). Data accessibility: the data for the protein analysis are available from the public Protein Data Bank (PDB) under ID 6WS6. For the RNA analysis, we did not use natural data; rather, we generated random sequences. Code is available from the electronic supplementary material [88].
Issue or Number: 197
PubMed Central ID: PMC9748496
DOI: 10.1098/rsif.2022.0694
Record Number: CaltechAUTHORS:20230111-282624100.8
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20230111-282624100.8
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 118779
Collection: CaltechAUTHORS
Deposited By: Research Services Depository
Deposited On: 09 Feb 2023 18:23
Last Modified: 23 Mar 2023 21:23