CaltechAUTHORS
  A Caltech Library Service

Predicting phenotype transition probabilities via conditional algorithmic probability approximations

Dingle, Kamaludin and Novev, Javor K. and Ahnert, Sebastian E. and Louis, Ard A. (2022) Predicting phenotype transition probabilities via conditional algorithmic probability approximations. Journal of The Royal Society Interface, 19 (197). Art. No. 20220694. ISSN 1742-5662. PMCID PMC9748496. doi:10.1098/rsif.2022.0694. https://resolver.caltech.edu/CaltechAUTHORS:20230111-282624100.8

PDF - Published Version (919 kB)
Creative Commons Attribution.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20230111-282624100.8

Abstract

Unravelling the structure of genotype–phenotype (GP) maps is an important problem in biology. Recently, arguments inspired by algorithmic information theory (AIT) and Kolmogorov complexity have been invoked to uncover simplicity bias in GP maps: an upper bound on phenotype probability that decays exponentially with increasing phenotype descriptional complexity. This means that phenotypes to which the GP map assigns many genotypes must be simple, while complex phenotypes can only be assigned few genotypes. Here, we use similar arguments to bound the probability P(x → y) that phenotype x, upon random genetic mutation, transitions to phenotype y. The bound is P(x → y) ≲ 2^(-aK(y|x)-b), where a and b are constants and K(y|x) is the estimated conditional complexity of y given x, quantifying how much extra information is required to make y given access to x. This upper bound is related to the conditional form of algorithmic probability from AIT. We demonstrate the practical applicability of our derived bound by predicting phenotype transition probabilities (and other related quantities) in simulations of RNA and protein secondary structures. Our work contributes to a general mathematical understanding of GP maps and may facilitate the prediction of transition probabilities directly from examining the phenotypes themselves, without utilizing detailed knowledge of the GP map.
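The bound P(x → y) ≲ 2^(-aK(y|x)-b) can be illustrated with a short sketch. It rests on assumptions that are not taken from the record above: conditional complexity is approximated by the compression difference C(xy) - C(x), with Python's standard `zlib` compressor swapped in as a generic stand-in (not claimed to be the estimator used in the paper); phenotypes are written as RNA-style dot-bracket strings; and the helper names `complexity`, `conditional_complexity` and `transition_bound`, as well as the values of a and b, are hypothetical placeholders. In practice a and b would have to be fitted to data from a specific GP map.

```python
import zlib


def complexity(s: str) -> int:
    """Crude complexity estimate of a string, in bits.

    Assumption: zlib (DEFLATE, level 9) is a stand-in compressor,
    not the estimator used in the paper.
    """
    return 8 * len(zlib.compress(s.encode("ascii"), 9))


def conditional_complexity(y: str, x: str) -> int:
    """Approximate K(y|x) as C(x + y) - C(x): the extra bits needed
    to describe y once x is already available.

    Clamped at zero because compressor overhead may, in rare cases,
    make the raw difference negative.
    """
    return max(0, complexity(x + y) - complexity(x))


def transition_bound(y: str, x: str, a: float, b: float) -> float:
    """Upper bound 2^(-a*K(y|x) - b) on P(x -> y).

    a and b are map-dependent constants that would be fitted to data;
    any values passed in here are placeholders.
    """
    return 2.0 ** (-a * conditional_complexity(y, x) - b)


if __name__ == "__main__":
    # Hypothetical dot-bracket phenotype built from 25 identical hairpins.
    x = "((((....))))" * 25
    # y_near differs from x in a single 12-character hairpin.
    y_near = x[:144] + "(((......)))" + x[156:]
    a, b = 0.05, 0.0  # placeholder constants, not fitted values
    print("estimated K(y_near|x) in bits:", conditional_complexity(y_near, x))
    print("bound on P(x -> y_near):", transition_bound(y_near, x, a, b))
```

Because y_near can be described cheaply once x is known, its estimated K(y|x) is small and its upper bound correspondingly high; an unrelated, random-looking structure of the same length would receive a much larger K(y|x) and a far lower bound. A compressor is used only because it gives a computable upper estimate of description length; any other complexity estimator could be substituted in `complexity`.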


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1098/rsif.2022.0694 | DOI | Article
http://www.ncbi.nlm.nih.gov/pmc/articles/pmc9748496/ | PubMed Central | Article
https://resolver.caltech.edu/CaltechAUTHORS:20230322-366976000.11 | Related Item | Discussion Paper
ORCID:
Author | ORCID
Dingle, Kamaludin | 0000-0003-4423-3255
Novev, Javor K. | 0000-0001-9757-5967
Ahnert, Sebastian E. | 0000-0003-2613-0041
Louis, Ard A. | 0000-0002-8438-910X
Additional Information: © 2022 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
This project was partially supported by Gulf University for Science and Technology under project code: ISG—Case (grant no. 263301) and a Summer Faculty Fellowship (both awarded to K.D.). This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant no. EP/T022159/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk).
Data accessibility: The data for the protein analysis are available from the public Protein Data Bank (PDB) repository under ID 6WS6. The RNA analysis did not use natural data; random sequences were generated instead. Code is available from the electronic supplementary material [88].
Funders:
Funding Agency | Grant Number
Gulf University for Science and Technology | 263301
Engineering and Physical Sciences Research Council (EPSRC) | EP/T022159/1
Science and Technology Facilities Council (STFC) | UNSPECIFIED
Issue or Number: 197
PubMed Central ID: PMC9748496
DOI: 10.1098/rsif.2022.0694
Record Number: CaltechAUTHORS:20230111-282624100.8
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20230111-282624100.8
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 118779
Collection: CaltechAUTHORS
Deposited By: Research Services Depository
Deposited On: 09 Feb 2023 18:23
Last Modified: 23 Mar 2023 21:23
