
Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems

Lou, Hao and Schwartz, Moshe and Bruck, Jehoshua and Farnoud (Hassanzadeh), Farzad (2020) Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems. IEEE Transactions on Information Theory, 66 (5). pp. 3171-3186. ISSN 0018-9448. doi:10.1109/TIT.2019.2946846.



Genomic evolution can be viewed as a string-editing process driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analyzing them are challenging. In this paper, we study two kinds of systems, each representing a set of mutations. The first system allows tandem duplications and substitution mutations; the second allows interspersed duplications. We provide stochastic models and, via stochastic approximation, study the evolution of substring frequencies for these two systems separately. Specifically, we show that k-mer frequencies converge almost surely and determine the limit set. Furthermore, we present a method for finding upper bounds on entropy for such systems.
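
To make the objects in the abstract concrete, the following is a minimal simulation sketch of a tandem duplication and substitution system, together with an empirical k-mer frequency count. It is not the model from the paper: the linear (non-circular) string, the fixed duplication length `dup_len`, the uniform choice of position, the substitution probability `p_sub`, and all function names are assumptions made here for illustration only.

```python
import random
from collections import Counter


def tandem_duplicate(s, pos, length):
    """Insert a copy of s[pos:pos+length] immediately after the original segment."""
    segment = s[pos:pos + length]
    return s[:pos + length] + segment + s[pos + length:]


def substitute(s, pos, symbol):
    """Replace the symbol at index pos with the given symbol."""
    return s[:pos] + symbol + s[pos + 1:]


def kmer_frequencies(s, k):
    """Relative frequency of each length-k substring over all overlapping windows."""
    total = len(s) - k + 1
    counts = Counter(s[i:i + k] for i in range(total))
    return {kmer: count / total for kmer, count in counts.items()}


def simulate(seed_string, steps, dup_len=1, p_sub=0.1, alphabet="ACGT", rng=None):
    """Apply `steps` random mutations to seed_string and return the result.

    Assumed (hypothetical) dynamics: at each step, with probability p_sub a
    uniformly chosen position is overwritten by a uniformly chosen symbol
    (possibly the same one); otherwise a uniformly chosen segment of length
    dup_len is tandem-duplicated. seed_string must have length >= dup_len.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are reproducible
    s = seed_string
    for _ in range(steps):
        if rng.random() < p_sub:
            pos = rng.randrange(len(s))
            s = substitute(s, pos, rng.choice(alphabet))
        else:
            pos = rng.randrange(len(s) - dup_len + 1)
            s = tandem_duplicate(s, pos, dup_len)
    return s


if __name__ == "__main__":
    final = simulate("ACGT", steps=5000, dup_len=2, p_sub=0.05)
    freqs = kmer_frequencies(final, k=2)
    for kmer in sorted(freqs):
        print(kmer, round(freqs[kmer], 4))
```

Running the simulation for increasing values of `steps` and comparing the printed frequencies gives an informal feel for the kind of convergence the abstract refers to, although it proves nothing about the limit set.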

Item Type:Article
Authors (ORCID):
Lou, Hao: 0000-0002-6133-2987
Schwartz, Moshe: 0000-0002-1449-0026
Bruck, Jehoshua: 0000-0001-8474-0812
Farnoud (Hassanzadeh), Farzad: 0000-0002-8684-4487
Additional Information:© 2019 IEEE. Manuscript received November 28, 2018; revised June 25, 2019; accepted September 28, 2019. Date of publication October 10, 2019; date of current version April 21, 2020. This work was supported in part by the United States–Israel Binational Science Foundation (BSF) under Grant 2017652 and in part by NSF under Grant CCF-1816409, Grant CCF-1755773, Grant CCF-1816965, and Grant CCF-1717884. This article was presented in part at ISIT 2018 and ISIT 2015.
Funding Agency:Binational Science Foundation (USA-Israel)
Grant Number:2017652
Subject Keywords:String-duplication systems, substitution mutation, entropy
Issue or Number:5
Record Number:CaltechAUTHORS:20191004-142813980
Official Citation:H. Lou, M. Schwartz, J. Bruck and F. Farnoud, "Evolution of k-Mer Frequencies and Entropy in Duplication and Substitution Mutation Systems," in IEEE Transactions on Information Theory, vol. 66, no. 5, pp. 3171-3186, May 2020, doi: 10.1109/TIT.2019.2946846.
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:99095
Deposited By: Tony Diaz
Deposited On:04 Oct 2019 21:38
Last Modified:16 Nov 2021 17:43