Simplicity bias, algorithmic probability, and the random logistic map
Boumediene Hamzi a,b,∗, Kamaludin Dingle c

a Department of Computing and Mathematical Sciences, Caltech, CA, USA
b The Alan Turing Institute, London, UK
c Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Kuwait

∗ Corresponding author at: Department of Computing and Mathematical Sciences, Caltech, CA, USA.
E-mail addresses: boumediene.hamzi@gmail.com (B. Hamzi), dingle.k@gust.edu.kw (K. Dingle).
A R T I C L E I N F O
Communicated by V.M. Perez-Garcia
Keywords:
Random dynamical systems
Algorithmic probability
Simplicity bias
Time series
Machine learning
A B S T R A C T
Simplicity bias
is an intriguing phenomenon prevalent in various input–output maps, characterized by a
preference for simpler, more regular, or symmetric outputs. Notably, these maps typically feature high-
probability outputs with simple patterns, whereas complex patterns are exponentially less probable. This bias
has been extensively examined and attributed to principles derived from algorithmic information theory and
algorithmic probability. In a significant advancement, it has been demonstrated that the renowned logistic map $x_{k+1} = \mu x_k (1 - x_k)$, a staple in dynamical systems theory, and other one-dimensional maps exhibit simplicity bias when conceptualized as input–output systems. Building upon this work, our research delves into the
manifestations of simplicity bias within the random logistic map, specifically focusing on scenarios involving
additive noise. This investigation is driven by the overarching goal of formulating a comprehensive theory for
the prediction and analysis of time series.
Our primary contributions are multifaceted. We discover that simplicity bias is observable in the random logistic map for specific ranges of $\mu$ and noise magnitudes. Additionally, we find that this bias persists even with the introduction of small measurement noise, though it diminishes as noise levels increase. Our studies also revisit the phenomenon of noise-induced chaos, particularly when $\mu = 3.83$, revealing its characteristics through
complexity-probability plots. Intriguingly, we employ the logistic map to illustrate a paradoxical aspect of data
analysis: more data adhering to a consistent trend can occasionally lead to
reduced
confidence in extrapolation
predictions, challenging conventional wisdom.
We propose that adopting a probability-complexity perspective in analyzing dynamical systems could
significantly enrich statistical learning theories related to series prediction and analysis. This approach not
only facilitates a deeper understanding of simplicity bias and its implications but also paves the way for novel
methodologies in forecasting complex systems behavior, especially in scenarios dominated by uncertainty and
stochasticity.
1. Introduction
Dynamical systems appear in many fields including financial, en-
gineering, meteorological, medical, and other areas. Because of this,
developing methods to predict or understand the behavior and trajecto-
ries of dynamical systems is an important area of applied mathematics
research. Historically, such predictions have been based on modeling
the relevant system (e.g., with ODEs). More recently, however, machine
learning advances have given tools for the prediction and analysis of
dynamical systems. The machine learning approach instead relies on
directly extracting patterns from data rather than detailed modeling of
the underlying mechanics [
1–19].
It follows that an interesting question is the extent to which machine
learning methods apply to dynamical systems analysis and prediction.
Because information theory and machine learning are very closely con-
nected [
20], this motivates exploring the interface between information
theory and dynamical systems research. With this in mind, here we
expand on the interface between
algorithmic information theory
[21–23]
(AIT), and specifically
algorithmic probability
[21,24–26], and dynami-
cal systems. Our current study builds on the work of Ref. [
27], in which
various 1D maps from chaos theory were numerically examined from
these perspectives.
Random dynamical systems have been investigated for some decades
[28]. Initially, the field of dynamical systems began with works on
deterministic systems, but following this randomly perturbed dynamics
became a focus area, and indeed random dynamical systems are a
natural way to model physical and other systems (e.g., [
29]). Random
dynamical systems also provide a rich area of mathematics for study,
and they can yield behaviors that are qualitatively different from
deterministic dynamical systems, e.g. via noise-induced chaos [30].
Numerous studies of
simplicity bias
have been made in input–output
maps, in which a general inverse relationship between the complexity
of outputs and their respective probabilities has been observed [31,
32]. This work is based on information theory and computation re-
sults and was derived from arguments inspired by AIT, and can be
viewed as an attempt to apply algorithmic probability in real-world
settings [31]. The simplicity bias upper bound [31] gives a way to
bound the probability
(
)
of observing output pattern
based essen-
tially only on estimating the Kolmogorov complexity of the pattern
in
. Simplicity bias analysis has been applied in many contexts,
including machine learning and deep neural networks [33–39]. Sim-
plicity bias and algorithmic probability are closely related to Occam’s
razor [40], a fundamental basis of scientific reasoning and model
selection, that simpler theories/models should be preferred over more
complex theories/models provided they explain the data equally well.
The goal of this paper is to bridge the gap between machine learning
and algorithmic information theory using dynamical systems, particu-
larly focusing on the concept of simplicity bias within random logistic
maps. We aim to extend the understanding of simplicity bias to encom-
pass the inherently stochastic nature of real-world systems as modeled
by random dynamical systems. Our approach is twofold: first, to elu-
cidate the manifestation of simplicity bias in the random logistic map
under various conditions and noise levels, thereby providing insights
into the behavior of these systems; second, to leverage these insights
to improve the theoretical foundations and practical applications of
machine learning in analyzing and predicting dynamical systems.
We posit that the intersection of dynamical systems, algorithmic
information theory, and machine learning represents a fertile ground
for advancing our understanding of complex systems. By studying the
random logistic map – a prototypical example in dynamical systems
theory – we aim to uncover new principles and methods that can
inform both the development of machine learning algorithms and the
theoretical underpinnings of algorithmic information theory. In doing
so, we hope to provide a more robust framework for series prediction
and analysis, particularly in contexts where data exhibit complex,
chaotic, or noisy behavior.
Ultimately, this paper seeks to not only advance the theoretical
discourse in these domains but also to provide a robust set of tools
and insights that can be employed in practical applications. Whether
it be in financial modeling, climate prediction, or understanding bio-
logical systems, the implications of this work aim to be far-reaching,
contributing to better predictive models and a deeper understanding of
the simplicity and complexity inherent in natural and artificial systems.
In this work, we extend the applications of simplicity bias to ran-
dom dynamical systems, and in particular the (digitized) trajectories
of the logistic map with additive noise [41]. Our work is part of
a broader research project into the relevance of simplicity bias and
algorithmic probability to dynamical systems and time series forecast-
ing [27,42,43], and indeed to mathematics and science more generally.
Furthermore, this study should be viewed in the same context as that
presented in Ref. [44], where Martin-Löf randomness was employed to
examine Brownian motions, thereby demonstrating the utility of Algo-
rithmic Information Theory (AIT) tools in analyzing random dynamical
systems.
Most results in AIT, and the original algorithmic probability formu-
lation, are given in terms of binary strings. While AIT is not restricted
only to applications involving binary strings, applications involving
binary strings are in a sense more natural and arguably easier to
implement. Hence, in this study, we will digitize trajectories so as to
study binary string trajectories (Cf. [27,43,45]). We begin by studying
simplicity bias in the random logistic map and then investigate the
effects of adding measurement noise also. The following section looks at
complexity-probability graphs and simplicity bias in the noise-induced
regime of the logistic map. Finally, we look at the logistic map in the
context of the counter-intuitive claim of induction based on algorithmic
probability, that sometimes more data, all of which follows the same
trend, can result in
less
confidence in our ability to extrapolate the
trend.
2. Background and problem setup
2.1. Background theory and pertinent results
2.1.1. AIT and Kolmogorov complexity
At the intersection of computer science and information theory lies
algorithmic information theory
[21–23] (AIT). AIT concerns the infor-
mation content of individual objects, such as binary strings, integers,
and discrete geometries [40]. The central quantity of AIT is Kolmogorov complexity, $K(x)$, which quantifies the shortest length of a program to produce some object $x$. Alternatively $K(x)$ can be thought of as the size in bits of the compressed version of $x$ (assuming a perfect compressor). An object $x$ which has a complex or random pattern is hard to compress, and so has a high Kolmogorov complexity, while an object containing simple repeating patterns like $x = 01010\ldots10101$ is highly compressible and so has a low Kolmogorov complexity value.
Because we will not be using or invoking detailed results from AIT,
we will not go into details here giving formal definitions and proofs.
There are many standard texts that the interested reader can refer to if
needed, e.g., Refs. [40,46–48].
2.1.2. The coding theorem and algorithmic probability
An important result in AIT is the coding theorem [25], which establishes a connection between $K(x)$ and probability prediction estimates. It states that
$$P(x) = 2^{-K(x)+O(1)} \qquad (1)$$
where $P(x)$ is the probability that an output $x$ is generated by a generic computing device (technically a prefix optimal universal Turing machine) when fed with a random binary program. Thus, high complexity outputs have exponentially low probability, and simple outputs with small $K(x)$ values must have high probability. $P(x)$ is also known as the algorithmic probability of $x$ (see Ref. [26] for an overview). AIT results are often difficult to apply directly in real-world contexts, due to a number of issues including the fact that $P(x)$ is formally uncomputable.
2.1.3. The simplicity bias bound
While the original theory regarding algorithmic probability is beau-
tiful and powerful, as mentioned, it is not easy to directly apply the
theory to practical real-world problems. However, several studies have
tried to apply numerical approximations of the theory to concrete
problems [31,42,49]. Such applications have led to the observation of a
phenomenon called
simplicity bias
[31]. Simplicity bias has three levels
of meaning, or we could say, precision:
(i) A loose general inverse relation between probability and complex-
ity
(ii) A scatter plot of complexity vs. probability showing a roughly
linear upper bound decay in log probability with increasing com-
plexity
(iii) A complexity-probability plot as in (ii) but with the linear upper
bound following the simplicity bias bound Eq. (2) (see below)
with $a \approx 1$
If any of these are observed in a complexity-probability scatter plot,
then it is considered as simplicity bias. The simplicity bias bound was
presented in Ref. [31], and has the form
$$P(x) \leq 2^{-a\tilde{K}(x) - b} \qquad (2)$$
where $P(x)$ is the probability of observing output $x$ on random choice of inputs, and $\tilde{K}(x)$ is the estimated Kolmogorov complexity of the output $x$. The bound says that complex outputs from input–output maps have lower probabilities, while high-probability outputs must be simpler. The constants $a > 0$ and $b$ can be fit with sampling and often even predicted without recourse to sampling [31]. Typically we assume that $a \approx 1$, which is the theoretical prediction. However, sometimes in practice, it can be hard to correctly estimate $a$ due to insufficient sampling. We will also assume throughout this work that $b = 0$ in Eq. (2), which is a default assumption as argued and discussed in Ref. [31].
There is also a conditional version of the simplicity bias equa-
tion [50]. Because Eq. (2) is an upper bound, it allows for output
patterns to fall below the bound. See Refs. [32,51] for more on these
types of patterns.
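As a concrete illustration of how Eq. (2) is used in practice, the following minimal Python sketch evaluates the bound and fits the decay slope $a$ from sampled complexity-probability pairs, by regressing the maximum observed log-probability at each complexity value. This is our own illustrative code (the function names, and the defaults $a = 1$, $b = 0$, follow the discussion above), not code from Ref. [31].

```python
import numpy as np

def sb_bound(K_tilde, a=1.0, b=0.0):
    # Right-hand side of Eq. (2): P(x) <= 2^(-a*K_tilde(x) - b).
    return 2.0 ** (-a * np.asarray(K_tilde, dtype=float) - b)

def fit_upper_bound_slope(K_tilde, P):
    # Estimate the decay rate 'a' of the upper bound from sampled
    # (complexity, probability) pairs: keep the largest probability observed
    # at each complexity value, then fit a line to log2(P_max) vs K_tilde.
    K_tilde, P = np.asarray(K_tilde, dtype=float), np.asarray(P, dtype=float)
    ks = np.unique(K_tilde)
    p_max = np.array([P[K_tilde == k].max() for k in ks])
    slope, _intercept = np.polyfit(ks, np.log2(p_max), 1)
    return -slope  # magnitude of the decay per unit of complexity
```

The fitted slope can then be compared against the default theoretical prediction $a \approx 1$, as is done for the plots in Section 3.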
Simplicity bias and the related bound have been found to apply in
many systems including RNA shapes [31,38], protein shapes [50,52],
models of plant growth [31], ordinary differential equation solution
profiles [31], finite state transducers [32], natural time series [43],
and others. In these systems, the probability of different output shapes,
upon a random sampling of inputs, was non-trivially bounded and
thereby predicted by Eq. (2).
The main motivations for studying simplicity bias are (a) uncovering
a ‘universal’ property of input–output maps; (b) identifying a basic and
general mechanism for the appearance of simplicity and symmetry in
nature; and (c) exploring a fundamentally different way to predict the
probability of output shapes as compared to standard methods like
frequency sampling. Simplicity bias means that some predictions about
output probabilities can be made directly by estimating the complexity
of outputs, largely independently of the underlying system mechanisms
and without having to make frequency counts.
2.2. Digitized map trajectories
AIT results, including algorithmic probability, are typically given in
terms of random inputs or ‘programs’ which are computed to produce
output strings or patterns. Further, these results of AIT are normally
given in terms of binary strings. While dynamical systems may not
appear to fit this input–output framework, they can be viewed as input–
output functions if the initial value and parameters are considered input
‘programs’ which are then computed to generate ‘output’ dynamical
system trajectories. As above, because output pattern complexities and
probabilities are more straightforwardly calculated if outputs are binary
strings, we will digitize the real-valued trajectories into binary strings.
In Ref. [27] we explored simplicity bias in the standard deterministic logistic map with equation
$$x_{k+1} = \mu x_k (1 - x_k) \qquad (3)$$
where the inputs are the pair of values $x_0 \in (0.0, 1.0)$ and $\mu \in (0.0, 4.0]$, and $k = 1, 2, 3, \ldots, n$. Here, we will extend the analysis to the logistic map with added noise, as studied by Sato et al. [53]. This random logistic map has the form
$$x_{k+1} = \mu x_k (1 - x_k) + \xi_k \qquad (4)$$
where $\xi_k$ is an i.i.d. random number sampled uniformly from $[-\epsilon, \epsilon]$, so that $\epsilon$ determines the magnitude of the noise.
The random logistic map can be viewed as an input–output map because the values of $\mu$ and $x_0$ are like 'inputs' which are computed (in addition to the noise) to produce binary sequence trajectory 'outputs'. In more detail, the map is first iterated to obtain a sequence of real values $x_1, x_2, x_3, \ldots, x_n$ in [0,1]. Similar to the field of symbolic dynamics [54], the real-valued trajectory is digitized to become a binary string output sequence $x$ by applying a threshold, writing 1 if $x_k \geq 0.5$ and 0 if $x_k < 0.5$ [27,45,55]. Hence a binary sequence output $x$ of $n$ bits is generated for each input pair $\mu$ and $x_0$, and random realization of $\xi_k$ for $k = 1, 2, 3, \ldots, n$.
For example, some choice of $\mu$, $x_0$, and $\epsilon$ after iterating Eq. (4) with $k = 1, 2, \ldots, 25$ might yield the real-valued trajectory
$$x_1, x_2, \ldots, x_{25} = 0.12, 0.47, 0.66, \ldots, 0.21, 0.05, 0.78, 0.97$$
which after digitization becomes the binary string
$$x = 001\ldots0011 \qquad (5)$$
Following Ref. [27], we will use $n = 25$, but see the same reference for the impact of larger or smaller $n$. In brief, $n$ should be large enough to yield many different patterns and complexity values, but small enough so that decent frequency and probability estimates can be made for up to $2^n$ different strings without the need for excessive sampling.
Simulations will proceed by fixing some choice of $\mu$ and $\epsilon$, then randomly sampling $x_0 \in (0.0, 1.0)$, and iterating for $n = 25$ steps. Each sample will yield a binary string $x \in \{0,1\}^n$, i.e., a binary string of length $n$. By counting the relative frequency of strings $x$, we can find $P(x)$, the probability of each string $x$. The estimated complexity $\tilde{K}(x)$ of each string can be calculated using a compression-based metric, based on the Lempel–Ziv 1976 complexity measure [56] (see Appendix for more details on the complexity measure). Using the values of $P(x)$ and $\tilde{K}(x)$ for each binary string, we make a scatter plot of these quantities and examine the connection (if any) between complexity and probability.
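The procedure just described can be sketched in a few lines of Python; this is our own illustration (function and parameter names are ours), not the authors' code. Clipping trajectories to $[0,1]$ is an assumption, since the text does not specify boundary handling, and the complexity estimate $\tilde{K}(x)$ for each sampled string would come from the Lempel–Ziv-based estimator described in the Appendix.

```python
import random
from collections import Counter

def random_logistic_binary(mu, eps, n=25, x0=None, transient=0):
    # Iterate x_{k+1} = mu*x_k*(1 - x_k) + xi_k with xi_k ~ Uniform[-eps, eps] (Eq. (4)),
    # then digitize each post-transient value with the 0.5 threshold.
    x = random.uniform(0.0, 1.0) if x0 is None else x0
    bits = []
    for k in range(transient + n):
        x = mu * x * (1.0 - x) + random.uniform(-eps, eps)
        x = min(max(x, 0.0), 1.0)  # clipping to [0,1] is our assumption
        if k >= transient:
            bits.append("1" if x >= 0.5 else "0")
    return "".join(bits)

def string_probabilities(mu, eps, n_samples=100_000, **kwargs):
    # Estimate P(x) by relative frequency over randomly sampled initial values x0.
    counts = Counter(random_logistic_binary(mu, eps, **kwargs) for _ in range(n_samples))
    return {s: c / n_samples for s, c in counts.items()}

# e.g. data behind a plot like Fig. 1(a): pair each string's probability with
# the complexity estimate of Appendix A to obtain the scatter plot.
probs = string_probabilities(3.0, 0.125)
```

Each observed string's probability can then be plotted against its estimated complexity and compared with the bound of Eq. (2).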
3. Results
3.1. Simplicity bias in the random logistic map
In this study, we are primarily interested in $\mu$-values for which the trajectories of the deterministic map converge quite quickly to fairly trivial fixed points or periods. If instead $\mu \approx 4.0$ then it is known that there is no simplicity bias in trajectories [27], so adding noise will only serve to increase the complexity and variability of the trajectories, making any kind of bias, and therefore simplicity bias, not appear. Note that we use the word 'bias' to describe a strongly non-uniform (low entropy) probability distribution, such that certain binary string outputs are much more likely than others.
It is known that if $\mu \in [0.0, 1.0]$, then the trajectories of the deterministic logistic map tend to 0 [57]. Because we are adding noise to the trajectories, the trajectories will be prevented from becoming arbitrarily close to zero. If the noise is small enough, then $x_k \approx 0 < 0.5$ so that the resulting digitized binary string (after truncating at 0.5) will be $x = 0000\ldots$. If the noise is larger, then it may be that $x_k > 0.5$ for some $k$, so that more varied patterns including 1s can emerge. For $\mu \in (1.0, 3.0]$ the trajectories of the deterministic logistic map tend to a fixed point with value $1 - (1/\mu)$ [58]. If $\epsilon = 0.0$, then for any initial value of $x_0$ the trajectories will tend to be either $x = \ldots000$ or $x = \ldots111$ depending on $\mu$, which are trivial patterns. Not only are they trivial, they are not varied enough to have different complexity and probability values, and so the corresponding complexity-probability plot is trivial. Therefore, simplicity bias cannot emerge in this scenario, at least if we ignore the initial transient dynamics (but see also Ref. [27] for more on this). However, if we add noise then different and non-trivial patterns can emerge.
Fig. 1. Simplicity bias in the digitized random logistic map. Each blue data point corresponds to a different binary digitized trajectory $x$ of length 25 bits. The black line is the upper bound of Eq. (2) with $a = 1$ as a default prediction. (a) $\mu = 3.0$ and $\epsilon = 0.125$ including transient dynamics; (b) $\mu = 1.0$ and $\epsilon = 0.375$ including transient dynamics; and (c) $\mu = 1.0$ and $\epsilon = 0.375$ excluding transient dynamics.

The parameter values used in the numerical experiments are $\mu = 0.0$, 1.0, 2.0, and 3.0 in combination with $\epsilon = 0.125$, 0.25, 0.375, and 0.5. For many of these parameter combinations, we can expect no simplicity bias in the binary trajectories, due to the following: If the noise $\epsilon$ is large, then the trajectory is essentially a random erratic pattern across the interval [0,1] with little correlation between adjacent bits, yielding a roughly uniform distribution over strings. Therefore, there is unlikely to be bias for some specific patterns, and consequently no bias for simple patterns. On the other hand, if $\epsilon$ is very small, then the 'kicks' imposed by the noise will not be large enough to prevent the trajectories from converging. As examples, if $\mu = 1.5$ and $\epsilon \approx 0$, then the trajectory values $x_k$ will fluctuate around the value $1 - (1/\mu) = 0.33 < 0.5$, yielding a binary trajectory $x = 0000\ldots$. If $\mu = 2.0$, then the fixed point would be $1 - (1/\mu) = 0.5$, so even small fluctuations around this value would yield almost random binary strings $x$. From this reasoning, we can conclude that to observe simplicity bias in the random logistic map requires having a balance of noise that is not too large as to overly randomize the trajectories, and not too small as to allow the trajectories to converge to a fixed point with only modest fluctuations around $1 - (1/\mu)$.
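To make the preceding argument concrete, the following small sketch (our own, with an illustrative noise level of 0.01) compares digitized trajectories for $\mu = 1.5$, whose fixed point $1 - 1/\mu \approx 0.33$ sits well below the threshold, with $\mu = 2.0$, whose fixed point sits exactly at the 0.5 threshold.

```python
import random

# With small noise, mu = 1.5 fluctuates around 1 - 1/mu = 0.33 (all-zero strings),
# while mu = 2.0 hovers around the 0.5 threshold (near-random strings).
for mu in (1.5, 2.0):
    strings = set()
    for _ in range(5):
        x, bits = random.uniform(0.0, 1.0), []
        for k in range(75):  # 50 transient iterations + 25 recorded bits
            x = mu * x * (1.0 - x) + random.uniform(-0.01, 0.01)
            x = min(max(x, 0.0), 1.0)
            if k >= 50:
                bits.append("1" if x >= 0.5 else "0")
        strings.add("".join(bits))
    print(mu, strings)  # typically one all-zero string for mu = 1.5, several varied strings for mu = 2.0
```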
Turning to numerical experiments, we make complexity-probability plots for different values of $\mu$ and noise magnitude values $\epsilon$. We separately analyze the cases of including initial transient dynamics before the trajectories settle, and ignoring the transient dynamics by excluding the first 50 iterations of the map. Ignoring the initial iterations is common practice in the mathematical physics community [58]. However, from the machine learning perspective, the initial transient dynamics where a larger fraction of the space is sampled is more important for statistical learning. So we will consider both regimes.
Fig. 1 shows some examples of simplicity bias. The different binary
string outputs that appeared in sampling are shown as blue dots. The
black line bound of Eq. (2) is also depicted with $a = 1$ (the default prediction). Fig. 1(a) shows $\mu = 3.0$ and $\epsilon = 0.125$, and the initial dynamics are included. There is simplicity bias because we see that some low complexity patterns have high probability. Further, we see a roughly linear upper bound decay in the probabilities, as complexity $\tilde{K}(x)$ increases. However, the gradient appears to be less steep than the $a = 1.0$ default prediction suggests. The slope of the upper bound is approximately 0.27. The reason that the slope is not accurate may be due to undersampling, because estimating the slope $a$ assumes that nearly all of the possible strings have appeared in the sample, which may be overly coarse. Another reason may be due to the introduction of random noise (see next section for more on this). Despite these considerations, it is still clear that simplicity bias is observed. Without the additive noise, the trajectory would have converged to $1 - 1/\mu = 0.67$ and shown no simplicity bias.
In Fig. 1(b) we see data for $\mu = 1.0$ and $\epsilon = 0.375$, and the initial dynamics are included. Again there is clear simplicity bias. Without the additive noise, the trajectory would have converged to $1 - 1/\mu = 0.0$ and shown no simplicity bias. In addition to the inverse relation between probability and complexity, even the gradient of the decay of the upper bound is not strongly divergent from the black line default prediction. The slope of the upper bound is approximately 0.32. In Fig. 1(c) we see data again for $\mu = 1.0$ and $\epsilon = 0.375$, but this time after ignoring the first 50 iterations (to exclude the transient dynamics). Here we see simplicity bias in the form of an inverse relation between probability and complexity. Having said that, the simplicity bias in panel (c) is not as pronounced as in panel (b). The slope of the upper bound is approximately 0.37.
For each complexity value, we see outputs with a range of probabil-
ities. That is, there are some patterns that are simple, yet have very low
probability values. These low complexity, low probability outputs have
been explored in detail earlier [
31,51]. However, it is known that these
low complexity, low probability outputs collectively do not account for
much of the probability mass [
32].
In conclusion, we see that for some parameter values, simplicity
bias is observed in the random logistic map trajectories, even while
the deterministic counterparts with the same $\mu$ values would have
converged to fixed points and not shown simplicity bias.
3.2. Adding noise to the strings
There are different ways in which noise can be included in the mod-
eling of the random dynamical system. Above, we included additive
noise in the dynamics themselves, to model the scenario in which the
dynamics themselves are not deterministic but perturbed stochastically.
In this section, we now consider another type of noise, measurement
noise, which models the scenario that when measuring the trajectory,
we do so inaccurately.
It is important to study whether simplicity bias is maintained under different types of perturbation, because if it is not, then it may not be a physically relevant body of theory. This is related to arguments
supporting the study of random dynamical systems, over deterministic
systems, because some results from chaos theory disappear in the pres-
ence of even small noise, suggesting that they are not relevant to the
study of physical dynamical systems (which are inherently stochastic,
typically).
We introduce measurement noise as follows: If $x_1 x_2 x_3 \ldots x_n$ is a trajectory of real values, then we replace each $x_k$ with $x_k + \delta_k$, where $\delta_k$ is some small uniform noise in $[-\delta, \delta]$. After adding the noise, we use the 0.5 threshold as above and digitize the noisy trajectory. We can then, as above, make a complexity-probability plot and look for simplicity bias.
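A minimal sketch of this measurement-noise variant (again our own illustration; names and the boundary clipping are assumptions) is:

```python
import random

def noisy_measurement_binary(mu, eps, delta, n=25, transient=0):
    # Iterate the random logistic map, then add measurement noise ~ Uniform[-delta, delta]
    # to each recorded value before applying the 0.5 digitization threshold.
    x = random.uniform(0.0, 1.0)
    bits = []
    for k in range(transient + n):
        x = mu * x * (1.0 - x) + random.uniform(-eps, eps)
        x = min(max(x, 0.0), 1.0)  # boundary clipping, as in the earlier sketch
        if k >= transient:
            measured = x + random.uniform(-delta, delta)
            bits.append("1" if measured >= 0.5 else "0")
    return "".join(bits)
```

The resulting strings are then fed through the same frequency-counting and complexity-estimation pipeline to produce plots like Fig. 2.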
Fig. 2. Simplicity bias with measurement noise. In (a), a very small amount of measurement noise ($\delta = 0.01$) is added, and simplicity bias is clear. In (b) a larger amount of noise is added ($\delta = 0.17$), and simplicity bias remains. In (c) a larger value of noise is added ($\delta = 0.45$), and the slope becomes much less steep, thus weakening the probability-complexity connection.

In Fig. 2 we see the effects of adding measurement noise. For illustration purposes, we use the same values of $\mu$ and $\epsilon$ as in Fig. 1, and increasing values of $\delta$. In Fig. 2(a), a very small noise level of $\delta = 0.01$ is used, and hence the simplicity bias is still clearly observed. In (b), a larger noise level of $\delta = 0.17$ is used, and we see that the simplicity bias still remains, but the slope of decay in log probability with increasing complexity is slightly less steep as compared to (a). Finally, in (c), a larger noise level of $\delta = 0.45$ is used, and the bias and simplicity bias have almost completely disappeared, with only a modest slope decay in log probability. The slopes of the upper bounds are approximately 0.12, 0.10, and 0.04 respectively. In sum, the simplicity bias gradually disappears with increasing $\delta$ via the slope gradually becoming more horizontal.
It is noteworthy that in some earlier studies of simplicity bias, the
slope of the decay in log probability was observed to be considerably
less steep than the upper bound of Eq. (2), but it was not stated why this was the case [32,43]. The results here suggest a simple explanation
for this observation, i.e., noise (in some form) corrupts the simplicity
bias, adjusting the slope.
3.3. Noise-induced chaos around $\mu = 3.83$
For a large fraction of values of $\mu > 3.56994567\ldots \approx 3.57$ (https://oeis.org/A098587), the trajectories of the deterministic logistic map are chaotic [57]. However, there are also islands of stability for some $\mu \in (3.56994567\ldots, 4.0]$, in which the dynamics are simple and periodic. One example is when $\mu = 3.83$, which has an attracting period-3 trajectory in the deterministic case. It has been known for many years, however, that with a small amount of noise added to the random logistic map, noise-induced chaos can appear [30,59]. That is, with $\mu = 3.83$ it is known that $\epsilon \approx 0.00146$ is the critical noise amplitude at which the chaotic attractor appears. At this value, the Lyapunov exponent becomes positive.
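A standard way to check this numerically is to estimate the Lyapunov exponent of the noisy map by averaging $\ln|f'(x_k)| = \ln|\mu(1 - 2x_k)|$ along a long trajectory. The sketch below is our own illustration of this textbook estimator (not code from Refs. [30,59]), and the specific iteration counts are arbitrary choices.

```python
import math
import random

def lyapunov_exponent(mu, eps, n_iter=500_000, transient=1_000):
    # Average of ln|d/dx[mu*x*(1-x)]| = ln|mu*(1 - 2x)| along a noisy trajectory;
    # a positive value indicates (noise-induced) chaos.
    x = random.uniform(0.0, 1.0)
    total = 0.0
    for k in range(n_iter + transient):
        x = mu * x * (1.0 - x) + random.uniform(-eps, eps)
        x = min(max(x, 0.0), 1.0)
        if k >= transient:
            total += math.log(abs(mu * (1.0 - 2.0 * x)) + 1e-300)  # guard against log(0)
    return total / n_iter

# Below vs above the reported critical amplitude at mu = 3.83: the estimate
# should change sign from negative (noisy period-3) to positive (noise-induced chaos).
print(lyapunov_exponent(3.83, 0.0005), lyapunov_exponent(3.83, 0.003))
```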
We will study this noise-induced chaos in terms of simplicity bias.
Above, we saw that simple fixed point dynamics combined with addi-
tive noise can lead to simplicity bias in the trajectories. This suggests
that the simple period-3 trajectory when $\mu = 3.83$ may show simplicity bias when combined with additive noise of $\epsilon = 0.00146$. In contrast, for the chaotic regime, the logistic map does not exhibit bias, nor simplicity bias [27], because no particular patterns are much more likely (and therefore no simple patterns, either). Therefore, we will also examine $\epsilon = 0.00146$ combined with $\mu = 3.82$ and $\mu = 3.84$, which potentially will not show simplicity bias, assuming that the dynamics are chaotic for these values.
We perform numerical experiments as above, digitizing the trajectories. Fig. 3(a) shows the complexity-probability plot for $\mu = 3.82$, and there is no evidence of simplicity bias, as expected. There is some bias in the distribution, as inferred from the fact that some binary strings have probability $\approx 10^{-3}$, while some others have probability $\approx 10^{-5}$. However, the connection to complexity is very weak. In (b) however, with $\mu = 3.83$ there is clear evidence of simplicity bias, with a linear upper-bound decay in log probability with increasing complexity. The slope of the decay is not the same as the black line bound, but somewhat less steep. In Fig. 3(c), interestingly with $\mu = 3.84$ there is still simplicity bias, but not quite as clear as in (b), as evidenced by the less clearly linear upper bound.
While there are already other ways to detect and measure noise-induced chaos [30], simplicity bias and complexity-probability plots provide another perspective.
3.4. More data does not always give more confidence in predictions
In the preceding sections, we looked at how simplicity bias, which
is a form of algorithmic probability, can be used to predict or bound
the probability of patterns in time series from the logistic map. Another
application of algorithmic probability, and in fact one of the original
motivations for its development, is for inference or induction of series.
Solomonoff [21,24] was interested in questions like, given a (finite) binary series $x$ representing a historical series, what is the best prediction for the next bit in the series? More quantitatively, what is the probability that the next bit is a zero, $P(0|x)$, and what is the probability the next bit is a 1, $P(1|x)$? If the underlying process generating the series is known, e.g., it is a Bernoulli process, then standard statistical theory can give methods to estimate the probabilities $P(0|x)$ and $P(1|x)$. In many practical situations, however, the underlying process is
not known, which motivates the development of forecasting methods
which do not depend on knowing the process. So, is there a general
way to predict time series, which does not require (strong) assumptions
or knowledge about the process that made the series? Solomonoff
induction [
40] is the method proposed by AIT, which is essentially
prediction based on algorithmic probability.
Hutter [
60] has also argued that algorithmic probability should be
central to artificial intelligence and the theory of machine learning.
The main problem with algorithmic probability, and hence Solomonoff
induction, is that it is formally uncomputable [
40], so cannot be applied
directly in practice (or at least not straightforwardly). Indeed, some
have argued that this makes the theory almost useless [
61]. However,
developing the theory is useful because some aspects can shed light on
real-world problems, and some of the theory can be approximated and
thereby applied.
Here we will consider one rather surprising implication of Solomo-
noff induction: sometimes with
more
data – even if the data points all
conform to one trend – we should be
less
confident in our ability to
predict the series (see Li and Vitanyi [
40] for a mention of this result,
and more in Hutter [
62]). We will see how this result can apply to the
logistic map. To our knowledge, the logistic map example presented
below is the first concrete example illustrating this implication.
3.4.1. Theory
Suppose we observe some binary series, which happens to be just 1 repeated $n$ times, i.e., $1^n = 111\ldots111$. What should be our prediction for the next bit? And how confident should we be of our prediction? Laplace famously argued that if there are $k$ 1s out of $n$ binary observations, then we should predict that the next bit is 1 with probability
$$P(1|111\ldots111) = \frac{k+1}{n+2} = \frac{n+1}{n+2} = 1 - \frac{1}{n+2} \approx 1 - 1/n \qquad (6)$$
where we have set $k = n$ because all observations are '1'. According to Laplace then, if we observe more data (larger $n$), all of which conforms to the same trend (all 1s), then we should have greater confidence that the next bit will also be 1 (because $1 - 1/n$ is closer to one). This reasoning also accords with common-sense statistics. However, using a general form of inference based on Solomonoff induction and algorithmic probability, we do not always arrive at the same conclusion.

Fig. 3. Complexity-probability plots for $\mu \approx 3.83$. (a) With $\mu = 3.82$ there is no simplicity bias; (b) with $\mu = 3.83$ there is clear simplicity bias with a roughly linear upper bound decay; and (c) with $\mu = 3.84$ there is still simplicity bias, but the linear upper bound decay is more noisy than with $\mu = 3.83$.
Solomonoff induction (or prediction) is based on imagining that a computer has been fed a random input program, and it produces an infinite binary string output. We observe the first $n$ bits of the output as a binary string $x$, and we try to predict the probability that the next bit is a 0 or a 1. Technically this computer should be a monotone universal Turing machine, but we will not explore the details of this because they are not essential to understanding our example; see [26,40,62] for more details. We will write $P(x)$ for the probability that the infinite string begins with $x$, and write the conditional probability as
$$P(x_{n+1}|x_1 x_2 x_3 \ldots x_n) = \frac{P(x_1 x_2 x_3 \ldots x_{n+1})}{P(x_1 x_2 x_3 \ldots x_n)} \qquad (7)$$
Essentially, $P(x_{n+1}|x_1 x_2 x_3 \ldots x_n)$ gives the conditional probability prediction based on compressing the sequence $x_1 x_2 x_3 \ldots x_{n+1}$, and assigns the predicted probabilities $P(0|x_1 x_2 x_3 \ldots x_n)$ and $P(1|x_1 x_2 x_3 \ldots x_n)$ based on how well the following bit (0 or 1) fits the preceding sequence, as measured by how well the following bit (0 or 1) compresses the sequence.
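Although the probability in Eq. (7) is uncomputable, the compression-based intuition can be illustrated with a crude, computable stand-in: weight each candidate next bit by how little it increases the compressed length of the history. The sketch below uses zlib purely for illustration; it is not the construction of Refs. [26,40,62], and byte-level compressors are very coarse on short binary strings.

```python
import zlib

def compressed_len(bits: str) -> int:
    # Length in bytes of the zlib-compressed ASCII representation of the bit string.
    return len(zlib.compress(bits.encode("ascii"), 9))

def next_bit_probabilities(history: str):
    # Crude computable stand-in for Eq. (7): weight each candidate next bit b
    # by 2^-(C(history + b) - C(history)) and normalize over b in {0, 1}.
    base = compressed_len(history)
    weights = {b: 2.0 ** -(compressed_len(history + b) - base) for b in "01"}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

print(next_bit_probabilities("01" * 50))   # repetitive history
print(next_bit_probabilities("1" * 100))   # all-ones history
```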
Using algorithmic probability now as a method for induction or prediction, the probability that the next bit in the series after observing $x = 111\ldots111$ is 1 is given by [62]
$$P(1|111\ldots111) = 1 - 2^{-K(n)+O(1)} \qquad (8)$$
For most $n$, or typical $n$, it is known that the complexity of an integer $n$ is given by $K(n) \approx \log_2(n)$ to first order [40], so that
$$P(1|111\ldots111) = 1 - 2^{-K(n)+O(1)} \sim 1 - 2^{-\log_2(n)} = 1 - 1/n \qquad (9)$$
roughly agreeing with Laplace.
A difference in predictions between the Laplacian and algorithmic probability approaches can arise when certain values of $n$ are encountered, which are 'special', and therefore (algorithmically) 'simple'. Such values of $n$ are those that can be described succinctly and unambiguously. For example, if $n = 2^{2^{2^{2^{2}}}} \approx 2 \times 10^{19728}$ is written out in full it has nearly 20,000 digits (in base 10), while at the same time we can describe it very succinctly in just a few characters using a stacked exponential. Therefore we say that $n$ is simple, and the Kolmogorov complexity of this value $n$ is low. As another example of 'special' or 'simple' integers, consider the factorial $n = (8!)! \approx 3 \times 10^{168186}$. This integer has nearly 170,000 digits if written out explicitly in full, yet can be described exactly in just a few characters. There are many 'special' and algorithmically simple numbers, but nonetheless, they are rare in the sense that a vanishingly small fraction of integers are special or simple [40].

For such 'special' or 'simple' numbers $n$, the Kolmogorov complexity is small. It follows that for a simple $n$ which has $K(n) \ll \log_2(n)$, the predicted probability for the next bit is
$$P(1|111\ldots111) \approx 1 - 2^{-K(n)+O(1)} < 1 - 1/n \qquad (10)$$
which can be substantially different from Laplace's estimate for large enough $n$. This is not to say that algorithmic probability would predict a 0 in place of a 1 as the next bit, but instead, it means that the prediction that the next bit is 1 – for this $n$ – is less confident (further from 1) than it would be for a typical $n$. This is a kind of quantification of the intuition that typically we do not expect there to be a change in a series after a certain pattern has been followed for a long time (e.g., 1s to change to 0s), but on 'special occasions' when $n$ is somehow special, then it is more likely that something will 'happen' (i.e., a change).
Rather surprisingly, what this means also is that for some few special values of $n$, more data does not necessarily mean higher confidence in our predictions: consider that $n$ is some large random integer such that $K(n) \approx \log_2(n)$, while $M$ is some other simple integer such that $M > n$. It follows that
$$K(M) \ll \log_2(n) < \log_2(M)$$
and hence
$$P(1|1^n) > P(1|1^M) \qquad (11)$$
Thus, even though we have observed more 1s repeated, we are less confident in our prediction that the next bit is 1, because $P(1|1^n)$ is closer to 1 than $P(1|1^M)$.
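The following small numerical illustration (our own; the value taken for $K(M)$ is an illustrative stand-in, since Kolmogorov complexity cannot be computed) expresses the two predictions as uncertainties in bits, $-\log_2(1 - P(1|1^n))$, for a typical $n$ and for the special $M = 2^{2^{2^{2^{2}}}}$.

```python
from math import log2

def laplace_uncertainty_bits(log2_n: float) -> float:
    # Laplace's rule, Eq. (6): 1 - P(1|1^n) = 1/(n+2) ~ 2^(-log2 n), i.e. ~log2(n) bits.
    return log2_n

def ap_uncertainty_bits(K_n_bits: float) -> float:
    # Algorithmic probability, Eq. (8): 1 - P(1|1^n) ~ 2^(-K(n)), i.e. ~K(n) bits.
    return K_n_bits

# Typical n ~ 10^6: K(n) ~ log2(n) ~ 20 bits, so the two estimates agree (Eq. (9)).
print(laplace_uncertainty_bits(log2(10**6)), ap_uncertainty_bits(log2(10**6)))

# Special M = 2^2^2^2^2 = 2^65536: log2(M) = 65536, yet a few-character program
# describes M, so take K(M) ~ 30 bits as an illustrative stand-in (not computed).
# Although M >> n, the algorithmic prediction is far less confident, as in Eq. (11).
print(laplace_uncertainty_bits(65536), ap_uncertainty_bits(30))
```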
3.4.2. Example
We will now consider how this somewhat counterintuitive result
from algorithmic probability theory applies to the logistic map. We will
construct an example of when the trend of just 0s changes after a long
time and becomes a string of 1s at an algorithmically simple value of $n$.

Suppose we choose $\mu = 2.5$, $\epsilon = 0.0$, and $(x_0)^{-1} = 2^{2^{2^{2^{2}}}}$ so that $x_0 \approx 10^{-19728}$; then this series begins with $x_0 < 0.5$. Because $\mu = 2.5$, the fixed point which the trajectory approaches is $1 - 1/\mu = 1 - 1/2.5 = 0.6 > 0.5$. Therefore, eventually, the digitized series will change from 0s to 1s. Call $n$ the integer at which the series $0000\ldots011111\ldots$ transitions from 0s to 1s.
Because $x_0 \approx 0$ and $\epsilon = 0.0$, we have $(1 - x_k) \approx 1$ for $k \approx 1$, so that the logistic equation (3) simplifies to $x_{k+1} \approx 2.5\,x_k$, or $x_{k+1} \approx 2.5^k x_0$. This implies that when $x_k \approx 0$ the $x_k$ terms grow exponentially fast initially (but the growth slows as $x_k$ gets larger). Even with exponential growth, $n$ must be a large number because $x_0$ is so small; indeed, because $10^{19728} \approx 2.5^{49575}$, the number of iterations $n$ required to reach the 0.5 threshold is at least of the order of 50,000.
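This estimate is easy to check numerically using the linearization $x_{k+1} \approx 2.5\,x_k$ described above (our own sketch; the true $n$ is somewhat larger because the growth slows as $x_k$ approaches the fixed point):

```python
from math import log

# Linearized growth x_k ~ 2.5^k * x_0 with x_0 = 2^(-65536), i.e. 1/(2^2^2^2^2):
# the number of iterations needed for x_k to reach the 0.5 digitization threshold.
log_x0 = -65536 * log(2)
n_cross = (log(0.5) - log_x0) / log(2.5)
print(round(n_cross))  # ~49575, i.e. of the order of 50,000
```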
So, having observed a series of $n$ 0s, according to Laplace's induction method, the probability that the next bit is also 0 will be roughly estimated as
$$P(0|000\ldots000) = \frac{k+1}{n+2} = \frac{n+1}{n+2} = 1 - \frac{1}{n+2} > 1 - 1/50000 = 0.99998 \qquad (12)$$
so that $P(1|000\ldots000) < 0.00002$. Therefore, the sudden change in the logistic map from 0s to 1s at $n$ would come as rather a 'surprise' for Laplace.
In contrast, using algorithmic probability we have the following reasoning: Because $x_0$ is an algorithmically simple (small) number, and $\mu$ is a simple number also, and the logistic equation (3) can be written in only a few characters also, and because the value of $n$ can be derived via a simple function of these values (e.g., by iterating the logistic map to find the value of $n$), then this implies that $n$ is a special number and $K(n)$ is small.¹ Therefore we have $K(n) \ll \log_2(n)$ and
$$P(0|000\ldots000) = 1 - 2^{-K(n)+O(1)} < 1 - 1/n \qquad (13)$$
so that $P(1|000\ldots000) > 1/n$, so that the change from 0s to 1s is less of a 'surprise' if using the algorithmic probability approach. It follows also that we are less confident about predicting the next bit in the sequence $1^n = 111\ldots111$, as compared to predicting the next bit in the sequence $111\ldots111$ where the length of the sequence is some algorithmically random but slightly smaller value than $n$.
Note that the algorithmic probability estimate does not give an exact
figure for the probability of changing to 1s after the 0s (which would
depend on the specific computer or language used), but it does reveal
that the probability prediction can be significantly different when $n$ is simple compared to typical values of $n$. See the Appendix for an
example that does not depend on the specific computer or language
used.
4. Discussion
Arguments inspired by algorithmic information theory (AIT) have
been used to predict the occurrence of
simplicity bias
in many real-
world input–output maps, where complex patterns have exponentially
low probabilities, and high probability patterns are simple [31]. This
phenomenon has been observed in a very wide range of systems,
including RNA shapes [31,38], protein shapes [50,52], models of plant
growth [31], ordinary differential equation solution profiles [31], finite state transducers [32], natural time series [43], deep neural net-
works [33], 1D dynamical systems [27], and others. Relatedly, Zenil
and Delahaye [42] numerically studied the frequencies by which very
short ($\sim$5 bits) binary string motifs appear in nature, namely in hard
drives, DNA fragments, and real-world image data. They found a cor-
relation between these short binary string motifs, and the frequencies
with which the same motifs appeared from generic computation de-
vices (e.g., cellular automata). Their investigation showed that some
simpler binary strings occurred with higher probability, hence a form
of simplicity bias in natural data.
In this work, we have numerically investigated the presence of
simplicity bias in digitized trajectories of the random logistic map.
This current work sits in the context of a wider research program
of investigating the ‘universal’ presence of simplicity bias in input–
output maps, which can aid in an understanding of the origins of
symmetry and simplicity in nature, and also serve the practical aim of
aiding a priori probability predictions, based on directly measuring the
complexity of patterns (instead of e.g., sampling).
¹ Technically, the value of the complexity $K(n)$ will depend on the specific computer or language used, but we ignore this for this example. Later we give an example which does not depend on these.
Our main conclusions are that (i) we observe simplicity bias in
the random logistic map for some restricted set of parameter values;
(ii) adding measurement noise to trajectories gradually erases sim-
plicity bias by flattening the slope in complexity-probability plots;
(iii) the regime of noise-induced chaos when $\mu = 3.83$ shows up in
complexity-probability plots, and (iv) algorithmic probability-based in-
duction shows how sometimes in the logistic map trajectories, counter-
intuitively more data does not improve our confidence in extrapolating
the trend.
The presence of a bias towards simplicity in this example of the
logistic map with additive noise could be explained in a mechanistic
fashion, and indeed many of the other examples of simplicity bias may
admit mechanistic explanations based on the details of the specific
way inputs are assigned to outputs. Despite this, we propose that
it is beneficial to explain the observed simplicity bias from another
perspective entirely, that of information theory and the simplicity bias
bound Eq. (2). This latter perspective forms a unifying theory to unite
the disparate observations in different maps. These different types of
explanations are complementary.
It is noteworthy that digitized logistic map trajectories have been studied in terms of Lempel–Ziv complexity by Kaspar and Schuster [55], and in this sense their work is related to ours. Despite this, their work is fundamentally different from ours because it is not concerned with simplicity bias or with estimating the probability of different outputs.
Also, Zenil et al. [63] have studied discrete dynamical systems of
cellular automata from the perspective of algorithmic information the-
ory, including reconstructing the space–time evolution of the systems.
However, their work is different in several respects, including that they
did not look at the (random) logistic map in terms of simplicity bias.
The theory of algorithmic probability was developed several decades
ago [21,24,25,64], and is important and potentially widely applicable
in science and mathematics [26,40,60,65,66]. Applications of the field
have been held back, partly at least, by the difficulty of applying the
theory due to the incomputability of important quantities. However, it
has been argued that algorithmic probability predictions should ‘‘serve
as ‘gold standards’ that practitioners should aim at ... [but which] have
to be (crudely) approximated in practice’’ [62]. The simplicity bias
bound, including the results here, can be seen in this light. Further,
many other researchers have built on and applied the compression-
learning-prediction link [67]. Indeed, very recently it has been argued
that even the currently highly influential ‘‘large language models
[e.g., ChatGPT] are powerful general-purpose predictors and that the
compression viewpoint provides novel insights’’ [68]. It remains to be
seen in the near future whether this intriguing compression approach
provides valuable outcomes for machine learning.
In future work, we suggest continuing the application of (approx-
imations to) algorithmic probability to develop novel approaches to
probability predictions and advance statistical and machine learning
methods.
5. Conclusion
In this study, we have demonstrated the presence of simplicity
bias within the random logistic map, affirming its role across a range
of dynamical systems. We observed that simplicity bias persists within specific parameter regimes, is gradually diminished by measurement noise, and also appears in the noise-induced chaos regime. These findings
not only reinforce the intricacies of dynamical systems but also em-
phasize the counterintuitive nature of algorithmic probability-based
predictions.
Our research underscores the value of interpreting simplicity bias
through an information-theoretic lens, offering a unifying framework
that transcends specific system mechanics. This perspective aligns with
broader efforts to employ algorithmic probability as a theoretical stan-
dard in understanding complex systems, despite its practical challenges.
As we look forward, the application of algorithmic probability and related concepts remains a promising frontier for advancing prediction
and analysis methods in various fields. By continuing to bridge theory
with practical applications, we aim to deepen our understanding of the
principles that govern simplicity and complexity in the natural world.
CRediT authorship contribution statement
Boumediene Hamzi:
Methodology, Investigation, Conceptualiza-
tion.
Kamaludin Dingle:
Methodology, Investigation, Conceptualiza-
tion.
Declaration of competing interest
The authors declare that the work presented in this paper is original
and has not been published elsewhere in any form or language (par-
tially or in full), except in abstract format for conferences. This work
is the result of our research conducted primarily at the Gulf University
for Science and Technology and has been partially supported by project
code: ISG Case 9. We have no conflicts of interest to disclose and
affirm that all sources used have been appropriately credited follow-
ing ethical research practices. Furthermore, we agree to the terms of
submission and publication in the journal to which we are submitting,
and we confirm that all co-authors have approved the manuscript for
submission.
Data availability
No data was used for the research described in the article.
Acknowledgments
BH thanks Prof. Jeroen Lamb (Imperial College London) for use-
ful discussions about random dynamical systems that inspired the
work in this paper. BH acknowledges support from the Jet Propulsion
Laboratory, USA, California Institute of Technology, USA, under a
contract with the National Aeronautics and Space Administration and
from Beyond Limits (Learning Optimal Models) through CAST (The
Caltech Center for Autonomous Systems and Technologies). KD thanks
Muhammad Alaskandarani for useful discussions and work on the early
parts of this work. This work has been partially supported by the Gulf
University for Science and Technology, including by project code: ISG
Case 9.
Appendix A. Estimating pattern complexity
To estimate complexity, we follow Ref. [31] and use
$$C_{LZ}(x) = \begin{cases} \log_2(n), & x = 0^n \text{ or } 1^n \\ \log_2(n)\,[N_w(x_1 \ldots x_n) + N_w(x_n \ldots x_1)]/2, & \text{otherwise} \end{cases} \qquad (14)$$
where $N_w(x)$ comes from the 1976 Lempel and Ziv complexity measure [56], and where the simplest strings $0^n$ and $1^n$ are separated out because $N_w(x)$ assigns complexity $N_w = 1$ to the string 0 or 1, but complexity 2 to $0^n$ or $1^n$ for $n \geq 2$, whereas the true Kolmogorov complexity of such a trivial string actually scales as $\log_2(n)$ for typical $n$, because one only needs to encode $n$. Having said that, the minimum possible value is $K(x) \approx 0$ for a simple set, and so e.g. for binary strings of length $n$, we can expect $0 \lesssim K(x) \lesssim n$ bits. Because for a random string of length $n$ the value $C_{LZ}(x)$ is often much larger than $n$, especially for short strings, we scale the complexity so that $a$ in Eq. (2) is set to $a = 1$ via
$$\tilde{K}(x) = \log_2(N_O)\,\frac{C_{LZ}(x) - \min(C_{LZ})}{\max(C_{LZ}) - \min(C_{LZ})} \qquad (15)$$
where $N_O$ is the maximum possible number of output patterns in the system, and the min and max complexities are over all strings $x$ which the map can generate. $\tilde{K}(x)$ is the approximation to Kolmogorov complexity that we use throughout. This scaling results in $0 \leq \tilde{K}(x) \leq \log_2(N_O)$, which is the desirable range of values.
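A minimal Python sketch of this estimator (our own implementation of the LZ76 word count $N_w$ and the scalings of Eqs. (14) and (15); function names are ours) is:

```python
from math import log2

def lz76_words(s: str) -> int:
    # Number of words in the Lempel-Ziv 1976 parsing of s (the quantity N_w in Eq. (14)),
    # computed with the Kaspar-Schuster scheme [55,56].
    i, words, n = 0, 0, len(s)
    while i < n:
        length = 1
        # extend the current word while it already appears earlier in the string
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        words += 1
        i += length
    return words

def c_lz(s: str) -> float:
    # Eq. (14): LZ complexity of the string and of its reverse, averaged,
    # with the trivial strings 0^n and 1^n handled separately.
    n = len(s)
    if s == "0" * n or s == "1" * n:
        return log2(n)
    return log2(n) * (lz76_words(s) + lz76_words(s[::-1])) / 2.0

def k_tilde(s: str, n_outputs: int, c_min: float, c_max: float) -> float:
    # Eq. (15): rescale C_LZ to [0, log2(N_O)], where c_min and c_max are the
    # minimum and maximum C_LZ values over the strings the map can generate.
    return log2(n_outputs) * (c_lz(s) - c_min) / (c_max - c_min)
```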
Appendix B. Comparison of Laplace and algorithmic probability for induction
Here we give another example in which Laplace’s induction method
gives different conclusions as compared to induction based on algorith-
mic probability.
In the logistic map, assume that $\mu = 2.5$ and $\epsilon = 0.0$. As in the main text, if we assume that $x_0 \approx 0$, then the digitized trajectory will begin as zeros, and then change to 1s as the trajectory approaches $1 - 1/\mu = 0.6 >$
0.5. Define $x_0$ to be the value such that the series of 0s changes to 1s after $n = 2^m$ zeros are observed, where $m$ is some positive integer. Because the logistic map can be described in a few bits, and $\mu = 2.5$ and $\epsilon = 0.0$ can also be described in a few bits, then the largest contribution to the complexity of the series comes from the value of $m$. We have
$$K(x) = K(m) + O(1) \approx \log_2(m) \qquad (16)$$
for typical and large $m$. Therefore the algorithmic probability prediction that the next bit after the zeros will be 1 is roughly $2^{-\log_2(m)} \sim 1/m$.

In contrast, after observing $n$ zeros, the Laplacian prediction for the probability that the next bit is 1 is $\sim 1/n = 2^{-m}$, which is substantially smaller. Therefore the observation of a 1 is more 'surprising' from the perspective of the Laplacian method.
References
[1] R. González-García, R. Rico-Martínez, I.G. Kevrekidis, Identification of distributed
parameter systems: A neural net based approach, Comput. Chem. Eng. 22 (1998)
S965–S968, European Symposium on Computer Aided Process Engineering-8.
[2] Ashesh Chattopadhyay, Pedram Hassanzadeh, Krishna V. Palem, Devika Subra-
manian, Data-driven prediction of a multi-scale lorenz 96 chaotic system using a
hierarchy of deep learning methods: Reservoir computing, ANN, and RNN-LSTM,
2019, CoRR abs/1906.08829.
[3] Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz, Discovering governing
equations from data by sparse identification of nonlinear dynamical systems,
Proc. Natl. Acad. Sci. 113 (15) (2016) 3932–3937.
[4] Jaideep Pathak, Zhixin Lu, Brian R. Hunt, Michelle Girvan, Edward Ott, Us-
ing machine learning to replicate chaotic attractors and calculate Lyapunov
exponents from data, Chaos 27 (12) (2017) 121102.
[5] A. Nielsen, Practical Time Series Analysis: Prediction with Statistics and Machine
Learning, O’Reilly Media, 2019.
[6] Alan A. Kaptanoglu, Kyle D. Morgan, Chris J. Hansen, Steven L. Brunton, Physics-
constrained, low-dimensional models for magnetohydrodynamics: First-principles
and data-driven approaches, Phys. Rev. E 104 (1) (2021) 015206.
[7] J. Nathan Kutz, Steven L. Brunton, Parsimony as the ultimate regularizer for
physics-informed machine learning, Nonlinear Dynam. (2022).
[8] B. Haasdonk, B. Hamzi, G. Santin, D. Wittwar, Kernel methods for center
manifold approximation and a weak data-based version of the center manifold
theorems, Physica D (2021).
[9] P. Giesl, B. Hamzi, M. Rasmussen, K. Webster, Approximation of Lyapunov
functions from noisy data, J. Comput. Dyn. (2019) https://arxiv.org/abs/1601.
01568.
[10] Boumediene Hamzi, Houman Owhadi, Learning dynamical systems from data: A
simple cross-validation perspective, part I: Parametric kernel flows, Physica D
421 (2021) 132817.
[11] Boumediene Hamzi, Fritz Colonius, Kernel methods for the approximation of
discrete-time linear autonomous and control systems, SN Appl. Sci. 1 (7) (2019)
674.
[12] Stefan Klus, Feliks Nuske, Boumediene Hamzi, Kernel-based approximation of
the Koopman generator and Schrödinger operator, Entropy 22 (2020) https:
//www.mdpi.com/1099-4300/22/7/722.
[13] Stefan Klus, Feliks Nüske, Sebastian Peitz, Jan-Hendrik Niemann, Cecilia
Clementi, Christof Schütte, Data-driven approximation of the Koopman generator:
Model reduction, system identification, and control, Physica D 406 (2020)
132416.
[14] Romeo Alexander, Dimitrios Giannakis, Operator-theoretic framework for fore-
casting nonlinear time series with kernel analog techniques, Physica D 409
(2020) 132520.
[15] Andreas Bittracher, Stefan Klus, Boumediene Hamzi, Peter Koltai, Christof
Schutte, Dimensionality reduction of complex metastable systems via kernel
embeddings of transition manifold, 2019, https://arxiv.org/abs/1904.08622.
[16] Jake Bouvrie, Boumediene Hamzi, Empirical estimators for stochastically forced
nonlinear systems: Observability, controllability and the invariant measure, in:
Proc. of the 2012 American Control Conference, 2012, pp. 294–301, https:
//arxiv.org/abs/1204.0563v1.
[17] Jake Bouvrie, Boumediene Hamzi, Kernel methods for the approximation of
nonlinear systems, SIAM J. Control Optim. (2017) https://arxiv.org/abs/1108.
2903.
[18] Jake Bouvrie, Boumediene Hamzi, Kernel methods for the approximation of
some key quantities of nonlinear systems, J. Comput. Dyn. 1 (2017) http:
//arxiv.org/abs/1204.0563.
[19] Boumediene Hamzi, Christian Kuehn, Sameh Mohamed, A note on kernel
methods for multiscale systems with critical transitions, Math. Methods Appl.
Sci. 42 (3) (2019) 907–917.
[20] David J.C. MacKay, Information Theory, Inference, and Learning Algorithms,
Copyright Cambridge University Press, 2003.
[21] R.J. Solomonoff, A preliminary report on a general theory of inductive inference
(revision of report V-131), Contract AF 49 (639) (1960) 376.
[22] A.N. Kolmogorov, Three approaches to the quantitative definition of information,
Probl. Inf. Transm. 1 (1) (1965) 1–7.
[23] Gregory J. Chaitin, A theory of program size formally identical to information
theory, J. ACM 22 (3) (1975) 329–340.
[24] Ray J. Solomonoff, A formal theory of inductive inference. Part I, Inf. Control 7
(1) (1964) 1–22.
[25] L.A. Levin, Laws of information conservation (nongrowth) and aspects of the
foundation of probability theory, Probl. Peredachi Inform. 10 (3) (1974) 30–35.
[26] Marcus Hutter, Shane Legg, Paul M.B. Vitanyi, Algorithmic probability,
Scholarpedia 2 (8) (2007) 2572.
[27] Kamaludin Dingle, Mohammad Alaskandarani, Boumediene Hamzi, Ard A. Louis,
Exploring simplicity bias in 1D dynamical systems, 2024, arXiv preprint arXiv:
2403.06989.
[28] Ludwig Arnold, Christopher K.R.T. Jones, Konstantin Mischaikow, Geneviève Raugel, Random Dynamical Systems, Springer, 1995.
[29] Kamaludin Dingle, Jeroen S.W. Lamb, Joan-Andreu Lázaro-Camí, Knudsen’s law
and random billiards in irrational triangles, Nonlinearity 26 (2) (2012) 369.
[30] G. Mayer-Kress, H. Haken, The influence of noise on the logistic model, J. Stat.
Phys. 26 (1981) 149–171.
[31] Kamaludin Dingle, Chico Q. Camargo, Ard A. Louis, Input–output maps are
strongly biased towards simple outputs, Nat. Commun. 9 (1) (2018) 761.
[32] Kamaludin Dingle, Guillermo Valle Pérez, Ard A. Louis, Generic predictions of
output probability based on complexities of inputs and outputs, Sci. Rep. 10 (1)
(2020) 1–9.
[33] Guillermo Valle-Perez, Chico Q. Camargo, Ard A. Louis, Deep learning generalizes
because the parameter-function map is biased towards simple functions, 2018,
arXiv preprint arXiv:1805.08522.
[34] Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio,
Vladimir Mikulik, Ard A. Louis, Neural networks are a priori biased towards
boolean functions with low entropy, 2019, arXiv preprint arXiv:1909.11522.
[35] Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom, Simplicity bias
in transformers and their ability to learn sparse boolean functions, 2022, arXiv
preprint arXiv:2211.12316.
[36] Greg Yang, Hadi Salman, A fine-grained spectral perspective on neural networks,
2019, arXiv preprint arXiv:1907.10599.
[37] Santiago Hernández-Orozco, Hector Zenil, Jürgen Riedel, Adam Uccello, Nar-
sis A Kiani, Jesper Tegnér, Algorithmic probability-guided machine learning on
non-differentiable spaces, Front. Artif. Intell. 3 (2021) 567356.
[38] Kamaludin Dingle, Pau Batlle, Houman Owhadi, Multiclass classification utilising
an estimated algorithmic probability prior, Physica D 448 (2023) 133713.
[39] Chris Mingard, Henry Rees, Guillermo Valle-Pérez, Ard A. Louis, Do deep neural
networks have an inbuilt Occam’s razor? 2023, arXiv preprint arXiv:2304.06670.
[40] M. Li, P.M.B. Vitanyi, An Introduction to Kolmogorov Complexity and Its
Applications, Springer-Verlag New York Inc, 2008.
[41] Thai Son Doan, Maximilian Engel, Jeroen S.W. Lamb, Martin Rasmussen, Hopf
bifurcation with additive noise, Nonlinearity 31 (10) (2018) 4567.
[42] Hector Zenil, Jean-Paul Delahaye, On the algorithmic nature of the world, in:
Information and Computation: Essays on Scientific and Philosophical Understand-
ing of Foundations of Information and Computation, World Scientific, 2011, pp.
477–496.
[43] Kamaludin Dingle, Rafiq Kamal, Boumediene Hamzi, A note on a priori
forecasting and simplicity bias in time series, Physica A 609 (2023) 128339.
[44] Kelty Ann Allen, Martin-Löf Randomness and Brownian Motion (Ph.D. the-
sis), University of California, Berkeley, Berkeley, 2014, Available at https://
escholarship.org/uc/item/20072582.
[45] Ali Kanso, Nejib Smaoui, Logistic chaotic maps for binary numbers generations,
Chaos Solitons Fractals 40 (5) (2009) 2557–2568.
[46] C.S. Calude, Information and Randomness: An Algorithmic Perspective, Springer,
2002.
[47] P. Gács, Lecture Notes on Descriptional Complexity and Randomness, Boston
University, Graduate School of Arts and Sciences, Computer Science Department,
1988.
[48] Alexander Shen, Vladimir A. Uspensky, Nikolay Vereshchagin, Kolmogorov
Complexity and Algorithmic Randomness, vol. 220, American Mathematical
Society, 2022.
[49] Hector Zenil, Fernando Soler-Toscano, Kamaludin Dingle, Ard A. Louis, Correla-
tion of automorphism group size and topological properties with program-size
complexity evaluations of graphs and complex networks, Physica A 404 (2014)
341–358.
[50] Kamaludin Dingle, Javor K. Novev, Sebastian E. Ahnert, Ard A. Louis, Pre-
dicting phenotype transition probabilities via conditional algorithmic probability
approximations, J. R. Soc. Interface 19 (197) (2022) 20220694.
[51] Mohammad Alaskandarani, Kamaludin Dingle, Low complexity, low probability
patterns and consequences for algorithmic probability applications, Complexity
2023 (2023).
[52] Iain G. Johnston, Kamaludin Dingle, Sam F. Greenbury, Chico Q. Camargo,
Jonathan P.K. Doye, Sebastian E. Ahnert, Ard A. Louis, Symmetry and simplicity
spontaneously emerge from the algorithmic nature of evolution, Proc. Natl. Acad.
Sci. 119 (11) (2022) e2113883119.
[53] Yuzuru Sato, Thai Son Doan, Jeroen S.W. Lamb, Martin Rasmussen, Dynamical
characterization of stochastic bifurcations in a random logistic map, 2018, arXiv
preprint arXiv:1811.03994.
[54] Douglas Lind, Brian Marcus, An Introduction to Symbolic Dynamics and Coding,
Cambridge University Press, 1995.
[55] F. Kaspar, H.G. Schuster, Easily calculable measure for the complexity of
spatiotemporal patterns, Phys. Rev. A 36 (2) (1987) 842.
[56] A. Lempel, J. Ziv, On the complexity of finite sequences, IEEE Trans. Inf. Theory
22 (1) (1976) 75–81.
[57] Boris Hasselblatt, Anatole Katok, A First Course in Dynamics: With a Panorama
of Recent Developments, Cambridge University Press, 2003.
[58] Arno Berger, Chaos and Chance: An Introduction to Stochastic Aspects of
Dynamics, Walter de Gruyter, 2001.
[59] James Patrick Crutchfield, J. Doyne Farmer, Bernardo A. Huberman, Fluctuations
and simple chaotic dynamics, Phys. Rep. 92 (2) (1982) 45–82.
[60] Marcus Hutter, Universal Artificial Intelligence: Sequential Decisions Based on
Algorithmic Probability, Springer Science & Business Media, 2004.
[61] Sven Neth, A dilemma for solomonoff prediction, Philos. Sci. 90 (2) (2023)
288–306.
[62] Marcus Hutter, On universal prediction and Bayesian confirmation, Theoret.
Comput. Sci. 384 (1) (2007) 33–48.
[63] Hector Zenil, Narsis A. Kiani, Francesco Marabita, Yue Deng, Szabolcs Elias,
Angelika Schmidt, Gordon Ball, Jesper Tegner, An algorithmic information
calculus for causal discovery and reprogramming systems, Iscience 19 (2019)
1160–1172.
[64] Ray Solomonoff, Complexity-based induction systems: Comparisons and
convergence theorems, IEEE Trans. Inf. Theory 24 (4) (1978) 422–432.
[65] Paul M.B. Vitányi, Similarity and denoising, Phil. Trans. R. Soc. A 371 (1984)
(2013) 20120091.
[66] Hector Zenil, Liliana Badillo, Santiago Hernández-Orozco, Francisco Hernández-
Quiroz, Coding-theorem like behaviour and emergence of the universal
distribution from resource-bounded algorithmic probability, Int. J. Parallel
Emergent Distrib. Syst. 34 (2) (2019) 161–180.
[67] Pieter Adriaans, Learning as data compression, in: Computation and Logic in
the Real World: Third Conference on Computability in Europe, CiE 2007, Siena,
Italy, June 18-23, 2007. Proceedings 3, Springer, 2007, pp. 11–24.
[68] Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim
Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew
Aitchison, Laurent Orseau, et al., Language modeling is compression, 2023, arXiv
preprint arXiv:2309.10668.