Supplementary materials for the manuscript:
Inverse methods for consistent quantification of seafloor anoxia using uranium isotope data from marine sediments
Michael A. Kipp and François L. H. Tissot
Assessing convergence of the MCMC routine
1) To assess the ability of an MCMC routine to optimize free parameters, one must first decide how many free parameters to include. Here it is useful to consider model collinearity and parameter identifiability. Models with high collinearity have many covarying parameters. This poses a problem for the MCMC routine, since changes in certain parameters can be largely compensated by changes in others; the net effect is that convergence on the "best fit" solution becomes very inefficient (i.e., the optimal parameter values are poorly identifiable).
We used the collin function in the FME package in R (Soetaert and Petzoldt, 2010) to assess collinearity in our model. This follows the approach of Brun et al. (2001, their Eq. 13), who defined a collinearity index (γ_K) that describes the ability of other model parameters to compensate for changes in a single parameter value. For instance, γ_K = 10 indicates that if a single parameter value is altered, the model output can be matched to within 1/10 = 10% by compensating changes in the other parameters.
While no γ_K value specifically precludes convergence, higher values imply more difficult convergence (the threshold for parameter identifiability is often found to be in the range of γ_K = 5-20).
As expected, we find that with increasing m values (Eq. 8, main text), the collinearity of our model increases (Fig. S1). It is therefore recommended to start with low m values (fewer free parameters) for a given dataset, assess convergence, and iteratively increase m as possible and as is warranted by the data.
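As an illustration of this check, the sketch below applies the sensFun and collin functions from the FME package to a deliberately simple two-parameter stand-in for the forward model; the function uModel and its parameters f1 and f2 are hypothetical placeholders, not the isotope mass-balance model of the main text.

    library(FME)

    # Hypothetical two-parameter stand-in for the forward model: d238U of seawater
    # for a step change in the fraction of anoxic seafloor (f_anox) at t = 2 Myr.
    uModel <- function(pars) {
      t     <- seq(0, 4, by = 0.1)                         # time, Myr
      fanox <- pars[[1]] + (pars[[2]] - pars[[1]]) * (t > 2)
      data.frame(time = t, d238U = -0.17 - 0.6 * fanox)    # toy mixing relation
    }

    pars <- c(f1 = 0.01, f2 = 0.3)                         # illustrative free parameters

    # Local sensitivity of the model output to each free parameter
    sFun <- sensFun(func = uModel, parms = pars)

    # Collinearity index gamma_K for every parameter subset (Brun et al., 2001)
    cl <- collin(sFun)
    print(cl)

    # Subsets with gamma_K above the ~5-20 identifiability threshold are suspect
    subset(cl, collin > 20)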
Figure S1. Model collinearity as a function of m value. Higher m values result in higher collinearity and poorer identifiability.
2) Having used the collinearity assessment above to develop an approach to selecting the number of free parameters, the second step in building the MCMC routine is to determine the number of steps (n_iter) a walker needs to take to converge on the optimal parameter values. There is a vast literature on this topic (reviewed in Cowles and Carlin, 1996) that includes several indices for assessing whether a walker (also called a "chain") has converged upon the stationary (target) distribution. Here we consider perhaps the most widely used convergence metric: the potential scale reduction factor (Gelman and Rubin, 1992).
Gelman and Rubin (1992, their Eq. 20) defined a potential scale reduction factor (psrf, also called the Gelman-Rubin statistic or Rhat) that can determine whether a walker (or "chain") has converged by comparing the variance within a walker to that between walkers. This approach therefore requires that at least two walkers be deployed in parallel. If the walkers have converged on the stationary distribution, they will have nearly identical variance, such that the psrf is near unity. As in all cases, no single psrf value guarantees convergence, but the authors of this diagnostic typically view psrf values < 1.2 as suggesting convergence (Kass et al., 1998).
We monitored the psrf during all MCMC runs. We found that when the psrf was too high (>> 1.2), increasing n_iter was typically necessary, along with tuning the size of proposed "jumps" in parameter values (p_step) and the frequency (p_update) with which proposed jumps are updated using the covariance matrix (after Haario et al., 1999, 2001).
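A minimal sketch of this tuning step is given below, continuing the toy model from the first sketch. It assumes that p_step and p_update map onto the jump and updatecov arguments of FME's modMCMC function, uses an illustrative sum-of-squares cost on synthetic data, and computes the psrf across a few walkers with gelman.diag from the coda package; none of the specific values are those used in our actual runs.

    library(FME)
    library(coda)

    # Synthetic "data" and an illustrative cost function (sum of squared residuals)
    obs  <- uModel(c(f1 = 0.05, f2 = 0.25))
    cost <- function(p) sum((uModel(p)$d238U - obs$d238U)^2)

    p0      <- c(f1 = 0.05, f2 = 0.25)   # starting values
    niter   <- 1e4                        # toy run length (1e5-1e6 used in practice)
    pstep   <- c(f1 = 0.01, f2 = 0.01)    # proposal ("jump") size per parameter
    pupdate <- 100                        # update proposal covariance every 100 steps

    # Run three walkers from slightly dispersed starting points
    chains <- lapply(1:3, function(i)
      modMCMC(f = cost, p = p0 * runif(2, 0.8, 1.2),
              niter = niter, jump = pstep, updatecov = pupdate,
              lower = c(0, 0), upper = c(1, 1)))

    # Gelman-Rubin potential scale reduction factor (psrf) across the walkers;
    # values near 1 (e.g., < 1.2) suggest convergence
    gelman.diag(mcmc.list(lapply(chains, function(ch) mcmc(ch$pars))))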
Another statistic worth tracking when trying to get the walkers to converge is the acceptance fraction, i.e., the fraction of proposed steps that are accepted as new parameter values. It has been demonstrated that accepting ~20% of proposed steps is most efficient in a wide range of MCMC applications (e.g., Geyer and Thompson, 1995; Gelman et al., 1996). We therefore viewed acceptance of <10% or >50% of proposed steps as suggestive of slow convergence, and accordingly updated p_step and/or p_update to bring the value closer to ~20% as possible and/or necessary. In most cases, a psrf value close to unity co-occurred with an optimized acceptance fraction.
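The acceptance fraction can be read directly from the modMCMC output, which stores the number of accepted moves as naccepted; a brief check on the toy walkers from the previous sketch might look like the following.

    # Fraction of proposed steps accepted by each walker; we retuned p_step and/or
    # p_update when this fell below ~0.1 or rose above ~0.5
    sapply(chains, function(ch) ch$naccepted / niter)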
Ultimately, we found that psrf < 1.1 could typically be achieved with n_iter = 10^5-10^6 when m ≤ 10.
3) Having determined that the walkers went long enough to converge on the target distribution, the third step in developing the MCMC routine is to decide how many steps to keep from each walker. This is primarily a computational issue; more steps would give a larger sample size and thus a more precise MCMC estimate, though at the cost of extra computational resources (i.e., time to execute a longer walk, and memory to manipulate large matrices of posterior samples). A relevant metric for deciding the number of samples to keep is the integrated autocorrelation time (IAT). The IAT is a measure of the inefficiency of the MCMC method, describing the number of MCMC samples (or steps in the random walk) required to generate one independent sample. The MCMC method is not perfectly efficient because the samples are in fact not independent. The MCMC-derived error on our estimates therefore grows with the IAT, where IAT = 1 would mean that the samples are effectively independent and no additional MCMC-derived error is incurred (e.g., Sherlock et al., 2010, their Eq. 6). In practice, it is prudent to let each walker go for >10 autocorrelation times (e.g., MacKay, 2003, p. 380). This gives a relative error of a few percent in the estimates derived from the MCMC approach, often smaller than the uncertainty ranges constrained in the MCMC analysis, meaning that we are not inhibited from drawing meaningful quantitative conclusions from our MCMC outputs.
Multiple authors describe methods for estimating the IAT (e.g., Sherlock et al., 2010; Soetaert and Petzoldt, 2010; Foreman-Mackey et al., 2013). Here we use the IAT function from the LaplacesDemon package in R. We monitored the IAT for all MCMC runs, finding that for most datasets, the IAT was ≤ 100 steps.
This means that keeping 1000 steps is sufficient to minimize MCMC-derived error (i.e., it yields ~10 or more effectively independent samples).
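As an illustration, the IAT function from the LaplacesDemon package can be applied column-wise to the retained samples of one of the toy walkers from the sketches above; the effective number of independent samples is then roughly the number of retained steps divided by the largest IAT.

    library(LaplacesDemon)

    keep <- chains[[1]]$pars            # matrix of retained posterior samples
    iat  <- apply(keep, 2, IAT)         # integrated autocorrelation time per parameter
    print(iat)

    nrow(keep) / max(iat)               # effective sample size; we aimed for >= ~10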
The walkers can go for longer and more samples can be retained, but as we ultimately want to conduct many parallel walks to propagate uncertainty in the isotopic mass balance terms (see below), we want our runs to be as short as possible. We therefore typically retained the final 1000 steps from each walker once it was demonstrated that the walkers had converged on the stationary distribution. In practice, we did this by setting n_burn-in to 1000 less than n_iter. This also allows the adaptive MCMC routine of the FME package to update the proposed steps using the covariance matrix up until the point at which we begin retaining runs; after the burn-in period, the proposals are not adjusted, meaning that the convergence metrics described above can be applied (whereas they cannot be applied while the proposals are subject to change).
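In FME's modMCMC, this retention scheme can be expressed through the burninlength and outputlength arguments; the call below, which reuses the toy cost function and tuning values from the earlier sketches, is an illustrative configuration rather than the exact call used in our runs.

    # Keep only the final 1000 steps: everything before niter - 1000 is burn-in
    fit <- modMCMC(f = cost, p = p0,
                   niter = niter, jump = pstep, updatecov = pupdate,
                   burninlength = niter - 1000, outputlength = 1000,
                   lower = c(0, 0), upper = c(1, 1))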
4) Finally, we can consider the number of walkers (n_walkers) to deploy in parallel. If all we care about is obtaining a large number of samples from the stationary distribution, the most computationally efficient method would be to let a single walker converge on the target distribution and then retain as many samples as desired in order to minimize MCMC-derived error (MacKay, 2003, p. 381). This way, a walker only needs to take the time to converge on the target distribution once. However, with a single chain it is not possible to assess convergence using a metric like the psrf. Proponents of the psrf thus often deploy a few walkers (3-5) and compare variance between walkers via the psrf to demonstrate convergence (Kass et al., 1998). This is only slightly less computationally efficient than the first method, but has the added benefit of a more rigorous convergence assessment. Additionally, if computations for each walker are conducted in parallel, then no extra time is needed. A final end-member is to deploy many parallel chains, each running only long enough to converge and then retain a sufficient number of samples, and to combine all the samples into the final posterior distribution. One advantage of this method is that samples from different chains will likely be less correlated than those from a single, long chain. In our case, a further advantage is that we can randomly sample slightly different values of the constant terms of the isotopic mass balance in each of the parallel chains. This enables us to propagate the uncertainty on these terms into our inversions.
We therefore took the approach of first tuning the MCMC routine for convergence using a few chains while holding the mass balance terms constant. Then, having found the conditions necessary for convergence, we deployed ≥100 walkers in parallel, each sampling from the uncertainty range surrounding each parameter in Appendix A. In doing so, many chains had a harder time converging and required a higher n_iter value. In the end, though, this large sample set was compiled to comprise the total posterior distribution. As a final step, the forward model was then run 1000 times while sampling from this posterior distribution in order to generate the time series estimates and confidence intervals in the figures in the main text.
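A schematic version of this ensemble stage is sketched below. The perturbed constant D_anox (a stand-in for the Appendix A mass-balance terms), its assumed 0.6 ± 0.1 range, and the use of mclapply for parallelization are all illustrative choices, continuing the toy model from the earlier sketches.

    library(FME)
    library(parallel)

    nWalkers <- 100
    posterior <- mclapply(1:nWalkers, function(i) {
      # Each walker draws its own value of the (illustrative) mass-balance constant
      D_anox <- rnorm(1, mean = 0.6, sd = 0.1)
      costi  <- function(p) {
        fanox <- p[[1]] + (p[[2]] - p[[1]]) * (obs$time > 2)
        sum((-0.17 - D_anox * fanox - obs$d238U)^2)
      }
      out <- modMCMC(f = costi, p = p0, niter = niter, jump = pstep,
                     updatecov = pupdate, burninlength = niter - 1000,
                     outputlength = 1000, lower = c(0, 0), upper = c(1, 1))
      out$pars                            # final 1000 samples from this walker
    }, mc.cores = 4)

    # Combine all walkers into the total posterior distribution
    posteriorAll <- do.call(rbind, posterior)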
Age uncertainty
In addition to propagating uncertainty on the parameters in the isotopic mass balance, we can account for uncertainty in sample ages. While the studies examined here do not report age uncertainties, we took a conservative approach by assuming a 20% uncertainty (1σ) in the age difference between adjacent samples, such that the law of superposition is not violated. We compared the MCMC inversion results with and without age uncertainty for the Frasnian-Famennian dataset of Song et al. (2017), shown in Fig. 6b in the main text, as this dataset had the largest time intervals between points (in denser datasets, the uncertainty would be negligible). We found that accounting for this potential age uncertainty had a negligible effect on the inferred history of anoxia (Fig. S2). We therefore infer that our inversion results presented in the main text, which do not account for age uncertainty, are likely robust. We note, however, that future studies can account for this effect when reporting isotopic data from samples with poorly constrained relative ages.
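A small sketch of this treatment is given below: the age difference between adjacent samples is perturbed by 20% (1σ) while being kept positive, so the resulting ages always honor the law of superposition; the sample ages shown are invented for illustration.

    # Perturb an age model by 20% (1 sigma) on the spacing between adjacent samples
    perturbAges <- function(ages) {                        # ages in Myr, increasing
      dt  <- diff(ages)                                    # spacing between samples
      dtP <- dt * pmax(rnorm(length(dt), mean = 1, sd = 0.2), 0.01)  # keep positive
      ages[1] + c(0, cumsum(dtP))                          # rebuild, order preserved
    }

    ages <- c(0.0, 0.5, 1.2, 2.0, 3.1, 4.0)                # illustrative ages, Myr
    perturbAges(ages)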
Figure S2. MCMC inversion results for the Frasnian-Famennian dataset of Song et al. (2017), conducted (a) without and (b) with age uncertainty. The inferred trajectory of seafloor anoxia is quite similar in the two cases, suggesting that this potential uncertainty does not undermine the reconstructions.
[Figure S2 panels a and b each plot δ238U (‰) data and the inferred f_anox trajectory against time (Myr).]