A Simple Modeling Framework For Prediction In The Human Glucose-Insulin System

D. J. Albers (1,2), M. E. Levine (3), M. Sirlanci (3), and A. M. Stuart (3,§)

(1) Department of Pediatrics, Division of Informatics and Data Science, University of Colorado Medicine, Aurora, CO 80045
(2) Department of Biomedical Informatics, Columbia University, New York, NY 10032
(3) Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125
Abstract
In this paper, we build a new, simple, and interpretable mathematical model to describe the human
glucose-insulin system. Our ultimate goal is the robust control of the blood glucose (BG) level of
individuals to a desired healthy range, by means of adjusting the amount of nutrition and/or external
insulin appropriately. By constructing a simple yet flexible model class, with interpretable parameters,
this general model can be specialized to work in different settings, such as type 2 diabetes mellitus
(T2DM) and intensive care unit (ICU); different choices of appropriate model functions describing
uptake of nutrition and removal of glucose differentiate between the models. In addition to data-
driven decision-making and control, we believe the model is also useful for the basic quantification
of endocrine physiology. In both cases, the available data is sparse and collected in clinical settings,
major factors that have constrained our model choice to the simple form adopted.
The model has the form of a linear stochastic differential equation (SDE) to describe the evolution
of the BG level. The model includes a term quantifying glucose removal from the bloodstream through
the regulation system of the human body, and another two terms representing the effect of nutrition
and externally delivered insulin. The stochastic fluctuations encapsulate model error necessitated
by the simple model form and enable flexible incorporation of data. The parameters entering the
equation must be learned in a patient-specific fashion, leading to personalized models. We present
numerical results on patient-specific parameter estimation and future BG level forecasting in T2DM
and ICU settings. The resulting model leads to the prediction of the BG level as an expected value
accompanied by a band around this value which accounts for uncertainties in the prediction. Such
predictions, then, have the potential for use as part of control systems which are robust to model
imperfections and noisy data. Finally, a comparison of the predictive capability of the model with
two different models specifically built for T2DM and ICU contexts is also performed.
david.albers@cuanschutz.edu
mlevine@caltech.edu
sirlanci@caltech.edu
§ astuart@caltech.edu
arXiv:1910.14193v2 [q-bio.QM] 24 Aug 2020
1 Introduction
1.1 Background
Broadly speaking, mathematical models of human physiology may serve one of two purposes: elucida-
tion of the detailed mechanisms which comprise the complex systems underlying observed physiology;
or prediction of outcomes from the complex system, for the purposes of medical intervention to ame-
liorate undesirable outcomes. In principle these two objectives interact: a model which explains the
detailed mechanisms, if physiologically accurate and compatible with observed data, will of course be
good for prediction. However, in many complex human physiological systems, basic understanding
of the mechanisms involved is still developing. It is arguably the case that simple, interpretable,
models which are trained on observed data may suffice for the purpose of prediction and control,
in situations where the detailed underlying physiological mechanisms are not yet fully understood.
Additionally, human physiological data are often sparse, implying a limited capacity for resolving
high-fidelity physiological details and unique physiological inference. The human glucose-insulin sys-
tem provides an important example of a system where these issues come in to play because in most
settings, insulin—representing two of three dominant state variables—is never measured. Moreover,
depending on the purpose of the study and the structure and size of the available data, one needs
to decide the required level of model fidelity [43]. Here, because we aim to resolve the mean and
variance of glucose dynamics, we ignore higher-fidelity physiology and build a lower-fidelity model in
a way that includes common, clinically-relevant inputs. The objective of this paper is to introduce
a simple modeling framework for the human glucose-insulin system, which is interpretable, can be
trained on, and largely limited to, observed data and then has predictive capability, in a predictive test
data sense.
The model developed here is, in some sense, a radical departure from the norm. This is because
it proposes a closed model for blood glucose dynamics; meaning, the model includes parameters and
states limited to those either directly related to, or directly inferable from, commonly available data,
rather than the model including a full set of features believed to drive the system. The model treats
all unmeasured states that interact with blood glucose as either representable by inferred parameters
or as noise; in particular insulin levels are modeled through parametric dependence, which can be
modified to incorporate measured insulin delivery, whilst noise in the model allows for imprecision
arising from this simple approach. In this way, the primary hypothesis of this paper is to investigate
if a glycemic model, which does not include equations describing the evolution of interstitial and
plasma insulin levels in the body, can be used to represent and accurately forecast blood glucose while
retaining some physiological fidelity. The motivation for this formulation is built on both inference
and the scientific process. In general, lacking measurements for quantities related to dominant states
in the system, such as insulin, causes identifiability failure and hence instability in the inference. To
avoid this situation we represent insulin implicitly through parametric dependence within the equation
for glucose, with noise introduced to allow for inadequacies in this representation, instead of modeling
it explicitly via a state. Our results demonstrate that this approach is viable, as our model is able
to forecast glucose, by many metrics, as well or better than many accepted models which explicitly
model insulin.
In terms of flexibility and interpretability of inference, there is a trade-off between data assimilation
(DA) and machine learning (ML) approaches. In ML, we project data onto a complex, highly flexible
function space. However, the resulting algorithms often fail to be directly interpretable, limiting their
potential for use, and creating the possibility of unwanted and unexplainable effects. On the other
hand, in DA, we project data onto a model that is usually interpretable and this may be used to
design away unwanted and unexplainable effects. Our approach combines the data-driven nature, and
flexibility, of ML algorithms, whilst retaining some of the interpretability, and benefits that stem from
it, that arise from DA.
1.2 Literature Review
Depending on their particular purposes, researchers have developed various mathematical models
ranging from extremely simple to extremely complex to describe glucose metabolism in humans. Some
of these models are developed to describe a very specific system, for example, a particular biological
function of the pancreas [13, 34, 77], whereas others have been developed to predict hypoglycemia
[31, 61, 78], glucose control [11, 14, 24, 30, 70, 89, 91], or disease pathogenesis [32, 35, 36]. Some
of these are continuous-time models in the form of ordinary differential equations (ODEs) whilst
others use ML: in [13, 34, 77], the authors develop systems of ODEs to model the phenomenon they
investigate, whereas for efforts directed at inference tasks, such as predicting hypoglycemia, ML approaches
are more common. For example, to predict hypoglycemia, in [78], various classification methods, such
as random forest, support vector machine, and k-nearest neighbors, are used. For the same purpose, in
[61] a simple model, consisting of a logistic function, based on the mean and standard deviation of BG
measurements, is developed, and in [31], a linear mixed effects model is used. In [88], an ML approach
is used to investigate the variability in postprandial BG levels to identical meals based on several
covariates, such as blood parameters, dietary habits, and gut microbiota, over a one-week period
for 800 individuals. With this method, the authors successfully predicted postprandial
blood glucose levels in a personalized manner and estimated personalized diets that could reduce the
postprandial blood glucose for these patients. Moreover, another class of ML models that are used for
analysis and prediction are the ARIMA (auto-regressive integrated moving average) models. In [59],
the authors use an ARMAX model, which consists of an ARMA (auto-regressive moving average) model
combined with an exogenous input model, to analyze the data collected from T2DM patients. In [87], an
ARIMA model is used for forecasting BG levels, whereas in [60] a SARIMA (seasonal ARIMA) model
is used for the same purpose. On the other hand, for glucose control, a wide range of approaches are
used, such as systems of ODEs [11], combination of systems of ODEs and partial differential equations
(PDEs) [30], stochastic differential equations (SDEs) [89], and various network models [14, 91]. There
is also research to understand how T2DM patients and their health care providers reason about self-
collected data, how this type of data should be used to make accurate future BG level forecasting,
and what kind of strategies should be designed to maximize the usefulness of self-monitored data,
[57, 58]. In [23], a mobile application is developed for post-meal BG level forecasting with self-tracked
data. It is also tested on two populations: one consisting of individuals who are educated about
T2DM and familiar with mobile applications, and the other consisting of individuals with low
literacy and limited mobile app usage. Both populations find the mobile app useful in managing their
disease. Another study investigating the reasoning mechanism of self-management strategies of T2DM
patients, which is reported in [22], shows the importance of effective visualizations in self-management
of T2DM. On the other hand, in [6], a DA framework is used for future BG level forecasting based on
self-collected data. The results obtained with the DA approach are compared with other approaches
such as Gaussian process model regression and with the prediction of expert diabetes educators. In
another work, [7], the DA framework was developed as a computational method, which combines ML
and mechanistic modeling to forecast future BG levels and to infer T2DM phenotypes. On the other
hand, in [33], a relatively simple two-compartment ODE model is used to analyze continuous glucose
measurement (CGM) data collected from T2DM patients to investigate possible improvements that
could be achieved with this type of data in the T2DM setting.
One of the earliest quantitative models was Bergman’s Minimal model [12]. This ODE model was
developed with the purpose of estimating insulin sensitivity in the intravenous glucose-tolerance test
(IVGTT) setting. Another widely-known ODE model was developed by Sturis et al. [77] with the
aim of elucidating the cause of the ultradian (long-period) oscillations of insulin and glucose. The
concluding hypothesis of the Sturis et al. [77] paper was that production of glycemic oscillations
required a model with three states beyond those described in the minimal model, interstitial insulin,
plasma insulin, and a delayed effect of the liver’s release of glucose. There are alternative hypotheses
for sources of ultradian oscillations due to other factors, e.g., [55], but in all cases a full description
of the ultradian oscillations and their careful validation with data remains an open question. One
important theme in all of the papers that address ultradian oscillations points to a mathematical
feature that does seem to be required to produce ultradian glycemic oscillations: a delayed response
in glucose use and production. For example, routine numerical bifurcation analysis of the Sturis et
al. [77] model reveals that oscillatory dynamics appear via a—presumably Hopf—bifurcation due
to variation in parameters that control the time-delay governing hepatic glucose production. This
important observation led to the introduction of delay differential equations (DDEs) to model the
long-period oscillations of insulin and glucose. Others have developed models driven more strongly
by clinically-minded motivations. In [35], the authors use a mathematical model to understand why it is
easier to prevent type 2 diabetes mellitus (T2DM) than to cure it; the model is based on a negative
feedback mechanism within a system of nonlinear ODEs, as an update to the pioneering model [80],
to describe β-cell mass, glucose, and insulin dynamics. Because they describe a specific aspect of
the system, these models are relatively simple and easy to interpret. There are models, on the
other hand, that aim to describe more complicated systems, with multiple and multi-scale interacting
components, such as the events that occur during oral glucose ingestion [21, 50]. These models are
comprised of a large number of ODEs or DDEs with many model parameters. Such models, which
describe events in a very detailed manner, are typically used to understand the physiological network
that results in the emergent behavior of the system. There are also machine learning models that could
be used for the same purposes. Time series analysis could be used to identify and exploit systemic
patterns in the data, and these patterns used to provide models for forecasting. In [1, 8] information-
theoretic methods were developed to understand and model phenotypic [4, 5] and health care process
(measurement process driven) [42, 44] differences observable from clinically collected glucose data.
These efforts led to [72] where a nonlinear computational model was used to estimate unmeasured
variables and unknown model parameters based on sparse measurement data for glycemic management
in ICU context. More directed at autoregressive modeling, in [16], using autocorrelation analysis of
continuous BG data, it is shown that future BG levels can be predicted by using recent time-history
of the glucose. Following this result, simple autoregressive modeling is used for forecasting, based
on continuous BG data collected from either type 1 or type 2 diabetic individuals, in [28, 74]. On
the other hand, in [45], a linear regression model is used to examine the roles of insulin-dependent and
insulin-independent factors in glucose tolerance based on data collected during IVGTT.
The models cited in the previous paragraph are often developed in order to understand disease
pathogenesis, e.g., of T2DM, where the system generally changes much more slowly than state variables
evolve in time. In settings where the system evolves very quickly, e.g., in the intensive care unit
(ICU) setting, a number of additional hurdles appear for using the resulting computational models
for glycemic prediction and control. These include the wide variation in clinical response within and
between patients, and resultant concerns about safety issues. In [90], a model consisting of fractional
polynomials and interaction terms is used with a linear regression approach to determine the initial
insulin dose setting. In [82] Bergman’s Minimal Model is modified so that it applies beyond the initial
goal of explaining IVGTTs, and is instead applicable in the ICU setting. In this context it is referred
to as the ICU Minimal Model (ICU-MM). In [52], a control mechanism framework is developed that
uses a two-compartment glucose-insulin model accounting for time-varying parameters.
There is also a large body of research developed for similar purposes in type 1 diabetes mellitus
(T1DM) context, often under the umbrella of artificial pancreas/beta-cell research. Researchers have
long recognized the importance of efficient mathematical algorithms describing glucose-insulin regulation
[17, 18, 20, 63, 64]. Indeed, in [17] and [63], a mathematical model is explicitly counted as
one of the main components of artificial pancreas development. In all these works and many oth-
ers, [26, 27, 49, 79], that are focused on closed-loop-control of BG levels of T1DM patients, the
amount of insulin to be infused via a pump is decided based mainly on the CGM data. Hence, the math-
ematical models mostly describe the glucose-insulin regulation without incorporating any nutrition
information. However, since nutrition is one of the main drivers of the BG level, developing models
accounting for the nutrition factor and using these models within human-in-the-loop control systems
for tight glycemic management in T1DM setting may have the potential to advance current glycemic
management results and reduce the possibility of diabetes-related complications.
In all of the models discussed above, parameters enter and play an important role in making
predictions with the model, yet may be difficult to determine, as they are not directly measurable;
furthermore, their values will vary from one patient to another. There are two overarching approaches
to learning about parameters: optimization, in which a model-data mismatch is minimized to determine
parameters [25]; and the Bayesian approach [46], in which the distribution on the parameters,
given the data and given the assumed (noisy) model-data framework, is computed. Generally speaking,
the Bayesian approach provides more information as it gives not only parameter estimates, but
uncertainties; this can be very useful in biomedical prediction as it may be used to predict, and hence
avoid, rare events with extreme consequences. These advantages are not free: the Bayesian approach
is typically more costly as it requires determining an entire distribution and not just a point. The
two are linked through the negative logarithm of the posterior distribution on the parameters, which
is a penalized model-data mismatch functional. In [82] parameter estimation is performed based on
surgical ICU data using a standard least-squares approach. In [39], building on an adaptive modeling
approach introduced in [82], the authors use the ICU-MM to predict BG levels of patients over a
moving time interval that starts right after the interval used for parameter estimation, employing a
nonlinear least-squares approach. In [10], the authors employ hybrid Newton observer design to pre-
dict BG levels of ICU patients within the framework of the ICU-MM; this is a method introduced in
[15] to obtain numerical solution of a system of nonlinear ODEs through simulation by using Newton’s
method and eliminating the dependence on the exact discrete-time model when this exact model is not
known analytically. In another approach, developed in [40], the authors construct a mechanistic model
consisting of five sub-models describing glucose regulation in critical care. They adopt a Bayesian
approach in which the prior (regularizing) distribution is formed based on both prior expert
knowledge and routinely collected ICU data. By this approach, time-invariant model parameters, as
well as a time-varying model parameter which represents the temporal variation in insulin sensitivity,
are estimated. Another mathematical model consisting of ODEs is developed in [86] in which, with
the exception of the time-varying insulin sensitivity parameter, all the model parameters are fixed at
reasonable values known a priori. Then, the insulin sensitivity parameter is estimated online with
a least-squares approach and used to compute future insulin input by minimizing a quadratic cost
function to control future BG levels.
There are also models that are specifically developed or used to control blood glucose levels of
ICU patients in a healthy range. In [67], the authors combine the intensive control insulin-nutrition-
glucose (ICING) model, introduced in [53], with a subcutaneous insulin delivery model for an automated
insulin delivery system. In [48], in addition to the models used in [67], the authors add another model
representing the suppression of endogenous glucose production by insulin [71], again for purposes
of controlling BG levels of critical care patients. Similarly, in [47], the ICING model is used with
another model representing subcutaneous insulin dynamics [84], in a linear zone model-predictive
control framework. Nonlinear models representing glucose-insulin dynamics are also of interest and
using these models for control purposes requires some modification in the computational methodology.
In [38], a general nonlinear model-predictive control (NMPC) schema is developed to be used for
control of systems governed by nonlinear dynamics. The problem of BG control in ICU is used as an
example to show the effectiveness of this NMPC approach in [38]. On the other hand, it is known
that glycemic control in the ICU is fundamentally different from glycemic control in diabetes,
as an ICU patient could experience a hyperglycemic episode without being diabetic. This situation
is caused by the body's stress response rather than diabetes. Therefore, understanding the pancreatic
response of ICU patients could provide valuable knowledge. For this purpose, in [65], the
authors construct an algorithm to assess the pancreatic response of critical care patients. They also
show the need to incorporate this system in BG control frameworks in ICU settings. None
of these constructions have been shown to work robustly for data collected in an ICU, and appear to
require continuous glucose monitors (CGM) and/or insulin measurements to function properly; CGMs
are not common and insulin is never measured in ICU settings, posing a substantial barrier to use.
An alternative, but related, approach to prediction is to consider BG levels as discrete time series
data and use models such as autoregressive integrated moving average (ARIMA) for prediction. In
[87], the authors use an ARIMA model to predict blood glucose levels and hypoglycemia. For similar
purposes, in [68], the authors use an autoregressive AR(1) model. From a perspective in which the aim is
to resolve the mean and variance of blood glucose levels, for the purposes of prediction, the model that
we develop in this paper can be viewed as a generalized AR(1) model. A standard AR(1) model uses
only the BG measurements and provides predictions based on the information obtained from this data.
With the model we develop here, in addition to the blood glucose measurements, we use nutrition
and external insulin data to account for non-stationarity in the environment, and calculate the effects
of nutrition and external insulin on the discrete time-series by means of an underlying continuous
time model. In this sense, we aim to achieve and enhance what an ARIMA process can do by using
an underlying continuous-time mechanistic model to describe known, and measured, environmental
factors. Moreover, by constructively formulating the functional form of a model embedded with
hard-coded physiological mechanisms, we are hoping to provide both a pathway for incorporating
external physiological knowledge—reducing the data required to estimate the model—and a modeling
framework that, being based in physiology, is likely to be more interpretable to humans working in
systems physiology and in a clinic.
All of these different models, and the algorithms that stem from them, demonstrate significant
diversity in the dynamics of the glucose-insulin system between individuals. This highlights the
importance of personalized models for parameter inference, prediction and control, using data collected
from individual patients or perhaps, once more data is gathered, by clustering patients and using
representative parameter settings from the appropriate cluster. Many of the models described have so
many unknown parameters that it is very hard to use them in such realistic settings. Recently, some
researchers have started to address the applicability of the models in the real patient-data setting
[70, 14, 91] and these papers demonstrate some of the difficulties which arise. In this paper, we aim
to develop a simple interpretable mathematical model that, in view of its interpretability, can be used
safely for glucose prediction and control, and in view of its simplicity, can be trained on patient data.
We concentrate primarily on the T2DM and ICU settings, and comment briefly on the relevance of
our modeling approach for type 1 diabetes (T1DM).
1.3 Our Contribution
- We describe a simple, interpretable, modeling framework, limited to states and parameters that are directly observable or inferable from data, for prediction within the human glucose-insulin system, based on a continuous-time linear, Gaussian, stochastic differential equation (SDE) for glucose dynamics, in which the effect of insulin appears parametrically.
- We completely describe and detail the inference machinery necessary, in a data assimilation and inverse problems framework, to estimate a stochastic differential equation model of glucose dynamics with real-world data.
- The framework is sufficiently general to be usable within the ICU, T2DM and potentially T1DM settings.
- We demonstrate, in a train-test set-up, that the models are able to fit individual patients with reasonable accuracy; both ICU and T2DM data are used. The test framework we use is a predictive one, laying the foundations for future control methodologies.
- Comparison of the data fitting for T2DM and ICU patients reveals interesting structural differences in their glucose regulation.
- We make a comparison of the predictive power of our stochastic modeling framework with that of more sophisticated models developed for both T2DM and the ICU, demonstrating that the simple stochastic approach is generally more accurate in both settings.
1.4 Outline
An outline of the paper is as follows. In Section 2, we introduce the general continuous-time mathe-
matical model that describes the human glucose-insulin system. Then, in Section 3, we introduce the
specific versions of this model relevant in T2DM and ICU settings. The two model classes both derive
from a single general model, and differ according to how nutrition uptake and glucose removal are
represented. In Section 4, we construct the framework for stating the parameter estimation problem
and its solution. In Section 5, we describe the datasets, the experiments we design for parameter
estimation and forecasting, and the methods we use for parameter estimation and forecasting for
the T2DM and ICU settings. Section 6 presents the numerical results on parameter estimation and
forecasting along with some uncertainty quantification (UQ) results separately for T2DM and ICU
settings. Finally, in Section 7, we make some concluding remarks and discuss future directions that
we intend to pursue.
2 Continuous-Time Model
We begin by providing a constructive explanation of the continuous-time model including the model
equations, the description of unknown model parameters and the precise role of each component of
the model. Then, we also provide a detailed conceptual explanation of how a stochastic modeling
approach could be used to represent blood glucose dynamics.
2.1 Components of the Model
In accordance with our goal, which is to develop a highly simplified yet interpretable model, we
work with a forced SDE of Ornstein-Uhlenbeck type to describe glucose evolution, together with an
observation model of linear form, subject to additive Gaussian noise. The Gaussian structure allows
for computational tractability in prediction since probability distributions on the glucose state are
described by Gaussians and hence represented by simply a mean and variance. Note that the protocols
for managing glucose depend on intervals; e.g., a goal may be to keep glucose between 80-150 mg/dl
and deviations from this interval, e.g., into 151-180 mg/dl, induce changes in the insulin dosage. This
means that decisions are made based on boundaries of glycemic trajectories. Nevertheless, because
glucose oscillates under continuous feeding, clinicians typically aim to ensure that the glycemic mean
does not fall below 60 mg/dl or rise above 180 mg/dl for any length of time. The intervals are then a proxy
for this balance of managing the mean and protecting against trajectories diverging too high or low at
any time, including between observations. Hence accurately resolving mean and standard deviation in
BG levels is extremely important. Furthermore the Kalman filter may be used to incorporate the data,
and also works entirely within the Gaussian framework; and finally parameter learning, although non-
Gaussian, is well-developed in the Kalman filter setting, both from the fully Bayesian and optimization
(empirical Bayes) perspectives. The Ornstein-Uhlenbeck process has three contributions: a damping
term which drives the BG level towards its base value at a rate which is possibly insulin dependent;
a forcing representing nutritional intake and a white noise contribution, which is used to encapsulate
the high-frequency dynamics as these dynamics are difficult to be resolved with sparse measurements.
The presence of noise in the glucose evolution model, as well as in the data acquisition process, allows
for model error which is natural in view of the the rather simple modeling framework. We obtain the
following model for the evolution of blood glucose
G
(
t
):
̇
G
(
t
) =
γ
(
G
(
t
)
G
b
) +
m
(
t
)
βI
(
t
) +
2
γσ
2
̇
W
(
t
)
.
(2.1)
The BG level at time $t$ is measured in the units of mg/dl, $I(t)$ is the rate of intravenous insulin
(insulin iv) at time $t$ in the units of U/min, where U is the insulin unit, and $m(t)$ is the meal function
representing the rate of glucose intake at time $t$ in the units of mg/(dl*min). Here, $W(t)$ is a Brownian
motion and describes variations in the BG level $G(t)$ which are not encapsulated in the simple damped-driven
exponential decay model that we use here.
In our approach, we use the stochastic differential equation as stated in (2.1). In this model, $G_b$
represents the basal glucose value in mg/dl units, $\gamma$ is the decay rate of the BG level to its basal
value under its own regulation, in the units of 1/min, $\beta$ is a proportionality constant for the effect of
insulin iv on the BG level in the units of mg/(dl*U), and $\sigma$ sets the amplitude (stationary standard
deviation) of the fluctuations. We will use simple models for the meal function $m(t)$ and the insulin
delivery function $I(t)$ that will enable explicit solution of the continuous time model between events;
events in our model are times at which the meal function or insulin delivery function change discontinuously,
or points at which BG is measured. The precise notation will be given below. The simple structure of
Brownian motion $W(t)$ means that, when integrated, it will lead to normally distributed random variables
in discrete time, with analytically calculable means and variances. The model is thus highly tractable.
The equation given in (2.1) is analytically solvable, meaning that we can write a formula in continuous
time that gives the BG level at any time $t$, the output of the system, as a function of the nutrition
and insulin delivered up to time $t$, the inputs to the system. For computational purposes, and because
data is typically available in discrete time, we need the discrete-time version of the model (2.1). The
time discretization
available in discrete time, we need the discrete-time version of the model (2.1). The time discretization
is defined completely by the dataset in the following sense. Let
{
t
(
m
)
k
}
K
m
k
=1
denote the starting time
of meals in the T2DM setting and the (discontinuous) nutrition level change in the ICU setting. Let
{
t
(
i
)
k
}
K
i
k
=1
denote the (discontinuous) insulin iv level change times in ICU setting. Finally, let
{
t
o
k
}
K
o
k
=1
denote the measurement (observation) times in both settings. We call the re-ordered union of these
sets,
{
t
k
}
N
k
=0
:=
{
t
(
m
)
k
}
K
m
k
=1
∪{
t
(
i
)
k
}
K
i
k
=1
∪{
t
o
k
}
K
o
k
=1
, to be the event times, which naturally identify the
time discretization. In order to make use of analytical tractability of the model (2.1) and account
for the fact that event times occur at discrete, irregular times, we obtain the discrete-time version by
integrating (2.1) over the event-time intervals, [
t
k
,t
k
+1
),
k
= 0
,
1
,...,N
1, and refer to the resulting
model as the event-time model.
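As a concrete illustration (not taken from the paper's code), the event-time grid can be assembled directly from the recorded meal, insulin, and measurement times; the short Python sketch below uses hypothetical, made-up times in minutes.

```python
import numpy as np

# Hypothetical event data (times in minutes); real values come from the dataset.
t_meal = np.array([0.0, 300.0, 720.0])          # meal starts / nutrition rate changes
t_insulin = np.array([60.0, 480.0])             # insulin iv rate changes (ICU setting)
t_obs = np.array([120.0, 360.0, 600.0, 900.0])  # BG measurement times

# Re-ordered union {t_k}: the event times over which (2.1) is integrated piecewise.
t_events = np.unique(np.concatenate([t_meal, t_insulin, t_obs]))
h = np.diff(t_events)                           # inter-event gaps h_k = t_{k+1} - t_k
```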
We will exhibit the differing versions of the general event-time model for T2DM and ICU settings
in more detail in the following sections, but here we make a general overarching observation about
the form of the event-time model. If we integrate (2.1) over $[t_k, t_{k+1})$, with use of the Itô formula
(equivalent to using integrating factors in this case), we obtain that

$$G(t_{k+1}) = G_b + e^{-\gamma h_k}\bigl(G(t_k) - G_b\bigr) + \int_{t_k}^{t_{k+1}} e^{-\gamma(t_{k+1}-s)} m(s)\,ds - \beta\int_{t_k}^{t_{k+1}} e^{-\gamma(t_{k+1}-s)} I(s)\,ds + \sigma\sqrt{1 - e^{-2\gamma h_k}}\,\xi_k, \qquad (2.2)$$

where $h_k := t_{k+1} - t_k$ and the $\xi_k \sim N(0,1)$ are independent random variables for $k = 0, 1, \ldots, N-1$.
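As an illustration of the tractability of (2.2), the Python sketch below (not from the paper) draws one sample of $G(t_{k+1})$ given $G(t_k)$, treating the integrated nutrition and insulin contributions as a single forcing term u_k whose closed form depends on the setting described in Section 3; all numerical values in the usage example are hypothetical.

```python
import numpy as np

def event_time_step(g_k, G_b, gamma, sigma, h_k, u_k, rng):
    """One draw from (2.2); u_k stands for the integrated nutrition-minus-insulin
    forcing over [t_k, t_{k+1}) (its closed form is setting-specific, see Section 3)."""
    decay = np.exp(-gamma * h_k)
    mean = G_b + decay * (g_k - G_b) + u_k
    sd = sigma * np.sqrt(1.0 - np.exp(-2.0 * gamma * h_k))
    return mean + sd * rng.standard_normal()

# Example usage with hypothetical parameter values.
rng = np.random.default_rng(0)
g_next = event_time_step(g_k=140.0, G_b=100.0, gamma=0.01, sigma=20.0,
                         h_k=60.0, u_k=15.0, rng=rng)
```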
2.2 Conceptual Explanation of How the Stochastic Model Represents Glucose Physiology
The model we propose represents blood glucose dynamics broadly with two classes of terms, determin-
istic components and stochastic components. The deterministic terms govern the mean through both
endogenous and exogenous factors. For example, the mean, equilibrium, or homeostasis for blood
glucose, represents one of the endogenous relationships between glucose and insulin through the basal
rate. Similarly, changes in the mean are also controlled by the deterministic component but here due
to exogenous drivers such as nutrition or administered insulin. The stochastic class governs the faster
time-scale fluctuations about the deterministically specified mean. The glycemic fluctuations due to
endogenous factors are assumed to be mean zero, independent of the value of the blood glucose, and
mean-reverting to the basal rate. In this way, the range of the glycemic fluctuations represents the
endogenous glucose-insulin oscillatory dynamics that are often difficult to resolve given the sampling
rate. Effectively, the glycemic fluctuations due to endogenous insulin, random effects, and moderate
changes in activity (e.g., walking from a car to a restaurant) are modeled as fast time-scale random
fluctuations of glucose.
More explicitly, the deterministic governing terms of the model determine the rate of change of the
mean blood glucose level. The first deterministic term, $-\gamma(G(t) - G_b)$, in the general version of
the model given in (2.1), represents the endogenous regulation, either by consumption or by secreted
insulin, that reverts the blood glucose level to its person-specific basal glucose value, $G_b$, with a
person-specific rate, $\gamma$. In the absence of nutrition and exogenous insulin, the basal value represents
the mean blood glucose level that all glycemic trajectories fluctuate around, and are reverted to. The
second category of deterministic terms, including $+m(t)$, represents and quantifies the effect of exogenous
nutrition on the rate of change of the blood glucose level. Nutrition increases the rate of change and
causes the blood glucose level to increase as the rate and amount of exogenous nutrition delivery increases.
The third category of terms, including $-\beta I(t)$, represents the effect of exogenous insulin on the rate
of change of the blood glucose level. Exogenous insulin drives the rate of glucose use to increase, causing
the blood glucose level to decrease. Both the second and third classes of terms are able to shift the
glycemic equilibrium depending on their amount and the length of time they are present and effective.
The stochastic governing terms of the model determine the fast time-scale fluctuations about the
mean blood glucose level, effectively determining the variance of the glycemic fluctuations about the
mean. These fluctuations are formulated as continuous-time random processes that are specified
only by amplitudes (their coordinates are variances rather than an exact glucose position), implying
that the true trajectory of the blood glucose level is never resolved exactly, only its oscillation or
fluctuation amplitudes. In this way we are assuming that the glycemic fluctuations due to endogenous
factors, e.g., glycemic fluctuations due to small amounts of movement, are captured by the random
terms in the model. Because we do not have data to resolve the existence or source of these glycemic
fluctuations, we are effectively considering the unresolvable glycemic fluctuations to be random effects.
Larger changes in the mean, e.g., via exercise, would be controlled by the deterministic components of
the model. We do not include a term that could account for more vigorous exercise explicitly because
we do not collect data that would allow such a term to be inferred. It is possible that a combination of
existing deterministic and stochastic parameters, e.g., $\gamma$, which controls the rate of return to basal, is
enough to fully explain exercise-based variation in blood glucose, but as we do not currently have the
data to evaluate this question, we cannot evaluate the success or failure of the model's ability to account
for exercise.
There is an interplay between the stochastic and deterministic components of the model. For
example, while the movement of the mean is controlled by the deterministic terms, reversion to the
mean is controlled both by deterministic and stochastic terms. Meaning, the ability of the body to
regulate glucose homeostasis by endogenous means via the reversion to the mean is controlled by both
the deterministic and the stochastic terms.
It is clear that this model has limiting factors; the key question is how these limiting factors
affect our ability to predict blood glucose and the body's response to exogenous inputs. Of particular
importance is an understanding of what fidelity is lost, and what features are differently resolvable
with the stochastic and the non-stochastic models.
Note also that from another perspective, the resulting SDE model may also be considered as a more
sophisticated version of the ARIMA-type models (referenced in the earlier literature review) which
are used to describe stochastic processes generally with the aim of analyzing data and/or forecasting
future values. More precisely, the model given in (2.2) can be thought of as an ARIMA model with a
specific interpretable and time-inhomogeneous coefficient structure. Adopting this point of view can
be interpreted as using the simplicity of ARIMA models but formulating the parametric dependence
of the coefficients in such a way that the resulting model is also representative of the underlying
physiology.
3 Event-Time Model
In this section, we show how the general continuous-time model (2.1) takes specific forms in the T2DM
and ICU settings. In Sections 3.1 and 3.2, we define the specific meal function for each case and derive
the final discrete-time model, which we refer to as the event-time model. Note also that the focus
of our data-driven studies here is on ICU and T2DM patients. The development of an analogous
T1DM model is not considered here and is left for future work.
The general model in (2.1) takes different forms when it is stated in different settings. For example,
in the ICU setting, the meal function $m(t)$ is a piecewise constant function describing a constant rate
of nutrition. It is important to note that because of the way that the nutrition is delivered to the ICU
patients, it is a reasonable modeling choice to use a piecewise constant function, made quantitative
using the raw nutrition data (i.e., the amount of carbohydrates) to model the effect of nutrition on
the blood glucose level. Moreover, as we will explain in more detail in Section 7, attempts at using a
smoother function, to mimic gradual decrease in the effect of nutrition delivery on blood glucose level,
did not yield better forecasting results. On the other hand, in the T2DM setting, our dataset includes
the total glucose amount ingested at each meal, and through initial experiments with simpler models,
we determined that in this context it is important to model the time-history of the resulting glucose
intake as this happens in a more complicated fashion than for ICU patients. To this end we use a
function consisting of the difference of two exponential functions to model the effect of each meal on
the blood glucose level in a realistic manner. That is, for each meal, the effect of the nutrition on
the blood glucose level (proportionally to the carbohydrate amount in the meal) initially increases
monotonically, reaches a peak value after a time on the order of an hour, and then starts to decrease to
a negligible level over a time on the order of hours. This behavior, namely, how long it takes to reach
the peak value and how long it takes to effectively disappear from the system, is characterized by two
unknown model parameters; these are estimated from the data. More details about this function are
given in Section 3.1. Taking into account all of these sorts of differences, we state the “event-time”
version of the continuous general model in (2.1).
3.1 T2DM
In this setting, we will consider a system that is driven by nutrition. Namely, the only input to the
model in its current version is the nutrition. That is, we are developing a T2DM model assuming
that individuals do not inject insulin, since none of the patients in our dataset use subcutaneous
insulin. However, we could easily modify the model to include injectable insulin. Note that besides
incorporating the effect of injectable insulin into the model, we will also modify it to account for the
use of drugs such as metformin. So, in this setting we choose to model glucose removal to occur at a
constant rate $\gamma$, setting $\beta = 0$; the variable $I(t)$, which represents the rate of insulin iv, does not enter
the model at all. On the other hand, the "meal function", $m(t)$, in this case requires us to make some
additional modeling decisions as briefly described above. In order to model the glucose input from
each meal into the system, we use the difference of two exponential functions. There are two reasons
for making this choice. The first is to describe the uptake of glucose into the bloodstream via ingestion
through the stomach, and the second is to have a smooth function of model parameters from which
we will benefit when performing parameter estimation from data. Note that using a smooth function
to model glucose intake has a substantial impact on inference, and the details about the choice of this
function will be discussed later in Section 7. Now, recall that $\{t_k^{(m)}\}_{k=1}^{K_m}$ denotes the starting times of
the meals over the whole time interval that we work on. Then, we define
$$m(t) = \sum_{k=1}^{K_m} G_k c_k \Bigl(e^{-a(t - t_k^{(m)})} - e^{-b(t - t_k^{(m)})}\Bigr)\,\chi_{[t_k^{(m)},\,\infty)}(t) \qquad (3.1)$$

where $G_k$ is the total amount of glucose (mg/dl) in the $k$-th meal, and $c_k$ is a normalizing constant so
that $\int_{t_k^{(m)}}^{\infty}\bigl(e^{-a(t - t_k^{(m)})} - e^{-b(t - t_k^{(m)})}\bigr)\,dt = 1$, for $k = 1, 2, \ldots, K_m$. Also, $\chi_A(\cdot)$ is called the characteristic
function and is defined as follows:

$$\chi_A(x) = \begin{cases} 1, & x \in A,\\ 0, & x \notin A.\end{cases}$$
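As a brief worked aside (an addition for clarity, assuming $0 < a < b$ so that the integrand is positive), the normalizing constant can be written in closed form:

$$\int_{t_k^{(m)}}^{\infty}\Bigl(e^{-a(t - t_k^{(m)})} - e^{-b(t - t_k^{(m)})}\Bigr)\,dt = \frac{1}{a} - \frac{1}{b} = \frac{b - a}{ab}, \qquad\text{so that}\qquad c_k = \frac{ab}{b - a}.$$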
Therefore, the model in (2.1) becomes

$$\dot{G}(t) = -\gamma\bigl(G(t) - G_b\bigr) + \sum_{k=1}^{K_m} G_k c_k \Bigl(e^{-a(t - t_k^{(m)})} - e^{-b(t - t_k^{(m)})}\Bigr)\,\chi_{[t_k^{(m)},\,\infty)}(t) + \sqrt{2\gamma\sigma^2}\,\dot{W}(t), \qquad (3.2)$$
in the T2DM setting. In this model, the first term represents the body's own effort to remove glucose from
the bloodstream, the second term represents the effect of nutrition on the rate of change of BG, and
the last term models the oscillations in the BG level. Integrating over $[t_0, t]$, we can write the analytic
solution of this equation as

$$
\begin{aligned}
G(t) = {}& G_b + e^{-\gamma(t - t_0)}\bigl(G(t_0) - G_b\bigr) \\
&+ \sum_{k=1}^{K_m} G_k c_k \left(\frac{e^{-a(t - t_k^{(m)})} - e^{-\gamma(t - t_k^{(m)})}}{\gamma - a} - \frac{e^{-b(t - t_k^{(m)})} - e^{-\gamma(t - t_k^{(m)})}}{\gamma - b}\right)\chi_{[t_k^{(m)},\,\infty)}(t) \\
&+ \int_{t_0}^{t} e^{-\gamma(t-s)}\sqrt{2\gamma\sigma^2}\,dW(s). \qquad (3.3)
\end{aligned}
$$
Note that, in practice, we need to evaluate the BG level at specific time points and hence need the
discrete-time model implied by the continuous-time representation in (3.3). Now, by integrating (3.2)
over $[t_k, t_{k+1})$ and denoting $g_k := G(t_k)$ for $k = 0, 1, \ldots, N$, we obtain
$$g_{k+1} = G_b + e^{-\gamma h_k}(g_k - G_b) + m_k + \sigma\sqrt{1 - e^{-2\gamma h_k}}\,\xi_k, \qquad k = 0, 1, 2, \ldots, N-1, \qquad (3.4)$$
as a special case of (2.2). Also, for any fixed $t_k$, find the meal times $t_j^{(m)}$ such that $t_j^{(m)} \le t_k$ and
denote the index set of these meal times by $I_k$. Then $m_k$ in (3.4) becomes
$$m_k = \sum_{j \in I_k} G_j c_j \left(\frac{e^{-a(t_{k+1} - t_j^{(m)})} - e^{-\gamma h_k} e^{-a(t_k - t_j^{(m)})}}{\gamma - a} - \frac{e^{-b(t_{k+1} - t_j^{(m)})} - e^{-\gamma h_k} e^{-b(t_k - t_j^{(m)})}}{\gamma - b}\right), \qquad (3.5)$$
for $k = 0, 1, 2, \ldots, N-1$. Hence, note that in this case, we have five model parameters to be estimated:
$G_b, \gamma, \sigma, a, b$. Recall that in this setting, $G_b$ represents the basal glucose value that the BG level stays
around starting some time after nutrition intake until the next nutrition intake, $\gamma$ represents the
decay rate of the BG level to $G_b$ after the nutrition intake, and $\sigma$ represents the amplitude of the BG level
oscillations. The parameters $a$ and $b$ entering the meal function implicitly control the time needed
for the glucose nutrition rate to reach its peak value, and the time needed for this rate to return
back to the vicinity of 0. Because of these simple physiological meanings, the parameters entering the
event-time model are important not only for accurately capturing, and predicting, glucose dynamics
based on data, but also contain implicit information about the health condition of the patient. For
example, the basal glucose value is measured during some tests to check if an individual is healthy,
pre-diabetic, or diabetic.
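A minimal Python sketch of the T2DM event-time update (3.4)-(3.5) is given below; the meal times, carbohydrate amounts, and parameter values are hypothetical placeholders, and the implementation assumes $a \neq \gamma$, $b \neq \gamma$ and $a < b$ so that $c_j = ab/(b-a)$.

```python
import numpy as np

def t2dm_meal_forcing(t_k, t_k1, gamma, a, b, meal_times, meal_glucose):
    """m_k in (3.5): integrated effect over [t_k, t_{k+1}) of meals started at or before t_k."""
    h_k = t_k1 - t_k
    c = a * b / (b - a)                      # normalizing constant (assumes a < b)
    m_k = 0.0
    for t_j, G_j in zip(meal_times, meal_glucose):
        if t_j > t_k:                        # only meals with t_j^(m) <= t_k contribute
            continue
        term_a = (np.exp(-a * (t_k1 - t_j))
                  - np.exp(-gamma * h_k) * np.exp(-a * (t_k - t_j))) / (gamma - a)
        term_b = (np.exp(-b * (t_k1 - t_j))
                  - np.exp(-gamma * h_k) * np.exp(-b * (t_k - t_j))) / (gamma - b)
        m_k += G_j * c * (term_a - term_b)
    return m_k

def t2dm_step(g_k, t_k, t_k1, G_b, gamma, sigma, a, b, meal_times, meal_glucose, rng):
    """One draw of g_{k+1} from (3.4)."""
    h_k = t_k1 - t_k
    m_k = t2dm_meal_forcing(t_k, t_k1, gamma, a, b, meal_times, meal_glucose)
    mean = G_b + np.exp(-gamma * h_k) * (g_k - G_b) + m_k
    sd = sigma * np.sqrt(1.0 - np.exp(-2.0 * gamma * h_k))
    return mean + sd * rng.standard_normal()
```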
3.2 ICU
In the ICU setting, real data is available for both nutrition and insulin iv. Nutrition is typically
delivered through an enteral feeding tube, which is a mechanism to deliver nutrition via a tube entering
through the mouth and running to the gut. On the other hand, 8-10% of the ICU patients are diabetic
and only 5% of those are T1DM patients. However, more than 90% of ICU patients require glycemic
management and 10-20% of them experience a hypoglycemic event over the course of management.
Consequently, regardless of being diabetic or non-diabetic, they are typically given insulin iv to control
BG levels. For simplicity, and because it is a reasonable approximation of reality (insulin iv and
tube feed are delivered at constant rates), we approximate both nutrition and insulin as piecewise
constant functions. Furthermore, in contrast to the situation for T2DM, we do not set $\beta = 0$ unless
a patient is not delivered insulin iv.
In this case $\{t_k^{(m)}\}_{k=1}^{K_m}$ represents the change-of-rate times for nutrition and the meal function
becomes

$$m(t) = \sum_{k=1}^{K_m} d_k\,\chi_{[t_k^{(m)},\, t_{k+1}^{(m)})}(t), \qquad (3.6)$$
where $d_k$ is the nutrition rate over the time interval $[t_k^{(m)}, t_{k+1}^{(m)})$, $k = 1, \ldots, K_m$, that is directly obtained
from the dataset. Remember that the insulin rate function, $I(t)$, is also a piecewise constant function,
which we formulate as

$$I(t) = \sum_{k=1}^{K_i} i_k\,\chi_{[t_k^{(i)},\, t_{k+1}^{(i)})}(t), \qquad (3.7)$$

where $i_k$ is the rate of insulin over the time interval $[t_k^{(i)}, t_{k+1}^{(i)})$, again obtained directly from the data
set.
Therefore, substituting (3.6) and (3.7) into the general equation (2.1), the ICU version of our
model becomes

$$\dot{G}(t) = -\gamma\bigl(G(t) - G_b\bigr) + \sum_{k=1}^{K_m} d_k\,\chi_{[t_k^{(m)},\, t_{k+1}^{(m)})}(t) - \beta\sum_{k=1}^{K_i} i_k\,\chi_{[t_k^{(i)},\, t_{k+1}^{(i)})}(t) + \sqrt{2\gamma\sigma^2}\,\dot{W}(t). \qquad (3.8)$$
In this model, the first term models the glucose removal rate with the body's own effort ($\gamma$), the second
term shows the effect of nutrition on the BG level, the third term, $-\beta I(t)$, models the external insulin
effect, and the last term models the oscillations in the BG level.
Now, integrate (3.8) to get the analytical solution for any $t \ge t_0$ as follows:

$$
\begin{aligned}
G(t) = {}& G_b + e^{-\gamma(t - t_0)}\bigl(G(t_0) - G_b\bigr) + \sum_{k=1}^{K_m} d_k \int_{t_0}^{t} e^{-\gamma(t-s)}\chi_{[t_k^{(m)},\, t_{k+1}^{(m)})}(s)\,ds \\
&- \beta\sum_{k=1}^{K_i} i_k \int_{t_0}^{t} e^{-\gamma(t-s)}\chi_{[t_k^{(i)},\, t_{k+1}^{(i)})}(s)\,ds + \sqrt{2\gamma\sigma^2}\int_{t_0}^{t} e^{-\gamma(t-s)}\,dW(s). \qquad (3.9)
\end{aligned}
$$
For the same reasons explained in the T2DM setting, we will put together all event times and
rewrite the meal function, $m(t)$, and the insulin function, $I(t)$, with respect to this new discretization
of the time interval. Then, once again, using the same notation, and integrating (3.8) over $[t_k, t_{k+1})$,
we obtain the discrete version as follows:
we obtain the discrete version as follows:
g
k
+1
=
G
b
+
e
γh
k
(
g
k
G
b
) +
1
γ
(1
e
γh
k
)
d
k
β
1
γ
(1
e
γh
k
)
i
k
+
σ
1
e
2
γh
k
ξ
k
,
k
= 0
,
1
,
2
,...,N
1
,
(3.10)
as another special case of (2.2). Note that in this case, we have four model parameters to estimate:
$G_b, \gamma, \sigma, \beta$. Remember once again, $G_b$ is the basal glucose value, $\gamma$ is the decay rate of the BG
level to its basal value, and $\sigma$ is a measure of the magnitude of the BG oscillations. Finally, $\beta$
is another proportionality constant, which is used to scale the effect of insulin iv on the rate of change
of BG appropriately. These four parameters represent physiologically meaningful quantities that can
properly resolve the mean and variance of the BG level.
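Analogously, a minimal Python sketch of one step of the ICU event-time update (3.10) might look as follows; all names and numerical values are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def icu_step(g_k, G_b, gamma, sigma, beta, h_k, d_k, i_k, rng):
    """One draw of g_{k+1} from (3.10); d_k and i_k are the nutrition and insulin-iv
    rates in force on [t_k, t_{k+1}) after re-indexing onto the event-time grid."""
    decay = np.exp(-gamma * h_k)
    pulse = (1.0 - decay) / gamma            # (1/gamma)(1 - e^{-gamma h_k})
    mean = G_b + decay * (g_k - G_b) + pulse * d_k - beta * pulse * i_k
    sd = sigma * np.sqrt(1.0 - np.exp(-2.0 * gamma * h_k))
    return mean + sd * rng.standard_normal()

# Example usage with hypothetical values: 30 min between events,
# nutrition rate 1.5 mg/(dl*min), insulin iv rate 0.05 U/min.
rng = np.random.default_rng(0)
g_next = icu_step(g_k=180.0, G_b=120.0, gamma=0.01, sigma=25.0, beta=50.0,
                  h_k=30.0, d_k=1.5, i_k=0.05, rng=rng)
```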
4 Parameter Estimation
Our goal in this section is to formulate the parameter estimation problem. In Section 4.1, we construct
an overarching Bayesian framework for our parameter estimation problems. We then describe two
solution approaches for this problem: an optimization based approach which identifies the most likely
solution, given our model and data assumptions; and MCMC, which samples the distribution on
parameters, given data, under the same model and data assumptions. These two solution approaches
are detailed in Sections 4.2 and 4.3, respectively.
As shown in detail before, our model takes slightly different forms in the T2DM and ICU settings.
In the former, the model parameters to be estimated are $G_b, \gamma, \sigma, a, b$, whereas in the latter the unknown
parameters are $G_b, \gamma, \sigma, \beta$. However, we adopt a single approach to parameter estimation. To describe
this approach we let the vector $\theta$ represent the unknown model parameters to be determined from
the data, noting that this is a different set of parameters in each case. Many problems in biomedicine,
and the problems we study here in particular, have both noisy models and noisy data, leading to a
relationship between parameter $\theta$ and data $y$ of the form
$$y = \mathcal{G}(\theta, \zeta) \qquad (4.1)$$

where $\zeta$ is a realization of a mean zero random variable whose value is not known to us.
The objective is to recover $\theta$ from $y$. We will show how our models of the glucose-insulin system lead
to such a model.
4.1 Bayesian Formulation
The Bayesian approach to parameter estimation is desirable for two primary reasons: first, it allows
for seamless incorporation of imprecise prior information with an uncertain mathematical model and
noisy data, by adopting a formulation in which all variables have probabilities associated to them;
secondly, it allows for the quantification of uncertainty in the parameter estimation. Whilst extraction
of information from the posterior probability distribution on parameters given data is challenging,
stable and practical computational methodology based around the Bayesian formulation has emerged
over the last few decades; see [76]. In this work, we will follow two approaches: (a) obtaining the
maximum a posteriori (MAP) estimator, which leads to an optimization problem for the most likely
parameter given the data, and (b) obtaining samples from the posterior distribution on the parameters
given data, using Markov Chain Monte Carlo (MCMC) techniques.
Now let us formulate the parameter estimation problem. Within the event-time framework, let
$g = [g_k]_{k=0}^{N}$ be the vector of BG levels at the event times $\{t_k\}_{k=0}^{N}$, and $y = [y_k]_{k=1}^{K_o}$ be the vector of
measurements at the measurement times $\{t_k^{(o)}\}_{k=1}^{K_o} \subset \{t_k\}_{k=0}^{N}$. By using the event-time version, and
defining $\{\xi_k\}_{k=0}^{N}$ to be independent and identically distributed standard normal random variables, we
see that given the parameters $\theta$, $g$ has a multivariate normal distribution, i.e., $P(g|\theta) = N(m(\theta), C(\theta))$.
Equivalently,

$$g = m(\theta) + C(\theta)^{1/2}\xi, \qquad \xi \sim N(0, I). \qquad (4.2)$$
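For illustration, one way to assemble $m(\theta)$ and $C(\theta)$ from the event-time recursion is sketched below in Python; it assumes, purely for simplicity, that the initial value $g_0$ is known exactly and that the integrated forcing $u_k$ on each interval has already been computed (how $g_0$ is actually treated is an implementation choice not prescribed here).

```python
import numpy as np

def gaussian_moments(t, G_b, gamma, sigma, u, g0):
    """Mean m(theta) and covariance C(theta) of g = (g_0, ..., g_N) under the event-time
    model, assuming (for illustration only) a known initial condition g_0; u[k] is the
    integrated forcing on [t_k, t_{k+1})."""
    N = len(t) - 1
    m = np.empty(N + 1)
    P = np.empty(N + 1)                      # marginal variances Var(g_k)
    m[0], P[0] = g0, 0.0
    for k in range(N):
        phi = np.exp(-gamma * (t[k + 1] - t[k]))
        m[k + 1] = G_b + phi * (m[k] - G_b) + u[k]
        P[k + 1] = phi**2 * P[k] + sigma**2 * (1.0 - phi**2)
    # For this AR(1)-type Gaussian chain, Cov(g_j, g_k) = P_min(j,k) * exp(-gamma |t_k - t_j|).
    C = np.empty((N + 1, N + 1))
    for j in range(N + 1):
        for k in range(N + 1):
            C[j, k] = P[min(j, k)] * np.exp(-gamma * abs(t[k] - t[j]))
    return m, C
```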
Let $L$ be a $K_o \times (N+1)$ matrix that maps $\{g_k\}_{k=0}^{N}$ to $\{y_k\}_{k=1}^{K_o}$. That is, if measurement
$i \in \{1, \ldots, K_o\}$ is taken at the event time $t_j$, $j \in \{0, 1, \ldots, N\}$, then the $i$-th row of $L$ has all 0's except the $(j+1)$-st
element, which is 1. Adding measurement noise, we state the observation equation as follows:

$$y = Lg + \Gamma(\theta)^{1/2}\eta, \qquad \eta \sim N(0, I), \qquad (4.3)$$

where $\Gamma(\theta)$ is a diagonal matrix representing the measurement noise covariance. Thus, we obtain the likelihood
of the data, given the glucose time-series and the parameters, namely $P(y|g, \theta) = N(Lg, \Gamma(\theta))$.
However, when performing parameter estimation, we are not interested in the glucose time-series itself,
but only in the parameters. Thus we directly find the likelihood of the data given the parameters
(implicitly integrating out $g$) by combining (4.2) and (4.3) to obtain

$$y = Lm(\theta) + S(\theta)^{1/2}\zeta, \qquad \zeta \sim N(0, I), \qquad (4.4)$$

where $S(\theta) = LC(\theta)L^{T} + \Gamma(\theta)$. Since $\zeta$ has a multivariate normal distribution, using the properties of
this distribution, we find that given the parameters, $\theta$, $y$ also has a multivariate normal distribution
with mean $Lm(\theta)$ and covariance matrix $S(\theta)$. This is the specific instance of equation (4.1) that
arises for the models in this paper.
We have thus obtained $P(y|\theta) = N(Lm(\theta), S(\theta))$, that is,

$$P(y|\theta) = \frac{1}{\sqrt{(2\pi)^{K_o}\det(S(\theta))}}\exp\left(-\frac{1}{2}\bigl(y - Lm(\theta)\bigr)^{T} S(\theta)^{-1}\bigl(y - Lm(\theta)\bigr)\right); \qquad (4.5)$$
this is the likelihood of the data, $y$, given the parameters, $\theta$. Also, since we prefer to use $-\log(P(y|\theta))$
rather than directly using $P(y|\theta)$ for the sake of computation, we state it explicitly as follows:

$$-\log(P(y|\theta)) = \frac{K_o}{2}\log(2\pi) + \frac{1}{2}\log(\det(S(\theta))) + \frac{1}{2}\bigl(y - Lm(\theta)\bigr)^{T} S(\theta)^{-1}\bigl(y - Lm(\theta)\bigr). \qquad (4.6)$$
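As an illustration, the negative log-likelihood (4.6) can be evaluated in a few lines of Python once $Lm(\theta)$, $C(\theta)$, $L$, and $\Gamma(\theta)$ are available; the sketch below is a generic instantiation of the formula, not the paper's own code.

```python
import numpy as np

def negative_log_likelihood(y, m_theta, C_theta, L, Gamma_theta):
    """Evaluate (4.6): -log P(y | theta) with S(theta) = L C(theta) L^T + Gamma(theta)."""
    S = L @ C_theta @ L.T + Gamma_theta
    r = y - L @ m_theta                      # residual y - L m(theta)
    _, logdet = np.linalg.slogdet(S)         # numerically stable log det(S)
    quad = r @ np.linalg.solve(S, r)         # (y - Lm)^T S^{-1} (y - Lm)
    K_o = y.size
    return 0.5 * (K_o * np.log(2.0 * np.pi) + logdet + quad)
```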
Moreover, by using Bayes' Theorem, we write

$$P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)} \propto P(y|\theta)P(\theta). \qquad (4.7)$$
Note that the second statement of proportionality follows from the fact that the term $P(y)$ in the
denominator is constant with respect to the parameters, $\theta$, and plays the role of a normalizing constant.
From another point of view, considering (4.2) and (4.4), we see that given $\theta$, $(g, y)$ has a multivariate
normal distribution with mean and covariance matrix that can be computed from the above
equations since, given $\theta$, everything is explicitly known. Then, integrating $g$ out, in other words,
computing the marginal distribution, we obtain the distribution of $y|\theta$, which corresponds to the one
stated in (4.4).
Now, to define the prior distribution $P(\theta)$ we assume that the unknown parameters are distributed
uniformly across a bounded set $\Theta$ and define

$$P(\theta) = \frac{1}{|\Theta|}\chi_{\Theta}(\theta) = \begin{cases}\dfrac{1}{|\Theta|}, & \theta \in \Theta,\\[4pt] 0, & \theta \notin \Theta,\end{cases} \qquad (4.8)$$

where $\chi_{\Theta}(\cdot)$ is the characteristic function, as defined above, and $|\Theta|$ is the volume of the region defined
by $\Theta$. Thus, by substituting the likelihood, (4.5), and the prior distribution, (4.8), into (4.7), we
formulate the posterior distribution as follows:

$$P(\theta|y) \propto \frac{1}{|\Theta|\sqrt{(2\pi)^{K_o}\det(S(\theta))}}\exp\left(-\frac{1}{2}\bigl(y - Lm(\theta)\bigr)^{T} S(\theta)^{-1}\bigl(y - Lm(\theta)\bigr)\right)\chi_{\Theta}(\theta). \qquad (4.9)$$
Now, we will show how we use this posterior distribution to state the parameter estimation problem.
4.2 Optimization
In this approach, the goal is to determine parameter values, $\theta$, which maximize the posterior distribution,
$P(\theta|y)$; such a maximizer is called the MAP estimator. Using the prior distribution as specified above,
the parameter estimation problem becomes

$$\theta^{*} = \arg\max_{\theta} P(\theta|y) = \arg\max_{\theta\in\Theta} P(y|\theta) = \arg\min_{\theta\in\Theta} -\log(P(\theta|y)). \qquad (4.10)$$
Then, substituting (4.6) into (4.10), the problem will take the form

$$\theta^{*} = \arg\min_{\theta\in\Theta}\ \bigl\|S(\theta)^{-1/2}\bigl(y - Lm(\theta)\bigr)\bigr\|^{2} + \log(\det(S(\theta))). \qquad (4.11)$$
Hence, placing a uniform prior distribution turns the problem of finding the MAP estimator into a constrained optimization problem. To solve this problem, we use built-in MATLAB functions, such as fmincon and multistart. fmincon is a gradient-based minimization algorithm for nonlinear functions. multistart launches the optimization procedure from a specified number of starting points, picked uniformly over the region defined by the constraints. It uses fmincon and other similar algorithms to perform each optimization run independently and returns the result that achieves the minimum function value among all the separate runs. With this approach, we have the opportunity to compare optimization runs that start from different initial points. This provides some intuitive understanding of the solution surface and hence of the estimated optimal parameters.
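To illustrate this optimization step, the sketch below solves a problem of the form (4.11) with a crude multistart strategy, using SciPy's general-purpose minimizer in place of MATLAB's fmincon and multistart. The placeholder model functions mean_g, cov_g and noise_cov, the synthetic data, and the bounds are our own illustrative assumptions, not the MSG model or the clinical data.

```python
# Minimal sketch (not the authors' code) of the MAP problem (4.11) solved with
# SciPy.  The model functions below are illustrative placeholders.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

K_m = 20                      # number of BG measurements
t = np.linspace(0.0, 10.0, K_m)
L = np.eye(K_m)               # observation operator (identity here)

def mean_g(theta):
    """Placeholder prior mean m(theta): relaxation to a basal value Gb."""
    Gb, gamma, sigma = theta
    return Gb + 40.0 * np.exp(-gamma * t)

def cov_g(theta):
    """Placeholder prior covariance C(theta) of an OU-type process."""
    Gb, gamma, sigma = theta
    T1, T2 = np.meshgrid(t, t)
    return (sigma**2 / (2.0 * gamma)) * np.exp(-gamma * np.abs(T1 - T2))

def noise_cov(theta, lam=0.1):
    """Observational noise Gamma(theta) = lam * diag(L m(theta))."""
    return np.diag(lam * (L @ mean_g(theta)))

def neg_log_post(theta, y):
    """Objective in (4.11): ||S^{-1/2}(y - L m)||^2 + log det S."""
    S = L @ cov_g(theta) @ L.T + noise_cov(theta)
    r = y - L @ mean_g(theta)
    sign, logdet = np.linalg.slogdet(S)
    return r @ np.linalg.solve(S, r) + logdet

# Synthetic data and box constraints Theta (illustrative bounds only).
y = 100.0 + 40.0 * np.exp(-0.5 * t) + 5.0 * rng.standard_normal(K_m)
bounds = [(50.0, 200.0), (0.05, 2.0), (0.5, 30.0)]   # Gb, gamma, sigma

# Crude multistart: run a local gradient-based solver from several uniformly
# drawn starting points and keep the best minimizer.
best = None
for _ in range(10):
    x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
    res = minimize(neg_log_post, x0, args=(y,), method="L-BFGS-B", bounds=bounds)
    if best is None or res.fun < best.fun:
        best = res
print("MAP estimate (Gb, gamma, sigma):", best.x)
```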
4.3 MCMC
Once an optimal point has been found, we may also employ the Laplace approximation [56, 62] to obtain a Gaussian approximation to the posterior distribution. The Laplace approximation is a reasonable approximation in many data-rich scenarios in which all parameters are identifiable from the data, because of the Bernstein–von Mises Theorem [81], which asserts that the posterior distribution will then be approximately Gaussian, centered near the truth and with variance which shrinks to zero as more and more data is acquired. However, data is not always abundant, and not all parameters are identifiable even if it is; in this setting sampling the posterior distribution is desirable. MCMC methods are a flexible set of techniques which may be used to sample from a target distribution that is not necessarily analytically tractable [54, 69]. For example, the distribution P(θ|y) is the conditional distribution of the random model parameters θ given the data y. Even though we can formulate it explicitly as in equation (4.7), it is not always an easy task to extract useful quantities, such as the posterior mean and variance, from that formula. In such cases, MCMC techniques are used to generate random samples from this target distribution, and these samples are used to obtain the desired information, such as the mean, mode, covariance matrix, or higher moments of the parameters. Moreover, these techniques are also very helpful for obtaining UQ results for the estimated parameters.
In order to obtain more extensive knowledge about the posterior distribution of the parameters given the data, θ|y, than the MAP estimator can provide, we use MCMC methods as a natural choice to sample from that distribution. Among different possible algorithms (see [29]), we use the standard random walk Metropolis-Hastings algorithm. In order to make sure the resulting sample is indeed a good representative of the posterior distribution, we perform diagnostics such as checking whether the chains for each parameter have converged and whether they are uncorrelated. Then, after removing the burn-in period, we compute the mean and the covariance matrix from the remaining part of the sample. We use the mean as a point estimator for simulation and forecasting, and the covariance matrix provides valuable information to quantify uncertainty in the estimated parameters.
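The following sketch shows the structure of such a random walk Metropolis-Hastings sampler with a uniform (box) prior, as in (4.8)-(4.9); the two-parameter log-likelihood, the bounds, the step sizes, and the burn-in length are illustrative stand-ins, not the settings used for the MSG model.

```python
# Minimal sketch of a random-walk Metropolis-Hastings sampler targeting a
# posterior with uniform (box) prior.  Function and variable names are ours.
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: noisy observations of a two-parameter decay model.
t = np.linspace(0.0, 10.0, 30)
y = 100.0 * np.exp(-0.4 * t) + 2.0 * rng.standard_normal(t.size)

bounds = np.array([[10.0, 300.0],    # amplitude
                   [0.01, 2.0]])     # decay rate

def log_like(theta):
    amp, rate = theta
    resid = y - amp * np.exp(-rate * t)
    return -0.5 * np.sum(resid**2) / 2.0**2   # known noise stdev = 2

def log_post(theta):
    # Uniform prior: -inf outside the box Theta, constant inside.
    if np.any(theta < bounds[:, 0]) or np.any(theta > bounds[:, 1]):
        return -np.inf
    return log_like(theta)

def rw_metropolis(theta0, n_steps=20000, step=np.array([2.0, 0.02])):
    chain = np.empty((n_steps, theta0.size))
    theta, lp = np.asarray(theta0, float), log_post(theta0)
    for i in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

chain = rw_metropolis(np.array([150.0, 1.0]))
post = chain[5000:]                                # discard burn-in
print("posterior mean:", post.mean(axis=0))
print("posterior covariance:\n", np.cov(post.T))
```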
In general, it is hard to obtain efficient results with MCMC methods even when sampling from the joint distribution of four or five parameters, due to issues such as parameter identifiability. Moreover, obtaining accurate results with this approach requires a careful choice of starting point and tuning of other algorithmic parameters. In general the performance of the algorithm will depend on the initial point. We tested the use of both random starting points and the MAP estimator as the starting point. The former enables us to detect when several modes are present in the posterior distribution; the latter helps to focus sampling near the most likely parameter estimate and to quantify uncertainty in it. However, it is also important to note that using the MAP estimator as a starting point is not helpful in all cases. More precisely, if the computed MAP estimator corresponds to a local rather than global optimum, then
the chain could get stuck around this point. Therefore, careful analysis, comparison, and synthesis of the results obtained with these different approaches is required.
5 Methods, Datasets, and Experimental Design
In this section, we describe the datasets that we have in the T2DM and ICU settings, the experiments
that we design to present our numerical results, and the methods that we follow to perform parameter
estimation and forecasting. Depending on the specifics of each case and to reflect the real-life situation,
we designed slightly different experiments in the T2DM and ICU settings. However, the mathematical
solution approaches for parameter estimation and forecasting stay the same for both settings because
we use similar mechanistic models. In this opening discussion we first describe the features that are
common to both the T2DM and the ICU settings. The two following subsections 5.1, 5.2 then detail
features specific to each of the two cases.
Because we use a linear, Gaussian stochastic differential equation to model the BG level, our
forecast is a Gaussian characterized by its mean and standard deviation. Hence, rather than having
a point estimate for the future BG levels, we obtain a normal random variable for each prediction.
In testing predictions of the model it is natural to check whether 1- and 2-stdev intervals around the respective means capture the true BG levels. Note that the probabilities that a Gaussian random variable takes values within the 1- and 2-stdev regions around its mean are approximately 68% and 95%, respectively.
We define the observational noise covariance Γ(θ), defined in (4.3), to be a diagonal matrix with diag(Γ(θ)) := λ Lm(θ). While we could estimate λ alongside θ from the data, we have chosen a heuristic to set it in advance. Specifically, we found that above a value of around 0.3 all forecasts were very noisy and contained little predictive value; on the other hand, below 0.3 results appeared to be fairly robust to the value chosen for λ. In all the experiments presented in Section 6 we choose λ = 0.1.
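As a small illustration, the noise covariance described above can be assembled as follows; the vector Lm below is a hypothetical stand-in for L m(θ), not clinical data.

```python
# Illustrative construction of Gamma(theta) with diag(Gamma) = lambda * L m(theta).
import numpy as np

lam = 0.1
Lm = np.array([110.0, 125.0, 140.0, 118.0])   # hypothetical model-mean BG values (mg/dl)
Gamma = np.diag(lam * Lm)                      # diagonal noise covariance
print(np.sqrt(np.diag(Gamma)))                 # per-measurement noise stdev
```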
5.1 T2DM
5.1.1 Model, Parameters, and Dataset
In this setting, we use the model (3.4) with the function m_k defined as in (3.5). Hence, there are five parameters to be estimated: the basal glucose value G_b, the BG decay rate γ, the measure of the amplitude of BG oscillations σ, and a and b, the parameters implicitly modeling the time needed for the rate at which glucose from the nutrition enters the bloodstream to reach its maximum value and the total time needed for this rate to decrease back to 0. We assume that the prior distribution is non-informative and that, initially, the parameters are independent, except for a constraint on the ordering of a and b. We determine realistic lower and upper bound values for each of them,
define Θ′ := [0, 750] × [0, 5] × [0, 100] × [0.01, 0.05] × [0.01, 0.05] (in the order G_b, γ, σ, a, b), and then define Θ from Θ′ by adding the constraint a < b.
We thereby form the prior distribution as defined
in (4.8). Recall that these bounds define the constraints employed when we define the parameter
estimation problem in the optimization setting for the MAP point. The set Θ is determined from
clinical and physiological prior knowledge, and by simulating the model (3.2) and requiring realistic
BG levels. Data are collected from three different T2DM patients. For each patient the dataset
consists of the meal times, the glucose amount in the meal and BG measurements along with the
measurement times. More detailed information on the dataset, such as the number of measurements, recorded meals, and mean glucose values over the training, testing, and entire data sets, can be found in Table 5.1.
5.1.2 Parameter Estimation and Uncertainty Quantification
We perform parameter estimation for three patients separately. First, we estimate parameters by using
data over four consecutive, non-overlapping time intervals with optimization and MCMC approaches.
Besides the estimated values, we also provide UQ results. In the optimization setting, we use the Laplace approximation as discussed at the start of subsection 4.3. The optimal parameters determine the mean of the Gaussian approximation, and the inverse of the Hessian of the negative log-posterior at that point becomes the covariance matrix, providing the tools for UQ. In the MCMC approach, we use the resulting random samples for UQ.
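A minimal sketch of this Laplace-approximation step is given below, using a finite-difference Hessian of a placeholder negative log-posterior; the quadratic objective, the helper numerical_hessian, and the variable names are ours and stand in for the actual MSG posterior.

```python
# Sketch: Laplace approximation N(theta_MAP, H^{-1}) with a numerical Hessian.
import numpy as np

def neg_log_posterior(theta):
    # Placeholder: a quadratic with known minimizer and Hessian diag(2, 8).
    return (theta[0] - 1.0) ** 2 + 4.0 * (theta[1] + 0.5) ** 2

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at x."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

theta_map = np.array([1.0, -0.5])                  # MAP estimate (known here)
H = numerical_hessian(neg_log_posterior, theta_map)
laplace_cov = np.linalg.inv(H)                     # Gaussian approx covariance
print(laplace_cov)                                 # ~ diag(0.5, 0.125)
```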
5.1.3 Forecasting
We adopt a train-test set-up as follows. Since the health conditions of the T2DM patients are unlikely
to change over time intervals that are on the order of days, we design an experiment in which we use
one week of data for training the patient-specific parameters. Then, we use the estimated parameters
to form a patient-specific model and use this model to forecast BG levels for the following three weeks,
using the known glucose input through the meals; this leads to a three-week testing phase. From a
practical patient-centric point of view this leads to a setting in which forecasting BG levels for the
following three weeks requires patients to collect BG data for only one week in every month, and
then the patient-specific model will be able to capture their dynamics and provide forecasts based on
nutrition intake data over the rest of the month.
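In code, this split amounts to partitioning the timestamped measurements at the one-week mark; the sketch below uses synthetic times and BG values purely for illustration.

```python
# Sketch of the one-week-train / three-week-test split (synthetic data).
import numpy as np

rng = np.random.default_rng(2)
t_days = np.sort(rng.uniform(0.0, 28.0, size=300))       # measurement times (days)
bg = 120.0 + 15.0 * rng.standard_normal(t_days.size)     # BG values (mg/dl)

train_mask = t_days <= 7.0                                # first week -> training
test_mask = (t_days > 7.0) & (t_days <= 28.0)             # following three weeks -> testing

t_train, bg_train = t_days[train_mask], bg[train_mask]
t_test, bg_test = t_days[test_mask], bg[test_mask]
print(len(bg_train), "training and", len(bg_test), "testing measurements")
```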
5.2 ICU
5.2.1 Model, Parameters, and Dataset
In the ICU setting, we use the model (3.10), and there are now four parameters to be estimated: the basal glucose value G_b, the BG decay rate γ, the parameter used to quantify the amplitude of the oscillations in the BG level σ, and a proportionality constant β that scales the effect of the insulin IV on the BG level. Similar to what we did in the T2DM setting, we find realistic lower and upper bounds for the unknown parameter values and set Θ := [0, 750] × [0.02, 1] × [0, 100] × [20, 110] to obtain the prior distribution
as defined in (4.8). In this case, we impose two further linear constraints, namely G_b − 3.5β < 115 and β − 1110γ < 10. These constraints are imposed to ensure that the model predictions remain biophysically plausible, and are determined simply by forward simulation of the SDE model; the resulting inequality constraints do not overly constrain the parameters in that good fits can be found
Patient ID                                patient 1    patient 2    patient 3
Total # glucose measurements                  304          211           91
Total # meals recorded                        122           76           46
Total # days measured                        26.6         27.67        28.12
Mean measured glucose                     113 ± 25     127 ± 32     124 ± 26
Training set: # glucose measurements           80           53           29
Training set: # meals recorded                 26           18           15
Training set: # days measured                7.02            7          7.05
Training set: mean measured glucose       112 ± 25     116 ± 28     125 ± 24
Testing set: # glucose measurements           224          158           62
Testing set: # meals recorded                  96           58           31
Testing set: # days measured                19.58        20.67        21.07
Testing set: mean measured glucose        113 ± 25     130 ± 33     123 ± 27
Table 5.1: Information about the data set used in the T2DM setting, collected from three different T2DM patients. Note that there is considerable variability in the data collection behaviour of the patients, which is reflected in the number of recorded measurements and meals. Also, recall that we intentionally used one week of data for training and the following three weeks of data for testing.
which satisfy these constraints, and yet they yield more realistic BG level behavior than solutions found without them. Thus, as in the T2DM case, the bounds and constraints chosen are based on physiological knowledge and on requiring that simulated BG levels resulting from values within the region Θ be realistic.
In this case, the dataset consists of the rate of glucose in the nutrition and the rate of insulin
infusion along with the times at which there is a rate change. It also contains the BG measurements
and the measurement times. Summary statistics about the data set that is used in the ICU setting
can be found in Table 5.2. Note that in this case, we used all available data for each patient to perform
parameter estimation and forecasting, and all three ICU patients are non-T2DM.
Patient ID                                            patient 4    patient 5    patient 6
Total # glucose measurements                              177          204          271
Total # days measured                                    13.99         16.8        24.48
Mean measured glucose                                 141 ± 18     151 ± 32     151 ± 43
Training set: average # glucose measurements             14.13         13.5        14.07
Testing set: average # glucose measurements                  1            1            1
Table 5.2: Information about the dataset used in the ICU setting, collected from three ICU patients who are not T2DM. Because of the experiment we designed, the training sets are moving windows that overlap with one another, so we provide the average number of glucose measurements over these moving windows. Also, since we forecast until the next measurement time following the training time window, each testing set contains only one glucose measurement. Other information that is included in Table 5.1, but not here, such as mean measured glucose over the training set(s), is neither meaningful nor helpful in this setting.
5.2.2 Parameter Estimation and Uncertainty Quantification
We use both the optimization and MCMC approaches for parameter estimation in a patient-specific manner in this setting too. However, for UQ, we use only MCMC to estimate the posterior mean and variance of the parameters; this is because there were cases where it was not appropriate to use the Laplace approximation, something that will be explained in more detail in Section 6.2.
5.2.3 Forecasting
Patients in the ICU exhibit BG time-series that are very different from T2DM patients; in particular
the time-series is often non-stationary in complex ways and on different time scales. On slower time
scales, patients eventually leave the ICU because their health either improves or declines. But there
can be fast time scale changes too due to interventions and/or sudden health-related events, such as
a stroke. These health changes will lead to changes in the best-fit parameters of the model; in other
words, the patient-specific model itself may change abruptly, in contrast to the T2DM case where changes in the best-fit parameters typically occur on a much longer time-scale and reflect gradual changes in health condition. To avoid compensating for different parameter values over longer time intervals, and to make more accurate predictions, we use only one day of data for parameter estimation in the ICU. Moreover, to construct an experiment that reflects real-life scenarios, we need to be able to estimate the model parameters with smaller datasets than in the T2DM case, because of the imperative of regular intervention within the ICU setting, typically on a time-scale of hours. As a consequence, our train-test set-up in this case differs quantitatively from the T2DM case. The training sets for each patient consist of approximately one day of data over moving time intervals, with end points chosen to be BG measurement times. Thus, the time windows are obtained by moving the left end point to the next BG measurement time and choosing the right end point with the constraint that the window contains approximately one day of data and the new time window is not contained in the previous
one. So, in this case, there is a large overlap between the consecutive time windows of the training
sets.
On the other hand, because of rapidly changing conditions, forecasts of BG levels need only be accurate over shorter time-scales, too. Indeed, in general, it is important to know glycemic dynamics on the order of hours (not days) to manage the glycemic response of patients. So, for each training set, the left end point of the time window of the corresponding testing set is chosen to be the right end point of the time window of the patient's training set. Then, we choose the right end point of the test set to be the next BG measurement time. We follow the same procedure over the moving time intervals to the end of the whole dataset for each patient. Note that, from a practical point of view, this experiment reflects a real-life situation in which we use only one day of data for parameter estimation and then perform forecasting for the next few hours based on the estimated parameters. Such a set-up would be desirable as a support to glycemic management of these patients.
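The sketch below illustrates one way such moving windows and test points could be constructed from a vector of measurement times; the synthetic times and the helper logic are our own and only approximate the construction described above.

```python
# Sketch of the moving-window train/test construction: each training window
# starts at a BG measurement time, extends until it covers roughly one day of
# data, must not be contained in the previous window, and the test point is
# the next BG measurement after the window.  Times below are synthetic.
import numpy as np

rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 14.0, size=180))           # BG measurement times (days)

windows, prev_right = [], -1
for i in range(t.size):
    right_candidates = np.where(t >= t[i] + 1.0)[0]     # ~one day of data
    if right_candidates.size == 0 or right_candidates[0] + 1 >= t.size:
        break                                           # no room for a window plus a test point
    j = right_candidates[0]
    if j <= prev_right:
        continue                                        # window contained in the previous one
    windows.append((np.arange(i, j + 1), j + 1))        # (training indices, test index)
    prev_right = j

print(len(windows), "windows; average # training measurements:",
      round(float(np.mean([len(w[0]) for w in windows])), 2))
```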
5.3 Evaluation Metrics
Before having a closer look at the numerical results in the next section, let us give the definitions of the statistics that will be used to evaluate and compare the forecasting capability of the models. Let {y_i}_{i=1}^N denote the true BG measurements over the predefined testing time window for an experiment. Let {ŷ_i}_{i=1}^N denote the forecasts obtained by a model at the true measurement time points. Note that if the model is stochastic, {ŷ_i}_{i=1}^N represents the mean of the model output, while it is simply the model output for a deterministic ODE model. When a stochastic model is used, it is natural to obtain a confidence interval as a direct consequence of the fact that the model output is a random variable; such an output cannot be obtained for an ODE-type model when parameters are learned through optimization. However, by using appropriate parameter and state estimation techniques, it may again be possible to obtain a similar kind of confidence interval for a model output which is in the form of a point estimate. When we have probabilistic forecasts, we let {ε_i}_{i=1}^N denote the corresponding standard deviations of the forecasts at the true measurement points, so that we can form 1- and 2-stdev bands as [ŷ_i − ε_i, ŷ_i + ε_i] and [ŷ_i − 2ε_i, ŷ_i + 2ε_i], for i = 1, ..., N, respectively. Then, for each model, we can compute the percentage of true measurements {y_i}_{i=1}^N that are captured in their respective 1- and 2-stdev bands. These two percentages are two of the evaluation tools used below. In addition, in some cases, we will use standard
measures such as mean-squared error (MSE), root-mean-squared error (RMSE) and mean percentage
error (MPE), which are computed as follows.
\[
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}, \qquad
\mathrm{MPE} = \frac{1}{N}\sum_{i=1}^{N}\frac{|y_i - \hat{y}_i|}{y_i}\times 100.
\]
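These metrics, together with the 1- and 2-stdev band coverage described above, can be computed as in the following sketch; the function evaluate and the synthetic arrays are illustrative, not the study's results.

```python
# Sketch of the evaluation metrics: MSE, RMSE, MPE, and stdev-band coverage.
import numpy as np

def evaluate(y, y_hat, eps=None):
    err = y - y_hat
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mpe = 100.0 * np.mean(np.abs(err) / y)
    out = {"MSE": mse, "RMSE": rmse, "MPE": mpe}
    if eps is not None:                        # probabilistic forecast: band coverage
        out["1-stdev coverage (%)"] = 100.0 * np.mean(np.abs(err) <= eps)
        out["2-stdev coverage (%)"] = 100.0 * np.mean(np.abs(err) <= 2 * eps)
    return out

rng = np.random.default_rng(4)
y = 130.0 + 20.0 * rng.standard_normal(50)     # illustrative "true" BG values
y_hat = 130.0 * np.ones(50)                    # illustrative forecast means
eps = np.full(50, 20.0)                        # illustrative forecast stdevs
print(evaluate(y, y_hat, eps))
```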
6 Numerical Results
In this section we present numerical results concerning the simple, yet interpretable, model introduced
in this paper; we refer to this as the minimal stochastic glucose (MSG) model from now on for ease
of exposition. The two primary conclusions are that:
• we can achieve good accuracy forecasting future BG levels in both the T2DM and ICU settings, and the uncertainty bands with which we equip our forecasts play an important role in this regard;

• we can learn a substantial amount about the interpretable parameters within the models, with possible clinical uses deriving from the parameter estimates, and from tracking them over time, again using the uncertainty measures that accompany them as measures of confidence.