MULTISCALE MODEL. SIMUL. © 2023 Society for Industrial and Applied Mathematics
Vol. 21, No. 2, pp. 641--679
LEARNING MARKOVIAN HOMOGENIZED MODELS IN VISCOELASTICITY*

KAUSHIK BHATTACHARYA\dagger, BURIGEDE LIU\ddagger, ANDREW STUART\S, AND MARGARET TRAUTNER\S
Abstract. Fully resolving the dynamics of materials with rapidly varying features involves expensive fine-scale computations which need to be conducted on macroscopic scales. The theory of homogenization provides an approach for deriving effective macroscopic equations which eliminates the small scales by exploiting scale separation. An accurate homogenized model avoids the computationally expensive task of numerically solving the underlying balance laws at a fine scale, thereby rendering a numerical solution of the balance laws more computationally tractable. In complex settings, homogenization only defines the constitutive model implicitly, and machine learning can be used to learn the constitutive model explicitly from localized fine-scale simulations. In the case of one-dimensional viscoelasticity, the linearity of the model allows for a complete analysis. We establish that the homogenized constitutive model may be approximated by a recurrent neural network that captures the memory. The memory is encapsulated in the evolution of an appropriate finite set of hidden variables, which are discovered through the learning process and depend on the history of the strain. Simulations are presented which validate the theory. Guidance is given for learning more complex models, such as those arising in plasticity, using similar techniques.
Key words. homogenization, operator learning, viscoelasticity, surrogate modeling, machine learning

MSC codes. 74D05, 74Q15, 35J47, 65M32, 74S30

DOI. 10.1137/22M1499200
1. Introduction. Many problems in continuum mechanics lead to constitutive laws which are history dependent. This property may be inherent to physics beneath the continuum scale (for example, in plasticity [35, 36]) or may arise from homogenization of rapidly varying continua [3, 28] (for example, in the Kelvin--Voigt (KV) model of viscoelasticity [11]). When history dependence is present, Markovian models that capture the history dependence are desirable for both interpretability and computability. In some cases theory may be used to justify Markovian models which capture this history dependence, but in many cases data plays a central role in finding such models. The goal of this paper is to study data-driven methods to learn Markovian models for history dependence and to provide theoretical underpinnings for understanding them.
*Received by the editors May 31, 2022; accepted for publication (in revised form) February 2, 2023; published electronically May 19, 2023. The U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes. Copyright is owned by SIAM to the extent not limited by these rights.
https://doi.org/10.1137/22M1499200
Funding: The work of the first, second, and third authors was sponsored by the U.S. Army Research Laboratory and was accomplished under Cooperative Agreement W911NF-12-2-0022. The work of the fourth author was funded by the Department of Energy Computational Science Graduate Fellowship under award DE-SC002111.
\dagger Mechanical and Civil Engineering, California Institute of Technology, Pasadena, CA 91125 USA (bhatta@caltech.edu).
\ddagger Department of Engineering, University of Cambridge, Cambridge, UK (bl377@eng.cam.ac.uk).
\S Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125 USA (astuart@caltech.edu, mtrautne@caltech.edu).
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
The paper [20] adopted a data-driven learning approach to uncovering history-dependent homogenized models arising in crystal plasticity. However, the resulting constitutive model is not causal and instead learns causality approximately from computations performed at the level of the cell problem. The paper [21] introduces a different approach, learning causal constitutive models of plasticity. In order to give rigorous underpinnings to the empirical results therein, the present work is devoted to studying the methodology from [21] in the setting of linear one-dimensional viscoelasticity. Here we can use theoretical understanding to justify and validate the methodology; we show that machine-learned homogenized models can accurately approximate the dynamics of multiscale models at much cheaper evaluation cost. We obtain insight into the desirable choice of training data with which to learn the homogenized constitutive model, and we study the effect of the multiple local minimizers which appear in the underlying optimization problem. Furthermore, the rigorous underpinnings enable us to gain insight into how to test model hypotheses. We demonstrate that hypothesizing the correct model leads to robustness with respect to changes in time discretization in the causal model: the model can be trained at one time-step and used at others, and the model can be trained with one time-integration method and used with others. In contrast, hypothesizing an incorrect model leads to intolerable sensitivity with respect to the time-step. Thus training at one time-step and testing at other levels of resolution provides a method for testing model-form hypotheses. We work primarily with the one-dimensional KV model for viscoelasticity, for which the constitutive model depends only on strain and strain rate. We will also briefly touch upon the standard linear solid (SLS) model, for which the constitutive relation depends only on the strain and the strain history, and perform numerical experiments on a one-dimensional elasto-viscoplastic material; in so doing we show that the ideas presented extend beyond the specifics of the one-dimensional KV setting.

In subsection 1.1 we describe the overarching mathematical framework adopted, and subsection 1.2 contains a detailed literature review. This is followed, in subsection 1.3, by a statement of our contributions and an overview of the paper. In subsection 1.4 we summarize notation used throughout the remainder of the paper.
1.1. Setup. Consider the problem of material response on an arbitrary spatial domain $\mathcal{D} \subset \mathbb{R}^d$ where the material properties vary rapidly within the domain. We denote by $u^\epsilon \in \mathbb{R}^d$ the displacement, where $\epsilon$, $0 < \epsilon \ll 1$, denotes the scale of the material fluctuations. Denote by $\mathcal{T} = (0,T)$ the time domain of interest. We consider continuum models which satisfy dynamical equations of the form
\begin{align}
\rho\,\partial_t^2 u^\epsilon &= \nabla \cdot \sigma^\epsilon + f, && (x,t) \in \mathcal{D} \times \mathcal{T}, \tag{1.1a}\\
\sigma^\epsilon(x,t) &= \Psi^\dagger_\epsilon\bigl(\nabla u^\epsilon(x,t),\, \partial_t \nabla u^\epsilon(x,t),\, \{\nabla u^\epsilon(x,\tau)\}_{\tau \in \mathcal{T}},\, x,\, t\bigr), && (x,t) \in \mathcal{D} \times \mathcal{T}, \tag{1.1b}\\
u^\epsilon = u^*, \quad \partial_t u^\epsilon &= v^*, && (x,t) \in \mathcal{D} \times \{0\}, \tag{1.1c}\\
u^\epsilon &= 0, && (x,t) \in \partial\mathcal{D} \times \mathcal{T}. \tag{1.1d}
\end{align}
From these equations we seek $u^\epsilon : \mathcal{D} \times \mathcal{T} \mapsto \mathbb{R}^d$. Equation (1.1a) is the balance equation with inertia term $\rho\,\partial_t^2 u^\epsilon$ for known parameter $\rho \in \mathbb{R}_+$, resultant stress term $\nabla \cdot \sigma^\epsilon$ where $\sigma^\epsilon \in \mathbb{R}^{d \times d}$ is the internal stress tensor, and known external forcing $f \in \mathbb{R}^d$; equations (1.1c) and (1.1d) specify the initial and boundary data for the displacement. Equation (1.1b) is the constitutive law relating properties of the strain $\nabla u^\epsilon$ to the stress $\sigma^\epsilon$ via the map $\Psi^\dagger_\epsilon$. In this paper we will consider this model with inertia ($\rho > 0$) and without inertia ($\rho \equiv 0$).
1.1.1. Constitutive model. The constitutive model is defined by $\Psi^\dagger_\epsilon : \mathbb{R}^{d \times d} \times \mathbb{R}^{d \times d} \times C(\mathcal{T}; \mathbb{R}^{d \times d}) \times \mathcal{D} \times \mathcal{T} \rightarrow \mathbb{R}^{d \times d}$. It takes time $t$ as input, which enables the stress at time $t$ to be expressed only in terms of the strain history up to time $t$, $\{\nabla u^\epsilon(x,\tau)\}_{\tau=0}^{t}$, and not on the future of the strain for $\tau \in (t,T]$. It takes $x$ as input to allow for material properties which depend on the rapidly varying $x/\epsilon$; it is also possible to allow for material properties which exhibit additional dependence on the slowly varying $x$, but we exclude this case for simplicity.

Such constitutive models include a variety of plastic, viscoelastic, and viscoplastic materials. In this paper we focus mainly on the one-dimensional KV viscoelastic setting in order to highlight ideas; in this case $\Psi^\dagger_\epsilon$ is independent of the history of the strain. However, we will briefly demonstrate that similar concepts relating to the learning of constitutive models also apply to the case of an SLS, for which $\Psi^\dagger_\epsilon$ is independent of the strain rate but does depend on the history of the strain; the SLS contains the KV and Maxwell models of viscoelasticity as special cases. We also provide a demonstrative application of our methodology to the setting of elasto-viscoplasticity. Furthermore, the paper [21] includes empirical evidence that similar concepts relating to the learning of constitutive models also apply in higher dimensions.
1.1.2. Homogenized constitutive model. The goal of homogenization is to find constitutive models which eliminate small-scale dependence. To this end, we first discuss the form of a general homogenized problem: we seek the equation satisfied by $u^0$, an appropriate limit of $u^\epsilon$ as $\epsilon \rightarrow 0$. The map $\Psi^\dagger_0$ then defines the constitutive relationship in a homogenized model for $u^0$ of the form
\begin{align}
\rho\,\partial_t^2 u^0 &= \nabla \cdot \sigma^0 + f, && (x,t) \in \mathcal{D} \times \mathcal{T}, \tag{1.2a}\\
\sigma^0(x,t) &= \Psi^\dagger_0\bigl(\nabla u^0(x,t),\, \partial_t \nabla u^0(x,t),\, \{\nabla u^0(x,\tau)\}_{\tau \in \mathcal{T}},\, t\bigr), && (x,t) \in \mathcal{D} \times \mathcal{T}, \tag{1.2b}\\
u^0 = u^*, \quad \partial_t u^0 &= v^*, && (x,t) \in \mathcal{D} \times \{0\}, \tag{1.2c}\\
u^0 &= 0, && (x,t) \in \partial\mathcal{D} \times \mathcal{T}. \tag{1.2d}
\end{align}
The key property of this homogenized model is that the parameter $\epsilon$ no longer appears. Furthermore, since we assumed that the multiscale model material properties depend only on the rapidly varying scale $x/\epsilon$ and not on $x$, we have that $\Psi^\dagger_0$ does not depend explicitly on $x$; it does, however, still have spatial dependence through the local values of strain, strain rate, and strain history: $\Psi^\dagger_0 : \mathbb{R}^{d \times d} \times \mathbb{R}^{d \times d} \times C(\mathcal{T}; \mathbb{R}^{d \times d}) \times \mathcal{T} \rightarrow \mathbb{R}^{d \times d}$.
If the homogenized model is identified correctly, then dynamics under the multiscale model $\Psi^\dagger_\epsilon$, i.e., $u^\epsilon$, can be approximated by dynamics under the homogenized model $\Psi^\dagger_0$, i.e., $u^0$. This potentially facilitates cheaper computations since length-scales of size $\epsilon$ need not be resolved. We observe, however, that for KV viscoelasticity, the homogenized model contains history dependence (memory) even though the multiscale model does not. Markovian history dependence is desirable for two primary reasons: first, Markovian models encode conceptual understanding, representing the history dependence in a compact, interpretable form; second, Markovian expression reduces computational cost from $\mathcal{O}(|\mathcal{T}|^2)$ in the general memory case to $\mathcal{O}(|\mathcal{T}|)$ in the Markovian case. In the general media setting, for a multitude of models in viscoelasticity, viscoplasticity, and plasticity, the homogenized model will depend on the memory in a non-Markovian manner. However, it is interesting to determine situations in which accurate Markovian approximations can be found.
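The quadratic-versus-linear cost difference can be made concrete: a fading-memory model must revisit the entire strain history at every step, whereas a hidden-variable recursion carries the same information forward with constant work per step. Below is a minimal sketch, under the illustrative assumption of a simple exponential kernel $K(t) = e^{-t}$ (not a kernel derived in this paper), for which the two formulations agree exactly under the same quadrature rule:

```python
import numpy as np

def stress_memory(B, dt):
    """O(N^2): convolution with kernel K(t) = exp(-t), left-endpoint rule."""
    N = len(B)
    s = np.zeros(N)
    for n in range(N):
        for j in range(n):
            s[n] += np.exp(-(n - j) * dt) * B[j] * dt
    return s

def stress_markov(B, dt):
    """O(N): the same sum carried by one hidden variable xi with xi(0) = 0."""
    N = len(B)
    s = np.zeros(N)
    xi = 0.0
    for n in range(N):
        s[n] = xi
        xi = np.exp(-dt) * (xi + B[n] * dt)   # Markovian update of the memory
    return s

dt = 0.01
B = np.sin(np.linspace(0, 3, 300))            # an arbitrary strain history
assert np.allclose(stress_memory(B, dt), stress_markov(B, dt))
```

The recursion exploits $K(t+\Delta t) = e^{-\Delta t} K(t)$; general kernels admit such a reduction only when they are (sums of) exponentials, which is precisely the structure the Markovian hidden-variable models of subsection 1.1.3 encode.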
1.1.3. Markovian homogenized constitutive model. We will seek to identify hidden variables $\xi$, closely related to the internal variables used in the mechanics
literature [21], and functions $\mathcal{F}$, $\mathcal{G}$ such that, for $B \in C(\mathcal{T}; \mathbb{R}^{d \times d})$, $\Psi^\dagger_0$ can be approximated by
\begin{align}
\Psi_0\bigl(B(t),\, \partial_t B(t),\, \{B(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr) &= \mathcal{F}(B(t), \partial_t B(t), \xi(t)), \tag{1.3a}\\
\partial_t \xi(t) &= \mathcal{G}(\xi(t), B(t)), \tag{1.3b}\\
\xi(0) &= 0. \tag{1.3c}
\end{align}
Note that $\xi$ carries the history dependence on $B$ through its Markovian evolution. We assume that $\xi \in C(\mathcal{T}; \mathbb{R}^r)$ for some integer $r$ and hence that $\mathcal{F} : \mathbb{R}^{d \times d} \times \mathbb{R}^{d \times d} \times \mathbb{R}^r \rightarrow \mathbb{R}^{d \times d}$ and that $\mathcal{G} : \mathbb{R}^r \times \mathbb{R}^{d \times d} \rightarrow \mathbb{R}^r$. In dimension $d > 1$ there will be further symmetries that should be built into the model, but as the concrete analysis in this paper is in dimension $d = 1$ we will not detail these symmetries here [34].
In general such a Markovian model can only approximate the true model, and the nature of the physics leading to a good approximation will depend on the specific continuum mechanics problem. To determine $\mathcal{F}$ and $\mathcal{G}$ in practice we will parameterize them as neural networks, which enables us to use general-purpose optimization software to determine suitable values of the parameters. Within computational implementations of the learned homogenized models, the neural networks $\mathcal{F}$ and $\mathcal{G}$ act pointwise in time to generate the stress and time derivatives of the hidden variables at each time-step. In doing so we identify an operator class $\Psi_0(\cdot\,; \theta)$ and parameter space $\Theta$ such that, for some judiciously chosen $\theta^* \in \Theta$, $\Psi_0(\cdot\,; \theta^*) \approx \Psi^\dagger_0$.
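The parameterization just described can be sketched in code. The following is an illustration, not the authors' implementation: two small feed-forward networks stand in for $\mathcal{F}$ and $\mathcal{G}$, the hidden-variable equation (1.3b) is advanced by explicit Euler, and the layer sizes, random (untrained) weights, and choice $r = 2$ are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Feed-forward net with tanh hidden layers; weights here are untrained."""
    return [(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def apply(net, x):
    for i, (W, b) in enumerate(net):
        x = W @ x + b
        if i < len(net) - 1:
            x = np.tanh(x)
    return x

d, r = 1, 2                          # scalar strain, two hidden variables (assumed)
F = mlp([d + d + r, 16, d])          # (B, dB/dt, xi) -> stress, as in (1.3a)
G = mlp([r + d, 16, r])              # (xi, B) -> d(xi)/dt, as in (1.3b)

def rollout(B, dt):
    """Step (1.3): explicit Euler on xi; stress read out pointwise in time."""
    dBdt = np.gradient(B, dt)
    xi = np.zeros(r)                 # initial condition (1.3c)
    sigma = np.zeros_like(B)
    for n in range(len(B)):
        sigma[n] = apply(F, np.array([B[n], dBdt[n], *xi]))[0]
        xi = xi + dt * apply(G, np.array([*xi, B[n]]))
    return sigma

sigma = rollout(np.sin(np.linspace(0, 1, 100)), dt=0.01)
```

Training would then adjust the weights of `F` and `G` (the parameter $\theta$) so that rollouts match stress data generated from cell problems; the recurrence itself is what makes the learned operator a recurrent neural network.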
In this paper we will concentrate on justifying a Markovian homogenized approximation in the context of one-dimensional KV viscoelasticity. Our justification will use theory that is specific to one-dimensional linear viscoelasticity, and we demonstrate that the approach also works for the general SLS, which includes the KV model as a particular limit. Furthermore, the paper [21] contains evidence that the ideas we develop apply beyond the confines of one-dimensional linear viscoelasticity and into nonlinear plasticity in higher spatial dimensions.

The specific property of one-dimensional viscoelasticity that we exploit to underpin our analysis (and which applies to the SLS and therefore also to the KV model) is that, for piecewise-constant media, the homogenized model has a memory term which can be represented in a Markovian way. Therefore, to justify our strategy of approximating by Markovian models we will do the following: first, approximate the rapidly varying medium by a piecewise-constant rapidly varying medium; second, homogenize this model to find a Markovian description; and, finally, demonstrate how the Markovian description can be learned from data at the level of the unit cell problem, using neural networks. For more general problems we anticipate a similar justification holding, but with different specifics leading to the existence of good approximate Markovian homogenized models. The benefit of the one-dimensional viscoelastic setting is that, through theory, we obtain underpinning insight into the conceptual approach more generally, and in particular for plasticity; this theory underpins the numerical experiments which follow.
1.2. Literature review. The continuum assumption for physical materials approximates the inherently particulate nature of matter by a continuous medium and thus allows the use of partial differential equations to describe response dynamics. We refer the reader to [13, 37, 12] for a general background. In continuum mechanics, the governing equations are derived by combining universal balance laws of physics (balance of mass, momenta, and energy) with a constitutive relation that describes the properties of the material being studied. This is typically specified as the relation
between a dynamic quantity like stress or energy and kinematic quantities like strain and its history. The constitutive relations of many materials are history dependent; i.e., the state of stress at an instant depends on the history of deformation. It is common in continuum mechanics to incorporate this history dependence through the introduction of internal variables, which are referred to as hidden variables in computer science and throughout this paper. We refer the reader to [32] for a systematic formulation of internal variable theories.
Of particular interest in this work are viscoelastic materials. We refer the reader to [7, 18] for a general background. In viscoelastic materials, the state of stress at any instant depends on the strain and its history. There are various models: the stress may depend only on strain and strain rate (Kelvin--Voigt), on internal variables (standard linear solids), on convolution kernels, or on fractional time derivatives.
While constitutive laws were traditionally determined empirically, more recently there has been a systematic attempt to understand them from more fundamental principles, and this has given rise to a rich activity in multiscale modeling of materials [10, 39, 41]. Materials are heterogeneous on various length (and time) scales, and it is common to use different theories to describe the behavior at different scales [30]. The goal of multiscale modeling of materials is to use this hierarchy of scales to understand the overall constitutive behavior at the scale of applications. The hierarchy of scales includes a number of continuum scales. For example, a composite material is made of various distinct materials arranged at a scale that is small compared to the scale of application but large enough compared to an atomistic/discrete scale, so the behavior is adequately described by continuum mechanics. Or, for example, a polycrystal is made of a large collection of grains (regions of identical anisotropic material but with differing orientation) that are small compared to the scale of application but large enough for a continuum theory. Homogenization theory leverages the assumption of the separation of scales to average out the effects of fine-scale material variations. To estimate the macroscopic response of heterogeneous materials, asymptotic expansion of the displacement field yields a set of boundary value problems whose solution produces an approximation that does not depend on the microscale [3, 33]. The fundamentals of asymptotic homogenization theory are well established [28, 8, 1]. Milton [23] provides a comprehensive survey of effective, or homogenized, properties.
Homogenization in the context of viscoelasticity was initiated by Sanchez-Palencia [33, Chapter 6], who pointed out that the homogenization of a KV model leads to a model with fading memory. Further discussion of homogenization theory in (thermo-)viscoelasticity can be found in Francfort and Suquet [11], and a detailed discussion of the overall behavior, including memory, in Brenner and Suquet [5]. A broader discussion of homogenization and memory effects can be found in Tartar [40]. It is now understood that homogenization of various constitutive models gives rise to memory.
As noted above, according to homogenization theory, the macroscopic behavior depends on the solution of a boundary value problem at the microscale. Evaluating the macroscopic behavior by the computational solution of a boundary value problem leads to what has been called computational micromechanics [43]. These computations often involve periodic boundary conditions, and fast Fourier transform--based methods have been widely used since Moulinec and Suquet [26] (see [24, 25] for recent summaries). While these enable us to compute the macroscopic response for a particular deformation history, one needs to repeat the calculation for all possible deformation histories.

Therefore, recent work in the mechanics literature addresses the issue of learning homogenized constitutive models from computational data [27, 42, 20, 21] or
experimental data [2]. This learning problem requires the determination of maps that take as inputs functions describing microstructural properties, and it leads into the topic of operator learning.

Neural networks on finite-dimensional spaces have proven effective at solving long-standing computational problems. A natural question which arises from this success is whether neural networks can be used to solve partial differential equations (PDEs). From a theoretical standpoint, the classical notion of neural networks as maps between finite-dimensional spaces is insufficient. Indeed, a differential operator is a map between infinite-dimensional spaces. Similarly to the case of finite-dimensional maps, universal approximation results for nonlinear operator learning have been developed [6]. In one approach to operator learning, model reduction methods are applied to the operator itself. In this setting, a low-dimensional approximation space is assumed, and the operator approximation is constructed via regression using the latent basis [29, 31]. In a second method, classical dimension reduction maps are applied to the input and output spaces, and an approximation map is learned between these finite-dimensional spaces [4]. An important property of this method is its mesh invariance. Data for any operator-learning problem must consist of a finite discretization of the true input and output functions. One side effect of this practical fact is that some existing methods for operator learning depend critically on the choice of discretization. Mesh-invariant methods are desirable for both practical and theoretical purposes. Practically, it would be expensive to train a new model to accommodate a finer data resolution. From a theoretical standpoint, since the true map between infinite-dimensional spaces is inherently resolution independent, the operator approximation ought to have this property as well. Mesh invariance allows the operator approximation to be applied with various numerical approximation methods for PDEs, which is of particular importance when the operator is being used as a surrogate model for the overall PDE system [19]. Invariance with respect to time discretization is also leveraged in work on data-driven learning of PDEs in [17, 16].

Surrogate modeling bypasses expensive simulation computations by replacing part of the PDE with a neural network. In several application settings, including fluid flows and solid mechanics computations, surrogate modeling has met empirical success in approximating the true solution [38, 15]. This work continues to use ideas from physics-informed machine learning; in [15] in particular, the differential operator is incorporated into the cost function via automatic differentiation. Other work in surrogate modeling for solid mechanics proposes a hierarchical network architecture that mimics the heterogeneous material structure to yield an approximation to the homogenized solution [22]. The work of [14] also leverages automatic differentiation and approaches the problem of surrogate modeling of constitutive laws via a parameter-identification learning perspective. In our work, we consider parameterized models for the purpose of justifying theory, but the learning framework does not rely on parameter identification.
In this paper we use a recurrent neural network (RNN) as a surrogate model for the constitutive relation on the microscale. Our RNN architecture takes the form of two feed-forward neural networks: one which computes the time derivative of the hidden variables, and one which outputs the stress pointwise in space and time. In this manner, the history dependence is contained entirely in the hidden variables rather than directly in the neural network. This leads to an interpretable model. The RNN can then be used to evaluate the forward dynamic response on the microscale cells, whose results are combined with traditional numerical approximation methods to yield the macroscale response. Furthermore, the RNN that we train at a particular
time discretization is also accurate when used at other time discretizations if the
correct model form is proposed. However, the RNN is trained for a particular choice
of material parameters, and the resulting model cannot be used at different material
parameter values; simultaneously learning material parameter dependence is left for
future work.
1.3. Our contributions and paper overview. Our contributions are as follows:
1. We provide theoretical underpinnings for the discovery of Markovian homogenized models in viscoelasticity and plasticity; the methods use data generated by solving cell problems to learn constitutive models of the desired form.
2. We prove that in the one-dimensional KV setting, any solution of the multiscale problem can be approximated by the solution of a homogenized problem with Markovian structure.
3. We prove that the constitutive model for this Markovian homogenized system can be approximated by an RNN operator learned from data generated by solving the appropriate cell problem.
4. We provide simulations which numerically demonstrate the accuracy of the learned Markovian model.
5. We offer guidance for the application of this methodology, beyond the setting of one-dimensional viscoelasticity, to multidimensional plasticity.
In section 2, we formulate the KV viscoelastic problem and its homogenized solution. In section 3, we present our main theoretical results, addressing contributions 1, 2, and 3; these are in the setting of one-dimensional KV viscoelasticity. We prove that the solution of the multiscale problem can be approximated by the solution of a homogenized Markovian memory-dependent model that does not depend on small scales, and we prove that an RNN can approximate the constitutive law for this homogenized problem. Section 4 contains numerical experiments which address contributions 4 and 5; the start of that section details the findings.
1.4. Notation. We define notation that will be useful throughout the paper. Recall that $\mathcal{T} = (0,T)$ is the time domain of interest, and in the one-dimensional setting we let $\mathcal{D} = [0,L]$ be the spatial domain. Let $\langle \cdot, \cdot \rangle$ and $\| \cdot \|$ denote the standard inner product and induced norm on the Hilbert space $L^2(\mathcal{D}; \mathbb{R})$. Additionally, let $\| \cdot \|_\infty$ denote the $L^\infty(\mathcal{D}; \mathbb{R})$ norm.
It will also be convenient to define the $\xi$-dependent quadratic form
\begin{equation}
q_\xi(u,w) := \int_{\mathcal{D}} \xi(x)\, \frac{\partial u(x)}{\partial x}\, \frac{\partial w(x)}{\partial x}\, dx \tag{1.4}
\end{equation}
for arbitrary $\xi \in L^\infty(\mathcal{D}; (0,\infty))$; furthermore we define
\begin{equation}
\xi^+ := \operatorname*{ess\,sup}_{x \in \mathcal{D}} \xi(x) < \infty \tag{1.5}
\end{equation}
and
\begin{equation}
\xi^- := \operatorname*{ess\,inf}_{x \in \mathcal{D}} \xi(x) \geq 0. \tag{1.6}
\end{equation}
In this paper we always work with $\xi$ such that $\xi^- > 0$. Under these assumptions $q_\xi(\cdot,\cdot)$ defines an inner product, and we can define the norm $\|u\|^2_{H^1_0,\xi} := q_\xi(u,u)$ from it; note also that we may define a norm on $H^1_0(\mathcal{D}; \mathbb{R})$ by
$\|u\|^2_{H^1_0} := q_1(u,u)$, where $1(\cdot)$ is the function in $L^\infty(\mathcal{D}; (0,\infty))$ taking value 1 in $\mathcal{D}$ a.e. The resulting norms are all equivalent on the space $H^1_0(\mathcal{D}; \mathbb{R})$; this is a consequence of the following lemma.
Lemma 1.1. For any $\xi_1, \xi_2 \in L^\infty(\mathcal{D}; (0,\infty))$ satisfying properties (1.5) and (1.6), the norms $\|u\|_{H^1_0,\xi_1}$ and $\|u\|_{H^1_0,\xi_2}$ are equivalent in the sense that
$$\frac{\xi_2^-}{\xi_1^+}\, \|u\|^2_{H^1_0,\xi_1} \;\leq\; \|u\|^2_{H^1_0,\xi_2} \;\leq\; \frac{\xi_2^+}{\xi_1^-}\, \|u\|^2_{H^1_0,\xi_1}.$$
Proof. For $i = 1, 2$,
$$\xi_i^- \int_0^1 \left| \frac{\partial u}{\partial x} \right|^2 dx \;\leq\; \int_0^1 \xi_i(x) \left| \frac{\partial u}{\partial x} \right|^2 dx \;\leq\; \xi_i^+ \int_0^1 \left| \frac{\partial u}{\partial x} \right|^2 dx.$$
The result follows.
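The equivalence constants in Lemma 1.1 are easy to check numerically. The following is a small finite-difference sketch on $\mathcal{D} = [0,1]$; the grid, the random coefficients, and the test function are illustrative choices, not taken from the paper.

```python
import numpy as np

# Finite-difference check of the norm-equivalence bounds in Lemma 1.1.
N = 1000
x = np.linspace(0.0, 1.0, N + 1)
dx = x[1] - x[0]

def q(xi_vals, u):
    """q_xi(u, u) = \\int xi |du/dx|^2 dx via forward differences per cell."""
    du = np.diff(u) / dx
    return np.sum(xi_vals * du**2 * dx)

rng = np.random.default_rng(1)
xi1 = 1.0 + rng.random(N)            # cell values in (1, 2), so xi1^- > 0
xi2 = 0.5 + 2.0 * rng.random(N)      # cell values in (0.5, 2.5)

u = np.sin(np.pi * x) * np.exp(x)    # any function vanishing on the boundary
a, b = q(xi1, u), q(xi2, u)
lower = (xi2.min() / xi1.max()) * a  # (xi2^- / xi1^+) * ||u||^2_{H10,xi1}
upper = (xi2.max() / xi1.min()) * a  # (xi2^+ / xi1^-) * ||u||^2_{H10,xi1}
assert lower <= b <= upper
```

The bounds hold cell by cell, exactly as in the proof: the discrete sum simply replaces the integral.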
We denote by $V$ the Hilbert space $H^1_0(\mathcal{D}; \mathbb{R})$, noting that, as a consequence of the preceding lemma, we may use $q_\xi(u,w)$ as the inner product on this space for any $\xi$ satisfying (1.5) and (1.6). We also define
$$\mathcal{Z} = L^\infty(\mathcal{T}; L^2(\mathcal{D}; \mathbb{R})), \qquad \mathcal{Z}_2 = L^2(\mathcal{T}; L^2(\mathcal{D}; \mathbb{R}))$$
with norms
$$\|r\|_{\mathcal{Z}} = \operatorname*{ess\,sup}_{t \in \mathcal{T}} \|r(\cdot,t)\|, \qquad \|r\|_{\mathcal{Z}_2} = \Bigl( \int_0^T \|r(\cdot,t)\|^2\, dt \Bigr)^{\frac{1}{2}}.$$
We note that $\mathcal{Z}$ is continuously embedded into $\mathcal{Z}_2$.
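The embedding follows from $\|r\|_{\mathcal{Z}_2} \leq \sqrt{T}\, \|r\|_{\mathcal{Z}}$, which can be seen on a grid as well. A minimal sketch, with illustrative grid sizes and an illustrative function $r(x,t)$:

```python
import numpy as np

# Discrete illustration of the continuous embedding Z into Z_2:
# the sup-in-time norm controls the L^2-in-time norm via sqrt(T).
L_len, T_len = 1.0, 2.0
Nx, Nt = 200, 400
x = np.linspace(0, L_len, Nx)
t = np.linspace(0, T_len, Nt)
dx, dt = x[1] - x[0], t[1] - t[0]

r = np.outer(np.sin(2 * np.pi * x), np.cos(t))   # r(x, t) sampled on the grid

space_norm = np.sqrt(np.sum(r**2, axis=0) * dx)  # ||r(., t)|| for each t
norm_Z = space_norm.max()                        # discrete ||r||_Z
norm_Z2 = np.sqrt(np.sum(space_norm**2) * dt)    # discrete ||r||_{Z_2}
assert norm_Z2 <= np.sqrt(T_len) * norm_Z + 1e-12
```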
2. One-dimensional Kelvin--Voigt viscoelasticity. This paper is focused on one-dimensional KV viscoelasticity because the model is amenable to rigorous analysis. The resulting analysis sheds light on the learning of constitutive models more generally. In section 2.1 we present the equations for the model in an informal fashion and in a weak form suitable for analysis; in section 2.2 we homogenize the model and define the operator that defines the effective constitutive model.
2.1. Governing equations and weak form. The one-dimensional KV model for viscoelasticity postulates that stress is affine in the strain and strain rate, with the affine transformation dependent on the (typically spatially varying) material properties. For a multiscale material varying with respect to $x/\epsilon$ we thus have the following definition of $\Psi^\dagger_\epsilon$ from (1.1), in the one-dimensional KV model:
$$\sigma_\epsilon = E_\epsilon\,\partial_x u_\epsilon + \nu_\epsilon\,\partial^2_{xt} u_\epsilon,$$
where $E_\epsilon(x) = E(x/\epsilon)$ and $\nu_\epsilon(x) = \nu(x/\epsilon)$ are the rapidly varying material elasticity and viscosity, respectively. Both $E$ and $\nu$ are assumed to be 1-periodic. Then equations (1.1) without inertia ($\rho \equiv 0$) become
$$-\partial_x\bigl(E_\epsilon\,\partial_x u_\epsilon + \nu_\epsilon\,\partial^2_{xt} u_\epsilon\bigr) = f, \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{2.1a}$$
$$u_\epsilon(0,x) = u_\ast, \quad \partial_t u_\epsilon(0,x) = v_\ast, \qquad x \in \mathcal{D}, \tag{2.1b}$$
$$u_\epsilon(x,t) = 0, \qquad x \in \partial\mathcal{D}. \tag{2.1c}$$
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
Any classical solution to equations (2.1) will also solve the corresponding weak form: find $u_\epsilon \in \mathcal{C}(\mathcal{T};V)$ such that
$$q_{\nu_\epsilon}(\partial_t u_\epsilon, \varphi) + q_{E_\epsilon}(u_\epsilon, \varphi) = \langle f, \varphi\rangle \tag{2.2}$$
for all test functions $\varphi \in V$.
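Since the inertia-free system (2.1) is linear, its weak form can be discretized directly. The sketch below is a minimal illustration, not the paper's solver: it assumes piecewise-linear finite elements on a uniform grid, material coefficients sampled at element midpoints, zero initial displacement, backward Euler in time, and function names of our own choosing.

```python
import numpy as np

def kv_quasistatic_solve(E, nu, f, n=200, T=1.0, nt=1000):
    """Sketch: solve -d/dx(E u_x + nu u_xt) = f on (0,1) with u = 0 at the
    boundary and u(x,0) = 0, via piecewise-linear finite elements in x and
    backward Euler in time. Semidiscrete form: K_nu du/dt + K_E u = F."""
    h = 1.0 / n
    x_mid = (np.arange(n) + 0.5) * h               # element midpoints
    def stiffness(coef):
        c = coef(x_mid)
        K = np.zeros((n + 1, n + 1))
        for e in range(n):                         # element-by-element assembly
            K[e:e + 2, e:e + 2] += (c[e] / h) * np.array([[1.0, -1.0],
                                                          [-1.0, 1.0]])
        return K[1:-1, 1:-1]                       # homogeneous Dirichlet BCs
    KE, Knu = stiffness(E), stiffness(nu)
    xn = np.linspace(0.0, 1.0, n + 1)[1:-1]        # interior nodes
    F = h * f(xn)                                  # lumped load vector
    u = np.zeros(n - 1)
    dt = T / nt
    M = Knu + dt * KE                              # backward Euler matrix
    for _ in range(nt):
        u = np.linalg.solve(M, Knu @ u + dt * F)
    return xn, u
```

With $E \equiv \nu \equiv 1$ and $f \equiv 1$ the solution relaxes to the elliptic steady state $x(1-x)/2$, whose maximum is $0.125$; this provides a quick sanity check.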
2.2. Homogenization. In the inertia-free setting $\rho = 0$ we perform homogenization to eliminate the dependence on the small scale $\epsilon$ in (2.1). First, we take the Laplace transform of (2.1), which, for Laplace parameter $s$ and with the hat symbol denoting the Laplace transform, gives
$$-\partial_x\bigl((E_\epsilon + \nu_\epsilon s)\,\partial_x \widehat{u}_\epsilon\bigr) = \widehat{f}, \quad x \in \mathcal{D}, \qquad \widehat{u}_\epsilon(x,s) = 0, \quad x \in \partial\mathcal{D}.$$
The initial condition $u_\epsilon(0,x) = u_\ast$ is applied upon Laplace inversion. Since $\epsilon \ll 1$, we may apply standard techniques from multiscale analysis [3, 28] and seek a solution in the form
$$\widehat{u}_\epsilon = \widehat{u}_0 + \epsilon\,\widehat{u}_1 + \epsilon^2\,\widehat{u}_2 + \cdots.$$
For convenience, define $\widehat{a}(s,y) = E(y) + \nu(y)\,s$. Note that $\widehat{a}(s,\cdot)$ is 1-periodic. The leading order term in our approximation, $\widehat{u}_0$, solves the following uniformly elliptic PDE with Dirichlet boundary conditions:
$$-\partial_x\bigl(\widehat{a}_0(s)\,\partial_x \widehat{u}_0\bigr) = \widehat{f} \quad \text{for } x \in \mathcal{D}, \tag{2.3a}$$
$$\widehat{u}_0 = 0 \quad \text{for } x \in \partial\mathcal{D}. \tag{2.3b}$$
Here the coefficient $\widehat{a}_0$ is given by
$$\widehat{a}_0(s) = \int_0^1 \bigl(\widehat{a}(s,y) + \widehat{a}(s,y)\,\partial_y \chi(y)\bigr)\,dy,$$
and $\chi : [0,1] \to \mathbb{R}$ satisfies the cell problem
$$-\partial_y\bigl(\widehat{a}(s,y)\,\partial_y \chi(y)\bigr) = \partial_y \widehat{a}(s,y), \tag{2.4a}$$
$$\chi \text{ is 1-periodic}, \qquad \int_0^1 \chi(y)\,dy = 0. \tag{2.4b}$$
Using this, the coefficient $\widehat{a}_0$ can be computed explicitly as the harmonic average of the original coefficient $\widehat{a}$ [28]:
$$\widehat{a}_0(s) = \bigl\langle \widehat{a}(s,y)^{-1} \bigr\rangle^{-1} = \biggl(\int_0^1 \frac{dy}{s\,\nu(y) + E(y)}\biggr)^{-1}, \tag{2.5}$$
where $\langle\,\cdot\,\rangle$ denotes spatial averaging over the unit cell.
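In one dimension the cell problem (2.4) can be solved directly and compared against the harmonic-average formula (2.5). The sketch below is a finite-volume discretization on a periodic grid; the function names, and the two-piece material values used in any experiment, are illustrative choices of our own.

```python
import numpy as np

def a0_via_cell_problem(E, nu, s, m=400):
    """Sketch: solve the 1-periodic cell problem (2.4) by finite volumes and
    return a0(s) = int a(s,y)(1 + chi'(y)) dy, with a(s,y) = E(y) + s*nu(y).
    Note -d/dy(a chi') = da/dy is equivalent to d/dy( a (chi' + 1) ) = 0."""
    h = 1.0 / m
    y_face = (np.arange(m) + 0.5) * h         # faces of the periodic cells
    a = E(y_face) + s * nu(y_face)            # coefficient on each face
    A = np.zeros((m, m)); b = np.zeros(m)     # unknowns chi_j at nodes j*h
    for j in range(m):                        # flux balance F_j = F_{j-1}
        A[j, j] += (a[j] + a[j - 1]) / h
        A[j, (j + 1) % m] -= a[j] / h
        A[j, j - 1] -= a[j - 1] / h
        b[j] = a[j] - a[j - 1]
    # pin chi_0 = 0; like the zero-mean condition, this fixes the constant
    A[0, :] = 0.0; A[0, 0] = 1.0; b[0] = 0.0
    chi = np.linalg.solve(A, b)
    dchi = (np.roll(chi, -1) - chi) / h       # chi' on faces (periodic wrap)
    return np.sum(a * (1.0 + dchi)) * h

def a0_harmonic(E, nu, s, m=400):
    """Harmonic-average formula (2.5), by midpoint quadrature."""
    y = (np.arange(m) + 0.5) / m
    return 1.0 / np.mean(1.0 / (E(y) + s * nu(y)))
```

Since only $\partial_y\chi$ enters $\widehat{a}_0$, pinning $\chi_0$ instead of enforcing the zero-mean condition does not change the computed coefficient; the two routines should agree to solver precision.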
Equations (2.3) indicate that the homogenized map $\Psi^\dagger_0$ appearing in (1.2) is, for one-dimensional linear viscoelasticity, defined from
$$\Psi^\dagger_0\bigl(\partial_x u_0(x,t),\, \partial^2_{xt} u_0(x,t),\, \{\partial_x u_0(x,\tau)\}_{\tau \in \mathcal{T}},\, t\bigr) = \mathcal{L}^{-1}\bigl(\widehat{a}_0(s)\,\partial_x \widehat{u}_0\bigr); \tag{2.6}$$
here $\mathcal{L}^{-1}$ denotes the inverse Laplace transform. Note that (2.5) shows that $\widehat{a}_0$ grows linearly as $s \to \infty$, and computing the constant term in a regular power series expansion at $s = \infty$ shows that we may write
$$\widehat{a}_0 = \nu'\,s + E' + \widehat{\kappa}(s),$$
where $\widehat{\kappa}(s)$ decays to $0$ as $s \to \infty$. Here
$$\nu' = \biggl(\int_0^1 \frac{1}{\nu(y)}\,dy\biggr)^{-1}, \qquad E' = \biggl(\int_0^1 \frac{E(y)}{\nu(y)^2}\,dy\biggr) \bigg/ \biggl(\int_0^1 \frac{1}{\nu(y)}\,dy\biggr)^2.$$
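Both coefficients are plain quadratures, so the decay of $\widehat{\kappa}$ is easy to observe numerically. A sketch using midpoint quadrature follows; the two-piece material values are invented purely for illustration.

```python
import numpy as np

# Sketch: check the expansion a0_hat(s) = nu'*s + E' + kappa_hat(s)
# for a sample two-piece material (the material values below are made up).
y = (np.arange(100000) + 0.5) / 100000            # midpoint quadrature nodes
E = np.where(y < 0.4, 1.0, 4.0)                   # elasticity E(y)
nu = np.where(y < 0.4, 0.5, 1.5)                  # viscosity nu(y)

nu_prime = 1.0 / np.mean(1.0 / nu)                # nu' = (int 1/nu dy)^{-1}
E_prime = np.mean(E / nu ** 2) / np.mean(1.0 / nu) ** 2

def a0_hat(s):
    """Harmonic average (2.5) by midpoint quadrature."""
    return 1.0 / np.mean(1.0 / (s * nu + E))

# kappa_hat(s) = a0_hat(s) - nu'*s - E' should decay to 0 as s grows
kappas = [abs(a0_hat(s) - nu_prime * s - E_prime) for s in (1e2, 1e3, 1e4)]
```

For this material the residual $|\widehat{\kappa}(s)|$ shrinks roughly like $1/s$, consistent with $\widehat{\kappa}$ vanishing at $s = \infty$.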
Details are presented in Appendix B.1. Laplace inversion of $\widehat{a}_0(s)\,\partial_x \widehat{u}_0$ then yields the conclusion that
$$\Psi^\dagger_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr) = E'\,\partial_x u_0(t) + \nu'\,\partial^2_{xt} u_0(t) + \int_0^t \kappa(t-\tau)\,\partial_x u_0(\tau)\,d\tau. \tag{2.7}$$
Remark 2.1. When $\rho = 0$, the homogenized solution provably approximates $u_\epsilon$ in the $\epsilon \to 0$ limit; see Theorem 3.7. However, although we derived it with inertia set to zero, the homogenized solution given by (2.8) is also valid when the inertia term $\rho\,\partial_t^2 u_\epsilon$ generates contributions which are $\mathcal{O}(1)$ with respect to $\epsilon$.
The homogenized PDE for one-dimensional viscoelasticity follows by combining equations (1.2) with (2.7) to give
$$\rho\,\partial_t^2 u_0 = \nabla\cdot\sigma_0 + f, \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{2.8a}$$
$$\sigma_0(t) = E'\,\partial_x u_0(t) + \nu'\,\partial^2_{xt} u_0(t) + \int_0^t \kappa(t-\tau)\,\partial_x u_0(\tau)\,d\tau, \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{2.8b}$$
$$u_0 = u_\ast, \quad \partial_t u_0 = v_\ast, \qquad (x,t) \in \mathcal{D}\times\{0\}, \tag{2.8c}$$
$$u_0 = 0, \qquad (x,t) \in \partial\mathcal{D}\times\mathcal{T}. \tag{2.8d}$$
The price paid for homogenization is dependence on the strain history. We will
show in the next section, however, that we can approximate the general homogenized
map with one in which the history dependence is expressed in a Markovian manner.
3. Main theorems: Statement and interpretation. In this section we present theoretical results of three types. First, in subsection 3.1, we show that the solution $u_\epsilon$ to (2.1) is Lipschitz when viewed as a mapping from the unit cell material properties $E(\cdot), \nu(\cdot)$ in $L^\infty$ into $\mathcal{Z}$; hence, an $\mathcal{O}(\delta)$ approximation of $E, \nu$ by piecewise-constant functions leads to an $\mathcal{O}(\delta)$ approximation of $u_\epsilon$. Second, in subsection 3.2, we demonstrate that the homogenized model based on piecewise-constant material properties can be represented in a Markovian fashion by introducing hidden variables; hence, combining with the first point, we have a mechanism to approximate $u_\epsilon$ by solving a Markovian homogenized model. Third, in subsection 3.3, we show the existence of neural networks which provide arbitrarily good approximation of the constitutive law arising in the Markovian homogenized model; this suggests a model class within which to learn homogenized, Markovian constitutive models from data. Subsection 3.4 establishes our framework for the optimization methods used to learn such constitutive models; this framework is employed in section 4.
Assumptions 3.1. We will make the following assumptions on $E$, $\nu$, and $f$ throughout:
1. $f \in L^2(\mathcal{D};\mathbb{R})$ for all $t \in \mathcal{T}$; thus $\|f\|_{\mathcal{Z}} < \infty$;
2. $E^+, \nu^+ < \infty$, and $E^-, \nu^- > 0$.
Note that $E^+ = E^+_\epsilon$ and $\nu^+ = \nu^+_\epsilon$, so we drop the $\epsilon$ superscript in this notation.
3.1. Approximation by piecewise-constant material. Consider (2.1) with continuous material properties $E$ and $\nu$. We show in Theorem 3.4 that we can approximate the solution $u_\epsilon$ of this system by a solution $u_\epsilon^{PC}$ which solves (2.1) with suitable piecewise-constant material properties $E^{PC}$ and $\nu^{PC}$, in such a way that $u_\epsilon$ and $u_\epsilon^{PC}$ are close. To this end we make precise the definition of piecewise-constant material properties.
Definition 3.2 (piecewise constant). A material is piecewise constant on the unit cell with $L$ pieces if the elasticity function $E(y)$ and the viscosity function $\nu(y)$ both take constant values on the $L$ intervals $[0,a_1), [a_1,a_2), \ldots, [a_{L-1},1]$. In particular, $E(y)$ and $\nu(y)$ have discontinuities only at the same $L-1$ points in the unit cell. We use the terminology $L$-piecewise constant to specify the number of pieces.
Remark 3.3. The situation in which $E(y)$ and $\nu(y)$ have discontinuities at different values of $y \in (0,1)$ can be reduced to the case in Definition 3.2 by increasing the value of $L$.
Theorem 3.4 (piecewise-constant approximation). Let $E$ and $\nu$ be piecewise-continuous functions, with a finite number of discontinuities, satisfying Assumptions 3.1; let $u_\epsilon$ be the corresponding solution to (2.1). Then, for any $\delta > 0$, there exist piecewise-constant $E^{PC}$ and $\nu^{PC}$ (in the sense of Definition 3.2) such that the solution $u_\epsilon^{PC}$ of equations (2.1) with these material properties satisfies
$$\|u_\epsilon^{PC} - u_\epsilon\|_{\mathcal{Z}} < \delta.$$
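A concrete recipe in the spirit of Theorem 3.4 is to sample the continuous material properties on a uniform partition of the unit cell. A minimal sketch follows (midpoint sampling; the function name and the sample elasticity are our own illustrations): for a Lipschitz material property the sup-norm error then decays like $1/L$.

```python
import numpy as np

def piecewise_constant_approx(g, L):
    """Sketch: L-piecewise-constant approximation of a material property g
    on the unit cell, using the value of g at each interval midpoint."""
    vals = g((np.arange(L) + 0.5) / L)
    def g_pc(y):
        # map y in [0,1] to its interval index, clipping y = 1 into piece L-1
        idx = np.minimum((np.asarray(y) * L).astype(int), L - 1)
        return vals[idx]
    return g_pc

# sup-norm error decays like 1/L for a Lipschitz material property
E = lambda y: 2.0 + np.sin(2 * np.pi * y)         # sample smooth elasticity
yy = np.linspace(0.0, 1.0, 20001)
err = lambda L: np.max(np.abs(E(yy) - piecewise_constant_approx(E, L)(yy)))
```

Combined with Lemma 3.5, an $\mathcal{O}(1/L)$ sup-norm error in $(E,\nu)$ translates into an $\mathcal{O}(1/L)$ error in the solution, which is how $\delta$ can be controlled in practice.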
Note that Theorem 3.4 is stated in the setting of no inertia. The proof depends on the following lemma; proofs of both the theorem and the lemma may be found in Appendix A.1. We observe that, since the Lipschitz result is in the $L^\infty$ norm with respect to the material properties, it holds with a constant $C$ independent of $\epsilon$ in the case of interest where the material properties vary rapidly on scale $\epsilon$.
Lemma 3.5 (Lipschitz solution). Let $u_i$ be the solution to
$$-\partial_x\bigl(E_i\,\partial_x u_i + \nu_i\,\partial^2_{xt} u_i\bigr) = f, \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.1}$$
$$u_i(x,t) = u_\ast, \qquad (x,t) \in \mathcal{D}\times\{0\}, \tag{3.2}$$
$$u_i(x,t) = 0, \qquad (x,t) \in \partial\mathcal{D}\times\mathcal{T}, \tag{3.3}$$
associated with material properties $E_i, \nu_i$ for $i \in \{1,2\}$, and forcing $f$, all satisfying Assumptions 3.1. Then
$$\|u_1 - u_2\|_{\mathcal{Z}} \leq C\,\bigl(\|\nu_1 - \nu_2\|_\infty + \|E_1 - E_2\|_\infty\bigr)$$
for some constant $C \in \mathbb{R}^+$ dependent on $f, E_i^+, E_i^-, \nu_i^+, \nu_i^-$, and $L$, and independent of $\epsilon$.
3.2. Homogenization for piecewise-constant material. We show in Theorem 3.6 that for piecewise-constant material properties $E(\cdot)$ and $\nu(\cdot)$, the homogenized map $\Psi^\dagger_0$ given in (2.7) can be written explicitly with a finite number of parameters, and in particular the memory is expressible in a Markovian form. This Markovian form implicitly defines a finite number of hidden variables.
Theorem 3.6 (existence of exact parametrization). Let $\Psi^\dagger_0$ be the map from strain history to stress in the homogenized model, as defined by (2.7), in a piecewise-constant material with $L' + 1$ pieces. Define $\Psi^{PC}_0 : \mathbb{R}^2 \times C(\mathcal{T};\mathbb{R}) \times \mathcal{T} \times \Theta \to \mathbb{R}$ by
$$\Psi^{PC}_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t;\, \theta\bigr) = E_0\,\partial_x u_0(t) + \nu_0\,\partial^2_{xt} u_0(t) - \sum_{\ell=1}^{L_0} \xi_\ell(t), \tag{3.4a}$$
$$\partial_t \xi_\ell(t) = \beta_\ell\,\partial_x u_0(t) - \alpha_\ell\,\xi_\ell(t), \qquad \xi_\ell(0) = 0, \qquad \ell \in \{1,\ldots,L_0\}, \tag{3.4b}$$
with parameter space
$$\Theta = \bigl(E_0 \in \mathbb{R}^+,\ \nu_0 \in \mathbb{R}^+,\ L_0 \in \mathbb{Z}^+,\ \alpha_0 \in \mathbb{R}^{L_0}_+,\ \beta_0 \in \mathbb{R}^{L_0}\bigr). \tag{3.5}$$
Then, under Assumptions 3.1, there exists $\theta_\ast \in \Theta$ with $(E_0, \nu_0, L_0, \alpha_0, \beta_0) = (E', \nu', L', \alpha, \beta)$ such that
$$\Psi^\dagger_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr) = \Psi^{PC}_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t;\, \theta_\ast\bigr)$$
for all $u_0 \in \mathcal{C}^2(\mathcal{D}\times\mathcal{T};\mathbb{R})$ and $t \in \mathcal{T}$.
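The hidden-variable system (3.4) can be integrated directly; this is also how stress trajectories are produced from the parametrized model in practice. A sketch follows (forward Euler; the function name, and the parameter values used in any experiment, are illustrative choices of our own):

```python
import numpy as np

def stress_markovian(strain, dt, E0, nu0, alpha, beta):
    """Sketch: evaluate the Markovian model (3.4) by forward Euler.
    strain: samples of the strain at times 0, dt, 2*dt, ...;
    alpha, beta: arrays of length L0. Returns the stress trajectory."""
    strain = np.asarray(strain, dtype=float)
    strain_rate = np.gradient(strain, dt)     # discrete strain rate
    xi = np.zeros(len(alpha))                 # hidden variables, xi_l(0) = 0
    sigma = np.zeros_like(strain)
    for k in range(len(strain)):
        sigma[k] = E0 * strain[k] + nu0 * strain_rate[k] - xi.sum()  # (3.4a)
        xi = xi + dt * (beta * strain[k] - alpha * xi)               # (3.4b)
    return sigma
```

For a constant strain $b$, each hidden variable relaxes to $\beta_\ell b / \alpha_\ell$, so the stress approaches $(E_0 - \sum_\ell \beta_\ell/\alpha_\ell)\,b$; this gives a simple check on the implementation.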
The proof of the above theorem may be found in Appendix A.2. The parameters $E_0$, $\nu_0$, $\alpha_0$, and $\beta_0$ are determined via an appropriate decomposition of $\widehat{a}_0$ in (2.5); details are in the proof. In particular, $E_0$ and $\nu_0$ are homogenized elasticity and viscosity coefficients, respectively, $\alpha_0$ are decay rates for the hidden variables $\xi$, and $\beta_0$ are coefficients for each decay term. Note that the model in equations (3.4) is Markovian. Furthermore, although the model in (3.4) requires an input of $t$ for evaluation, the spatial variable $x$ enters only implicitly through the local values of $\partial_x u_0$ and $\partial^2_{xt} u_0$; the model acts pointwise in space. Thus, for economy of notation, we have not included $x$ explicitly in the theorem statement. In what follows it is useful to define $u_0^{PC}$ to be the solution of the following system defined with constitutive model $\Psi^{PC}_0$:
$$\rho\,\partial_t^2 u_0^{PC} - \partial_x \sigma_0 = f, \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.6a}$$
$$\sigma_0(t) = \Psi^{PC}_0\bigl(\partial_x u_0^{PC}(t),\, \partial^2_{xt} u_0^{PC}(t),\, \{\partial_x u_0^{PC}(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr), \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.6b}$$
$$u_0^{PC}|_{t=0} = u_\ast, \quad \partial_t u_0^{PC}|_{t=0} = v_\ast, \qquad (x,t) \in \mathcal{D}\times\{0\}, \tag{3.6c}$$
$$u_0^{PC} = 0, \qquad (x,t) \in \partial\mathcal{D}\times\mathcal{T}. \tag{3.6d}$$
Using a homogenization theorem, together with approximation by piecewise-constant material properties, we now show that $u_\epsilon$ can be approximated by $u_0^{PC}$; this will follow from the inequality
$$\|u_\epsilon - u_0^{PC}\|_{\mathcal{Z}_2} \leq \|u_\epsilon - u_\epsilon^{PC}\|_{\mathcal{Z}_2} + \|u_\epsilon^{PC} - u_0^{PC}\|_{\mathcal{Z}_2}.$$
The first term on the right-hand side may be controlled using Theorem 3.4. The fact that dynamics under the constitutive law $\Psi^\dagger_\epsilon$ converge to those under $\Psi^\dagger_0$ as $\epsilon \to 0$ may be used to control the second term; this fact is a consequence of the following theorem.
Theorem 3.7. Under Assumptions 3.1, the solution $u_\epsilon$ to equations (2.1) converges weakly to $u_0$, the solution to equations (2.8) with $\rho = 0$, in $W^{1,2}(\mathcal{T};V)$. Thus, for any $\eta > 0$ there exists $\epsilon_{\mathrm{crit}} > 0$ such that for all $\epsilon \in (0, \epsilon_{\mathrm{crit}})$,
$$\|u_\epsilon - u_0\|_{\mathcal{Z}_2} < \eta. \tag{3.7}$$
Proof. Since $f \in \mathcal{Z}$, continuous embedding gives $f \in \mathcal{Z}_2$. Applying Theorem 3.1 of [11] (noting that the work in that paper is set in dimension $d = 3$, but is readily extended to dimension $d = 1$) establishes weak convergence of $u_\epsilon$ to $u_0$ in $W^{1,2}(\mathcal{T};V)$. Hence strong convergence in $\mathcal{Z}_2$ follows by compact embedding of $W^{1,2}(\mathcal{T};V)$ into $\mathcal{Z}_2$.
The following corollary is a consequence of Theorem 3.7.
Corollary 3.8. Under Assumptions 3.1 and assuming $E, \nu$ are piecewise constant, the solution $u_\epsilon^{PC}$ to equations (2.1) converges weakly to $u_0^{PC}$, the solution to equations (3.6) with $\rho = 0$, in $W^{1,2}(\mathcal{T};V)$. Thus, for any $\eta > 0$ there exists $\epsilon_{\mathrm{crit}} > 0$ such that for all $\epsilon \in (0, \epsilon_{\mathrm{crit}})$,
$$\|u_\epsilon^{PC} - u_0^{PC}\|_{\mathcal{Z}_2} < \eta. \tag{3.8}$$
Combining this result with that of Theorem 3.4, noting the continuous embedding of $\mathcal{Z}$ into $\mathcal{Z}_2$, allows us to approximate $u_\epsilon$ by $u_0^{PC}$:
Corollary 3.9. Let $E$ and $\nu$ be piecewise-continuous functions, with a finite number of discontinuities, satisfying, along with $f$, Assumptions 3.1; let $u_\epsilon$ be the corresponding solution to (2.1). Then for any $\eta > 0$, there exist $L_{\mathrm{crit}}$ and $\epsilon_{\mathrm{crit}}$ with the property that for all $L \geq L_{\mathrm{crit}}$ there are $L$-piecewise-constant $E^{PC}$ and $\nu^{PC}$ such that for all $\epsilon \in (0, \epsilon_{\mathrm{crit}})$, the solution $u_0^{PC}$ to (3.6) with $\rho = 0$ satisfies
$$\|u_\epsilon - u_0^{PC}\|_{\mathcal{Z}_2} < \eta. \tag{3.9}$$
3.3. Neural network approximation of constitutive model. For the specific KV model in dimension $d = 1$ we know the postulated form of $\Psi^{PC}_0$ and can in principle use this directly as a constitutive model. However, in more complex problems we do not know the constitutive model analytically, and it is then desirable to learn it from data from within an expressive model class. To this end we demonstrate that $\Psi^{PC}_0$ can be approximated by an operator $\Psi^{RNN}_0$ which has a similar form to that defined by equations (3.4) but in which the right-hand sides of those equations are represented by neural networks, leading to an RNN structure. Note that with this structure, the neural network outputs the stress at a single point in space and time; in practice, repeated evaluation generates output stress trajectories from the spatio-temporal dynamics. As such, the architecture is not the same as standard long short-term memory (LSTM) RNN models. Instead, the feed-forward neural network $\mathcal{G}$ produces time derivatives of the hidden variables $\xi$, which are used in a forward Euler step to generate the updated hidden variable value. In this manner, the model acts pointwise but incorporates memory through a hidden variable. Since the model acts pointwise, the feed-forward networks $\mathcal{F}$ and $\mathcal{G}$ are the same at every time-step, justifying the "recurrent" terminology.
Recall the definitions of $(E', \nu', L')$ and $\theta_\ast$ in Theorem 3.6. We first define the linear functions $\mathcal{F}^{PC} : \mathbb{R}\times\mathbb{R}\times\mathbb{R}^{L'} \to \mathbb{R}$ and $\mathcal{G}^{PC} : \mathbb{R}^{L'}\times\mathbb{R} \to \mathbb{R}^{L'}$ by
$$\mathcal{F}^{PC}(b, c, r) = E'\,b + \nu'\,c - \langle 1, r\rangle, \tag{3.10a}$$
$$\mathcal{G}^{PC}(r, b) = -Ar + \beta\,b, \tag{3.10b}$$
where $A = \mathrm{diag}(\alpha_\ell) \in \mathbb{R}^{L'\times L'}$ and $\beta = \{\beta_1, \ldots, \beta_{L'}\} \in \mathbb{R}^{L'}$. We then have
$$\Psi^{PC}_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t;\, \theta_\ast\bigr) = \mathcal{F}^{PC}\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \xi(t)\bigr), \tag{3.11a}$$
$$\dot{\xi}(t) = \mathcal{G}^{PC}\bigl(\xi(t),\, \partial_x u_0(t)\bigr), \qquad \xi(0) = 0, \tag{3.11b}$$
as in Theorem 3.6.
We seek to approximate this map by $\Psi^{RNN}_0$, defined by replacing the linear functions $\mathcal{F}^{PC}$ and $\mathcal{G}^{PC}$ by neural networks $\mathcal{F}^{RNN} : \mathbb{R}\times\mathbb{R}\times\mathbb{R}^{L'} \to \mathbb{R}$ and $\mathcal{G}^{RNN} : \mathbb{R}^{L'}\times\mathbb{R} \to \mathbb{R}^{L'}$ to obtain
$$\Psi^{RNN}_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr) = \mathcal{F}^{RNN}\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \xi(t)\bigr), \tag{3.12a}$$
$$\dot{\xi}(t) = \mathcal{G}^{RNN}\bigl(\xi(t),\, \partial_x u_0(t)\bigr), \qquad \xi(0) = 0. \tag{3.12b}$$
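The structure of (3.12) can be made concrete with small feed-forward networks and a forward-Euler update of the hidden state. The sketch below uses untrained, randomly initialized networks purely to illustrate the rollout; in practice the weights would be fit by the optimization of subsection 3.4. All names, sizes, and initialization choices here are our own, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
L_hidden, width = 2, 16                       # illustrative sizes

def mlp(sizes):
    """A small random feed-forward network; training would fit its weights."""
    params = [(rng.normal(scale=0.3, size=(m, n)), np.zeros(m))
              for n, m in zip(sizes[:-1], sizes[1:])]
    def net(x):
        for i, (W, bias) in enumerate(params):
            x = W @ x + bias
            if i < len(params) - 1:
                x = np.tanh(x)                # smooth hidden activation
        return x
    return net

F_rnn = mlp([2 + L_hidden, width, 1])         # (b, c, xi) -> stress
G_rnn = mlp([1 + L_hidden, width, L_hidden])  # (b, xi) -> dxi/dt

def rollout(strain, strain_rate, dt):
    """Sketch of (3.12): forward-Euler update of the hidden state xi,
    with the same networks applied at every time step."""
    xi = np.zeros(L_hidden)                   # xi(0) = 0
    out = []
    for b, c in zip(strain, strain_rate):
        out.append(F_rnn(np.concatenate(([b, c], xi)))[0])
        xi = xi + dt * G_rnn(np.concatenate(([b], xi)))
    return np.array(out)
```

Because the same $\mathcal{F}^{RNN}$ and $\mathcal{G}^{RNN}$ act at every step, the rollout is recurrent even though each network is a plain feed-forward map.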
Let $R > 0$ and define the bounded set
$$\mathsf{Z}_R = \bigl\{ w : \mathbb{R}^+ \to \mathbb{R} \,\bigm|\, \textstyle\sup_{t \in \mathcal{T}} |w(t)| \leq R \bigr\}.$$
Theorem 3.10 (RNN approximation). Consider $\Psi^{PC}_0$ defined by (3.10) and (3.11). Assume that there exist $\rho > 0$ and $0 \leq B < \infty$ such that $\rho < \min_\ell |\alpha_\ell|$ and $\max_\ell |\beta_\ell| \leq B$. Then, under Assumptions 3.1, for every $\eta > 0$ there exists $\Psi^{RNN}_0$ of the form (3.12) such that
$$\sup_{t \in \mathcal{T},\; b,c \in \mathsf{Z}_R} \bigl|\Psi^{PC}_0\bigl(b(t), c(t), \{b(\tau)\}_{\tau \in \mathcal{T}}, t;\, \theta_\ast\bigr) - \Psi^{RNN}_0\bigl(b(t), c(t), \{b(\tau)\}_{\tau \in \mathcal{T}}, t\bigr)\bigr| < \eta.$$
The proof of Theorem 3.10 can be found in Appendix A.3.
Note that $\Psi^{RNN}_0$ both avoids dependence on the fine scale $\epsilon$ and is Markovian. The nonhomogenized map $\Psi^\dagger_\epsilon$ is local in time, while the homogenized map $\Psi^{RNN}_0$ is nonlocal in time and depends on the strain history. Let $u_0^{RNN}$ be the solution to the following system with constitutive model $\Psi^{RNN}_0$:
$$\rho\,\partial_t^2 u_0^{RNN} - \partial_x \sigma_0 = f, \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.13a}$$
$$\sigma_0(t) = \Psi^{RNN}_0\bigl(\partial_x u_0^{RNN}(t),\, \partial^2_{xt} u_0^{RNN}(t),\, \{\partial_x u_0^{RNN}(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr), \qquad (x,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.13b}$$
$$u_0^{RNN}|_{t=0} = u_\ast, \quad \partial_t u_0^{RNN}|_{t=0} = v_\ast, \qquad (x,t) \in \mathcal{D}\times\{0\}, \tag{3.13c}$$
$$u_0^{RNN} = 0, \qquad (x,t) \in \partial\mathcal{D}\times\mathcal{T}. \tag{3.13d}$$
Ideally we would like an approximation result bounding $\|u_\epsilon - u_0^{RNN}\|_{\mathcal{Z}_2}$, the difference between the solution of the multiscale problem (2.1) and the Markovian RNN model (3.13), in the case $\rho = 0$. Using Corollary 3.9 shows that this would follow from a bound on $\|u_0^{PC} - u_0^{RNN}\|_{\mathcal{Z}}$, where $u_0^{PC}$ solves (3.6), in the case $\rho = 0$. We note,
however, that although Theorem 3.10 gives us an approximation result between $\Psi^{PC}_0$ and $\Psi^{RNN}_0$, proving that $u_0^{PC}$ and $u_0^{RNN}$ are close requires developing a new theory for the fully nonlinear PDE for $u_0^{RNN}$; developing such a theory is beyond the scope of this work and is difficult for two primary reasons: (i) the monotonicity property of $\Psi^{RNN}_0$ with respect to strain rate is hard to establish globally for a trained model; (ii) the functions $\mathcal{F}^{RNN}, \mathcal{G}^{RNN}$ may not be differentiable. As a result, existence and uniqueness of $u_0^{RNN}$ remain unproven; however, numerical experiments in section 4 indicate that in practice, $u_0^{RNN}$ does approximate $u_\epsilon$ well.
Remark 3.11.
• Monotonicity of $\Psi^{RNN}_0$ with respect to strain rate is a particular issue when $\rho = 0$ (no inertia), as in this case it is needed to define an (implicit) equation for $\partial_t u_0$ to determine the dynamics. It is for this reason that our experiments will all be conducted with $\rho > 0$, obviating the need for the determination of an (implicit) equation for $\partial_t u_0$. However, this leads to the issue that the homogenized equation is only valid for a subset of initial conditions in the inertial setting $\rho > 0$; see Remark 2.1.
• In practice we will train $\Psi^{RNN}_0$ using data for the inputs $b, c$ which are obtained from a random function, or set of realizations of random functions. The choice of the probability measure from which this training data is drawn will affect the performance of the learned model when $\Psi^{RNN}_0$ is evaluated at $b = \partial_x u_0^{RNN}$ and $c = \partial^2_{xt} u_0^{RNN}$, for displacement $u$ generated by the model given by (3.13).
3.4. RNN optimization. In the following section, we present numerical results using a trained RNN as a surrogate model: an efficient approximation of the complex microscale dynamics; in this section we discuss the problem of finding such an RNN. To learn the RNN operator approximation, we are given data
$$\bigl\{ (\partial_x u_0)_n,\ (\partial^2_{xt} u_0)_n,\ (\sigma_0)_n \bigr\}_{n=1}^N, \tag{3.14}$$
where the suffix $n$ denotes the $n$th strain, strain rate, and stress trajectories over the entire time interval $\mathcal{T}$. Each strain trajectory $(\partial_x u_0)_n$ is drawn i.i.d. from a measure $\mu$ on $\mathcal{C}(\mathcal{T};\mathbb{R})$. There is no need to generate training data on the same time interval $\mathcal{T}$ as the macroscale model; we do so for simplicity. Note that since the RNN acts pointwise in time, for the $n$th set of trajectories the RNN is evaluated $T$ times, where $T = |\mathcal{T}|/dt$ for time discretization $dt$.
The data for the homogenized constitutive model is given by
$$\sigma_0(t) = \Psi^\dagger_0\bigl(\partial_x u_0(t),\, \partial^2_{xt} u_0(t),\, \{\partial_x u_0(\tau)\}_{\tau \in \mathcal{T}},\, t\bigr),$$
defined via solution of the cell problem (2.4); but it may also be obtained as the solution to a forced boundary problem on the microscale, as stated in the following lemma.
Lemma 3.12. Let $u$ solve the equations
$$\partial_y \sigma(y,t) = 0, \qquad (y,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.15a}$$
$$\sigma(y,t) = E(y)\,\partial_y u(y,t) + \nu(y)\,\partial^2_{ty} u(y,t), \qquad (y,t) \in \mathcal{D}\times\mathcal{T}, \tag{3.15b}$$
$$u(0,t) = 0, \quad u(1,t) = b(t), \qquad (y,t) \in \partial\mathcal{D}\times\mathcal{T}, \tag{3.15c}$$
$$u(y,0) = 0, \qquad y \in \mathcal{D}. \tag{3.15d}$$
Copyright
©
by SIAM. Unauthorized reproduction of this article is prohibited.
Downloaded 06/14/23 to 131.215.225.153 . Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/terms-privacy
656
K. BHATTACHARYA, B. LIU, A. STUART, AND M. TRAUTNER
Then
$$\{\sigma(t)\}_{t=0}^{T} = \Psi^\dagger_0\bigl(b(t),\, \partial_t b(t),\, \{b(t)\}_{t=0}^{T},\, t\bigr),$$
where $\Psi^\dagger_0$ is the map defined in (2.6).
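For a 2-piecewise-constant material, (3.15) reduces to a scalar ODE: since $\partial_y\sigma = 0$, the stress is uniform in $y$ and the strain is uniform within each piece, so compatibility $\lambda_1 e_1 + \lambda_2 e_2 = b(t)$ together with equality of the two piece stresses determines $e_1$. A sketch of data generation along these lines follows (forward Euler; the function name and any material values are illustrative assumptions of ours, not the paper's data pipeline).

```python
import numpy as np

def microscale_stress_2piece(b, dt, E1, nu1, E2, nu2, lam1=0.5):
    """Sketch: stress data for Lemma 3.12 with a 2-piecewise-constant
    material (volume fractions lam1 and lam2 = 1 - lam1). Eliminating
    e2 = (b - lam1*e1)/lam2 from the stress-equality condition gives a
    scalar ODE for e1, integrated here by forward Euler."""
    lam2 = 1.0 - lam1
    b = np.asarray(b, dtype=float)
    bdot = np.gradient(b, dt)                 # discrete d(b)/dt
    e1 = 0.0                                  # u(y,0) = 0 implies zero strain
    sigma = np.zeros(len(b))
    A = nu1 + nu2 * lam1 / lam2               # coefficient of de1/dt
    for k in range(len(b)):
        e1dot = ((E2 * b[k] + nu2 * bdot[k]) / lam2
                 - (E1 + E2 * lam1 / lam2) * e1) / A
        sigma[k] = E1 * e1 + nu1 * e1dot      # the common stress value
        e1 += dt * e1dot                      # forward Euler step
    return sigma
```

If the two pieces are identical ($E_1 = E_2$, $\nu_1 = \nu_2$) the output must reduce to $\sigma = E\,b + \nu\,\dot b$, which gives a convenient sanity check.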
The proof can be found in Appendix B.2 and justifies the application of data resulting from this problem to the homogenized model. In the following, we denote by $(\widehat{\sigma}_0)_n$ and $\widehat{\dot{\xi}}_n$ the output of $\mathcal{F}^{RNN}$ and $\mathcal{G}^{RNN}$ on data point $n$:
$$(\widehat{\sigma}_0)_n(t) = \mathcal{F}^{RNN}\bigl((\partial_x u_0)_n(t),\, (\partial^2_{xt} u_0)_n(t),\, \widehat{\xi}_n(t)\bigr), \qquad \widehat{\dot{\xi}}_n(t) = \mathcal{G}^{RNN}\bigl((\partial_x u_0)_n(t),\, \widehat{\xi}_n(t)\bigr), \qquad \widehat{\xi}_n(0) = 0.$$
To train the RNN, we use the following relative $L^2$ loss function, which should be viewed as a function of the parameters defining the neural networks $\mathcal{F}^{RNN}, \mathcal{G}^{RNN}$.
Accessible Loss Function:
$$L_1\bigl(\{\sigma_0\}_{n=1}^N, \{\widehat{\sigma}_0\}_{n=1}^N\bigr) = \frac{1}{N} \sum_{n=1}^N \frac{\|(\sigma_0)_n - (\widehat{\sigma}_0)_n\|_{L^2(\mathcal{T};\mathbb{R})}}{\|(\sigma_0)_n\|_{L^2(\mathcal{T};\mathbb{R})}}. \tag{3.16}$$
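With trajectories stored on a uniform time grid, the accessible loss (3.16) is a few lines of code. A sketch follows, with discrete $L^2(\mathcal{T};\mathbb{R})$ norms approximated by a simple Riemann sum; the function name and array layout are our own conventions.

```python
import numpy as np

def relative_l2_loss(sigma_true, sigma_pred, dt):
    """Sketch of the accessible loss (3.16): average over the N trajectories
    of the relative L2(T;R) error; time integrals by a Riemann sum.
    Inputs have shape (N, n_steps)."""
    def l2_norm(r):
        return np.sqrt(dt * np.sum(r ** 2, axis=1))
    return np.mean(l2_norm(sigma_true - sigma_pred) / l2_norm(sigma_true))
```

Note that the relative normalization makes the loss insensitive to the overall stress scale: a prediction off by a uniform factor $1 + \gamma$ incurs loss $|\gamma|$ on every trajectory.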
Remark 3.13. To test robustness of our conclusions, we also employed relative and absolute $L^2$ squared loss functions. In doing so we did not observe significant differences in the predictive accuracy of the resulting models.
In the case of a material that is 2-piecewise-constant on the microscale, we can explicitly write down the analytic form of the solution, and thus can also know the values of the hidden variable $\{\xi_n\}_{n=1}^N$ and its derivative $\{\dot{\xi}_n\}_{n=1}^N$ for each data trajectory, as expressed in equation (3.11). It is intuitive that training an RNN on an extended data set which includes the hidden variable should be easier than using the original data set (3.14). In order to deepen our understanding of the training process, we will include training in a 2-piecewise-constant material which uses this hidden data, motivating the following loss function. Since, in general, the hidden variable is inaccessible in the data, we refer to the resulting loss as the inaccessible relative loss function.
Inaccessible Loss Function:
$$L_2\bigl(\{(\sigma_0)_n\}_{n=1}^N, \{(\widehat{\sigma}_0)_n\}_{n=1}^N, \{\dot{\xi}_n\}_{n=1}^N, \{\widehat{\dot{\xi}}_n\}_{n=1}^N\bigr) = \frac{1}{N} \sum_{n=1}^N \left( \frac{\|(\sigma_0)_n - (\widehat{\sigma}_0)_n\|_{L^2(\mathcal{T};\mathbb{R})}}{\|(\sigma_0)_n\|_{L^2(\mathcal{T};\mathbb{R})}} + \frac{\|\dot{\xi}_n - \widehat{\dot{\xi}}_n\|_{L^2(\mathcal{T};\mathbb{R})}}{\|\dot{\xi}_n\|_{L^2(\mathcal{T};\mathbb{R})}} \right). \tag{3.17}$$
This inaccessible loss function helps identify the RNN whose existence is proved in the previous sections.
4. Numerical results.
The numerical results reach the following conclusions,
all of which guide the use of machine-learned constitutive models within complex
nonlinear problems beyond the one-dimensional linear viscoelastic setting considered
in this work:
I.
Machine-learned constitutive models.
We can find RNNs that yield
low-error simulations when used as a surrogate model in the macroscopic