Online Learning of Entrainment Closures in a Hybrid Machine Learning Parameterization
Costa Christopoulos (1), Ignacio Lopez-Gomez (1,2), Tom Beucler (3,4), Yair Cohen (1,5), Charles Kawczynski (1), Oliver R. A. Dunbar (1), and Tapio Schneider (1)

(1) California Institute of Technology, Pasadena, CA, USA; (2) Now at Google Research, Mountain View, CA, USA; (3) Faculty of Geosciences and Environment, University of Lausanne, Lausanne, Switzerland; (4) Expertise Center for Climate Extremes, University of Lausanne, Lausanne, Switzerland; (5) Now at NVIDIA Corporation, Santa Clara, CA, USA
Abstract

This work integrates machine learning into an atmospheric parameterization to target uncertain mixing processes while maintaining interpretable, predictive, and well-established physical equations. We adopt an eddy-diffusivity mass-flux (EDMF) parameterization for the unified modeling of various convective and turbulent regimes. To avoid the drift and instability that plague offline-trained machine learning parameterizations that are subsequently coupled with climate models, we frame learning as an inverse problem: data-driven models are embedded within the EDMF parameterization and trained online in a one-dimensional vertical global climate model (GCM) column. Training is performed against output from large-eddy simulations (LES) forced with GCM-simulated large-scale conditions in the Pacific. Rather than optimizing subgrid-scale tendencies, our framework directly targets climate variables of interest, such as the vertical profiles of entropy and liquid water path. Specifically, we use ensemble Kalman inversion to simultaneously calibrate both the EDMF parameters and the parameters governing data-driven lateral mixing rates. The calibrated parameterization outperforms existing EDMF schemes, particularly in tropical and subtropical locations of the present climate, and maintains high fidelity in simulating shallow cumulus and stratocumulus regimes under increased sea surface temperatures from AMIP4K experiments. The results showcase the advantage of physically constraining data-driven models and directly targeting relevant variables through online learning to build robust and stable machine learning parameterizations.
Plain Language Summary

In this research, we aim to improve projections of the Earth's climate response by creating a hybrid model that integrates machine learning (ML) into parts of an existing atmospheric model that are less certain. This integration improves our hybrid model's performance, particularly in tropical and subtropical oceanic regions. Unlike previous approaches that first trained the ML and then ran the host model with ML embedded, we train the ML while the host model is running in a single column, which makes the model more stable and reliable. Indeed, when tested under conditions with higher sea surface temperatures, our model accurately predicts outcomes even in scenarios that were not encountered during the ML training. Our study highlights the value of combining ML and traditional atmospheric models for more robust and data-driven climate predictions.
1. Introduction

The latest suite of global climate models (GCMs) continues to exhibit a large range of climate sensitivities, the measure of Earth's equilibrium temperature response to a doubling of atmospheric greenhouse gas concentrations (Meehl et al., 2020). Variance in modeled responses has been traced to disparate representations of subgrid-scale (SGS) processes not explicitly resolved by climate models, specifically those controlling the characteristics of cloud feedbacks (Bony et al., 2015; Sherwood et al., 2014; Vial et al., 2013; Zelinka et al., 2020). Furthermore, climate models often fail to reproduce several key statistics from the recent past when run retrospectively (Vignesh et al., 2020). In light of these discrepancies, researchers have launched systematic efforts across the climate modeling enterprise to incorporate machine learning (ML) methods into GCMs, in order to improve the ability of climate model components to learn from high-fidelity data. This study specifically uses a training data set focused on marine low-cloud regimes in the central and eastern Pacific, areas that are particularly problematic to model in GCMs (Nam et al., 2012; Črnivec et al., 2023), yet are critical for precise assessments of equilibrium climate sensitivity due to cloud feedbacks (Brient & Schneider, 2016; Myers et al., 2021; Siler et al., 2018).
RESEARCH ARTICLE
10.1029/2024MS004485

Key Points:
• We train a hybrid subgrid parameterization to minimize the mismatch between a single-column model and large-eddy simulation mean states
• Within the parameterization, the entrainment mixing closure is fully data-driven and trained online via ensemble Kalman inversion
• With no prior information on entrainment, we learn physically realistic mixing closures indirectly from mean simulation states

Correspondence to: C. Christopoulos, cchristo@caltech.edu

Citation: Christopoulos, C., Lopez-Gomez, I., Beucler, T., Cohen, Y., Kawczynski, C., Dunbar, O. R. A., & Schneider, T. (2024). Online learning of entrainment closures in a hybrid machine learning parameterization. Journal of Advances in Modeling Earth Systems, 16, e2024MS004485. https://doi.org/10.1029/2024MS004485

Received 31 MAY 2024; Accepted 21 OCT 2024

© 2024 The Author(s). Journal of Advances in Modeling Earth Systems published by Wiley Periodicals LLC on behalf of American Geophysical Union. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

CHRISTOPOULOS ET AL. 1 of 22
Initiatives to replace existing physics-based parameterizations in atmospheric models entirely with ML are often marred by challenges surrounding numerical instability and extrapolation performance. Instabilities, such as the generation of unstable gravity wave modes (Brenowitz et al., 2020), largely arise from feedbacks between the learned SGS parameterization and the dynamical core upon integration. Currently, the favored strategy is to train ML models offline via supervised learning to predict SGS tendencies as a function of the resolved atmospheric state, then couple the trained models to a dynamical core to perform inferences at each model time step (Krasnopolsky et al., 2013; Rasp et al., 2018; Yuval & O'Gorman, 2020). As an example of the offline training procedure for atmospheric turbulence, a recent encoder-decoder approach was used to learn vertical turbulent fluxes in dry convective boundary layers on the basis of coarse-grained large-eddy simulations (Shamekh & Gentine, 2023). Although significant progress has been made toward advancing and stabilizing data-driven parameterizations (Brenowitz & Bretherton, 2019; Wang et al., 2022; Watt-Meyer et al., 2023), the conventional offline training strategy precludes learning unobservable processes indirectly from relevant climate statistics. Furthermore, instabilities arising from system feedbacks are not typically incorporated into training, and cannot be easily assessed until ML models are coupled to a dynamical core (Ott et al., 2020; Rasp, 2020). More recently, the advent of differentiable simplified general circulation models (e.g., without phase transitions of water) has enabled spatially three-dimensional (3D) online training of ML-based SGS parameterizations using short-term forecasts (Kochkov et al., 2024). These strategies have not yet overcome the problems of instability and extrapolation to warmer climates, and they remain difficult to interpret.
We take steps to address these issues by employing ensemble Kalman inversion (EKI) to perform parameter estimation within a SGS parameterization from statistics of atmospheric profiles in a single-column setup (Dunbar et al., 2021; Huang, Schneider, & Stuart, 2022; M. A. Iglesias et al., 2013). Treating learning as an inverse problem directly enables online learning. Inverse problems are characterized by setups where the dependent variable of some target process is neither directly observable nor explicitly included in the loss function. In this case, it is through secondary causal effects of atmospheric dynamics on observable atmospheric quantities that parameters are optimized.
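To make the inverse-problem framing concrete, the basic EKI update can be sketched in a few lines of NumPy. This is a minimal illustration of the standard algorithm (Iglesias et al., 2013), not the calibration code used in this study; `eki_update` and its argument names are hypothetical. The key property is that each ensemble member is pulled toward the data using only sample covariances, so no gradients of the forward model are required:

```python
import numpy as np

def eki_update(theta, g, y, gamma_obs, rng):
    """One ensemble Kalman inversion step.

    theta     : (J, p) ensemble of parameter vectors
    g         : (J, d) forward-model outputs G(theta_j)
    y         : (d,)   target statistics (e.g., LES mean profiles)
    gamma_obs : (d, d) observation-noise covariance
    """
    d_theta = theta - theta.mean(axis=0)        # parameter anomalies
    d_g = g - g.mean(axis=0)                    # output anomalies
    c_tg = d_theta.T @ d_g / theta.shape[0]     # cross-covariance, (p, d)
    c_gg = d_g.T @ d_g / theta.shape[0]         # output covariance, (d, d)
    gain = c_tg @ np.linalg.inv(c_gg + gamma_obs)   # Kalman-type gain, (p, d)
    # each member assimilates an independently perturbed copy of the data
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), gamma_obs,
                                         size=theta.shape[0])
    return theta + (y_pert - g) @ gain.T
```

Because the update needs only forward evaluations G(theta_j), the single-column model can remain a black box, including non-differentiable components such as water phase changes.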
In the field of dynamical systems, the theory underpinning the use of inversion techniques to infer parameters is well established (Huang, Huang, et al., 2022; M. A. Iglesias et al., 2013), and these techniques have also been shown to be effective for learning neural networks (NNs), especially in chaotic systems where the smoothing properties of ensemble methods can be advantageous (Dunbar et al., 2022; Kovachki & Stuart, 2019). In practice, ensemble Kalman methods have been used to learn drift and diffusion terms in the Lorenz '96 model (Schneider et al., 2021), nonlinear eddy viscosity models for turbulence (Zhang et al., 2022), the effects of truncated variables in a quasi-geostrophic ocean-atmosphere model (Brajard et al., 2021), and NN-based parameterizations of the quasi-biennial oscillation and gravity waves (Pahlavan et al., 2024). An alternative approach to online learning relies on differentiable methods to explicitly compute gradients through the physical model to learn data-driven components (C. Shen et al., 2023; Um et al., 2021).
The differentiable learning approach has been used successfully to learn NN-based closures in numerous idealized turbulence setups (Kochkov et al., 2021; List et al., 2022; MacArt et al., 2021; Shankar et al., 2023). In an Earth system modeling setting, differentiable online learning has been used to learn stable turbulence parameterizations in an idealized quasi-geostrophic setup (Frezat et al., 2022) and residual corrections to an upper-ocean convective adjustment scheme (Ramadhan et al., 2023).
While promising, differentiable methods preclude computing gradients through physical models with non-differentiable components, such as the physics stemming from water phase changes in cloud parameterizations. Furthermore, given existing work surrounding differentiable and inverse methods for geophysical fluid dynamics, there remains a lack of literature demonstrating indirect learning of data-driven components in more comprehensive atmospheric parameterizations of convection, turbulence, and clouds. Our contribution is the application of these methods in a more realistic climate modeling setting, a use case which can directly improve operational Earth system models.
We extend a flexible and modular framework, introduced by Lopez-Gomez et al. (2022), that allows for the selective addition of expressive, non-parametric components where physical knowledge is limited. Our approach promotes generalizability and interpretability. Interpretability comes by virtue of targeting specific physical processes, which enables a mechanistic analysis of their effect on climate. Generalizability is a result of both retaining this physical framework and employing an inversion strategy that targets climate statistics. The physical framework includes the partial differential equations in which the closure is embedded, the nondimensionalization of data-driven input variables, and the dimensional scales that modulate learned nondimensional closures. In contrast, a fully data-driven parameterization benefits from expressivity at the expense of
sensitivity to training data, leading to difficulties in extrapolating to unobserved climates. Generalizability is verified in our setup by assessing performance on an out-of-distribution climate where SSTs are uniformly increased by 4 K; test error decreases in lockstep with training error from the present climate, and overfitting is not observed.
In this study, we investigate the performance of a single-column model containing data-driven lateral mixing closures spanning a range of complexities, from linear regression models to neural networks. In Section 2, we describe in detail the data-driven architectures, training data, and online calibration pipeline. Section 3 outlines the performance of the data-driven eddy-diffusivity mass-flux (EDMF) scheme in terms of the root-mean-squared error of the mean atmospheric state in a current and warmer climate, and representative vertical profiles are presented with physical implications discussed. Relative to the previous work of Lopez-Gomez et al. (2022), modeling improvements are made by both modifying the calibration pipeline and addressing structural biases in the EDMF model itself, namely boundary conditions and the lateral mixing formulation.
2. Online Training Setup

An overarching goal of SGS modeling is to produce computationally efficient schemes that emulate expensive high-resolution simulations, given the same large-scale forcings, boundary conditions, and initial conditions. Of primary importance is the prediction of SGS fluxes and cloud properties, which are determined by small-scale processes not resolvable by the GCM dynamical core. In the setup described here, parameters in a full-complexity SGS scheme are systematically optimized through the ensemble Kalman inversion technique to match characteristics of high-resolution simulations, namely time-mean vertical profiles and vertically integrated liquid water content produced by large-eddy simulations (LES) (Z. Shen et al., 2022).
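The target statistics can be pictured as a single stacked observation vector: time-mean vertical profiles plus a vertically integrated liquid water scalar. The sketch below illustrates this assembly; the per-variable normalization is an assumption made here so that fields with different units contribute comparably, and is not the paper's exact preprocessing:

```python
import numpy as np

def stack_targets(profiles, lwp):
    """Stack LES statistics into one observation vector y.

    profiles : dict mapping name -> (n_z,) time-mean vertical profile
               (e.g., entropy, total specific humidity)
    lwp      : float, vertically integrated liquid water [kg/m^2]
    """
    blocks = []
    for name in sorted(profiles):               # fixed ordering of variables
        p = np.asarray(profiles[name], dtype=float)
        scale = p.std() if p.std() > 0 else 1.0
        blocks.append((p - p.mean()) / scale)   # center and scale each block
    blocks.append(np.atleast_1d(lwp))           # scalar target appended last
    return np.concatenate(blocks)
```

The forward model then returns the same stacked vector from the single-column simulation, so the mismatch used by the Kalman updates is a simple vector difference.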
A variant of the SGS scheme is introduced, which imposes fewer assumptions and incorporates more general data-driven functions that can be determined from data. The SGS model is an eddy-diffusivity mass-flux (EDMF) scheme that parameterizes the effects of turbulence, convection, and clouds. The reference high-resolution simulations are performed with PyCLES (Pressel et al., 2015), which explicitly models convection and turbulent eddies larger than O(10 m). The process diagram in Figure 1 illustrates how calibrations are performed using the SGS model. Components of the diagram are detailed in the sections that follow, starting with the EDMF scheme.
Figure 1. Schematic illustrating the ensemble Kalman inversion pipeline used for online training of a one-dimensional (1D) atmospheric model with both physics-based and data-driven components (hybrid EDMF). Black arrows indicate fixed operations between components, and red arrows indicate dynamic information flow on the basis of Kalman updates to EDMF parameters. The training data comprise 176 LES simulations from the AMIP climate, processed in batches of 16 cases for each ensemble Kalman iteration. Lateral mixing rates are formulated as the product of a dimensional scale γ and a data-driven, nondimensional function F.
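The closure form named in the caption, a dimensional scale γ multiplying a learned nondimensional function F, can be sketched as follows. The small tanh network, the softplus output (to keep the rate nonnegative), and the choice of inputs are illustrative assumptions, not the exact architecture calibrated in this work:

```python
import numpy as np

def entrainment_rate(pi, params, gamma):
    """Hybrid closure: dimensional scale times learned nondimensional F.

    pi     : (n,) nondimensional predictors (e.g., Pi groups built from
             updraft buoyancy, vertical velocity, and relative humidity)
    params : (w1, b1, w2, b2) weights of a small illustrative network
    gamma  : dimensional scale, e.g., an inverse height [1/m]
    """
    w1, b1, w2, b2 = params
    hidden = np.tanh(w1 @ pi + b1)              # hidden layer, shape (m,)
    f = np.logaddexp(0.0, w2 @ hidden + b2)     # softplus: F >= 0
    return gamma * f                            # entrainment rate [1/m]
```

Because F takes only nondimensional inputs and returns a nondimensional value, the learned function is reusable across flow regimes, with γ carrying all dimensional dependence.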
2.1. Eddy-Diffusivity Mass-Flux (EDMF) Scheme Overview

EDMF schemes partition GCM grid boxes into two or more subdomains, each characterized by containing either coherent structures (updrafts) or relatively isotropic turbulence (environment). While most SGS schemes use separate parameterizations for the boundary layer, shallow convection, deep convection, and stratocumulus regimes, the extended EDMF scheme we use (herein referred to as EDMF) simulates all regimes in a unified manner by making fewer simplifying assumptions (Thuburn et al., 2018).
The scheme includes partial differential equations (PDEs) for prognostic updraft properties (notably temperature, humidity, area fraction, and mass flux), which are coupled to PDEs for environmental variables (temperature, humidity, and turbulent kinetic energy). The physical skeleton of the EDMF consists of these coarse-grained equations of motion and houses a collection of closures, appearing as right-hand-side tendency terms for the prognostic variable equations. The EDMF scheme we use was initially introduced by Tan et al. (2018). It contains closure functions, for example, for entrainment and detrainment, which capture physics without a known, closed-form expression; specifying them is necessary to fully define the set of EDMF PDEs such that they can be numerically integrated. Closures in the EDMF equations play a role similar to SGS parameterizations in grid-scale prognostic equations. Tendencies from SGS parameterizations appear in dynamical core equations, and, similarly, tendencies from closures appear in the EDMF equations.
In the context of GCMs, the EDMF parameterization predicts vertical SGS fluxes and cloud properties due to unresolved processes. The present EDMF parameterization, which is run at 50 m vertical resolution, has been shown to generalize effectively between isotropic and stretched vertical grids (Lopez-Gomez et al., 2022). Its prediction of second-order quantities such as turbulent kinetic energy (TKE), which approach zero as the resolution increases, and its inherent SGS memory endow it with some "scale-aware" properties that become especially important as convection begins to be partially resolved in the "gray zone" (Boutle et al., 2014; Schneider et al., 2024; Tan et al., 2018); however, we have not yet explicitly tested its resolution dependence.
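For context, the standard EDMF decomposition (in textbook form following the EDMF literature, e.g., Tan et al. (2018); not quoted verbatim from this paper) writes a grid-mean field φ and its vertical SGS flux as sums over subdomains i with area fractions a_i:

```latex
\langle \phi \rangle = \sum_i a_i \, \bar{\phi}_i, \qquad
\langle w' \phi' \rangle
  = \underbrace{\sum_i a_i \, \overline{w'\phi'}_i}_{\text{within-subdomain (eddy diffusivity)}}
  + \underbrace{\sum_i a_i \left(\bar{w}_i - \langle w \rangle\right)
      \left(\bar{\phi}_i - \langle \phi \rangle\right)}_{\text{between-subdomain (mass flux)}}
```

Here overbars denote subdomain means and angle brackets denote grid means; the within-subdomain term is modeled diffusively in the environment, while the between-subdomain term gives the mass-flux contribution of updrafts.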
Following domain decomposition, the contributions of EDMF SGS fluxes