U-FNO - an enhanced Fourier neural operator-based deep-learning model for multiphase flow

Gege Wen^a, Zongyi Li^b, Kamyar Azizzadenesheli^c, Anima Anandkumar^b, Sally M. Benson^a

^a Energy Resources Engineering, Stanford University, 367 Panama St, Stanford, 94305, CA, USA
^b Computing and Mathematical Sciences, California Institute of Technology, 1200 E. California Blvd., MC 305-16, Pasadena, 91125, CA, USA
^c Department of Computer Science, Purdue University, 305 N University St, West Lafayette, 47907, IN, USA
Abstract
Numerical simulation of multiphase flow in porous media is essential for
many geoscience applications. Machine learning models trained with numer-
ical simulation data can provide a faster alternative to traditional simulators.
Here we present U-FNO, a novel neural network architecture for solving mul-
tiphase flow problems with superior accuracy, speed, and data efficiency. U-
FNO is designed based on the newly proposed Fourier neural operator (FNO),
which has shown excellent performance in single-phase flows. We extend the
FNO-based architecture to a highly complex CO2-water multiphase problem with wide ranges of permeability and porosity heterogeneity, anisotropy,
reservoir conditions, injection configurations, flow rates, and multiphase flow
properties. The U-FNO architecture is more accurate in gas saturation and
pressure buildup predictions than the original FNO and a state-of-the-art
convolutional neural network (CNN) benchmark. Meanwhile, it has supe-
rior data utilization efficiency, requiring only a third of the training data to
achieve the equivalent accuracy as CNN. U-FNO provides superior perfor-
mance in highly heterogeneous geological formations and critically important
applications such as gas saturation and pressure buildup “fronts” determina-
tion. The trained model can serve as a general-purpose alternative to routine
numerical simulations of 2D-radial CO2 injection problems with significant speed-ups compared to traditional simulators.
Keywords: Multiphase flow, Fourier neural operator, Convolutional neural network, Carbon capture and storage, Deep learning
1. Introduction
Multiphase flow in porous media is important for many geoscience ap-
plications, including contaminant transport [1], carbon capture and storage
(CCS) [2], hydrogen storage [3], oil and gas extraction [4], and nuclear waste
storage [5]. Due to the multi-physics, non-linear, and multi-scale nature of
these processes, numerical simulation is the primary approach used to solve
mass and energy conservation equations for these applications [6]. These
numerical simulations are often very time consuming and computationally
intensive since they require fine spatial and temporal discretization to accu-
rately capture the flow processes [7, 8]. Meanwhile, the inherent uncertainty
in property distributions of heterogeneous porous media necessitates proba-
bilistic assessments and inverse modeling to aid engineering decisions [9, 10].
Both of these procedures require large numbers of forward numerical simu-
lation runs and are often prohibitively expensive [11].
A number of machine learning-based methods have been proposed over
the past few years to provide faster alternatives to numerical simulation [12].
Most existing machine learning-based methods can be categorized into the
following two categories: (1) data-driven finite-dimensional operators that
learn Euclidean space mappings from numerical simulation data [13, 14, 15,
16, 17, 18], and (2) physics-informed/physics-constrained/neural finite dif-
ference learning methods that parameterize the solution functions with a neu-
ral network [19, 20, 21]. The first type, finite-dimensional operators, is often
implemented with convolutional neural networks (CNN). These CNN-based
models have been successful in providing fast and accurate predictions for
high-dimensional and complex multiphase flow problems [17, 22, 18, 23, 24].
However, CNN-based methods are prone to overfitting, therefore requiring
large numerical simulation data sets that can be unmanageable as the prob-
lem dimension grows. Also, the results produced by these models are tied
to the specific spatial and temporal meshes used in the numerical simula-
tion data set. The second approach, often implemented with artificial neural
networks (ANN) (e.g., CNNs [25, 26]), uses neural finite difference methods that require separate training for any new instance of the parameters or coefficients [19] (e.g., a new permeability map or injection rate). Therefore, these methods require as much computational effort as traditional numerical solvers, if not more. Furthermore, for the Buckley-Leverett two-
phase immiscible flow problem that is common for subsurface flow problems,
physics-informed approaches often require observed data or additional diffu-
sive term/physical constraint to improve convergence [27, 28, 29].
Recently, a novel approach, the neural operator, has been proposed that
directly learns the infinite-dimensional mapping from any functional parametric dependence to the solution [30, 31, 32, 33]. Unlike neural finite difference methods, neural operators are data-driven and therefore require training only once. Meanwhile, neural operators are mesh-independent, so they can be
trained and evaluated on different grids. Due to the cost of evaluating global
neural integral operators, previously proposed neural operators have not yet
been able to achieve the desirable degree of computational efficiency [34].
However, one type of neural operator, the Fourier neural operator (FNO),
alleviates this issue through the implementation of a Fast Fourier Trans-
form [34]. The FNO has shown excellent performance on single-phase flow
problems with great generalization ability, and is significantly more data ef-
ficient than CNN-based methods [34].
Here we extend the FNO-based architecture to multiphase flow prob-
lems. We find that while FNO’s testing accuracy is generally higher than
CNN-based models, the training accuracy is sometimes lower due to the reg-
ularization effect of the FNO architecture. To improve upon this, we present
an enhanced Fourier neural operator, named U-FNO, that combines the ad-
vantages of FNO-based and CNN-based models to provide results that are
both highly accurate and data efficient. Through the implementation of the
newly proposed U-Fourier layer, we show that the U-FNO model architec-
ture produces superior performance over both the original FNO [34] and a
state-of-the-art CNN benchmark [18]. We apply the U-FNO architecture to
the highly complex CO2-and-water multiphase flow problem in the context of CO2 geological storage to predict dynamic pressure buildup and gas saturation. The trained U-FNO models provide an alternative to numerical simulation for 2D-radial CO2 injection problems with wide ranges of permeability and porosity heterogeneity, anisotropy, reservoir conditions, injection configurations, flow rates, and multiphase flow properties.
2. Problem setting
2.1. Governing equation
We consider a multi-phase flow problem with CO2 and water in the context of geological storage of CO2. The CO2 and water are immiscible but have mutual solubility. The general forms of the mass accumulations for component $\eta = \mathrm{CO_2}$ or $\mathrm{water}$ are written as [35]:

$$\frac{\partial \left( \phi \sum_p S_p \rho_p X_p^{\mathrm{CO_2}} \right)}{\partial t} = -\nabla \cdot \left[ F^{\mathrm{CO_2}}|_{\mathrm{adv}} + F^{\mathrm{CO_2}}|_{\mathrm{dif}} \right] + q^{\mathrm{CO_2}} \qquad (1)$$

$$\frac{\partial \left( \phi \sum_p S_p \rho_p X_p^{\mathrm{water}} \right)}{\partial t} = -\nabla \cdot \left[ F^{\mathrm{water}}|_{\mathrm{adv}} + F^{\mathrm{water}}|_{\mathrm{dif}} \right]. \qquad (2)$$
Here $p$ denotes the phase of $w$ (wetting) or $n$ (non-wetting). In the siliciclastic rocks present at most geological storage sites, water is the wetting phase [36]. However, due to the mutual solubility of water and CO2, there is a small amount of CO2 in the water phase and a small amount of water in the CO2 phase. Here $\phi$ is the porosity, $S_p$ is the saturation of phase $p$, and $X_p^{\eta}$ is the mass fraction of component $\eta$ in phase $p$.
For both components, the advective mass flux $F^{\eta}|_{\mathrm{adv}}$ is obtained by summing over phases $p$,

$$F^{\eta}|_{\mathrm{adv}} = \sum_p X_p^{\eta} F_p = \sum_p X_p^{\eta} \left( -k \, \frac{k_{r,p}\, \rho_p}{\mu_p} \left( \nabla P_p - \rho_p g \right) \right) \qquad (3)$$
where each individual phase flux $F_p$ is governed by the multiphase flow extension of Darcy's law. $k$ denotes the absolute permeability, $k_{r,p}$ is the relative permeability of phase $p$ that non-linearly depends on $S_p$, $\mu_p$ is the viscosity of phase $p$ that depends on $P_p$, and $g$ is the gravitational acceleration.
Due to the effect of capillarity, the fluid pressure $P_p$ of each phase is

$$P_n = P_w + P_c \qquad (4)$$
$$P_w = P_w \qquad (5)$$
where the capillary pressure $P_c$ is a non-linear function of $S_p$. Additionally, porosity $\phi$, density $\rho_p$, and the solubility of CO2 in Equation 1 and Equation 2 are also non-linear functions that depend on $P_p$. A table of notation is included in Appendix A.
To simplify the problem setting, our simulation does not explicitly include
molecular diffusion and hydrodynamic dispersion. However, some unavoidable numerical diffusion and numerical dispersion resulting from approximating spatial gradients using the two-point upstream algorithm [37] are intrinsic
to the numerical simulations used for the neural network training.
2.2. Numerical simulation setting
We use the numerical simulator ECLIPSE (e300) to develop the multi-
phase flow data set for CO2 geological storage. ECLIPSE is a full physics
simulator that uses the finite difference method with upstream weighting
for spatial discretization and the adaptive implicit method for temporal dis-
cretization [37]. We inject super-critical CO2 at a constant rate into a radially symmetrical system $x(r, z)$ through a vertical injection well with a radius of
0.1 m. The well can be perforated over the entire thickness of the reservoir
or limited to a selected depth interval. We simulate CO2 injection for 30
years at a constant rate ranging from 0.2 to 2 Mt/year. The thickness of the
reservoir ranges from 12.5 to 200 m with no-flow boundaries on the top and
bottom. We use a vertical cell dimension of 2.08 m to capture the vertical
heterogeneity of the reservoir. The radius of the reservoir is 100,000 m. The
outer boundary is closed, but is sufficiently distant from the injection well
that it behaves like an infinite acting reservoir.
Two hundred gradually coarsened grid cells are used in the radial di-
rection. Grid sensitivity studies show that this grid is sufficiently refined
to capture the CO2 plume migration and pressure buildup, while remaining computationally tractable [8]. Simulated values of the gas saturation ($SG$) and pressure buildup ($dP$) fields at 24 gradually coarsening time snapshots
are used for training the neural nets. Refer to Appendix B for detailed
spatial and temporal discretizations.
2.3. Variable sampling scheme
We sample two types of variables for each numerical simulation case: field
variables and scalar variables. As shown in Figure 1, field variables include
the horizontal permeability map ($k_x$), vertical permeability map ($k_y$), porosity map ($\phi$), and injection perforation map ($perf$). The reservoir thickness $b$ is randomly sampled in each simulation case and controls the reservoir dimension in each of the following field variables. The variable $b$ is applied as an active cell mask to label the rows within the specified thickness.
Figure 1: Example of the mapping from A. input to B. output gas saturation and C.
pressure buildup. A. Field and scalar channels for each case. Note that the scalar variables
are broadcast into a channel at the same dimension as the field channels. B. Gas saturation
evolution for 6 out of 24 time snapshots. C. Pressure buildup evolution for 6 out of 24
time snapshots.
• $k_x$: The Stanford Geostatistical Modeling Software (SGeMS) [38] is used to generate the heterogeneous $k_x$ maps. SGeMS produces permeability maps according to required input parameters such as correlation
lengths in the vertical and radial directions, medium appearances (Ap-
pendix C), as well as permeability mean and standard deviation. A
wide variety of permeability maps representing different depositional
environments are included in the data set and the permeability value
ranges widely from 10 Darcy to 0.001 mD. Appendix C summarizes
statistical parameters that characterize the permeability maps. Note
that in a radially symmetrical system, these maps form rings of hetero-
geneity around the injection well. We do not claim these permeability
maps are realistic models of any reservoir, but use them to demonstrate
the proposed model’s performance in heterogeneous systems.
• $k_y$: The vertical permeability map is calculated by multiplying the $k_x$ map by the anisotropy map. To generate the anisotropy map, values of $k_x$ are binned into $n_{aniso}$ materials where each bin is assigned a randomly sampled anisotropy ratio. The anisotropy ratios are then assigned to the anisotropy map according to the location of the corresponding $k_x$. Note that the anisotropy ratio is uncorrelated with the magnitude of the radial permeability. This procedure roughly mimics a facies-based approach for assigning anisotropy values.

Table 1: Summary of each input variable's type, sampling range, distribution, and unit. All input sampling is independent with the exception of the porosity and vertical permeability maps. The dimension of the field variables is (96, 200). *: refer to Appendix C for a detailed statistical parameter summary for generating the heterogeneous $k_x$ maps.

variable type | sampling parameter                | notation    | distribution          | unit
field         | horizontal permeability field     | $k_x$       | heterogeneous*        | -
field         | # of anisotropic materials        | $n_{aniso}$ | $X \sim U\{1, 6\}$    | -
field         | material anisotropy ratio         | $k_x/k_y$   | $X \sim U[1, 150]$    | -
field         | porosity (perturbation)           | $\phi$      | $\sim N(0, 0.005)$    | -
field         | reservoir thickness               | $b$         | $X \sim U[12.5, 200]$ | m
field         | perforation thickness             | $b_{perf}$  | $X \sim U[12.5, b]$   | m
field         | perforation location              | -           | randomly placed       | -
scalar        | injection rate                    | $Q$         | $X \sim U[0.2, 2]$    | MT/y
scalar        | initial pressure                  | $P_{init}$  | $X \sim U[100, 300]$  | bar
scalar        | iso-thermal reservoir temperature | $T$         | $X \sim U[35, 170]$   | °C
scalar        | irreducible water saturation      | $S_{wi}$    | $X \sim U[0.1, 0.3]$  | -
scalar        | van Genuchten scaling factor      | $\lambda$   | $X \sim U[0.3, 0.7]$  | -
• $\phi$: Previous studies show that porosity and permeability are loosely correlated with each other [39]. Therefore, to calculate porosity we first use the fitting relationship presented in Pape et al. [39] and then perturb these values with random Gaussian noise with a mean value of zero and a standard deviation of 0.001.
• $perf$: The injection interval thickness $b_{perf}$ is randomly sampled within the range from 12.5 m to the specific reservoir thickness $b$ of that case. We place the perforation interval on the injection well by randomly sampling the depth of the perforation top from 0 m to ($b - b_{perf}$) m.
Visualizations of the above field variables are shown in Appendix C. Table 1
summarizes the parameter sampling ranges and distributions that are used
to generate these field variables. The sampling parameters are independent
of each other with the exception of porosity and permeability.
Scalar variables include the initial reservoir pressure at the top of the
reservoir ($P_{init}$), reservoir temperature ($T$), injection rate ($Q$), capillary pressure scaling factor ($\lambda$) [40], and irreducible water saturation ($S_{wi}$). The parameter sampling ranges and distributions are summarized in Table 1. While the scalar variables $P_{init}$ and $T$ are determined independently, cases that yield unrealistic combinations of these variables are excluded. These field and scalar input variables create a very high-dimensional input space, which often requires massive training data to avoid overfitting when using traditional CNN-based models.
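As a concrete illustration of this sampling scheme, the sketch below draws one case's scalar inputs and reservoir/perforation geometry according to the distributions in Table 1. It is a minimal sketch with function and key names of our own choosing; the heterogeneous field variables ($k_x$, $k_y$, $\phi$, $perf$) additionally involve SGeMS and the porosity-permeability fitting relationship and are not reproduced here.

```python
# A sketch of the scalar-variable and geometry sampling in Table 1 (NumPy only).
import numpy as np

rng = np.random.default_rng(0)

def sample_case():
    b = rng.uniform(12.5, 200.0)              # reservoir thickness, m
    b_perf = rng.uniform(12.5, b)             # perforation thickness, m
    perf_top = rng.uniform(0.0, b - b_perf)   # depth of perforation top, m
    return {
        "Q": rng.uniform(0.2, 2.0),           # injection rate, Mt/year
        "P_init": rng.uniform(100.0, 300.0),  # initial pressure, bar
        "T": rng.uniform(35.0, 170.0),        # isothermal reservoir temperature, deg C
        "S_wi": rng.uniform(0.1, 0.3),        # irreducible water saturation
        "lambda": rng.uniform(0.3, 0.7),      # van Genuchten scaling factor
        "n_aniso": rng.integers(1, 7),        # number of anisotropic materials, U{1,...,6}
        "b": b, "b_perf": b_perf, "perf_top": perf_top,
    }

# As described in the text, cases with unrealistic (P_init, T) combinations are
# excluded after sampling.
```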
3. Methods
The goal of a neural operator is to learn an infinite-dimensional-space
mapping from a finite collection of input-output observations. To formulate
the problem, we define the domain $D \subset \mathbb{R}^d$ to be a bounded and open set, $\mathcal{A}$ to be the input function space, and $\mathcal{Z}$ to be the output function space. $\mathcal{A}$ and $\mathcal{Z}$ are separable Banach spaces of functions defined on $D$ that take values in $\mathbb{R}^{d_a}$ and $\mathbb{R}^{d_z}$, respectively. $G^{\dagger}: \mathcal{A} \to \mathcal{Z}$ is a non-linear map that satisfies the governing PDEs. Suppose we have $a_j$ that are drawn from a probability measure $\mu$ in $\mathcal{A}$; then $z_j = G^{\dagger}(a_j)$. We aim to build an operator $G_{\theta}$ that learns an approximation of $G^{\dagger}$ by minimizing the following problem using a cost function $C$:

$$\min_{\theta} \, \mathbb{E}_{a \sim \mu} \left[ C\left( G_{\theta}(a),\, G^{\dagger}(a) \right) \right] \qquad (6)$$
Since $a_j \in \mathcal{A}$ and $z_j \in \mathcal{Z}$ are both functions, we use the $n$-point discretization $D_j = \{x_1, \ldots, x_n\} \subset D$ to numerically represent $a(x)_j|_{D_j} \in \mathbb{R}^{n \times d_a}$ and $z(x)_j|_{D_j} \in \mathbb{R}^{n \times d_z}$. In this paper, $a(x)_j$ represents the field and scalar variables described in Section 2.3, and $z(x)_j$ represents the outputs of temporally varying gas saturation and pressure buildup fields. We demonstrate in this section that the proposed U-FNO architecture learns the infinite-dimensional-space mapping $G_{\theta}$ from a finite collection of $a(x)_j$ and $z(x)_j$ pairs utilizing integral kernel operators in the Fourier space. Figure 2 provides a schematic of the U-FNO architecture. A table of notation is included in Appendix A.
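In practice, the expectation in Equation 6 is approximated by an empirical average over the discretized training pairs. The sketch below (PyTorch) shows this empirical-risk form for the discretization used in this paper; the function name, the cost $C$, and the tensor shapes are placeholders of ours, not the authors' released code.

```python
# A sketch of Equation 6 as an empirical risk over discretized (a_j, z_j) pairs.
import torch

def empirical_risk(G_theta, cost, a_batch, z_batch):
    """a_batch: (batch, 96, 200, 24, d_a); z_batch: (batch, 96, 200, 24, d_z)."""
    z_pred = G_theta(a_batch)      # approximate G^dagger(a_j) for each training sample
    return cost(z_pred, z_batch)   # Monte Carlo estimate of E_{a~mu}[ C(G_theta(a), G^dagger(a)) ]

# Training then minimizes this empirical risk over theta with a gradient-based
# optimizer, e.g.:
#   optimizer = torch.optim.Adam(G_theta.parameters(), lr=1e-3)
#   loss = empirical_risk(G_theta, cost, a_batch, z_batch)
#   loss.backward(); optimizer.step()
```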
3.1. Integral kernel operator in the Fourier space
We define the integral kernel operator (illustrated as the yellow boxes in Figure 2B and C) by

$$\left( \mathcal{K}(v_l) \right)(x) = \int_D \kappa(x, y)\, v_l(y)\, \mathrm{d}y, \qquad \forall x \in D. \qquad (7)$$
Figure 2: A. U-FNO model architecture. $a(x)$ is the input, $P$ and $Q$ are fully connected neural networks, and $z(x)$ is the output. B. Inside the Fourier layer, $\mathcal{F}$ denotes the Fourier transform, $R$ is the parameterization in Fourier space, $\mathcal{F}^{-1}$ is the inverse Fourier transform, $W$ is a linear bias term, and $\sigma$ is the activation function. C. Inside the U-FNO layer, $U$ denotes a two-step U-Net; the other notations have identical meaning as in the Fourier layer.

To efficiently parameterize the kernel $\kappa$, the FNO method considers the representation of $v_l$ (and also $v_m$) in the Fourier space and utilizes the Fast Fourier Transform (FFT) [34]. By letting $\kappa(x, y) = \kappa(x - y)$ in Equation 7 and applying the convolution theorem, we can obtain

$$\left( \mathcal{K}(v_l) \right)(x) = \mathcal{F}^{-1}\left( \mathcal{F}(\kappa) \cdot \mathcal{F}(v_l) \right)(x), \qquad \forall x \in D \qquad (8)$$
where $\mathcal{F}$ denotes a Fourier transform of a function $f: D \to \mathbb{R}^c$ and $\mathcal{F}^{-1}$ is its inverse. Now, we can parameterize $\kappa$ directly by its Fourier coefficients:

$$\left( \mathcal{K}(v_l) \right)(x) = \mathcal{F}^{-1}\left( R \cdot \mathcal{F}(v_l) \right)(x), \qquad \forall x \in D, \qquad (9)$$

where $R$ is the Fourier transform of a periodic function $\kappa$. Since we assume that $\kappa$ is periodic, we can apply a Fourier series expansion and work in the discrete modes of the Fourier transform.

We first truncate the Fourier series at a maximum number of modes $k_{max}$, and then parameterize $R$ directly as a complex-valued $(k_{max} \times c \times c)$-tensor with the truncated Fourier coefficients. As a result, multiplication by the learnable weight tensor $R$ is

$$\left( R \cdot \mathcal{F}(v_l) \right)_{k,i} = \sum_{j=1}^{c} R_{k,i,j} \left( \mathcal{F}(v_l) \right)_{k,j}, \qquad \forall\, k = 1, \ldots, k_{max}, \quad i = 1, \ldots, c. \qquad (10)$$

By replacing $\mathcal{F}$ with the FFT and implementing $R$ using a direct linear parameterization, we obtain the Fourier operator illustrated in Figure 2B and C with nearly linear complexity.
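For concreteness, the following PyTorch sketch shows one common way to implement the truncated Fourier operator of Equations 9 and 10 with the FFT. It is written for a single 2D operator for readability (the models in this paper act on spatio-temporal 3D tensors, and full implementations typically also retain the negative-frequency modes along the first axis); the class and argument names are illustrative assumptions rather than the authors' exact code.

```python
# A minimal sketch of the Fourier integral operator in Equations 9-10.
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, modes1, modes2):
        super().__init__()
        self.modes1, self.modes2 = modes1, modes2  # retained Fourier modes (k_max per dimension)
        scale = 1.0 / (in_channels * out_channels)
        # Complex-valued weight tensor R, truncated to the retained modes.
        self.weights = nn.Parameter(
            scale * torch.rand(in_channels, out_channels, modes1, modes2,
                               dtype=torch.cfloat)
        )

    def forward(self, v):
        # v: (batch, channels, x, y)
        v_ft = torch.fft.rfft2(v)                                  # F(v_l)
        out_ft = torch.zeros(v.shape[0], self.weights.shape[1],
                             v.shape[-2], v.shape[-1] // 2 + 1,
                             dtype=torch.cfloat, device=v.device)
        # R * F(v_l): channel-mixing multiplication on the retained low modes only
        out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
            "bixy,ioxy->boxy",
            v_ft[:, :, :self.modes1, :self.modes2], self.weights
        )
        return torch.fft.irfft2(out_ft, s=v.shape[-2:])            # F^{-1}(R * F(v_l))
```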
3.2. U-FNO architecture
The U-FNO architecture contains the following three steps:

1. Lift the input observation $a(x)$ to a higher dimensional space $v_{l_0}(x) = P(a(x))$ through a fully connected neural network transformation $P$.

2. Apply iterative Fourier layers followed by iterative U-Fourier layers: $v_{l_0} \mapsto \ldots \mapsto v_{l_L} \mapsto v_{m_0} \mapsto \ldots \mapsto v_{m_M}$, where $v_{l_j}$ for $j = 0, 1, \ldots, L$ and $v_{m_k}$ for $k = 0, 1, \ldots, M$ are sequences of functions taking values in $\mathbb{R}^c$ for channel dimension $c$.

3. Project $v_{m_M}$ back to the original space $z(x) = Q(v_{m_M}(x))$ using a fully connected neural network transformation $Q$.
Within each newly proposed U-Fourier layer (Figure 2C), we have

$$v_{m_{k+1}}(x) := \sigma\left( \left( \mathcal{K} v_{m_k} \right)(x) + \left( U v_{m_k} \right)(x) + W\left( v_{m_k}(x) \right) \right), \qquad \forall x \in D \qquad (11)$$

where $\mathcal{K}$ is the kernel integral transformation defined above, $U$ is a U-Net CNN operator, and $W$ is a linear operator, which are all learnable. $\sigma$ is an activation function that introduces strong non-linearity to each U-Fourier layer. Refer to Li et al. [34] for the formulation of the original Fourier layer.
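A minimal sketch of Equation 11 is given below, again in 2D for readability and building on the SpectralConv2d sketch in Section 3.1. The toy two-level U-Net stands in for the paper's "two step U-Net" $U$; its widths, depth, and the choice of ReLU activation are assumptions of ours.

```python
# A sketch of the U-Fourier layer in Equation 11: sigma( K(v) + U(v) + W(v) ).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet2d(nn.Module):
    """A toy two-level U-Net: strided conv down, upsample back, skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, v):
        h = torch.relu(self.down(v))                        # local convolution at a coarser scale
        h = F.interpolate(h, size=v.shape[-2:],
                          mode="bilinear", align_corners=False)  # back to the input resolution
        return self.mix(torch.cat([v, h], dim=1))            # skip connection + channel mixing

class UFourierLayer2d(nn.Module):
    def __init__(self, channels, modes1, modes2):
        super().__init__()
        self.K = SpectralConv2d(channels, channels, modes1, modes2)  # global Fourier path (Section 3.1 sketch)
        self.U = TinyUNet2d(channels)                                # local U-Net path
        self.W = nn.Conv2d(channels, channels, kernel_size=1)        # linear bias term

    def forward(self, v):
        return torch.relu(self.K(v) + self.U(v) + self.W(v))        # Equation 11
```

A full U-FNO stacks $L$ Fourier layers (the same construction without the U-Net path) followed by $M$ such U-Fourier layers between the lifting network $P$ and the projection network $Q$, as described above.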
3.3. Characteristics of the U-Fourier layer
In contrast to the original Fourier layer in FNO [34], the U-FNO architec-
ture proposed here appends a U-Net path in each U-Fourier layer. The U-Net
path performs local convolutions that enrich the representation power of the U-FNO for higher-frequency information. The number of Fourier and U-Fourier lay-
ers, $L$ and $M$, are hyperparameters that can be optimized for the specific problem. For the multi-phase flow problem considered here, we found that the architecture with half Fourier layers and half U-Fourier layers achieves the best performance, compared to architectures with all Fourier layers or all U-Fourier layers.
Note that the Fourier neural operator is an infinite-dimensional operator, which generates mesh-free/resolution-invariant predictions. However, when we append the U-Net block, we introduce a CNN-based path that does not inherently provide the flexibility of training and testing at different discretizations. We made this choice because the CO2-water multiphase flow
problem is very sensitive to numerical dispersion and numerical dissolution,
which are both tied to a specific grid resolution. When training and testing
at different grid dimensions, the numerical noise is often transformed in a
nonphysical way. As a result, for this problem, we prioritize achieving higher
training and testing accuracy, which the U-FNO provides. Nevertheless, un-
der the circumstance where one wants to test the U-FNO model at unseen
grid resolutions, we developed additional down-sampling and up-sampling
techniques that can be applied to the U-Net component to re-introduce the
resolution invariant feature. An example of this technique and its perfor-
mance are discussed in Section 5.3.
Finally, the U-Fourier layer’s performance improvement is not limited to
the spatial-temporal 3D multiphase flow problem considered in this paper.
We found that the U-FNO’s 2D variation also outperforms the original FNO-
2D in a steady-state Darcy’s flow problem. Refer to Appendix D for details.
3.4. Data configuration
This section describes the configuration of the inputs and outputs for the
proposed U-FNO architecture. For the data input, each of the field variables
in Figure 1A is represented by a channel. Since we use a gradually coarsening
radial grid for the numerical simulations, a logarithm conversion in the radial
direction is applied in training to project the field variables onto a uniform
grid that can be represented by a (96, 200) matrix. Notice that the reservoir thickness is also a variable and 96 cells represent a 200 m thick reservoir. When the reservoir is thinner than 200 m, we use zero-padding to denote cells that are outside of the actual reservoir. For the scalar variables, the values are simply broadcast into a matrix with dimension (96, 200).
In addition to the input variables, we also supply the spatial grid infor-
mation to the training by using one channel to denote radial cell dimensions
and another channel to denote vertical cell dimensions. The temporal grid
information is supplied into the network as an additional dimension. The in-
put to each data sample is constructed by concatenating the field variables,
scalar variables, spatial grids, and temporal grid together.
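A sketch of this input assembly is shown below (NumPy): field maps become channels, scalar variables are broadcast to constant (96, 200) channels, and everything is repeated over the 24 time snapshots with the temporal grid appended. The array names, channel ordering, and the way time is encoded are illustrative assumptions, not the released code.

```python
# A sketch of building one training input per Section 3.4.
import numpy as np

nz, nr, nt = 96, 200, 24  # vertical cells, (log-projected) radial cells, time snapshots

def build_input(kx, ky, poro, perf, dr, dz, scalars):
    """kx, ky, poro, perf, dr, dz: (96, 200) arrays; scalars: dict of floats."""
    channels = [kx, ky, poro, perf, dr, dz]
    # Broadcast each scalar variable (e.g., injection rate, initial pressure)
    # into a constant (96, 200) channel.
    channels += [np.full((nz, nr), v) for v in scalars.values()]
    x = np.stack(channels, axis=-1)                  # (96, 200, n_channels)
    # Repeat over the temporal dimension; time itself is appended as one more channel.
    x = np.repeat(x[:, :, None, :], nt, axis=2)      # (96, 200, 24, n_channels)
    t = np.broadcast_to(np.arange(nt)[None, None, :, None], (nz, nr, nt, 1))
    return np.concatenate([x, t], axis=-1)           # (96, 200, 24, n_channels + 1)
```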
For the gas saturation and pressure buildup outputs as shown in Figure 1B
and C, we use the same logarithm conversion to project the outputs onto a
uniform grid. We then concatenate the outputs for different time snapshots
to obtain a spatial-temporal 3D volume. The pressure buildup is normalized
into zero-mean and unit-variance distribution. For gas saturation, we do
not normalize the data because the saturation values always range from 0
to 1. The dimensions of the inputs and outputs are shown for each model
architecture (Appendices D to G).
The data set contains 5,500 input-to-output mappings. We use a 9/1/1
split to segregate the data set into 4,500 samples for training, 500 samples
for validation, and 500 samples for testing.
3.5. Loss function design and training
We use a relative $l_p$-loss to train the deep learning models. The $l_p$-loss is applied to both the original output ($y(r, z, t)$) and the first derivative of the output in the $r$-direction ($\mathrm{d}y/\mathrm{d}r$), and is written as:

$$L(y, \hat{y}) = \frac{\| y - \hat{y} \|_p}{\| y \|_p} + \beta\, \frac{\| \mathrm{d}y/\mathrm{d}r - \mathrm{d}\hat{y}/\mathrm{d}r \|_p}{\| \mathrm{d}y/\mathrm{d}r \|_p}, \qquad (12)$$
where $\hat{y}$ is the predicted output, $\mathrm{d}\hat{y}/\mathrm{d}r$ is the first derivative of the predicted output, $p$ is the order of the norm, and $\beta$ is a hyper-parameter. This relative loss has a regularization effect and is particularly effective when the data have large variances on the norms. Our experiments show that, compared to an MSE loss, a relative loss significantly improves the performance for both gas saturation and pressure buildup. The second term in Equation 12 greatly improves the quality of predictions for gas saturation at the leading edge of the plume. Similarly, this term improves the prediction of the sharp pressure buildup around the injection well. We use the $l_2$-loss for gas saturation and pressure buildup since it provides faster convergence than the $l_1$-loss.
As described in Section 2, our data set contains reservoirs with various
thicknesses and the cells outside of the reservoir are padded with zeros for
both input and output. To accommodate the variable reservoir thicknesses, during training we construct an active cell mask for each data sample and only calculate the loss within the mask. Our experiments show that this loss calculation scheme achieves better performance than calculating the loss over the whole field because of better gradient distribution efficiency.
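The sketch below shows one way to implement the masked relative $l_2$-loss of Equation 12 in PyTorch. The forward-difference approximation of $\mathrm{d}y/\mathrm{d}r$, the default $\beta$ value, and the exact masking convention are assumptions of ours.

```python
# A sketch of the masked relative l2-loss with the radial-derivative term (Equation 12).
import torch

def masked_relative_loss(y_pred, y_true, mask, beta=0.5, p=2, eps=1e-8):
    """y_pred, y_true: (batch, nz, nr, nt); mask: (batch, nz, nr, nt) of 0/1 active cells."""
    yp, yt = y_pred * mask, y_true * mask

    def rel(a, b):
        # ||a - b||_p / ||b||_p, computed per sample and averaged over the batch
        num = torch.norm((a - b).flatten(1), p=p, dim=1)
        den = torch.norm(b.flatten(1), p=p, dim=1) + eps
        return (num / den).mean()

    # First derivative in the radial direction via a simple forward difference.
    dyp_dr = yp[:, :, 1:, :] - yp[:, :, :-1, :]
    dyt_dr = yt[:, :, 1:, :] - yt[:, :, :-1, :]
    return rel(yp, yt) + beta * rel(dyp_dr, dyt_dr)
```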
During training, the initial learning rate is specified to be 0.001 and the
learning rate gradually decreases with a constant step and reduction rate.
These hyper-parameters are optimized for the gas saturation and pressure buildup models separately. The training stops when the loss no longer decreases, which occurs at 100 and 140 epochs for the gas saturation and pressure buildup models, respectively.
4. Results
This section compares four types of model architectures: the original FNO proposed in Li et al. [34], the newly proposed U-FNO in this paper, a conv-FNO that uses a conv3d in place of the U-Net, and the state-of-the-art benchmark CNN used in Wen et al. [18]. All models are trained on the proposed loss function (Equation 12) and directly output the 3D (96 × 200 × 24) gas saturation and pressure fields in space and time. Detailed parameters for each model are summarized in Appendices D to G.
4.1. Gas saturation
Figure 3: Training and validation relative loss evolution vs. epoch for U-FNO, FNO,
conv-FNO and CNN benchmark for A. gas saturation and B. pressure buildup.
Figure 3A demonstrates that the best performance for both the training
and validation data set is achieved with the U-FNO model. Interestingly,
for the gas saturation model, we notice that although the original FNO has
a higher training relative loss than the CNN benchmark, the validation rel-
ative loss by the original FNO is lower than that of the CNN benchmark.
This indicates that FNO has excellent generalization ability and achieves
better performance than the CNN even though FNO has a higher training
relative loss. Nevertheless, the original FNO has the highest relative loss
in the training set due to the inherent regularization effect by using a finite
set of truncated Fourier basis functions. The conv-FNO and U-FNO architectures are therefore designed to enhance the expressiveness by processing the higher-frequency information that is not picked up by the Fourier basis. We can observe from Figure 3A that the training loss is significantly improved even by simply adding a plain conv3d in the conv-FNO case. When the FNO layer is combined with a U-Net in the U-FNO case, the model takes advantage of both architectures and consistently produces the lowest relative loss throughout the entire training (Figure 3A).
Figure 4: A. Gas saturation testing set plume mean absolute error ($MPE$) and plume $R^2$ scores ($R^2_{plume}$) using CNN, FNO, conv-FNO, and U-FNO. B. Pressure buildup field mean relative error ($MRE$) and $R^2$ scores using the same four models.
Figure 4A demonstrates the testing set plume mean absolute error ($MPE$) and plume $R^2$ scores ($R^2_{plume}$) for each model architecture. We evaluate the gas saturation models' accuracy within the CO2 separate-phase plume because the gas saturation outside of the plume is always 0. Here "within the plume" is defined as non-zero values in either the data or the prediction. The testing set results represent the predictability of the model on truly unseen data, and U-FNO achieves the best performance with the lowest $MPE$ and highest $R^2_{plume}$. Compared to the benchmark CNN, the average test set $MPE$ using U-FNO is 46% lower while the $R^2_{plume}$ increased from 0.955 to 0.981. We can also compare the degree of overfitting by calculating the difference between the training and testing set $MPE$ (refer to Appendix I for the training set $MPE$). For example, the average $MPE$ difference in CNN is 70% higher than in U-FNO (1.0% and 0.3% respectively).
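A sketch of these plume-restricted metrics is given below (NumPy); the plume mask follows the definition above (non-zero saturation in either the data or the prediction), while the exact averaging over samples and time snapshots used in the paper is not prescribed here and may differ.

```python
# A sketch of the plume mean absolute error and plume R^2 score described above.
import numpy as np

def plume_metrics(sg_true, sg_pred):
    """sg_true, sg_pred: gas saturation arrays of identical shape, e.g. (96, 200, 24)."""
    plume = (sg_true > 0) | (sg_pred > 0)          # "within the plume" mask
    err = sg_true[plume] - sg_pred[plume]
    mpe = np.abs(err).mean()                       # plume mean absolute error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((sg_true[plume] - sg_true[plume].mean()) ** 2)
    r2_plume = 1.0 - ss_res / ss_tot               # plume R^2 score
    return mpe, r2_plume
```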
In addition to considering the average performance over the entire train-
ing, validation, and testing sets, we also compare model predictions for four
different cases with varying degrees of complexity in Figure 5. For each case,
Figure 5: Visualizations and scatter plots for examples a to d. In each example, visualizations show the true gas saturation ($SG$), U-FNO predicted, U-FNO absolute error, CNN predicted, and CNN absolute error. The mean absolute error $\mu_{MAE}$ is labeled on the U-FNO and CNN absolute error plots. Scatter plots show numerical simulation vs. prediction by the U-FNO and CNN models at each grid cell. The legend for all of the scatter plots is shown in the bottom right.
Figure 5 shows a comparison between the predicted and true values of the
CO2 saturation for each grid cell in the model over the entire 30-year injection period. The U-FNO has superior performance compared to the CNN for all of these examples, as quantified by the higher $R^2$ values and narrower 95% prediction bands. Cases b and d are especially obvious examples in which
the U-FNO successfully predicts the complicated horizontal saturation vari-
ations where the CNN ignores the heterogeneity and simply predicts more
uniform saturation fields.
4.2. Pressure buildup
For pressure buildup, the U-FNO also achieves the lowest relative error
for both training and validation data sets. As shown in Figure 3B, the
training and validation relative errors for the U-FNO are consistently low
throughout the training process. Figure 4 shows U-FNO’s superior testing set
performance in field mean relative error ($MRE$) and $R^2$ score. Specifically, the test set average $MRE$ is reduced by 24% from CNN to U-FNO. By comparing the differences between the training and testing sets in Appendix I, we can observe that all FNO-based models exhibit less overfitting than the CNN.
The superior performance of the U-FNO for pressure buildup predictions
is also demonstrated for the four examples shown in Figure 6. In each case
the U-FNO has higher $R^2$ values and narrower 95% prediction bands. Unlike the gas saturation outputs, pressure buildup distributions are challenging to predict since they have a larger radius of influence and larger differences between cases. For example, the maximum pressure buildup in the four examples shown in Figure 6 varies from ~20 bar to ~220 bar. Notice that the CNN model especially struggles with cases that have a large radius of influence (e.g.
case d) while the U-FNO model maintains excellent accuracy at locations
that are far away from the injection well.
5. Discussion
5.1. U-FNO’s advantages over CNN
5.1.1. Data utilization efficiency
The results in Section 4 demonstrate the excellent generalization ability
of the FNO-based architectures. To further compare the data utilization ef-
ficiency of the newly proposed U-FNO model with the benchmark CNN, we
train each model using various numbers of samples and plot the testing set $MPE$ and $MRE$ in Figure 7. Each model is trained for the same number
of epochs. For gas saturation, the CNN requires up to 3.4 times more train-
ing data to achieve the same level of performance as the U-FNO. Similarly,
the pressure buildup CNN requires 2.4 times more training data to achieve
a test set $MRE$ of 1%. In practical terms, the U-FNO saved 530 and 440
CPU hours in data set generation for the gas saturation and pressure buildup
models respectively (for a reference CNN model trained with 4500 training
samples). Figure 7 also indicates that the CPU hours saved by using U-FNO
Figure 6: Visualizations and scatter plots for examples a to d. In each example, visualizations show the true pressure buildup ($dP$), U-FNO predicted, U-FNO relative error, CNN predicted, and CNN relative error. The relative errors are defined as in [23]; the mean relative error $\mu_{MRE}$ is labeled on the U-FNO and CNN relative error plots. Scatter plots show numerical simulation vs. prediction by the U-FNO and CNN models at each grid cell. The legend for all of the scatter plots is shown in the bottom right.
grow increasingly large as test set errors decrease. The U-FNO's data utilization efficiency greatly reduces the computational resources needed for data generation and training, and therefore can better support complex high-dimensional problems.
5.1.2. Accuracy in the “front” determination
Gas saturation and pressure buildup “fronts” are important quantities for
CO2 storage projects and are often used for regulatory oversight [41], monitoring, or history matching [42] purposes. The distance to the gas saturation "front" corresponds to the maximum extent of the plume of separate-phase CO2. The pressure buildup "front" often refers to the radius at a specified
Figure 7: A. Gas saturation testing set $MPE$ vs. training size for U-FNO and CNN. The grey text labels the CNN to U-FNO training size ratios. The CPU hours saved are calculated using the average simulation time and linear interpolation of the $MPE$ vs. training size relationship. B. Pressure buildup testing set $MRE$ vs. training size for U-FNO and CNN. The CPU hours saved are calculated in the same way as above.
threshold value of pressure buildup because pressure fields are smooth. In
this experiment, we compare the accuracy of the U-FNO and CNN models
to evaluate the gas saturation and pressure buildup “fronts”. Table 2(a) and
(b) show that the U-FNO is 2.7 times more accurate than the CNN for sat-
uration “front” prediction and 1.8 times more accurate for pressure “front”
prediction.
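As an illustration, the sketch below extracts a "front" radius from a single predicted time snapshot by thresholding; the threshold values and the use of radial cell-center coordinates are hypothetical choices, not a procedure specified in the paper.

```python
# A sketch of "front" determination by thresholding a predicted field.
import numpy as np

def front_radius(field, r_centers, threshold):
    """field: (nz, nr) saturation or pressure-buildup slice at one time snapshot;
    r_centers: (nr,) radial cell-center coordinates in meters."""
    exceeds = (field > threshold).any(axis=0)        # any cell in a radial column above threshold
    if not exceeds.any():
        return 0.0
    return r_centers[np.where(exceeds)[0].max()]     # outermost radius reached

# Example usage with hypothetical thresholds:
#   sg_front = front_radius(sg_pred[..., t], r_centers, threshold=0.0)  # separate-phase plume extent
#   dp_front = front_radius(dp_pred[..., t], r_centers, threshold=1.0)  # e.g., a 1 bar buildup contour
```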
5.1.3. Accuracy in the heterogeneous geological formations
The U-FNO is more accurate than the CNN for highly heterogeneous
geological formations. The training data set includes a wide variety of ho-
mogeneous to heterogeneous permeability maps. For this comparison, we se-
lected the most “heterogeneous” and most “homogeneous” formations from
the testing set that have the highest and lowest 10% permeability standard
deviations. Table 3 summarizes the average gas saturation $MPE$ in both