Bayesian Numerical Homogenization

Numerical homogenization, i.e. the finite-dimensional approximation of solution spaces of PDEs with arbitrary rough coefficients, requires the identification of accurate basis elements. These basis elements are oftentimes found after a laborious process of scientific investigation and plain guesswork. Can this identification problem be facilitated? Is there a general recipe/decision framework for guiding the design of basis elements? We suggest that the answer to these questions could be positive, based on the reformulation of numerical homogenization as a Bayesian inference problem in which a given PDE with rough coefficients (or multi-scale operator) is excited with noise (a random right-hand side/source term) and one tries to estimate the value of the solution at a given point based on a finite number of observations. We apply this reformulation to the identification of bases for the numerical homogenization of arbitrary integro-differential equations and show that these bases have optimal recovery properties. In particular, we show how Rough Polyharmonic Splines can be re-discovered as the optimal solution of a Gaussian filtering problem.

If the prior on C[0, 1] is that of a Brownian Motion (i.e. f(t) = B_t, where B_t is a Brownian motion and B_0 is normal), then E[f(x) | f(x_1), . . . , f(x_n)] is the piecewise linear interpolation of f between the points x_1, . . . , x_n and one re-discovers the trapezoidal quadrature rule.
If the prior on C[0, 1] is that of the first integral of a Brownian Motion (i.e. f(t) = ∫_0^t B_s ds), then the posterior mean E[f(x) | f(x_1), . . . , f(x_n)] is the cubic spline interpolant, and integrating k times yields splines of order 2k + 1. Although this link has led to the identification of new quadrature rules for numerical integration [49], it appears to have remained little known, and our paper is prompted by the question of the existence of a similar link between Bayesian Inference and Numerical Homogenization.
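As a quick numerical illustration of the first claim (a sketch of ours, not taken from [49]): conditioning a Gaussian prior with the Brownian covariance k(s, t) = min(s, t) (taking B_0 = 0 for simplicity) on two observed values yields posterior-mean weights that are exactly those of piecewise linear interpolation.

```python
import numpy as np

def brownian_kernel(s, t):
    """Covariance of Brownian motion (with B_0 = 0): k(s, t) = min(s, t)."""
    return np.minimum.outer(s, t)

X = np.array([0.2, 0.8])                # observation points x_1, x_2
fX = np.array([1.0, 3.0])               # observed values f(x_1), f(x_2)

# Posterior mean of the Gaussian prior at x: m(x) = k(x, X) K^{-1} f(X)
K = brownian_kernel(X, X)
w = np.linalg.solve(K, brownian_kernel(X, np.array([0.5]))).ravel()
m = w @ fX

# For the Brownian prior the weights reproduce linear interpolation
# between the two neighbouring observation points.
lin = fX[0] + (0.5 - X[0]) / (X[1] - X[0]) * (fX[1] - fX[0])
print(w, m, lin)
```

For the midpoint of [0.2, 0.8] the weights come out as (0.5, 0.5), i.e. exactly the linear-interpolation (trapezoidal) weights.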
As a prototypical example, consider the numerical homogenization of the PDE

  −div(a(x)∇u(x)) = g(x), x ∈ Ω; g ∈ L²(Ω), (1.1)

where Ω is a bounded subset of R^d with piecewise Lipschitz boundary and a is a symmetric, uniformly elliptic d × d matrix on Ω with entries in L^∞(Ω).
As for the identification of quadrature rules in numerical integration, the identification of accurate basis elements in numerical homogenization has been based on a difficult process of scientific investigation. Let us now turn our attention to the Bayesian approach to this problem. An immediate question is: where do we place the prior? (1) If the prior is placed on u, then posterior values do not see (depend on) the microstructure.
(2) If the prior is placed on a, then the microstructure becomes random, whereas our purpose is the numerical homogenization of a given deterministic microstructure. Let us also note that the randomization of the microstructure, as investigated by Polynomial Chaos Approximation/Stochastic Expansion methods [35,34,64,8,30,21], does not lead to a simplification of the homogenization process but to a complexity that increases with the dimension of the input stochastic variables [58,14] (although Stochastic Expansion methods have been used successfully to beat Monte-Carlo sampling, they do not lead to the averaging results seen in homogenization). (3) If the prior is placed on g, then the noise propagates through the microstructure and the posterior value of u contains that information.
This observation motivates us to place the prior on the source term g in (1.1), e.g., replace it by white noise (i.e. a centered Gaussian field ξ on Ω with covariance function δ(x − y)) and consider the stochastic PDE

  −div(a(x)∇u(x)) = ξ(x), x ∈ Ω. (1.2)

Observe that the solution of (1.2) at the point x, u(x), is a random variable, and its best (mean squared) approximation given u(x_1), . . . , u(x_N) (its values at the points x_1, . . . , x_N) is its conditional expectation E[u(x) | u(x_1), . . . , u(x_N)]. One result of this paper is that

  E[u(x) | u(x_1), . . . , u(x_N)] = Σ_{i=1}^N φ_i(x) u(x_i), (1.3)

where the functions φ_i are Rough Polyharmonic Splines [53] (RPS), which have been identified as accurate basis elements for the numerical homogenization of (1.1) having noteworthy variational, optimal recovery and localization properties. The discovery of these Rough Polyharmonic Splines has required a significant amount of work and trial and error, but here they are identified after a single step of Bayesian conditioning. This observation motivates us to investigate what the same process of Bayesian conditioning would give under different priors and under observations other than the values of u at individual points. In particular, we will use this link between Bayesian Inference and Numerical Homogenization to identify bases for the numerical homogenization of arbitrary linear integro-differential equations. Our purpose is to show that this link is generic and could in principle be used, beyond numerical homogenization, as a guiding principle for the coarse-graining of multi-scale systems. The Bayesian approach to this problem is to (1) put a prior on the degrees of freedom of the system, (2) select a finite number of coarse variables, and (3) compute the posterior value of the state of the system conditioned on the coarse variables.
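The conditioning step behind (1.3) can be reproduced on a discretized one-dimensional analogue of (1.2). The following sketch is our own illustration (the finite-difference operator A, the rough coefficient a and the observation points are arbitrary choices, not constructions from this paper): it forms the covariance Γ of u under a white-noise source and the resulting basis, which interpolates at the observation points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 199                                         # interior grid points on (0, 1)
h = 1.0 / (n + 1)
a = np.exp(rng.uniform(-1, 1, n + 1))           # rough coefficient on cell edges
# Standard finite-difference discretization of L = -d/dx(a(x) d/dx), u = 0 at 0, 1:
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

# White-noise right-hand side: u = A^{-1} xi, so Cov(u) is proportional to
# Gamma = A^{-1} A^{-T} (mesh-scaling constants omitted).
Ainv = np.linalg.inv(A)
Gamma = Ainv @ Ainv.T

# Condition on the values of u at three points x_1, x_2, x_3:
obs = np.array([39, 99, 159])
Theta = Gamma[np.ix_(obs, obs)]
Phi = Gamma[:, obs] @ np.linalg.inv(Theta)      # columns: basis functions phi_i

# E[u(x) | u(x_1), ..., u(x_3)] = sum_i phi_i(x) u(x_i); the basis interpolates:
print(Phi[obs, :])                              # identity matrix
```

By construction φ_i(x_j) = δ_ij, which is what makes (1.3) an interpolation formula.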

General setup
Let L and B be linear integro-differential operators on Ω and ∂Ω, and consider the integro-differential equation

  Lu(x) = g(x), x ∈ Ω; Bu(x) = 0, x ∈ ∂Ω. (2.1)

As with (1.1), the numerical homogenization of (2.1) will require the assumption that g belongs to a strict subspace of H_L(Ω). We will assume that L and B are such that (2.1) (1) admits a unique solution in H(Ω) and (2) admits a Green's function G. Recall that G is defined as the solution of

  LG(x, y) = δ(x − y), x ∈ Ω; BG(x, y) = 0, x ∈ ∂Ω, (2.2)

where δ(· − y) is the Dirac delta mass at the point y.
Example 2.1. Note that for the prototypical example (1.1) we have Lv = −div(a(x)∇v), with B the corresponding boundary operator. Our purpose is to identify a good basis for the numerical homogenization or coarse-graining of (2.1).

Bayesian Numerical Homogenization
Our Bayesian approach to the numerical homogenization of (2.1) is to replace the source term g by a Gaussian field ξ. More precisely, we introduce ξ, a centered Gaussian field on Ω with covariance function Λ(x, y), and consider the stochastic integro-differential equation

  Lu(x) = ξ(x), x ∈ Ω; Bu(x) = 0, x ∈ ∂Ω. (3.2)

The solution u of (3.2) is then a centered Gaussian field with covariance function

  Γ(x, y) = ∫_Ω ∫_Ω G(x, s) Λ(s, t) G(y, t) ds dt,

which, when ξ is white noise, is the kernel of L*L (i.e., L*LΓ(x, y) = δ(x − y), where L* is the adjoint of L).
Proof. Since L and B are linear operators, u is a linear function of ξ and is therefore a Gaussian field. Moreover, writing u(x) = ∫_Ω G(x, s)ξ(s) ds, its covariance function is given by

  E[u(x)u(y)] = ∫_Ω ∫_Ω G(x, s) Λ(s, t) G(y, t) ds dt = Γ(x, y),

which finishes the proof.

On the choice of the noise
We will show that the choice of the noise ξ is determined by the regularity of the source term g in the right-hand side of (2.1). More precisely, if ξ is white noise, then the resulting accuracy estimates will be obtained under the assumption that g ∈ L²(Ω) and as a function of ‖g‖_{L²(Ω)}. If ξ is not white noise (i.e. if its covariance function is not δ(x − y)), then we assume that there exist two linear integro-differential operators L_Λ and B_Λ such that ξ is the stochastic solution of the following equation with white noise ξ′ as source term:

  L_Λ ξ(x) = ξ′(x), x ∈ Ω; B_Λ ξ(x) = 0, x ∈ ∂Ω. (3.6)

In what follows, if ξ is not white noise, then we assume it to be obtained as in (3.6), and the resulting accuracy estimates will be obtained under the assumption that L_Λ g ∈ L²(Ω) and as a function of ‖L_Λ g‖_{L²(Ω)}. A prototypical example corresponds to the situation where ξ is obtained as the regularization of white noise via a power of the Laplace–Dirichlet operator on Ω; this allows us to identify optimal recovery bases under the assumption that g ∈ H^s(Ω) with s ≥ 0 or s < 0.
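In a discretization, (3.6) simply says that the colored noise ξ is white noise pushed through the inverse of L_Λ, so that its covariance is Λ = L_Λ^{-1}(L_Λ^{-1})ᵀ. A sketch of ours, with a discrete Dirichlet Laplacian standing in for L_Λ and a Monte Carlo check of the resulting covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
h = 1.0 / (n + 1)

# Discrete Dirichlet Laplacian standing in for L_Lambda: the colored noise xi
# solves L_Lambda xi = xi' with xi' white (mesh-scaling constants omitted).
B = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Covariance of the regularized noise: Lambda = L_Lambda^{-1} (L_Lambda^{-1})^T
Binv = np.linalg.inv(B)
Lam = Binv @ Binv.T

# Monte Carlo check: sample xi = L_Lambda^{-1} xi' and compare covariances
m = 20000
xi = np.linalg.solve(B, rng.standard_normal((n, m)))
Lam_mc = xi @ xi.T / m
err = np.abs(Lam_mc - Lam).max() / np.abs(Lam).max()
print(err)                                      # small sampling error
```

The empirical covariance of the samples matches Λ up to the expected O(m^{-1/2}) sampling error.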

Identification of basis elements via conditioning
Let N be a strictly positive integer. Our Bayesian approach is based on the conditioning of the solution of (3.2) on the observation of N linear functions of u, expressed as ∫_Ω u(y)ψ_i(y) dy, where ψ_1, . . . , ψ_N are N linearly independent generalized functions on Ω such that, for all i,

  ∫_Ω ∫_Ω ψ_i(x) Γ(x, y) ψ_i(y) dx dy < ∞. (3.8)

Note that (3.8) implies that the matrix

  Θ_{i,j} := ∫_Ω ∫_Ω ψ_i(x) Γ(x, y) ψ_j(y) dx dy (3.9)

is well defined and that, if u is the solution of (3.2), then

  Ψ := (∫_Ω u(y)ψ_1(y) dy, . . . , ∫_Ω u(y)ψ_N(y) dy) (3.10)

is a well-defined centered Gaussian random vector with covariance matrix Θ.
Lemma 3.4. The matrix Θ is symmetric positive definite. To see this, for l ∈ R^N let v be the solution of

  L*v(x) = Σ_{i=1}^N l_i ψ_i(x), x ∈ Ω; B*v(x) = 0, x ∈ ∂Ω, (3.12)

and observe that ‖v‖²_{L²(Ω)} = lᵀΘl. If Θ were not positive definite, there would exist a non-zero vector l ∈ R^N such that Θl = 0. This would imply ‖v‖_{L²(Ω)} = 0, which is a contradiction since equation (3.12) has a non-zero solution (l ≠ 0 and the ψ_i are linearly independent).
Our motivation for using Gaussian noise in (3.2) lies in the fact that for Gaussian fields, conditional expected values can be computed via linear projection. Hence our approach is also akin to Gaussian filtering for numerical homogenization, and the following theorem shows that this approach allows for the identification of a (projection) basis φ_i.

Theorem 3.5. Let u be the solution of (3.2) and Ψ be defined by (3.10); then

  E[u(x) | Ψ] = Σ_{i=1}^N φ_i(x) Ψ_i, (3.14)

with

  φ_i(x) := Σ_{j=1}^N Θ^{-1}_{i,j} ∫_Ω Γ(x, y) ψ_j(y) dy. (3.16)

Furthermore, u(x) conditioned on the value of Ψ is a Gaussian random variable with mean (3.14) and variance

  σ²(x) := Γ(x, x) − Σ_{i,j=1}^N Θ^{-1}_{i,j} ∫_Ω Γ(x, y)ψ_i(y) dy ∫_Ω Γ(x, z)ψ_j(z) dz. (3.17)

Proof. Since u and Ψ belong to the same Gaussian space, E[u(x) | Ψ] is the linear function cᵀΨ of Ψ obtained by minimizing the mean squared error

  E[(u(x) − cᵀΨ)²] (3.19)

with respect to c ∈ R^N, where Θ is defined by (3.9). We conclude the proof by identifying the minimizer in c, using Lemma 3.4 for the invertibility of Θ and noting that (3.17) is simply (3.19) at the minimum in c.
Example 3.1. If L and B correspond to the prototypical example (1.1) (see also Example 2.1), if ξ is white noise (i.e. if its covariance function is Λ(x, y) = δ(x − y)), and if the observables are Dirac masses at points x_i ∈ Ω (with d ≤ 3, which is required for (3.8)), then Theorem 3.5 implies (1.3) and the basis elements φ_i are the RPS elements of [53], which are a generalization of polyharmonic splines to PDEs with rough coefficients. Recall that polyharmonic splines can be traced back to the seminal work of Harder and Desmarais [40] and Duchon [22,23,24]. Note also that, according to Theorem 3.5, the process of Bayesian conditioning gives us the whole posterior distribution of u(x) and not only its (conditional) expected value. In particular, the distribution of u(x) conditioned on u(x_1), . . . , u(x_N) is Gaussian with mean (1.3) and variance

  σ²(x) = Γ(x, x) − Σ_{i,j=1}^N Θ^{-1}_{i,j} Γ(x, x_i) Γ(x, x_j),

and this observation can be used to compute the probability of deviation of the RPS interpolation from u(x) by a given margin and to guide the addition of interpolation points (note that σ²(x) = 0 at the interpolation points x_1, . . . , x_N).
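The closed form of σ²(x) for Dirac observations is easy to check in a discrete toy model (our own construction, with a finite-difference stand-in A for L): σ² vanishes at the interpolation points and is nonnegative elsewhere.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 149
h = 1.0 / (n + 1)
a = np.exp(rng.uniform(-1, 1, n + 1))           # rough coefficient a(x)
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

Gamma = np.linalg.inv(A.T @ A)                  # covariance of u for white noise
obs = np.array([29, 74, 119])                   # interpolation points x_i
G = Gamma[:, obs]
S = np.linalg.inv(Gamma[np.ix_(obs, obs)])      # Theta^{-1}

# sigma^2(x) = Gamma(x, x) - sum_{ij} Theta^{-1}_{ij} Gamma(x, x_i) Gamma(x, x_j)
sigma2 = np.diag(Gamma) - np.einsum('ij,jk,ik->i', G, S, G)

print(np.abs(sigma2[obs]).max())   # vanishes at the interpolation points
print(sigma2.min())                # nonnegative (up to round-off) elsewhere
```

The map x ↦ σ²(x) could then be used, as suggested above, to decide where to add interpolation points.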
Remark 3.6. We will show in Theorem 5.1 that σ(x) also controls the pointwise error between the solution of the original integro-differential equation (2.1) and the approximation Σ_{i=1}^N φ_i(x) ∫_Ω u(y)ψ_i(y) dy.

Variational properties of basis elements
In this section we will show that, as for RPS [53], the basis elements φ_i obtained from Bayesian inference have remarkable variational and optimal recovery properties that can be used (1) for their practical computation and (2) for the derivation of accuracy estimates.

White Gaussian noise
In this subsection we will assume that ξ is white noise (i.e. Λ(x, y) = δ(x − y)). Define

  V := {φ ∈ H(Ω) : Lφ ∈ L²(Ω) and Bφ = 0 on ∂Ω} (4.1)

and let ⟨·, ·⟩ be the scalar product on V defined, for u, v ∈ V, by

  ⟨u, v⟩ := ∫_Ω (Lu)(x)(Lv)(x) dx, (4.2)

with associated norm

  ‖v‖_V := ⟨v, v⟩^{1/2} = ‖Lv‖_{L²(Ω)}. (4.3)

Theorem 4.1. Γ is the reproducing kernel of (V, ⟨·, ·⟩): for v ∈ V and x ∈ Ω, v(x) = ⟨v, Γ(·, x)⟩ and (by the Cauchy–Schwarz inequality) |v(x)| ≤ ‖v‖_V Γ(x, x)^{1/2}.

For i ∈ {1, . . . , N} let

  V_i := {φ ∈ V : ∫_Ω φ(y)ψ_j(y) dy = δ_{i,j} for j = 1, . . . , N},

and consider the following optimization problem over V_i:

  minimize ⟨φ, φ⟩ over φ ∈ V_i. (4.9)

Proposition 4.2. V_i is a non-empty closed affine subspace of V. Problem (4.9) is a strictly convex quadratic optimization problem over V_i. The unique minimizer of (4.9) is φ_i as defined by (3.16).
Proof. Let us first prove that φ_i ∈ V_i. Let θ_i(x) := ∫_Ω Γ(x, y)ψ_i(y) dy, so that L*Lθ_i = ψ_i on Ω and Bθ_i(x) = 0 on ∂Ω. Noting that ‖Lθ_i‖²_{L²(Ω)} = Θ_{i,i}, we deduce from Lemma 3.4 that θ_i ∈ V. We conclude from (3.16) that φ_i = Σ_{j=1}^N Θ^{-1}_{i,j} θ_j ∈ V. Now, observing that

  ∫_Ω φ_i(y)ψ_j(y) dy = Σ_{k=1}^N Θ^{-1}_{i,k} Θ_{k,j} = δ_{i,j},

where δ_{i,i} = 1 and δ_{i,j} = 0 for j ≠ i, we conclude that φ_i ∈ V_i, which implies that V_i is non-empty (it is easy to check that it is a closed affine subspace of V). Now let us prove that problem (4.9) is a strictly convex optimization problem over V_i. For v, w ∈ V_i with v ≠ w, define f(λ) := ⟨λv + (1 − λ)w, λv + (1 − λ)w⟩; we need to show that f(λ) is a strictly convex function. Observing that f′′(λ) = 2⟨v − w, v − w⟩ and noting that ⟨v − w, v − w⟩ > 0 (otherwise one would have v = w), we deduce that f is strictly convex in λ. We conclude (see, for example, [29, p. 35, Proposition 1.2]) that Problem (4.9) is a strictly convex optimization problem over V_i and that it admits a unique minimizer in V_i. We will postpone the proof of the fact that φ_i is the minimizer of (4.9) to the proof of Theorem 4.6.

Remark 4.3.
It is important to note that in practical (numerical) applications each element φ_i would be obtained by solving the quadratic optimization problem (4.9) rather than through the representation formula (3.16). Note also that, if u is the (stochastic) solution of (3.2), then φ_i is also equal to the expected value of u(x) conditioned on Ψ = e_i, i.e. on the observations ∫_Ω u(y)ψ_j(y) dy = δ_{i,j} for j = 1, . . . , N. Remark 4.4. A simple calculation allows us to show that φ_i is also the solution of a system of nested equations in which (L*, B*) is the adjoint operator associated with G(y, x) (the transpose of G(x, y)).
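The equivalence stated in Remark 4.3 between the optimization problem (4.9) and the representation formula (3.16) can be checked discretely. In the sketch below (our own finite-dimensional analogue, with A standing in for L and point evaluations as the constraints), the KKT system of the constrained quadratic problem reproduces Γ CᵀΘ^{-1}:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 49
h = 1.0 / (n + 1)
a = np.exp(rng.uniform(-1, 1, n + 1))           # rough coefficient
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

obs = np.array([12, 24, 36])
N = len(obs)
C = np.zeros((N, n))
C[np.arange(N), obs] = 1.0                      # constraints phi(x_j) = delta_ij

# Representation formula (discrete analogue of (3.16)):
Gamma = np.linalg.inv(A.T @ A)
Theta = C @ Gamma @ C.T
Phi_rep = Gamma @ C.T @ np.linalg.inv(Theta)

# Variational characterization: minimize ||A phi||^2 subject to C phi = e_i,
# solved through its KKT (Lagrange multiplier) system.
KKT = np.block([[A.T @ A, C.T], [C, np.zeros((N, N))]])
Phi_var = np.zeros((n, N))
for i in range(N):
    rhs = np.concatenate([np.zeros(n), np.eye(N)[i]])
    Phi_var[:, i] = np.linalg.solve(KKT, rhs)[:n]

print(np.abs(Phi_rep - Phi_var).max())          # the two constructions agree
```

In practice one would of course solve the sparse variational problem rather than form Γ, which is the point of Remark 4.3.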

Remark 4.5.
Another simple calculation allows us to show that φ_i is also the solution of the nested equations (4.19), where c ∈ R^N is an unknown vector determined by the third equation in (4.19).
Write V_0 for the subset of V defined by

  V_0 := {v ∈ V : ∫_Ω v(y)ψ_i(y) dy = 0 for i = 1, . . . , N}. (4.21)

Theorem 4.6. It holds true that
• The basis φ_i is orthogonal to V_0 with respect to the product ⟨·, ·⟩, i.e. ⟨φ_i, v⟩ = 0 for all v ∈ V_0.
• For all i, j ∈ {1, . . . , N}, ⟨φ_i, φ_j⟩ = Θ^{-1}_{i,j}.

Remark 4.7. This theorem and its proof are analogous to the optimality property of strictly conditionally positive definite kernels [61] when used as interpolants solving the optimal recovery problem [38].
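Both bullets of Theorem 4.6 have exact finite-dimensional counterparts. A sketch of ours (A again an arbitrary finite-difference stand-in for L, point evaluations as observables):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 79
h = 1.0 / (n + 1)
a = np.exp(rng.uniform(-1, 1, n + 1))           # rough coefficient
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

obs = np.array([19, 39, 59])
N = len(obs)
C = np.zeros((N, n))
C[np.arange(N), obs] = 1.0                      # point evaluations

Gamma = np.linalg.inv(A.T @ A)                  # discrete kernel of (L*L)^{-1}
Theta = C @ Gamma @ C.T
Thinv = np.linalg.inv(Theta)
Phi = Gamma @ C.T @ Thinv                       # columns are the basis phi_i

# Second bullet: the Gram matrix <phi_i, phi_j> = (A phi_i)^T (A phi_j)
# equals Theta^{-1}.
Gram = (A @ Phi).T @ (A @ Phi)
rel = np.abs(Gram - Thinv).max() / np.abs(Thinv).max()

# First bullet: phi_i is <.,.>-orthogonal to V_0 = {v : v(x_j) = 0 for all j}.
v = rng.standard_normal(n)
v[obs] = 0.0                                    # v now lies in the discrete V_0
ortho = Phi.T @ (A.T @ (A @ v))
print(rel, np.abs(ortho).max())
```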
Accuracy of the basis elements φ_i

Pointwise estimates
Theorem 5.1. Let ‖·‖_V be defined as in (4.3). For all v ∈ V and x ∈ Ω,

  |v(x) − Σ_{i=1}^N φ_i(x) ∫_Ω v(y)ψ_i(y) dy| ≤ σ(x) ‖v‖_V,

where σ²(x) is the variance of u(x) (solution of (3.2)) conditioned on ∫_Ω u(y)ψ_1(y) dy, . . . , ∫_Ω u(y)ψ_N(y) dy, as defined by (3.17). In particular, if u is the solution of the original integro-differential equation (2.1), then

  |u(x) − Σ_{i=1}^N φ_i(x) ∫_Ω u(y)ψ_i(y) dy| ≤ σ(x) ‖g‖_{L²(Ω)}

if φ_i, σ are derived from white noise, and

  |u(x) − Σ_{i=1}^N φ_i(x) ∫_Ω u(y)ψ_i(y) dy| ≤ σ(x) ‖L_Λ g‖_{L²(Ω)}

if φ_i, σ are derived from the noise with covariance function Λ described in (3.6).
Proof. Let v ∈ V and x ∈ Ω, and write θ_i(y) := ∫_Ω Γ(y, z)ψ_i(z) dz. Using the reproducing kernel property of Theorem 4.1 we obtain that

  v(x) − Σ_{i=1}^N φ_i(x) ∫_Ω v(y)ψ_i(y) dy = ⟨v, Γ(·, x) − Σ_{i=1}^N φ_i(x)θ_i⟩ ≤ ‖v‖_V ‖Γ(·, x) − Σ_{i=1}^N φ_i(x)θ_i‖_V. (5.5)

We conclude by expanding the right-hand side of (5.5), using the definition φ_i(x) = Σ_{j=1}^N Θ^{-1}_{i,j} ∫_Ω Γ(x, y)ψ_j(y) dy, and noting that the resulting squared V-norm equals σ²(x).
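In the discrete analogue the bound of Theorem 5.1 is exactly the Cauchy–Schwarz inequality in the product ⟨u, v⟩ = (Au)ᵀ(Av), and can be verified directly (our own toy model; A is an arbitrary finite-difference stand-in for L):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 99
h = 1.0 / (n + 1)
a = np.exp(rng.uniform(-1, 1, n + 1))           # rough coefficient
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2

Gamma = np.linalg.inv(A.T @ A)                  # discrete reproducing kernel
obs = np.array([19, 49, 79])                    # interpolation points x_i
G = Gamma[:, obs]
S = np.linalg.inv(Gamma[np.ix_(obs, obs)])      # Theta^{-1}
Phi = G @ S
sigma = np.sqrt(np.maximum(np.diag(Gamma) - np.einsum('ij,jk,ik->i', G, S, G), 0.0))

# For any v in the discrete V: |v(x) - sum_i phi_i(x) v(x_i)| <= sigma(x) ||A v||,
# which is the Cauchy-Schwarz bound of the proof above.
for _ in range(5):
    v = np.linalg.solve(A, rng.standard_normal(n))   # generic element of V
    err = np.abs(v - Phi @ v[obs])
    bound = sigma * np.linalg.norm(A @ v)
    assert np.all(err <= bound + 1e-6)
print("pointwise bound verified")
```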
Remark 5.2. σ²(x) is also known as the power function in radial basis function interpolation [61,33]. The proof of Theorem 5.1 is similar to the one used to derive local error estimates for radial basis function interpolation of scattered data (see [63], in which σ²(x) was referred to as the Kriging function, a terminology coming from geostatistics [47]).

H(Ω)-norm estimates
Let V_0 be the subset of V defined by (4.21). Write

  ρ(V_0) := sup_{v ∈ V_0, v ≠ 0} ‖v‖_{H(Ω)} / ‖v‖_V,

where ‖·‖_{H(Ω)} is the natural norm associated with the space on which the operator L is defined.

Theorem 5.3. For all v ∈ V,

  ‖v − Σ_{i=1}^N φ_i ∫_Ω v(y)ψ_i(y) dy‖_{H(Ω)} ≤ ρ(V_0) ‖v‖_V, (5.7)

and ρ(V_0) is the smallest constant for which (5.7) holds for all v ∈ V.

Proof. Write w := v − Σ_{i=1}^N φ_i ∫_Ω v(y)ψ_i(y) dy and note that w ∈ V_0 (since ∫_Ω φ_i ψ_j dy = δ_{i,j}), so that ‖w‖_{H(Ω)} ≤ ρ(V_0)‖w‖_V. By the orthogonality property of Theorem 4.6, w is the ⟨·, ·⟩-orthogonal projection of v onto V_0, hence ‖w‖_V ≤ ‖v‖_V, which concludes the proof.
Remark 5.4. Observe that Theorem 5.3 implies that if u is the solution of the original integro-differential equation (2.1) and φ_i, σ are derived from white noise, then

  ‖u − Σ_{i=1}^N φ_i ∫_Ω u(y)ψ_i(y) dy‖_{H(Ω)} ≤ ρ(V_0) ‖g‖_{L²(Ω)}.

Similarly, if φ_i, σ are derived from the noise with covariance function Λ described in (3.6), then

  ‖u − Σ_{i=1}^N φ_i ∫_Ω u(y)ψ_i(y) dy‖_{H(Ω)} ≤ ρ(V_0) ‖L_Λ g‖_{L²(Ω)}.

In the bound (5.13), C depends only on λ_min(a), λ_max(a) and H, where λ_max(a) := sup_{x∈Ω, l≠0} lᵀa(x)l/|l|², λ_min(a) := inf_{x∈Ω, l≠0} lᵀa(x)l/|l|², and H is the mesh norm

  H := sup_{x∈Ω} min_{1≤i≤N} |x − x_i|. (5.14)

Let us also recall that the proof of (5.13) is based on a Poincaré inequality: one concludes by applying the Poincaré inequality to the L²-norm of v within each cell of a partition of Ω subordinate to the interpolation points. We will give the last example as a theorem.
Theorem 5.6. Let L and B be as in the prototypical example (1.1) (Example 2.1), let ξ be white noise, and let x_i ∈ Ω be N arbitrary points of Ω with mesh norm H (see (5.14)). Let ψ_1, . . . , ψ_N be linearly independent generalized probability densities on Ω with (possibly overlapping) supports support(ψ_i).

Proof. The proof of (5.20) is simply based on the observation that if v ∈ V_0 then (since ∫_Ω v(x)ψ_i(x) dx = 0) there exist N points y_1, . . . , y_N such that v(y_i) = 0 and the mesh norm of those points is bounded by H. Therefore we can apply the result of Example 5.1.
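Finally, the role of the mesh norm H can be observed numerically: conditioning on a nested refinement of the interpolation points (halving H) can only decrease σ²(x) pointwise, since it conditions on a superset of the observations. A sketch of ours (A an arbitrary finite-difference stand-in for L):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 127
h = 1.0 / (n + 1)
a = np.exp(rng.uniform(-1, 1, n + 1))           # rough coefficient
A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2
Gamma = np.linalg.inv(A.T @ A)

def cond_var(obs):
    """Conditional variance sigma^2 given point observations at indices obs."""
    G = Gamma[:, obs]
    S = np.linalg.inv(Gamma[np.ix_(obs, obs)])
    return np.diag(Gamma) - np.einsum('ij,jk,ik->i', G, S, G)

coarse = np.arange(15, n, 32)                   # coarse interpolation points
fine = np.arange(15, n, 16)                     # nested refinement: halved mesh norm
s_coarse, s_fine = cond_var(coarse), cond_var(fine)

# Conditioning on a superset of observations can only decrease the variance,
# so refining the point set (smaller H) shrinks sigma^2 pointwise.
print(s_coarse.max(), s_fine.max())
```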

Statistical Decision Theory
Another motivation for exploring Bayesian approximations of the solution space lies in the decision theory/game theory approach to numerical homogenization. In this approach one looks at the numerical homogenization problem (1.1) as a game in which player A chooses a function θ of the observables u(x_1), . . . , u(x_N) and player B chooses a source term g in a class/set A. These two choices combine to form an error term

  E(θ, g) = u(x) − θ(u(x_1), . . . , u(x_N)). (6.1)

Player A's objective is to minimize the error (6.1) while player B's objective is to maximize it. A surprising result from Wald's Decision Theory [60] (stemming from Von Neumann's Game Theory [59]) is that, although such games are deterministic, under weak regularity conditions the optimal strategy for player B is to play at random by placing an optimal prior distribution on the set of candidates for g and, similarly, the best strategy for player A lies in the Bayesian class obtained by placing an (optimal) prior on the set of candidates for g. Of course, in the present work we do not know, nor have we investigated, whether the Gaussian prior (on g) is optimal in the game-theoretic sense described above. Our choice has been motivated by the simplicity (linearity) of the calculations associated with Gaussian filtering and by the fact that such a choice allows for the identification of accurate numerical homogenization bases with optimal recovery properties (Theorem 4.6).