Multifidelity uncertainty quantification and model validation of large-scale multidisciplinary systems

Abstract. A simulation-based framework for multifidelity uncertainty quantification is presented, which informs and guides the design process of complex, large-scale, multidisciplinary systems throughout their life cycle. In this framework, uncertainty in system models is identified, characterized, and propagated in an integrated manner through the analysis cycles needed to quantify the effects of uncertainty on the quantities of interest. This is part of the process to design systems and verify their compliance to performance requirements. Uncertainty quantification is performed through mean and variance estimators as well as global sensitivity analyses. These computational analyses are made tractable by the use of multifidelity methods, which leverage a variety of low-fidelity models to obtain speed-ups, while keeping the main high-fidelity model in the loop to guarantee convergence to the correct result. This framework was applied to the James Webb Space Telescope observatory integrated model used to calculate the wavefront error caused by thermal distortions. The framework reduced the time required to perform global sensitivity analyses from more than 2 months to less than 2 days, while reducing the error in the final estimates of the quantities of interest, including model uncertainty factors. These technical performance improvements are crucial to the optimization of project resources, such as schedule and budget, and ultimately to mission success.


Introduction
A rigorous simulation-based framework for multifidelity uncertainty quantification (UQ) is presented, which informs and guides the design, verification, and validation processes of complex, large-scale, multidisciplinary systems throughout their life cycle. The term multifidelity refers to the use of multiple mathematical models, of varying fidelity (accuracy) and cost, to perform a computational analysis of uncertainties in a system. In such an approach, inexpensive low-fidelity (LF) models are used for computational acceleration, while an expensive high-fidelity (HF) model is used to ensure accuracy of the results and retain the physics of the phenomena the model is trying to represent. This multifidelity approach enables rigorous UQ and provides insightful results from a complex system model faster and with a better understanding of the model inputs and outputs than the traditional way of relying only on an expensive HF model. For large-scale projects or programs, the latter approach is often a cause of schedule and cost overruns, driven by the large amount of time spent designing and analyzing systems with an inefficient model to the detriment of other important activities, such as integration and testing. As a case study, the successful application of this framework to the James Webb Space Telescope (JWST) 1 is described in detail and highlights the benefits that this framework brings in terms of technical performance improvements and optimization of project resources such as schedule and budget.
The development of complex systems relies on mathematical models to explore the trade space in the early phases of the system life cycle, optimize the system design in the more advanced phases, and simulate system performance to verify its ultimate compliance to mission requirements. Because a mathematical model is an imperfect representation of reality, it is necessary to subject the model to a validation process to confirm that it is an adequate representation of the physical system and is capable of predicting the system's behavior accurately with respect to the requirements within the domain of the intended application of the model. 2 This is accomplished through hardware testing and reconciliation of the test results with the model predictions.
A model's ability to perfectly reproduce reality is affected by several factors, including unknown model parameters, inadequacy of the model to represent certain phenomena, the fact that it may not be practical to run the model to observe the output for every configuration of interest, etc. 3 Model results are therefore inherently affected by different sources of uncertainty that need to be understood and included in the analysis cycles to inform and guide the decisionmaking process in the design and management of a project or program. As a result, understanding the uncertainty and the factors that drive it is important to all players in the decision chain, as it affects project resources.
The framework discussed here represents a paradigm shift from the current state of the art and the modus operandi in several government and industrial settings, where model uncertainty factors (MUFs) and/or safety factors, often based solely on engineering judgment, are usually applied downstream of the mathematical models to the worst-case model outputs (Ref. 4, pp. 89-92). There are two drawbacks to this approach. First, as defined in Ref. 4, MUFs are a semiquantitative adjustment, either additive or multiplicative or both, that is made to the results of a simulation-based analysis to account for uncertainty. Here, semiquantitative means based on past experience or engineering expertise rather than data. Second, designing a system for worst-case scenarios is equivalent to developing the system for a sequence of events that are each individually unlikely to occur in the system lifetime and even more unlikely to occur together. As a consequence, applying MUFs to the outputs of worst-case scenario analyses leads to overly conservative predictions and overdesigned systems. While sometimes overdesign may be the more efficient way to solve a design problem, uncertainty in design, especially in a large-scale mission, such as JWST, is expensive.
The significance of the research results presented here is twofold. First, in this multifidelity UQ framework, uncertainty in system models is identified, categorized, modeled, and propagated in an integrated manner throughout the analysis cycles needed to design systems and verify their compliance to key performance requirements. Such an integrated approach naturally leads to an increased knowledge of the system that is being designed and to an in-depth understanding of the uncertainty's impact on system performance, robustness, reliability, and safety. 5 Gaining this knowledge is of paramount importance when considering that 85% of a project's total life cycle cost is locked in by the end of preliminary design. 6 Therefore, this framework becomes a crucial instrument especially if applied early in the project life cycle, when there is larger freedom in the design and decisions can have a stronger impact in subsequent phases.
Second, this framework has the ability to provide methods and tools to robustly and efficiently design complex, large-scale, multidisciplinary systems where several forms of uncertainties are present. This approach thus improves project performance by reducing the analysis time and freeing up resources for activities, such as integration and testing, that can have a major influence on the system's ability to meet the requirements.
This multifidelity UQ framework finds applications in several fields, such as groundwater modeling, 7 ocean dynamics, 8 and remote sensing, 9 and is proving crucial to a number of space missions, 10 as the inclusion of such critical information, needed to support resource utilization decisions and risk mitigation strategies, will maximize the probability of mission success.
The paper is organized as follows. Section 2 provides the background and motivation behind this work. Section 3 describes the UQ framework and Sec. 4 specifically addresses its multifidelity implementation. Section 5 illustrates in more detail how to carry out a global sensitivity analysis (GSA) study to rank model parameters and prioritize research efforts for those parameters with the greatest uncertainty to reduce their impact on the model outputs. In addition, this section describes the first formulation of multifidelity GSA (MFGSA) based on rank statistics, which yields the greatest computational benefit. The James Webb Space Telescope is used throughout as an example of a complex, large-scale, multidisciplinary system in which the described framework was employed to quantify uncertainty by overcoming the challenges imposed by the large computational cost of the models used. Section 6 summarizes the main findings and describes ongoing work and future perspectives.

Background and Motivation
The quality and reliability of a system's model depend on the model's adequacy to describe the system and its real-life behavior. 11 Model parameter values are often known only under ideal conditions or in a few limited laboratory test cases, 12-15 which either do not necessarily apply to the specific system or mission being investigated or lie outside the domain of the intended application of the model. 2 In addition, test settings and boundary conditions, as well as the hardware configuration, lead engineers to introduce effective parameters to account for geometric complexity [e.g., in a multilayer insulation (MLI) blanket] and physical phenomena caused by hardware interactions (e.g., aluminum screws bolted into a titanium rod or the presence of test sensors, instrumentation, etc.), which are difficult to model. The exact values of these effective parameters and their variances are unknown and have to be estimated by the engineers based on their expertise or judgment. Finally, while new complex systems are most impacted by uncertainties, even later system copies suffer from this issue.
Simulation-based design under uncertainty addresses such shortcomings by incorporating uncertainty upstream in the design process and propagating its effects through the model to obtain an output where one can identify the dominant effects contributing to the system's performance or robustness degradation.
Modeling, analyzing, and validating systems that are built to satisfy increasingly stringent requirements become more and more challenging as such systems grow in complexity and scale. 16 This leads to massive, computationally heavy models, characterized by a large number of parameters, easily exceeding the thousands. A multifidelity approach 17 can combine computationally intensive HF models with computationally efficient LF models in a principled way, making simulation-based design under uncertainty possible even for large-scale, multidisciplinary systems. Multifidelity approaches have been used to enable UQ and optimization for a variety of applications, including aircraft design, 18,19 structural analysis and optimization, 20,21 and trajectory simulation for entry, descent, and landing. 22 Lastly, models developed for different systems often reside in separate platforms that require manual transfer of data from one model to another when performing integrated, end-to-end analyses. An example is a thermal-structural-optical analysis where each discipline model is developed in a specific software environment and requires a number of interfaces to pass the outputs of one model as inputs to the next model in the chain of analysis to compute the final figure of merit. In these cases, it is imperative to automate several of the analysis and UQ tasks that involve running models on different platforms a large number of times.
The work presented in this paper shows how to combine all of these aspects in the design process to maximize use of the available computational resources within a prescribed budget.

Uncertainty Quantification
Uncertainty is usually defined as either aleatory or epistemic. Aleatory uncertainty is representative of unknowns that differ each time the same experiment is run and can be quantified through random sampling techniques such as Monte Carlo (MC) or quasi-MC methods. 23 For this reason, it is also called statistical uncertainty and the evaluation of uncertainty by the statistical analysis of series of observations is termed a type A evaluation. 24 Epistemic (systematic) uncertainty is due to things one could know in principle but does not in practice. For example, epistemic uncertainty can arise due to a measurement that is not accurate, a model that neglects certain effects, or some data that have been deliberately hidden. Epistemic uncertainty can be reduced by gaining a better knowledge of the system, process, or mechanism under consideration. The evaluation of uncertainty by means other than the statistical analysis of series of observations is termed a type B evaluation (of uncertainty). 24 Methods commonly used for epistemic uncertainties include probability bounds analysis, 25 fuzzy logic, 26 or evidence theory (i.e., the Dempster-Shafer theory, 27,28 which is a generalization of the Bayesian theory of subjective probability).
The multifidelity approach adopted in the framework presented here can be used to quantify the effects of both aleatory and epistemic uncertainties in the design cycle. For example, a finding that certain aleatory uncertainties contribute very little to the overall output uncertainty would allow these uncertainties to be neglected in future analyses, reducing the cost of these analyses. On the other hand, a finding that certain epistemic uncertainties contribute significantly to the variance of the output would indicate that more resources should be invested in reducing those uncertainties through, for example, further experiments. The analysis results presented in this work, however, focus on aleatory uncertainty, without loss of generality. Figure 1 shows the general scheme adopted to quantify aleatory uncertainty in this framework. The study presented here does not account for model inadequacy, i.e., the uncertainty in the model itself due to missing or wrong physics and any other type of modeling error. The extension to consider model inadequacy based on the works in Refs. 3 and 29 is an area for future work.
In step 1, all possible sources of aleatory uncertainty are identified, such as, for example, material properties, geometry, orbital parameters, etc.
In step 2, these uncertainties are characterized probabilistically: each uncertain variable is assigned a probability distribution (defined by a probability density/mass function in the continuous/discrete cases, respectively). These probability distributions are estimated by engineers based on experimental data, literature reviews, and engineering expertise.
In step 3, the model input parameter uncertainties are propagated through the model itself to compute the model outputs, which will also be stochastic in nature. This is accomplished through MC-type simulations, whereby all the parameters are varied simultaneously, and their values are sampled from their respective probability density functions. Because MC simulations cannot be run for an infinite number of samples to observe the output for every configuration of interest, there will be some residual error that is quantified as sampling error.

Fig. 1 UQ process. The sources of uncertainty are identified in step 1 and then characterized through probability distributions in step 2. By means of a wrapper, the uncertainty is propagated in step 3 through each discipline model to compute the QoI. These are then used to perform uncertainty analysis in step 4. At the end of the process, in step 5, the UQ outputs are available for use.
In step 4, uncertainty analysis can be performed for all quantities of interest (QoIs) based on engineering needs. (Quantities of interest can also represent technical performance metrics. Here, the term "quantities of interest" is used to avoid losing generality and reducing the scope of the methodologies presented in this work.) Some examples of uncertainty analysis are:
• Estimators: One could compute statistical properties such as the expected value (mean) and variance of the distributions of each model output, as well as MUFs.
• Sensitivity analysis: A GSA 30 can provide a list of parameters prioritized by their influence in terms of their contribution to the total variance of a specific QoI. This is important in the context of requirements verification and model validation. For example, cumulative distribution functions can be used to evaluate the probabilities that each QoI meets specific requirements within a specified level of confidence, as predicted by the model. If there is insufficient confidence that a requirement will be met, GSA can provide a list of parameters ranked by their contribution to the total variance on a specific QoI related to that particular requirement. Additional research could therefore be performed to reduce the uncertainty in these parameters and thus decrease their contribution to the model output uncertainty. This step is discussed in more detail in Sec. 5.
• Bayesian model validation: When experimental data are available, Bayes' theorem can be used to update the model parameter prior distributions and thus increase confidence in the model. 5
• Optimization under uncertainty: The results of step 3 can be used to optimize stochastic figures of merit that may be required for the design of a specific system, such as the expectation and variance or a linear combination of the two, etc. 31
Finally, step 5 gathers the outputs of the process in terms of optimized/validated models, QoI sensitivities to model uncertainties, and requirements verification results.
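As a minimal illustration of steps 1 to 4, the following Python sketch propagates two hypothetical uncertain inputs (a conductivity and an effective emissivity, stand-ins for the hundreds of parameters in the JWST models) through a toy algebraic model and computes mean and variance estimators of the resulting QoI. The distributions, coefficients, and model are illustrative assumptions, not JWST values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: identify uncertain inputs and characterize them probabilistically.
# Hypothetical example: a conductivity k ~ Normal and an effective MLI
# emissivity eps ~ Uniform (illustrative distributions and bounds).
N = 10_000
k = rng.normal(loc=1.0, scale=0.05, size=N)
eps = rng.uniform(low=0.03, high=0.05, size=N)

# Step 3: propagate the samples through the model. This stand-in algebraic
# model plays the role of an expensive thermal run mapping inputs to a
# temperature change dT (in kelvin).
def model(k, eps):
    return 10.0 * k + 250.0 * eps

dT = model(k, eps)

# Step 4: uncertainty analysis on the quantity of interest (QoI):
# sample mean and unbiased sample variance of the output distribution.
mean_dT = dT.mean()
var_dT = dT.var(ddof=1)
print(f"E[dT] ~ {mean_dT:.3f} K, Var[dT] ~ {var_dT:.4f} K^2")
```

With these assumed distributions, the estimators converge to a mean near 20 K; in the real workflow the `model` call is replaced by a wrapper around each discipline code.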

Application to JWST
During the course of its mission, JWST repeatedly changes its attitude as it slews from one science target to another. Because a typical observation lasts a few hours and the thermal time constant of JWST is on the order of days, there is insufficient time for JWST to reach thermal equilibrium before the next slew. As the thermal state changes during these maneuvers throughout the mission, the thermally induced deformations in the observatory structures also vary, causing perturbations in the wavefront error (WFE). 32 In the work presented here, the hot-to-cold slew case is used as a benchmark to analyze the WFE of the telescope for an attitude change from +5 deg to −45 deg in pitch and from −5 deg to +5 deg in roll during nominal science operations. This change of state is referred to as HC23 (Fig. 2) and defines the allowed range of boresight pointing angles for the observatory relative to the Sun line, which must remain in the range 85 deg to 135 deg at all times to keep the telescope behind the sunshield. As a result of this hypothetical maneuver, the temperatures of several subsystems change, before stabilizing to their steady-state values. As an example, this work focuses on thermal distortions of the primary mirror (PM) backplane support structure, which are expected to dominate the WFE evolution.
The WFE is derived by first computing the temperature predictions, ΔT, through the JWST observatory thermal model. These predictions serve as input to the JWST observatory thermal distortion model, which provides the final WFE value. The thermal model is run in SINDA/G, 33 a finite-difference-based thermal analysis software. Similar to the approach described in Ref. 34, the nominal ΔT values predicted by the thermal model are multiplied by some MUFs before the mapping performed by the structural analysis team in preparation for input to the thermal distortion model, which is a finite-element model developed in NASTRAN ® with millions of degrees of freedom.
For the thermal model, the subsystems of interest in this work are the telescope's PM, primary mirror backplane assembly (PMBA), backplane support frame (BSF), integrated science instrument module (ISIM) bench, ISIM near-infrared spectrometer (NIRSpec) focal plane array (FPA) radiator, ISIM science instruments (SI) radiators, and aft optics subsystem central baffle, for a total of n = 7 ΔT outputs. The thermal model input parameters identified for this study are materials' conductivities, emissivities, and effective emissivities (e.g., for an MLI blanket), for a total of m = 194 variables.
For the thermal distortion model, the output of interest is the WFE, whereas the input parameters identified for this study are Young's modulus (E), Poisson's ratio (ν), and the coefficient of thermal expansion of different telescope structures, for a total of m = 66 variables.
Based on information gathered in the literature, 12,13,15 experimental data collected over the years to build a JWST material property database, and elicitation from subject matter experts, the parameters' probability density functions were found to be either uniform or normal. Wherever data were absent, uniform distributions were assumed, indicating a complete lack of knowledge regarding the distributions of those particular parameters and assigning equal probability to their values within acceptable bounds. All the uniform distributions used in this study have lower and upper bounds within ±10% and ±5% of the parameters' nominal values for the thermal and thermal distortion models, respectively. The normal distributions have a wider variability depending on the material.
The parametric uncertainties were propagated in the integrated model through MC simulations, whereby many samples were drawn from the parameter distributions to approximate the characteristics of the QoI distributions. The effects of radiative parameters such as materials' emissivities were accounted for after a model order reduction strategy was implemented, due to the high computational cost needed to vary each of these parameters in NASA's Thermal Synthesizer System (TSS) 35 and update the corresponding SINDA/G files. Each single TSS run required about 2 days, and each of the 12 emissivity parameters was varied to its minimum and maximum values, for a total of 24 runs. This approach thus allowed defining an envelope for such parameters and their effects on the QoI distributions, implying that their contribution to the overall uncertainty could not be greater than this envelope.
The QoIs are the mean and variance of the HC23 ΔT values for each of the subsystems mentioned above and the final WFE. In practice, estimators of these QoIs were computed with a finite number of samples, thus providing an approximation of the true mean and variance of the model outputs, as shown in Eq. (1). If the model is represented by the function Y = f(X), where Y = {Y_1, Y_2, ..., Y_n} is a vector with n outputs and X = {X_1, X_2, ..., X_m} is a vector with m input parameters, each with its own probability density function, the mean and variance estimators are, respectively, defined as follows:

\bar{Y} = \frac{1}{N} \sum_{k=1}^{N} f(X_k), \qquad \sigma^2 = \frac{1}{N-1} \sum_{k=1}^{N} \left[ f(X_k) - \bar{Y} \right]^2, \quad (1)

where X_k is the k'th independent realization of the input X and N is the number of realizations (or samples). Here, Y represents either the vector of ΔTs from the thermal model or the WFE from the thermal distortion model. The vector σ^2 is the vector of variances σ^2_j for each of the j outputs.

Fig. 2 The hot-to-cold slew maneuver consists of changing the telescope attitude from 85 deg to the Sun (hot) to 135 deg (cold), corresponding to a change in pitch angle from −5 deg to +45 deg. Because JWST operates in an ecliptic coordinate framework, two small continuous viewing zones can be seen, which are centered at each of the ecliptic poles. Their 5-deg radius is determined by the 85-deg solar exclusion zone, although any observation approaching the 85-deg limit will have additional limitations due to safety considerations.
The MC standard error, e MC , relates the number of samples N to the confidence level P in the results and decreases as 1∕√N. For example, to achieve a 1% error at the 1σ level (P = 68%), N = 2000 samples are necessary. For the thermal model, a number of samples N = 1500 was adopted to achieve e MC ≈ 1.2% at the 1σ level (i.e., 68% confidence level), while keeping the computational cost of running the JWST observatory thermal model within the prescribed budget. The computational budget was limited by a combination of factors, including simulation run time, the number of available SINDA/G licenses, and project schedule requirements. The computation of the samples needed in Eq. (1) was parallelized on two Intel ® Xeon ® CPU E5-2620 v2 @ 2.10-GHz processors and six cores, given the availability of six SINDA/G licenses. Each model run took ∼4.5 h with a starting guess from a previously converged simulation.
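The 1∕√N decay of the MC standard error, which drives the sample counts above, can be checked empirically. In this sketch a standard normal distribution stands in for a model QoI (an illustrative assumption); the standard deviation of the sample-mean estimator is measured across many repetitions for two sample sizes.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_standard_error(N, reps=2000):
    # Standard deviation of the sample-mean estimator across `reps`
    # independent MC experiments of size N each.
    means = rng.standard_normal((reps, N)).mean(axis=1)
    return means.std(ddof=1)

e_500 = mc_standard_error(500)
e_2000 = mc_standard_error(2000)
# Quadrupling the sample count should roughly halve the error (1/sqrt(N)).
print(f"ratio e(500)/e(2000) = {e_500 / e_2000:.2f}")  # ~ 2
```

The same scaling explains why halving the error of an estimate requires four times as many model runs, which is what makes the multifidelity acceleration of Sec. 4 so valuable.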
For the thermal distortion model, a number of samples N = 460 was adopted (e MC ≈ 2% at the 1σ level), with each model run requiring ∼55 min on average. The computation of the samples was performed on NASA's 129,056-core "Discover" supercomputing cluster, an assembly of multiple Linux scalable units built upon commodity components and capable of nearly 6.8 petaflops; the computation was parallelized by simultaneously running up to five simulations, based on the number of available NASTRAN ® license seats. Table 1 shows the mean and variance estimators for all of the outputs.
For each QoI, a MUF is defined as follows:

\mathrm{MUF} = \frac{\bar{Y} + 2\sigma}{Y_{\mathrm{nom}}}, \quad (2)

where Y_nom is the model output Y under nominal conditions, and \bar{Y} and σ are the MC estimators of the average and standard deviation (the square root of the variance), calculated according to Eq. (1) with the same samples previously drawn from the parameters' probability density functions for each QoI. Figure 3 shows the MUFs for each thermal subsystem. The orange squares are associated with MUFs determined through a root-sum-square (RSS) approach as follows:

\mathrm{MUF}_{\mathrm{RSS}} = \frac{\Delta T_{\mathrm{nom}} + \sqrt{\sum_{i=1}^{m} \Delta T_{\mathrm{max},i}^{2}}}{\Delta T_{\mathrm{nom}}}. \quad (3)

Here, each parameter, one at a time, is changed to the minimum and maximum values of its distribution (in the case of a normal distribution, the values at ±6σ were considered as the maximum and minimum, respectively, which extend to nearly the totality of the data), and the largest ΔT deviation resulting from either change, ΔT_max,i, is taken into the RSS. The RSS approach assumes that all the parameters are uncorrelated and does not account for any parameter interaction effects.
In addition, because all the model parameters are pushed to their extreme values, the resulting MUFs can be seen as an envelope of worst-case scenarios. The blue circles, instead, represent the MUFs calculated through Eq. (2) with the probability density functions identified in this study. The nature of the MC approach also allows calculating the standard deviation of each MUF, thus providing a quantitative indication of their confidence or coverage range, which is not possible with an RSS approach. This example shows how easy it can be to overdesign systems when making worst-case scenario assumptions, as done with an RSS approach, and to underestimate available margins. Quantification of uncertainty eliminates these drawbacks by providing results based on data and by avoiding the improper use of project resources, including schedule and cost, caused by overly conservative assumptions and excessive reliance on subjective engineering judgment.
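The contrast between MC-based and RSS-based MUFs can be reproduced on a toy linear model. The sketch below assumes a mean-plus-2σ form for the MC MUF and pushes normally distributed parameters to ±6σ for the RSS, as described above; the coefficients, standard deviations, and the factor of 2 are illustrative choices, not JWST values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in model: dT = c . x with inputs x_i ~ N(1, 0.05^2).
c = np.array([3.0, 2.0, 1.0])
sigma_x = 0.05
x_nom = np.ones(3)
dT_nom = x_nom @ c  # nominal output

# MC MUF: (mean + 2*sigma) of the sampled output, over the nominal output.
X = rng.normal(1.0, sigma_x, size=(5000, 3))
dT = X @ c
muf_mc = (dT.mean() + 2.0 * dT.std(ddof=1)) / dT_nom

# RSS MUF: push each input, one at a time, to +/-6 sigma (as done for normal
# parameters in the text), take the larger |dT - dT_nom| into a root-sum-square.
deltas = []
for i in range(3):
    lo, hi = x_nom.copy(), x_nom.copy()
    lo[i] -= 6 * sigma_x
    hi[i] += 6 * sigma_x
    deltas.append(max(abs(lo @ c - dT_nom), abs(hi @ c - dT_nom)))
muf_rss = (dT_nom + np.sqrt(np.sum(np.square(deltas)))) / dT_nom

print(f"MC MUF = {muf_mc:.3f}, RSS MUF = {muf_rss:.3f}")
```

Even in this small example the RSS MUF exceeds the MC MUF, illustrating the worst-case conservatism discussed above; with hundreds of parameters the gap grows further.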

Multifidelity Uncertainty Quantification
In steps 3 and 4 of the UQ process in Fig. 1, multifidelity methods can be used to accelerate the computation while retaining accuracy guarantees: for a fixed computational budget, the multifidelity estimator has lower variance than the HF estimator (more certainty in the multifidelity estimator); for a fixed estimator quality, the multifidelity estimator is cheaper to compute than the HF estimator.
Denote by f 1 the HF model, with cost (usually given by model run time) w 1 . Denote by f 2 ; f 3 ; : : : ; f l a set of l − 1 LF models, with costs w 2 ; w 3 ; : : : ; w l , respectively. LF models map the same inputs to the same quantity of interest as the HF model but sacrifice some amount of accuracy to gain speed. Some examples of LF models include projection-based reduced models, 37-40 data-fit interpolation and regression models, 41,42 including machine-learning-based methods, such as support vector machines, 43,44 and other simplified models 45,46 that rely on early-stopping criteria, coarse-grid approximations, or simplified physics assumptions. The multifidelity estimator for the QoI mean is given as follows for two models (the HF model f 1 and one LF model f 2 ):

\bar{Y}^{\mathrm{MF}} = \bar{Y}_1^{N_1} + \alpha_2 \left( \bar{Y}_2^{N_2} - \bar{Y}_2^{N_1} \right), \quad (4)

where \bar{Y}_i^{N} denotes the mean of model f i estimated with N samples, α 2 is a scaling constant 36 that stretches f 2 to the same scale as f 1 , N 1 is the number of samples drawn from the expensive model f 1 , and N 2 is the number of samples drawn from the cheaper LF model f 2 , with N 2 > N 1 . By rearranging Eq. (4) as follows:

\bar{Y}^{\mathrm{MF}} = \alpha_2 \bar{Y}_2^{N_2} + \left( \bar{Y}_1^{N_1} - \alpha_2 \bar{Y}_2^{N_1} \right), \quad (5)

it can be seen that the multifidelity estimator can be viewed as a high-sample estimate of the LF mean, plus an additive correction based on a few samples of the HF model. While here we include the formula for only two models (the HF model and one LF model), the extension to multiple LF models is straightforward, involving adding a term \alpha_l ( \bar{Y}_l^{N_l} - \bar{Y}_l^{N_{l-1}} ) for each additional LF model l = 3, 4, etc. The multifidelity estimator is unbiased, i.e., the expected value of the multifidelity mean estimator \bar{Y}^{\mathrm{MF}} in Eq. (4) is equal to the expected value of the HF mean estimator \bar{Y} in Eq. (1), E[\bar{Y}^{\mathrm{MF}}] = E[\bar{Y}]. 36 Remarkably, this unbiasedness property holds even when the LF model(s) is (are) biased, due to the terms in parentheses in Eq. (4) canceling when the expectation of the whole equation is taken.
The multifidelity estimator is also more certain than the HF estimator with the same cost, i.e., the variance of the multifidelity mean estimator is less than the variance of the HF mean estimator, Var½Y MF < Var½Y, when certain conditions are met: 36 roughly speaking, these conditions require that the loss in fidelity of the LF model be compensated for by a sufficient reduction in computational cost.
The work in Ref. 36 provides a model selection algorithm that ensures that the set of models used in the multifidelity estimate satisfy the conditions for the multifidelity estimator to outperform the HF estimator. This allows selection from a rich diversity of models of varying approximation quality and cost, based on different aspects of the system physics. Reference 36 also provides an algorithm to optimally allocate a fixed computational budget (usually given in CPU time) among the different models, that is, the algorithm outputs the number of samples N 1 ; N 2 , etc. that should be given to each model, which yield the multifidelity estimator with lowest variance. These algorithms are used in the subsequent numerical results for model selection and the distribution of samples among the different models available.
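A minimal sketch of the two-model MFMC mean estimator of Eq. (4) is given below, using the optimal control-variate weight α 2 = ρσ 1 ∕σ 2 and the cost-optimal sample ratio from Ref. 36, both estimated from a small pilot run. The HF/LF model pair, per-sample costs, and budget are hypothetical stand-ins, not the JWST models.

```python
import numpy as np

rng = np.random.default_rng(3)

def f_hi(x):  # hypothetical "high-fidelity" model
    return np.sin(x) + 0.1 * x**2

def f_lo(x):  # cheap, biased, but well-correlated low-fidelity surrogate
    return x + 0.1 * x**2 - 0.2

w1, w2, budget = 1.0, 0.01, 200.0  # illustrative per-sample costs and budget

# Pilot run: estimate the correlation and standard deviations needed for the
# optimal weight alpha and the optimal sample allocation (Ref. 36).
xp = rng.normal(size=200)
rho = np.corrcoef(f_hi(xp), f_lo(xp))[0, 1]
s1, s2 = f_hi(xp).std(ddof=1), f_lo(xp).std(ddof=1)
alpha = rho * s1 / s2                              # optimal control-variate weight
r = np.sqrt(w1 * rho**2 / (w2 * (1.0 - rho**2)))   # optimal N2/N1 ratio
N1 = int(budget / (w1 + r * w2))
N2 = int(r * N1)

# Eq. (4): HF mean on the first N1 shared samples, corrected by the
# difference of LF means over N2 and N1 samples.
x = rng.normal(size=N2)
y_mf = (f_hi(x[:N1]).mean()
        + alpha * (f_lo(x).mean() - f_lo(x[:N1]).mean()))
print(f"rho = {rho:.3f}, N1 = {N1}, N2 = {N2}, MFMC mean = {y_mf:.3f}")
```

Note how the allocation sends many samples to the cheap model and only a few to the expensive one, mirroring the sample sharing shown in Fig. 4 for the JWST thermal model.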
The MFMC approach described above for estimating the QoI mean can be extended to estimate QoI variances and/or other statistics by replacing each of the mean estimators in Eq. (4) with the appropriate estimator for the variance or other statistic of interest. 47 This approach is used in Sec. 5.1 for multifidelity estimation of sensitivity indices. Table 2 shows the LF models created from the JWST HF observatory thermal model, f 1 , how well they correlate with it, and their run time.

Application to JWST
Seven different LF models were established as follows. Model f 2 was created by setting the convergence criteria of the HF model to less stringent values to decrease the total run time from 4.5 h to 45 min, while agreeing with the HF model predictions within 5%. Models f 3 , f 4 , and f 5 were developed by interpolating between sampled points of f 1 with first-, second-, and third-degree polynomials, respectively. Three other models were derived with UQLab, 48 an open-source MATLAB toolbox for UQ. Model f 6 is a canonical low-rank approximation (LRA), 49-51 which UQLab identifies by performing a threefold cross-validation with 1500 training points and 1500 validation points from the HF model to select a model with rank 3 and Legendre polynomial bases of degree 2. Model f 7 is a Kriging approximation with a constant trend and an anisotropic ellipsoidal Matérn 5∕2 correlation function. 52-54 The hyperparameters of the model were determined through a maximum likelihood estimation method, which employed a hybrid genetic optimization algorithm 55 and 1200 training points. Model f 8 is a regression-based model with support vector machines (SVR), 43,44 which uses a linear ε-insensitive loss function with an isotropic Matérn 5∕2 kernel. The hyperparameters of the model were determined through a hybrid covariance matrix adaptation-evolution strategy (HCMAES) optimization algorithm 56 to minimize the span estimate of the leave-one-out (LOO) error with 300 training points.
For the JWST observatory thermal model, the MFMC model selection algorithm 36 selected three of the LF models shown in Table 2, automatically eliminating those that do not meet the constraint on the ratios of the costs and correlation coefficients described earlier in Sec. 4 and Ref. 36. The share of samples allocated to each model is shown in Fig. 4. Most of the samples are allocated to f 3 , which is the fastest among all models, despite its low correlation with f 1 . This is compensated by f 6 and f 7 , which are characterized by a slightly longer run time but higher correlation. f 1 is kept in the loop enough to increase the accuracy of the results without worsening the computational burden.

Fig. 4 Sample sharing. More than half of the samples are allocated to f 3 , which is the fastest among the models used, despite its low correlation with f 1 . Its lower accuracy is compensated by f 6 and f 7 , which have a slightly longer run time but higher correlation with f 1 . f 1 is kept in the loop enough to increase the accuracy of the results without worsening the computational burden.
In Fig. 5, the root mean square error (RMSE) of the MFMC approach is compared to that of a regular MC UQ analysis to show how, in general, for a given RMSE, multifidelity estimates converge in less time and, for a given computational budget, achieve a lower RMSE.

Multifidelity Global Sensitivity Analysis
GSA seeks to quantify the sensitivity of a model's QoI to its individual inputs. In particular, variance-based GSA 57-59 divides the total variance of the model QoI into percentages that can be attributed to the i'th input alone, or to the interaction between the i'th and j'th inputs, etc. 60 Two types of variance-based sensitivity indices are of particular interest to engineers: the Sobol' main index, S_{m,i}, and the Sobol' total index, S_{t,i}. 57 Both Sobol' indices lie in the interval [0, 1]. The main index measures the fraction of the total QoI variance that can be attributed to the i'th input alone, whereas the total index measures the fraction of the total QoI variance that can be attributed to the i'th input as well as the interactions of the i'th input with other inputs. Inputs with larger sensitivity indices contribute more to QoI uncertainty. Computing Sobol' indices for all inputs thus allows inputs to be ranked in terms of importance, enabling engineers to drop unimportant inputs for simplified analyses or to prioritize further characterization of important inputs to reduce the overall QoI uncertainty.
Let D = Var(Y) denote the total variance of a model output. For vector outputs, the GSA procedure is simply repeated for each element of the vector output. The Sobol' main effect sensitivity index S_{m,i} of the i'th input parameter is defined as follows:

S_{m,i} = D_i / D = Var(E[Y | X_i]) / Var(Y), i = 1, …, m. (6)

The numerator in Eq. (6) is the variance over the distribution of X_i of the conditional mean of the output Y conditioned on the input X_i. S_{m,i} measures the contribution of input X_i to the output variance, without accounting for any interactions with other input parameters. For some intuition, note that if f(X) = f(X_i), i.e., if X_i is the only input that contributes to the output, then E[Y | X_i] = f(X_i) = f(X), so in this case D_i = D and the Sobol' main index would be S_{m,i} = 1, indicating 100% of the output variance is attributable to input X_i. On the other hand, if X_i does not have any influence on Y, then E[Y | X_i] = E[Y], so D_i = Var(E[Y]) = 0. The Sobol' total effect sensitivity index of factor X_i is defined as follows:

S_{t,i} = 1 − Var(E[Y | X_1, …, X_{i−1}, X_{i+1}, …, X_m]) / Var(Y), i = 1, …, m. (7)

The numerator in Eq. (7) is the variance of the conditional mean, where the conditioning is on all inputs except the i'th input. S_{t,i} measures the total contribution of input X_i to the output variance, accounting for the influence of X_i alone as well as its interactions with the other input parameters. Sobol' indices are usually computed via MC sampling methods using what are known as "fixing methods" or the "pick-freeze" approach. These terms refer to the way that the conditional expectations in Eqs. (6) and (7) are estimated.
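As a small worked example (not taken from the JWST model), consider the product model Y = X_1 X_2 with independent X_1, X_2 ~ U(0, 1):

```latex
% Main effect of X_1:
\mathbb{E}[Y \mid X_1] = X_1/2
\;\Rightarrow\;
\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_1]\big) = \tfrac{1}{4}\cdot\tfrac{1}{12} = \tfrac{1}{48},
\qquad
\operatorname{Var}(Y) = \tfrac{1}{9} - \tfrac{1}{16} = \tfrac{7}{144},
\qquad
S_{m,1} = \frac{1/48}{7/144} = \tfrac{3}{7}.
% Total effect of X_1:
S_{t,1} = 1 - \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_2]\big)}{\operatorname{Var}(Y)}
        = 1 - \tfrac{3}{7} = \tfrac{4}{7}.
```

The gap S_{t,1} − S_{m,1} = 1/7 is exactly the variance fraction carried by the X_1 X_2 interaction; by symmetry the same indices hold for X_2, so the main effects and the interaction sum to 3/7 + 3/7 + 1/7 = 1.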
Such methods generate pairs of input samples by first generating one set of N input samples; then, for each input sample, the input component(s) on which the expectation is conditioned are "fixed" or "frozen" and all other input components are resampled to generate a second, associated sample. The products of the outputs associated with these pairs of inputs are used to estimate the variances of the conditional means in Eqs. (6) and (7). Several different pick-freeze estimators for these indices have been proposed in the literature. 58 The MFMC approach described in Sec. 4 is used in Ref. 47 to accelerate the estimation of Sobol' sensitivity indices by replacing the mean estimators in Eq. (4) with the pick-freeze estimators of Sobol' sensitivity indices. This MFGSA approach is fundamentally different from some other approaches in the literature that also use the term "multifidelity," 64,65 in which Sobol' indices are computed for an LF model whose accuracy is increased using HF model evaluations. In such approaches, the resulting estimators are biased, whereas the MFGSA approach of Ref. 47 yields conditional expectation estimators that are unbiased with respect to the HF model, regardless of any bias introduced by the LF models. The next section describes how this MFGSA approach is used for a GSA of the JWST observatory thermal model.
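The pick-freeze construction can be sketched numerically as follows (a minimal illustration on a toy model with known indices, not the JWST model; the Janon-type normalization used here is one of several published variants):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy additive model with known Sobol' indices:
# Y = X1 + 2*X2, X1, X2 ~ U(0,1) independent.
# Var(Y) = 1/12 + 4/12, so S_m1 = 1/5 and S_m2 = 4/5.
def model(x):
    return x[:, 0] + 2.0 * x[:, 1]

N, m = 20000, 2
A = rng.uniform(size=(N, m))   # base sample
B = rng.uniform(size=(N, m))   # independent resampling pool
y_A = model(A)

S_main = np.empty(m)
for i in range(m):
    # "Pick-freeze": keep (freeze) column i from A, resample the rest from B.
    C = B.copy()
    C[:, i] = A[:, i]
    y_C = model(C)
    # Janon-type estimator of Var(E[Y|X_i]) / Var(Y) from the output pairs.
    mu = 0.5 * (y_A.mean() + y_C.mean())
    S_main[i] = (np.mean(y_A * y_C) - mu**2) / \
                (np.mean(0.5 * (y_A**2 + y_C**2)) - mu**2)
```

Note the cost: one base set of N runs plus one resampled set per input, i.e., (m + 1) × N model evaluations in total, which is the scaling that the rank statistics-based approach of Sec. 5.2 removes.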

Application to JWST
A GSA previously performed on the JWST observatory thermal model is described in Ref. 66. There, results were shown for the Sobol' main effect sensitivity indices of m = 5 parameters (out of 194) computed through regular MC integration. 67 To achieve an MC standard error e_MC ≈ 1% at the 1σ level, N = 2000 samples were required, for a total of (m + 1) × N = 12,000 simulations. Using only LF model f 2 , which agrees with HF model f 1 within 5% in terms of nominal values and runs in about 45 min (as opposed to the 4.5 h required by f 1 ), and parallelizing the simulations on 6 cores, 12 days of computational time per Sobol' index were needed, for a total of ∼60 days. While this analysis showed which of the five parameters of interest needed to be addressed by further research, it also showed the limiting factors of an MC-based GSA in terms of quality of results and run time when dealing with expensive models.
The MFGSA approach of Ref. 47 overcomes these limitations by leveraging the smaller cost of LF models that are kept in the loop with the HF model for accuracy. An MFGSA was performed on the JWST HF observatory thermal distortion model, g 1 , which takes the temperature values output by the thermal model discussed in Sec. 3.1 to perform a thermal distortion analysis, with the WFE being the final QoI.
Four different LF models were created as follows. A linearized model g 2 was built by evaluating the first derivatives of the model output with respect to each input and expanding the model output in a Taylor series truncated at first order. This is analogous to performing a local sensitivity analysis. Three other models were derived with UQLab. Model g 3 is a canonical LRA, which UQLab identifies by performing a threefold cross-validation with 100 training points and 360 validation points from g 1 to select a model with rank 1 and Hermite and Legendre polynomial bases of degree 1. Model g 4 is a Kriging approximation with linear trend and anisotropic ellipsoidal Matérn 5∕2 correlation function. The hyperparameters of the model were determined through a covariance matrix adaptation-evolution strategy optimization algorithm 56 with a 70-fold cross-validation. Model g 5 is a regression-based SVR model that uses a linear ε-insensitive loss function with an anisotropic ellipsoidal Gaussian kernel. The hyperparameters of the model were determined through a Broyden-Fletcher-Goldfarb-Shanno optimization algorithm [68][69][70][71] to minimize the span estimate of the LOO error. Table 3 shows the LF models created and selected by the MFGSA algorithm, their correlation to the HF model, their run time, and the number of samples allocated to each model as determined by the algorithm with a budget p = 1500. For the HF MC estimators, only 12 HF samples are affordable with this computational budget. For the MFGSA estimators, the leftover budget (not enough for another HF sample) is used to generate 3422 samples from LF model g 2 and 96,661 samples from LF model g 3 . For this particular combination of budget, model costs, and model fidelities, the model allocation algorithm uses the leftover budget after 12 HF evaluations for the very low-cost LF evaluations, so both the multifidelity and single-fidelity MC approaches use the same number of HF evaluations.
However, this is not always the case: in other circumstances it is common to trade off a few HF evaluations in exchange for more LF evaluations.
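The construction of a linearized surrogate like g 2 can be sketched as follows (a generic finite-difference illustration; the function g1 below is a hypothetical stand-in, since the actual model is a NASTRAN thermal distortion analysis):

```python
import numpy as np

# Hypothetical expensive model (stand-in for an HF model such as g1).
def g1(x):
    return np.sin(x[0]) + x[1]**2 + 0.5 * x[0] * x[1]

x0 = np.array([0.3, 0.7])        # nominal input point
y0 = g1(x0)

# First-order Taylor surrogate: y ≈ y0 + J·(x − x0), with the gradient J
# estimated by forward finite differences (one extra run per input).
eps = 1e-6
J = np.array([(g1(x0 + eps * e) - y0) / eps for e in np.eye(len(x0))])

def g2(x):
    return y0 + J @ (x - x0)

# Near the nominal point the linear surrogate tracks the HF model closely,
# which is why building it is analogous to a local sensitivity analysis.
x = x0 + np.array([0.01, -0.02])
```

Its accuracy degrades quadratically with distance from x0, which is what limits such a surrogate to the role of a cheap LF model rather than a replacement for g 1.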
In Fig. 6, the mean and variance estimates of the WFE are shown for three different cases, highlighting the benefits of using a multifidelity approach. In blue is a histogram of the mean and variance as determined by an MC simulation with only the HF model with the prescribed budget of p = 1500. The histogram in orange represents the multifidelity estimates for the same budget, indicating a reduction in the spread of both estimators. The histogram in yellow shows an additional improvement when the budget is increased to p = 3000, with 24, 6230, and 194,733 samples for the same three models used above.

Fig. 6 Comparison of (a) the WFE mean and (b) variance estimators for UQ performed with only the HF model (blue), and the HF model and two LF models (orange) for the same budget (p = 1500). In yellow are the same multifidelity estimates with an increased budget (p = 3000).
The Sobol' main and total effect sensitivity indices for p = 1500 are shown in Fig. 7. Both sets of indices point to parameters #13 and #21 as the ones carrying about 1∕3 of the variance contribution to the output, followed by #7 and #5 with about 1∕10. The main effect indices appear to have some residual variability, which would be reduced with a higher number of simulations to improve convergence; such residual variability is much less noticeable in the total indices. Other parameters that appear to contribute to the WFE variance are #20 and #42, at a level below 5%.
In terms of overall performance, MFGSA brought several improvements. First, it enabled computing both the main and total effect sensitivity indices for all 66 parameters of interest within the prescribed budget, without having to leave any parameters out, as was the case for the JWST observatory thermal model discussed in Sec. 3.1. Second, accuracy guarantees on the final results were retained through the HF model, thus removing the additional source of uncertainty that was introduced in the JWST observatory thermal model when performing GSA with only one LF model.
The main drawback was the difficulty of drawing HF samples from the NASTRAN ® model, which could only be run on a supercomputer without a direct interface with the MFGSA software in MATLAB. This part of the process could not be automated and required saving the HF samples offline. Based on the values shown in Table 3, the minimum number of HF simulations needed was (m + 2) × N = (66 + 2) × 12 = 816. Because the NASTRAN ® model could only simulate one condition at a time, the HC23 change of state of the telescope (see Sec. 3.1) required two simulations per sample, bringing the total number to 1632 and an equivalent computational time of about 62 days. This number is close to the 60 days needed to perform the GSA described in Sec. 3.1 for only five parameters, which represents a major improvement. Nonetheless, collecting samples from an expensive HF model remains the main hurdle to making even a tool such as MFGSA practical, especially when this type of information is needed over shorter timescales.

Rank Statistics-Based MFGSA
This section describes a powerful new multifidelity estimator for global sensitivity indices that unites the multifidelity framework with a recently proposed Sobol' main effect index estimator based on rank statistics. The new approach requires O(N) model evaluations and is more efficient than the pick-freeze approach described in Sec. 5.1, which requires O(mN) evaluations. While pick-freeze methods have long been the state of the art, they have two major disadvantages. First, they rely on a particular experimental design (fixing some input components and resampling others) that may be unavailable in practice. Second, the number of model calls needed to estimate all first-order Sobol' indices grows linearly with the number of input parameters m, as discussed in previous sections. These two limitations were recently addressed and eliminated by the work of Gamboa et al., 72 which embeds Chatterjee's method 73 in the GSA framework, resulting in a dramatic reduction of the number of samples needed to estimate Sobol' indices of any order. Chatterjee's method defines a coefficient of correlation between variables based on rank statistics. In statistics, ranking is the data transformation in which numerical or ordinal values are replaced by their rank when the data are sorted. For example, the observed numerical data 5.3, 1.8, 9.2, 6.4 would be ranked in ascending order as 2, 1, 4, 3, respectively. Despite being simple in nature, like other classical coefficients (e.g., Pearson's, Spearman's, or Kendall's correlation), Chatterjee's coefficient is able to estimate the degree of dependence between the variables of interest while retaining a simple asymptotic theory under the assumption of independence. Gamboa et al. generalize Chatterjee's method, requiring only one set of N samples to compute Sobol' indices for all m inputs.
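The ranking transformation in the example above can be computed in a few lines (an illustrative sketch):

```python
import numpy as np

x = np.array([5.3, 1.8, 9.2, 6.4])

# ranks[k] = position of x[k] when the data are sorted ascending (1-based).
order = np.argsort(x)                     # indices that sort x: [1, 0, 3, 2]
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(x) + 1)   # invert the sorting permutation

print(ranks)  # [2 1 4 3], matching the example in the text
```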
In this framework, called rank statistics-based GSA (RSB GSA), the Sobol' main effect sensitivity index of variable X_i for output Y is estimated as follows:

S^G_{m,i} = [ (1/N) Σ_{k=1}^N Y_k Y_{R_i(k)} − Ȳ² ] / Var(Y), i = 1, …, m, (8)

where Ȳ is the sample mean of Y, Var(Y) is the sample variance of Y, and R_i(k) defines a reordering (permutation) of the output Y samples based on sorting the i'th input components X_i in ascending order, as follows: let ϕ_i(k) denote the rank of X_{i,k} in the sample {X_{i,1}, X_{i,2}, …, X_{i,N}}. That is, κ = ϕ_i(k) denotes the place of X_{i,k} in the set after rearranging the inputs {X_{i,k′}}_{k′=1}^N in ascending order. Let k = ϕ_i^{−1}(κ) denote the inverse of this operation, i.e., ϕ_i^{−1}(κ) gives the original index of the κ'th smallest X_i in the original ordering of the set. Then, R_i(k) is defined as follows:

R_i(k) = ϕ_i^{−1}(ϕ_i(k) + 1) if ϕ_i(k) < N, and R_i(k) = ϕ_i^{−1}(1) if ϕ_i(k) = N, (9)

i.e., each sample is paired with the sample whose X_i rank is one higher, with the largest wrapping around to the smallest. From Eq. (8) it can be seen that a total of N model runs is required to compute all Sobol' main effect sensitivity indices, since the same Y samples are reordered according to Eq. (9) for every parameter X_i. The number of runs, therefore, no longer depends on the number of parameters, m. This represents a considerable reduction from the GSA/MFGSA budgets, which grow linearly with m. Equation (8), however, usually requires N to be very large (of the order of 10^4 to 10^5) to achieve convergence. 72 Reducing the number of runs to N may therefore still not be sufficient to make this approach useful when dealing with expensive models. Algorithm 1 combines the RSB global sensitivity estimators with the MFGSA approach of Ref. 47 in a new approach that enables GSA of multidisciplinary, large-scale systems at even further reduced cost. This new approach is called RSB MFGSA. The results from Algorithm 1 are shown in Fig. 8.
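A minimal sketch of the rank statistics-based estimator of Eq. (8) on a toy model with known indices follows (illustrative only; this is the single-fidelity version, without the multifidelity corrections of Algorithm 1):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: Y = X1 + 2*X2, X1, X2 ~ U(0,1) independent,
# with analytic main effect indices S_m1 = 0.2 and S_m2 = 0.8.
N, m = 50000, 2
X = rng.uniform(size=(N, m))
Y = X[:, 0] + 2.0 * X[:, 1]

S_rsb = np.empty(m)
for i in range(m):
    order = np.argsort(X[:, i])  # order[j] = index of the (j+1)'th smallest X_i
    # Pair each sample with its "rank neighbor": the sample whose X_i rank is
    # one higher, wrapping the largest back to the smallest.
    neighbor = np.empty(N, dtype=int)
    neighbor[order[:-1]] = order[1:]
    neighbor[order[-1]] = order[0]
    # Eq. (8): same N outputs reused, only the pairing changes per input.
    S_rsb[i] = (np.mean(Y * Y[neighbor]) - Y.mean()**2) / Y.var()
```

Note that the single set of N outputs is reused for every input; only the rank-based pairing changes, which is the source of the O(N) cost.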
In gray are the indices obtained with an RSB GSA performed with HF model g 1 and all the 90 available samples, which were not enough to achieve convergence. In black are the indices obtained with the RSB MFGSA, which are similar to those in Fig. 7(a) but were obtained with a remarkably lower effort. With a budget p = 47, the number of samples (shown at the top right-hand side of Fig. 8) required by the RSB MFGSA approach is similar to the number of samples required by MFGSA with a budget p = 3000 (see Sec. 5.1 and Fig. 6). Table 4 summarizes all the computational methodologies developed over the years to evaluate Sobol' indices and a comparison of the required computational budgets for the JWST case study. This comparison is based on the observation that the RSB MFGSA approach of Fig. 8 with a budget of p = 47 leads to index estimates that are qualitatively similar to the estimates of the MFGSA of Fig. 7 with a budget of p = 1500. The MFGSA approach is therefore approximately 42× more expensive than the RSB MFGSA approach.

Algorithm 1 (RSB MFGSA) proceeds, for each input variable, by sorting the input samples to obtain the ranks ϕ_i(k), computing the permutation R_i(k) from these ranks, retrieving the outputs in the new order, Y_{R_i(k)}, and initializing S^G_m as in Eq. (8); then, for each LF model l, the correction is added to the existing estimate: S^G_m ← S^G_m + α_l ΔS^G_m.

To estimate the budget required for the HF-only estimators to converge at a similar rate, we note that the variances of the HF-only variance estimators are approximately 4× larger than the variances of the multifidelity variance estimators for both the GSA and the RSB GSA approaches, respectively. Since MC variance scales inversely with the number of samples, we estimate that the HF-only versions of the estimators will require a 4× larger budget to achieve similar levels of convergence to the multifidelity estimates. Note that these budgets are for the JWST case study considered in this work, and that the specific budgets required for GSA will depend on the problem at hand, the method used, and the desired level of convergence. However, the comparison in Table 4 demonstrates that for this complex multidisciplinary system with 66 input parameters, the required budget decreases due to the introduction of both multifidelity and RSB approaches, making the combination of the two a powerful tool for the computation of Sobol' indices of expensive models with a large number of parameters. Note also that these budgets are what is required for a single set of Sobol' index estimates: in practical settings, the user may wish to repeat the estimates multiple times to understand the randomness in the estimates.

Conclusions
A framework to quantify the effects of parametric uncertainty in mathematical models that are used to design complex, large-scale, multidisciplinary systems was presented, along with simulation procedures that address the difficulties of dealing with computationally expensive models and a number of parameters exceeding 60. Multifidelity techniques have been shown to reduce the execution time from months to days by leveraging different types of LF models to accelerate the UQ tasks (which require expensive MC simulations), while keeping the HF model in the loop to maintain accuracy guarantees on the final result.
This framework was successfully applied to the James Webb Space Telescope's observatory integrated model, whose thermal module outputs temperature values that become the input to the structural module, which in turn provides the WFE of the primary mirror (PM) caused by thermal distortions. The uncertainties of both modules' input parameters were identified, characterized in terms of probability density functions, and propagated through the model. Uncertainty analysis was then performed by computing mean and variance estimators and model uncertainty factors (MUFs) for seven telescope subsystems. Global sensitivity analyses were carried out to identify the parameters that could cause the greatest reduction in the model output variance if targeted with experiments aimed at reducing their variance.
The computational cost of all these tasks, which required exploring the design space with a number of samples sufficient to achieve an MC standard error ≤1% at the 1σ level, was shown to be reduced by adopting multifidelity techniques. These were applied to the two abovementioned models, which were characterized by 194 and 66 input parameters and an average run time of ∼4.5 and 1 h, respectively. When coupled with rank statistics, multifidelity techniques made global sensitivity analyses efficient and practical to use in the context of designing multidisciplinary, large-scale systems by reducing the number of runs of the HF model by orders of magnitude, from a few thousand to 10 to 25. This reduced the time required for such an analysis from over 2 months to only 2 days. In addition, for a given computational budget, multifidelity techniques enabled achieving lower errors on the final estimates. The approach described in this work allowed identifying the main drivers for subsequent research activities aimed at reducing their variance and better understanding their impact on system performance. More generally, this approach has demonstrated the ability to guide and inform the decision-making process with an increased knowledge of the system under consideration. This is most crucial in the early phases of a project life cycle, before most of the cost is locked in. However, it is applicable throughout the life cycle, with benefits spanning a more effective use of computational resources, a faster way of obtaining critical knowledge of the system's main drivers, the ability to keep the project on schedule and its budget on track, and the ability to invest more in activities such as integration and testing that can uncover problems before launching the system into its operational environment.

Table 4 Review of Sobol' indices computation methodologies (method, authors, and budget in CPU hours) and comparison of computational budgets required to obtain Sobol' index estimates of similar quality to the multifidelity estimates in Figs. 7 and 8. Note that the computational budget is given in CPU hours. The analyses may take less "real-world" time if, for example, some model samples can be run in parallel, or more "real-world" time if, for example, the user must wait in a queue for a software license to become available.
Future work will attempt to include model inadequacy as a way to quantify the uncertainty of the model itself and to enable a Bayesian-based model validation that can account for the impacts of model inadequacy on the system's response.