Applying Gaussian Process Machine Learning and Modern Probabilistic Programming to Satellite Data to Infer CO₂ Emissions
Creators
Abstract
Satellite data provides essential insights into the spatiotemporal distribution of CO₂ concentrations. However, many atmospheric inverse models fail to adequately incorporate the spatial and temporal correlations inherent in satellite observations and often lack rigorous methods for estimating parameters like spatial length scales. We introduce an inference model that processes the spatiotemporal covariance in satellite data and estimates hyperparameters such as covariance length scales. Our approach uses Gaussian process (GP) machine learning (ML) and modern probabilistic programming languages (PPLs) to perform atmospheric inversions of emissions from satellite data. We develop a GP ML inversion system based on modern PPLs and the GEOS-Chem chemical transport model, simulating atmospheric CO₂ concentrations corresponding to the Orbiting Carbon Observatory-2/3 (OCO-2/3) data for July 2020. In our supervised learning framework, we treat the GEOS-Chem simulated data set as the target, with predictors derived by scaling the target with sector-specific factors hidden from the GP machine. Our results show that the GP model, combined with GPU-enabled PPLs, effectively retrieves true emission scaling factors and infers noise levels concealed within the data. This suggests that our method could be applied over larger areas with more complex covariance structures, enabling comprehensive analysis of the spatiotemporal patterns observed in OCO-2/3 and similar satellite data sets.
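To make the idea of estimating a scaling factor and a covariance length scale by maximizing a GP marginal log likelihood (MLL) more concrete, the following is a minimal, illustrative sketch and not the authors' GPyTorch or PPL implementation. It assumes a single hypothetical sector, 1-D observation locations, a squared-exponential kernel with fixed signal and noise variances, and a plain grid search in place of gradient-based optimization. All names (`rbf_kernel`, `marginal_log_likelihood`, `TRUE_SCALE`, the grids) and the synthetic data are assumptions introduced for illustration.

```python
import math

def rbf_kernel(xs, length_scale, variance):
    """Squared-exponential covariance matrix for 1-D locations xs."""
    return [[variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)
             for b in xs] for a in xs]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def forward_solve(low, b):
    """Solve low @ z = b for z by forward substitution."""
    z = []
    for i, b_i in enumerate(b):
        z.append((b_i - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z

def marginal_log_likelihood(xs, residual, length_scale, variance, noise_var):
    """log N(residual | 0, K + noise_var * I) for a zero-mean GP."""
    n = len(xs)
    cov = rbf_kernel(xs, length_scale, variance)
    for i in range(n):
        cov[i][i] += noise_var
    low = cholesky(cov)
    z = forward_solve(low, residual)  # r^T C^{-1} r equals z^T z
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

# Synthetic setup (assumed): a smooth "prior" signal scaled by a hidden factor,
# plus a small deterministic perturbation standing in for noise.
xs = [0.5 * i for i in range(25)]
prior = [math.exp(-0.5 * ((x - 6.0) / 2.0) ** 2) for x in xs]
TRUE_SCALE = 1.3
obs = [TRUE_SCALE * p + 0.02 * math.sin(3.0 * x) for p, x in zip(prior, xs)]

# Candidate scaling factors and length scales (hypothetical grids).
scale_grid = [round(0.7 + 0.1 * i, 1) for i in range(13)]
length_grid = [0.5, 1.0, 2.0, 4.0]

# Pick the (scale, length scale) pair with the highest MLL of the residual
# obs - scale * prior; variance and noise_var are held fixed for simplicity.
best_mll, best_scale, best_length = max(
    (marginal_log_likelihood(xs, [o - s * p for o, p in zip(obs, prior)],
                             ell, 0.01, 4e-4), s, ell)
    for s in scale_grid for ell in length_grid
)
print(f"scale={best_scale}, length_scale={best_length}, mll={best_mll:.2f}")
```

The grid search is used here only to keep the sketch dependency-free. In practice, a library such as GPyTorch would typically optimize the same objective with gradients and on GPUs, and a PPL would place priors on the scaling factors and hyperparameters instead of fixing them.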
Copyright and License
© 2025 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY 4.0.
Acknowledgement
We thank the PI Computing Allowance Program at LBNL for computing allocations from the Lawrencium Cluster and the NASA High-End Computing Program through the NASA Advanced Supercomputing Division at NASA Ames Research Center for additional resources. The views and opinions expressed herein by the authors do not necessarily state or reflect those of NASA, the United States Government, or The Regents of the University of California.
Funding
This work at Lawrence Berkeley National Laboratory (LBNL) was supported by NASA funding (80HQTR21T0101) through the Earth Science Program for Carbon Cycle Science, under contract no. DE-AC02-05CH11231 with the U.S. Department of Energy.
Data Availability
To help the reader understand the overall method of GP modeling and provide details for the MLL approach, we provide the GPyTorch implementation for the GP MLL model and the input data at https://sites.google.com/lbl.gov/calgem/GP. Vulcan emissions can be accessed at 10.3334/ORNLDAAC/1810 (date last accessed: September 27, 2022), and fire emissions data are accessible at Zenodo (https://zenodo.org/records/7229675; date last accessed: February 1, 2023). Bias-corrected versions of OCO-3 Level 2 (v10.4r) and OCO-2 Level 2 (v11r) data are available from https://disc.gsfc.nasa.gov/ (date last accessed: June 15, 2023). The CARB California GHG Emission inventory is provided at https://ww2.arb.ca.gov/ghg-inventory-data (date last accessed: November 2, 2022), and the CalTrans Performance Measurement System data is found at https://pems.dot.ca.gov/ (date last accessed: November 5, 2022). Air traffic data from the OpenSky Network is available at https://zenodo.org/records/5815448 (date last accessed: November 5, 2022). Container throughput counts for the Port of Oakland, Port of Los Angeles, and Port of Long Beach are respectively found at https://www.oaklandseaport.com/performance/facts-figures/ (date last accessed: November 10, 2022), https://www.portoflosangeles.org/business/statistics/container-statistics (date last accessed: November 10, 2022), and https://polb.com/business/port-statistics/#teus-archive-1995-to-present (date last accessed: November 10, 2022). Carbon dioxide and biogenic fluxes data from CarbonTracker and the SMUrF model code are accessible at https://gml.noaa.gov/aftp/products/carbontracker/co2/CT-NRT.v2022-1/fluxes/daily/ (date last accessed: March 14, 2024) and https://github.com/wde0924/SMUrF (date last accessed: March 29, 2023), respectively.
Supplemental Material
Description of Gaussian process kernels, probability distribution of prior parameters, methods for prior emissions, details of the GC model simulations, details of the GP MLL and classical Bayesian methods, uncertainty estimation method for GP MLL, and details of the computational cost for GP inverse modeling (PDF)
Files
jeong-et-al-2025-applying-gaussian-process-machine-learning-and-modern-probabilistic-programming-to-satellite-data-to.pdf
(9.0 MB)
| MD5 checksum | Size |
|---|---|
| md5:926ca7c764149ff86a036d6f9d2aba44 | 3.6 MB |
| md5:ef71e9a28ecd0d435435090cce9d41d6 | 5.4 MB |
Additional details
Identifiers
- PMID
- 39992284
- PMCID
- PMC11912316
Related works
- Describes
- Journal Article: PMC11912316 (PMCID)
- Journal Article: 39992284 (PMID)
- Is supplemented by
- Supplemental Material: https://pubs.acs.org/doi/suppl/10.1021/acs.est.4c09395/suppl_file/es4c09395_si_001.pdf (URL)
Funding
- National Aeronautics and Space Administration
- 80HQTR21T0101
- United States Department of Energy
- DE-AC02-05CH11231
Dates
- Accepted
- 2025-02-14
- Available
- 2025-02-24 (published online)