Published March 11, 2025 | Version Published
Journal Article Open

Applying Gaussian Process Machine Learning and Modern Probabilistic Programming to Satellite Data to Infer CO₂ Emissions

  • 1. Lawrence Berkeley National Laboratory
  • 2. Ames Research Center
  • 3. California Institute of Technology
  • 4. University of Washington

Abstract

Satellite data provides essential insights into the spatiotemporal distribution of CO₂ concentrations. However, many atmospheric inverse models fail to adequately incorporate the spatial and temporal correlations inherent in satellite observations and often lack rigorous methods for estimating parameters like spatial length scales. We introduce an inference model that processes the spatiotemporal covariance in satellite data and estimates hyperparameters such as covariance length scales. Our approach uses Gaussian process (GP) machine learning (ML) and modern probabilistic programming languages (PPLs) to perform atmospheric inversions of emissions from satellite data. We develop a GP ML inversion system based on modern PPLs and the GEOS-Chem chemical transport model, simulating atmospheric CO₂ concentrations corresponding to the Orbiting Carbon Observatory-2/3 (OCO-2/3) data for July 2020. In our supervised learning framework, we treat the GEOS-Chem simulated data set as the target, with predictors derived by scaling the target with sector-specific factors hidden from the GP machine. Our results show that the GP model, combined with GPU-enabled PPLs, effectively retrieves true emission scaling factors and infers noise levels concealed within the data. This suggests that our method could be applied over larger areas with more complex covariance structures, enabling comprehensive analysis of the spatiotemporal patterns observed in OCO-2/3 and similar satellite data sets.
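
The inversion idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' GPyTorch/PPL implementation: it assumes a hypothetical linear model in which the target is a sum of sector contributions multiplied by unknown scaling factors, plus Gaussian noise with a squared-exponential covariance. For brevity it replaces gradient-based optimization with a grid search over the length scale and noise variance, and it estimates the scaling factors by generalized least squares, so the quantity maximized is a profile version of the GP log marginal likelihood. All function names, the kernel choice, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def rbf_kernel(coords, length_scale, signal_var, noise_var):
    """K = signal_var * exp(-d^2 / (2 * length_scale^2)) + noise_var * I."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            K[i][j] = signal_var * math.exp(-0.5 * d2 / length_scale ** 2)
        K[i][i] += noise_var
    return K

def cholesky(K):
    """Lower-triangular L with L L^T = K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_scaling_factors(sectors, y, L):
    """Generalized least squares: lambda = (X^T K^-1 X)^-1 X^T K^-1 y."""
    kinv_cols = [chol_solve(L, col) for col in sectors]
    A = [[dot(ci, kc) for kc in kinv_cols] for ci in sectors]
    b = [dot(kc, y) for kc in kinv_cols]
    return chol_solve(cholesky(A), b)

def log_marginal_likelihood(residual, L):
    """log N(residual | 0, K), with K given through its Cholesky factor L."""
    n = len(residual)
    alpha = chol_solve(L, residual)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * dot(residual, alpha) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)

def invert(coords, sectors, y, length_scales, noise_vars, signal_var=1.0):
    """Grid search over hyperparameters; returns (mll, factors, length_scale, noise_var)."""
    best = None
    for ell in length_scales:
        for nv in noise_vars:
            L = cholesky(rbf_kernel(coords, ell, signal_var, nv))
            lam = fit_scaling_factors(sectors, y, L)
            fitted = [sum(l * col[i] for l, col in zip(lam, sectors)) for i in range(len(y))]
            resid = [yi - fi for yi, fi in zip(y, fitted)]
            mll = log_marginal_likelihood(resid, L)
            if best is None or mll > best[0]:
                best = (mll, lam, ell, nv)
    return best

if __name__ == "__main__":
    # Synthetic 1-D "track" with two hypothetical sectors and hidden factors 1.2 and 0.8.
    rng = random.Random(0)
    coords = [(0.25 * i,) for i in range(30)]
    sector_a = [math.sin(c[0]) + 2.0 for c in coords]
    sector_b = [0.1 * c[0] ** 2 for c in coords]
    y = [1.2 * a + 0.8 * b + rng.gauss(0.0, 0.1) for a, b in zip(sector_a, sector_b)]
    mll, lam, ell, nv = invert(coords, [sector_a, sector_b], y, [0.5, 1.0, 2.0], [0.01, 0.1])
    print("scaling factors:", lam, "length scale:", ell, "noise variance:", nv)
```

In this toy setting the recovered factors can be compared with the hidden values used to build the synthetic target, which mirrors the validation strategy described in the abstract. A production system would work with far larger covariance matrices, which is where GPU-enabled libraries become relevant.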

Copyright and License

© 2025 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY 4.0.

Acknowledgement

We thank the PI Computing Allowance Program at LBNL for computing allocations from the Lawrencium Cluster and the NASA High-End Computing Program through the NASA Advanced Supercomputing Division at NASA Ames Research Center for additional resources. The views and opinions expressed herein by the authors do not necessarily state or reflect those of NASA, the United States Government, or The Regents of the University of California.

Funding

This work at Lawrence Berkeley National Laboratory (LBNL) was supported by NASA’s funding (80HQTR21T0101) through the Earth Science Program for Carbon Cycle Science, under contract no. DE-AC02-05CH11231 with the U.S. Department of Energy.

Data Availability

To help the reader understand the overall method of GP modeling and provide details for the MLL approach, we provide the GPyTorch implementation for the GP MLL model and the input data at https://sites.google.com/lbl.gov/calgem/GP. Vulcan emissions can be accessed at 10.3334/ORNLDAAC/1810 (date last accessed: September 27, 2022), and fire emissions data are accessible at Zenodo (https://zenodo.org/records/7229675; date last accessed: February 1, 2023). Bias-corrected versions of OCO-3 Level 2 (v10.4r) and OCO-2 Level 2 (v11r) data are available from https://disc.gsfc.nasa.gov/ (date last accessed: June 15, 2023). The CARB California GHG Emission inventory is provided at https://ww2.arb.ca.gov/ghg-inventory-data (date last accessed: November 2, 2022), and the CalTrans Performance Measurement System data is found at https://pems.dot.ca.gov/ (date last accessed: November 5, 2022). Air traffic data from the OpenSky Network is available at https://zenodo.org/records/5815448 (date last accessed: November 5, 2022). Container throughput counts for the Port of Oakland, Port of Los Angeles, and Port of Long Beach are respectively found at https://www.oaklandseaport.com/performance/facts-figures/ (date last accessed: November 10, 2022), https://www.portoflosangeles.org/business/statistics/container-statistics (date last accessed: November 10, 2022), and https://polb.com/business/port-statistics/#teus-archive-1995-to-present (date last accessed: November 10, 2022). Carbon dioxide and biogenic fluxes data from CarbonTracker and the SMUrF model code are accessible at https://gml.noaa.gov/aftp/products/carbontracker/co2/CT-NRT.v2022-1/fluxes/daily/ (date last accessed: March 14, 2024) and https://github.com/wde0924/SMUrF (date last accessed: March 29, 2023), respectively.

Supplemental Material

Description of Gaussian process kernels, probability distribution of prior parameters, methods for prior emissions, details of the GC model simulations, details of the GP MLL and classical Bayesian methods, uncertainty estimation method for GP MLL, and details of the computational cost for GP inverse modeling (PDF)

Files

jeong-et-al-2025-applying-gaussian-process-machine-learning-and-modern-probabilistic-programming-to-satellite-data-to.pdf

Additional details

Identifiers

Related works

Describes
Journal Article: PMC11912316 (PMCID)
Journal Article: 39992284 (PMID)
Is supplemented by
Supplemental Material: https://pubs.acs.org/doi/suppl/10.1021/acs.est.4c09395/suppl_file/es4c09395_si_001.pdf (URL)

Funding

National Aeronautics and Space Administration
80HQTR21T0101
United States Department of Energy
DE-AC02-05CH11231

Dates

Accepted
2025-02-14
Available
2025-02-24
Published online

Caltech Custom Metadata

Caltech groups
Division of Geological and Planetary Sciences (GPS)
Publication Status
Published