Published October 7, 2025 | Version Published
Journal Article Open

Comparing Cache Utilization Trends for Regional Data Caches

  • 1. Lawrence Berkeley National Laboratory
  • 2. California Institute of Technology
  • 3. Indiana University Bloomington
  • 4. University of California, San Diego

Abstract

The rapid growth of data volumes from large scientific collaborations, such as those at the Large Hadron Collider (LHC), presents significant challenges for the High Energy Physics (HEP) community. With annual data volumes projected to increase by a factor of thirty by 2028, efficient data management has become a critical concern. The HEP community’s reliance on wide-area networks for global data distribution often results in redundant long-distance transfers, leading to network congestion and degraded application performance. This study investigates how effectively regional data caches mitigate network congestion and improve application performance, using a large-scale dataset of millions of access records from regional caches in Southern California, Chicago, and Boston, which serve the LHC’s CMS experiment. Our analysis shows that in-network caching has substantial potential to improve large-scale scientific data dissemination by giving researchers faster and more efficient data access. Additionally, neural networks trained on data from multiple regional caches achieve higher predictive accuracy, and transfer learning particularly benefits caches with limited historical data, which indicates that the models generalize well across caches.

Copyright and License

© The Authors, published by EDP Sciences, 2025. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Acknowledgement

This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, of the US Department of Energy under Contract No. DE-AC02-05CH11231, and also used resources of the National Energy Research Scientific Computing Center (NERSC). This work was also supported by the National Science Foundation through the grants OAC-1836650, PHY-2323298, PHY-2121686 and OAC-2112167.

Files

epjconf_chep2025_01341.pdf

Size: 26.1 MB
md5:10f99c6578e5bbfb11535ba8aaadd84d

Additional details

Funding

United States Department of Energy: DE-AC02-05CH11231
National Science Foundation: OAC-1836650, PHY-2323298, PHY-2121686, OAC-2112167

Caltech Custom Metadata

Caltech groups: Division of Physics, Mathematics and Astronomy (PMA)
Publication Status: Published