Published June 2022 | Version Submitted + Published
Book Section - Chapter Open

Access Trends of In-network Cache for Scientific Data

  • 1. University of California, Berkeley
  • 2. Lawrence Berkeley National Laboratory
  • 3. Energy Sciences Network
  • 4. University of California, San Diego
  • 5. California Institute of Technology

Abstract

Scientific collaborations increasingly rely on large volumes of data for their work, and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group are often working on related research topics that require similar data objects. Thus, a significant amount of data sharing is possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and the potential for network traffic reduction by this caching system, we aim to explore the predictability of cache use and the potential for more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic volume by a factor of 2.35 during a part of the study period. We further show that machine learning models could predict cache utilization with an accuracy of 0.88. This demonstrates that such cache usage is predictable, which could be useful for managing complex networking resources such as in-network caching.
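To make the "factor of 2.35" figure concrete, here is a minimal sketch of how a traffic reduction factor could be computed from an access trace. It assumes the factor is the total volume served to users divided by the volume that still had to cross the wide-area network on cache misses. This record does not state the paper's exact definition, so that formula, the function name `traffic_reduction_factor`, and the `(size_bytes, was_cache_hit)` record layout are all illustrative assumptions.

```python
from typing import Iterable, Tuple


def traffic_reduction_factor(accesses: Iterable[Tuple[int, bool]]) -> float:
    """Return total bytes served divided by bytes fetched on cache misses.

    `accesses` is a hypothetical trace of (size_bytes, was_cache_hit) pairs.
    The ratio is an assumed definition, not necessarily the one in the paper.
    """
    total_bytes = 0
    missed_bytes = 0
    for size_bytes, was_cache_hit in accesses:
        total_bytes += size_bytes
        if not was_cache_hit:
            # A miss means the data had to be transferred from remote storage.
            missed_bytes += size_bytes
    if missed_bytes == 0:
        raise ValueError("no cache misses in trace: the ratio is unbounded")
    return total_bytes / missed_bytes


# Made-up trace for illustration only: 200 bytes served from the cache,
# 100 bytes fetched remotely, so 300 bytes served for 100 bytes transferred.
example_trace = [(200, True), (100, False)]
print(traffic_reduction_factor(example_trace))  # prints 3.0
```

Under this assumed definition, a factor of 2.35 would mean that for every byte transferred over the network, about 2.35 bytes were delivered to users.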

Additional Information

© 2022 Copyright held by the owner/author(s). Attribution 4.0 International (CC BY 4.0). This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and also used resources of the National Energy Research Scientific Computing Center (NERSC). This work was also supported by the National Science Foundation through the grants OAC-2030508, OAC-1836650, MPS-1148698, PHY-1120138 and OAC-1541349.

Attached Files

Published - 3526064.3534110.pdf

Submitted - 2205.05563.pdf

Files (3.3 MB)

md5:2365013ccf508d49112747886e5778fb (1.4 MB)
md5:4f0b9cb30a6e3a40714de2d4095c61c4 (1.9 MB)

Additional details

Identifiers

Eprint ID
116058
Resolver ID
CaltechAUTHORS:20220802-839219000

Funding

Department of Energy (DOE)
DE-AC02-05CH11231
NSF
OAC-2030508
NSF
OAC-1836650
NSF
MPS-1148698
NSF
PHY-1120138
NSF
OAC-1541349

Dates

Created
2022-08-03
Created from EPrint's datestamp field
Updated
2022-08-03
Created from EPrint's last_modified field