
Access Trends of In-network Cache for Scientific Data

Han, Ruize and Sim, Alex and Wu, Kesheng and Monga, Inder and Guok, Chin and Würthwein, Frank and Davila, Diego and Balcas, Justas and Newman, Harvey (2022) Access Trends of In-network Cache for Scientific Data. In: SNTA '22: Fifth International Workshop on Systems and Network Telemetry and Analytics. ACM, New York, NY, pp. 21-28. ISBN 978-1-4503-9315-7. https://resolver.caltech.edu/CaltechAUTHORS:20220802-839219000

PDF - Published Version. Creative Commons Attribution. 1MB
PDF - Submitted Version. Creative Commons Attribution Share Alike. 1MB

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220802-839219000

Abstract

Scientific collaborations increasingly rely on large volumes of data for their work, and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group are often working on related research topics that require similar data objects. Thus, a significant amount of data sharing is possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and the potential for network traffic reduction by this caching system, we aim to explore the predictability of cache usage and the potential for more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic volume by a factor of 2.35 during a part of the study period. We further show that machine learning models could predict cache utilization with an accuracy of 0.88. This demonstrates that such cache usage is predictable, which could be useful for managing complex networking resources such as in-network caching.
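
To make the "factor of 2.35" figure more concrete, the following is a minimal sketch of one plausible way a traffic reduction factor could be computed from an access trace: total bytes requested divided by the bytes that still had to cross the network on cache misses. This definition, the `Access` record, its field names, and the sample trace are all assumptions for illustration; they are not taken from the paper or from the cache's actual log format.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """One hypothetical trace record; the field names are illustrative only."""
    size_bytes: int
    cache_hit: bool


def traffic_reduction_factor(accesses):
    """Return total requested bytes divided by bytes fetched on cache misses.

    Assumed definition: only misses generate wide-area network traffic,
    so a higher factor means the cache absorbed more of the demand.
    """
    total = sum(a.size_bytes for a in accesses)
    missed = sum(a.size_bytes for a in accesses if not a.cache_hit)
    if missed == 0:
        # No bytes crossed the network, so the ratio is unbounded.
        return float("inf")
    return total / missed


# Made-up sample trace: 2400 bytes requested, 1000 of them on misses.
trace = [
    Access(400, True),
    Access(600, False),
    Access(1000, True),
    Access(400, False),
]
factor = traffic_reduction_factor(trace)
print(f"traffic reduction factor: {factor:.2f}")
```

Under this assumed definition, a factor of 2.35 would mean that for every byte fetched over the network, about 2.35 bytes were delivered to users.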


Item Type: Book Section
Related URLs:
URL                                      URL Type  Description
https://doi.org/10.1145/3526064.3534110  DOI       Article
https://arxiv.org/abs/2205.05563         arXiv     Discussion Paper
ORCID:
Author            ORCID
Würthwein, Frank  0000-0001-5912-6124
Davila, Diego     0000-0002-8664-5154
Balcas, Justas    0000-0001-9538-5078
Newman, Harvey    0000-0003-0964-1480
Additional Information: © 2022 Copyright held by the owner/author(s). Attribution 4.0 International (CC BY 4.0). This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and also used resources of the National Energy Research Scientific Computing Center (NERSC). This work was also supported by the National Science Foundation through the grants OAC-2030508, OAC-1836650, MPS-1148698, PHY-1120138 and OAC-1541349.
Funders:
Funding Agency              Grant Number
Department of Energy (DOE)  DE-AC02-05CH11231
NSF                         OAC-2030508
NSF                         OAC-1836650
NSF                         MPS-1148698
NSF                         PHY-1120138
NSF                         OAC-1541349
Subject Keywords: network cache, resource utilization, data pattern, prediction, xcache
DOI: 10.1145/3526064.3534110
Record Number: CaltechAUTHORS:20220802-839219000
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220802-839219000
Official Citation: Ruize Han, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Justas Balcas, and Harvey Newman. 2022. Access Trends of In-network Cache for Scientific Data. In Proceedings of the Fifth Int’l Workshop on Systems and Network Telemetry and Analytics (SNTA ’22), June 30, 2022, Minneapolis, MN, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3526064.3534110
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 116058
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 03 Aug 2022 15:29
Last Modified: 03 Aug 2022 15:29
