Applications of Virtual Data in the LIGO Experiment
Many physics experiments today generate large volumes of data, which are then processed in many ways to reach an understanding of fundamental physical phenomena. Virtual Data is a concept that unifies the view of data, whether raw or derived. It provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN (Grid Physics Network) is an NSF-funded project that aims to realize the concepts of Virtual Data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. LIGO will produce large amounts of data, expected to reach hundreds of petabytes over the next decade. Large communities of scientists distributed around the world need to access parts of these datasets and analyze them efficiently. The raw and processed data are expected to be distributed among national centers, university computing centers, and individual workstations. In this paper we describe some of the challenges associated with building Virtual Data Grids for experiments such as LIGO.
© 2002 Springer-Verlag Berlin Heidelberg. This work was supported by NSF under contract ITR-0086044, "GriPhyN: Grid Physics Network" (www.griphyn.org). Scott Koranda's work was also supported by the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. We wish to thank all the members of the GriPhyN project for their valuable contributions.