Massive Datasets in Astronomy
Astronomy has a long history of acquiring, systematizing, and interpreting large quantities of data. This tradition, which runs from the earliest sky atlases through the first major photographic sky surveys of the 20th century, continues today at an ever-increasing rate. Like many other fields, astronomy has become a very data-rich science, driven by advances in telescope, detector, and computer technology. Numerous large digital sky surveys and archives already exist, with information content measured in multiple terabytes, and even larger, multi-petabyte data sets are on the horizon. Systematic observations of the sky, over a range of wavelengths, are becoming the primary source of astronomical data. Numerical simulations are also producing comparable volumes of information. Data mining promises both to make the scientific utilization of these data sets more effective and more complete, and to open completely new avenues of astronomical research. The technological problems range from database design and federation to data mining and advanced visualization, and solving them is leading to a new toolkit for astronomical research. These challenges are similar to those encountered in other data-intensive fields today. These advances are now being organized through the concept of Virtual Observatories: federations of data archives and services representing a new information infrastructure for the astronomy of the 21st century. In this article, we provide an overview of some of the major datasets in astronomy, discuss different techniques used for archiving data, and conclude with a discussion of the future of massive datasets in astronomy.
Additional Information: © 2002 Springer Science+Business Media Dordrecht. We would like to acknowledge our many collaborators in the Virtual Observatory initiative. Particular thanks go to the members of the Digital Sky project and the NVO interim steering committee, where many of the ideas presented herein were initially discussed. We would also like to thank the editors for their immense patience with the authors. This work was made possible in part through the NPACI-sponsored Digital Sky Project (NSF Cooperative Agreement ACI-96-19020) and generous grants from the SUN Microsystems Corporation and Microsoft Research. RJB would like to personally acknowledge financial support from the Fullam Award and NASA grant number NAG5-10885. SGD acknowledges support from NASA grants NAG5-9603 and NAG5-9482, the Norris Foundation, and other private donors.