
Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey

Szalay, Alexander S. and Kunszt, Peter Z. and Thakar, Ani and Gray, Jim and Slutz, Don and Brunner, Robert J. (2000) Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey. ACM SIGMOD Record, 29 (2). pp. 451-462. ISSN 0163-5808. doi:10.1145/335191.335439.

Full text is not posted in this repository. Consult Related URLs below.

The next-generation astronomy digital archives will cover most of the sky at fine resolution in many wavelengths, from X-rays through ultraviolet, optical, and infrared. The archives will be stored at diverse geographical locations. One of the first of these projects, the Sloan Digital Sky Survey (SDSS), is creating a 5-wavelength catalog over 10,000 square degrees of the sky. The 200 million objects in the multi-terabyte database will have mostly numerical attributes in a 100+ dimensional space. Points in this space have highly correlated distributions. The archive will enable astronomers to explore the data interactively. Data access will be aided by multidimensional spatial and attribute indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes will accelerate frequent searches. Splitting the data among multiple servers will allow parallel, scalable I/O and parallel data analysis. Hashing techniques will allow efficient clustering and pair-wise comparison algorithms that should parallelize nicely. Randomly sampled subsets will allow debugging otherwise large queries at the desktop. Central servers will operate a data pump to support sweep searches touching most of the data. The anticipated queries will require special operators related to angular distances and complex similarity tests of object properties, like shapes, colors, velocity vectors, or temporal behaviors. These issues pose interesting data management challenges.
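As an illustration of the kind of angular-distance operator the abstract mentions, the following is a minimal Python sketch of the great-circle separation between two sky positions. It is not taken from the paper or from the SDSS archive software; the function name `angular_separation_deg` and the choice of right ascension and declination in degrees as inputs are assumptions made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle angle in degrees between two sky positions.

    Inputs are right ascension and declination in degrees
    (hypothetical interface, chosen for this sketch only).
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for the small separations
    # typical of neighbor searches on the sky.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))
```

A query such as "find all objects within a given angle of a target" could apply a function like this as a filter, although a production archive would more plausibly combine it with a spatial index so that only nearby candidates are tested.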

Item Type:Article
Related URLs:
Additional Information:© 2000 ACM. We would like to acknowledge support from the Astrophysical Research Consortium, the HSF, NASA and Intel’s Technology for Education 2000 program, in particular George Bourianoff (Intel).
Funding Agency:Astrophysical Research Consortium
Grant Number:UNSPECIFIED
Subject Keywords:Database, archive, data analysis, data mining, astronomy, scaleable, Internet
Issue or Number:2
Record Number:CaltechAUTHORS:20161220-153938862
Persistent URL:
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:73014
Deposited On:20 Dec 2016 23:56
Last Modified:11 Nov 2021 05:10
