Snapshot Processing in Streaming Environments
Computational issues related to streaming data, and in particular the monitoring and rapid correlation of multiple sources of streaming data, are becoming increasingly important in contexts ranging from business processes to crisis detection. For example, a government system to detect bioterror attacks must correlate multiple streams of possibly low-confidence data from sensors and local and national public health information networks with cues from indicators such as news and government sources indicating geographical locations, tactics and timing of possible attacks. The results of this correlation trigger appropriate responses, such as flagging information for more in-depth analysis or sending alerts to public health officials. Monitoring and correlation applications of this type are ideal for deployment on distributed computing grids, because they have high transaction throughput, require low latency, and can be partitioned into sets of small communicating computations with regular communication patterns. An important consideration in these applications is the need to ensure that, at any given time, computations are carried out on an accurate - or at least close to accurate - picture of the environment being monitored. One way of doing this, which we call snapshot processing, is to treat collections of events that occur at approximately the same time as representing a global snapshot - a valid state - of the environment. Computation on the resulting series of snapshots is much like computation on a real-time video of the entire environment. We briefly describe our model for these stream processing computations and introduce the concept of snapshot processing
© 2006 IEEE. The research described here has been supported in part by the National Science Foundation under grant CCR-0312778, ITR: Information Infrastructures for Crisis Management, and by the Lee Center for Advanced Networking at Caltech.
||60.3 kB||Preview Download|