STREAMS AND EDDIES

Streaming Data
  Data not resident on disk, but rather "flowing by", e.g. stock ticker, sensor measurements, ...
  Can be stored and queried later on, but preferably queried right away.
  Leads to work on streaming databases.

Techniques for Streaming Data
  Online algorithms.
  Forced pipelining of execution (cannot sort).
  Often will require some state: try to keep this small; may store locally on disk if needed.
  AT&T Bill Computation Example.
  Need to consider interaction of streaming and disk-resident data. Usually handled by treating the disk data as another stream.

New Access Methods for Streaming Data
  Ripple join is a good example. Incrementally join what newly arrives for one operand with everything you have so far for the other.
  [[You do not need to understand ripple joins in detail or River at all]]

Eddies
  Work as an n-ary operator; each tuple carries Ready and Done bits.
  Statistically select the next component operator to apply.
  Choice of operator depends on queue length -- provides automatic adjustment based on cost.
  Lottery ticket scheme provides adjustment for selectivity.
  Escrow mechanism to implement (window-based) forgetting for dynamic adaptivity.

Limits to Eddies
  Synchronization barriers.
  Understand effect of algorithm choice.
  Maximize moments of symmetry.

Benefits of Eddies
  Not just for streaming data.
  Helps to get good query optimization even where costs/selectivities/arrival rates are poorly understood or changing, due to any of:
    Hardware and workload complexity
    Poorly characterized data (as in federated systems)
    Long-running queries
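
Illustration (incremental join). The idea noted under New Access Methods, joining what newly arrives for one operand with everything held so far for the other, can be sketched in a few lines. This is a simplified sketch, not the ripple join algorithm itself: it assumes an equality join and keeps a hash table per input, which makes it closer to a symmetric hash join. The function name `incremental_join`, the "L"/"R" side labels, and the key functions are hypothetical names chosen for the example.

```python
from collections import defaultdict


def incremental_join(arrivals, key_left, key_right):
    """Yield (left_row, right_row) pairs as rows arrive from either input.

    arrivals: iterable of (side, row), where side is "L" or "R".
    key_left / key_right: functions extracting the join key from a row.
    Assumption: equality join; all rows seen so far are kept in memory.
    """
    seen = {"L": defaultdict(list), "R": defaultdict(list)}
    key_of = {"L": key_left, "R": key_right}

    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = key_of[side](row)
        # Remember the new row so later arrivals on the other side can match it.
        seen[side][k].append(row)
        # Join the new row with everything already held for the other operand.
        for match in seen[other].get(k, ()):
            yield (row, match) if side == "L" else (match, row)
```

Results are produced as soon as both matching rows have arrived, regardless of which input delivered its row first, so no input has to be read to completion or sorted. The state grows with the inputs; a real system would bound it, for example with windows or by spilling to disk.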
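
Illustration (lottery routing in an eddy). The lottery ticket scheme listed under Eddies can be sketched as follows, under strong simplifying assumptions: the component operators are plain filter predicates, every operator is always ready for every tuple (so only Done bits are modelled), and there are no queues, no cost-based back-pressure, and no escrow/window-based forgetting. An operator is credited a ticket when it is handed a tuple and debited one when it returns the tuple to the eddy, so operators that drop many tuples accumulate tickets and tend to be chosen earlier. The class name `Eddy` and its attributes are hypothetical names chosen for the example.

```python
import random


class Eddy:
    """Toy eddy that routes each tuple through filter predicates by lottery."""

    def __init__(self, predicates, seed=0):
        self.predicates = list(predicates)
        # One starting ticket each, so every operator can win a lottery.
        self.tickets = [1] * len(self.predicates)
        self.calls = [0] * len(self.predicates)  # tuples handed to each operator
        self.rng = random.Random(seed)

    def _pick(self, done):
        # Lottery among operators whose Done bit is not yet set for this tuple.
        candidates = [i for i, d in enumerate(done) if not d]
        weights = [self.tickets[i] for i in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def route(self, tup):
        """Return True if tup passes every predicate, False if it is dropped."""
        done = [False] * len(self.predicates)  # per-tuple Done bits
        while not all(done):
            i = self._pick(done)
            self.tickets[i] += 1  # credit: operator receives a tuple
            self.calls[i] += 1
            if not self.predicates[i](tup):
                return False      # tuple dropped; operator keeps the ticket
            self.tickets[i] -= 1  # debit: tuple returned to the eddy
            done[i] = True
        return True
```

For example, `Eddy([lambda x: x % 10 == 0, lambda x: True])` fed with a run of integers will shift tickets toward the first, more selective predicate, so most tuples are tested against it first and dropped before reaching the second. The output is the same whatever order the lottery picks; only the amount of work changes.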