STREAMS AND EDDIES

Streaming Data

Data not resident on disk -- but rather "flowing by".
E.g. stock ticker, sensor measurements, ...
Can be stored and queried later on, but preferably queried right away.
Leads to work on streaming databases.


Techniques for Streaming Data

Online algorithms
Forced pipelining of execution (blocking operators such as sort cannot run on an unbounded stream)
Often will require some state
      try to keep this small;
      may store locally on disk if needed.
AT&T Bill Computation Example.
Need to consider interaction of streaming and disk-resident data.
     Usually handled by treating disk-resident data as another stream.
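
A minimal sketch of an online algorithm with small state, assuming a stream of
numeric readings. It keeps a running mean and variance using Welford's update,
so each item is seen once and only three numbers are stored. The names
RunningStats and running_means are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size state for a one-pass mean/variance (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Fold one new stream item into the state; no past items are kept.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the items seen so far.
        return self.m2 / self.count if self.count else 0.0


def running_means(stream):
    """Pipelined: emits an updated answer after every arriving item."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
        yield stats.mean


if __name__ == "__main__":
    for m in running_means([2.0, 4.0, 6.0]):
        print(m)
```

The generator never waits for the end of the input, which is the point of
forced pipelining: an answer is available after every item.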


New Access Methods for Streaming Data

Ripple join is a good example.
Incrementally join whatever newly arrives for one operand with everything you
have so far for the other.
[[You do not need to understand ripple joins in detail or River at all]]
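
For intuition only, the incremental join described above can be sketched as
follows. This is a simplified illustration, not the published ripple join
algorithm: it alternates strictly between the two inputs, keeps everything seen
so far in memory, and leaves out sampling and aggregate estimation. The name
ripple_join and its parameters are hypothetical.

```python
def ripple_join(left, right, predicate):
    """Yield matching (l, r) pairs incrementally from two input streams."""
    seen_left, seen_right = [], []
    left_it, right_it = iter(left), iter(right)
    left_done = right_done = False

    while not (left_done and right_done):
        if not left_done:
            try:
                new_l = next(left_it)
            except StopIteration:
                left_done = True
            else:
                # Join the new left tuple with every right tuple seen so far.
                for r in seen_right:
                    if predicate(new_l, r):
                        yield (new_l, r)
                seen_left.append(new_l)

        if not right_done:
            try:
                new_r = next(right_it)
            except StopIteration:
                right_done = True
            else:
                # Join the new right tuple with every left tuple seen so far.
                for l in seen_left:
                    if predicate(l, new_r):
                        yield (l, new_r)
                seen_right.append(new_r)


if __name__ == "__main__":
    pairs = ripple_join([1, 2, 3], [2, 3, 4], lambda l, r: l == r)
    print(list(pairs))
```

Each pair is produced exactly once, at the moment the later of its two tuples
arrives, so results appear before either input has been read completely.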


Eddies

Work as an n-ary operation, with Ready and Done bits.
Statistically select next component operator to perform.
Choice of operator depends on queue length -- provides automatic adjustment
based on cost.
Lottery ticket scheme provides adjustment for selectivity.
Escrow mechanism to implement (window-based) forgetting for dynamic
adaptivity.
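
A toy sketch of lottery-based routing, assuming every operator is a simple
filter. An operator is credited a ticket when it receives a tuple and debited
one when it returns the tuple, so operators that drop many tuples accumulate
tickets and are chosen earlier. The class name Eddy and its fields are
hypothetical. Not modeled here: Ready bits (all filters are treated as ready),
queue-length back-pressure, and window-based forgetting.

```python
import random


class Eddy:
    """Routes each tuple through filter operators chosen by lottery."""

    def __init__(self, predicates, seed=0):
        self.predicates = list(predicates)
        # Start at one ticket each so every operator can win a lottery.
        self.tickets = [1] * len(self.predicates)
        self.calls = [0] * len(self.predicates)
        self.rng = random.Random(seed)

    def route(self, tup):
        """Return True if tup passes every operator, False if dropped."""
        n = len(self.predicates)
        done = 0                      # Done bits, one per operator
        all_done = (1 << n) - 1
        while done != all_done:
            eligible = [i for i in range(n) if not done & (1 << i)]
            weights = [self.tickets[i] for i in eligible]
            op = self.rng.choices(eligible, weights=weights, k=1)[0]
            self.tickets[op] += 1     # credit: operator receives a tuple
            self.calls[op] += 1
            if not self.predicates[op](tup):
                return False          # tuple dropped; ticket is kept
            self.tickets[op] -= 1     # debit: operator returns the tuple
            done |= 1 << op
        return True

    def run(self, stream):
        for tup in stream:
            if self.route(tup):
                yield tup


if __name__ == "__main__":
    eddy = Eddy([lambda x: x % 10 == 0, lambda x: x >= 0])
    survivors = list(eddy.run(range(1000)))
    print(len(survivors), eddy.tickets, eddy.calls)
```

In this run the first filter drops most tuples and ends up with most of the
tickets, so the lottery increasingly sends tuples to it first and the
non-selective filter is evaluated less often.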


Limits to Eddies

Synchronization barriers.
Understand effect of algorithm choice.
Maximize moments of symmetry.


Benefits of Eddies

Not just for streaming data.
Helps to get good query optimization even where costs/selectivities/arrival
rates are poorly understood or changing, due to any of:
      Hardware and workload complexity
      Poorly characterized data (as in federated systems)
      Long-running queries