Indexing What is an Index * Partition data in buckets * Label each bucket * Use label to determine whether bucket is relevant to query * May organize buckets into a tree. At leaf-level, have to decide what goes in the bucket - data items or pointers to them. Also, if pointers, is it one per entry or a whole list. Clustered vs. unclustered indices. Sparse (clustered) indices as option. Primary vs secondary indices. Range queries vs. exact match queries. Some Indices Hash Index Inverted file Trie B Tree * Basic properties: - high fanout - balanced - dynamically updated. * Side pointers at leaves * B+ tree versus B tree. B+ Tree dynamic adjustment * Upon node overflow - Split node in two. - Increase size of parent by one extra child ref. - Overflow may propagate up. * Upon node underflow - merge with sibling - if merged node too big, split evenly Concurrency Control in Indices * Any update that affects leaf node could propagate to the top of the tree. * Upper portions of tree used by everyone - cannot afford to allow long locks * Come up with special concurrency control mechanisms that work for indices. * E.g. ordered-tree locking protocol. * But now have to redesign recovery system to do the right thing. A Generalized Search Tree New indices needed for new apps... - find all supersets of S - find all molecules that bind to M - your favorite query here (multimedia?) ...and for new queries over old domains: - find all points in region from 12 to 1 o?clock - find all strings that match R. E. Extensible, efficient search tree structure * Balanced tree of (p, ptr) pairs - each pair is an entry E * p is a key ?predicate? * p holds for all objects below ptr * keys on a page may overlap * Key predicates: a user-defined class - This is the only extensibility required! Key Methods * Search - Consistent(E,q): evaluate E.p && q * return a no or a maybe (not necessarily yes) * Labeling - Union(P): new key that holds for all tuples in P. * Partitioning - Penalty(E1,E2): penalty of inserting E2 in subtree E1 - PickSplit(P): split P into two groups of entries Tree Traversal * Search - traverse tree where Consistent is TRUE * Insert - descend tree along least increase in Penalty if there's room at leaf, insert there else split according to PickSplit and propagate changes using Union * Delete - Find the entry via Search, and delete it. Propagate changes using Union