Access Methods Fundamentals Scan versus Index In what order do you get index entries? Clustered vs Unclustered You can take unclustered RIDs and sort them to get "clustering" Hash or sort Double buffering Hashing Static Hashing -- overflow buckets. Dynamic Hashing -- with and without directory. If directory present, double directory when needed, grow data pages by one unit. If directory absent, shouldn't double to grow, so use temporary overflow chains. Selection Express in Conjunctive Normal Form In a conjunction, consider selectivity of predicates - choose most selective index. Consider next-most selective and decide whether to index. Disjunctions as Unions Worst disjunct rules. If you have to scan one, scan all. "Common sub-expression" hard to do, rarely done. Projection Trivial if no duplicate elimination. Sorting or hashing, with obvious early elimination of duplicates. Join - Most expensive of the basic relational operators. - So very carefully optimized. - Many types of joins: Theta Join Equi-join Semi-Join Outer Join Composition Structural Join - Nested-Loop Join - with blocking, with indexing - Sort-Merge Join (repeated values, backtrack, band-join) - Simple Hash Join (hash smaller relation) - Better Hash Joins use partitioning as needed Issue is temp-file for unprocessed partitions versus re-read of processed ones. Partition size hard to guess, dynamic repartitioning on partition overflow. Many versions: Partition Hash Join Grace Hash Join Hybrid Hash Join - Join Indices (Cost of maintenance) - Issues to consider include: Join Selectivity (hard to estimate, but e.g. nested loops may be preferred if high) Skew (in data) Join Order Set Operations Use hash or sort to perform set union, intersection and difference Aggregate Operations Full scan of relation Possibly preceded by sort or hash to support grouping Index can be used to avoid sort/hash.