MEMORY MANAGEMENT
-----------------

IMPORTANCE OF DISK
The memory hierarchy.
The role of magnetic disk -- inexpensive, random access, persistent.
The unit of transfer is the block.
Access time = seek time + rotational latency + transfer time.
Random versus sequential access -- "blocked" access.
Idea of double-buffering.
The biggest cost in database systems is disk access. How do we use limited memory buffers wisely?

COST MODEL
For simplicity, focus on what is important: disk I/O.
Must understand what has been simplified away.
Carefully go through the cost computations in Sec. 8.4 -- these make great exam questions.

BUFFER MANAGEMENT
Operating systems do LRU -- simple and robust. A DBMS can do better, since its access patterns are not random.
Think very hard about fundamental system architecture -- what should the DBMS do for itself, and what can it rely on other systems for? What is the mapping of DBMS activities to the OS? Processes, memory, files, ...

Early proposals:
Domain Separation -- classify data into groups (domains), and divide the buffer pool between the domains. (Use LRU within a domain.)
"New" (Ingres) algorithm -- assign buffers per relation, and use a priority chain of relations to find a free buffer. (Use MRU within a relation.)
Hot Set algorithm -- define "hot points" for specific query algorithms, and allocate all the buffers needed. If that is not possible, make the query wait. Use LRU for everything else.

DBMIN algorithm:
Based on the Query Locality Set Model -- about 10 common access patterns are identified.
Buffers are assigned per file (relation) *instance*.
Each page has an owner, which is a file instance. Pages without an owner are on a global free list.
Query i can access a page in memory that is owned by a different query (file instance) j.
Admission control forces a query to suspend and wait if there is not enough memory to accommodate its locality set. (But a waiting query keeps the resources already allocated to it!!)
Extensive performance evaluation.

TECHNOLOGY TRENDS
Ever larger buffers can be accommodated as memory sizes grow ever larger.
But our largest databases are growing even faster!! On the other hand, many databases are growing slowly, if at all, and more and more of these are able to fit completely in main memory.

Main memory databases have been explored off and on for 20 years, but have not become central to any major commercial products.
Issues in main memory DBs:
Persistence becomes the crucial cost driver -- efficient algorithms for it have the maximum performance impact.
Index structures tend to change completely -- e.g., the large nodes of a B-tree are no longer suitable.
There still is a memory hierarchy, now through the L1 and L2 caches.

STREAMING DATA
How can you process/manipulate data as it "goes by"? You are allowed to store (a limited amount of) internal state, but you may not access the data again once it is gone.
Some operations are trivial, e.g. select/project (implemented as a "filter"). Others are very hard, e.g. sort.
Applications in sensor data, but also stock tickers, billing records, ...
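As an illustration of the streaming model above (a sketch, not part of the original notes; the record layout and stock-ticker data are invented): select/project can be implemented as a one-pass filter that keeps no per-tuple state, whereas an operation like sort would have to buffer the entire stream.

```python
def select_project(stream, predicate, columns):
    """One-pass select/project over a stream of dict records.

    Each record is seen exactly once and nothing is buffered, so this
    works no matter how long the stream is -- the "filter" pattern.
    """
    for record in stream:
        if predicate(record):                      # SELECT: keep matching records
            yield {c: record[c] for c in columns}  # PROJECT: keep chosen columns

# Hypothetical stock-ticker stream (values made up for illustration).
ticks = iter([
    {"symbol": "ABC", "price": 10.0, "volume": 500},
    {"symbol": "XYZ", "price": 99.5, "volume": 100},
    {"symbol": "ABC", "price": 10.2, "volume": 300},
])

result = list(select_project(ticks,
                             lambda t: t["symbol"] == "ABC",
                             ["symbol", "price"]))
# result == [{"symbol": "ABC", "price": 10.0}, {"symbol": "ABC", "price": 10.2}]
```

Once `ticks` has been consumed, the filtered records cannot be recomputed -- exactly the "gone once it goes by" constraint the notes describe.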
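Returning to the disk cost model from earlier in the notes, a back-of-the-envelope calculation makes the random-versus-sequential gap concrete. The disk parameters below are assumed round numbers for illustration, not figures from the notes.

```python
# Illustrative disk parameters (assumptions, not from the notes).
SEEK_MS = 9.0       # average seek time
ROTATION_MS = 4.2   # average rotational latency: half a revolution at 7200 RPM
TRANSFER_MS = 0.1   # time to transfer one block once positioned

def random_read_ms(num_blocks):
    # Every block pays the full price: seek + rotational latency + transfer.
    return num_blocks * (SEEK_MS + ROTATION_MS + TRANSFER_MS)

def sequential_read_ms(num_blocks):
    # One seek and one rotational delay, then blocks stream off the track.
    return SEEK_MS + ROTATION_MS + num_blocks * TRANSFER_MS

print(random_read_ms(1000))      # about 13,300 ms -- each access repositions the arm
print(sequential_read_ms(1000))  # about 113 ms -- "blocked" access amortizes the seek
```

A two-orders-of-magnitude difference for the same 1000 blocks is why query processing favors blocked, sequential I/O wherever possible.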
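For the BUFFER MANAGEMENT section above, here is a minimal sketch (not from the notes) of the OS-style LRU pool that the DBMS-specific policies try to improve on. The trace at the end shows LRU's worst case -- a wrapping sequential scan -- which is the kind of pattern where a per-relation MRU policy, as in the "New" (Ingres) algorithm, wins.

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy buffer pool: evicts the least-recently-used page on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> contents, oldest first

    def get(self, page_id, read_from_disk):
        if page_id in self.pages:               # hit: just bump recency
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:    # miss on a full pool:
            self.pages.popitem(last=False)      #   evict the LRU page
        self.pages[page_id] = read_from_disk(page_id)
        return self.pages[page_id]

# Scan 4 pages repeatedly through a 3-page pool: with LRU, the page we are
# about to need is always the one just evicted, so every access misses.
pool = LRUBufferPool(3)
misses = []
for page in [1, 2, 3, 4, 1, 2, 3, 4]:
    resident = set(pool.pages)
    pool.get(page, lambda p: f"data-{p}")
    if page not in resident:
        misses.append(page)
# misses == [1, 2, 3, 4, 1, 2, 3, 4]  -- LRU thrashes on a wrapping scan
```

This is the sense in which "a DBMS can do better": the query plan knows the access pattern in advance, so it can pick a policy (MRU, hot set, locality set) per relation instance instead of one global heuristic.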