MEMORY MANAGEMENT
-----------------

IMPORTANCE OF DISK
The memory hierarchy.
The role of magnetic disk -- inexpensive, random access, persistent.
The unit of transfer is the block.
Access time = seek time + rotational latency + transfer time.
Random versus sequential access -- "blocked" access.
Idea of double-buffering.
The biggest cost in database systems is disk access. How do we use limited memory buffers wisely?

COST MODEL
For simplicity, focus on what is important: disk I/O.
Must understand what has been simplified away.
Carefully go through the cost computations in Sec. 8.4 -- these make great exam questions.

BUFFER MANAGEMENT
Operating systems do LRU -- simple and robust. A DBMS can do better, since its access patterns are not random.
Think very hard about fundamental system architecture -- what should the DBMS do for itself, and what can it rely on other systems for? What is the mapping of DBMS activities to the OS? Processes, memory, files, ...

Early proposals:
Domain Separation -- classify data into groups (domains), and divide the buffer pool between the domains. (Use LRU within a domain.)
"New" (Ingres) algorithm -- assign buffers per relation, and use a priority chain of relations to find a free buffer. (Use MRU within a relation.)
Hot Set algorithm -- define "hot points" for specific query algorithms, and allocate all the buffers needed. If that is not possible, make the query wait. Use LRU for everything else.

DBMIN algorithm:
Based on the Query Locality Set Model -- about 10 common access patterns are identified.
Buffers are assigned per file (relation) *instance*.
Each page has an owner, which is a file instance. Pages without an owner are on a global free list.
Query i can access a page in memory that is owned by a different query (file instance) j.
Admission control forces a query to suspend and wait if there is not enough memory to accommodate its locality set. (But a waiting query keeps the resources already allocated to it!!)
Extensive performance evaluation.

TECHNOLOGY TRENDS
Ever larger buffers can be accommodated as memory sizes grow ever larger.
But our largest databases are growing even faster!! On the other hand, many databases are growing slowly, if at all, and more and more of these are able to fit completely in main memory.

Main memory databases have been explored off and on for 20 years, but have not become central to any major commercial products.
Issues in main memory DBs:
Persistence becomes the crucial cost driver -- efficient algorithms for it have the maximum performance impact.
Index structures tend to change completely -- e.g., the large nodes of a B-tree are no longer suitable.
There still is a memory hierarchy, now through the L1 and L2 caches.

STREAMING DATA
How can you process/manipulate data as it "goes by"? You are allowed to store (a limited amount of) internal state, but you may not access the data again once it is gone.
Some operations are trivial, e.g. select/project (implemented as a "filter"). Others are very hard, e.g. sort.
Applications in sensor data, but also stock tickers, billing records, ...
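As an illustration of the streaming model above (a sketch, not part of the original notes; the record layout and stock-ticker data are invented): select/project can be implemented as a one-pass filter that keeps no per-tuple state, whereas an operation like sort would have to buffer the entire stream.

```python
def select_project(stream, predicate, columns):
    """One-pass select/project over a stream of dict records.

    Each record is seen exactly once and nothing is buffered, so this
    works no matter how long the stream is -- the "filter" pattern.
    """
    for record in stream:
        if predicate(record):                      # SELECT: keep matching records
            yield {c: record[c] for c in columns}  # PROJECT: keep chosen columns

# Hypothetical stock-ticker stream (values made up for illustration).
ticks = iter([
    {"symbol": "ABC", "price": 10.0, "volume": 500},
    {"symbol": "XYZ", "price": 99.5, "volume": 100},
    {"symbol": "ABC", "price": 10.2, "volume": 300},
])

result = list(select_project(ticks,
                             lambda t: t["symbol"] == "ABC",
                             ["symbol", "price"]))
# result == [{"symbol": "ABC", "price": 10.0}, {"symbol": "ABC", "price": 10.2}]
```

Once `ticks` has been consumed, the filtered records cannot be recomputed -- exactly the "gone once it goes by" constraint the notes describe.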
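Returning to the disk cost model from earlier in the notes, a back-of-the-envelope calculation makes the random-versus-sequential gap concrete. The disk parameters below are assumed round numbers for illustration, not figures from the notes.

```python
# Illustrative disk parameters (assumptions, not from the notes).
SEEK_MS = 9.0       # average seek time
ROTATION_MS = 4.2   # average rotational latency: half a revolution at 7200 RPM
TRANSFER_MS = 0.1   # time to transfer one block once positioned

def random_read_ms(num_blocks):
    # Every block pays the full price: seek + rotational latency + transfer.
    return num_blocks * (SEEK_MS + ROTATION_MS + TRANSFER_MS)

def sequential_read_ms(num_blocks):
    # One seek and one rotational delay, then blocks stream off the track.
    return SEEK_MS + ROTATION_MS + num_blocks * TRANSFER_MS

print(random_read_ms(1000))      # about 13,300 ms -- each access repositions the arm
print(sequential_read_ms(1000))  # about 113 ms -- "blocked" access amortizes the seek
```

A two-orders-of-magnitude difference for the same 1000 blocks is why query processing favors blocked, sequential I/O wherever possible.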
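For the BUFFER MANAGEMENT section above, here is a minimal sketch (not from the notes) of the OS-style LRU pool that the DBMS-specific policies try to improve on. The trace at the end shows LRU's worst case -- a wrapping sequential scan -- which is the kind of pattern where a per-relation MRU policy, as in the "New" (Ingres) algorithm, wins.

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy buffer pool: evicts the least-recently-used page on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> contents, oldest first

    def get(self, page_id, read_from_disk):
        if page_id in self.pages:               # hit: just bump recency
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:    # miss on a full pool:
            self.pages.popitem(last=False)      #   evict the LRU page
        self.pages[page_id] = read_from_disk(page_id)
        return self.pages[page_id]

# Scan 4 pages repeatedly through a 3-page pool: with LRU, the page we are
# about to need is always the one just evicted, so every access misses.
pool = LRUBufferPool(3)
misses = []
for page in [1, 2, 3, 4, 1, 2, 3, 4]:
    resident = set(pool.pages)
    pool.get(page, lambda p: f"data-{p}")
    if page not in resident:
        misses.append(page)
# misses == [1, 2, 3, 4, 1, 2, 3, 4]  -- LRU thrashes on a wrapping scan
```

This is the sense in which "a DBMS can do better": the query plan knows the access pattern in advance, so it can pick a policy (MRU, hot set, locality set) per relation instance instead of one global heuristic.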