MULTIMEDIA DATABASES

Issues:
  Logical Design
  Storing and serving multimedia
  Retrieval by Content
  Query Specification
  Index Structure

Logical Design:
  How to fit in relational model?
  Use BLOBs or ADTs. Object-Relational ...
  Or use pointers to external files.
  Encoding (and compression) are issues.

Physical Design:
  Physical layout choices -- within tuple or external
  External layout necessary for real-time delivery of large data (video)
  Careful placement on multiple disk drives in video-servers. "Read ahead".
  Issue of jitter -- resolve through buffering.
  Multiple stream synchronization.

Retrieval by Content:
  Use User-Defined Functions in SQL to specify.
  These functions may need to refer to other objects.
  Provide a friendly interface ...
  Multimedia objects often less precisely defined than numbers.
  Approximate matches are useful.
  How do you define similarity or measure approximation error?

Two main classes of queries -- (1) similarity (2) by (computed) attribute

Attributes can be --
  External (e.g. date, creator, owner, price)
  Physical (e.g. loudness, color histogram, texture)
  Semantic (e.g. by visually recognized object in image) - hard to do

Similarity/approximate querying is very hard to specify.
  Multiple measures of distance. How to combine?
  What is a distance metric?
  Notion of transformations.
  Data objects as points in a multi-dimensional space.

Attribute Space
  Derive a "feature vector" for the object.
    - K attributes, with a value for each.
  Represent as a point in a K-dimensional attribute space.
  Represent query as a point too.
  Approximation by allowing hyper-sphere around query point.
  Independently bound ranges in each dimension to get query hyper-rectangle.

Feature Vector
  For Image
    - color, possibly by sector
    - brightness
    - texture
  For Document
    - terms that occur in the document

What if Features are Not Known?
  Derive attribute space given only a (dis)similarity function.
  Multi-dimensional scaling.
    o Assign points at random to a space of desired dimensionality.
    o Iterate using steepest descent to minimize "stress" function.

Index structures to support efficient access -- filter, not solve.
  Thumbnails -- user picks
  More expensive analyses -- software picks

MULTI-DIMENSIONAL INDEX STRUCTURES:
  Applicable even in non-spatial applications.
  R-trees. First decent multi-dimensional structure. Many embellishments since.
  Understand how it works in terms of GiST.

The curse of high-dimensionality.
  Most multi-dimensional index structures do not scale to high dimensions.
  Basic intuitions break down
    - More corners than points in the data set.
    - Low selectivity in any one dimension.
    - "Center point" included in most queries.

Dimensionality Reduction
  Data often is not "truly" high dimensional.
  Variety of transforms can be used.
    - Fourier transform
    - Wavelet transform
    - Singular Value Decomposition
  And coefficients with small (absolute) values can be dropped.
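
A minimal sketch of the dimensionality-reduction idea above: transform the feature vector, then drop coefficients with small absolute values. The notes do not say which wavelet to use, so this sketch assumes the Haar wavelet (the simplest one) and an input length that is a power of two. The function names haar_forward, haar_inverse and keep_largest are hypothetical, chosen only for illustration.

```python
import math

def haar_forward(signal):
    """Orthonormal Haar wavelet transform (assumes power-of-two length)."""
    coeffs = list(signal)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        # Pairwise scaled sums (coarse part) and differences (detail part).
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        coeffs[:n] = avg + diff
        n = half  # recurse on the coarse part only
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward by undoing each level, coarsest first."""
    out = list(coeffs)
    s = math.sqrt(2.0)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / s)
            merged.append((a - d) / s)
        out[:2 * n] = merged
        n *= 2
    return out

def keep_largest(coeffs, k):
    """Zero all but the k coefficients with the largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

# An 8-dimensional vector that is not "truly" 8-dimensional: two flat halves.
signal = [4.0, 4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0]
coeffs = haar_forward(signal)
reduced = keep_largest(coeffs, 2)      # keep only 2 of the 8 coefficients
approx = haar_inverse(reduced)         # reconstruct from the reduced form
```

For this piecewise-constant input, all detail coefficients except one are zero, so two coefficients are enough to reconstruct the vector (up to floating-point rounding). Because the transform is orthonormal, Euclidean distances are the same in coefficient space as in the original space. If the same coefficient positions are kept for every object, distances computed on the reduced vectors can only underestimate the true distances, which fits the "filter, not solve" use of an index.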