EECS 651 Source Coding Theory
Term Project Presentations
Wednesday, April 21, 1999
9 to 12:05
Room 4001
The public is invited.

9:10 JPEG Enhancements: Adaptive Block Sizes and Computed Quantization Matrices
Eric A. Durant

Enhancing JPEG with adaptive block sizes and quantization matrices having triangular, hyperbolic or circular symmetry leads to superior R-D performance, both in MSE and in perceptual quality (the latter being the goal). Block sizes are efficiently described by a quadtree, which is grown by minimizing transform coefficient entropy for a given quality using one-level look-ahead descent. A 4th or 5th order rational function, a symmetry type, and a quality factor compactly describe the quantization matrices. For rate estimation, intermediate JPEG entropy coding symbols are calculated, and a Shannon rate estimate based on image data across several quality factors is applied. Further performance-complexity tradeoffs, such as separate entropy tables for each scale, are briefly considered.

9:35 Comparison of Quality Metrics for Various Compression Methods
Sangwoo Lee, Daniel Pradilla, Anuj Saxena

The purpose of this project is to compare the distortion measures SNR and PSNR to a perceptual quality metric for images compressed with JPEG, fractal and quadtree compression methods. The quality metric used is DCTune, developed by Dr. Andrew B. Watson, Dr. Albert Ahumada and others. To make the comparison, three different images were compressed using the three compression methods. The images used were Lena, Couple and Harbor. These images were chosen because they represent a wide range of complexity, from the broader details of Lena to the finer details of Harbor. The images were compressed at different rates by changing the parameters of the various compression methods. A subjective evaluation was performed on the images to determine the perceptual distortion. The results of this evaluation were compared to the results obtained from the different metrics.
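As a rough illustration of the two distortion measures named in the abstract above, the following sketch computes SNR and PSNR for a pair of images given as flat lists of pixel values. The function names, the 8-bit peak value of 255, and the choice of mean signal power (rather than variance) in the SNR are assumptions made for this example; the project's own definitions may differ.

```python
import math

def mse(original, compressed):
    """Mean squared error between two equal-length pixel sequences."""
    if len(original) != len(compressed):
        raise ValueError("images must have the same number of pixels")
    return sum((o - c) ** 2 for o, c in zip(original, compressed)) / len(original)

def snr_db(original, compressed):
    """SNR in dB: mean signal power over mean squared error (assumed definition)."""
    err = mse(original, compressed)
    if err == 0:
        return float("inf")
    power = sum(o ** 2 for o in original) / len(original)
    return 10.0 * math.log10(power / err)

def psnr_db(original, compressed, peak=255.0):
    """PSNR in dB: squared peak value (255 assumed for 8-bit images) over MSE."""
    err = mse(original, compressed)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 16-pixel image with a constant error of 5 gray levels.
original = [100] * 16
compressed = [105] * 16
print(round(snr_db(original, compressed), 2), round(psnr_db(original, compressed), 2))
```

Both measures depend only on the pixel-wise error, which is why a perceptual metric can rank compressed images differently from SNR or PSNR.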

10:00 Analysis of Quadtree Predictive Image Coding Algorithms
Selin Aviyente, Styliani Petroudi, Victoria Yee

Quadtree decomposition is a technique used to decompose images into different activity levels. In this project, the original image is segmented into subimages based on activity. Each subimage is treated as a separate image, and a quadtree based on a threshold criterion is applied to each subimage. The value of each block is predicted based on its neighbors. The resulting prediction errors are quantized using a variable-scale uniform quantizer and are sent to the decoder along with the quadtree structure. The complexity and the performance of this algorithm are analyzed. The results from applying this algorithm to several natural and artificial gray-scale images will be shown.

10:25 An Analysis on the Performance of Transform Coding
Jeongtae Kim and Yongsoon Eun

In this short project, we propose new performance analysis methods for transform coding. The conventional high-resolution performance analysis describes the performance in terms of the variance of each transformed coefficient. We relate the variances of the transformed coefficients to the power spectral density of the transformed signal. Since a transform can be represented as a set of linear filters, we have also related the variances of the transformed coefficients to the power spectral density of the input signal. By doing this, it becomes possible to predict the performance of a given transform coding scheme simply by viewing the power spectral density of the input signal. Moreover, we can predict the performance difference between transforms for a certain class of input signals. Mathematical analysis and simulation for our method have focused on DCT and DWT based transform coding. As a result, we show that the DCT is more effective for highly correlated sources, while the DWT is effective for uncorrelated sources.
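The conventional high-resolution analysis mentioned in the abstract above can be sketched as follows. This is not the authors' method: it assumes a first-order autoregressive source with unit variance and correlation coefficient rho, an 8-point orthonormal DCT, and the standard coding gain defined as the ratio of the arithmetic mean to the geometric mean of the coefficient variances. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Rows of the orthonormal n-point DCT-II."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def coefficient_variances(transform, rho):
    """Variance of each transform coefficient for an assumed AR(1) source
    with covariance rho ** |i - j| (unit-variance samples)."""
    n = len(transform)
    cov = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
    return [sum(row[i] * cov[i][j] * row[j] for i in range(n) for j in range(n))
            for row in transform]

def coding_gain(variances):
    """Arithmetic mean over geometric mean of the coefficient variances."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric

dct = dct_matrix(8)
for rho in (0.0, 0.5, 0.95):
    print(rho, coding_gain(coefficient_variances(dct, rho)))
```

Under these assumptions the gain is 1 for an uncorrelated source and grows with rho, which is consistent with the abstract's statement that the DCT is more effective for highly correlated sources.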

10:50 Scaling Factors in Motion Estimation for Video Compression under ITU H.263
Mark Corner and Kevin Holt

The transmission of high quality multimedia information is quickly becoming an everyday part of the home, business and mobile computing world. Users of interactive video teleconferencing want high quality video but are impeded by the constraints of low bit rate transmission media. Improving the rate-distortion curves of advanced low bit rate codecs, such as MPEG and H.263, is a difficult task, and the improvements often come at a high computational cost.

In this project we improve on the two-dimensional motion compensation found in most recent video codecs. This motion compensation predicts which regions of the video move between frames. Rather than encode the full difference between frames, the two-dimensional motion vectors are encoded separately. The difference between the two frames, taking into account the motion estimates, is encoded using a DCT, quantization and lossless compression. This leads to an overall bit rate reduction at the same distortion level.

This project adds a third dimension to the two-dimensional motion estimate. This third dimension is perpendicular to the video frame and can be thought of as a scaling factor. Preliminary results show that this scaling factor can reduce the rate at which the difference frame is encoded by 10%, even in a video without significant perpendicular motion. This figure does not take into account the added bits needed to encode the third dimension; however, the project estimates that rate as well. Results are also shown for a video with more significant perpendicular motion. The computational requirements of at least one search method are computed. Also, analytical results are compared to simulations to quantify the reduction in bit rate due to the reduction in difference frame variance.

11:15 Optimal Rate Constrained Scalar Quantization of a Gaussian source
R. Nuriyev

This project deals with variable-rate scalar quantization of a Gaussian source. We consider rate-constrained scalar quantization (RCSQ) as opposed to entropy-constrained scalar quantization (ECSQ). The method of Lagrange multipliers is employed to solve the optimization problem, and optimality equations are found. A descent algorithm was implemented in MATLAB and proved to be fairly simple as far as computation time is concerned. The number of levels appeared to be effectively finite, in the sense that beyond a certain value, increasing the number of levels did not yield any additional advantage. 20 to 25 levels appeared to be sufficient for a rate of 2 bits/sample. The performance of the entropy-constrained scalar quantizer is also estimated (using a similar algorithm), and the results are compared. The problem of local minima is more apparent under the rate constraint than under the entropy constraint. Using the thresholds and reconstruction levels found by ECSQ as initial values for the RCSQ algorithm yielded good results, although for some rates the global minimum was still missed.

11:40 Encoding Speech by Sound Segmentation to Obtain a Minimal Rate
Melanie R. Woodruff, Emily D. Ebert

Goal: To investigate a low rate code for speech encoding with the error criterion being intelligibility.

Problems:
* Speech files are generally large.
* Speech storage by traditional quantization of samples does not exploit the inherent periodicity of sounds, or the inherent repetition of speech itself.

Solution:
* Speech files are separated into speech and silence.
* Using the change in power spectral density over small intervals, blocks of speech are divided into regions of distinct sounds.
* Each sound is quantized into a specially designed variable-length codebook of common sounds used in American English speech. The decisions are made based on an encoder codebook of power spectral densities of the codewords.
* The speech is stored as a vector of codewords and the number of times each codeword is repeated.
* The vector is processed to remove redundancy and stored as a binary sequence derived from lossless coding techniques.
* The decoder consists of the variable-length codebook and a noise generator for noiselike sounds that are not well represented in a periodic fashion.
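
The storage step described above, a vector of codewords together with repetition counts, resembles run-length encoding. The sketch below shows one possible form of it; the codeword labels and function names are hypothetical, and the project's actual codebook and lossless coder are not specified in the abstract.

```python
from itertools import groupby

def run_length_encode(codewords):
    """Collapse consecutive repeats into (codeword, repeat_count) pairs."""
    return [(word, len(list(run))) for word, run in groupby(codewords)]

def run_length_decode(pairs):
    """Expand (codeword, repeat_count) pairs back into the codeword sequence."""
    decoded = []
    for word, count in pairs:
        decoded.extend([word] * count)
    return decoded

# Hypothetical sequence of sound labels produced by the segmentation stage.
sequence = ["sil", "sil", "sil", "ah", "ah", "s", "sil", "sil"]
pairs = run_length_encode(sequence)
print(pairs)
print(run_length_decode(pairs) == sequence)
```

The resulting pairs could then be passed to any lossless coder to remove the remaining redundancy, as the abstract describes.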