Class Projects -- EECS 651 -- Winter 2001
Wednesday, April 18, 1 PM, Room 1001
The public is invited.

1:00-1:20 PM
Vorbis Perceptual Audio Encoder Study
Arnar Hrafnkelsson & Daniel Schonberg

With the popularity of the MPEG-1 Layer 3 (MP3) algorithm for encoding wideband audio sources, people have been looking ahead to the next-generation algorithm. Independent developers have created the Vorbis codec. Vorbis is a patent-free effort designed to outperform most current algorithms while maintaining an extremely low encoder and decoder complexity. Design of Vorbis has focused on portable device applications, which places limits on storage and computing capability. For this project, we studied various aspects of the Vorbis encoder. Specific areas of interest included the psychoacoustic analysis (PSY), the floor approximation, the vector quantizer (VQ) cell shapes, the VQ dimensions, and the variable block length support. The change in rate necessary to maintain consistent audio quality was measured to test the relative importance of these aspects of the design.

1:25-1:45 PM
Effects of Lossy Speech Coding on Speaker Recognition
Sunil Gopalan, Wesley Sowers

In this project, we investigate the influence of speech coding on speaker recognition performance, with an emphasis on feature extraction and enhancement. Instead of evaluating speaker recognition techniques directly, we attempt to minimize the distortion of the parameters used in the recognition/verification process. MFCCs are popular parameters in speaker recognition/verification applications because they have been shown to be robust to channel and noise effects. We attempt to optimize variable-rate LPC and DPCM encoders for the purpose of minimizing the distortion of these MFCCs. Our results are in the form of rate-distortion curves and qualitative comparisons.
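
As background for the DPCM encoders mentioned in this and several later abstracts, the following is a minimal sketch of first-order closed-loop DPCM with a uniform quantizer. It is not the presenters' encoder: the function names, the predictor coefficient `a`, and the step size `step` are all illustrative assumptions, and no entropy coding or rate control is included.

```python
def dpcm_encode(samples, step, a=1.0):
    """Quantize the prediction residual of each sample (hypothetical helper).

    The encoder predicts from its own reconstruction, not from the clean
    input, so that encoder and decoder stay in step.
    """
    indices = []
    recon = 0.0  # previous reconstructed value, as the decoder will see it
    for x in samples:
        pred = a * recon                  # first-order prediction
        q = round((x - pred) / step)      # uniform quantizer index
        indices.append(q)
        recon = pred + q * step           # what the decoder will rebuild
    return indices


def dpcm_decode(indices, step, a=1.0):
    """Rebuild the signal from quantizer indices (hypothetical helper)."""
    out = []
    recon = 0.0
    for q in indices:
        recon = a * recon + q * step
        out.append(recon)
    return out
```

Because the prediction loop is closed around the quantizer, the reconstruction error of each sample equals the quantization error of its residual, so it stays within half a step as long as the quantizer does not overload.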

1:50-2:20 PM
Motion-Compensated Video Coding with Adaptive Block Size and Motion Accuracy
Sangtae Ahn, Kyoshin Choo, Jinsol Lee, and Hyunjin Park

Typical video compression algorithms (like MPEG) perform block-based coding with motion compensation to exploit interframe correlations. In other words, we make a first-order prediction using motion compensation and then perform DPCM-like coding of the motion-compensated prediction errors. There are two parameters in motion vector estimation and coding: block size for motion vector estimation and motion accuracy for motion vector quantization. The goal is to reduce the total rate (difference frame rate and motion rate) for a given distortion by choosing those parameters suitably. The difficulty is that the optimal parameters are source-dependent. Roughly speaking, the optimal block size tends to be smaller when the source has more texture, and the optimal motion accuracy becomes smaller when the block size increases. By using a total rate model established by Ribas-Corbera, we optimize both block size and motion accuracy. Moreover, we propose a low-complexity coding algorithm with adaptive block size and motion accuracy. We show the performance by coding real video sequences.

2:25-2:55 PM
Improvements over Baseline JPEG
Arvind Krishnamoorthy, Huzefa Neemuchwala, Mukundakumar Rajukumar, Thyagarajan Sadasiwan

JPEG is a DCT-based transform code that uniformly quantizes the DCT coefficients of each 8x8 image block with the corresponding step size given in the Quantization Matrix. The loss in image quality is a consequence of this step. We improve the rate vs. perceived image quality performance of the baseline JPEG coder by making use of Human Visual System (HVS) characteristics. As a first stage of improvement, we design an Image Independent Quantization Matrix based on the HVS that is optimized to provide minimum rate while introducing the maximum possible perceptually lossless distortion in the image.
Secondly, to take advantage of the spatial variations of intensity, texture, etc. in the image, we adapt the Quantization Matrix to the characteristics of each 8x8 block. We do this by computing a multiplier for the Quantization Matrix for every block, based on the characteristics of the block being quantized. We expect the cost incurred in encoding the multiplier to be small compared with the improvement in rate. Performance of the coders is evaluated by measuring the rate achieved by the coder at the threshold of indistinguishability and comparing it to that achieved by standard JPEG. We also evaluate the PSNR achieved by each of the above encoding schemes at the threshold of indistinguishability.

3:00-3:15 PM -- BREAK

3:15-3:45 PM
Vector Quantization for Joint Compression and Classification
Jose Costa, Kaiann Fu, Emmanuel Naim, Baptiste Poupard

Usually, when discussing source compression, the emphasis is placed on minimizing mean squared error (MSE). However, several applications seek not only to compress but also to classify the original data. This project focuses on analyzing vector quantizers (VQ) which are designed to simultaneously obtain good compression and good classification performance. This is accomplished by minimizing a modified distortion measure equal to the sum of the MSE and a weighted probability of classification error, or Bayes risk. We analyze the performance of these vector quantizers with several simple sources, such as Gaussian mixtures. We compare the performance of this modified VQ with the performance obtained using a VQ designed to minimize MSE alone. By minimizing the modified distortion measure, the classification error can be improved greatly without significantly degrading the MSE, i.e., the compression ability. The optimal cell shapes for the modified VQ are much different from the cell shapes which minimize normalized moment of inertia. The modified cell shapes tend to model the optimum decision regions.
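
The modified distortion measure described above can be illustrated with a short sketch that evaluates MSE plus a weighted empirical misclassification rate on labeled training data. Everything here is an assumption made for illustration: the scalar source, the nearest-neighbor encoder, the per-codeword class labels, and the names `modified_distortion` and `lam`. It is not the presenters' design algorithm.

```python
def modified_distortion(samples, labels, codebook, lam):
    """Empirical MSE plus lam times the empirical classification error rate.

    samples  -- scalar source values
    labels   -- true class of each sample
    codebook -- list of (reproduction_value, class_label) pairs
    lam      -- weight on the classification error term
    """
    sq_err = 0.0
    errors = 0
    for x, c in zip(samples, labels):
        # Nearest-neighbor encoding (an assumption; a jointly optimized
        # encoder could choose the cell differently).
        value, guess = min(codebook, key=lambda cw: (x - cw[0]) ** 2)
        sq_err += (x - value) ** 2
        errors += guess != c
    n = len(samples)
    return sq_err / n + lam * errors / n
```

With `lam = 0` this reduces to ordinary MSE, which is the baseline the abstract compares against; increasing `lam` penalizes codebooks whose cells straddle class boundaries.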

We also look at the possibility of using a tree-structured algorithm to achieve better computational efficiency. We attempt to apply the modified VQ to the problem of document segmentation, where we seek to classify text and images in a document.

3:50-4:15 PM
A Compression Scheme for Ultrasound Elasticity Imaging
Javier de Ana, Tim Hall, Jason Neiss

Techniques in ultrasound imaging have been developed to measure mechanical properties of tissue from a sequence of ultrasound image frames taken while the tissue is compressed. These techniques are collectively known as elasticity imaging. The result is a very large amount of high-dynamic-range, complex-valued data. This project studies a JPEG-like algorithm to significantly compress the data while maintaining good picture quality and without adversely affecting the mechanical properties derived from the data. Known properties of the data (Gaussian AR model, depth-dependent attenuation) are used to tailor the JPEG algorithm for this application. Successive frames in the sequence are then encoded using DPCM subtraction from the first frame and a simple motion-estimation algorithm.

4:20-4:40 PM
Reducing Computational Complexity in Statistically Based Emission Tomography through Vector Quantization of Emission Data
Thomas Kragh & Susanne Milas

Emission Tomographic Imaging acquires information about internal physiological processes non-invasively. Tracer atoms are injected into a patient, which decay into gamma-ray photons that travel through the body to an external detector. The information associated with each detection is then used to reconstruct a maximum-likelihood (ML) based estimate of the original tracer distribution within the patient. The quality of the estimated tracer distribution, along with the computational complexity of the ML estimator, is directly proportional to the number of detections.

In this project, we apply and evaluate LBG and TSVQ quantization techniques on simulated detections of a two-dimensional Positron Emission Tomography (PET) system in order to reduce the complexity of the estimation problem. We evaluate the mean-square error distortion in the estimated tracer distribution and its relation to the number of quantization cells. We examine how both a weighted distortion measure in the quantizer and the MSE associated with these quantization techniques vary with the number of quantization cells.

4:45-5:10 PM
Robust DPCM Image Coding through Exploitation of Variable-Rate Code Desynchronization
Daniel Marco, Paul Pelzl, and Barun Singh

We examine the problem of image coding for the binary symmetric noisy channel. Image data is quantized using DPCM and is transmitted through the channel with the aid of a variable-rate code. The codebook is designed by modifying a Huffman codebook in such a way that any two codewords of the same length must have a Hamming distance of at least 2. This property ensures that a single bit error will cause the decoder to lose synchronization. Codewords are transmitted in blocks of known length, so loss of synchronization may be detected by the decoder with high probability. Detected errors are corrected using the error-free portions of the decoded image. An upper bound on the probability of undetected error is calculated. Performance of this method is evaluated, and comparisons to previous work are presented.
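
The codebook property in the last abstract can be checked mechanically. The sketch below tests whether every pair of same-length codewords is at Hamming distance at least 2, and shows why that matters: flipping one bit of a codeword then never produces another codeword of the same length. The two example codebooks and all function names are hypothetical; the presenters' actual codebook construction is not described in the abstract.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)


def meets_distance_rule(codewords, min_dist=2):
    """True if all same-length codeword pairs are at least min_dist apart."""
    return all(hamming(a, b) >= min_dist
               for a, b in combinations(codewords, 2)
               if len(a) == len(b))


def flip(word, i):
    """Return word with bit i inverted, modeling a single channel error."""
    bit = "1" if word[i] == "0" else "0"
    return word[:i] + bit + word[i + 1:]


# Illustrative codebooks only (assumptions, not taken from the project).
huffman = ["0", "10", "11"]      # ordinary Huffman code: "10" and "11" differ in one bit
modified = ["0", "101", "110"]   # lengthened so same-length words differ in two bits
```

In the `huffman` example a single error can silently turn "10" into "11", leaving the decoder in step but with a wrong symbol. In the `modified` example every single-bit flip yields a word outside the codebook, so the decoder's parsing must change, which is the desynchronization the abstract relies on for detection.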