EECS 651 Winter 2003
CLASS PROJECT PRESENTATION SCHEDULE
April 16, Room 3427

4:40 PM  Coding of Sung Queries for Music Information Retrieval
Norman H. Adams and Mark A. Bartsch

One of the dominant approaches in the field of Music Information Retrieval is that of "query-by-humming." Systems that implement this approach allow a user to search a database of music by singing into a microphone. In this domain, the two primary measures of performance are the accuracy and complexity of the retrieval (classification) task. There are many open questions regarding the best way to encode songs for accurate and efficient classification. Some proposed systems use coarsely quantized representations to exploit musical structure and to compensate for query errors, but it is unclear whether such attempts do in fact improve performance. In this work, we address this question by formulating a generalized two-stage "query-by-humming" system. The first stage is a generalized vector quantizer that estimates the notes sung by a user and codes the result. The second stage classifies the query by computing distances between the coded query and similarly coded songs in the database. Using this system, we examine the relative performance of various "note estimators" and quantization schemes on a set of 480 queries from 14 different singers and 14 different songs. Classification accuracy and computational complexity for these various approaches are discussed.

5:05 PM  Design of a CELP Coder and Study of Complexity vs. Quality Trade-offs for Different Codebooks
Suresh Kumar Devalapalli, Ramji Venkataramanan, Raghuram Rangarajan

In this term project we study the compression of speech signals using Code-Excited Linear Prediction (CELP). We design a basic CELP coder and study the effect of changing the rate on the distortion of the reconstructed speech. We choose MSE and "perceptual" MSE as measures of distortion.
We then examine methods to reduce encoder complexity by using special types of codebooks, viz. binary, ternary, sparse, and overlapping codebooks. We compare the performance of these codebooks in terms of complexity vs. quality of the reconstructed speech. As a final step, we calculate the reduction in rate that can be obtained by using variable-rate coding.

5:35 PM  A Neural Signal Compression Scheme
Christos Pateropoulos

This project does not aim to analyze current coding schemes; instead, the focus is on quantization and compression of signals with specific statistical characteristics. More specifically, the compression of neural data using transform coding is investigated. Wavelet coding is used as the transform code. The wavelet transform is suitable for characterizing neural signals because it can describe spikes and bursty data accurately. The compression takes advantage of the Gaussian characteristics of the neural data and of its wavelet coefficients.

7:00 PM  Code Assignment in VQ Design for Noisy Channels
Shih-Yu Chang and Chih-Wei Wang

Vector quantization is a very useful technique for source coding in communication systems. In class so far, we have always assumed that the transmission of the quantized information is error free, so that the overall distortion of the system depends solely on the quantization error. In practical systems, however, the channel is not always perfect, and errors may occur during transmission. This leads to the problem of designing VQ for noisy channels. In this project, we focus on the design of the index assignment for one- and two-dimensional quantizers with different sources. Our goal is to find a good index assignment method so that the overall distortion is as small as possible. We will propose two methods. One method is to assign the Frog-in-the-Box (FIB) Codes separately for each dimension.
For example, in the two-dimensional case, the first n/2 bits are for one dimension and the remaining n/2 bits are for the other dimension. The other method is to find a good way (in the sense of small MSE distortion) to traverse all the codevectors so that we obtain an ordered set of codevectors. We then simply apply the FIB Code to this ordered set, as in the one-dimensional case. The performance of each method will be compared using experimental results. The encoding complexity will also be analyzed.

7:25 PM  Improvement on JPEG2000 Core Coding System
Chun-Hao Hsu, Shih-Yi Shih

JPEG2000, a new standard for still image coding, has been released recently by the JPEG committee. In contrast to the Discrete Cosine Transform (DCT) used in the original JPEG standard, JPEG2000 implements the Discrete Wavelet Transform (DWT), which appears to offer a better compromise between computational complexity and performance. In addition to wavelet-based coding, JPEG2000 adopts the new Embedded Block Coding with Optimized Truncation (EBCOT) scheme, which consists of a three-pass ordering process followed by a sophisticated MQ binary arithmetic entropy coder and controls the rate according to optimal rate allocation criteria. With these new techniques, JPEG2000 improves substantially on the original JPEG, especially at low bit rates. In this project, we focus on these state-of-the-art techniques and try to enhance and analyze them through numerous experiments. First, we analyze the effects of different wavelet filter lengths and different decomposition levels on the MSE performance and computational complexity of JPEG2000. Second, because the DWT decomposes the image into subbands, we can optimize perceptual quality rather than the original MSE measure by weighting the MSE distortion of each subband and pixel differently according to the Human Visual System (HVS). Third, the optimality of the rate allocation process is investigated.
Fourth, to enhance the scalability of JPEG2000, we make it possible to specify any target perceptual or MSE distortion and achieve it at minimal rate. Finally, comprehensive comparisons between DWT, KLT, and DCT are made and analyzed in more detail.

7:50 PM  Progressive Image Coding with Enhanced Visibility of Edges at Low Rates
A. Almal, U. Jayakumar, K. Subramanian

Optimizing Perceptual Quality using Edge Enhanced Progressive Image Coding.

In this project we attempt to analyse how edge information extracted from an image before coding can be used in the pre- and post-processing stages to enhance the whole encoding process in terms of rate, PSNR and, more significantly, the perceptual quality of the image. The encoding systems we concentrate on are progressive wavelet-based image coders. In an attempt to achieve the greatest reduction in mean squared error (MSE) with each bit sent, these coders send information on the lowest-frequency wavelet coefficients first. Hence, at very low bit rates, images compressed with these coders are dominated by low-frequency information and by blotchy artifacts, most significant at element boundaries, that degrade the perceptual quality of the image. In this project we present a new progressive image coder that incorporates edge information, with the goal of improving the perceptual quality of compressed images at very low bit rates, where the traditional system falls short. The idea is to capture important edges in the original image and transmit them on a separate bit stream alongside a traditional wavelet coder bit stream. The performance of the system will be evaluated in terms of the PSNR and perceptual quality of the image in comparison to the traditional progressive coding system, with importance placed on the latter. Any tradeoffs required will also be determined.
We also try a novel iterative image reconstruction algorithm whose objective function takes the decoded edge information into account, improving perceptual quality by reducing the aforesaid blotchy artifacts and the prominence of low-frequency information.

8:20 PM  Tradeoffs between Complexity and Rate-Distortion in Making a Macroblock Decision for H.26L: Evaluation and Analytical Modeling
Niresh Agarwal, Norihiko Sugita, Alan Wilson

Modern video coding algorithms spend a great deal of computation time experimenting with various local encoding decisions to improve rate and distortion. Our project will focus on the tradeoffs between the complexity and the rate-distortion efficiency of one of the most recent video coding standards, H.26L, being developed for inclusion in the MPEG standard (MPEG-4, Part 10). This standard uses sophisticated inter- and intra-frame prediction that involves macroblock mode selection. The macroblock mode decision is made using a great amount of computation, essentially trying all possible modes and comparing the resulting rate-distortion tradeoffs. We will explore the effects of simplifying the algorithm by fixing the macroblock decision. This should greatly reduce the complexity of the algorithm, but at the cost of increased rate and/or distortion. We will compute rate and distortion for both fixed and dynamic mode decisions. In addition, we will model the effect of the statistics of the transform coefficients on rate and distortion. We will estimate the coefficient statistics using a selection of sample videos.
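
The contrast between an exhaustive and a fixed macroblock mode decision can be sketched as follows. This is only a minimal illustration, not the H.26L reference encoder: it assumes the rate-distortion comparison uses a Lagrangian cost J = D + lambda * R, which the abstract does not specify, and the ModeResult container, the mode names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeResult:
    """Outcome of coding one macroblock in one candidate mode (hypothetical container)."""
    mode: str          # illustrative mode label, not taken from the standard
    distortion: float  # e.g. sum of squared errors for the macroblock
    rate_bits: float   # bits spent coding the macroblock in this mode


def rd_cost(result, lam):
    """Assumed Lagrangian cost J = D + lambda * R."""
    return result.distortion + lam * result.rate_bits


def exhaustive_decision(candidates, lam):
    """Dynamic decision: evaluate every candidate mode and keep the cheapest."""
    return min(candidates, key=lambda r: rd_cost(r, lam))


def fixed_decision(candidates, mode):
    """Fixed decision: always use the same mode, so only that mode must be coded."""
    for result in candidates:
        if result.mode == mode:
            return result
    raise ValueError(f"mode {mode!r} not among candidates")


# Made-up numbers for a single macroblock, for illustration only.
candidates = [
    ModeResult("INTRA", distortion=100.0, rate_bits=50.0),
    ModeResult("INTER_16x16", distortion=120.0, rate_bits=20.0),
    ModeResult("INTER_8x8", distortion=90.0, rate_bits=60.0),
]
lam = 1.0
best = exhaustive_decision(candidates, lam)
fixed = fixed_decision(candidates, "INTRA")
print(best.mode, rd_cost(best, lam))
print(fixed.mode, rd_cost(fixed, lam))
```

In this sketch the exhaustive decision has to code the macroblock once per candidate mode, while the fixed decision codes it once; the fixed choice can never have a lower cost than the exhaustive one for the same lambda.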