EECS 651 Winter 2003

CLASS PROJECT PRESENTATION SCHEDULE

April 16,  Room 3427.

4:40 PM

Coding of Sung Queries for Music Information Retrieval

Norman H. Adams and Mark A. Bartsch

One of the dominant approaches in the field of Music Information
Retrieval is that of "query-by-humming."  Systems that implement this
approach allow a user to search a database of music by singing into a
microphone.  In this domain, the two primary measures of performance
are the accuracy and complexity of the retrieval (classification)
task.  There are many open questions regarding the best way to encode
songs for accurate and efficient classification.  Some proposed
systems use coarsely quantized representations to exploit musical
structure and to compensate for query errors, but it is unclear
whether such attempts do in fact improve performance.  In this work,
we address this question by formulating a generalized two-stage
"query-by-humming" system.  The first stage is a generalized vector
quantizer that estimates the notes sung by a user and codes the
result.  The second stage classifies the query by computing distances
between the coded query and similarly coded songs in the database.
Using this system, we examine the relative performance of various
"note estimators" and quantization schemes on a set of 480 queries
from 14 different singers and 14 different songs.  Classification
accuracy and computational complexity for these various approaches are
discussed.
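
The two-stage structure described above can be sketched as follows.
This is only an illustration under stated assumptions, not the authors'
system: the semitone-interval code, the edit distance, and the function
names are all hypothetical choices made for the example.

```python
# Hypothetical two-stage query-by-humming sketch.
# Stage 1: code a pitch track as rounded semitone intervals (an assumed coarse code).
# Stage 2: classify by nearest neighbour under edit distance (an assumed distance).

def quantize_intervals(pitches_midi):
    """Code a note sequence as rounded semitone steps between successive notes."""
    return [round(b - a) for a, b in zip(pitches_midi, pitches_midi[1:])]


def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def classify(query_pitches, database):
    """Return the name of the database song whose code is closest to the query.

    database maps a song name to its already-coded interval sequence.
    """
    code = quantize_intervals(query_pitches)
    return min(database, key=lambda name: edit_distance(code, database[name]))
```

Because intervals are used, a query sung in a different key still maps
to the same code; a finer or coarser quantizer would change only the
first function.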


5:05 PM 

Design of a CELP Coder and Study of Complexity vs Quality Trade-offs
for Different Codebooks.

Suresh Kumar Devalapalli
Ramji Venkataramanan
Raghuram Rangarajan

In this term project we study the compression of speech signals using
Code-Excited Linear Prediction (CELP).  We design a basic CELP coder
and study the effect of changing rate on the distortion of the
reconstructed speech. We choose MSE and 'Perceptual' MSE as measures
of distortion.  We then examine methods to reduce complexity in the
encoder by using special types of codebooks viz., binary, ternary,
sparse and overlapping codebooks. We compare the performance of the
above codebooks in terms of complexity vs quality of reconstructed
speech. As a final step, we calculate the reduction in rate that can
be obtained by using variable rate coding.
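
As a rough illustration of where the encoder complexity comes from, the
sketch below performs an exhaustive analysis-by-synthesis codebook
search with plain MSE. It is a minimal sketch, not the coder designed
in the project: the function names are hypothetical, perceptual
weighting and the adaptive codebook are omitted, and the filter
convention A(z) = 1 + sum_k lpc[k] z^-(k+1) is an assumption.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z): y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Try every codevector; return (index, gain, squared_error) of the best one."""
    best = None
    for i, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero codevector cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares optimal gain for this codevector
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (i, gain, err)
    return best
```

The search cost is one filtering operation per codevector. Codebooks
whose entries are restricted to a few values (for example -1, 0, +1)
or are mostly zero make each filtering step cheaper, which is the kind
of saving the special codebooks aim for.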


5:35 PM

A Neural Signal Compression Scheme

Christos Pateropoulos

This project does not aim to analyze current coding schemes; instead,
the focus is on quantization and compression of signals with specific
statistical characteristics. More specifically, the compression of
neural data, using transform coding, is investigated. The transform
code used is wavelet coding. The wavelet transform is suitable for
characterizing neural signals because it can describe spikes and
bursty data accurately. The compression takes advantage of the
Gaussian characteristics of the neural data and of its wavelet
coefficients.
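
A minimal sketch of wavelet transform coding on a spiky signal is given
below. The one-level Haar wavelet and the uniform quantizer are
assumptions chosen for brevity; the abstract does not say which wavelet
or quantizer the project uses, and all function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    approx = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * _S, (a - d) * _S])
    return x


def quantize(coeffs, step):
    """Uniform scalar quantizer: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct coefficient values from quantizer indices."""
    return [q * step for q in indices]
```

For a signal that is flat except for one spike, most detail
coefficients quantize to zero, so the index stream is cheap to entropy
code while the spike location is preserved.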


7:00 PM

Code Assignment in VQ Design for Noisy Channels

Shih-Yu Chang and Chih-Wei Wang

Vector quantization is a very useful technique for source coding in
communication systems.  In class so far, we have always assumed that
the transmission of the quantized information is error free, so that
the overall distortion of the system depends solely on the
quantization error.  However, in practical systems the channel is
not always perfect, and errors may occur during transmission.
This leads to the problem of designing VQ for noisy channels.

In this project, we will focus on the design of the index assignment
for one- and two-dimensional quantizers with different sources.  Our
goal is to find an index assignment method that makes the overall
distortion as small as possible.  We will propose two methods.
One method is to assign the Frog-in-the-Box (FIB) codes separately for
each dimension.  For example, in the two-dimensional case, the first n/2
bits are for one dimension and the remaining n/2 bits are for the other
dimension.  The other method is to find a good ordering (in the sense of
small MSE distortion) of all the codevectors, so that we
have an ordered set of codevectors.  We then simply apply the FIB
code to this ordered set, as in the one-dimensional case.  The performance
of each method will be compared using experimental results.  The
encoding complexity will also be analyzed.
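
To show why the index assignment matters, the sketch below computes the
distortion caused by channel errors alone for a scalar quantizer whose
indices pass through a binary symmetric channel. It does not implement
the FIB code; the channel model, the equiprobable-cell assumption, and
the function name are assumptions made for the example.

```python
def channel_distortion(levels, assignment, n_bits, p):
    """Mean squared error due only to bit flips on a binary symmetric channel.

    levels[i]     -- reproduction value of quantizer cell i (cells assumed equiprobable)
    assignment[i] -- n_bits-bit integer sent for cell i; must be a permutation
                     of 0 .. 2**n_bits - 1
    p             -- independent bit error probability
    """
    decode = {idx: i for i, idx in enumerate(assignment)}
    total = 0.0
    for i, idx in enumerate(assignment):
        for err in range(1 << n_bits):           # every possible error pattern
            flips = bin(err).count("1")
            prob = (p ** flips) * ((1 - p) ** (n_bits - flips))
            j = decode[idx ^ err]                # cell the decoder ends up in
            total += prob * (levels[i] - levels[j]) ** 2
    return total / len(levels)
```

Evaluating this function for two different permutations of the same
index set, with the same levels and the same p, gives a direct
comparison of the two assignments.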


7:25 PM

Improvement on JPEG2000 Core Coding System

Chun-Hao Hsu, Shih-Yi Shih

JPEG2000, a new standard for still image coding, has been released
recently by the JPEG committee.  In contrast to the Discrete Cosine
Transform (DCT) used in the original JPEG standard, JPEG2000
implements the Discrete Wavelet Transform (DWT), which seems to be a
better compromise between computational complexity and performance. In
addition to the wavelet-based coding, JPEG2000 also adopts the new
Embedded Block Coding with Optimized Truncation (EBCOT) scheme, which
consists of a three-pass ordering process followed by a
sophisticated MQ binary arithmetic entropy coder and controls the rate
according to the optimal rate allocation criteria.  Armed with these
new techniques, JPEG2000 improves substantially on the original JPEG,
especially at low bit rates.

In this project, we focus on these state-of-the-art techniques and try
to analyze and enhance them through numerous experiments. First, we analyze
the effects of different wavelet filter lengths and
different decomposition levels on the MSE performance and
computational complexity of JPEG2000.  Second, because the DWT
decomposes the image into subbands, we can optimize perceptual quality
rather than the original MSE measure by weighting the MSE distortion of
each subband and pixel differently according to the Human Visual System (HVS).
Third, the optimality of the rate allocation process is investigated.
Fourth, to enhance the scalability of JPEG2000, we make it possible to
specify any target perceptual or MSE distortion and achieve it with
minimal rate.  Finally, comprehensive comparisons between the DWT, KLT,
and DCT are made and analyzed in more detail.
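
The subband weighting mentioned in the second item can be sketched as a
weighted MSE over wavelet coefficients. This is only an illustration:
the subband names, the weight values, and the function name are
hypothetical, and no actual HVS model is implied.

```python
def weighted_mse(original, decoded, weights):
    """Weighted mean squared error over wavelet subbands.

    original, decoded -- dict: subband name -> list of coefficients
    weights           -- dict: subband name -> perceptual weight (assumed given)
    """
    total = 0.0
    count = 0
    for band, coeffs in original.items():
        w = weights[band]
        total += w * sum((a - b) ** 2 for a, b in zip(coeffs, decoded[band]))
        count += len(coeffs)
    return total / count
```

With all weights equal to 1 this reduces to the ordinary MSE over the
coefficients; lowering the weight of a subband makes errors in that
subband count less in the overall measure.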


7:50 PM

Progressive image coding with enhanced visibility of edges at low rates

A. Almal, U. Jayakumar, K. Subramanian

Optimizing Perceptual Quality using Edge Enhanced Progressive Image Coding.

In this project we analyse how edge information
extracted from an image before coding can be used in the pre- and
post-processing stages to improve the encoding process in terms of rate,
PSNR and, more significantly, the perceptual quality of the image. We
concentrate on progressive wavelet-based image coders. In an attempt to
achieve the greatest reduction in mean squared error (MSE) with each bit
sent, these coders send information on the lowest-frequency wavelet
coefficients first. Hence, at very low bit rates, images compressed with
these coders are dominated by low-frequency information and by blotchy
artifacts that are significant at element boundaries,
degrading the perceptual quality of the image.

We present a new progressive image coder that
incorporates edge information with the goal of improving the perceptual
quality of compressed images at very low bit rates, where the traditional
system falls short. The idea is to capture important edges in the original
image and transmit them on a separate bit stream along with a traditional
wavelet coder bit stream.

The performance of the system will be evaluated in terms of the
PSNR and perceptual quality of the image in comparison to the traditional
progressive coding system, with importance placed on the latter. Any
tradeoffs required will also be determined.
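
For reference, the PSNR used in the evaluation can be computed as in
the sketch below. The 8-bit peak value of 255 and the flat-list image
representation are assumptions made for the example.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

PSNR depends only on the MSE, which is why it can disagree with
perceived quality at very low bit rates.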

We also try a novel iterative image reconstruction algorithm whose
objective function takes into account the decoded edge information to improve
the perceptual quality by reducing the aforesaid blotchy artifacts and the
prominence of low frequency information.


8:20 PM

Tradeoffs between Complexity and Rate-Distortion in Making a
Macroblock Decision for H.26L: Evaluation and Analytical Modeling

Niresh Agarwal, Norihiko Sugita, Alan Wilson

Modern video coding algorithms spend a great deal of computation time
experimenting with various local encoding decisions to improve rate
and distortion.  Our project will focus on the tradeoffs between the
complexity and the rate-distortion efficiency of one of the most
recent video coding standards, H.26L, being developed for inclusion in
the MPEG standard (MPEG-4, Part 10).  This standard uses
sophisticated inter- and intra-frame prediction that involves
macro-block selection.  The macro-block mode decision is made using a
great amount of computation, essentially trying all possible modes and
comparing the resulting rate-distortion tradeoffs.  We will explore
the effects of simplifying the algorithm by fixing the macro-block
decision.  This should greatly reduce the complexity of the algorithm,
but at the cost of increased rate and/or distortion.  We will compute
rate and distortion for both fixed and dynamic mode decisions.  In
addition, we will model the effect of the statistics of the transform
coefficients on rate and distortion.  We will estimate the coefficient
statistics using a selection of sample videos.
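
The comparison between dynamic and fixed mode decisions can be sketched
as follows. The Lagrangian cost D + lambda * R is an assumed selection
criterion (the abstract says only that the resulting rate-distortion
tradeoffs are compared), and the mode names, data layout, and function
names are hypothetical.

```python
def best_mode(candidates, lam):
    """Pick the mode minimizing distortion + lam * rate.

    candidates -- dict: mode name -> (rate_bits, distortion)
    """
    return min(candidates, key=lambda m: candidates[m][1] + lam * candidates[m][0])


def encode_cost(macroblocks, lam, fixed_mode=None):
    """Return (total_rate, total_distortion, modes_evaluated) for a frame.

    With fixed_mode=None every candidate mode of every macroblock is tried;
    otherwise the given mode is used everywhere and only it is evaluated.
    """
    total_rate = 0.0
    total_dist = 0.0
    evaluations = 0
    for candidates in macroblocks:
        if fixed_mode is None:
            mode = best_mode(candidates, lam)
            evaluations += len(candidates)
        else:
            mode = fixed_mode
            evaluations += 1
        rate, dist = candidates[mode]
        total_rate += rate
        total_dist += dist
    return total_rate, total_dist, evaluations
```

The evaluation count stands in for complexity: the fixed decision costs
one evaluation per macroblock, while the dynamic decision costs one per
candidate mode but can never do worse in the combined cost.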