Project Description
Interesting classes of signals, from bioinformatics to Internet traffic
analysis, are high dimensional in nature and thus appear highly complex.
However, in many cases such apparent complexity can be explained by only
a few simple variables, as a consequence of coherent structures in the data
that lead to strong correlations across dimensions. This work is concerned
with the characterization of high dimensional data sets through two key
quantities: the data intrinsic dimension ("roughly" characterizes
the number of independent variables needed to generate the signal) and entropy
("roughly" measures the information conveyed by the signal). We propose
a nonparametric graph theoretic approach. Our method simply constructs an
entropic graph - e.g., geodesic-minimal-spanning-tree (GMST), k-nearest
neighbor (k-NN) graph, etc - on sub samples of the data set and
then uses the total edge lengths of these graphs to estimate dimension and
entropy. Figures bellow illustrate these constructions for a 2-dimensional
dataset living in a 3D space.
Fig.2 : Log-log plot of the k-NN total edge
lenght as a function of the number of samples. The slope of the corresponding
linear fit provides an estimate of the intrinsic dimension, while the
intercept with the y-axis determines the intrinsic entropy.
1. J.A. Costa, (Hero advisor), "
Random graphs for structure discovery in high dimensional data,"
PhD Thesis, Dept. EECS, Aug. 2005.
2. J. A. Costa and A. O. Hero, "
Learning intrinsic dimension and intrinsic entropy of high dimensional datasets,"
to appear in
Proc. of EUSIPCO , Vienna, Austria, September, 2004.
3. J. A. Costa and A. O. Hero, "
Geodesic
Entropic Graphs for Dimension and Entropy Estimation in Manifold Learning,"
to appear in
IEEE Trans. on Signal Processing, August, 2004.
4. J. A. Costa and A. O Hero, "
Manifold
Learning Using k-Nearest Neighbor Graphs,"
Proc. of
IEEE Int. Conf. on Acoust. Speech and Signal Processing, Montreal,
Canada, May, 2004.
5. J. A. Costa and A. O Hero, "
Entropic
Graphs for Manifold Learning,"
Proc. of IEEE Asilomar Conf.
on Signals, Systems, and Computers, Pacific Groove, CA, November, 2003.
6. J.A. Costa, A. Girotra and A.O. Hero (2005), "
Estimating Local Intrinsic Dimension with k-Nearest Neighbor Graphs,,
"
Proc of IEEE Workshop on Statistical Signal Processing (SSP), Bordeaux, July 2005.
7. J.A Costa and A. O. Hero, "
Learning intrinsic dimension and entropy of shapes,
in "Statistics and analysis of shapes, Eds. H. Krim and T. Yezzi, Birkhauser, 2006.
8. J.A. Costa, N. Patwari, A.O. Hero (2006),
"
Distributed Weighted Multidimensional Scaling for Node Localization in Sensor Networks,"
to appear in 2006 in ACM Journal on Networking.
9. S. Grikschat, J.A. Costa and A.O. Hero,
"
Dual rooted-diffusions for clustering and classification on manifolds,"
2006 IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, Toulouse France, 2006.
Intrinsic Dimension and Entropy Estimation in Manifold
Learning
webpage.