A FEW POSSIBLE PROJECTS:

1. USER INTERFACE: NATURAL LANGUAGE BASED FORM SELECTION

A natural language query interface is regarded by many as the holy
grail for query interfaces. However, arbitrary natural language query
understanding is widely regarded as an "AI-complete" problem.
A form-based interface is what is used most frequently, and is very
convenient if the form can express the query desired by the user.
One way to make form-based interfaces expressive is to have many different
forms.  But now the question becomes: how can a user find the relevant form
from a set of dozens (even hundreds) of options?

The purpose of the project is to build a natural language query interface
that selects, from a given set, the form that best matches the user's
request.  You will have
access to the output of NaLIX, which converts natural language input into an
XQuery output, and to a set of forms, described textually in a
form-description language, and each having an XQuery output format known to
you. 
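
As a starting point only, form selection can be framed as comparing the
XQuery produced for the user's sentence against the XQuery behind each form.
The sketch below assumes both are available as plain strings and scores
forms by the overlap (Jaccard) of element names in their path steps.  The
function names, the regular expression, and the scoring rule are all
illustrative assumptions, not part of NaLIX or of the form-description
language.

```python
import re


def element_names(xquery: str) -> set[str]:
    """Collect element names that appear in path steps such as //movie/title."""
    return set(re.findall(r"/+([A-Za-z_][\w\-]*)", xquery))


def best_form(query_xquery: str, forms: dict[str, str]) -> str:
    """Return the name of the form whose XQuery shares the most element names
    with the query (hypothetical scoring rule: Jaccard overlap)."""
    query_names = element_names(query_xquery)

    def jaccard(form_xquery: str) -> float:
        form_names = element_names(form_xquery)
        union = query_names | form_names
        return len(query_names & form_names) / len(union) if union else 0.0

    return max(forms, key=lambda name: jaccard(forms[name]))


# Made-up forms and query, for illustration only.
forms = {
    "by_title": "for $m in //movie where $m/title = $t return $m/year",
    "by_director": "for $m in //movie where $m/director = $d return $m/title",
}
query = "for $m in //movie where $m/director = 'Lee' return $m/title"
print(best_form(query, forms))
```

A real solution would likely need to look at query structure (predicates,
return clauses) and not just at the bag of element names.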

Contact Yunyao Li (yunyaol@umich.edu) for more details.

2. POST-QUERY MANIPULATION

Users often want to "play" with the results returned from a query.  Traditional
database systems would do this as a re-submission of a modified query.  But
this is too expensive, particularly across the web.  Instead, we would like
to exploit technologies such as AJAX to permit client-side manipulation.

Again, using MiMI as a specific case, develop an AJAX-based query framework
in which a query is run once to fetch a bunch of data, and then there can be
effective client-side manipulation of the results.

Contact Glenn Tarcea (gtarcea@umich.edu).

3. INFORMATION INTEGRATION

One really hard problem is matching entries in two databases that represent
the same thing (also known as the mailing list merge problem).  Given two
tuples, we can score how well they match based on which components of the
tuples match, albeit imperfectly.  In the context of tuples (with fixed
structure) this problem has been studied extensively.
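
For the fixed-structure case, one simple way to score a pair of tuples is a
weighted average of per-field string similarities.  The sketch below is only
one possibility: it assumes all fields are strings, uses the standard
library's difflib for the per-field similarity, and takes the field weights
as a hypothetical input chosen by hand.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and padding."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(t1, t2, weights) -> float:
    """Weighted average of per-field similarities for two same-shaped tuples."""
    total = sum(weights)
    return sum(w * field_similarity(x, y) for x, y, w in zip(t1, t2, weights)) / total


# Made-up mailing-list entries: (name, street, city); name counts double.
a = ("John Smith", "12 Main St", "Ann Arbor")
b = ("Jon Smith", "12 Main Street", "Ann Arbor")
c = ("Mary Jones", "400 Oak Ave", "Detroit")
weights = (2, 1, 1)
print(match_score(a, b, weights), match_score(a, c, weights))
```

The near-duplicate pair (a, b) should score well above the unrelated pair
(a, c); where to put the accept/reject threshold is a separate question.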

Now consider matching with structural variation.  Given two XML elements,
whose structure does not agree completely, how can you score a match?  There
are many tree-matching algorithms to help with this task.
How about graph structures?  This starts to get more interesting.

Address one concrete instance of this problem.  Consider biological
"pathways", which are small graphs.  Given two pathway representations from
two different sources, how can you efficiently determine if the two represent
the same biological process?  Examples will be provided from the MiMI
database.
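
A cheap first filter for pathway comparison, sketched under strong
assumptions: each pathway is given as a list of undirected edges, and the
two sources already use the same node identifiers (in practice, reconciling
identifiers is a large part of the problem).  The score is the Jaccard
overlap of the two edge sets; the function names are hypothetical.

```python
def edge_set(edges):
    """Normalize undirected edges so that (u, v) and (v, u) compare equal."""
    return {frozenset(edge) for edge in edges}


def pathway_similarity(edges_a, edges_b) -> float:
    """Jaccard overlap of two edge sets: 1.0 for identical graphs, 0.0 for disjoint."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up pathways that share two of their four distinct edges.
p1 = [("A", "B"), ("B", "C"), ("C", "D")]
p2 = [("B", "A"), ("C", "B"), ("C", "E")]
print(pathway_similarity(p1, p2))
```

Pairs that pass such a filter could then be handed to a more expensive
structural comparison.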

See Adriane Chapman (apchapma@umich.edu) for more info.


4. NEW ACCESS METHODS

Reachability (transitive closure) computation is known to be expensive over
graphs (following each edge in a graph is computed as a join).  If we limit the
distance for reachability -- e.g. to within distance 3 -- then the problem becomes
much easier.  Think of how one can do this most efficiently (e.g. when
selections are specified at both end-points, it may be easier to start from
the two ends and do a join in the middle).  Implement as an access method in
TIMBER, and evaluate performance of alternatives.
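
The "start from both ends and join in the middle" idea can be sketched in
memory as follows.  This is an illustration in Python, not a TIMBER access
method: the graph is assumed to be a list of directed edges, and the
function names are hypothetical.  Sources are expanded forward for about
half of the distance bound k, targets are expanded backward for the rest,
and the two results are joined on the nodes they have in common.

```python
from collections import defaultdict


def within(adj, starts, hops):
    """Map each node reachable in at most `hops` steps to the start nodes reaching it."""
    reached = defaultdict(set)
    for s in starts:
        reached[s].add(s)
    frontier = {s: {s} for s in starts}
    for _ in range(hops):
        next_frontier = defaultdict(set)
        for node, origins in frontier.items():
            for neighbor in adj.get(node, ()):
                new = origins - reached[neighbor]
                if new:
                    reached[neighbor] |= new
                    next_frontier[neighbor] |= new
        frontier = next_frontier
    return reached


def reachable_pairs(edges, sources, targets, k):
    """Return all (source, target) pairs connected by a directed path of length <= k."""
    forward, backward = defaultdict(list), defaultdict(list)
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)
    half = (k + 1) // 2
    from_sources = within(forward, sources, half)
    to_targets = within(backward, targets, k - half)
    pairs = set()
    # The "join in the middle": nodes seen from both directions.
    for middle in from_sources.keys() & to_targets.keys():
        for s in from_sources[middle]:
            for t in to_targets[middle]:
                pairs.add((s, t))
    return pairs


# Made-up chain a -> b -> c -> d -> e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(reachable_pairs(edges, {"a"}, {"d", "e"}, 3))
```

With k = 3 only d is close enough to a; raising k to 4 also admits e.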

Contact Magesh Jayapandian (jmagesh@umich.edu)


5. NEW INDEX TECHNIQUES

a. Implement an effective sub-string match index in Timber.

b. Implement an "ancestor" index, and an access method using this for fast
computation of least common ancestor, in Timber.
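
For part (a), one well-known approach is a q-gram index: every run of q
consecutive characters in a stored value points back to that value, a
search intersects the posting lists for the pattern's q-grams, and the
surviving candidates are verified with a real substring test.  The sketch
below is an in-memory illustration with a hypothetical class name; it says
nothing about how such an index should be laid out inside Timber.

```python
from collections import defaultdict


class QGramIndex:
    """Tiny in-memory q-gram index for substring search (illustrative only)."""

    def __init__(self, q=3):
        self.q = q
        self.values = []                   # value id -> stored string
        self.postings = defaultdict(set)   # q-gram -> ids of values containing it

    def add(self, text):
        """Store a string and index all of its q-grams; return its id."""
        value_id = len(self.values)
        self.values.append(text)
        for i in range(len(text) - self.q + 1):
            self.postings[text[i:i + self.q]].add(value_id)
        return value_id

    def search(self, pattern):
        """Return sorted ids of stored strings that contain `pattern`."""
        if len(pattern) < self.q:
            # Pattern too short to use the index: fall back to scanning everything.
            candidates = range(len(self.values))
        else:
            grams = [pattern[i:i + self.q] for i in range(len(pattern) - self.q + 1)]
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Sharing all q-grams is necessary but not sufficient, so verify.
        return sorted(v for v in candidates if pattern in self.values[v])


# Made-up values, for illustration only.
index = QGramIndex(q=3)
for value in ["kinase binding protein", "protein phosphatase", "lipid transport"]:
    index.add(value)
print(index.search("protein"))
```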

See Nuwee Wiwatwattana (nuwee@umich.edu) for more info.


6. KEYWORD-BASED QUERYING

We have developed a technique called "Schema-Free Querying" (reading #28).
Loosely speaking, this returns the finest granularity object that covers the
terms specified in the query.  But often we want more -- e.g. given a single
keyword with a protein name, I want not just the name element, but also a
bunch of information about the protein.  Working with the MiMI system, use
Lucene or Lemur to build an inverted index, and then develop an intelligent
technique to return results at a reasonable granularity.
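
To make the granularity question concrete, here is a small sketch that uses
a plain Python dictionary as a stand-in for a Lucene or Lemur inverted
index.  Each term maps to the elements whose text contains it; a hit is then
widened to its nearest ancestor whose tag is in a hand-picked set of
"entity" tags.  The tag names and the widening rule are assumptions made for
illustration, not the MiMI schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(root):
    """Return (term -> elements containing it, child -> parent map)."""
    parent = {child: node for node in root.iter() for child in node}
    index = defaultdict(list)
    for node in root.iter():
        for term in (node.text or "").lower().split():
            index[term].append(node)
    return index, parent


def search(keyword, index, parent, entity_tags):
    """Widen each matching element to its nearest ancestor tagged as an entity."""
    results = []
    for node in index.get(keyword.lower(), []):
        while node is not None and node.tag not in entity_tags:
            node = parent.get(node)
        if node is not None and node not in results:
            results.append(node)
    return results


# Made-up document, for illustration only.
doc = ET.fromstring(
    "<db>"
    "<protein><name>BRCA1</name><organism>human</organism></protein>"
    "<protein><name>TP53</name><organism>human</organism></protein>"
    "</db>"
)
index, parent = build_index(doc)
for hit in search("BRCA1", index, parent, {"protein"}):
    print(ET.tostring(hit, encoding="unicode"))
```

Here a query on the protein name returns the whole protein element, not just
its name child.  Choosing the right level automatically is the interesting
part of the project.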

Contact Cong Yu (congy@umich.edu) for more details.


7. WORD SENSE DISAMBIGUATION

Consider a keyword query "Java programmer hire excellent" -- the intention of
the user is clear.  A web search using this query should suppress results
about hiring excellent vacation cottages on the island of Java, for example,
no matter how many times the terms "hire", "excellent" etc. are repeated.
This can be accomplished if we have pre-determined that there are (at least)
two different senses for the word "Java".  Querying then becomes a two-step
process.  First determine the desired sense of the query term Java.  Then
query only for documents in which Java is used in that sense (with the other
sense not counting at all).

As keyword queries become popular on structured data sets, similar techniques
are likely to become important.  Given a database, develop a classifier, based
on the values in the database, to place a query keyword into an appropriate
class. 
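
As a baseline sketch of such a classifier, one can treat each column as a
class and assign a keyword to the column in which it occurs most often.
The example assumes rows are available as dictionaries; the function names
and the sample data are made up, and a serious classifier would need to
handle unseen keywords and ties far more carefully.

```python
from collections import Counter, defaultdict


def train(rows):
    """Count, for every token, how often it appears in each column."""
    counts = defaultdict(Counter)
    for row in rows:
        for column, value in row.items():
            for token in str(value).lower().split():
                counts[token][column] += 1
    return counts


def classify(keyword, counts):
    """Return the column where the keyword is most frequent, or None if unseen."""
    by_column = counts.get(keyword.lower())
    if not by_column:
        return None
    return by_column.most_common(1)[0][0]


# Made-up rows in which "Java" is usually a language and once an island.
rows = [
    {"language": "Java", "island": "Sumatra"},
    {"language": "Python", "island": "Java"},
    {"language": "Java", "island": "Bali"},
]
counts = train(rows)
print(classify("Java", counts))
```

The chosen class can then restrict the second step of the query to values
drawn from that column only.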

Contact Arnab Nandi (arnab@umich.edu) for more details.


8. RANKING RESULTS

Read the paper on "ObjectRank".  Extend these ideas for phrase searching.
Contact Arnab Nandi (arnab@umich.edu) for more details.