A FEW POSSIBLE PROJECTS:

1. USER INTERFACE: NATURAL LANGUAGE BASED FORM SELECTION

A natural language query interface is regarded by many as the holy grail of query interfaces. However, understanding arbitrary natural language queries is widely regarded as an "AI-complete" problem. A form-based interface is what is used most frequently, and is very convenient if the form can express the query desired by the user. One way to make form-based interfaces expressive is to have many different forms. But now the question becomes how a user can find the relevant form among dozens (even hundreds) of options. The purpose of the project is to build a natural language query interface that selects the best-matching form from a given set. You will have access to the output of NaLIX, which converts natural language input into XQuery output, and to a set of forms, described textually in a form-description language, each having an XQuery output format known to you.

Contact Yunyao Li (yunyaol@umich.edu) for more details.

2. POST-QUERY MANIPULATION

Users often want to "play" with the results returned from a query. Traditional database systems would handle this as a re-submission of a modified query. But this is too expensive, particularly across the web. Instead, we would like to exploit technologies such as AJAX to permit client-side manipulation. Again, using MiMI as a specific case, develop an AJAX-based query framework in which a query is run once to fetch a batch of data, after which the results can be manipulated effectively on the client side.

Contact Glenn Tarcea (gtarcea@umich.edu).

3. INFORMATION INTEGRATION

One really hard problem is matching entries in two databases that represent the same thing (also known as the mailing list merge problem). Given two tuples, we can score how well they match based on which components of the tuples match, albeit imperfectly. In the context of tuples (with fixed structure), this problem has been studied extensively.
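To make the fixed-structure case concrete, here is a minimal sketch of scoring two tuples by how well their components match. It is only an illustration: the function names, the per-field weights, and the use of Python's difflib for approximate string similarity are all assumptions made for this example, not a method prescribed by the project or taken from the literature on record matching.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values.

    Case and surrounding whitespace are ignored. SequenceMatcher is a
    stand-in for whatever approximate string measure one prefers.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def tuple_match_score(t1, t2, weights):
    """Weighted average of per-field similarities for two same-schema tuples.

    `weights` (hypothetical) says how much each component counts; for
    example, a name field might matter more than a city field.
    """
    if not (len(t1) == len(t2) == len(weights)):
        raise ValueError("tuples and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * field_similarity(a, b) for a, b, w in zip(t1, t2, weights))
    return weighted / total


# Two mailing-list entries that probably describe the same person,
# and one that probably does not (made-up data).
a = ("John Smith", "Ann Arbor")
b = ("Jon Smith", "Ann Arbor")
c = ("Mary Jones", "Detroit")
likely_same = tuple_match_score(a, b, weights=(2, 1))
likely_different = tuple_match_score(a, c, weights=(2, 1))
```

A threshold on the score would then decide whether two entries are treated as the same thing; choosing that threshold and the weights is where most of the difficulty lies.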
Now consider matching with structural variation. Given two XML elements whose structure does not agree completely, how can you score a match? There are many tree-matching algorithms to help with this task. How about graph structures? This starts to get more interesting. Address one concrete instance of this problem. Consider biological "pathways", which are small graphs. Given two pathway representations from two different sources, how can you efficiently determine whether the two represent the same biological process? Examples will be provided from the MiMI database.

See Adriane Chapman (apchapma@umich.edu) for more info.

4. NEW ACCESS METHODS

Reachability (transitive closure) computation is known to be expensive over graphs. (Following each edge in a graph is computed as a join.) If we limit the distance for reachability -- e.g. within distance 3 -- then the problem becomes much easier. Think of how one can do this most efficiently (e.g. when selections are specified at both end-points, it may be easier to start from the two ends and do a join in the middle). Implement this as an access method in TIMBER, and evaluate the performance of the alternatives.

Contact Magesh Jayapandian (jmagesh@umich.edu).

5. NEW INDEX TECHNIQUES

a. Implement an effective sub-string match index in TIMBER.

b. Implement an "ancestor" index, and an access method that uses it for fast computation of the least common ancestor, in TIMBER.

See Nuwee Wiwatwattana (nuwee@umich.edu) for more info.

6. KEYWORD-BASED QUERYING

We have developed a technique called "Schema-Free Querying" (reading #28). Loosely speaking, this returns the finest granularity object that covers the terms specified in the query. But often we want more -- e.g. given a single keyword with a protein name, I want not just the name element, but also a bunch of information about the protein. Working with the MiMI system, use Lucene or Lemur to build an inverted index, and then develop an intelligent technique to return results at a reasonable granularity.
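The granularity issue described above can be seen in a small sketch. What follows is a simplified illustration only: it uses an in-memory inverted index over a toy XML document with Python's standard library rather than Lucene or Lemur, it is not the Schema-Free Querying algorithm from the reading, and the function names and sample data are made up for this example. Text in element tails is ignored.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(root):
    """Return (parent map, inverted index from lowercase token to elements)."""
    parent = {child: node for node in root.iter() for child in node}
    index = defaultdict(set)
    for node in root.iter():
        for token in re.findall(r"\w+", (node.text or "").lower()):
            index[token].add(node)
    return parent, index


def covering_elements(nodes, parent):
    """All elements whose subtree contains at least one of `nodes`."""
    covered = set()
    for node in nodes:
        # Walk up to the root; stop early once an ancestor chain is known.
        while node is not None and node not in covered:
            covered.add(node)
            node = parent.get(node)
    return covered


def finest_covers(root, terms):
    """Elements that cover every term and have no child that also does."""
    parent, index = build_index(root)
    common = None
    for term in terms:
        cover = covering_elements(index.get(term.lower(), set()), parent)
        common = cover if common is None else common & cover
    common = common or set()
    return [n for n in common if not any(child in common for child in n)]


# Toy data, invented for illustration.
doc = ET.fromstring(
    "<db>"
    "<protein><name>BRCA1</name><function>DNA repair</function></protein>"
    "<protein><name>TP53</name><function>tumor suppressor</function></protein>"
    "</db>"
)
print([e.tag for e in finest_covers(doc, ["BRCA1", "repair"])])  # ['protein']
print([e.tag for e in finest_covers(doc, ["TP53"])])             # ['name']
```

The second query shows the problem the project targets: a single keyword yields only the name element, whereas a user would most likely want the enclosing protein record. Deciding how far up the tree to go is the "reasonable granularity" question.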
Contact Cong Yu (congy@umich.edu) for more details.

7. WORD SENSE DISAMBIGUATION

Consider the keyword query "Java programmer hire excellent" -- the intention of the user is clear. A web search using this query should suppress results about hiring excellent vacation cottages on the island of Java, for example, no matter how many times the terms "hire", "excellent", etc. are repeated. This can be accomplished if we have pre-determined that there are (at least) two different senses of the word "Java". Querying then becomes a two-step process. First, determine the desired sense of the query term Java. Then, query only for documents in which Java is used in that sense (with the other sense not counting at all). As keyword queries become popular on structured data sets, similar techniques are likely to become important. Given a database, develop a classifier, based on the values in the database, to place a query keyword into an appropriate class.

Contact Arnab Nandi (arnab@umich.edu) for more.

8. RANKING RESULTS

Read the paper on "ObjectRank". Extend these ideas to phrase searching.

Contact Arnab Nandi (arnab@umich.edu) for more details.