Search engines are increasingly offering results that are based on a semantically rich interpretation of the user's intent and the content available to satisfy that intent. A natural question is how far along we are in understanding content on the web. The Semantic Web seeks to enable publication of data with rich markup that facilitates automated interpretation; Yahoo!'s SearchMonkey is an example of a service in this spirit. However, there is much useful data that is not semantically marked up, and there are many domains in which the coverage of existing structured data feeds is low. In this talk, I will discuss the goal of constructing a web of "concepts" (a term I use to denote entities, categories of entities, and relationships) by starting with the current view of the web (as a collection of hyperlinked pages, or documents, each seen as a bag of words).
We need to extract concept-centric metadata for a broad and deep set of important concepts, and stitch it together to create a semantically rich aggregate view of all the information available on the web for each concept instance. The goal of building and maintaining such a web of concepts presents many challenges, but also offers the promise of enabling many powerful applications, including novel search and information discovery paradigms. In this talk, I will describe a research agenda towards this goal and discuss related work, including the PSOX project at Yahoo!.
Dr. Ramakrishnan's work has influenced query optimization in commercial database systems and the design of window functions in SQL:1999. His paper on the BIRCH clustering algorithm received the SIGMOD 10-Year Test-of-Time award, and he has written the widely used text "Database Management Systems" (with Johannes Gehrke). Dr. Ramakrishnan received his B.Tech. from IIT Madras in 1983 and his Ph.D. from the University of Texas at Austin in 1987.