About the Event

Search engines are increasingly offering results that are based on a semantically rich interpretation of the user's intent and the content available to satisfy that intent. A natural question is how far along we are in understanding content on the web. The Semantic Web seeks to enable publication of data with rich markup that facilitates automated interpretation; Yahoo!'s Search Monkey is an example of a service in this spirit. However, there is much useful data that is not semantically marked up, and there are many domains in which the coverage of existing structured data feeds is low. In this talk, I will discuss the goal of constructing a web of "concepts" (a term I use to denote entities, categories of entities, and relationships), starting from the current view of the web as a collection of hyperlinked pages, or documents, each seen as a bag of words.
We need to extract concept-centric metadata for a broad and deep set of important concepts, and stitch it together to create a semantically rich aggregate view of all the information available on the web for each concept instance. The goal of building and maintaining such a web of concepts presents many challenges, but also offers the promise of enabling many powerful applications, including novel search and information discovery paradigms. In this talk, I will describe a research agenda towards this goal and discuss related work, including the PSOX project at Yahoo!.