Michael J. Cafarella
Michael Cafarella
Assistant Professor
Computer Science and Engineering
2260 Hayward Ave.
Ann Arbor, MI 48109
Office: 4709 CSE
Phone: 734-764-9418
You can send me mail using michjc, found at umich dot edu.
I am a new professor in the Software Systems Lab in Computer Science and Engineering at the University of Michigan. I'm also a member of the Database Group. My research interests are in databases, information extraction, and data mining. I am particularly interested in applying data mining techniques to Web data and scientific applications.
If you share some of the same interests and are near the University of Michigan, you may want to check out the MIDAS (Michigan Data Sciences) group.
Note To Students: I'm interested in taking on new students, both graduate students and undergrads, to work on research projects. If you're at Michigan, send me an email and we'll chat! If you're not currently a student at Michigan but would still like to work together, please read this short document.
News
- The Second University of Michigan Workshop on Data, Text, Web, and Social Network Mining took place on April 22, 2011. Dragomir Radev and I organized it. More here.
- I've received a CAREER award to work on a structured data search engine. Many thanks, NSF!
- Eaman Jahani, Chris Re, and I have a paper appearing in VLDB 2011: Automatic Optimization of MapReduce Programs. Final copy to be posted soon...
- Thanks to the NSF, Google, GE, and Yahoo! for their recent support of our research.
Teaching
I'm teaching EECS584 in the fall of 2011. In winter 2012, I will teach EECS485; you can find last year's syllabus here.
Publications
2011
- Structured Data on the Web. Michael J. Cafarella, Alon Halevy, and Jayant Madhavan. Communications of the ACM 54(2): 72-79, 2011.
- Automatic Optimization of MapReduce Programs. Eaman Jahani, Michael J. Cafarella, and Christopher Re. To appear: VLDB 2011. Seattle, WA.
2010
2009
2008
- Ontology-driven, Unsupervised Instance Population. Luke K. McDowell and Michael Cafarella. Journal of Web Semantics 6(3): 218-236, 2008.
- Uncovering the Relational Web. Michael J. Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. Proceedings of the Eleventh International Workshop on the Web and Databases (WebDB), June 2008. Vancouver, Canada.
- WebTables: Exploring the Power of Tables on the Web. Michael J. Cafarella, Alon Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. Proceedings of VLDB 2008, August 2008. Auckland, New Zealand.
- Data Management Projects at Google. Michael Cafarella, Edward Chang, Andrew Fikes, Alon Halevy, Wilson Hsieh, Alberto Lerner, Jayant Madhavan, S. Muthukrishnan. SIGMOD Record, 37(1), 2008.
- Web-Scale Extraction of Structured Data. Michael J. Cafarella, Jayant Madhavan, Alon Halevy. SIGMOD Record 37(4): 55-61, 2008.
2007
- Navigating Extracted Data with Schema Discovery. Michael J. Cafarella, Dan Suciu, Oren Etzioni. Proceedings of the Tenth International Workshop on the Web and Databases (WebDB), June 2007. Beijing, China.
- Structured Querying of Web Text: A Technical Challenge. Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni, Michele Banko. Proceedings of the Conference on Innovative Data Systems Research (CIDR) 2007. Asilomar, CA.
- Open Information Extraction from the Web. Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), January 2007. Hyderabad, India.
2006
2005
- KnowItNow: Fast, Scalable Information Extraction from the Web. Michael J. Cafarella, Doug Downey, Stephen Soderland, and Oren Etzioni. Proceedings of the Conference on Empirical Methods in Natural Language Processing. Vancouver, 2005.
- A Search Engine for Natural Language Applications. Michael J. Cafarella, Oren Etzioni. Proceedings of the 14th International World Wide Web Conference (WWW 2005).
- Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Artificial Intelligence 165: 91-134, 2005.
2004
- Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison. Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Proceedings of AAAI 2004.
- Web-scale Information Extraction in KnowItAll. Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Proceedings of the 13th International World Wide Web Conference (WWW 2004).
- Building Nutch: Open Source Search. Mike Cafarella and Doug Cutting. ACM Queue 2(2), April 2004.
Students
My current PhD students:
Short Bio
I earned my Ph.D. from the University of Washington in 2009, with advisors Oren Etzioni and Dan Suciu. I also worked with Alon Halevy at Google during an extended internship there. Before graduate school I worked for a couple of startups in California: Marimba, which built software distribution infrastructure, and Tellme Networks, which did (and does) voice recognition phone services. With Doug Cutting, I also co-started the Nutch and Hadoop open-source projects; I worked on them for many years but am no longer actively developing them.
Miscellaneous
2011