Defense Event

Information Extraction on Para-Relational Data

Shirley Zhe Chen

Thursday, December 03, 2015
09:30am - 11:00am
3316 EECS

About the Event

Para-relational data is nearly-relational data: it shares the important qualities of relational data but is not presented in a relational format. Such data often conveys highly valuable information and is widely used in many different areas. If we can convert para-relational data into a relational format, many existing tools can be leveraged for a variety of applications, such as data analysis with relational queries and data integration. To this end, we have developed four standalone systems, each of which addresses a specific type of para-relational data. Senbazuru is a prototype spreadsheet database management system that extracts relational information from a large number of spreadsheets; Anthias extends Senbazuru to convert a broader range of spreadsheets into a relational format; Lyretail is an extraction system that aims to detect long-tail dictionary entities on webpages; finally, DiagramFlyer is a web-based search system over a large number of diagrams automatically extracted from web-crawled PDFs. Together, these four systems demonstrate that converting para-relational data into a relational format is possible today, and they suggest directions for future systems.

Additional Information

Sponsor(s): Michael Cafarella

Open to: Public