Senbazuru is a prototype spreadsheet database management system (SSDBMS) that extracts relational information from spreadsheets. It opens up opportunities for integrating spreadsheets with one another and with relational sources.

Senbazuru allows users to search for relevant spreadsheets in a large corpus, probabilistically constructs a relational version of the data, and offers relational operations over the resulting extracted data (including select and join).

Our demonstration is available on two clients: a JavaScript website and a touch interface on the iPad.

Search spreadsheets.

Using a textual search-and-rank interface, Senbazuru allows a user to quickly locate relevant datasets in a huge Web spreadsheet corpus.
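
As an illustration of a search-and-rank interface, the minimal sketch below scores a tiny corpus against a keyword query. The ranking function Senbazuru actually uses is not described here, so plain TF-IDF scoring is an assumption, and the names `tokenize`, `rank`, and the file names in `corpus` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def rank(query, docs):
    """Return (name, score) pairs sorted by descending TF-IDF score.

    Documents that match no query term are dropped.
    """
    tokenized = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    scores = {}
    for name, counts in tokenized.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many spreadsheets contain the term.
            df = sum(1 for c in tokenized.values() if term in c)
            if df == 0:
                continue
            idf = math.log((1 + n) / (1 + df)) + 1.0
            score += counts[term] * idf
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical corpus: each spreadsheet is reduced to the text of its cells.
corpus = {
    "smoking.xls": "total smoker male white 1990 percent",
    "population.xls": "population by state 1990 2000",
    "budget.xls": "federal budget outlays by agency",
}
results = rank("smoker 1990", corpus)
for name, score in results:
    print(name, round(score, 3))
```

A spreadsheet that matches both query terms ranks above one that matches only the more common term, and a spreadsheet matching neither is omitted.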

Extract spreadsheets.

Spreadsheets often exhibit an implicit hierarchical structure between attributes and values. For example, the numeric value 28.7 is annotated by a number of surrounding attributes, including "1990", "45 to 64 years", "White", "Male" and "Total smoker".

Senbazuru automatically infers the implicit attribute structure in a spreadsheet and emits a high-quality attribute hierarchy based on a probabilistic model.
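
To make the link between an attribute hierarchy and a relational tuple concrete, the sketch below flattens the 28.7 cell from the example above into one row. It does not perform the probabilistic inference; it starts from a hierarchy that is already known. The nesting order in `PARENT` and the function names are assumptions made for illustration.

```python
# Hypothetical parent links for the left-side attributes (child -> parent).
# The nesting order is assumed for illustration; None marks the root.
PARENT = {
    "45 to 64 years": "White",
    "White": "Male",
    "Male": "Total smoker",
    "Total smoker": None,
}

def attribute_path(leaf, parent):
    """Walk from a leaf attribute up to the root and return the path root-first."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))

def to_tuple(value, left_leaf, top_leaf, parent):
    """Flatten one data cell into a tuple: left attribute path, top attribute, value."""
    return tuple(attribute_path(left_leaf, parent)) + (top_leaf, value)

row = to_tuple(28.7, "45 to 64 years", "1990", PARENT)
print(row)
```

Each data cell yields one such tuple, so a whole spreadsheet becomes a table with one column per hierarchy level plus the value.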

Repair spreadsheets.

Our repair interface allows users to manually repair extraction errors. Meanwhile, Senbazuru automatically exploits commonalities among errors to probabilistically re-apply one user fix to other similar mistakes.
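
The idea of re-applying one fix to similar mistakes can be sketched as follows. This is not Senbazuru's model: a toy indentation-based `similarity` score and a fixed `threshold` stand in for the probabilistic reasoning, and the `Attr` class, the function names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Attr:
    text: str
    indent: int             # leading-space count of the cell text
    parent: Optional[int]   # index of the parent row, None for a root

def similarity(a, b):
    """Toy similarity in (0, 1]: 1.0 for equal indentation, decaying with the gap."""
    return 1.0 / (1.0 + abs(a.indent - b.indent))

def propagate_fix(attrs, fixed, threshold=0.9):
    """After the user re-parents attrs[fixed], re-parent similar rows the same way.

    A row is treated as a similar mistake when it resembles the fixed row and
    its parent is not the nearest preceding row that resembles the parent the
    user chose. Returns the indices of rows changed automatically.
    """
    target_parent = attrs[attrs[fixed].parent]
    changed = []
    for i, attr in enumerate(attrs):
        if i == fixed or similarity(attr, attrs[fixed]) < threshold:
            continue
        # Nearest preceding row that looks like the parent the user chose.
        candidates = [j for j in range(i - 1, -1, -1)
                      if similarity(attrs[j], target_parent) >= threshold]
        if candidates and attr.parent != candidates[0]:
            attr.parent = candidates[0]
            changed.append(i)
    return changed

# Extraction result in which both age rows were wrongly attached to "Male".
attrs: List[Attr] = [
    Attr("Male", 0, None),
    Attr("White", 2, 0),
    Attr("18 to 24 years", 4, 0),
    Attr("Black", 2, 0),
    Attr("18 to 24 years", 4, 0),
]
attrs[2].parent = 1                 # the single manual fix made by the user
changed = propagate_fix(attrs, fixed=2)
print(changed, attrs[4].parent)
```

Here the one manual fix on row 2 is enough to re-parent row 4 under "Black" without further user input.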

Query spreadsheets.

Senbazuru supports basic relational operators which the user can apply to spreadsheet-derived relational tables, such as selection (i.e., filtering) and join. Users are not required to write SQL statements and can apply the operations via the interface.
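
A selection over a spreadsheet-derived table amounts to filtering rows by a predicate. The sketch below assumes the table is held as a list of dictionaries; the column names and the `select` helper are hypothetical, and every value other than 28.7 is a placeholder rather than real data.

```python
def select(rows, predicate):
    """Relational selection: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

# Only the first row comes from the example in the text;
# the other two carry placeholder values for illustration.
smoking = [
    {"sex": "Male", "race": "White", "age": "45 to 64 years", "year": 1990, "percent": 28.7},
    {"sex": "Female", "race": "White", "age": "45 to 64 years", "year": 1990, "percent": 10.0},
    {"sex": "Male", "race": "White", "age": "45 to 64 years", "year": 2000, "percent": 20.0},
]

result = select(smoking, lambda r: r["sex"] == "Male" and r["year"] == 1990)
print(result)
```

In the interface the user builds the same filter by pointing and clicking, with no SQL involved.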

Integrate spreadsheets.

Senbazuru allows users to integrate two arbitrary spreadsheet-derived relations. A user drags and releases a column to indicate the join key. Senbazuru also allows users to explore the spreadsheet data via visualization tools that work on the derived relational tables.
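
Once the user has indicated the join key, combining the two relations is an equi-join on that column. The sketch below uses a simple hash join over lists of dictionaries; how Senbazuru implements the join is not stated here, and the `hash_join` helper, the column names, and the placeholder values are assumptions.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Equi-join two lists of dict rows on a shared column name."""
    # Build a hash index over the right relation.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe the index with each row of the left relation.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(rrow)
            merged.update(lrow)
            joined.append(merged)
    return joined

# Two hypothetical spreadsheet-derived relations with placeholder values.
smokers = [{"state": "MI", "smokers": 1}, {"state": "OH", "smokers": 2}]
population = [{"state": "MI", "population": 3}, {"state": "WI", "population": 4}]

joined = hash_join(smokers, population, key="state")
print(joined)
```

Rows whose key value appears in only one relation are dropped, which matches the behavior of an inner join.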


This project is supported by National Science Foundation grants IIS-1054913 and IIS-1064606, as well as gifts from Dow Chemical, Yahoo!, and Google.