CSE Seminar or Event

Toward a Healthier ML Development Ecosystem, for Just Four Pizzas a Day

Ce Zhang

Assistant Professor
ETH Zurich
Tuesday, December 04, 2018
1:30pm - 3:00pm
BBB 1690


About the Event

When training a machine learning model becomes fast, and model selection and hyper-parameter tuning become automatic, will non-CS experts finally have the tools they need to build ML applications all by themselves? In this talk, I will focus on those users who are still struggling, not because the ML system is slow or lacks automation, but because it is so powerful that it is easily misused as an "overfitting machine." For many of these users, the quality of their ML applications might actually decrease when they use these powerful tools without proper guidelines and feedback. In particular, I will introduce two systems, ease.ml/ci and ease.ml/meter, which we built as an early attempt at an ML system that tries to enforce the right user behavior during the development of ML applications. The first, ease.ml/ci, is a "continuous integration engine" for ML that gives developers a pass/fail signal for each developed ML model, depending on whether it satisfies certain predefined properties over the "true distribution". The second, ease.ml/meter, is a system that continuously returns some notion of the "degree of overfitting" to the developer. From a technical perspective, both systems build upon the classic theory of answering adaptive statistical queries. I will also discuss a set of simple but novel optimizations, specific to the application scenarios of each system, that bring down the cost by up to one order of magnitude compared with off-the-shelf results. For many real-world use cases of both systems, providing one adaptive signal per day for a month requires at most 96K labeled examples, which in some applications costs as little as four 35cm Domino's pizzas per day.
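As a rough illustration of why adaptive feedback raises the labeling cost, the sketch below applies a textbook Hoeffding bound, plus a union bound over every possible sequence of pass/fail signals. This is not the method used by ease.ml/ci or ease.ml/meter, and it is not intended to reproduce the 96K figure above; the function names and the choices of `epsilon`, `delta`, and `steps` are assumptions made purely for illustration.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Labels needed so that ONE accuracy estimate is within `epsilon`
    of the true accuracy with probability at least 1 - delta.

    Derived from Hoeffding's inequality: 2 * exp(-2 * n * epsilon^2) <= delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def adaptive_sample_size(epsilon: float, delta: float, steps: int) -> int:
    """Labels needed when the same test set is reused for `steps` rounds
    and the developer sees a binary pass/fail signal after each round.

    Assumption (illustrative only): each signal may influence the next
    model, so at most 2**steps distinct models could ever be evaluated.
    A union bound over all of them adds steps * ln(2) to the log term.
    """
    log_term = steps * math.log(2.0) + math.log(2.0 / delta)
    return math.ceil(log_term / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    eps, delta = 0.05, 0.05  # hypothetical tolerance and failure probability
    print("single query:", hoeffding_sample_size(eps, delta))
    print("30 adaptive pass/fail signals:", adaptive_sample_size(eps, delta, 30))
```

Under this naive bound the label requirement grows linearly with the number of adaptive signals, which is the kind of off-the-shelf cost that scenario-specific optimizations aim to reduce.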


Ce is an Assistant Professor in Computer Science at ETH Zurich. He believes that by making data, along with the processing of data, easily accessible to non-CS users, we have the potential to make the world a better place. His current research focuses on building data systems to support machine learning and help facilitate other sciences. Before joining ETH, Ce was advised by Christopher Ré. He finished his PhD round-tripping between the University of Wisconsin-Madison and Stanford University, and spent another year as a postdoctoral researcher at Stanford. His PhD work produced DeepDive, a trained data system for automatic knowledge-base construction. He participated in research efforts that won the SIGMOD Best Paper Award (2014) and the SIGMOD Research Highlight Award (2015), and that were featured in special issues including Science (2017), the Communications of the ACM (2017), “Best of VLDB” (2015), and Nature (2015).

Additional Information

Contact: Barzan Mozafari

Email: mozafari @ umich.edu

Sponsor(s): Software, AI

Faculty Sponsor: Barzan Mozafari

Open to: Public