Approximate computation and implicit regularization in large-scale data analysis
ICSI and UC Berkeley
Thursday, April 24, 2014|
11:00am - 12:00pm
About the Event
Statisticians and computer scientists adopt very different perspectives on algorithms and data. A concept that lies at the heart of this disconnect is that of regularization, a notion that has to do with how robust is the output of an algorithm to the noise properties of the input. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. Typically, it is implemented by adding some sort of norm constraint to an objective function and then exactly optimizing the modified objective function. In this talk, I will describe two examples illustrating how approximate computation, in and of itself, can implicitly lead to regularization.
The first example is a very precise theoretical characterization of how approximation procedures like computing the PageRank of a graph (a problem that originally arose to rank nodes in Web-scale graphs) exactly solve a regularized version of the problem of computing the leading nontrivial eigenvector of a graph Laplacian. Although this is a graph problem, it can be given a statistical interpretation (analogous to the manner in which Ridge regression and Lasso regression can be interpreted in terms of a Gaussian prior or a Laplace prior) which I will also describe. The second example is an empirical illustration of how spectral and flow-based approximation algorithms for computing approximations to the intractable graph partitioning problem (a problem that arises when trying to find communities in small and large graphs) implicitly provide more regular solutions than would be provided by an algorithm that exactly solved the intractable problem. These two examples grew out of our large-scale empirical analysis of the properties of large social and information networks, and so I will explain how understanding the statistical properties implicit within scalable worst-case approximation algorithms was essential for obtaining our downstream conclusions.
Michael Mahoney works on algorithmic and statistical aspects of modernlarge-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received him PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he spent fall 2013 at UC Berkeley co-organizing the Simons Foundation’s program on the Theoretical Foundations of Big Data Analysis.
Contact: Grant Schoenebeck
Open to: Public