About the Event
The computational task that lies in the core of many machine learning
problems is the minimization of a cost function called the training
error. This problem is frequently solved by local search algorithms
such as gradient descent. The training error can usually be expressed
as a sum over many terms, each corresponding to the loss of the model
on a single training example. We show that the iteratively minimizing
a cost function of this form by local search is closely related to the
Imagine you are a shepherd in charge of a large herd of sheep and your
goal is to concentrate the sheep in a particular small area by
nightfall. Your influence on the sheep movements is represented by
vectors which define the direction in which you want each sheep
to move. The lengths of the vectors correspond to the fraction of your
"energy" that you spend on moving the particular sheep, and these
lengths sum to one. The sheep then have to respond by moving in a way
that has a slight correlation with the influence direction on average.
We characterize the min/max solution to this game and show that by
taking the appropriate small-step/continuous-time limit, this solution
can be characterized by a stochastic differential equation.
By solving this differential equation we re-derive some known boosting
as well as design some new ones with desirable properties.