About the EventIn this talk, I present two new families of subgradient methods for
online learning and optimization. The first family, which we call
Adagrad, dynamically incorporates knowledge of the geometry of the
data observed in earlier iterations to perform more informative
gradient-based learning. Metaphorically, the adaptation allows us to
find needles in haystacks in the form of very predictive but rarely
seen features. Our apparatus adaptively modifying the learning
algorithm, which significantly simplifies setting a learning rate and
results in guarantees that are provably as good as the best
gradient-based algorithm, chosen in hindsight. We experimentally study
the algorithms, showing that adaptive subgradient methods
significantly outperform state-of-the-art, yet non-adaptive,
subgradient algorithms. In the second part of the talk, I discuss a
family of algorithms that incorporate randomization to smooth the
problem being solved. Rather than take advantage of the geometry of
the data, these techniques take advantage of the stochasticity
inherent in online learning and statistical optimization problems. Our
randomization techniques directly lead to easily parallelizable
algorithms, and we provide guarantees on the convergence of the
resulting optimization procedures. These guarantees are order optimal
in both the amount of parallelization and number of samples available
to the algorithm, and to the best of our knowledge, these are the
first such guarantees. We conclude with experimental results that
demonstrate the effectiveness of the proposed algorithms. |