About the Event
In many settings we experience the world as a sequence of observations. For instance, we might observe the daily weather, read the words in a document, or receive a series of frames from a security camera. We would like to extract from these data a model of the world that allows us to make predictions: the probability of rain on Friday, the next word likely to be typed by a smartphone user, or whether to call the police.
A popular tool for sequence modeling problems is the hidden Markov model (HMM), which posits a latent state that evolves over time. HMMs are usually trained by EM, an iterative procedure that can be slow and does not guarantee convergence to anything more than a local optimum. Recently, however, Hsu et al showed that under certain assumptions a so-called spectral learning algorithm can be used to learn HMMs quickly, exactly, and in closed form.
Unfortunately, the assumptions of Hsu et al are rather unrealistic: their method requires that the observed data are generated by an HMM whose number of states (rank) is known. In practice we rarely know this true rank, and even if we did, for computational and statistical reasons we usually want to fit a lower rank model. This means that during the learning algorithm we must "throw out" the smallest singular values of a particular matrix. Intuitively, this seems like a reasonable approach.
However, in this talk we will describe a surprising result: that even when the singular values thrown out are arbitrarily small, the resulting prediction errors can be arbitrarily large. We identify two distinct possible causes for this bad behavior, and illustrate them with simple examples. We then show that these two causes are essentially complete; if they do not occur, the prediction error is bounded in terms of the magnitudes of the omitted singular values.