Adaptive Cognitive Orthotics: Combining Reinforcement Learning and Constraint-Based Temporal Reasoning (pdf) by Matthew Rudary, Satinder Singh and Martha Pollack.
Abstract: Reminder systems support people with impaired prospective memory and/or executive function by providing them with reminders of their functional daily activities. We integrate temporal constraint reasoning with reinforcement learning (RL) to build an adaptive reminder system and, in a simulated environment, demonstrate that it can personalize to a user and adapt to both short- and long-term changes. In addition to advancing the application domain, our integrated algorithm contributes to research on temporal constraint reasoning by showing how RL can select an optimal policy from among a set of temporally consistent ones, and it contributes to the work on RL by showing how temporal constraint reasoning can be used to dramatically reduce the space of actions from which an RL agent needs to learn.
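The idea of letting temporal constraints prune the action set before RL chooses among what remains can be illustrated with a small sketch. This is not the paper's algorithm: a simple time-window check stands in for full temporal constraint reasoning, and the function names, the 15-minute reminder grid, and the tabular Q-learning update are all assumptions made for illustration.

```python
import random

def feasible_actions(candidate_times, earliest, latest):
    """Keep only reminder times inside the activity's allowed window.
    (Assumption: a single window stands in for a real temporal constraint solver.)"""
    return [t for t in candidate_times if earliest <= t <= latest]

def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice restricted to the temporally consistent actions."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning update; next_actions is the filtered set
    available in next_state (empty if the episode has ended)."""
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Hypothetical setup: reminder times on a 15-minute grid, in minutes after midnight.
candidates = list(range(0, 24 * 60, 15))                # 96 possible reminder times
actions = feasible_actions(candidates, 8 * 60, 9 * 60)  # activity window 08:00-09:00
```

In this toy setup the learner only ever explores the 5 reminder times inside the window rather than all 96 candidates, which is the kind of action-space reduction the abstract describes.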
Humans can use speech and sounds to interact with pre-linguistic children and pets such as dogs with a remarkable degree of flexibility and richness. The goal for this project is to build machines that humans can interact with similarly. We are starting with a (Sony AIBO) dog that can listen to humans and learn from them using a simple form of real-time pitch extraction.
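The description above does not say which pitch-extraction method the project uses. As one common and simple possibility, the sketch below estimates the fundamental frequency of a short audio frame by picking the lag with the strongest autocorrelation. The function name, the 80-500 Hz search range, and the frame format (a list of float samples) are assumptions for illustration only.

```python
import math

def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.
    Returns None for a silent frame or a frame too short for the search range."""
    min_lag = int(sample_rate / fmax)                       # shortest period considered
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)  # longest period considered
    energy = sum(x * x for x in frame)
    if energy == 0 or min_lag < 1 or max_lag <= min_lag:
        return None
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself shifted by `lag`.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Usage: a synthetic 200 Hz tone sampled at 8 kHz (50 ms frame).
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
pitch = estimate_pitch(tone, 8000)
```

A real-time system would run such an estimate on successive short frames and track the resulting pitch contour over time.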
Abstract: Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance.
Abstract: We report on the use of reinforcement learning with Cobot, a software agent residing in the well-known online community LambdaMOO. Our initial work on Cobot (Isbell et al. 2000) provided him with the ability to collect social statistics and report them to users. Here we describe our application of RL to allow Cobot to proactively take actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, and 3171 reward and punishment events from 254 different LambdaMOO users, Cobot learned nontrivial preferences for a number of users, modifying his behavior based on his current state. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.
Abstract: We describe CobotDS, a spoken dialogue system providing access to a well-known internet chat server called LambdaMOO. CobotDS provides real-time, two-way, natural language communication between a phone user and the multiple users in the text environment. We describe a number of challenging design issues we faced, and our use of summarization, social filtering and personalized grammars in tackling them. We report a number of empirical findings from a small user study.