Reinforcement Learning Publications
Refereed Conference and Journal Papers
On Discovery and Learning of Models with Predictive State Representations of State for Agents with Continuous Actions and Observations by David Wingate and Satinder Singh. In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.
pdf.
Relational Knowledge with Predictive State Representations by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.
pdf.
An Experts Algorithm for Transfer Learning by Erik Talvitie and Satinder Singh. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.
pdf.
Cobot in LambdaMOO: An Adaptive Social Statistics Agent by Charles Isbell, Michael Kearns, Satinder Singh, Christian Shelton, Peter Stone and Dave Kormann. In Journal of Autonomous Agents and Multi-Agent Systems, 13(3), pages 327-354, 2006.
pdf.
Mixtures of Predictive Linear Gaussian Models for Nonlinear Stochastic Dynamical Systems by David Wingate and Satinder Singh. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
pdf.
Using Homomorphisms to Transfer Options Across Reinforcement Learning Domains by Vishal Soni and Satinder Singh. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
pdf.
Kernel Predictive Linear-Gaussian Models for Nonlinear Stochastic Dynamical Systems by David Wingate and Satinder Singh. In Proceedings of the 23rd International Conference on Machine Learning (ICML), 1017-1024, 2006.
pdf.
Predictive linear-Gaussian models of controlled stochastic dynamical systems by Matthew Rudary and Satinder Singh. In Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006.
pdf.
Predictive linear-Gaussian models of stochastic dynamical systems by Matthew Rudary, Satinder Singh and David Wingate. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), pages 501-508, 2005.
pdf.
Predictive State Representations with Options by Britton Wolfe and Satinder Singh. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1025-1032, 2006.
pdf.
Reinforcement Learning of Hierarchical Skills on the Sony Aibo Robot by Vishal Soni and Satinder Singh. In Proceedings of the 5th International Conference on Development and Learning (ICDL), 2006.
pdf.
Combining Memory and Landmarks with Predictive State Representations by Michael R. James, Britton Wolfe and Satinder Singh. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), pages 734-739, 2005.
pdf.
Planning in Models that Combine Memory with Predictive Representations of State by Michael R. James and Satinder Singh. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), 2005.
pdf.
Learning Predictive State Representations in Dynamical Systems Without Reset by Britton Wolfe, Michael R. James and Satinder Singh. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 985-992, 2005.
pdf.
Intrinsically Motivated Reinforcement Learning by Satinder Singh, Andrew G. Barto and Nuttapong Chentanez. To appear in Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), 2005.
pdf.
Intrinsically Motivated Learning of Hierarchical Collections of Skills by Andrew G. Barto, Satinder Singh, and Nuttapong Chentanez. To appear in Proceedings of International Conference on Developmental Learning (ICDL), 2004.
pdf.
Predictive State Representations: A New Theory for Modeling Dynamical Systems by Satinder Singh, Michael R. James and Matthew R. Rudary. In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI), pages 512-519, 2004.
pdf.
Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset by Michael James and Satinder Singh. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 417-424, 2004.
pdf.
Adaptive Cognitive Orthotics: Combining Reinforcement Learning and Constraint-Based Temporal Reasoning by Matthew Rudary, Satinder Singh and Martha Pollack. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 719-726, 2004.
pdf.
A Nonlinear Predictive State Representation by Matthew Rudary and Satinder Singh. In Advances in Neural Information Processing Systems 16 (NIPS), pages 855-862, 2004.
pdf.
Planning with Predictive State Representations by Michael R. James, Satinder Singh and Michael Littman. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 304-311, 2004.
pdf.
Learning Predictive State Representations by Satinder Singh, Michael Littman, Nicholas Jong, David Pardoe and Peter Stone. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 712-719, 2003.
gzipped postscript.
Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System by Satinder Singh, Diane Litman, Michael Kearns and Marilyn Walker. In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.
gzipped postscript pdf.
Near-Optimal Reinforcement Learning in Polynomial Time by Michael Kearns and Satinder Singh. In Machine Learning journal, Volume 49, Issue 2, pages 209-232, 2002.
(A shorter version appears in ICML 1998.)
gzipped postscript pdf.
Predictive Representations of State by Michael Littman, Richard Sutton and Satinder Singh. In Advances in Neural Information Processing Systems 14 (NIPS), pages 1555-1561, 2002.
gzipped postscript pdf.
Cobot: A Social Reinforcement Learning Agent by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone. In Advances in Neural Information Processing Systems 14 (NIPS), pages 1393-1400, 2002.
gzipped postscript pdf.
A Social Reinforcement Learning Agent by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone. In Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS), pages 377-384, 2001.
Winner of Best Paper Award.
gzipped postscript.
Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System by Satinder Singh, Michael Kearns, Diane Litman, and Marilyn Walker. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI), pages 645-651, 2000.
gzipped postscript pdf.
Automatic Optimization of Dialogue Management by Diane Litman, Michael Kearns, Satinder Singh and Marilyn Walker. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pages 502-508, 2000.
gzipped postscript pdf.
Eligibility Traces for Off-Policy Policy Evaluation by Doina Precup, Richard Sutton, and Satinder Singh. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 759-766, 2000.
gzipped postscript pdf.
"Bias-Variance" Error Bounds for Temporal Difference Updates by Michael Kearns and Satinder Singh. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT), pages 142-147, 2000.
gzipped postscript.
Reinforcement Learning for Spoken Dialogue Systems by Satinder Singh, Michael Kearns, Diane Litman and Marilyn Walker. In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
gzipped postscript.
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms by Satinder Singh, Tommi Jaakkola, Michael Littman, and Csaba Szepesvari. In Machine Learning Journal, vol 38(3), pages 287-308, 2000.
gzipped postscript.
Policy Gradient Methods for Reinforcement Learning with Function Approximation by Richard Sutton, Dave McAllester, Satinder Singh and Yishay Mansour. In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
gzipped postscript.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning by Richard Sutton, Doina Precup and Satinder Singh. In Artificial Intelligence Journal, Volume 112, pages 181-211, 1999.
gzipped postscript.
Approximate Planning for Factored POMDPs using Belief State Simplification by Dave McAllester and Satinder Singh. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 409-416, 1999.
gzipped postscript.
On the Complexity of Policy Iteration by Yishay Mansour and Satinder Singh. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 401-408, 1999.
gzipped postscript.
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms by Michael Kearns and Satinder Singh. In Advances in Neural Information Processing Systems 11 (NIPS), pages 996-1002, 1999.
gzipped postscript.
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes by John K. Williams and Satinder Singh. In Advances in Neural Information Processing Systems 11 (NIPS), pages 1073-1079, 1999.
gzipped postscript.
Optimizing admission control while ensuring quality of service in multimedia networks via reinforcement learning by Timothy Brown, Hong Tong, and Satinder Singh. In Advances in Neural Information Processing Systems 11 (NIPS), pages 982-988, 1999.
gzipped postscript.
Improved switching among temporally abstract actions by Richard Sutton, Satinder Singh, Doina Precup and Balaraman Ravindran. In Advances in Neural Information Processing Systems 11 (NIPS), pages 1066-1072, 1999.
gzipped postscript.
Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes by John Loch and Satinder Singh. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 323-331, 1998.
gzipped postscript.
Near-Optimal Reinforcement Learning in Polynomial Time by Michael Kearns and Satinder Singh. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 260-268, 1998.
gzipped postscript.
Intra-Option Learning about Temporally Abstract Actions by Richard Sutton, Doina Precup and Satinder Singh. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 556-564, 1998.
gzipped postscript.
Theoretical Results on Reinforcement Learning with Temporally Abstract Behaviors by Doina Precup, Richard Sutton, and Satinder Singh. In Proceedings of the 10th European Conference on Machine Learning (ECML), pages 382-393, 1998.
gzipped postscript.
How to Dynamically Merge Markov Decision Processes by Satinder Singh and David Cohn. In Advances in Neural Information Processing Systems 10 (NIPS), pages 1057-1063, 1998.
gzipped postscript pdf.
Analytical Mean Squared Error Curves for Temporal Difference Learning by Satinder Singh and Peter Dayan. In Machine Learning Journal, Volume 32, Issue 1, pages 5-40, 1998.
gzipped postscript.
A shorter version appears in the NIPS 9 Proceedings.
Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems by Satinder Singh and Dimitri Bertsekas. In Advances in Neural Information Processing Systems 9 (NIPS), pages 974-980, 1997.
gzipped postscript.
Analytical Mean Squared Error Curves for Temporal Difference Learning by Satinder Singh and Peter Dayan. In Advances in Neural Information Processing Systems 9 (NIPS), pages 1054-1060, 1997.
gzipped postscript.
Reinforcement Learning with Replacing Eligibility Traces by Satinder Singh and Richard Sutton. In Machine Learning journal, Volume 22, Issue 1, pages 123-158, 1996.
gzipped postscript abstract.
Learning Curve Bounds for Markov Decision Processes with Undiscounted Rewards by Lawrence Saul and Satinder Singh. In Proceedings of the 9th Annual Conference on Computational Learning Theory (COLT), pages 147-156, 1996.
gzipped postscript.
Improving Policies Without Measuring Merits by Peter Dayan and Satinder Singh. In Advances in Neural Information Processing Systems 8 (NIPS), pages 1059-1065, 1996.
gzipped postscript.
Markov Decision Processes in Large State Spaces by Lawrence Saul and Satinder Singh. In Proceedings of the 8th Annual Workshop on Computational Learning Theory (COLT), pages 281-288, 1995.
gzipped postscript.
Learning to Act using Real-Time Dynamic Programming by Andrew Barto, Steve Bradtke and Satinder Singh. In Artificial Intelligence, Volume 72, pages 81-138, 1995.
gzipped postscript.
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms by Tommi Jaakkola, Michael Jordan and Satinder Singh. In Neural Computation, Volume 6, Number 6, pages 1185-1201, 1994.
gzipped postscript.
Reinforcement Learning With Soft State Aggregation by Satinder Singh, Tommi Jaakkola and Michael Jordan. In Advances in Neural Information Processing Systems 7 (NIPS), pages 361-368, 1995.
gzipped postscript pdf.
Stochastic Convergence of Iterative DP Algorithms by Tommi Jaakkola, Michael Jordan and Satinder Singh. In Advances in Neural Information Processing Systems 6 (NIPS), pages 703-710, 1994.
gzipped postscript pdf.
Reinforcement Learning Algorithm for Partially Observable Markov Problems by Tommi Jaakkola, Satinder Singh and Michael Jordan. In Advances in Neural Information Processing Systems 7 (NIPS), pages 345-352, 1995.
gzipped postscript pdf.
Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes by Satinder Singh. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI), pages 700-705, 1994.
gzipped postscript.
Learning Without State-Estimation in Partially Observable Markovian Decision Processes by Satinder Singh, Tommi Jaakkola and Michael Jordan. In Machine Learning: Proceedings of the Eleventh International Conference (ICML), pages 284-292, 1994.
gzipped postscript pdf.
Robust Reinforcement Learning in Motion Planning by Satinder Singh, Andrew Barto, Roderic Grupen, and Christopher Connolly. In Advances in Neural Information Processing Systems 6 (NIPS), pages 655-662, 1994.
gzipped postscript (68 KBytes).
An Upper Bound on the Loss from Approximate Optimal-Value Functions by Satinder Singh and Richard Yee. In Machine Learning, Volume 16, Issue 3, pages 227-233, 1994.
gzipped postscript.
Distributed Representation of Limb Motor Programs in Arrays of Adjustable Pattern Generators by Neil Berthier, Satinder Singh, Andrew Barto, and Jim Houk. In Journal of Cognitive Neuroscience, vol 5:1, pages 56-78, 1993.
Reinforcement Learning with a Hierarchy of Abstract Models by Satinder Singh. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI), pages 202-207, 1992.
gzipped postscript.
Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models by Satinder Singh. In Proceedings of the Ninth Machine Learning Conference, pages 406-415, 1992.
gzipped postscript.
Transfer of Learning by Composing Solutions of Elemental Sequential Tasks by Satinder Singh. In Machine Learning Journal, Volume 8, Issue 3, pages 323-339, 1992.
gzipped postscript.
The Efficient Learning of Multiple Task Sequences by Satinder Singh. In Advances in Neural Information Processing Systems 4 (NIPS), pages 251-258, 1992.
gzipped postscript.
Transfer of Learning Across Compositions of Sequential Tasks by Satinder Singh. In Machine Learning: Proceedings of the Eighth International Workshop, pages 348-352, 1991.
gzipped postscript.
Refereed Workshop Papers
Cobot in LambdaMOO: A Social Statistics Agent by Charles Isbell, Michael Kearns, Dave Kormann, Satinder Singh, and Peter Stone. In Workshop on Interactive Robotics and Entertainment (WIRE), 2000. (This is an early workshop version of the AAAI paper with the same title.)
gzipped postscript.
Hierarchical Optimal Control of MDPs by Amy McGovern, Doina Precup, Balaraman Ravindran, Satinder Singh and Richard Sutton. In Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, 1998.
gzipped postscript pdf.
Planning with Closed-Loop Macro Actions by Doina Precup, Richard Sutton and Satinder Singh. In Proceedings of AAAI Fall Symposium on Model-directed Autonomous Systems, 1997.
gzipped postscript.
Long Term Potentiation, Navigation and Dynamic Programming by Peter Dayan and Satinder Singh. In Proceedings of Computation and Neural Systems Meeting (CNS), 1996.
gzipped postscript.
On Step-Size and Bias in Temporal-Difference Learning by Richard Sutton and Satinder Singh. In Proceedings of Eighth Yale Workshop on Adaptive and Learning Systems, 1994.
gzipped postscript pdf abstract.
Soft Dynamic Programming Algorithms: Convergence Proofs by Satinder Singh. In Proceedings of Workshop on Computational Learning and Natural Learning (CLNL), Provincetown, Massachusetts, 1993.
gzipped postscript.
Reinforcement Learning and Dynamic Programming by Andrew Barto and Satinder Singh. In Proceedings of Sixth Yale Workshop on Adaptive and Learning Systems, 1990.
Magazine Articles, Book Chapters and Others
Reinforcement Learning for 3 vs. 2 Keepaway by Peter Stone, Richard Sutton and Satinder Singh. In RoboCup-2000: Robot Soccer World Cup IV, P. Stone, T. Balch, and G. Kraetzschmar, Eds., Springer Verlag.
pdf file.
An earlier version appeared in the Proceedings of the RoboCup-2000 Workshop, Melbourne, Australia.
How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning by Satinder Singh, Peter Norvig and David Cohn. In Dr. Dobb's Journal, March issue, 1997.
gzipped postscript [html version].
On the Computational Economics of Reinforcement Learning by Andrew Barto and Satinder Singh. In Proceedings of Connectionist Summer School, 1990.
gzipped postscript.
An Adaptive Sensorimotor Network Inspired by the Physiology of the Cerebellum by Jim Houk, Satinder Singh, Charles Fisher, and Andrew Barto. Appears as a chapter in WT Miller, RS Sutton, and PJ Werbos, editors, Neural Networks for Control, pages 301-348, 1989.
An Almost Tutorial on RL (extracted from my Thesis)
An (Almost) Tutorial on Reinforcement Learning. gzipped postscript. Extracted from my 1993 thesis.
Unpublished Papers
Asynchronous Modified Policy Iteration with Single-sided Updates. Satinder Singh and Vijay Gullapalli. Working Paper, 1993.
gzipped postscript.