Reinforcement Learning Publications

  • Improving Predictive State Representations via Gradient Descent.
    by Nan Jiang, Alex Kulesza, and Satinder Singh.
    In Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.
    pdf.

  • Action-Conditional Video Prediction Using Deep Networks in ATARI Games.
    by Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh.
    In Neural Information Processing Systems (NIPS), 2015.
    online videos
    arxiv pdf, NIPS pdf, NIPS Appendix pdf.

  • Abstraction Selection in Model-Based Reinforcement Learning.
    by Nan Jiang, Alex Kulesza, and Satinder Singh.
    In 32nd International Conference on Machine Learning (ICML), 2015.
    pdf.

  • The Dependence of Effective Planning Horizon on Model Accuracy.
    by Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis.
    In International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2015.
    Best Paper Award
    pdf.

  • Low-Rank Spectral Learning with Weighted Loss Functions.
    by Alex Kulesza, Nan Jiang, and Satinder Singh.
    In Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
    pdf.

  • Spectral Learning of Predictive State Representations with Insufficient Statistics.
    by Alex Kulesza, Nan Jiang, and Satinder Singh.
    In Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015.
    pdf.

  • Optimal Rewards for Cooperative Agents.
    by Bingyao Liu, Satinder Singh, Richard Lewis, and Shiyin Qin.
    In IEEE Transactions on Autonomous Mental Development, Vol 6, Issue 4, 2014.
    pdf.

  • Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning.
    by Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, and Xiaoshi Wang.
    In Neural Information Processing Systems (NIPS), 2014.
    pdf.

  • Computationally Rational Saccadic Control: An Explanation of Spillover Effects Based on Sampling from Noisy Perception and Memory.
    by Michael Shvartsman, Richard L Lewis, and Satinder Singh.
    In Cognitive Modeling and Computational Linguistics (CMCL), 2014.
    pdf.

  • The Potential Impact of Intelligent Systems for Mobile Health Self-Management Support: Monte-Carlo Simulations of Text Message Support for Medication Adherence.
    by John Piette, Karen Farris, Sean Newman, Larry An, Jeremy Sussman, and Satinder Singh.
    In Annals of Behavioral Medicine, 2014.
    pdf.

  • Low-Rank Spectral Learning.
    by Alex Kulesza, Raj Rao Nadakuditi, and Satinder Singh.
    In Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
    pdf.

  • Improving UCT Planning via Approximate Homomorphisms.
    by Nan Jiang, Satinder Singh, and Richard Lewis.
    In 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2014.
    pdf.

  • Utility Maximization and Bounds on Human Information Processing.
    by Andrew Howes, Richard L Lewis, and Satinder Singh.
    In Topics in Cognitive Science, Volume 6, Issue 2, pages 198-203, 2014.
    pdf.

  • Computational Rationality: Linking Mechanism and Behavior Through Utility Maximization.
    by Richard L Lewis, Andrew Howes, and Satinder Singh.
    In Topics in Cognitive Science, Volume 6, Issue 2, pages 279-311, 2014.
    pdf.

  • Reward Mapping for Transfer in Long-Lived Agents.
    by Xiaoxiao Guo, Satinder Singh, and Richard L Lewis.
    In Advances in Neural Information Processing Systems 26 (NIPS), 2013.
    pdf.

  • The adaptive nature of eye-movements in linguistic tasks: How payoff and architecture shape speed-accuracy tradeoffs.
    by Richard L Lewis, Michael Shvartsman, and Satinder Singh.
    In Topics in Cognitive Science, Vol. 5, Issue 3, pages 581-610, 2013.
    pdf.

  • Linking Context to Evaluation in the Design of Safety Critical Interfaces.
    by Michael Feary, Dorritt Billman, Xiuli Chen, Andrew Howes, Richard Lewis, Lance Sherry, and Satinder Singh.
    In Proceedings of Human-Computer Interaction International, 2013.
    pdf.

  • An Exploration of Low-Rank Spectral Learning.
    by Alex Kulesza, Raj Rao Nadakuditi, and Satinder Singh.
    In ICML Workshop on Spectral Learning, 2013.
    pdf.

  • Optimal Rewards in Multiagent Teams
    by Bingyao Liu, Satinder Singh, Richard L. Lewis, and Shiyin Qin.
    In IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2012.
    pdf.

  • Strong Mitigation: Nesting Search for Good Policies within Search for Good Reward
    by Jeshua Bratman, Satinder Singh, Richard Lewis, and Jonathan Sorg.
    In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
    pdf.

  • Planning Delayed-Response Queries and Transient Policies under Reward Uncertainty
    by Rob Cohn, Edmund Durfee and Satinder Singh.
    In Proceedings of the Seventh Annual Workshop on Multiagent Sequential Decision-Making Under Uncertainty (MSDM), held in conjunction with AAMAS, 2012.
    pdf.

  • Planning and Evaluating Multiagent Influences Under Reward Uncertainty (Extended Abstract)
    by Stefan Witwicki, Inn-Tung Chen, Edmund Durfee and Satinder Singh.
    In 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
    pdf.

  • Learning to Make Predictions in Partially Observable Environments without a Generative Model
    by Erik Talvitie and Satinder Singh.
    In Journal of Artificial Intelligence Research, vol 42, pages 353-392, 2011.
    pdf.

  • Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI), 2011.
    pdf.

  • Reward Design via Online Gradient Ascent
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Neural Information Processing Systems (NIPS), 2010.
    pdf.

  • Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective
    by Satinder Singh, Richard Lewis, Andrew Barto, and Jonathan Sorg.
    In IEEE Transactions on Autonomous Mental Development, Vol 2, No 2, 2010.
    pdf

  • Modeling Multiple-mode Systems with Predictive State Representations
    by Britton Wolfe, Michael James and Satinder Singh.
    In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, 2010.
    pdf

  • Variance-Based Rewards for Approximate Bayesian Reinforcement Learning
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010.
    pdf

  • Internal Rewards Mitigate Agent Boundedness
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
    pdf

  • Linear Options
    by Jonathan Sorg and Satinder Singh.
    In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010.
    (Finalist for Pragnesh Jay Modi Best Student Paper Award)
    pdf

  • Transfer via Soft Homomorphisms
    by Jonathan Sorg and Satinder Singh.
    In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
    pdf

  • SarsaLandmark: an Algorithm for Learning in POMDPs with Landmarks
    by Michael R. James and Satinder Singh.
    In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
    pdf

  • Where Do Rewards Come From?
    by Satinder Singh, Richard L. Lewis and Andrew G. Barto.
    In Proceedings of the Annual Conference of the Cognitive Science Society (CogSci), 2009.
    pdf

  • Maintaining Predictions Over Time Without a Model
    by Erik Talvitie and Satinder Singh.
    In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009.
    pdf

  • Simple Local Models for Complex Dynamical Systems
    by Erik Talvitie and Satinder Singh.
    In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS), 2008.
    pdf

  • Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State
    by David Wingate and Satinder Singh.
    In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 1176-1183, 2008.
    pdf

  • Building Incomplete but Accurate Models
    by Erik Talvitie, Britton Wolfe and Satinder Singh.
    In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
    pdf

  • Predictive Linear-Gaussian Models of Stochastic Dynamical Systems with Vector-Valued Actions and Observations
    by Matthew Rudary and Satinder Singh.
    In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
    pdf

  • Approximate Predictive State Representations
    by Britton Wolfe, Michael R. James and Satinder Singh.
    In Proceedings of the 2008 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.
    (Finalist for Pragnesh Jay Modi Best Student Paper Award)
    pdf

  • On Discovery and Learning of Models with Predictive Representations of State for Agents with Continuous Actions and Observations
    by David Wingate and Satinder Singh.
    In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.
    pdf

  • Relational Knowledge with Predictive State Representations
    by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh.
    In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2035-2040, 2007.
    pdf

  • An Experts Algorithm for Transfer Learning
    by Erik Talvitie and Satinder Singh.
    In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 1065-1070, 2007.
    pdf

  • Exponential Family Predictive Representations of State
    by David Wingate and Satinder Singh.
    In Advances in Neural Information Processing Systems 20 (NIPS), pages 1617-1624, 2007.
    pdf

  • Cobot in LambdaMOO: An Adaptive Social Statistics Agent
    by Charles Isbell, Michael Kearns, Satinder Singh, Christian Shelton, Peter Stone and Dave Kormann.
    In Journal of Autonomous Agents and Multi-Agent Systems, 13(3), pages 327-354, 2006.
    pdf

  • Mixtures of Predictive Linear Gaussian Models for Nonlinear Stochastic Dynamical Systems
    by David Wingate and Satinder Singh.
    In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
    pdf

  • Using Homomorphisms to Transfer Options Across Reinforcement Learning Domains
    by Vishal Soni and Satinder Singh.
    In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
    pdf

  • Kernel Predictive Linear-Gaussian Models for Nonlinear Stochastic Dynamical Systems
    by David Wingate and Satinder Singh.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1017-1024, 2006.
    pdf

  • Predictive linear-Gaussian models of controlled stochastic dynamical systems
    by Matthew Rudary and Satinder Singh.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 777-784, 2006.
    pdf

  • Predictive State Representations with Options
    by Britton Wolfe and Satinder Singh.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1025-1032, 2006.
    pdf

  • Optimal Coordinated Planning Amongst Self-Interested Agents with Private State
    by Ruggiero Cavallo, David C. Parkes and Satinder Singh.
    In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006.
    pdf

  • Reinforcement Learning of Hierarchical Skills on the Sony Aibo Robot
    by Vishal Soni and Satinder Singh.
    In Proceedings of the 5th International Conference on Development and Learning (ICDL), 2006.
    pdf

  • Off-policy Learning with Options and Recognizers
    by Doina Precup, Richard Sutton, Cosmin Paduraru, Anna Koop and Satinder Singh.
    In Proceedings of Advances in Neural Information Processing Systems 18 (NIPS), pages 1097-1104, 2006.
    pdf

  • Intrinsically Motivated Reinforcement Learning
    by Satinder Singh, Andrew G. Barto and Nuttapong Chentanez.
    In Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), pages 1281-1288, 2005.
    pdf

  • Approximately Efficient Online Mechanism Design
    by David Parkes, Satinder Singh and Dimah Yanovsky.
    In Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), pages 1049-1056, 2005.
    pdf

  • Predictive linear-Gaussian models of stochastic dynamical systems
    by Matthew Rudary, Satinder Singh and David Wingate.
    In Proceedings of the Uncertainty in Artificial Intelligence (UAI), pages 501-508, 2005.
    pdf

  • Combining Memory and Landmarks with Predictive State Representations
    by Michael R. James, Britton Wolfe and Satinder Singh.
    In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005.
    pdf

  • Learning Payoff Functions in Infinite Games
    by Yevgeniy Vorobeychik, Michael Wellman and Satinder Singh.
    In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005
    pdf
    (An expanded version was later published in the Machine Learning Journal; pdf)

  • Planning in Models that Combine Memory with Predictive Representations of State
    by Michael R. James and Satinder Singh.
    In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), pages 987-992, 2005.
    pdf

  • Learning Predictive State Representations in Dynamical Systems Without Reset
    by Britton Wolfe, Michael R. James and Satinder Singh.
    In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.
    pdf

  • Intrinsically Motivated Learning of Hierarchical Collections of Skills
    by Andrew G. Barto, Satinder Singh, and Nuttapong Chentanez.
    In Proceedings of International Conference on Developmental Learning (ICDL), 2004.
    pdf

  • Predictive State Representations: A New Theory for Modeling Dynamical Systems
    by Satinder Singh, Michael R. James and Matthew R. Rudary.
    In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI), pages 512-519, 2004.
    pdf

  • Planning with Predictive State Representations
    by Michael R. James, Satinder Singh and Michael Littman.
    In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 304-311, 2004.
    pdf

  • Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset
    by Michael James and Satinder Singh.
    In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 417-424, 2004.
    pdf

  • Adaptive Cognitive Orthotics: Combining Reinforcement Learning and Constraint-Based Temporal Reasoning
    by Matthew Rudary, Satinder Singh and Martha Pollack.
    In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 719-726, 2004.
    pdf

  • A Nonlinear Predictive State Representation
    by Matthew Rudary and Satinder Singh.
    In Advances in Neural Information Processing Systems 16 (NIPS), pages 855-862, 2004.
    pdf

  • Learning Predictive State Representations
    by Satinder Singh, Michael Littman, Nicholas Jong, David Pardoe and Peter Stone.
    In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 712-719, 2003.
    pdf

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System
    by Satinder Singh, Diane Litman, Michael Kearns and Marilyn Walker.
    In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.
    pdf

  • CobotDS: A Spoken Dialogue System for Chat
    by Michael Kearns, Charles Isbell, Satinder Singh, Diane Litman, and J. Howe.
    In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), pages 425-430, 2002.
    pdf

  • Near-Optimal Reinforcement Learning in Polynomial Time
    by Michael Kearns and Satinder Singh.
    In Machine Learning journal, Volume 49, Issue 2, pages 209-232, 2002.
    (A shorter version appears in ICML 1998.)
    pdf

  • Predictive Representations of State
    by Michael Littman, Richard Sutton and Satinder Singh.
    In Advances in Neural Information Processing Systems 14 (NIPS), pages 1555-1561, 2002.
    pdf

  • Cobot: A Social Reinforcement Learning Agent
    by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone.
    In Advances in Neural Information Processing Systems 14 (NIPS), pages 1393-1400, 2002.
    pdf

  • A Social Reinforcement Learning Agent
    by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone.
    In Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS), pages 377-384, 2001.
    Winner of Best Paper Award.
    pdf

  • Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System
    by Satinder Singh, Michael Kearns, Diane Litman, and Marilyn Walker.
    In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI), pages 645-651, 2000.
    pdf

  • Automatic Optimization of Dialogue Management
    by Diane Litman, Michael Kearns, Satinder Singh and Marilyn Walker.
    In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pages 502-508, 2000.
    pdf

  • Eligibility Traces for Off-Policy Policy Evaluation
    by Doina Precup, Richard Sutton, and Satinder Singh.
    In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 759-766, 2000.
    pdf

  • "Bias-Variance" Error Bounds for Temporal Difference Updates
    by Michael Kearns and Satinder Singh.
    In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT), pages 142-147, 2000.
    pdf

  • Reinforcement Learning for Spoken Dialogue Systems
    by Satinder Singh, Michael Kearns, Diane Litman and Marilyn Walker.
    In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
    pdf

  • Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
    by Satinder Singh, Tommi Jaakkola, Michael Littman, and Csaba Szepesvari.
    In Machine Learning Journal, vol 38(3), pages 287-308, 2000.
    pdf

  • Policy Gradient Methods for Reinforcement Learning with Function Approximation
    by Richard Sutton, Dave McAllester, Satinder Singh and Yishay Mansour.
    In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
    pdf

  • Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
    by Richard Sutton, Doina Precup and Satinder Singh.
    In Artificial Intelligence Journal, Volume 112, pages 181-211, 1999.
    pdf

  • Approximate Planning for Factored POMDPs using Belief State Simplification
    by Dave McAllester and Satinder Singh.
    In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 409-416, 1999.
    pdf

  • On the Complexity of Policy Iteration
    by Yishay Mansour and Satinder Singh.
    In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 401-408, 1999.
    pdf

  • Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
    by Michael Kearns and Satinder Singh.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 996-1002, 1999.
    pdf

  • Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
    by John K. Williams and Satinder Singh.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 1073-1079, 1999.
    pdf

  • Optimizing admission control while ensuring quality of service in multimedia networks via reinforcement learning
    by Timothy Brown, Hong Tong, and Satinder Singh.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 982-988, 1999.
    pdf

  • Improved switching among temporally abstract actions
    by Richard Sutton, Satinder Singh, Doina Precup and Balaraman Ravindran.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 1066-1072, 1999.
    pdf

  • Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
    by John Loch and Satinder Singh.
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 323-331, 1998.
    pdf

  • Near-Optimal Reinforcement Learning in Polynomial Time
    by Michael Kearns and Satinder Singh.
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 260-268, 1998.
    pdf

  • Intra-Option Learning about Temporally Abstract Actions
    by Richard Sutton, Doina Precup and Satinder Singh.
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 556-564, 1998.
    pdf

  • Theoretical Results on Reinforcement Learning with Temporally Abstract Behaviors
    by Doina Precup, Richard Sutton, and Satinder Singh.
    In Proceedings of the 10th European Conference on Machine Learning (ECML), pages 382-393. 1998.
    pdf

  • Analytical Mean Squared Error Curves for Temporal Difference Learning
    by Satinder Singh and Peter Dayan.
    In Machine Learning Journal, Volume 32, Issue 1, pages 5-40, 1998.
    pdf.
    A shorter version appears in the NIPS 9 Proceedings.

  • Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
    by Satinder Singh and Dimitri Bertsekas.
    In Advances in Neural Information Processing Systems 9 (NIPS), pages 974-980, 1997.
    pdf

  • Analytical Mean Squared Error Curves for Temporal Difference Learning
    by Satinder Singh and Peter Dayan.
    In Advances in Neural Information Processing Systems 9 (NIPS), pages 1054-1060, 1997.
    pdf

  • Reinforcement Learning with Replacing Eligibility Traces
    by Satinder Singh and Richard Sutton.
    In Machine Learning journal, Volume 22, Issue 1, pages 123-158, 1996.
    pdf abstract

  • Learning Curve Bounds for Markov Decision Processes with Undiscounted Rewards
    by Lawrence Saul and Satinder Singh.
    In Proceedings of 9th Annual Conference on Computational Learning Theory (COLT), pages 147-156, 1996.
    pdf

  • Long Term Potentiation, Navigation and Dynamic Programming
    by Peter Dayan and Satinder Singh.
    In Proceedings of the Computation and Neural Systems Meeting (CNS), 1996.
    pdf

  • Improving Policies Without Measuring Merits
    by Peter Dayan and Satinder Singh.
    In Advances in Neural Information Processing Systems 8 (NIPS), pages 1059-1065, 1996.
    pdf

  • Markov Decision Processes in Large State Spaces
    by Lawrence Saul and Satinder Singh.
    In Proceedings of 8th Annual Workshop on Computational Learning Theory (COLT), pages 281-288, 1995.
    pdf

  • Learning to Act using Real-Time Dynamic Programming
    by Andrew Barto, Steve Bradtke and Satinder Singh.
    In Artificial Intelligence, Volume 72, pages 81-138, 1995.
    pdf

  • On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
    by Tommi Jaakkola, Michael Jordan and Satinder Singh.
    In Neural Computation, Volume 6, Number 6, pages 1185-1201, 1994.
    pdf

  • Reinforcement Learning With Soft State Aggregation
    by Satinder Singh, Tommi Jaakkola and Michael Jordan.
    In Advances in Neural Information Processing Systems 7 (NIPS), pages 361-368, 1995.
    pdf

  • Stochastic Convergence of Iterative DP Algorithms
    by Tommi Jaakkola, Michael Jordan and Satinder Singh.
    In Advances in Neural Information Processing Systems 6 (NIPS), pages 703-710, 1994.
    pdf

  • Reinforcement Learning Algorithm for Partially Observable Markov Problems
    by Tommi Jaakkola, Satinder Singh and Michael Jordan.
    In Advances in Neural Information Processing Systems 7 (NIPS), pages 345-352, 1995.
    pdf

  • Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
    by Satinder Singh.
    In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI), pages 700-705, 1994.
    pdf

  • Learning Without State-Estimation in Partially Observable Markovian Decision Processes
    by Satinder Singh, Tommi Jaakkola and Michael Jordan.
    In Machine Learning: Proceedings of the Eleventh International Conference (ICML), pages 284-292, 1994.
    pdf

  • Robust Reinforcement Learning in Motion Planning
    by Satinder Singh, Andrew Barto, Roderic Grupen, and Christopher Connolly.
    In Advances in Neural Information Processing Systems 6 (NIPS), pages 655-662, 1994.
    pdf

  • An Upper Bound on the Loss from Approximate Optimal-Value Functions
    by Satinder Singh and Richard Yee.
    In Machine Learning, Volume 16, Issue 3, pages 227-233, 1994.
    pdf

  • Reinforcement Learning with a Hierarchy of Abstract Models
    by Satinder Singh.
    In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI), pages 202-207, 1992.
    pdf

  • Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models
    by Satinder Singh.
    In Proceedings of the Ninth Machine Learning Conference, pages 406-415, 1992.
    pdf

  • Transfer of Learning by Composing Solutions of Elemental Sequential Tasks
    by Satinder Singh.
    In Machine Learning Journal, Volume 8, Issue 3, pages 323-339, 1992.
    pdf

  • The Efficient Learning of Multiple Task Sequences
    by Satinder Singh.
    In Advances in Neural Information Processing Systems 4 (NIPS), pages 251-258, 1992.
    pdf

  • Transfer of Learning Across Compositions of Sequential Tasks
    by Satinder Singh.
    In Machine Learning: Proceedings of the Eighth International Workshop, pages 348-352, 1991.
    pdf

    Refereed Workshop Papers

  • Hierarchical Optimal Control of MDPs
    by Amy McGovern, Doina Precup, Balaraman Ravindran, Satinder Singh and Richard Sutton.
    In Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, 1998.
    pdf

  • Planning with Closed-Loop Macro Actions
    by Doina Precup, Richard Sutton and Satinder Singh.
    In Proceedings of AAAI Fall Symposium on Model-directed Autonomous Systems, 1997.
    pdf

  • On Step-Size and Bias in Temporal-Difference Learning
    by Richard Sutton and Satinder Singh.
    In Proceedings of Eighth Yale Workshop on Adaptive and Learning Systems, 1994.
    pdf abstract

  • Reinforcement Learning and Dynamic Programming
    by Andrew Barto and Satinder Singh.
    In Proceedings of Sixth Yale Workshop on Adaptive and Learning Systems, 1990.

    Magazine Articles, Book Chapters and Others

  • Reinforcement Learning for 3 vs. 2 Keepaway
    by Peter Stone, Richard Sutton and Satinder Singh.
    In RoboCup-2000: Robot Soccer World Cup IV, P. Stone, T. Balch, and G. Kraetzschmar, Eds., Springer Verlag.
    pdf.
    An earlier version appeared in the Proceedings of the RoboCup-2000 Workshop, Melbourne, Australia

  • Soft Dynamic Programming Algorithms: Convergence Proofs
    by Satinder Singh.
    In Proceedings of Workshop on Computational Learning and Natural Learning (CLNL), Provincetown, Massachusetts, 1993.
    pdf

  • On the Computational Economics of Reinforcement Learning
    by Andrew Barto and Satinder Singh.
    In Proceedings of Connectionist Summer School, 1990.
    pdf

    My one paper in a non-technical journal!

  • How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning
    by Satinder Singh, Peter Norvig and David Cohn.
    In Dr. Dobb's Journal, March issue, 1997.
    pdf
    [html version]

    An Almost Tutorial on RL (extracted from my Thesis)

  • An (Almost) Tutorial on Reinforcement Learning
    gzipped postscript. Extracted from my 1993 thesis.

    Going Nowhere Papers

  • Asynchronous Modified Policy Iteration with Single-sided Updates
    by Satinder Singh and Vijay Gullapalli. Working Paper, 1993.
    pdf