Satinder Singh's Home Page

Reinforcement Learning Publications

Improving Predictive State Representations via Gradient Descent.
by Nan Jiang, Alex Kulesza, and Satinder Singh.
In Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.
pdf.

Action-Conditional Video Prediction Using Deep Networks in ATARI Games.
by Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh.
In Neural Information Processing Systems, 2015.
online videos
arxiv pdf, NIPS pdf, NIPS Appendix pdf.

Abstraction Selection in Model-Based Reinforcement Learning.
by Nan Jiang, Alex Kulesza, and Satinder Singh.
In 32nd International Conference on Machine Learning (ICML), 2015.
pdf.

The Dependence of Effective Planning Horizon on Model Accuracy.
by Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis.
In International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2015.
Best Paper Award
pdf.

Low-Rank Spectral Learning with Weighted Loss Functions.
by Alex Kulesza, Nan Jiang, and Satinder Singh.
In Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
pdf.

Spectral Learning of Predictive State Representations with Insufficient Statistics.
by Alex Kulesza, Nan Jiang, and Satinder Singh.
In Twenty-Ninth AAAI Conference, 2015.
pdf.

Optimal Rewards for Cooperative Agents.
by Bingyao Liu, Satinder Singh, Richard Lewis, and Shiyin Qin.
In IEEE Transactions on Autonomous Mental Development, Vol 6, Issue 4, 2014.
pdf.

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning.
by Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, and Xiaoshi Wang.
In Neural Information Processing Systems (NIPS), 2014.
pdf.

Computationally Rational Saccadic Control: An Explanation of Spillover Effects Based on Sampling from Noisy Perception and Memory.
by Michael Shvartsman, Richard L Lewis, and Satinder Singh.
In Cognitive Modeling and Computational Linguistics (CMCL), 2014.
pdf.

The Potential Impact of Intelligent Systems for Mobile Health Self-Management Support: Monte-Carlo Simulations of Text Message Support for Medication Adherence.
by John Piette, Karen Farris, Sean Newman, Larry An, Jeremy Sussman, and Satinder Singh.
In Annals of Behavioral Medicine, 2014.
pdf.

Low-Rank Spectral Learning.
by Alex Kulesza, Nadakuditi Raj Rao, and Satinder Singh.
In Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
pdf.

Improving UCT Planning via Approximate Homomorphisms.
by Nan Jiang, Satinder Singh, and Richard Lewis.
In 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2014.
pdf.

Utility Maximization and Bounds on Human Information Processing.
by Andrew Howes, Richard L Lewis, and Satinder Singh.
In Topics in Cognitive Science, Volume 6, Issue 2, pages 198-203, 2014.
pdf.

Computational Rationality: Linking Mechanism and Behavior Through Utility Maximization.
by Richard L Lewis, Andrew Howes, and Satinder Singh.
In Topics in Cognitive Science, Volume 6, Issue 2, pages 279-311, 2014.
pdf.

Reward Mapping for Transfer in Long-Lived Agents.
by Xiaoxiao Guo, Satinder Singh, and Richard L Lewis.
In Advances in Neural Information Processing Systems (NIPS), 26, 2013.
pdf.

The adaptive nature of eye-movements in linguistic tasks: How payoff and architecture shape speed-accuracy tradeoffs.
by Richard L Lewis, Michael Shvartsman, and Satinder Singh.
In Topics in Cognitive Science, Vol. 5, Issue 3, pages 581-610, 2013.
pdf.

Linking Context to Evaluation in the Design of Safety Critical Interfaces.
by Michael Feary, Dorrit Billman, Xiuli Chen, Andrew Howes, Richard Lewis, Lance Sherry, and Satinder Singh.
In Proceedings of Human-Computer Interaction International, 2013.
pdf.

An Exploration of Low-Rank Spectral Learning.
by Alex Kulesza, Nadakuditi Raj Rao, and Satinder Singh.
In ICML Workshop on Spectral Learning, 2013.
pdf.

Optimal Rewards in Multiagent Teams
by Bingyao Liu, Satinder Singh, Richard L. Lewis, and Shiyin Qin.
In International Conference on Development and Learning-EpiRob, 2012.
pdf.

Strong Mitigation: Nesting Search for Good Policies within Search for Good Reward
by Jeshua Bratman, Satinder Singh, Richard Lewis, and Jonathan Sorg.
In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
pdf.

Planning Delayed-Response Queries and Transient Policies under Reward Uncertainty
by Rob Cohn, Edmund Durfee and Satinder Singh.
In Proceedings of the Seventh Annual Workshop on Multiagent Sequential Decision-Making Under Uncertainty (MSDM), held in conjunction with AAMAS, 2012.
pdf.

Planning and Evaluating Multiagent Influences Under Reward Uncertainty (Extended Abstract)
by Stefan Witwicki, Inn-Tung Chen, Edmund Durfee and Satinder Singh.
In 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
pdf.

Learning to Make Predictions in Partially Observable Environments without a Generative Model
by Erik Talvitie and Satinder Singh.
In Journal of Artificial Intelligence Research, vol 42, pages 353-392, 2011.
pdf.

Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents
by Jonathan Sorg, Satinder Singh, and Richard Lewis.
In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI), 2011.
pdf.

Reward Design via Online Gradient Ascent
by Jonathan Sorg, Satinder Singh, and Richard Lewis.
In Neural Information Processing Systems (NIPS), 2010.
pdf.

Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective
by Satinder Singh, Richard Lewis, Andrew Barto, and Jonathan Sorg.
In IEEE Transactions on Autonomous Mental Development, Vol 2, No 2, 2010.
pdf

Modeling Multiple-mode Systems with Predictive State Representations
by Britton Wolfe, Michael James and Satinder Singh.
In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, 2010.
pdf

Variance-Based Rewards for Approximate Bayesian Reinforcement Learning
by Jonathan Sorg, Satinder Singh, and Richard Lewis.
In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010.
pdf

Internal Rewards Mitigate Agent Boundedness
by Jonathan Sorg, Satinder Singh, and Richard Lewis.
In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
pdf

Linear Options
by Jonathan Sorg and Satinder Singh.
In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010.
(Finalist for Pragnesh Jay Modi Best Student Paper Award)
pdf

Transfer via Soft Homomorphisms
by Jonathan Sorg and Satinder Singh.
In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
pdf

SarsaLandmark: an Algorithm for Learning in POMDPs with Landmarks
by Michael R. James and Satinder Singh.
In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
pdf

Where Do Rewards Come From?
by Satinder Singh, Richard L. Lewis and Andrew G. Barto.
In Proceedings of the Annual Conference of the Cognitive Science Society (CogSci), 2009.
pdf

Maintaining Predictions Over Time Without a Model
by Erik Talvitie and Satinder Singh.
In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009.
pdf

Simple Local Models for Complex Dynamical Systems
by Erik Talvitie and Satinder Singh.
In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS), 2008.
pdf

Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State
by David Wingate and Satinder Singh.
In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 1176-1183, 2008.
pdf

Building Incomplete but Accurate Models
by Erik Talvitie, Britton Wolfe and Satinder Singh.
In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
pdf

Predictive Linear-Gaussian Models of Stochastic Dynamical Systems with Vector-Valued Actions and Observations
by Matthew Rudary and Satinder Singh.
In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
pdf

Approximate Predictive State Representations
by Britton Wolfe, Michael R. James and Satinder Singh.
In Proceedings of the 2008 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.
(Finalist for Pragnesh Jay Modi Best Student Paper Award)
pdf

On Discovery and Learning of Models with Predictive Representations of State for Agents with Continuous Actions and Observations
by David Wingate and Satinder Singh.
In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.
pdf

Relational Knowledge with Predictive State Representations
by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh.
In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2035-2040, 2007.
pdf

An Experts Algorithm for Transfer Learning
by Erik Talvitie and Satinder Singh.
In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 1065-1070, 2007.
pdf

Exponential Family Predictive Representations of State
by David Wingate and Satinder Singh.
In Proceedings of the Advances in Neural Information Processing Systems, 20 (NIPS), pages 1617-1624, 2007.
pdf

Cobot in LambdaMOO: An Adaptive Social Statistics Agent
by Charles Isbell, Michael Kearns, Satinder Singh, Christian Shelton, Peter Stone and Dave Kormann.
In Journal of Autonomous Agents and Multi-Agent Systems, 13(3), pages 327-354, 2006.
pdf

Mixtures of Predictive Linear Gaussian Models for Nonlinear Stochastic Dynamical Systems
by David Wingate and Satinder Singh.
In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
pdf

Using Homomorphisms to Transfer Options Across Reinforcement Learning Domains
by Vishal Soni and Satinder Singh.
In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
pdf

Kernel Predictive Linear-Gaussian Models for Nonlinear Stochastic Dynamical Systems
by David Wingate and Satinder Singh.
In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1017-1024, 2006.
pdf

Predictive linear-Gaussian models of controlled stochastic dynamical systems
by Matthew Rudary and Satinder Singh.
In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 777-784, 2006.
pdf

Predictive State Representations with Options
by Britton Wolfe and Satinder Singh.
In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1025-1032, 2006.
pdf

Optimal Coordinated Planning Amongst Self-Interested Agents with Private State
by Ruggiero Cavallo, David C. Parkes and Satinder Singh.
In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006.
pdf

Reinforcement Learning of Hierarchical Skills on the Sony Aibo Robot
by Vishal Soni and Satinder Singh.
In Proceedings of the 5th International Conference on Development and Learning (ICDL), 2006.
pdf

Off-policy Learning with Options and Recognizers
by Doina Precup, Richard Sutton, Cosmin Paduraru, Anna Koop and Satinder Singh.
In Proceedings of Advances in Neural Information Processing Systems 18 (NIPS), pages 1097-1104, 2006.
pdf

Intrinsically Motivated Reinforcement Learning
by Satinder Singh, Andrew G. Barto and Nuttapong Chentanez.
In Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), pages 1281-1288, 2005.
pdf

Approximately Efficient Online Mechanism Design
by David Parkes, Satinder Singh and Dimah Yanovsky.
In Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), pages 1049-1056, 2005.
pdf

Predictive linear-Gaussian models of stochastic dynamical systems
by Matthew Rudary, Satinder Singh and David Wingate.
In Proceedings of the Uncertainty in Artificial Intelligence (UAI), pages 501-508, 2005.
pdf

Combining Memory and Landmarks with Predictive State Representations
by Michael R. James, Britton Wolfe and Satinder Singh.
In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005.
pdf

Learning Payoff Functions in Infinite Games
by Yevgeniy Vorobeychik, Michael Wellman and Satinder Singh.
In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005.
pdf
(An expanded version was later published in the Machine Learning Journal; pdf)

Planning in Models that Combine Memory with Predictive Representations of State
by Michael R. James and Satinder Singh.
In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), pages 987-992, 2005.
pdf

Learning Predictive State Representations in Dynamical Systems Without Reset
by Britton Wolfe, Michael R. James and Satinder Singh.
In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.
pdf

Intrinsically Motivated Learning of Hierarchical Collections of Skills
by Andrew G. Barto, Satinder Singh, and Nuttapong Chentanez.
In Proceedings of International Conference on Developmental Learning (ICDL), 2004.
pdf

Predictive State Representations: A New Theory for Modeling Dynamical Systems
by Satinder Singh, Michael R. James and Matthew R. Rudary.
In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI), pages 512-519, 2004.
pdf

Planning with Predictive State Representations
by Michael R. James, Satinder Singh and Michael Littman.
In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 304-311, 2004.
pdf

Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset
by Michael James and Satinder Singh.
In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 417-424, 2004.
pdf

Adaptive Cognitive Orthotics: Combining Reinforcement Learning and Constraint-Based Temporal Reasoning
by Matthew Rudary, Satinder Singh and Martha Pollack.
In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 719-726, 2004.
pdf

A Nonlinear Predictive State Representation
by Matthew Rudary and Satinder Singh.
In Advances in Neural Information Processing Systems 16 (NIPS), pages 855-862, 2004.
pdf

Learning Predictive State Representations
by Satinder Singh, Michael Littman, Nicholas Jong, David Pardoe and Peter Stone.
In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 712-719, 2003.
pdf

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System
by Satinder Singh, Diane Litman, Michael Kearns and Marilyn Walker.
In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.
pdf

CobotDS: A Spoken Dialogue System for Chat
by Michael Kearns, Charles Isbell, Satinder Singh, Diane Litman, and J. Howe.
In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), pages 425-430, 2002.
pdf

Near-Optimal Reinforcement Learning in Polynomial Time
by Michael Kearns and Satinder Singh.
In Machine Learning journal, Volume 49, Issue 2, pages 209-232, 2002.
(A shorter version appears in ICML 1998.)
pdf

Predictive Representations of State
by Michael Littman, Richard Sutton and Satinder Singh.
In Advances in Neural Information Processing Systems 14 (NIPS), pages 1555-1561, 2002.
pdf

Cobot: A Social Reinforcement Learning Agent
by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone.
In Advances in Neural Information Processing Systems 14 (NIPS), pages 1393-1400, 2002.
pdf

A Social Reinforcement Learning Agent
by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone.
In Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS), pages 377-384, 2001.
Winner of Best Paper Award.
pdf

Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System
by Satinder Singh, Michael Kearns, Diane Litman, and Marilyn Walker.
In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI), pages 645-651, 2000.
pdf

Automatic Optimization of Dialogue Management
by Diane Litman, Michael Kearns, Satinder Singh and Marilyn Walker.
In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pages 502-508, 2000.
pdf

Eligibility Traces for Off-Policy Policy Evaluation
by Doina Precup, Richard Sutton, and Satinder Singh.
In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 759-766, 2000.
pdf

"Bias-Variance" Error Bounds for Temporal Difference Updates
by Michael Kearns and Satinder Singh.
In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT), pages 142-147, 2000.
pdf

Reinforcement Learning for Spoken Dialogue Systems
by Satinder Singh, Michael Kearns, Diane Litman and Marilyn Walker.
In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
pdf

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
by Satinder Singh, Tommi Jaakkola, Michael Littman, and Csaba Szepesvari.
In Machine Learning Journal, vol 38(3), pages 287-308, 2000.
pdf

Policy Gradient Methods for Reinforcement Learning with Function Approximation
by Richard Sutton, Dave McAllester, Satinder Singh and Yishay Mansour.
In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
pdf

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
by Richard Sutton, Doina Precup and Satinder Singh.
In Artificial Intelligence Journal, Volume 112, pages 181-211, 1999.
pdf

Approximate Planning for Factored POMDPs using Belief State Simplification
by Dave McAllester and Satinder Singh.
In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 409-416, 1999.
pdf

On the Complexity of Policy Iteration
by Yishay Mansour and Satinder Singh.
In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 401-408, 1999.
pdf

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
by Michael Kearns and Satinder Singh.
In Advances in Neural Information Processing Systems 11 (NIPS), pages 996-1002, 1999.
pdf

Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
by John K. Williams and Satinder Singh.
In Advances in Neural Information Processing Systems 11 (NIPS), pages 1073-1079, 1999.
pdf

Optimizing admission control while ensuring quality of service in multimedia networks via reinforcement learning
by Timothy Brown, Hui Tong, and Satinder Singh.
In Advances in Neural Information Processing Systems 11 (NIPS), pages 982-988, 1999.
pdf

Improved switching among temporally abstract actions
by Richard Sutton, Satinder Singh, Doina Precup and Balaraman Ravindran.
In Advances in Neural Information Processing Systems 11 (NIPS), pages 1066-1072, 1999.
pdf

Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
by John Loch and Satinder Singh.
In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 323-331, 1998.
pdf

Near-Optimal Reinforcement Learning in Polynomial Time
by Michael Kearns and Satinder Singh.
In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 260-268, 1998.
pdf

Intra-Option Learning about Temporally Abstract Actions
by Richard Sutton, Doina Precup and Satinder Singh.
In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 556-564, 1998.
pdf

Theoretical Results on Reinforcement Learning with Temporally Abstract Behaviors
by Doina Precup, Richard Sutton, and Satinder Singh.
In Proceedings of the 10th European Conference on Machine Learning (ECML), pages 382-393, 1998.
pdf

Analytical Mean Squared Error Curves for Temporal Difference Learning
by Satinder Singh and Peter Dayan.
In Machine Learning Journal, Volume 32, Issue 1, pages 5-40, 1998.
pdf.
A shorter version appears in the NIPS 9 Proceedings.

Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
by Satinder Singh and Dimitri Bertsekas.
In Advances in Neural Information Processing Systems 9 (NIPS), pages 974-980, 1997.
pdf

Analytical Mean Squared Error Curves for Temporal Difference Learning
by Satinder Singh and Peter Dayan.
In Advances in Neural Information Processing Systems 9 (NIPS), pages 1054-1060, 1997.
pdf

Reinforcement Learning with Replacing Eligibility Traces
by Satinder Singh and Richard Sutton.
In Machine Learning journal, Volume 22, Issue 1, pages 123-158, 1996.
pdf abstract

Learning Curve Bounds for Markov Decision Processes with Undiscounted Rewards
by Lawrence Saul and Satinder Singh.
In Proceedings of 9th Annual Conference on Computational Learning Theory (COLT), pages 147-156, 1996.
pdf

Long Term Potentiation, Navigation and Dynamic Programming
by Peter Dayan and Satinder Singh.
In Proceedings of Computation and Neural Systems Meeting (CNS) 1996.
pdf

Improving Policies Without Measuring Merits
by Peter Dayan and Satinder Singh.
In Advances in Neural Information Processing Systems 8 (NIPS), pages 1059-1065, 1996.
pdf

Markov Decision Processes in Large State Spaces
by Lawrence Saul and Satinder Singh.
In Proceedings of 8th Annual Workshop on Computational Learning Theory (COLT), pages 281-288, 1995.
pdf

Learning to Act using Real-Time Dynamic Programming
by Andrew Barto, Steve Bradtke and Satinder Singh.
In Artificial Intelligence, Volume 72, pages 81-138, 1995.
pdf

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
by Tommi Jaakkola, Michael Jordan and Satinder Singh.
In Neural Computation, Volume 6, Number 6, pages 1185-1201, 1994.
pdf

Reinforcement Learning With Soft State Aggregation
by Satinder Singh, Tommi Jaakkola and Michael Jordan.
In Advances in Neural Information Processing Systems 7 (NIPS), pages 361-368, 1995.
pdf

Stochastic Convergence of Iterative DP Algorithms
by Tommi Jaakkola, Michael Jordan and Satinder Singh.
In Advances in Neural Information Processing Systems 6 (NIPS), pages 703-710, 1994.
pdf

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
by Tommi Jaakkola, Satinder Singh and Michael Jordan.
In Advances in Neural Information Processing Systems 7 (NIPS), pages 345-352, 1995.
pdf

Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
by Satinder Singh.
In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI), pages 700-705, 1994.
pdf

Learning Without State-Estimation in Partially Observable Markovian Decision Processes
by Satinder Singh, Tommi Jaakkola and Michael Jordan.
In Machine Learning: Proceedings of the Eleventh International Conference (ICML), pages 284-292, 1994.
pdf

Robust Reinforcement Learning in Motion Planning
by Satinder Singh, Andrew Barto, Roderic Grupen, and Christopher Connolly.
In Advances in Neural Information Processing Systems 6 (NIPS), pages 655-662, 1994.
pdf

An Upper Bound on the Loss from Approximate Optimal-Value Functions
by Satinder Singh and Richard Yee.
In Machine Learning, Volume 16, Issue 3, pages 227-233, 1994.
pdf

Reinforcement Learning with a Hierarchy of Abstract Models
by Satinder Singh.
In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI), pages 202-207, 1992.
pdf

Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models
by Satinder Singh.
In Proceedings of the Ninth Machine Learning Conference, pages 406-415, 1992.
pdf

Transfer of Learning by Composing Solutions of Elemental Sequential Tasks
by Satinder Singh.
In Machine Learning Journal, Volume 8, Issue 3, pages 323-339, 1992.
pdf

The Efficient Learning of Multiple Task Sequences
by Satinder Singh.
In Advances in Neural Information Processing Systems 4 (NIPS), pages 251-258, 1992.
pdf

Transfer of Learning Across Compositions of Sequential Tasks
by Satinder Singh.
In Machine Learning: Proceedings of the Eighth International Workshop, pages 348-352, 1991.
pdf

Refereed Workshop Papers

Hierarchical Optimal Control of MDPs
by Amy McGovern, Doina Precup, Balaraman Ravindran, Satinder Singh and Richard Sutton.
In Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, 1998.
pdf

Planning with Closed-Loop Macro Actions
by Doina Precup, Richard Sutton and Satinder Singh.
In Proceedings of AAAI Fall Symposium on Model-directed Autonomous Systems, 1997.
pdf

On Step-Size and Bias in Temporal-Difference Learning
by Richard Sutton and Satinder Singh.
In Proceedings of Eighth Yale Workshop on Adaptive and Learning Systems, 1994.
pdf abstract

Reinforcement Learning and Dynamic Programming
by Andrew Barto and Satinder Singh.
In Proceedings of Sixth Yale Workshop on Adaptive and Learning Systems, 1990.

Magazine Articles, Book Chapters and Others

Reinforcement Learning for 3 vs. 2 Keepaway
by Peter Stone, Richard Sutton, and Satinder Singh.
In RoboCup-2000: Robot Soccer World Cup IV, P. Stone, T. Balch, and G. Kraetzschmar, Eds., Springer Verlag.
pdf.
An earlier version appeared in the Proceedings of the RoboCup-2000 Workshop, Melbourne, Australia

Soft Dynamic Programming Algorithms: Convergence Proofs
by Satinder Singh.
In Proceedings of Workshop on Computational Learning and Natural Learning (CLNL), Provincetown, Massachusetts, 1993.
pdf

On the Computational Economics of Reinforcement Learning
by Andrew Barto and Satinder Singh.
In Proceedings of Connectionist Summer School, 1990.
pdf

My one paper in a non-technical journal!

How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning
by Satinder Singh, Peter Norvig and David Cohn.
In Dr. Dobb's Journal, March issue, 1997.

pdf [html version]

An Almost Tutorial on RL (extracted from my Thesis)

An (Almost) Tutorial on Reinforcement Learning.
gzipped postscript. Extracted from my 1993 thesis.

Going Nowhere Papers

Asynchronous Modified Policy Iteration with Single-sided Updates.
by Satinder Singh and Vijay Gullapalli. Working Paper, 1993.
pdf