AI Seminar

On Mediating Policies

Erik Talvitie

Tuesday, March 14, 2006
4:00pm - 5:30pm
3725 Beyster Bldg. (Stained-glass Conference Room)

Snacks provided.

About the Event

Faced with a difficult task, an agent may have access to a number of suggestions about how to behave, which could aid it in doing well quickly. These might come as advice from an external source, or from knowledge gained while solving similar problems. The agent's goal then would be to quickly identify the best source of advice while avoiding, as much as possible, taking bad advice. We introduce a setting in which a mediator agent must choose amongst a set of "experts" advising it on actions to take on an unknown and unobserved Markov Decision Process (MDP). We provide an algorithm which, when the experts are stationary policies and the MDP is unichain, will achieve a return that competes favorably with that of each expert in a number of steps polynomial in its mixing time and other natural parameters. We also present empirical results that illustrate the strengths and weaknesses of our algorithm in practice and demonstrate its applicability in two transfer learning domains.

Additional Information

Contact: Bob Marinier

Email: rmarinie@umich.edu

Sponsor(s): AI Lab

Open to: Public