TITLE:
A Universal Scheme for Learning
SPEAKER: Ciamac Moallemi (Stanford University)
DATE: 3:00 - 4:00 P.M., Thursday August 18, 2005
LOCATION: Tahoe, 3U (PA)
ABSTRACT:
Consider an agent interacting with an unmodeled environment. At each time, the
agent makes an observation, takes an action, and incurs a cost. Its actions can
influence future observations and costs. The goal is to choose actions so as to
minimize the long-run average cost. We propose an algorithm based on ideas
from the Lempel-Ziv scheme for universal data compression combined with dynamic
programming. We establish that, if there exists an integer $K$ such that the
future is conditionally independent of the past given a window of $K$
consecutive actions and observations, average cost under this algorithm
converges to the optimum.
Joint work with Vivek Farias, Ben Van Roy, and Tsachy Weissman.
|
|
|