|  TITLE: 
A Universal Scheme for Learning 
 SPEAKER: Ciamac Moallemi (Stanford University)
 
 DATE: 3:00 - 4:00 P.M., Thursday August 18, 2005
 
 LOCATION: Tahoe, 3U (PA)
 
 
 ABSTRACT:
 Consider an agent interacting with an unmodeled environment. At each time, the 
agent makes an observation, takes an action, and incurs a cost. Its actions can 
influence future observations and costs. The goal is to choose actions so as to 
minimize the long-run average cost.  We propose an algorithm based on ideas 
from the Lempel-Ziv scheme for universal data compression combined with dynamic 
programming. We establish that, if there exists an integer $K$ such that the 
future is conditionally independent of the past given a window of $K$ 
consecutive actions and observations, average cost under this algorithm 
converges to the optimum.
 
 Joint work with Vivek Farias, Ben Van Roy, and Tsachy Weissman.
 
 
 
 
  |  |  |