DOLCIT Seminar

Tuesday, July 9, 2019, 4:00 PM

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for TD Learning

Speaker: R. Srikant, University of Illinois at Urbana-Champaign
Location: Annenberg 213

Temporal difference (TD) learning is a widely used algorithm for estimating the value function of a Markov decision process (MDP) under a given policy. In this talk, we consider TD learning with linear function approximation and a constant learning rate, and obtain bounds on its finite-time performance. Motivated by these bounds, we present a heuristic for adapting the learning rate to achieve fast convergence. Joint work with Lei Ying and Harsh Gupta.
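
For readers unfamiliar with the setting, the following is a minimal sketch of TD(0) with linear function approximation and a constant learning rate, not the speakers' implementation. The function name `td0_linear`, the two-state example chain, and all parameter values are hypothetical and chosen only for illustration. The adaptive learning rate heuristic from the talk is not shown, since the abstract does not describe it.

```python
import random


def td0_linear(features, transitions, rewards, gamma, alpha, num_steps, seed=0):
    """TD(0) with a linear value estimate V(s) = theta . phi(s).

    features[s]    : feature vector phi(s) for state s
    transitions[s] : next-state probabilities under the fixed policy
    rewards[s]     : reward received when leaving state s
    alpha          : constant learning rate (step size)
    All names and the interface are illustrative assumptions.
    """
    rng = random.Random(seed)
    num_states = len(transitions)
    theta = [0.0] * len(features[0])
    state = 0
    for _ in range(num_steps):
        # Sample the next state from the policy-induced Markov chain.
        next_state = rng.choices(range(num_states), weights=transitions[state])[0]
        v = sum(t * f for t, f in zip(theta, features[state]))
        v_next = sum(t * f for t, f in zip(theta, features[next_state]))
        # Temporal difference error for the observed transition.
        td_error = rewards[state] + gamma * v_next - v
        # Constant step size update along the current feature vector.
        theta = [t + alpha * td_error * f for t, f in zip(theta, features[state])]
        state = next_state
    return theta


# Hypothetical two-state chain with one-hot features (the tabular special case).
features = [[1.0, 0.0], [0.0, 1.0]]
transitions = [[0.5, 0.5], [0.5, 0.5]]
rewards = [1.0, 0.0]
theta = td0_linear(features, transitions, rewards, gamma=0.9, alpha=0.05, num_steps=20000)
print(theta)
```

For this example chain, solving the Bellman equation by hand gives true values of 5.5 and 4.5. Because the learning rate is constant, the estimate does not converge exactly but keeps fluctuating in a neighborhood of those values, which is the regime that finite-time bounds of the kind mentioned in the abstract aim to characterize.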

Series: RSRG/DOLCIT Seminar Series

Contact: Kamyar Azizzadenesheli