Approximate Thompson Sampling for Learning Linear Quadratic Regulators with O(√T) Regret

Yeoneung Kim, Gihun Kim, Jiwhan Park, Insoon Yang

Research output: Contribution to journal › Conference article › peer-review

Abstract

We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of O(√T). Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an O(√T) regret bound without relying on the restrictive assumptions that are often used in the literature.
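The core computational step described above — drawing approximate posterior samples of the unknown system parameters via preconditioned Langevin dynamics — can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy regression problem, the step size `eta`, the number of iterations, and the choice of preconditioner (a regularized Gram matrix, whose minimum eigenvalue grows as more excited data accumulates) are all illustrative assumptions standing in for the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system-identification stand-in: observations
# y_t = theta_true^T z_t + noise, with z_t playing the role of
# (state, input) regressors collected under excitation.
theta_true = np.array([0.8, 0.5])
Z = rng.normal(size=(100, 2))                  # regressors
Y = Z @ theta_true + 0.1 * rng.normal(size=100)

lam = 1.0                                      # prior precision (assumed)
P = lam * np.eye(2) + Z.T @ Z                  # preconditioner: regularized Gram matrix
P_inv = np.linalg.inv(P)
P_inv_sqrt = np.linalg.cholesky(P_inv)         # shapes the injected noise

def grad_U(theta):
    # Gradient of the negative log posterior (Gaussian likelihood + Gaussian prior).
    return Z.T @ (Z @ theta - Y) + lam * theta

# Preconditioned unadjusted Langevin iteration:
#   theta <- theta - eta * P^{-1} grad_U(theta) + sqrt(2*eta) * P^{-1/2} xi
eta = 0.1                                      # step size (illustrative)
theta = np.zeros(2)
for _ in range(1000):
    xi = rng.normal(size=2)
    theta = theta - eta * P_inv @ grad_U(theta) + np.sqrt(2 * eta) * P_inv_sqrt @ xi
```

The preconditioning makes the drift term a simple contraction toward the posterior mode, which is one way to see why a larger minimum eigenvalue of the preconditioner speeds up mixing; the final `theta` is then used as the sampled parameter for which the controller is computed.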

Original language: English
Pages (from-to): 378-391
Number of pages: 14
Journal: Proceedings of Machine Learning Research
Volume: 283
State: Published - 2025
Event: 7th Annual Learning for Dynamics and Control Conference, L4DC 2025 - Ann Arbor, United States
Duration: 4 Jun 2025 - 6 Jun 2025

Keywords

  • MCMC
  • Online learning
  • Linear systems
  • Reinforcement learning
