An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

Victor G. Lopez, Matthias A. Müller

Details

Original language: English
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13-19
Number of pages: 7
ISBN (electronic): 9798350301243
Publication status: Published - 2023
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: 13 Dec 2023 - 15 Dec 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (electronic): 2576-2370

Abstract

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.
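For context, the sketch below shows the classical model-based Kleinman policy iteration for the continuous-time LQR, whose policy evaluation step is a Lyapunov equation, i.e. a special case of a Sylvester equation. This is only a baseline illustration under assumed example data, not the paper's algorithm: the matrices A, B, Q, R and the initial gain K are hypothetical, and the paper's off-policy method replaces the model-based evaluation step with a Sylvester-transpose equation assembled from persistently exciting input-state measurements, so that A and B are never needed.

    # Model-based Kleinman policy iteration for continuous-time LQR (baseline sketch).
    # The paper's off-policy algorithm uses only measured input-state data; A and B
    # below are hypothetical and appear here only to illustrate the iteration.
    import numpy as np
    from scipy.linalg import solve_sylvester, solve_continuous_are

    A = np.array([[0.0, 1.0],
                  [-1.0, -0.5]])   # hypothetical system matrix
    B = np.array([[0.0],
                  [1.0]])          # hypothetical input matrix
    Q = np.eye(2)                  # state weight
    R = np.eye(1)                  # input weight

    K = np.zeros((1, 2))           # initial stabilizing gain (A is already Hurwitz here;
                                   # the paper gives a data-based way to obtain such a gain)

    for _ in range(30):
        Acl = A - B @ K
        # Policy evaluation: Lyapunov equation Acl' P + P Acl = -(Q + K' R K),
        # posed as the Sylvester equation a X + X b = q.
        P = solve_sylvester(Acl.T, Acl, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B' P.
        K_new = np.linalg.solve(R, B.T @ P)
        if np.linalg.norm(K_new - K) < 1e-10:
            K = K_new
            break
        K = K_new

    # Sanity check: the iteration should reach the algebraic Riccati equation solution.
    P_are = solve_continuous_are(A, B, Q, R)
    assert np.allclose(P, P_are, atol=1e-6)
    print("optimal gain:", K)

At convergence this iteration reproduces the solution of the continuous-time algebraic Riccati equation, which is the same optimality target the paper's data-driven iteration is proven to reach.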

Cite this

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. / Lopez, Victor G.; Müller, Matthias A.
2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 13-19 (Proceedings of the IEEE Conference on Decision and Control).


Lopez, VG & Müller, MA 2023, An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. in 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Proceedings of the IEEE Conference on Decision and Control, Institute of Electrical and Electronics Engineers Inc., pp. 13-19, 62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, 13 Dec 2023. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez, V. G., & Müller, M. A. (2023). An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In 2023 62nd IEEE Conference on Decision and Control, CDC 2023 (pp. 13-19). (Proceedings of the IEEE Conference on Decision and Control). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez VG, Müller MA. An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 13-19. (Proceedings of the IEEE Conference on Decision and Control). doi: 10.48550/arXiv.2303.17819, 10.1109/CDC49753.2023.10384256
Lopez, Victor G. ; Müller, Matthias A. / An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).
BibTeX
@inproceedings{9a6ff5fd88d14fc686e05b4765efb615,
title = "An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem",
abstract = "In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently ex-citing input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.",
author = "Lopez, {Victor G.} and M{\"u}ller, {Matthias A.}",
year = "2023",
doi = "10.48550/arXiv.2303.17819",
language = "English",
series = "Proceedings of the IEEE Conference on Decision and Control",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "13--19",
booktitle = "2023 62nd IEEE Conference on Decision and Control, CDC 2023",
address = "United States",
note = "62nd IEEE Conference on Decision and Control, CDC 2023 ; Conference date: 13-12-2023 Through 15-12-2023",

}

RIS

TY - GEN

T1 - An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

AU - Lopez, Victor G.

AU - Müller, Matthias A.

PY - 2023

Y1 - 2023

N2 - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

AB - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

UR - http://www.scopus.com/inward/record.url?scp=85184817776&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2303.17819

DO - 10.48550/arXiv.2303.17819

M3 - Conference contribution

AN - SCOPUS:85184817776

T3 - Proceedings of the IEEE Conference on Decision and Control

SP - 13

EP - 19

BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 62nd IEEE Conference on Decision and Control, CDC 2023

Y2 - 13 December 2023 through 15 December 2023

ER -
