An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

Victor G. Lopez, Matthias A. Müller

Details

Original language: English
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13-19
Number of pages: 7
ISBN (electronic): 9798350301243
Publication status: Published - 2023
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: 13 Dec 2023 - 15 Dec 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (electronic): 2576-2370

Abstract

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.
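For context, the sketch below shows the classical model-based Kleinman policy iteration for the continuous-time LQR, whose policy evaluation step is a Lyapunov equation, i.e. a special case of a Sylvester equation. This is only a baseline illustration under assumed example data, not the paper's algorithm: the matrices A, B, Q, R and the initial gain K are hypothetical, and the paper's off-policy method replaces the model-based evaluation step with a Sylvester-transpose equation assembled from persistently exciting input-state measurements, so that A and B are never needed.

    # Model-based Kleinman policy iteration for continuous-time LQR (baseline sketch).
    # The paper's off-policy algorithm uses only measured input-state data; A and B
    # below are hypothetical and appear here only to illustrate the iteration.
    import numpy as np
    from scipy.linalg import solve_sylvester, solve_continuous_are

    A = np.array([[0.0, 1.0],
                  [-1.0, -0.5]])   # hypothetical system matrix
    B = np.array([[0.0],
                  [1.0]])          # hypothetical input matrix
    Q = np.eye(2)                  # state weight
    R = np.eye(1)                  # input weight

    K = np.zeros((1, 2))           # initial stabilizing gain (A is already Hurwitz here;
                                   # the paper gives a data-based way to obtain such a gain)

    for _ in range(30):
        Acl = A - B @ K
        # Policy evaluation: Lyapunov equation Acl' P + P Acl = -(Q + K' R K),
        # posed as the Sylvester equation a X + X b = q.
        P = solve_sylvester(Acl.T, Acl, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B' P.
        K_new = np.linalg.solve(R, B.T @ P)
        if np.linalg.norm(K_new - K) < 1e-10:
            K = K_new
            break
        K = K_new

    # Sanity check: the iteration should reach the algebraic Riccati equation solution.
    P_are = solve_continuous_are(A, B, Q, R)
    assert np.allclose(P, P_are, atol=1e-6)
    print("optimal gain:", K)

At convergence this iteration reproduces the solution of the continuous-time algebraic Riccati equation, which is the same optimality target the paper's data-driven iteration is proven to reach.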

Cite this

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. / Lopez, Victor G.; Müller, Matthias A.
2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 13-19 (Proceedings of the IEEE Conference on Decision and Control).


Lopez, VG & Müller, MA 2023, An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. in 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Proceedings of the IEEE Conference on Decision and Control, Institute of Electrical and Electronics Engineers Inc., pp. 13-19, 62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, 13 Dec 2023. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez, V. G., & Müller, M. A. (2023). An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In 2023 62nd IEEE Conference on Decision and Control, CDC 2023 (pp. 13-19). (Proceedings of the IEEE Conference on Decision and Control). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arXiv.2303.17819, https://doi.org/10.1109/CDC49753.2023.10384256
Lopez VG, Müller MA. An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. In 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 13-19. (Proceedings of the IEEE Conference on Decision and Control). doi: 10.48550/arXiv.2303.17819, 10.1109/CDC49753.2023.10384256
Lopez, Victor G. ; Müller, Matthias A. / An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem. 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 13-19 (Proceedings of the IEEE Conference on Decision and Control).
BibTeX
@inproceedings{9a6ff5fd88d14fc686e05b4765efb615,
title = "An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem",
abstract = "In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently ex-citing input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.",
author = "Lopez, {Victor G.} and M{\"u}ller, {Matthias A.}",
year = "2023",
doi = "10.48550/arXiv.2303.17819",
language = "English",
series = "Proceedings of the IEEE Conference on Decision and Control",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "13--19",
booktitle = "2023 62nd IEEE Conference on Decision and Control, CDC 2023",
address = "United States",
note = "62nd IEEE Conference on Decision and Control, CDC 2023 ; Conference date: 13-12-2023 Through 15-12-2023",

}

RIS

TY - GEN

T1 - An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

AU - Lopez, Victor G.

AU - Müller, Matthias A.

PY - 2023

Y1 - 2023

N2 - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

AB - In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time linear quadratic regulator (LQR) problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. A method to determine an initial stabilizing policy using only measured data is proposed. Finally, the advantages of the proposed method are tested via simulation.

UR - http://www.scopus.com/inward/record.url?scp=85184817776&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2303.17819

DO - 10.48550/arXiv.2303.17819

M3 - Conference contribution

AN - SCOPUS:85184817776

T3 - Proceedings of the IEEE Conference on Decision and Control

SP - 13

EP - 19

BT - 2023 62nd IEEE Conference on Decision and Control, CDC 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 62nd IEEE Conference on Decision and Control, CDC 2023

Y2 - 13 December 2023 through 15 December 2023

ER -
