Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems

Publication: Contribution to journal › Article › Research › Peer-reviewed

Details

Original language: English
Pages (from-to): 2922-2933
Number of pages: 12
Journal: IEEE Transactions on Automatic Control
Volume: 68
Issue number: 5
Publication status: Published - 10 Jan 2023

Abstract

This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, as it does not require applying the current estimate of the optimal input to the system, as on-policy algorithms do. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees the convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.
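
To make the abstract concrete, below is a minimal numerical sketch of generic off-policy Q-learning for the discrete-time LQR problem. It illustrates the two ingredients the abstract names, a persistently exciting input certified by a simple matrix rank test, and an off-policy policy-iteration loop that never applies the current gain estimate to the system, but it is not the paper's specific efficient algorithm. The example system (A, B), the cost weights, the excitation depth L = n + 1, and the given initial stabilizing gain are all assumptions chosen for this sketch; the paper itself proposes a data-based design for that initial gain.

import numpy as np

rng = np.random.default_rng(0)

# Example system matrices, used only to simulate data; the learner never sees them.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
n, m = B.shape
Q, R = np.eye(n), np.eye(m)

# One exploratory input-state trajectory (the off-policy data).
T = 60
u = rng.normal(size=(m, T))
x = np.zeros((n, T + 1))
for k in range(T):
    x[:, k + 1] = A @ x[:, k] + B @ u[:, k]

# Persistency of excitation as a rank test: the Hankel matrix of the input
# of depth L must have full row rank (depth chosen here as an assumption).
L = n + 1
H_u = np.column_stack([u[:, k:k + L].reshape(-1) for k in range(T - L + 1)])
assert np.linalg.matrix_rank(H_u) == m * L, "input is not persistently exciting"

def quad_features(z):
    # Regressor phi(z) such that z' H z = phi(z) @ vech(H) for symmetric H.
    outer = np.outer(z, z)
    i, j = np.triu_indices(len(z))
    return outer[i, j] * np.where(i == j, 1.0, 2.0)

K = np.array([[1.0, 2.0]])   # assumed initial stabilizing gain (closed-loop eigenvalues 0.9, 0.9)
for _ in range(20):
    # Policy evaluation: least-squares solution of the Bellman equation
    #   z_k' H z_k - z_{k+1}' H z_{k+1} = x_k' Q x_k + u_k' R u_k,
    # where z_k = (x_k, u_k) comes from the data, while z_{k+1} uses the
    # policy's own input -K x_{k+1}; this is what makes the scheme off-policy.
    Phi = np.array([quad_features(np.concatenate([x[:, k], u[:, k]]))
                    - quad_features(np.concatenate([x[:, k + 1], -K @ x[:, k + 1]]))
                    for k in range(T)])
    y = np.array([x[:, k] @ Q @ x[:, k] + u[:, k] @ R @ u[:, k] for k in range(T)])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    M = np.zeros((n + m, n + m))
    M[np.triu_indices(n + m)] = theta
    H = M + M.T - np.diag(np.diag(M))      # recover the symmetric Q-function matrix
    # Policy improvement: u = -K x minimizes z' H z over u.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned gain:    ", K)

# Model-based LQR gain for comparison (Riccati recursion; never used by the learner).
P = np.eye(n)
for _ in range(500):
    Kt = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ Kt)
print("model-based gain:", Kt)

On noise-free data the learned gain matches the model-based LQR gain even though the learner never uses A or B, and since all data come from a single exploratory trajectory, no intermediate estimate of the gain is ever applied to the system.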

Cite

Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems. / Lopez, Victor G.; Alsalti, Mohammad; Muller, Matthias A.
In: IEEE Transactions on Automatic Control, Vol. 68, No. 5, 10.01.2023, pp. 2922-2933.

Lopez VG, Alsalti M, Muller MA. Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems. IEEE Transactions on Automatic Control. 2023 Jan 10;68(5):2922-2933. doi: 10.1109/TAC.2023.3235967
BibTeX
@article{8facc18bf93b422dae746d175a60ea80,
title = "Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems",
abstract = "This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, as it does not require applying the current estimate of the optimal input to the system, as on-policy algorithms do. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees the convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.",
keywords = "Convergence, Data models, Data-based control, Heuristic algorithms, Linear systems, optimal control, Prediction algorithms, Q-learning, reinforcement learning, Trajectory, reinforcement learning (RL)",
author = "Lopez, {Victor G.} and Mohammad Alsalti and Muller, {Matthias A.}",
year = "2023",
month = jan,
day = "10",
doi = "10.1109/TAC.2023.3235967",
language = "English",
volume = "68",
pages = "2922--2933",
journal = "IEEE Transactions on Automatic Control",
issn = "0018-9286",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "5",

}

RIS

TY - JOUR

T1 - Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems

AU - Lopez, Victor G.

AU - Alsalti, Mohammad

AU - Muller, Matthias A.

PY - 2023/1/10

Y1 - 2023/1/10

N2 - This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, as it does not require applying the current estimate of the optimal input to the system, as on-policy algorithms do. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees the convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.

AB - This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, as it does not require applying the current estimate of the optimal input to the system, as on-policy algorithms do. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees the convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.

KW - Convergence

KW - Data models

KW - Data-based control

KW - Heuristic algorithms

KW - Linear systems

KW - optimal control

KW - Prediction algorithms

KW - Q-learning

KW - reinforcement learning

KW - Trajectory

KW - reinforcement learning (RL)

UR - http://www.scopus.com/inward/record.url?scp=85147441001&partnerID=8YFLogxK

U2 - 10.1109/TAC.2023.3235967

DO - 10.1109/TAC.2023.3235967

M3 - Article

AN - SCOPUS:85147441001

VL - 68

SP - 2922

EP - 2933

JO - IEEE Transactions on Automatic Control

JF - IEEE Transactions on Automatic Control

SN - 0018-9286

IS - 5

ER -
