Details
Original language | English |
---|---|
Pages (from - to) | 2922-2933 |
Number of pages | 12 |
Journal | IEEE Transactions on Automatic Control |
Volume | 68 |
Issue number | 5 |
Publication status | Published - 10 Jan 2023 |
Abstract
This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, since, unlike on-policy algorithms, it does not require applying the current estimate of the optimal input to the system. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.
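For illustration, the following Python sketch shows a generic off-policy Q-learning iteration for discrete-time LQR, using the standard quadratic parametrization Q(x, u) = [x; u]ᵀ H [x; u] estimated by least squares from persistently exciting data. It is a minimal sketch of the general technique the abstract describes, not the paper's exact algorithm; the function name, the toy system, and the initial gain K0 are illustrative assumptions.

```python
import numpy as np

def off_policy_q_learning_lqr(X, U, Qc, Rc, K0, iters=20):
    """Generic off-policy Q-learning iteration for discrete-time LQR
    (a sketch of the standard approach, not the paper's exact algorithm).

    X : (n, N+1) state data, U : (m, N) persistently exciting input data,
    Qc, Rc : LQR cost matrices, K0 : initial stabilizing gain (u = -K x).
    """
    n, m = X.shape[0], U.shape[0]
    p, N = n + m, U.shape[1]
    triu = np.triu_indices(p)

    def phi(z):
        # Quadratic features: z^T H z = phi(z) @ theta, with theta the
        # upper-triangular entries of H (off-diagonal terms counted twice).
        zz = np.outer(z, z)[triu]
        return np.where(triu[0] == triu[1], 1.0, 2.0) * zz

    K = K0
    for _ in range(iters):
        A_ls = np.zeros((N, p * (p + 1) // 2))
        b_ls = np.zeros(N)
        for k in range(N):
            xk, uk, x1 = X[:, k], U[:, k], X[:, k + 1]
            zk = np.concatenate([xk, uk])
            z1 = np.concatenate([x1, -K @ x1])   # current policy at next state
            A_ls[k] = phi(zk) - phi(z1)          # Bellman residual features
            b_ls[k] = xk @ Qc @ xk + uk @ Rc @ uk
        theta, *_ = np.linalg.lstsq(A_ls, b_ls, rcond=None)
        H = np.zeros((p, p))
        H[triu] = theta
        H = H + np.triu(H, 1).T                  # symmetrize
        # Policy improvement: u = -K x with K = Huu^{-1} Hux.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

# Illustrative use on a toy double integrator. The system matrices are
# only used to generate data; the learner never sees them.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
rng = np.random.default_rng(0)
N = 200
X = np.zeros((2, N + 1))
U = rng.standard_normal((1, N))                  # persistently exciting input
for k in range(N):
    X[:, k + 1] = A @ X[:, k] + B @ U[:, k]

K0 = np.array([[1.0, 2.0]])                      # assumed stabilizing initial gain
K = off_policy_q_learning_lqr(X, U, np.eye(2), np.eye(1), K0)
print("learned gain:", K)                        # approaches the LQR-optimal gain
```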
ASJC Scopus subject areas
- Engineering (all)
- Electrical and Electronic Engineering
- Engineering (all)
- Control and Systems Engineering
- Computer Science (all)
- Computer Science Applications
Cite this
In: IEEE Transactions on Automatic Control, Vol. 68, No. 5, 10.01.2023, p. 2922-2933.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - Efficient Off-Policy Q-Learning for Data-Based Discrete-Time LQR Problems
AU - Lopez, Victor G.
AU - Alsalti, Mohammad
AU - Müller, Matthias A.
PY - 2023/1/10
Y1 - 2023/1/10
N2 - This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, since, unlike on-policy algorithms, it does not require applying the current estimate of the optimal input to the system. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.
AB - This paper introduces and analyzes an improved Q-learning algorithm for discrete-time linear time-invariant systems. The proposed method does not require any knowledge of the system dynamics, and it enjoys significant efficiency advantages over other data-based optimal control methods in the literature. The algorithm can be executed fully offline, since, unlike on-policy algorithms, it does not require applying the current estimate of the optimal input to the system. It is shown that a persistently exciting input, defined via an easily tested matrix rank condition, guarantees convergence of the algorithm. A data-based method is proposed to design the initial stabilizing feedback gain that the algorithm requires. The robustness of the algorithm in the presence of noisy measurements is analyzed. We compare the proposed algorithm in simulation to different direct and indirect data-based control design methods.
KW - Convergence
KW - Data models
KW - Data-based control
KW - Heuristic algorithms
KW - Linear systems
KW - optimal control
KW - Prediction algorithms
KW - Q-learning
KW - reinforcement learning
KW - Trajectory
UR - http://www.scopus.com/inward/record.url?scp=85147441001&partnerID=8YFLogxK
U2 - 10.1109/TAC.2023.3235967
DO - 10.1109/TAC.2023.3235967
M3 - Article
AN - SCOPUS:85147441001
VL - 68
SP - 2922
EP - 2933
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
SN - 0018-9286
IS - 5
ER -