Details
| Original language | English |
| --- | --- |
| Number of pages | 16 |
| Journal | Neural Computing and Applications |
| Early online date | 10 Jan 2025 |
| Publication status | E-pub ahead of print - 10 Jan 2025 |
Abstract
Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the difficulty of identifying the action sequences required to reach the goal. Previous work has used abstract symbolic task-knowledge models to speed up RL agents on these tasks, either by splitting the task into easier-to-solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement for perfect symbolic knowledge models, which cannot be guaranteed when the abstract symbolic models are provided by humans or used in real-world tasks. We introduce exponential plan-based reward shaping, which leverages RL's ability to learn from experience to compensate for deficiencies in incomplete and incorrect abstract symbolic plans and uses them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach works with plans that omit important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans that capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, signalling to the RL agent important states that should be reached and explored on the way to the goal. We show the theoretical advantages of our approach for plans with many steps and demonstrate its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.
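The shaping mechanism sketched in the abstract can be illustrated with a small example. The snippet below is not the paper's implementation: it assumes potential-based reward shaping in the sense of Ng et al. (1999), a plan given as an ordered list of symbolic subgoal predicates, and a potential that grows exponentially with the number of plan steps already achieved; the class name, the exponential base, and the subgoal encoding are illustrative assumptions.

```python
# Illustrative sketch of plan-based, potential-based reward shaping.
# Assumptions (not taken from the paper): the plan is an ordered list of
# subgoal predicates over the RL state, and the potential Phi grows
# exponentially with the number of leading plan steps already satisfied.
from typing import Callable, List


class PlanBasedShaping:
    def __init__(self, subgoals: List[Callable[[object], bool]],
                 gamma: float = 0.99, base: float = 2.0, scale: float = 1.0):
        self.subgoals = subgoals  # ordered plan steps as predicates over states
        self.gamma = gamma        # discount factor of the underlying MDP
        self.base = base          # growth rate of the assumed exponential potential
        self.scale = scale        # overall magnitude of the shaping signal

    def _steps_achieved(self, state) -> int:
        """Count how many leading plan steps the given state already satisfies."""
        k = 0
        for holds in self.subgoals:
            if holds(state):
                k += 1
            else:
                break
        return k

    def potential(self, state) -> float:
        """Phi(s): grows exponentially with the number of achieved plan steps."""
        return self.scale * self.base ** self._steps_achieved(state)

    def shaped_reward(self, env_reward: float, state, next_state) -> float:
        """Return r + gamma * Phi(s') - Phi(s); the difference-of-potentials
        form leaves the set of optimal policies of the original task unchanged."""
        return (env_reward
                + self.gamma * self.potential(next_state)
                - self.potential(state))
```

During training the agent would receive shaped_reward instead of the sparse environment reward; because the shaping term is a difference of potentials, optimal policies are preserved, which is consistent with the convergence guarantee stated in the abstract.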
Keywords
- Imperfect knowledge, Plan-based reward shaping, Reinforcement learning, Reward shaping
ASJC Scopus subject areas
- Computer Science (all)
- Software
- Artificial Intelligence
Cite this
Müller, H., Berg, L., & Kudenko, D. (2025). Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks. In: Neural Computing and Applications, 10.01.2025. https://doi.org/10.1007/s00521-024-10615-2
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks
AU - Müller, Henrik
AU - Berg, Lukas
AU - Kudenko, Daniel
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/1/10
Y1 - 2025/1/10
AB - Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the difficulty of identifying the action sequences required to reach the goal. Previous work has used abstract symbolic task-knowledge models to speed up RL agents on these tasks, either by splitting the task into easier-to-solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement for perfect symbolic knowledge models, which cannot be guaranteed when the abstract symbolic models are provided by humans or used in real-world tasks. We introduce exponential plan-based reward shaping, which leverages RL's ability to learn from experience to compensate for deficiencies in incomplete and incorrect abstract symbolic plans and uses them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach works with plans that omit important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans that capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, signalling to the RL agent important states that should be reached and explored on the way to the goal. We show the theoretical advantages of our approach for plans with many steps and demonstrate its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.
KW - Imperfect knowledge
KW - Plan-based reward shaping
KW - Reinforcement learning
KW - Reward shaping
UR - http://www.scopus.com/inward/record.url?scp=85217155760&partnerID=8YFLogxK
U2 - 10.1007/s00521-024-10615-2
DO - 10.1007/s00521-024-10615-2
M3 - Article
AN - SCOPUS:85217155760
JO - Neural Computing and Applications
JF - Neural Computing and Applications
SN - 0941-0643
ER -