Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Henrik Müller
  • Lukas Berg
  • Daniel Kudenko

Research Organisations

Details

Original language: English
Number of pages: 16
Journal: Neural Computing and Applications
Early online date: 10 Jan 2025
Publication status: E-pub ahead of print - 10 Jan 2025

Abstract

Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the difficulty of identifying the action sequences required to reach the goal. Previous works have used abstract symbolic task-knowledge models to speed up RL agents in these tasks, either by splitting the task into easier-to-solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement of perfect symbolic knowledge models, which cannot be guaranteed when the abstract symbolic models are provided by humans and in real-world tasks. We introduce exponential plan-based reward shaping, which leverages RL's ability to learn from experience to compensate for deficiencies in incomplete and incorrect abstract symbolic plans and uses them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach works with plans that miss important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans that capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, signalling important states that the RL agent should achieve and explore to reach the goal. We show the theoretical advantages of our approach for plans with many steps and demonstrate its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.
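
The abstract only summarizes the approach; the exact formulation is given in the paper itself. As a rough illustration of the general idea of plan-based, potential-based reward shaping, the sketch below assigns each abstract plan subgoal a potential that grows exponentially with its position in the plan (as the method's name suggests) and adds the standard potential-based shaping term to the environment reward, which is known to preserve the optimal policy (Ng et al., 1999). All names (make_plan_potential, shaped_reward, eta) and the toy subgoal predicates are illustrative assumptions, not the authors' implementation.

def make_plan_potential(plan_subgoals, eta=2.0):
    """Potential function derived from an ordered list of abstract plan subgoals.

    plan_subgoals[i](state) should return True once the i-th subgoal is achieved.
    The potential grows exponentially with the index of the highest achieved
    subgoal, so later plan steps dominate earlier ones (base `eta` is a
    hypothetical hyperparameter).
    """
    def phi(state):
        achieved = 0
        for i, satisfied in enumerate(plan_subgoals):
            if satisfied(state):
                achieved = i + 1  # index of the highest satisfied subgoal
        return eta ** achieved - 1.0  # 0 when no subgoal is achieved yet
    return phi

def shaped_reward(env_reward, phi, state, next_state, gamma=0.99):
    """Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s).

    Adding F to the sparse environment reward densifies feedback while
    leaving the optimal policy of the original task unchanged.
    """
    return env_reward + gamma * phi(next_state) - phi(state)

# Toy usage with two abstract subgoals ("has_key", then "door_open") from a plan.
phi = make_plan_potential([
    lambda s: s.get("has_key", False),
    lambda s: s.get("door_open", False),
])
r = shaped_reward(0.0, phi, {"has_key": False}, {"has_key": True})  # positive shaping bonus

Because the shaping term is purely potential-based, extra, missing, or misordered plan steps only change the intermediate feedback, not the set of optimal policies, which is the property the paper builds on.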

Keywords

    Imperfect knowledge, Plan-based reward shaping, Reinforcement learning, Reward shaping

ASJC Scopus subject areas

Cite this

Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks. / Müller, Henrik; Berg, Lukas; Kudenko, Daniel.
In: Neural Computing and Applications, 10.01.2025.


Müller H, Berg L, Kudenko D. Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks. Neural Computing and Applications. 2025 Jan 10. Epub 2025 Jan 10. doi: 10.1007/s00521-024-10615-2
BibTeX
@article{1730f4e6dfa44de298053265fbe039d2,
title = "Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks",
abstract = "Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the problem of identifying the necessary action sequences required to reach the goal. Previous works have used abstract symbolic task knowledge models to speed up RL agents in these tasks by either splitting the task into easier to solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement of perfect symbolic knowledge models, which cannot be guaranteed when the abstract symbolic models are provided by humans and in real-world tasks. We introduce exponential plan-based reward shaping, which is able to leverage the ability to learn from experience of RL to compensate deficiencies in incomplete and incorrect abstract symbolic plans and use them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach is able to work with plans that miss important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans to capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, which signals important states to the RL agent that should be achieved and explored to reach the goal. We show the theoretical advantages of our approach for plans with many steps and show its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.",
keywords = "Imperfect knowledge, Plan-based reward shaping, Reinforcement learning, Reward shaping",
author = "Henrik M{\"u}ller and Lukas Berg and Daniel Kudenko",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = jan,
day = "10",
doi = "10.1007/s00521-024-10615-2",
language = "English",
journal = "Neural Computing and Applications",
issn = "0941-0643",
publisher = "Springer London",

}

RIS

TY - JOUR
T1 - Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks
AU - Müller, Henrik
AU - Berg, Lukas
AU - Kudenko, Daniel
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/1/10
Y1 - 2025/1/10
N2 - Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the problem of identifying the necessary action sequences required to reach the goal. Previous works have used abstract symbolic task knowledge models to speed up RL agents in these tasks by either splitting the task into easier to solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement of perfect symbolic knowledge models, which cannot be guaranteed when the abstract symbolic models are provided by humans and in real-world tasks. We introduce exponential plan-based reward shaping, which is able to leverage the ability to learn from experience of RL to compensate deficiencies in incomplete and incorrect abstract symbolic plans and use them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach is able to work with plans that miss important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans to capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, which signals important states to the RL agent that should be achieved and explored to reach the goal. We show the theoretical advantages of our approach for plans with many steps and show its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.
AB - Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the problem of identifying the necessary action sequences required to reach the goal. Previous works have used abstract symbolic task knowledge models to speed up RL agents in these tasks by either splitting the task into easier to solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement of perfect symbolic knowledge models, which cannot be guaranteed when the abstract symbolic models are provided by humans and in real-world tasks. We introduce exponential plan-based reward shaping, which is able to leverage the ability to learn from experience of RL to compensate deficiencies in incomplete and incorrect abstract symbolic plans and use them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach is able to work with plans that miss important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans to capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, which signals important states to the RL agent that should be achieved and explored to reach the goal. We show the theoretical advantages of our approach for plans with many steps and show its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.
KW - Imperfect knowledge
KW - Plan-based reward shaping
KW - Reinforcement learning
KW - Reward shaping
UR - http://www.scopus.com/inward/record.url?scp=85217155760&partnerID=8YFLogxK
U2 - 10.1007/s00521-024-10615-2
DO - 10.1007/s00521-024-10615-2
M3 - Article
AN - SCOPUS:85217155760
JO - Neural Computing and Applications
JF - Neural Computing and Applications
SN - 0941-0643
ER -