Prioritized Hindsight with Dual Buffer for Meta-Reinforcement Learning

Sofanit Wubeshet Beyene, Ji Hyeong Han

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success on single robotic tasks, extending them directly to multi-task manipulation problems remains challenging, mostly because efficient exploration is difficult in high-dimensional state spaces and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward and sample-inefficiency problems of DRL algorithms are exacerbated. Therefore, we propose a method that increases the sample efficiency of the soft actor-critic (SAC) algorithm and extends it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts the policy to a target task. We propose prioritized hindsight with dual experience replay to improve data storage and sampling, which in turn helps the agent perform structured exploration and thereby improves sample efficiency. The proposed method splits the experience replay buffer into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias that hindsight trajectories introduce into the buffer. Moreover, we reuse high-reward transitions from previous tasks to help the network adapt to the new task. We demonstrate the proposed method on several manipulation tasks using a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
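
The dual-buffer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DualReplayBuffer`, the helper `sparse_reward`, the fixed `hindsight_fraction` mixing ratio, the reward-based priority with exponent `alpha`, and the "final achieved goal" relabeling strategy are all assumptions chosen for illustration. The abstract does not specify how priorities are computed or how the two buffers are mixed.

```python
import random
from collections import deque


def sparse_reward(achieved, goal):
    """Hypothetical sparse reward: 1.0 only when the goal is reached."""
    return 1.0 if achieved == goal else 0.0


class DualReplayBuffer:
    """Keeps real and hindsight-relabeled transitions in separate buffers.

    Each stored transition is (state, action, reward, next_state, goal).
    """

    def __init__(self, capacity=10_000, hindsight_fraction=0.5, alpha=0.6, seed=0):
        self.real = deque(maxlen=capacity)
        self.hindsight = deque(maxlen=capacity)
        self.hindsight_fraction = hindsight_fraction  # assumed mixing ratio
        self.alpha = alpha  # assumed priority exponent
        self.rng = random.Random(seed)

    def add_episode(self, episode, goal, reward_fn=sparse_reward):
        """episode: list of (state, action, next_state, achieved_goal)."""
        final_achieved = episode[-1][3]
        for state, action, next_state, achieved in episode:
            # Real transition: rewarded against the goal the agent was given.
            self.real.append(
                (state, action, reward_fn(achieved, goal), next_state, goal)
            )
            # Hindsight transition: pretend the final achieved goal was the target.
            self.hindsight.append(
                (state, action, reward_fn(achieved, final_achieved),
                 next_state, final_achieved)
            )

    def _draw(self, buffer, k):
        """Sample k transitions, favoring higher-reward ones (an assumption)."""
        if not buffer or k <= 0:
            return []
        items = list(buffer)
        weights = [(abs(t[2]) + 1e-3) ** self.alpha for t in items]
        return self.rng.choices(items, weights=weights, k=k)

    def sample(self, batch_size):
        """Return real transitions first, then hindsight transitions."""
        n_hind = int(batch_size * self.hindsight_fraction)
        n_real = batch_size - n_hind
        return self._draw(self.real, n_real) + self._draw(self.hindsight, n_hind)
```

Keeping the two stores separate makes the share of relabeled data in each batch an explicit knob (`hindsight_fraction` here), which is one plausible way to limit the bias that hindsight trajectories introduce. A training loop would call `add_episode` after each rollout and `sample` before each SAC update.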

Original language: English
Article number: 4192
Journal: Electronics (Switzerland)
Volume: 11
Issue number: 24
State: Published - Dec 2022

Keywords

  • deep reinforcement learning
  • experience replay
  • meta-reinforcement learning
  • robot manipulator
  • transfer learning
