Policy-based deep reinforcement learning for sparse reward environment

Myeong Seop Kim, Jung Su Kim

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

The sparse reward environment is one of the main problems encountered in reinforcement learning. When the agent must complete many intermediate tasks before reaching the final goal, the reward signal in the environment becomes very sparse, which makes reinforcement learning less effective. To overcome this, we give the agent an intrinsic reward that induces it to explore more. With this reward setting, the agent can continue to search for reward signals and learn actions that are better than the best action currently known. In this paper, we describe the implementation of the proposed method and evaluate its performance. For the learning algorithm, we use Proximal Policy Optimization (PPO) and train the agent in a distributed environment. The agent is trained to solve the game of Tetris, a representative sparse reward problem.
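
The abstract describes adding an intrinsic reward on top of the sparse extrinsic reward before feeding the result to a policy-gradient learner such as PPO. The paper's exact intrinsic-reward formulation is not given in the abstract, so the sketch below is only an illustration using a hypothetical count-based exploration bonus wrapped around a generic environment; the class, method names, and the `beta` scale are assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict

class IntrinsicRewardWrapper:
    """Illustrative wrapper that adds a count-based exploration bonus
    to the environment's sparse extrinsic reward (assumed design)."""

    def __init__(self, env, beta=0.1):
        self.env = env            # any environment exposing reset()/step()
        self.beta = beta          # scale of the intrinsic bonus (assumed value)
        self.counts = defaultdict(int)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, extrinsic, done, info = self.env.step(action)
        # Use a hashable key for the observation (bytes for array-like states).
        key = obs.tobytes() if hasattr(obs, "tobytes") else obs
        self.counts[key] += 1
        # The bonus shrinks as a state is revisited, so the agent keeps
        # exploring even when the extrinsic reward is rarely nonzero.
        intrinsic = self.beta / math.sqrt(self.counts[key])
        return obs, extrinsic + intrinsic, done, info
```

In such a setup, the combined reward (extrinsic plus intrinsic) is what the PPO learner optimizes; the wrapper leaves the rest of the training loop, including distributed rollout collection, unchanged.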

Original language: English
Pages (from-to): 506-514
Number of pages: 9
Journal: Transactions of the Korean Institute of Electrical Engineers
Volume: 70
Issue number: 3
DOIs
State: Published - Mar 2021

Keywords

  • Reinforcement learning
  • Sparse reward problem

