Abstract
Sparse reward environment is the main problems encountered by reinforcement learning. When there are many specific tasks that the agent must go through to reach the final goal, the reward signal becomes very sparse in the environment. And this situation makes reinforcement learning less effective. To overcome this, we give the agent an intrinsic reward to induce the agent to explore more. With this reward setting, the agent can continue to search for reward signal and learn another action that is better than the best action which is currently known. In this paper, we describe the implementation of the proposed method and estimate its performance. For the learning algorithm, we use Proximal Policy Optimization(PPO) and train the agent in a distributed environment. The agent is trained to solve the game of Tetris that is a representative sparse reward problem.
| Original language | English |
|---|---|
| Pages (from-to) | 506-514 |
| Number of pages | 9 |
| Journal | Transactions of the Korean Institute of Electrical Engineers |
| Volume | 70 |
| Issue number | 3 |
| DOIs | |
| State | Published - Mar 2021 |
Keywords
- Reinforcement learning
- Sparse reward problem
Fingerprint
Dive into the research topics of 'Policy-based deep reinforcement learning for sparse reward environment'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver