TY - JOUR
T1 - Tetris Bot using Deep Reinforcement Learning
AU - Park, Kwan Woo
AU - Kim, Jung Su
N1 - Publisher Copyright:
© ICROS 2022.
PY - 2022
Y1 - 2022
N2 - In this paper, we develop an artificial intelligence Tetris robot that plays the Tetris game autonomously. The Tetris robot consists of a game agent that learns how to play Tetris using reinforcement learning, and hardware that plays the actual game. To develop the game agent using deep reinforcement learning, a Markov decision process was defined and a policy-based deep reinforcement learning method was applied. Specifically, the Tetris game agent was trained with the PPO (Proximal Policy Optimization) algorithm, using a multi-agent learning method. During training, the PPO-based game agent took the game screen as input and applied its actions to the game through software, playing the Tetris game 500,000 times. For the robot to play the actual game, the neural network corresponding to the trained game agent was stored on a Jetson Xavier, and motors and a camera were used. In other words, the standalone Tetris robot, separate from the computer where the Tetris game is running, consists of a Jetson Xavier, one camera, one Arduino MEGA, three servo motors, and three fingers. To evaluate the performance of the robot, the value function of the game agent was presented, and the performance of the actual robot was verified through demonstration.
AB - In this paper, we develop an artificial intelligence Tetris robot that plays the Tetris game autonomously. The Tetris robot consists of a game agent that learns how to play Tetris using reinforcement learning, and hardware that plays the actual game. To develop the game agent using deep reinforcement learning, a Markov decision process was defined and a policy-based deep reinforcement learning method was applied. Specifically, the Tetris game agent was trained with the PPO (Proximal Policy Optimization) algorithm, using a multi-agent learning method. During training, the PPO-based game agent took the game screen as input and applied its actions to the game through software, playing the Tetris game 500,000 times. For the robot to play the actual game, the neural network corresponding to the trained game agent was stored on a Jetson Xavier, and motors and a camera were used. In other words, the standalone Tetris robot, separate from the computer where the Tetris game is running, consists of a Jetson Xavier, one camera, one Arduino MEGA, three servo motors, and three fingers. To evaluate the performance of the robot, the value function of the game agent was presented, and the performance of the actual robot was verified through demonstration.
KW - Deep Reinforcement Learning
KW - Multi-agent learning
KW - PPO (Proximal Policy Optimization)
KW - Tetris
UR - http://www.scopus.com/inward/record.url?scp=85143615249&partnerID=8YFLogxK
U2 - 10.5302/J.ICROS.2022.22.0140
DO - 10.5302/J.ICROS.2022.22.0140
M3 - Article
AN - SCOPUS:85143615249
SN - 1976-5622
VL - 28
SP - 1155
EP - 1160
JO - Journal of Institute of Control, Robotics and Systems
JF - Journal of Institute of Control, Robotics and Systems
IS - 12
ER -