TY - JOUR
T1 - Automated Hyperparameter Tuning in Reinforcement Learning for Quadrupedal Robot Locomotion
AU - Kim, Myeong Seop
AU - Kim, Jung Su
AU - Park, Jae Han
N1 - Publisher Copyright: © 2023 by the authors.
PY - 2024/1
Y1 - 2024/1
AB - In reinforcement learning, the reward function has a significant impact on the performance of the agent. However, determining appropriate values for the reward function requires much trial and error. Although many automated reinforcement learning methods have been proposed to find an appropriate reward function, they have not been validated in complex environments such as quadrupedal locomotion. Reinforcement learning of a quadrupedal robot is very sensitive to the reward function, and recent notable studies have put considerable effort into reward shaping. In this paper, we propose an automated reward shaping method that tunes the scale of the dominant reward functions in reinforcement learning of a quadrupedal robot. We select some dominant reward functions, arrange their weights in steps of a certain unit, and then calculate the gait score of each resulting agent so that we can select the agent with the highest score. This gait score is defined to reflect stable walking of the quadrupedal robot. Additionally, quadrupedal locomotion learning requires reward functions of different scales depending on the robot’s size and shape. Therefore, we evaluate the proposed method on two different robots.
KW - automated machine learning
KW - quadrupedal robot
KW - reinforcement learning
KW - reward shaping
UR - http://www.scopus.com/inward/record.url?scp=85181968556&partnerID=8YFLogxK
U2 - 10.3390/electronics13010116
DO - 10.3390/electronics13010116
M3 - Article
AN - SCOPUS:85181968556
SN - 2079-9292
VL - 13
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 1
M1 - 116
ER -