Energy Minimization based Deep Reinforcement Learning Policies for Quadrupeds

Compared to other papers, this one is relatively simple; most of the theory and methods it uses have appeared in earlier work. These are just brief notes.

1. RL Framework

  1. Per "Sim-to-Real: Learning Agile Locomotion For Quadruped Robots", the choice of state space has a direct impact on sim-to-real transfer. In short, the fewer the dimensions in the state space, the easier the transfer, because noise and drift grow as more parameters are included in the state.
  2. State Space:
  • root angular velocity in the local frame
  • joint angles
  • joint velocities
  • binary foot contacts
  • goal velocity
  • previous actions
  • history of the previous four states
  3. Action Space: per "Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter?", the choice of action space directly affects learning speed, so joint angles were chosen as the action representation, which also centers all gaits around the robot's neutral standing pose (see the observation/action sketch below).

Figure: control scheme.
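To make the state and action definitions above concrete, here is a minimal Python/NumPy sketch of how the observation vector could be assembled and how actions could be mapped to joint targets. The dimensions (12 joints, 4 feet), the ordering of the quantities, and the zero-valued neutral pose are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from collections import deque

NUM_JOINTS = 12  # assumption: 12-DoF quadruped (3 joints per leg)

class ObservationBuilder:
    """Stacks the current proprioceptive state with the previous four states."""

    def __init__(self, history_len: int = 4):
        self.history = deque(maxlen=history_len)

    def reset(self):
        self.history.clear()

    def build(self, ang_vel, joint_pos, joint_vel, foot_contacts,
              goal_vel, prev_action):
        # Single-step state: root angular velocity (3), joint angles (12),
        # joint velocities (12), binary foot contacts (4),
        # goal velocity command (3: v_x^g, v_y^g, w_z^g), previous action (12).
        state = np.concatenate([
            ang_vel, joint_pos, joint_vel,
            np.asarray(foot_contacts, dtype=np.float32), goal_vel, prev_action,
        ]).astype(np.float32)

        # At episode start, pad the history with copies of the first state.
        while len(self.history) < self.history.maxlen:
            self.history.append(state)
        obs = np.concatenate([state, *self.history])  # current + 4 past states
        self.history.append(state)
        return obs

# The action is a vector of target joint angles, centered on the neutral
# standing pose (the zero pose below is illustrative only).
NEUTRAL_POSE = np.zeros(NUM_JOINTS, dtype=np.float32)

def action_to_joint_targets(action: np.ndarray) -> np.ndarray:
    return NEUTRAL_POSE + action
```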

2. Learning Algorithm

  1. Proximal Policy Optimization Algorithm
  2. We use a Multi-Layer Perceptron (MLP) with 2 hidden layers of 256 units each and ReLU activations to represent both the actor and critic networks (a minimal sketch appears after the reward equation below).
  3. Hyper Parameters:
  • Parallel Instances: 32
  • Minibatch size: 512
  • Evaluation freq: 25
  • Adam learning rate: 1e-4
  • Adam epsilon: 1e-5
  • Generalized advantage estimation (GAE) λ: 0.95
  • Discount factor γ: 0.99
  • Anneal rate for standard deviation: 1.0
  • Clipping parameter for PPO surrogate loss: 0.2
  • Epochs: 3
  • Max episode horizon: 400
  4. Reward Function:

$$
\begin{aligned}
\text{Total Reward} &= \text{Energy Cost} + \text{Survival Reward} + \text{Goal Velocity Cost} \\
\text{Energy Cost} &= C_1\,\tau\omega \\
\text{Survival Reward} &= C_2\,|v_x^g| + C_3\,|v_y^g| + C_4\,|\omega_z^g| \\
\text{Goal Velocity Cost} &= -C_2\,|v_x - v_x^g| - C_3\,|v_y - v_y^g| - C_4\,|\omega_z - \omega_z^g|
\end{aligned}
$$
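A direct transcription of this reward might look like the sketch below. The coefficient values, the choice of a negative C1 (so the τω term acts as a penalty), and the summation of τω over joints are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative coefficients only; the paper's actual values are not given here.
C1, C2, C3, C4 = -0.001, 1.0, 1.0, 1.0

def reward(tau, omega, v_x, v_y, w_z, v_x_g, v_y_g, w_z_g):
    """Energy + survival + goal-velocity reward, following the equation above.

    tau, omega          : per-joint torques and joint velocities (equal-length arrays)
    v_x, v_y, w_z       : measured base linear/angular velocities
    v_x_g, v_y_g, w_z_g : commanded (goal) velocities
    """
    # Energy cost: C1 * tau * omega, summed over joints (summation assumed).
    energy_cost = C1 * float(np.sum(np.asarray(tau) * np.asarray(omega)))

    # Survival reward: scales with the magnitude of the command, so merely
    # surviving under a zero command yields no reward.
    survival = C2 * abs(v_x_g) + C3 * abs(v_y_g) + C4 * abs(w_z_g)

    # Goal-velocity cost: penalize deviation from the commanded velocities.
    goal_cost = (-C2 * abs(v_x - v_x_g)
                 - C3 * abs(v_y - v_y_g)
                 - C4 * abs(w_z - w_z_g))

    return energy_cost + survival + goal_cost
```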
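The actor-critic architecture and the hyperparameters listed above can be summarized in code roughly as follows. This PyTorch sketch is an assumed implementation (separate actor and critic MLPs, a state-independent learned log standard deviation), not the authors' code.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Actor and critic as separate MLPs: 2 hidden layers of 256 units, ReLU."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        def mlp(out_dim: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, out_dim),
            )
        self.actor = mlp(act_dim)   # mean joint-angle targets
        self.critic = mlp(1)        # state-value estimate
        # Annealed, state-independent log standard deviation (assumption).
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor):
        return self.actor(obs), self.critic(obs)

# PPO hyperparameters as listed above (the dictionary keys are illustrative).
PPO_CONFIG = dict(
    num_envs=32,           # parallel simulation instances
    minibatch_size=512,
    eval_freq=25,
    lr=1e-4,               # Adam learning rate
    adam_eps=1e-5,
    gae_lambda=0.95,
    gamma=0.99,
    clip_range=0.2,        # PPO surrogate clipping parameter
    epochs=3,
    max_episode_steps=400,
)
```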

3. Curriculum Learning

  1. This approach is based on the idea that starting with simpler tasks and gradually increasing the complexity can help the agent learn more efficiently.
  2. Curriculum learning is usually implemented in one of two ways. The first uses a pre-defined set of tasks or environments ordered by increasing difficulty; the agent is trained on them in sequence, with harder tasks introduced as it progresses through the curriculum. The second uses a dynamic curriculum, where task difficulty is adjusted based on the agent's performance.
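As a sketch of the second (dynamic) variant, the class below raises or lowers a scalar difficulty level based on recent episode returns. The thresholds, step size, and the mapping from difficulty to the commanded-velocity range are illustrative assumptions, not the paper's scheme.

```python
class DynamicCurriculum:
    """Adjusts a task-difficulty level in [0, 1] based on recent returns."""

    def __init__(self, promote_threshold: float, demote_threshold: float,
                 step: float = 0.05):
        self.difficulty = 0.0          # start with the easiest tasks
        self.promote = promote_threshold
        self.demote = demote_threshold
        self.step = step

    def update(self, mean_episode_return: float) -> float:
        if mean_episode_return > self.promote:
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif mean_episode_return < self.demote:
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty

# Example: map difficulty to the maximum commanded forward velocity.
curriculum = DynamicCurriculum(promote_threshold=300.0, demote_threshold=100.0)
max_goal_vx = 0.2 + 0.8 * curriculum.update(mean_episode_return=350.0)
```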

4. Sim-To-Real

  1. We employ a mix of domain randomization and system identification for sim-to-real transfer.
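For illustration, domain randomization typically resamples simulator parameters at the start of each training episode, while system identification pins the nominal values (and narrows the ranges) using measurements from the real robot. The parameter names and ranges in this sketch are assumptions, not the paper's values.

```python
import numpy as np

# Randomization ranges around values obtained from system identification
# (all numbers below are illustrative only).
RANDOMIZATION = {
    "base_mass_scale": (0.9, 1.1),    # x identified base mass
    "friction_coeff":  (0.5, 1.25),   # ground friction coefficient
    "motor_strength":  (0.9, 1.1),    # x identified torque limits
    "latency_s":       (0.0, 0.04),   # added control latency in seconds
}

def sample_episode_params(rng: np.random.Generator) -> dict:
    """Draw one set of simulator parameters for the next training episode."""
    return {name: float(rng.uniform(low, high))
            for name, (low, high) in RANDOMIZATION.items()}

rng = np.random.default_rng(0)
params = sample_episode_params(rng)  # apply these to the simulator each episode
```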