Model-based Policy Optimization with Neural Ordinary Differential Equations

  • mironovconst Moscow Institute of Physics and Technology
Keywords: Reinforcement Learning, Deep Learning, Manipulation

Abstract

Applying learning-based control methods to real robots is challenging, in part because of the low sample efficiency of model-free reinforcement learning algorithms. A widely adopted approach to this problem is to learn a model of the environment dynamics. We propose using Neural Ordinary Differential Equations (NODE) to approximate the transition dynamics, as this allows finer control of the trajectory generation process. We evaluate our approach on several tasks in a simulation environment, including training a 6-DoF robotic arm to open a door, a task that poses particular challenges for policy search. The NODE model is trained to predict the movement of the arm and the door and is used to generate trajectories for model-based policy optimization. Our method shows better sample efficiency on this task compared to the model-free and model-based baselines, and comparable results on several other tasks.
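
To make the idea of generating trajectories from a continuous-time dynamics model more concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes the learned model is exposed as a function f(state, action) that returns the time derivative of the state, and it integrates that function with a fixed-step Runge-Kutta scheme. The names rk4_step, rollout, toy_dynamics and toy_policy are hypothetical, and the hand-written damped-joint dynamics only stand in for a trained network. Reading "finer control" as the freedom to choose the integration step (the dt and substeps parameters) is likewise an assumption.

```python
import math
from typing import Callable, List

State = List[float]
Action = List[float]
Dynamics = Callable[[State, Action], State]  # returns ds/dt


def rk4_step(f: Dynamics, state: State, action: Action, dt: float) -> State:
    """One classical Runge-Kutta step of ds/dt = f(s, a), action held constant."""
    def shifted(s: State, k: State, h: float) -> State:
        return [si + h * ki for si, ki in zip(s, k)]

    k1 = f(state, action)
    k2 = f(shifted(state, k1, dt / 2), action)
    k3 = f(shifted(state, k2, dt / 2), action)
    k4 = f(shifted(state, k3, dt), action)
    return [
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def rollout(f: Dynamics, policy: Callable[[State], Action], state: State,
            horizon: int, dt: float, substeps: int = 1) -> List[State]:
    """Generate a model trajectory of `horizon` control steps of length dt.

    `substeps` splits each control step into smaller integration steps,
    which is one way to trade compute for accuracy.
    """
    trajectory = [list(state)]
    h = dt / substeps
    for _ in range(horizon):
        action = policy(state)
        for _ in range(substeps):
            state = rk4_step(f, state, action, h)
        trajectory.append(state)
    return trajectory


# Hypothetical stand-in for a learned model: one damped joint with torque input.
def toy_dynamics(state: State, action: Action) -> State:
    angle, velocity = state
    return [velocity, -math.sin(angle) - 0.1 * velocity + action[0]]


# Hypothetical policy: torque that opposes the joint velocity.
def toy_policy(state: State) -> Action:
    return [-0.5 * state[1]]


traj = rollout(toy_dynamics, toy_policy, [1.0, 0.0], horizon=50, dt=0.05)
```

In a model-based policy optimization loop, trajectories produced this way would supplement real environment transitions when updating the policy. In practice the derivative function would be a neural network trained on observed transitions, and an adaptive ODE solver could replace the fixed-step integrator.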

Published
2024-01-22