Variational Autoencoder for Efficient Image Representation in Deep Reinforcement Learning for Mobile Robot Navigation

  • Long Phi Nguyen, Center for Environmental Intelligence and College of Engineering & Computer Science, VinUniversity
  • Giang Truong Dao, Center for Environmental Intelligence and College of Engineering & Computer Science, VinUniversity
  • Truong Tho Do, Center for Environmental Intelligence and College of Engineering & Computer Science, VinUniversity
Keywords: Autonomous Navigation, Reinforcement Learning, Collision Avoidance, Neural Network, Mobile Robot

Abstract

In this paper, we present a deep reinforcement learning (DRL) policy for autonomous mobile robot navigation in cluttered environments. A key challenge in applying DRL to vision-based navigation is the high dimensionality of raw image observations, which often leads to poor sample efficiency and unstable training. To address this issue, we propose a representation learning module based on a variational autoencoder (VAE), trained on simulated depth images to compress the visual input into a compact latent representation. This encoding preserves essential spatial information, such as obstacle boundaries and free space, while filtering out redundant detail, yielding a more structured and informative observation space. The latent features, combined with odometry data, serve as input to the DRL policy, enabling the robot to reliably reach specified targets while avoiding collisions. Extensive simulation experiments in diverse, cluttered environments show that our method achieves a success rate of approximately 66.7%, substantially higher than baselines trained directly on raw depth images or with FFT-based encoders under the same training budget. These results highlight the effectiveness of combining VAE-based representation learning with DRL for robust and efficient autonomous navigation.
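Since the abstract does not specify network details, the following is a minimal PyTorch sketch of the pipeline it describes: a convolutional VAE compresses a depth image into a latent vector, and the posterior mean is concatenated with odometry to form the policy observation. The input resolution (64×64), layer widths, latent dimension (32), and 4-D odometry vector are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: architecture and dimensions are assumptions,
# not the paper's actual configuration.
import torch
import torch.nn as nn

class DepthVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: 1x64x64 depth image -> flattened convolutional features
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        # Decoder, used only during VAE pretraining (reconstruction loss)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def encode(self, depth):
        h = self.encoder(depth)
        return self.fc_mu(h), self.fc_logvar(h)

    def forward(self, depth):
        mu, logvar = self.encode(depth)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decoder(z), mu, logvar


def build_observation(vae, depth, odom):
    """Compress a depth image to latent features and append odometry."""
    with torch.no_grad():
        mu, _ = vae.encode(depth)  # use the posterior mean as the code
    return torch.cat([mu, odom], dim=-1)  # observation fed to the DRL policy


# Usage: one 64x64 depth frame plus a hypothetical 4-D odometry vector
vae = DepthVAE(latent_dim=32)
obs = build_observation(vae, torch.rand(1, 1, 64, 64), torch.rand(1, 4))
print(obs.shape)  # torch.Size([1, 36])
```

In this kind of setup the VAE is typically pretrained on depth images with a reconstruction plus KL loss and then frozen, so the policy trains against a fixed, low-dimensional observation space rather than raw pixels.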

Published: 2025-10-24