Variational Autoencoder for Efficient Image Representation in Deep Reinforcement Learning for Mobile Robot Navigation

  • Long Phi Nguyen, Center for Environmental Intelligence and College of Engineering & Computer Science, VinUniversity
  • Giang Truong Dao, Center for Environmental Intelligence and College of Engineering & Computer Science, VinUniversity
  • Truong Tho Do, Center for Environmental Intelligence and College of Engineering & Computer Science, VinUniversity
Keywords: Autonomous Navigation, Reinforcement Learning, Collision Avoidance, Neural Network, Mobile Robot

Abstract

In this paper, we present a deep reinforcement learning (DRL) policy for autonomous mobile robot navigation in cluttered environments. A key challenge in applying DRL to vision-based navigation is the high dimensionality of raw image observations, which often leads to poor sample efficiency and unstable training. To address this issue, we propose a representation learning module based on a variational autoencoder (VAE), trained on simulated depth images to compress the visual input into a compact latent representation. This encoding preserves essential spatial information, such as obstacle boundaries and free space, while filtering out redundant detail, yielding a more structured and informative observation space. The latent features, combined with odometry data, serve as input to the DRL policy, enabling the robot to reliably reach specified targets while avoiding collisions. Extensive simulation experiments in diverse, cluttered environments show that our method achieves a success rate of approximately 66.7%, substantially higher than baselines trained directly on raw depth images or with FFT-based encoders under the same training budget. These results highlight the effectiveness of combining VAE-based representation learning with DRL for robust and efficient autonomous navigation.
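Since the abstract does not specify network details, the following is a minimal PyTorch sketch of the pipeline it describes: a convolutional VAE compresses a depth image into a latent vector, and the posterior mean is concatenated with odometry to form the policy observation. The input resolution (64×64), layer widths, latent dimension (32), and 4-D odometry vector are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: architecture and dimensions are assumptions,
# not the paper's actual configuration.
import torch
import torch.nn as nn

class DepthVAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: 1x64x64 depth image -> flattened convolutional features
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        # Decoder, used only during VAE pretraining (reconstruction loss)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),
        )

    def encode(self, depth):
        h = self.encoder(depth)
        return self.fc_mu(h), self.fc_logvar(h)

    def forward(self, depth):
        mu, logvar = self.encode(depth)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decoder(z), mu, logvar


def build_observation(vae, depth, odom):
    """Compress a depth image to latent features and append odometry."""
    with torch.no_grad():
        mu, _ = vae.encode(depth)  # use the posterior mean as the code
    return torch.cat([mu, odom], dim=-1)  # observation fed to the DRL policy


# Usage: one 64x64 depth frame plus a hypothetical 4-D odometry vector
vae = DepthVAE(latent_dim=32)
obs = build_observation(vae, torch.rand(1, 1, 64, 64), torch.rand(1, 4))
print(obs.shape)  # torch.Size([1, 36])
```

In this kind of setup the VAE is typically pretrained on depth images with a reconstruction plus KL loss and then frozen, so the policy trains against a fixed, low-dimensional observation space rather than raw pixels.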

Published: 2025-10-24