DHC-R: Evaluating “Distributed Heuristic Communication” and Improving Robustness for Learnable Decentralized PO-MAPF

Keywords: PO-MAPF, reinforcement learning, generalization, out-of-distribution, AI safety

Abstract

Multi-agent pathfinding (MAPF) is the problem of coordinating the movements of multiple agents operating in a shared environment; it has numerous industrial and research applications. In many practical cases the agents (robots) have limited visibility of the environment and must rely on local observations to make decisions. This setting, known as partially observable MAPF (PO-MAPF), can be solved through decentralized approaches. In recent years, several learnable algorithms that leverage the increased availability of data and computational resources have been proposed for solving PO-MAPF. However, their performance is often not validated out-of-distribution (OOD) or in adversarial scenarios, and the code is frequently not properly open-sourced. In this study, we conduct a comprehensive empirical evaluation of one of the state-of-the-art decentralized PO-MAPF algorithms, Distributed Heuristic Communication (DHC) [1], which incorporates inter-agent communication. Our experiments reveal that the performance of DHC deteriorates when agents encounter complete packet loss during communication. To address this issue, we propose a novel algorithm, DHC-R (DHC-robust). DHC-R employs an architecture similar to the original DHC but introduces randomness into the Graph Neural Network-based communication block, preventing the passage of some data packets during training. Empirical evaluation confirms that DHC-R outperforms DHC in scenarios with message loss. Open-sourced model weights and the codebase are provided: https://github.com/acforvs/dhc-robust-mapf.
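The core DHC-R idea described above, randomly dropping messages in the communication block during training, can be illustrated with a simplified sketch. The snippet below is not the paper's implementation (DHC uses a learned GNN-based communication block); it is a minimal numpy mock-up, with hypothetical names (`communicate`, `drop_prob`), that shows how per-edge message dropout can be injected into a mean-aggregation message-passing round so that the policy learns to cope with packet loss.

```python
import numpy as np

def communicate(features, adjacency, drop_prob=0.0, rng=None):
    """One round of message passing between agents, with random message loss.

    features:  (n_agents, d) array of per-agent hidden states.
    adjacency: (n_agents, n_agents) 0/1 matrix with zero diagonal;
               adjacency[i, j] = 1 means agent j's message can reach agent i.
    drop_prob: probability that any single message (edge) is independently
               dropped, simulating packet loss during training.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = features.shape[0]
    # Each message survives with probability (1 - drop_prob).
    keep = rng.random((n, n)) >= drop_prob
    effective_adj = adjacency * keep
    # Add a self-loop: an agent always retains its own state.
    effective_adj = effective_adj + np.eye(n)
    # Mean-aggregate the surviving neighbor messages.
    degree = effective_adj.sum(axis=1, keepdims=True)
    return (effective_adj @ features) / degree
```

With `drop_prob=0.0` this reduces to ordinary neighborhood averaging, while `drop_prob=1.0` leaves each agent with only its own state, so the same forward pass covers both full communication and total packet loss.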

Published
2024-01-22