Refined 3D Object Localization with Monocular Camera using Depth Estimation and Geometric Refinement
Abstract
Accurate 3D object localization is a fundamental requirement for applications in industrial robotics, augmented reality, and autonomous navigation. While traditional multi-view systems are precise, their hardware complexity and cost limit widespread adoption. Monocular vision offers a cost-effective alternative but struggles with the inherent challenge of inferring depth from a single 2D image, often leading to significant localization errors. This paper presents a novel methodology that overcomes these limitations, achieving high-precision 3D localization using only a single camera. Our proposed framework integrates three synergistic stages. First, a camera calibration process using a checkerboard pattern corrects lens distortion and establishes a real-world metric coordinate system. Second, we employ the YOLOv8 model for real-time 2D object detection and the ZoeDepth network to generate a dense depth map from the monocular input. Third, to mitigate the spatial inaccuracies that arise when objects are positioned off-center, we introduce a geometric object-position refinement technique that adjusts the object's estimated center based on its projected image coordinates and depth information. Experimental results demonstrate the effectiveness of our approach, which achieves an average position error of just 4.6 mm, a 59.29% improvement in localization accuracy over the standalone YOLOv8 detector, confirming the method's suitability for robust 3D localization.
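The geometric refinement summarized above maps a detected 2D center and its estimated depth into camera-frame 3D coordinates. A minimal sketch of this step, using the standard pinhole back-projection model and a hypothetical calibrated intrinsic matrix K (the paper's actual refinement procedure may differ in detail):

```python
import numpy as np

def backproject_center(u, v, depth, K):
    """Back-project a detected 2D box center (u, v) with metric depth
    into camera-frame 3D coordinates via the pinhole model.

    K is the 3x3 intrinsic matrix obtained from checkerboard calibration.
    """
    fx, fy = K[0, 0], K[1, 1]          # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]          # principal point
    x = (u - cx) * depth / fx          # lateral offset grows with distance
    y = (v - cy) * depth / fy          # from the principal point
    return np.array([x, y, depth])

# Hypothetical intrinsics and an off-center detection 1.5 m from the camera.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
p = backproject_center(480.0, 360.0, 1.5, K)
# p = [0.3, 0.225, 1.5]: the off-center pixel corresponds to a point
# displaced 0.3 m right and 0.225 m down in the camera frame.
```

This illustrates why uncorrected off-center detections accumulate error: the lateral displacement scales linearly with depth, so treating the image-plane center as the 3D position becomes increasingly wrong away from the optical axis.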