Efficient Human-Robot Interaction via Deep Perception and Flexible Motion Planning
Abstract
Human-Robot Interaction (HRI) is an emerging field propelled by advances in artificial intelligence, yet achieving seamless human understanding and responsive robot control remains a significant challenge. This paper introduces an efficient, integrated HRI system for a custom-built dual-arm robot, presenting two primary contributions. First, we propose a deep-learning-based perception system that interprets human states: a Multi-task Cascaded Convolutional Neural Network (MTCNN) for robust face detection, a Deep Convolutional Neural Network (DCNN) for facial emotion recognition, and a Long Short-Term Memory (LSTM) network for dynamic gesture recognition from image sequences. Second, we detail a flexible dual-arm robot control system built on the Robot Operating System (ROS) that employs the Rapidly-exploring Random Tree (RRT) algorithm for efficient path planning, enabling the robot to translate recognized human cues into corresponding actions. Evaluations on benchmarks, including the WIDER FACE and FER2013 datasets, validate the perception models, and both simulation and physical experiments demonstrate high accuracy in perception and control. The results highlight the framework's effectiveness in producing fluid, responsive interactions in complex HRI scenarios.
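For concreteness, the sketch below illustrates the sequence-level stage of a perception pipeline like the one the abstract describes: an LSTM that classifies a dynamic gesture from per-frame feature vectors. This is a minimal illustration rather than the paper's implementation; the feature dimension, hidden size, gesture count, and the upstream MTCNN/DCNN stages are assumed placeholders.

```python
# Illustrative sketch of the gesture-recognition stage (assumptions throughout):
# per-frame embeddings from an upstream face/emotion CNN feed an LSTM classifier.
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    """LSTM classifier over a sequence of per-frame feature vectors."""
    def __init__(self, feat_dim=128, hidden=64, n_gestures=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_gestures)

    def forward(self, seq):               # seq: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(seq)      # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])         # logits from the final hidden state

# Stand-ins for the upstream stages (hypothetical): an MTCNN would detect the
# face in each frame, and a DCNN would embed the crop into a 128-d vector.
model = GestureLSTM()
frames = torch.randn(1, 16, 128)          # 16 frames of 128-d features
gesture_id = model(frames).argmax(dim=-1)
print(gesture_id)
```

Using only the final hidden state for classification is one common design choice for sequence labeling of short clips; attention pooling over all time steps is a frequent alternative.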