Continuous control algorithms for conveyer belt routing based on multi-agent deep reinforcement learning
Keywords:
routing, multi-agent learning, reinforcement learning, conveyor belt
Abstract
Introduction: We consider the problem of routing piece cargo through a conveyor system. When moving cargo pieces, it is necessary to minimize not only the transportation time but also the energy spent on it. Purpose: To develop a routing algorithm that adapts to changes in the topology of the routing graph and optimizes both delivery time and energy consumption. Results: We propose an algorithm based on multi-agent deep reinforcement learning that places agents at the vertices of the conveyor network graph and uses a new state value function. The algorithm has two tunable parameters: the length of the path along which the state value function is calculated, and the learning coefficient. Parameter selection showed that the optimal values are 2 and 1, respectively. An experimental study of the algorithm on a simulation model showed that it reduces the number of collisions between moving objects to zero, gives stable results for both optimized metrics, and consumes less energy than the baseline method. Practical relevance: The proposed algorithm can be used to reduce delivery time and energy consumption when managing conveyor systems.