Method of Multi-Agent Reinforcement Learning in Systems with a Variable Number of Agents
https://doi.org/10.17587/mau.23.507-514
Abstract
Multi-agent reinforcement learning is one of the newest and most actively developing areas of machine learning. Among multi-agent reinforcement learning methods, one of the most promising is MADDPG, whose advantage is the high convergence of the learning process. The disadvantage of MADDPG is that the number of agents N at the training stage must equal the number of agents K at the operation stage. At the same time, target multi-agent systems (MAS), such as groups of UAVs or mobile ground robots, are systems with a variable number of agents, which rules out the use of MADDPG in them. To solve this problem, the article proposes an improved MADDPG method for multi-agent reinforcement learning in systems with a variable number of agents. The improved MADDPG method is based on the hypothesis that, to perform its functions, an agent needs information about the state not of all other MAS agents but only of a few nearest neighbors. Based on this hypothesis, a hybrid joint/independent learning scheme for a MAS with a variable number of agents is proposed, in which a small number of agents N is trained to support the operation of an arbitrary number of agents K, K > N. Experiments have shown that the improved MADDPG method provides MAS performance comparable to the original method while the number of agents K at the operation stage varies within wide limits.
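The key mechanism behind the improved method, building each agent's network input from a fixed number of nearest neighbors rather than from all agents, can be illustrated with a short sketch. The Python function below is our own minimal illustration under assumed names and shapes (positions, observations, k), not code from the paper; it shows how a fixed-size input vector can be assembled for one agent from its own observation and those of its k nearest neighbors:

import numpy as np

def k_nearest_neighbor_input(positions, observations, agent_idx, k):
    # Illustrative sketch, not the authors' implementation.
    # positions:    (K, d) array of agent coordinates
    # observations: (K, m) array of per-agent observation vectors
    # Returns a vector of fixed length m * (k + 1), independent of K.
    deltas = positions - positions[agent_idx]   # offsets to all agents
    dists = np.linalg.norm(deltas, axis=1)      # Euclidean distances
    dists[agent_idx] = np.inf                   # exclude the agent itself
    neighbors = np.argsort(dists)[:k]           # indices of the k nearest agents
    return np.concatenate([observations[agent_idx],
                           observations[neighbors].ravel()])

Because the resulting input has the fixed dimension m * (k + 1), actor and critic networks trained with N agents can be applied unchanged when K > N agents operate, which is what decouples the training population size from the deployment population size.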
About the Authors
V. I. Petrenko
Russian Federation
Petrenko Vyacheslav I., Cand. of Tech. Sc., Head of the Department of Organization and Technology of Information Security
Stavropol, 355017
F. B. Tebueva
Russian Federation
M. M. Gurchinsky
Russian Federation
A. S. Pavlov
Russian Federation
For citations:
Petrenko V.I., Tebueva F.B., Gurchinsky M.M., Pavlov A.S. Method of Multi-Agent Reinforcement Learning in Systems with a Variable Number of Agents. Mekhatronika, Avtomatizatsiya, Upravlenie. 2022;23(10):507-514. (In Russ.) https://doi.org/10.17587/mau.23.507-514