|
|
Meta Generative Intrinsic Reward Based Robot Manipulation Skill Learning
WU Pei-liang1,2, QU You-yuan1,2, LI Yao1,2, CHEN Wen-bai3, GAO Guo-wei3
1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, Hebei 066004, China
3. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
|
|
Abstract To address the low learning efficiency of complex tasks under sparse rewards, a meta generative intrinsic reward (MGIR) algorithm was proposed based on off-policy reinforcement learning and applied to robot manipulation skill learning. Specifically, a meta generative intrinsic reward framework was first used to decompose a complex task into multiple subtasks and to evaluate the agent's competence on each subtask. Then, an intrinsic reward module was introduced that takes the novelty of the states explored by the agent as an intrinsic reward, which, together with the environment reward, jointly guides the agent to explore the environment and learn the specific task. Finally, comparative off-policy reinforcement learning experiments were conducted in the MuJoCo simulation environment Fetch. The experimental results show that the proposed meta generative intrinsic reward algorithm performs better in both training efficiency and success rate.
|
Received: 03 January 2023
Published: 25 June 2023
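
The reward-shaping step described in the abstract (adding a state-novelty bonus to the sparse environment reward) can be illustrated with a minimal Python sketch. The count-based novelty estimator, the discretization bin size, and the weighting coefficient `beta` below are illustrative assumptions for exposition, not the paper's MGIR implementation.

```python
# Minimal sketch: combine a state-novelty intrinsic reward with the
# sparse environment reward, as described in the abstract. The
# count-based estimator and all parameter values are assumptions.
from collections import defaultdict

import numpy as np


class NoveltyReward:
    """Count-based novelty: less-visited states earn larger bonuses."""

    def __init__(self, bin_size: float = 0.05):
        self.bin_size = bin_size        # discretization granularity (assumed)
        self.counts = defaultdict(int)  # visit counts per discretized state

    def __call__(self, state: np.ndarray) -> float:
        # Discretize the continuous state so visits can be counted.
        key = tuple(np.round(state / self.bin_size).astype(int))
        self.counts[key] += 1
        # Bonus decays as 1/sqrt(N), a common count-based form.
        return 1.0 / np.sqrt(self.counts[key])


def combined_reward(env_reward: float, state: np.ndarray,
                    novelty: NoveltyReward, beta: float = 0.1) -> float:
    """Environment reward plus a weighted intrinsic novelty bonus."""
    return env_reward + beta * novelty(state)
```

In an off-policy learner on the Fetch tasks, a shaped reward of this kind would replace the sparse reward stored in the replay buffer, with `beta` trading off exploration against exploitation; how MGIR generates and weights its intrinsic reward is specified in the body of the paper.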
|
|
|
|
|
|
|
|
|