A Survey of Robot Imitation Learning (Part II)
I. Foundational Concepts of Robot Imitation Learning
- Core definition: imitation learning enables a robot to acquire skills from expert demonstrations [67].
- Foundational method, behavior cloning (BC): the simplest form, which maps observations directly to actions.
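As a hedged illustration (not drawn from any cited paper), behavior cloning reduces to supervised regression from observations to actions. The toy linear policy below is fit by least squares to synthetic "expert" data; real systems replace the linear map with a deep network:

```python
import numpy as np

# Minimal behavior-cloning sketch: fit a linear policy a = W @ o
# to expert (observation, action) pairs by least-squares regression.
rng = np.random.default_rng(0)
W_expert = np.array([[1.0, -0.5], [0.3, 2.0]])   # hidden "expert" mapping
obs = rng.normal(size=(100, 2))                  # expert observations
actions = obs @ W_expert.T                       # corresponding expert actions

# Behavior cloning = regress actions on observations
solution, *_ = np.linalg.lstsq(obs, actions, rcond=None)
W_bc = solution.T

def policy(o):
    """Cloned policy: map an observation directly to an action."""
    return W_bc @ o

# The cloned policy reproduces the expert on a new observation
o_new = np.array([0.5, -1.0])
print(np.allclose(policy(o_new), W_expert @ o_new))   # True
```

The same supervised-learning structure underlies every BC variant discussed below; the enhancements change the architecture, objective, or data, not this basic recipe.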
II. Technical Directions for Enhancing Behavior Cloning (BC)
1. Architecture optimization
Incorporating historical information through different network architectures to improve learning:
- Related work: [12, 47, 59, 77].
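The simplest way to give a BC policy access to history, sketched below under the assumption of low-dimensional observations, is to stack the last k observations into one policy input; the transformer-based approaches in [12, 77] are far more elaborate, so treat this only as the minimal instance of the idea:

```python
from collections import deque
import numpy as np

class FrameStack:
    """Maintain the last k observations and expose them as one flat vector."""
    def __init__(self, k, obs_dim):
        self.buf = deque([np.zeros(obs_dim)] * k, maxlen=k)

    def push(self, obs):
        self.buf.append(np.asarray(obs, dtype=float))
        return np.concatenate(self.buf)   # shape: (k * obs_dim,)

stack = FrameStack(k=3, obs_dim=2)
x = stack.push([1.0, 2.0])
x = stack.push([3.0, 4.0])
print(x.shape)   # (6,) — the policy now sees a short window, not one frame
```

The policy network then consumes this stacked vector instead of a single observation, which lets it disambiguate states that look identical from one frame alone.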
2. New training objectives
Proposing new optimization objectives that improve imitation quality:
- Related work: [10, 18, 35, 63, 104].
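One concrete example of a changed objective is action chunking, explored in [10] and [104]: the policy predicts the next k actions at once and executes the chunk open-loop, reducing compounding error. A minimal sketch, with a toy `toy_chunk_model` standing in for a learned predictor:

```python
import numpy as np

class ChunkedPolicy:
    """Execute k-step action chunks open-loop, re-querying the model
    only when the current chunk is exhausted."""
    def __init__(self, predict_chunk, k):
        self.predict_chunk = predict_chunk   # obs -> (k, action_dim) array
        self.k = k
        self.queue = []

    def act(self, obs):
        if not self.queue:
            self.queue = list(self.predict_chunk(obs))
        return self.queue.pop(0)

# Toy "model": repeats the observation k times as a chunk of actions,
# while counting how often it is actually queried.
k, calls = 4, []
def toy_chunk_model(obs):
    calls.append(obs)
    return np.tile(obs, (k, 1))

policy = ChunkedPolicy(toy_chunk_model, k)
for t in range(8):
    action = policy.act(np.array([float(t)]))

print(len(calls))   # 2 — the model ran twice for 8 environment steps
```

During training, the corresponding loss supervises all k predicted actions per timestep rather than one, which is the actual objective change these papers study.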
3. Regularization
Applying regularization to improve model generalization:
- Related work: [71].
4. Movement primitives
Representing skills with movement primitives (e.g., dynamic movement primitives and probabilistic movement primitives):
- Related work: [7, 44, 55, 62, 64, 97].
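The core of a dynamic movement primitive [44] is a critically damped point attractor pulled toward the goal, modulated by a learned forcing term. Below is a minimal 1-D sketch with the forcing term set to zero, so the system simply converges to the goal; learning a DMP means fitting that omitted forcing term to a demonstration:

```python
def rollout_dmp(y0, g, T=1.0, dt=0.001, alpha=25.0):
    """Integrate the DMP transformation system
        y'' = alpha * (beta * (g - y) - y') + f
    with forcing term f = 0, by forward Euler."""
    beta = alpha / 4.0            # critical damping
    y, dy = y0, 0.0
    for _ in range(int(T / dt)):
        ddy = alpha * (beta * (g - y) - dy)
        dy += ddy * dt
        y += dy * dt
    return y

# With no forcing term, the state converges to the goal attractor
y_final = rollout_dmp(y0=0.0, g=1.0)
print(abs(y_final - 1.0) < 1e-2)   # True
```

The attractor guarantees convergence and makes the goal a free parameter, which is exactly why primitives generalize a demonstrated motion to new targets.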
5. Data preprocessing
Improving data quality through cleaning, augmentation, and other preprocessing:
- Related work: [81].
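As an illustrative toy version of waypoint-style preprocessing in the spirit of [81] (whose actual selection criterion differs), one can drop demonstration frames where the direction of motion does not change, compressing a dense trajectory to a few waypoints:

```python
import numpy as np

def extract_waypoints(traj, angle_thresh=0.1):
    """Keep the endpoints plus any interior point where the heading
    changes; drop collinear intermediate samples."""
    traj = np.asarray(traj, dtype=float)
    keep = [0]
    for i in range(1, len(traj) - 1):
        v1 = traj[i] - traj[keep[-1]]
        v2 = traj[i + 1] - traj[i]
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        if cos < 1.0 - angle_thresh:   # direction changed: keep a waypoint
            keep.append(i)
    keep.append(len(traj) - 1)
    return traj[keep]

# A dense L-shaped demonstration collapses to its three corners
traj = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
wp = extract_waypoints(traj)
print(len(wp))   # 3
```

Training on such sparse waypoints instead of every frame shortens the effective horizon the policy must imitate, which is the intuition these preprocessing methods exploit.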
III. Frontier Research Directions in Imitation Learning
1. Multi-task and few-shot learning
Enabling robots to generalize across tasks from only a handful of demonstrations:
- Related work: [25, 27, 30, 34, 46, 50, 88, 102].
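A useful mental model for the few-shot regime, assuming a reasonable feature space, is the non-parametric baseline studied in [63]: store the few demonstrations and act with the action of the nearest stored observation. A toy sketch in raw observation space ([63] retrieves in learned feature space):

```python
import numpy as np

def nearest_neighbor_policy(query, demo_obs, demo_actions):
    """Return the demonstrated action whose observation is closest
    to the query observation."""
    dists = np.linalg.norm(demo_obs - query, axis=1)
    return demo_actions[np.argmin(dists)]

# Two stored demonstration frames and their actions
demo_obs = np.array([[0.0, 0.0], [1.0, 1.0]])
demo_actions = np.array([[0.1], [0.9]])

a = nearest_neighbor_policy(np.array([0.9, 1.1]), demo_obs, demo_actions)
print(a)   # [0.9]
```

Because no parameters are fit, this baseline works with a single demonstration, which is why it is a natural reference point for the one-shot methods cited above.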
2. Language-conditioned imitation learning
Guiding robot manipulation with natural-language instructions:
- Related work: [12, 47, 82, 83].
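The common pattern in language-conditioned systems such as BC-Z [47] and RT-1 [12] is to encode the instruction and fuse the embedding with the observation before the policy head. The sketch below replaces the real language encoder with a toy deterministic embedding; only the wiring is the point:

```python
import numpy as np

def embed_instruction(text, dim=8):
    """Toy stand-in for a language encoder: deterministically map an
    instruction string to a fixed-size vector."""
    rng = np.random.default_rng(sum(text.encode()))
    return rng.normal(size=dim)

def language_conditioned_policy(obs, instruction, W):
    """Concatenate observation and instruction embedding, then apply a
    (here untrained) linear policy head."""
    x = np.concatenate([obs, embed_instruction(instruction)])
    return W @ x

obs = np.zeros(4)            # placeholder visual features
W = np.zeros((2, 4 + 8))     # action_dim x (obs_dim + lang_dim)
action = language_conditioned_policy(obs, "pick up the red block", W)
print(action.shape)   # (2,)
```

In the cited systems the encoder is a pretrained sentence or vision-language model and the fusion is learned, but the conditioning structure is the same: the instruction becomes an extra input that selects which behavior the shared policy produces.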
3. Imitation from play data
Learning skills from robot "play" data or human play behavior:
- Related work: [21, 57, 74, 89].
4. Imitation from human videos
Learning physical skills by watching videos of humans performing tasks:
- Related work: [16, 24, 29, 60, 69, 80, 84, 96].
5. Task-specific structure
Designing structured learning frameworks tailored to particular tasks:
- Related work: [49, 83, 103].
IV. Generalization Results from Scaling Up Algorithms
With these algorithmic improvements, such systems can generalize to new objects, instructions, or scenes:
- Related work: [12, 13, 28, 47, 54].
V. Recent Progress in Co-Training
1. Single-arm manipulation
Co-training on real-world datasets collected from different robots improves single-arm manipulation:
- Related work: [11, 20, 31, 61, 98].
2. Navigation
Co-training has also proven effective in robot navigation settings:
- Related work: [79].
[67] Dean A. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. In NIPS, 1988. (Shows that a robot can learn from expert demonstrations; ALVINN controls an autonomous land vehicle with a neural network and is an early classic of robot imitation learning.)
[12] Anthony Brohan et al. RT-1: Robotics Transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022. (Improves imitation learning on several fronts, including new training objectives, language-conditioned imitation, and algorithmic scaling.)
[47] Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. BC-Z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning, 2022. (Focuses on zero-shot task generalization with behavior cloning, extending BC to tasks unseen during training.)
[59] Ajay Mandlekar, Danfei Xu, J. Wong, Soroush Nasiriany, Chen Wang, Rohun Kulkarni, Li Fei-Fei, Silvio Savarese, Yuke Zhu, and Roberto Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In Conference on Robot Learning, 2021. (Studies the factors that matter when learning manipulation from offline human demonstrations, including BC enhancements such as incorporating history.)
[77] Nur Muhammad (Mahi) Shafiullah, Zichen Jeff Cui, Ariuntuya Altanzaya, and Lerrel Pinto. Behavior Transformers: Cloning k modes with one stone. arXiv, abs/2206.11251, 2022. (Clones multiple behavioral modes with one transformer model, an architectural improvement to behavior cloning.)
[10] H. Bharadhwaj, J. Vakil, M. Sharma, A. Gupta, S. Tulsiani, and V. Kumar. RoboAgent: Towards sample efficient robot manipulation with semantic augmentations and action chunking, 2023. (Pursues sample-efficient manipulation via semantic augmentations and action chunking; relates to new training objectives.)
[18] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion Policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023. (Proposes Diffusion Policy for visuomotor policy learning, a new training objective.)
[35] Peter R. Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian S. Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. Implicit behavioral cloning. arXiv, abs/2109.00137, 2021. (Proposes implicit behavioral cloning, a new training objective for imitation.)
[63] Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, and Lerrel Pinto. The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021. (Shows that representation learning is highly effective for visual imitation; relates to new training objectives and BC improvements.)
[104] Tony Z. Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. RSS, 2023. (Learns fine-grained bimanual manipulation on low-cost hardware; contributes to new training objectives, co-training, and related areas.)
[71] Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Bölöni, and Sergey Levine. Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2018. (End-to-end learning from demonstration for vision-based multi-task manipulation on inexpensive robots; applies regularization to improve learning.)
[7] Shikhar Bahl, Abhinav Gupta, and Deepak Pathak. Hierarchical neural dynamic policies. RSS, 2021. (Proposes hierarchical neural dynamic policies, bringing movement-primitive ideas into imitation learning.)
[44] Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 2013. (Introduces dynamic movement primitives, which learn attractor models for motor behaviors; a foundational application of movement primitives in robot learning.)
[55] Jens Kober and Jan Peters. Learning motor primitives for robotics. In 2009 IEEE International Conference on Robotics and Automation, 2009. (Studies methods for learning motor primitives, advancing their use in robotics.)
[62] Alexandros Paraschos, Christian Daniel, Jan Peters, and Gerhard Neumann. Using probabilistic movement primitives in robotics. Autonomous Robots, 42:529–551, 2018. (Applies probabilistic movement primitives to robots, broadening the forms movement primitives can take in robot learning.)
[64] Peter Pastor, Heiko Hoffmann, Tamim Asfour, and Stefan Schaal. Learning and generalization of motor skills by learning from demonstration. In 2009 IEEE International Conference on Robotics and Automation, pages 763–768, 2009. (Learns and generalizes motor skills from demonstration; closely tied to movement-primitive methods.)
[97] Jingyun Yang, Junwu Zhang, Connor Settle, Akshara Rai, Rika Antonova, and Jeannette Bohg. Learning periodic tasks from human demonstrations. In 2022 International Conference on Robotics and Automation (ICRA), pages 8658–8665. IEEE, 2022. (Learns periodic tasks from human demonstrations using movement-primitive techniques.)
[81] Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, and Chelsea Finn. Waypoint-based imitation learning for robotic manipulation. CoRL, 2023. (Proposes waypoint-based imitation learning for manipulation, an improvement on the data-preprocessing side.)
[25] Sudeep Dasari and Abhinav Kumar Gupta. Transformers for one-shot visual imitation. In Conference on Robot Learning, 2020. (Uses Transformers for one-shot visual imitation; multi-task and few-shot imitation learning.)
[27] Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, P. Abbeel, and Wojciech Zaremba. One-shot imitation learning. arXiv, abs/1703.07326, 2017. (Focuses on one-shot imitation learning, exploring the feasibility of imitation from very few demonstrations.)
[30] Peter Englert and Marc Toussaint. Learning manipulation skills from a single demonstration. The International Journal of Robotics Research, 37(1):137–154, 2018. (Learns manipulation skills from a single demonstration; an important few-shot exploration.)
[34] Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-shot visual imitation learning via meta-learning. In Conference on Robot Learning, 2017. (Achieves one-shot visual imitation via meta-learning, advancing few-shot imitation learning.)
[46] Stephen James, Michael Bloesch, and Andrew J. Davison. Task-embedded control networks for few-shot imitation learning. arXiv, abs/1810.03237, 2018. (Proposes task-embedded control networks, a new approach to few-shot imitation learning.)
[50] Edward Johns. Coarse-to-fine imitation learning: Robot manipulation from a single demonstration. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613–4619, 2021. (Coarse-to-fine imitation from a single demonstration; few-shot imitation learning.)
[88] Eugene Valassakis, Georgios Papagiannis, Norman Di Palo, and Edward Johns. Demonstrate once, imitate immediately (DOME): Learning visual servoing for one-shot imitation learning. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022. (Imitates immediately after a single demonstration via learned visual servoing; few-shot imitation learning.)
[102] Tianhe Yu, Chelsea Finn, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, and Sergey Levine. One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557, 2018. (One-shot imitation from observing humans via domain-adaptive meta-learning, advancing few-shot imitation research.)
[82] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. CLIPort: What and where pathways for robotic manipulation. arXiv, abs/2109.12098, 2021. (Separates "what" and "where" pathways for manipulation; language-conditioned imitation learning.)
[83] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Perceiver-Actor: A multi-task transformer for robotic manipulation. arXiv, abs/2209.05451, 2022. (Proposes the Perceiver-Actor model for manipulation, spanning language-conditioned imitation and task-specific structure.)
[21] Zichen Jeff Cui, Yibin Wang, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. From play to policy: Conditional behavior generation from uncurated robot data. arXiv preprint arXiv:2210.10047, 2022. (Generates conditional behavior from uncurated robot data; imitation from play data.)
[57] Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent plans from play. In Conference on Robot Learning, pages 1113–1132. PMLR, 2020. (Learns latent plans from robot "play"; imitation from play data.)
[74] Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, and Wolfram Burgard. Latent plans for task-agnostic offline reinforcement learning. In Conference on Robot Learning, pages 1838–1849. PMLR, 2023. (Studies latent plans for task-agnostic offline RL; related to imitation from play data.)
[89] Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, and Anima Anandkumar. MimicPlay: Long-horizon imitation learning by watching human play. arXiv preprint arXiv:2302.12422, 2023. (Achieves long-horizon imitation by watching human play; a recent result on imitation from play data.)
[16] Annie S. Chen, Suraj Nair, and Chelsea Finn. Learning generalizable robotic reward functions from "in-the-wild" human videos. arXiv preprint arXiv:2103.16817, 2021. (Learns generalizable robot reward functions from in-the-wild human videos; imitation from human video.)
[24] Neha Das, Sarah Bechtle, Todor Davchev, Dinesh Jayaraman, Akshara Rai, and Franziska Meier. Model-based inverse reinforcement learning from visual demonstrations. In Conference on Robot Learning, pages 1930–1942. PMLR, 2021. (Model-based inverse RL from visual demonstrations; uses human video as learning data.)
[29] Ashley D. Edwards and Charles L. Isbell. Perceptual values from observation. arXiv preprint arXiv:1905.07861, 2019. (Learns perceptual values from observation; related to imitation from human videos.)
[60] Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, and Abhinav Gupta. R3M: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022. (Proposes a universal visual representation for manipulation; related to imitation learning from human videos.)
[69] Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, and Trevor Darrell. Real-world robot learning with masked visual pre-training. CoRL, 2022. (Real-world robot learning with masked visual pre-training on data that includes human videos.)
[80] Lin Shao, Toki Migimatsu, Qiang Zhang, Karen Yang, and Jeannette Bohg. Concept2Robot: Learning manipulation concepts from instructions and human demonstrations. The International Journal of Robotics Research, 40(12-14):1419–1434, 2021. (Learns manipulation concepts from instructions and human demonstration videos.)
[84] Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, and Sergey Levine. AVID: Learning multi-stage tasks via pixel-level translation of human videos. arXiv preprint arXiv:1912.04443, 2019. (Learns multi-stage tasks via pixel-level translation of human videos; an instance of imitation from human video.)
[96] Haoyu Xiong, Quanzhou Li, Yun-Chun Chen, Homanga Bharadhwaj, Samarth Sinha, and Animesh Garg. Learning by watching: Physical imitation of manipulation skills from human videos. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7827–7834. IEEE, 2021. (Physically imitates manipulation skills by watching human videos; squarely focused on imitation from human video.)
[13] Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alex Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, and Brianna Zitkovich. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023. (Vision-language-action models transfer web knowledge to robot control, studying adaptation and generalization to new tasks and scenes in complex settings.)
[28] Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, and Sergey Levine. Bridge Data: Boosting generalization of robotic skills with cross-domain datasets. arXiv, abs/2109.13396, 2021. (Boosts generalization of robot skills with cross-domain datasets, studying how data handling and algorithmic improvements let learned skills transfer across domains and scenes.)
[54] Heecheol Kim, Yoshiyuki Ohmura, and Yasuo Kuniyoshi. Robot peels banana with goal-conditioned dual-action deep imitation learning. arXiv, abs/2203.09749, 2022. (Goal-conditioned dual-action deep imitation learning enables a robot to peel a banana, illustrating imitation learning on a concrete task and its adaptation to new situations.)
[11] Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, and Nicolas Heess. RoboCat: A self-improving foundation agent for robotic manipulation. arXiv preprint arXiv:2306.11706, 2023. (A self-improving foundation agent for manipulation; explores co-training and related techniques to improve performance, particularly on single-arm manipulation.)
[20] Open X-Embodiment Collaboration. Open X-Embodiment: Robotic learning datasets and RT-X models. https://arxiv.org/abs/2310.08864, 2023. (Builds robot learning datasets and RT-X models that aggregate multi-source data, validating co-training gains on tasks including single-arm manipulation.)
[31] Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, and Cewu Lu. RH20T: A comprehensive robotic dataset for learning diverse skills in one-shot. In Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition @ CoRL 2023, 2023. (A comprehensive dataset for one-shot learning of diverse skills, supporting studies of co-training and other learning paradigms, including single-arm manipulation.)
[61] Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Charles Xu, Jianlan Luo, Tobias Kreiman, You Liang Tan, Dorsa Sadigh, Chelsea Finn, and Sergey Levine. Octo: An open-source generalist robot policy. https://octo-models.github.io, 2023. (An open-source generalist robot policy; studies how co-training improves policy performance and adaptability, including on single-arm manipulation.)
[98] Jonathan Heewon Yang, Dorsa Sadigh, and Chelsea Finn. Polybot: Training one policy across robots while embracing variability. In Conference on Robot Learning, pages 2955–2974. PMLR, 2023. (Trains one policy across robots while embracing variability; co-training improves performance on a range of tasks, including single-arm manipulation.)
[79] Dhruv Shah, Ajay Sridhar, Arjun Bhorkar, Noriaki Hirose, and Sergey Levine. GNM: A general navigation model to drive any robot. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7226–7233. IEEE, 2023. (A general navigation model to drive any robot, demonstrating the benefits of co-training for navigation across robots and scenes.)