Any2Policy: Learning Visuomotor Policy with Any-Modality (similar in spirit to AnyGPT)
Venue: NeurIPS 2024. Paper link: https://readpaper.com/pdf-annotate/note?pdfId=2598959255168534016&noteId=2598960522854466816
Affiliation: Midea Group
Motivation: Current robotic learning methodologies often focus on single-modal task specification and observation, thereby limiting their ability to process rich multi-modal information. (The paper approaches the problem from a multi-modal angle.)
https://i-blog.csdnimg.cn/direct/20b98eb73f2f48a4abf1f5195a413205.png
The Any2Policy framework is designed to handle multi-modal inputs, accommodating them at the instruction and observation levels, either individually or jointly.
The authors design an embodied alignment module to synchronize features across modalities, and between instructions and observations, ensuring seamless and effective integration of the different input types.
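The post does not spell out the alignment module's internals; below is a minimal sketch of one plausible realization, using bidirectional cross-attention between instruction tokens and observation tokens. All class names, dimensions, and design choices here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an "embodied alignment" module: instruction and
# observation token streams attend to each other so their features are
# synchronized before policy learning. Names and dims are assumptions.
import torch
import torch.nn as nn

class EmbodiedAlignment(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.instr_to_obs = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.obs_to_instr = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_i = nn.LayerNorm(dim)
        self.norm_o = nn.LayerNorm(dim)

    def forward(self, instr: torch.Tensor, obs: torch.Tensor):
        # instr: (B, Li, D) instruction tokens (text / image / audio)
        # obs:   (B, Lo, D) observation tokens (image / point cloud)
        i_aligned, _ = self.instr_to_obs(instr, obs, obs)    # instr queries obs
        o_aligned, _ = self.obs_to_instr(obs, instr, instr)  # obs queries instr
        # Residual + norm keeps each stream's original content while
        # mixing in information from the other stream.
        return self.norm_i(instr + i_aligned), self.norm_o(obs + o_aligned)
```

Cross-attention in both directions lets the instruction pick out task-relevant parts of the observation and vice versa, matching the stated goal of aligning the two levels of input.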
Approach: To address this limitation, the authors propose an end-to-end, general-purpose multi-modal framework called Any-to-Policy Embodied Agents. It enables robots to handle tasks specified and perceived through various modality combinations, such as text-image, audio-image, and text-point cloud.
Implementation: The approach trains a versatile modality network that adapts to the various inputs and connects to a policy network for effective control.
https://i-blog.csdnimg.cn/direct/45d9446672804540a21dfd2c056a6ee6.png
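To make the wiring concrete, here is a minimal sketch of the general shape described above: per-modality encoders project heterogeneous inputs into a shared token space, a fusion network combines whatever subset of modalities is present, and a policy head maps the fused representation to an action. The encoder choices, fusion transformer, and 7-dimensional action head are all assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical sketch of an any-modality-to-policy network. In a real
# system the encoders would be pretrained backbones (e.g., a ViT for
# images, a text LM for language); LazyLinear stands in for them here.
import torch
import torch.nn as nn

class AnyToPolicy(nn.Module):
    def __init__(self, dim: int = 512, action_dim: int = 7):
        super().__init__()
        # One projection per supported modality into a shared space.
        self.encoders = nn.ModuleDict({
            "text":  nn.LazyLinear(dim),
            "image": nn.LazyLinear(dim),
            "audio": nn.LazyLinear(dim),
            "pcd":   nn.LazyLinear(dim),
        })
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.policy_head = nn.Linear(dim, action_dim)  # e.g., 7-DoF arm action

    def forward(self, inputs: dict) -> torch.Tensor:
        # inputs maps modality name -> (B, L, feat) token features; any
        # subset (text+image, audio+image, text+pcd, ...) may be supplied.
        tokens = torch.cat(
            [self.encoders[m](x) for m, x in inputs.items()], dim=1)
        fused = self.fuse(tokens)
        return self.policy_head(fused.mean(dim=1))  # pool tokens -> action
```

Because every modality lands in the same token space, supporting a new input type only requires registering one more encoder; the fusion and policy stages are untouched.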
In summary, our contributions are as follows:
• We introduce any-to-policy models that enable a unified embodied agent to process various combinations of modalities, effectively facilitating instruction and perception of the world.
• We present novel embodied alignment learning techniques designed to seamlessly align instructions and observations, enhancing both the effectiveness and efficiency of policy learning.
• We offer a multi-modal dataset tailored for robotics, encompassing 30 distinct tasks. This dataset covers a wide spectrum of modalities in both instruction and observation.
Experiments: The authors assembled a comprehensive real-world dataset comprising 30 robotic tasks and evaluate in a real-world setting using this self-collected dataset.
Simulation evaluation: Franka Kitchen [92] (with text-image inputs) and ManiSkill2.
Conclusion: The framework effectively processes and responds to multi-modal data for robotic tasks. Together with the accompanying multi-modal dataset, it represents a significant advance in embodied AI.