In summary, our contributions are as follows:
• We introduce any-to-policy models that enable a unified embodied agent to process various combinations of modalities, effectively facilitating instruction and perception of the world.
• We present novel embodied alignment learning techniques designed to seamlessly align instructions and observations, enhancing both the effectiveness and efficiency of policy learning.
• We offer a multi-modal dataset tailored for robotics, encompassing 30 distinct tasks. This dataset covers a wide spectrum of modalities in both instruction and observation.
Experiments: We assembled a comprehensive real-world dataset comprising 30 robotic tasks.
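a real-world setting using our own collected dataset. Simulation Evaluation: Franka Kitchen [92] uses text-image and ManiSkill2.
Conclusion: The framework effectively processes and responds to multi-modal data for robotic tasks. Combined with its multi-modal dataset, the framework represents a significant advance in embodied AI.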