Google's SIMA 2 agent, powered by Gemini, can reason and act in virtual environments.

Content summary:
Google DeepMind recently published a research preview of SIMA 2, the next generation of its generalist AI agent. By integrating the language and reasoning capabilities of Gemini, Google's large language model, the agent moves beyond mechanically following instructions to actively perceiving, understanding, and interacting with its environment.
Like AlphaFold and other landmark DeepMind projects before it, the SIMA line was trained on hundreds of hours of video game footage to learn human-like interaction with 3D environments. The first version, unveiled in March 2024, could follow basic instructions across games but completed complex tasks at only a 31% success rate, compared with 71% for humans. SIMA 2, powered by the Gemini 2.5 Flash-Lite model, doubles that performance: it can adapt to previously unseen environments and keeps improving through its own accumulated experience.
"This is an important step toward artificial general intelligence and practical robotics," DeepMind senior research scientist Joe Marino stressed at the press briefing. In demos, SIMA 2 accurately described the rocky terrain of a planet in No Man's Sky and operated a distress beacon, and it understood the abstract instruction "walk to the house that's the color of a ripe tomato" by reasoning internally that a ripe tomato is red and acting on that. It can even parse emoji instructions: given "🪓🌲", it goes and chops down a tree.
The research team highlights that SIMA 2's breakthrough is "embodied intelligence": perceiving the environment and acting through a virtual body, which sets it apart from conventional AI that only handles calendars or code. Jane Wang, a DeepMind research scientist with a background in neuroscience, explained: "It has to genuinely understand the scene and the user's intent and respond in a common-sense way, and building that capability is extremely challenging."
The system's self-improvement mechanism is especially notable: a second Gemini model generates new tasks and a reward model scores the agent's attempts, so the agent can keep refining its behavior through trial-and-error learning driven by AI feedback rather than human data. Frederic Besse, senior staff research engineer at DeepMind, added that this kind of high-level reasoning is exactly what a future home robot will need to understand complex instructions such as "check how many cans of beans are in the cupboard."
Although the team has not given a timeline for deploying SIMA 2 on physical robot systems, it says the work demonstrates a viable path toward general-purpose robots. The research preview is meant to show the wider research community what is possible, invite potential collaborations, and push AI interaction with 3D environments forward.
Translation:
Google DeepMind on Thursday shared a research preview of SIMA 2, the second generation of its generalist AI agent. The new agent integrates the language and reasoning capabilities of Gemini, Google's large language model, moving beyond simply executing instructions to understanding and actively interacting with its environment.
Like many DeepMind projects, including AlphaFold, the first version of SIMA was trained on hundreds of hours of video game data to learn to play multiple 3D games the way a human would, even handling games it was never specifically trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but its success rate on complex tasks was only 31%, far below the 71% achieved by humans.
"SIMA 2 is a step change in capabilities over SIMA 1," Joe Marino, senior research scientist at DeepMind, said at the press briefing. "It is a more general agent that can complete complex tasks in previously unseen environments, and, more importantly, it is self-improving: it keeps getting better from its own experience, which is a key step toward general-purpose robots and AGI systems."
SIMA 2 is powered by the Gemini 2.5 Flash-Lite model. AGI (artificial general intelligence) is defined by DeepMind as a system capable of a wide range of intellectual tasks that can learn new skills and generalize knowledge across domains.
DeepMind researchers argue that building "embodied agents" is crucial to general intelligence. Marino explained that an embodied agent interacts with a virtual or physical world through a body, observing inputs and taking actions much as a robot or a human would, whereas a non-embodied agent might only manage a calendar, take notes, or execute code.
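To make that distinction concrete, here is a minimal sketch of the observe-act loop an embodied agent runs. The `Environment`, `Policy`, and `Observation` interfaces are hypothetical stand-ins for illustration, not DeepMind's actual API.

```python
# Minimal sketch of an embodied agent's observe-act loop, as described above.
# `Environment`, `Policy`, and `Observation` are hypothetical stand-ins for
# illustration; they are not DeepMind's actual interfaces.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Observation:
    frame: bytes          # rendered game frame (screen pixels)
    instruction: str      # natural-language instruction from the user


class Environment(Protocol):
    def observe(self) -> Observation: ...
    def act(self, action: str) -> bool: ...  # returns True when the episode ends


class Policy(Protocol):
    def decide(self, obs: Observation) -> str: ...


def run_episode(env: Environment, policy: Policy, max_steps: int = 1000) -> None:
    """Embodied loop: perceive the world through a (virtual) body, then act on it."""
    for _ in range(max_steps):
        obs = env.observe()            # perception: what the agent "sees"
        action = policy.decide(obs)    # decision: a keyboard/mouse-style action
        done = env.act(action)         # actuation: the body changes the world
        if done:
            break
```

A non-embodied assistant, by contrast, would skip this loop entirely and operate only on text, calendars, or code.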
Jane Wang, a DeepMind research scientist with a background in neuroscience, told TechCrunch that SIMA 2's significance goes far beyond games: "We're asking it to actually understand what's happening in the scene and what the user wants it to do, and then respond in a common-sense way. That is actually quite difficult."
By integrating Gemini, SIMA 2 doubles its predecessor's performance, combining advanced language reasoning with the embodied interaction skills gained in training. In No Man's Sky, Marino demonstrated the agent describing its surroundings on a rocky planet surface and deciding its next steps by recognizing and interacting with a distress beacon. SIMA 2 also uses Gemini to reason internally: asked to walk to "the house the color of a ripe tomato," it worked through the chain "a ripe tomato is red, so I should look for the red house" and then carried it out precisely.
Being Gemini-powered also lets SIMA 2 understand emoji instructions. "You instruct it 🪓🌲, and it'll go chop down a tree," Marino said. In newly generated photorealistic worlds produced by Genie, DeepMind's world model, the agent could also correctly identify and interact with objects such as benches, trees, and butterflies.
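As a toy illustration of this kind of instruction grounding, the sketch below turns an abstract instruction (a ripe-tomato color reference or an emoji string) into a concrete goal before acting. The `llm_reason` stub stands in for a call to a language model such as Gemini; the canned answers and goal vocabulary are invented for this example.

```python
# Toy illustration of grounding an abstract instruction into a concrete goal
# before acting. `llm_reason` is a placeholder for a language-model call; the
# canned responses and goal strings are invented for this sketch.
def llm_reason(instruction: str) -> str:
    """Stand-in for the internal reasoning step; returns a canonical goal string."""
    canned = {
        "walk to the house that's the color of a ripe tomato": "go_to(red_house)",
        "🪓🌲": "chop(tree)",
    }
    return canned.get(instruction, "explore()")


def ground_instruction(instruction: str) -> str:
    # Translate the user's phrasing (or emojis) into an executable goal.
    return llm_reason(instruction)


if __name__ == "__main__":
    for text in ["walk to the house that's the color of a ripe tomato", "🪓🌲"]:
        print(f"{text!r} -> {ground_instruction(text)}")
```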
Gemini also lets the system improve itself without much human data, Marino added. Where SIMA 1 was trained entirely on human gameplay, SIMA 2 uses that data only as a baseline for a strong initial model. In a new environment, another Gemini model generates new tasks and a separate reward model scores the agent's attempts. Using these self-generated experiences as training data, the agent learns from its own mistakes the way a human would, gradually acquiring new behaviors through trial and error guided by AI feedback rather than human supervision.
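The loop described in that paragraph can be sketched schematically as follows, assuming the task setter, the agent, the reward model, and the training step can each be treated as an opaque callable. None of these names come from DeepMind; they only mirror the roles described above.

```python
# Schematic sketch of the self-improvement loop described above: a task-setter
# model proposes tasks, the agent attempts them, a separate reward model scores
# each attempt, and the scored trajectories are fed back as training data.
# Every interface here is a hypothetical stand-in, not DeepMind's actual system.
from typing import Callable, List, Tuple

Trajectory = List[Tuple[str, str]]  # (observation, action) pairs


def self_improve(
    propose_task: Callable[[], str],            # e.g. a Gemini model writing new tasks
    attempt: Callable[[str], Trajectory],       # the agent acting in the environment
    score: Callable[[str, Trajectory], float],  # a separate reward model judging the attempt
    update: Callable[[List[Tuple[Trajectory, float]]], None],  # training step
    iterations: int = 100,
    batch_size: int = 16,
) -> None:
    """Trial-and-error learning driven by AI feedback rather than human demonstrations."""
    for _ in range(iterations):
        batch: List[Tuple[Trajectory, float]] = []
        for _ in range(batch_size):
            task = propose_task()          # new task in a new environment
            traj = attempt(task)           # agent tries the task
            reward = score(task, traj)     # AI feedback instead of human labels
            batch.append((traj, reward))
        update(batch)                      # learn from self-generated experience
```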
DeepMind sees SIMA 2 as a stepping stone toward general-purpose robots. "If we think about what a system needs in order to perform tasks in the real world, like a robot, there are two components," Frederic Besse, senior staff research engineer at DeepMind, said at the press briefing: a high-level understanding of the real world and of what needs to be done, plus the ability to reason and plan. If you ask a humanoid robot in your home to check how many cans of beans are in the cupboard, the system has to understand all of the relevant concepts (what beans are, what a cupboard is) and navigate to the right place. Besse noted that SIMA 2 is aimed at that high-level behavior rather than low-level actions such as controlling joints.
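The split Besse describes can be pictured as a high-level plan whose individual steps a separate low-level controller would execute. The sketch below is purely illustrative: the `HighLevelStep` type and the step names are invented for this example, not part of SIMA 2.

```python
# Illustrative sketch of the split described above: high-level understanding and
# planning ("check how many cans of beans are in the cupboard") versus low-level
# control of joints and wheels, which SIMA 2 does not address. The plan steps
# and the `HighLevelStep` type are invented for this example.
from dataclasses import dataclass
from typing import List


@dataclass
class HighLevelStep:
    action: str    # abstract action, not motor commands
    target: str    # grounded concept the step refers to


def plan_check_cupboard() -> List[HighLevelStep]:
    """High-level plan only; executing each step is left to a low-level controller."""
    return [
        HighLevelStep("navigate_to", "kitchen"),
        HighLevelStep("navigate_to", "cupboard"),
        HighLevelStep("open", "cupboard"),
        HighLevelStep("count", "cans_of_beans"),
        HighLevelStep("report", "count"),
    ]


if __name__ == "__main__":
    for step in plan_check_cupboard():
        print(f"{step.action} -> {step.target}")
```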
The team declined to give a timeline for applying SIMA 2 to physical robotics systems. Besse told TechCrunch that DeepMind's recently unveiled robotics foundation models, which can also reason about the physical world and plan multi-step tasks, were trained separately and differently from SIMA.
Although there is no timeline for releasing anything beyond a preview of SIMA 2, Jane Wang said the goal of showing this work now is to reveal its potential to the field and to explore what collaborations and applications might be possible.
English source:
Google DeepMind shared on Thursday a research preview of SIMA 2, the next generation of its generalist AI agent that integrates the language and reasoning powers of Gemini, Google’s large language model, to move beyond simply following instructions to understanding and interacting with its environment.
Like many of DeepMind’s projects, including AlphaFold, the first version of SIMA was trained on hundreds of hours of video game data to learn how to play multiple 3D games like a human, even some games it wasn’t trained on. SIMA 1, unveiled in March 2024, could follow basic instructions across a wide range of virtual environments, but it only had a 31% success rate for completing complex tasks, compared to 71% for humans.
“SIMA 2 is a step change and improvement in capabilities over SIMA 1,” Joe Marino, senior research scientist at DeepMind, said in a press briefing. “It’s a more general agent. It can complete complex tasks in previously unseen environments. And it’s a self-improving agent. So it can actually self-improve based on its own experience, which is a step towards more general-purpose robots and AGI systems more generally.”
SIMA 2 is powered by the Gemini 2.5 Flash-Lite model, and AGI refers to artificial general intelligence, which DeepMind defines as a system capable of a wide range of intellectual tasks with the ability to learn new skills and generalize knowledge across different areas.
Working with so-called “embodied agents” is crucial to generalized intelligence, DeepMind’s researchers say. Marino explained that an embodied agent interacts with a physical or virtual world via a body – observing inputs and taking actions much like a robot or human would – whereas a non-embodied agent might interact with your calendar, take notes, or execute code.
Jane Wang, a research scientist at DeepMind with a background in neuroscience, told TechCrunch that SIMA 2 goes far beyond gameplay.
“We’re asking it to actually understand what’s happening, understand what the user is asking it to do, and then be able to respond in a common-sense way that’s actually quite difficult,” Wang said.
By integrating Gemini, SIMA 2 doubled its predecessor’s performance, uniting Gemini’s advanced language and reasoning abilities with the embodied skills developed through training.
Marino demoed SIMA 2 in No Man’s Sky, where the agent described its surroundings – a rocky planet surface – and determined its next steps by recognizing and interacting with a distress beacon. SIMA 2 also uses Gemini to reason internally. In another game, when asked to walk to the house that’s the color of a ripe tomato, the agent showed its thinking – ripe tomatoes are red, therefore I should go to the red house – then found and approached it.
Being Gemini-powered also means SIMA 2 follows instructions based on emojis: “You instruct it 🪓🌲, and it’ll go chop down a tree,” Marino said.
Marino also demonstrated how SIMA 2 can navigate newly generated photorealistic worlds produced by Genie, DeepMind’s world model, correctly identifying and interacting with objects like benches, trees, and butterflies.
Gemini also enables self-improvement without much human data, Marino added. Where SIMA 1 was trained entirely on human gameplay, SIMA 2 uses it as a baseline to provide a strong initial model. When the team puts the agent into a new environment, it asks another Gemini model to create new tasks and a separate reward model to score the agent’s attempts. Using these self-generated experiences as training data, the agent learns from its own mistakes and gradually performs better, essentially teaching itself new behaviors through trial and error as a human would, guided by AI-based feedback instead of humans.
DeepMind sees SIMA 2 as a step toward unlocking more general-purpose robots.
“If we think of what a system needs to do to perform tasks in the real world, like a robot, I think there are two components of it,” Frederic Besse, senior staff research engineer at DeepMind, said during a press briefing. “First, there is a high-level understanding of the real world and what needs to be done, as well as some reasoning.”
If you ask a humanoid robot in your house to go check how many cans of beans you have in the cupboard, the system needs to understand all of the different concepts – what beans are, what a cupboard is – and navigate to that location. Besse says SIMA 2 touches more on that high-level behavior than it does on lower-level actions, which he refers to as controlling things like physical joints and wheels.
The team declined to share a specific timeline for implementing SIMA 2 in physical robotics systems. Besse told TechCrunch that DeepMind’s recently unveiled robotics foundation models – which can also reason about the physical world and create multi-step plans to complete a mission – were trained differently and separately from SIMA.
While there’s also no timeline for releasing more than a preview of SIMA 2, Wang told TechCrunch the goal is to show the world what DeepMind has been working on and see what kinds of collaborations and potential uses are possible.
Article title: Google's SIMA 2 agent, powered by Gemini, can reason and act in virtual environments.
Article link: https://www.qimuai.cn/?post=2046
All articles on this site are original; please do not use them for any commercial purpose without authorization.