How AI agents can redefine universal design to increase accessibility

Source: https://research.google/blog/how-ai-agents-can-redefine-universal-design-to-increase-accessibility/
Summary:
Google introduces the "Natively Adaptive Interfaces" framework: AI agents may reshape the paradigm of accessible design
On February 5, 2026, Marian Croak, VP of Engineering at Google Research, and Sam Sepah, Lead AI Accessibility Program Manager, published an article describing the team's latest work on AI-powered accessibility. They argue that, with the introduction of multimodal AI agents, human-computer interfaces are evolving from static, one-size-fits-all designs toward dynamic adaptation, with the potential to bring a more inclusive digital experience to the roughly 1.3 billion people with disabilities worldwide.
Core principle: co-designing with the community, "Nothing About Us Without Us"
The Google team emphasizes that its development work follows the principle of "Nothing About Us Without Us," bringing people with disabilities into the full product development cycle as co-designers. This not only ensures that solutions answer real needs, it also creates economic empowerment and employment opportunities for the disability community. Google is working with the Rochester Institute of Technology's National Technical Institute for the Deaf (RIT/NTID), The Arc of the United States, and other organizations to build adaptive AI tools that address real-world friction points.
Technical framework: from bolt-on accommodations to natively adaptive interfaces
Traditional digital products often add an accessibility layer only after a feature ships, leaving an "accessibility gap." To close it, Google proposes the Natively Adaptive Interfaces framework, whose core idea is a system of multimodal AI agents that lets the interface itself perceive, understand, and respond to each user's individual needs in real time.
- Orchestration hub: In a web-reading prototype, a central "Orchestrator" acts as a strategic reading manager. It understands the document on its own and delegates concrete tasks to expert sub-agents, so users no longer have to hunt through complex menus for the right function.
- Multimodal situational awareness: Building on the Gemini models' ability to process voice, vision, and text, Google has built prototypes that turn live video into interactive audio descriptions. Users can actively ask about details of their surroundings, turning passive reception of information into conversational exploration and noticeably reducing cognitive load.
Validated prototypes: AI as a real-time collaborator
Several application prototypes already demonstrate the framework's potential:
- StreetReaderAI: a virtual guide for blind and low-vision users. An AI Describer continuously analyzes visual and geographic data, and an AI Chat answers specific questions. For example, after walking past a landmark a user can ask, "Where was that bus stop?", and the system locates it from earlier visual frames: "The bus stop is behind you, approximately 12 meters away."
- Multimodal Agent Video Player: upgrades traditional static audio descriptions into an interactive dialogue. Users can adjust the level of descriptive detail in real time or pause at any point to ask questions such as "What is the character wearing?" The system combines a pre-generated dense index of visual descriptions with retrieval-augmented generation to respond quickly and accurately.
- Grammar Laboratory: a bilingual AI learning platform built by RIT's National Technical Institute for the Deaf that delivers grammar instruction in multiple formats, including American Sign Language video explanations, English captions, spoken narration, and written transcripts. The AI personalizes learning content and pathways based on each user's interactions, adapting to different language preferences and strengths.
The curb-cut effect: inclusive technology benefits a much broader audience
The article notes that features built on this framework often show a strong "curb-cut effect": technology designed for specific needs ends up benefiting everyone. Voice interfaces built for blind users also help sighted users multitask; information-synthesis tools designed for people with learning disabilities help busy professionals process information quickly; and AI tutors built for deaf and hard-of-hearing students can give all students personalized learning paths.
Outlook: human-AI collaboration opens a "golden age" of accessibility
The Google team believes we are entering a "golden age" of AI for accessibility. The adaptive capabilities of multimodal AI allow interfaces to respond in real time to the diversity of human abilities. This is about more than using a tool; it is a process of co-creation with the community. By building technology together with the disability community, Google hopes to ignite a virtuous cycle that keeps expanding the boundary of what technology can do, so that everyone can participate fully in the digital world.
(This summary is based on a technology outlook article published by Google Research on February 5, 2026.)
English source:
How AI agents can redefine universal design to increase accessibility
February 5, 2026
Marian Croak, VP Engineering, and Sam Sepah, Lead AI Accessibility PgM, Google Research
Google Research's Natively Adaptive Interfaces (NAI) redefine universal design by embedding multimodal AI tools that adapt to the user's unique needs, co-developed with the accessibility community.
At Google, we believe in building for everyone, and accessibility (A11y) is a key part of that. Our teams work with communities to build products with and for people with disabilities, incorporating accessibility from the beginning of the development process. Today, generative AI provides us with the opportunity to make our tools even more personal and adaptive.
People with disabilities make up 16% of the world’s population. With the adaptive capabilities of generative AI, we have an opportunity to better serve 1.3 billion people globally by adopting a "Nothing About Us Without Us" approach to our tech development. We believe technology should be as unique as the person using it. We’re creating a world where every interface shapes itself to your preferences, working in harmony with you, exactly as you are.
In this blog, we are proud to introduce Natively Adaptive Interfaces (NAI), a framework for creating more accessible applications through multimodal AI tools. With NAI, UI design can move beyond one-size-fits-all towards context-informed decisions. NAI replaces static navigation with dynamic, agent-driven modules, transforming digital architecture from a passive tool into an active collaborator.
Following rigorous prototyping to validate this framework, we have an emerging path toward universal design. Our goal is to create environments that are more inherently accessible to people with disabilities.
Community investments: Nothing About Us, Without Us
Building on the long-standing advocacy principle of "Nothing About Us, Without Us," we continue to integrate community-led co-design into our own development lifecycles.
By working with individuals from disability communities and engaging them as co-designers from the start, we can ensure their lived experiences and expertise are at the heart of the solutions being built. With support from Google.org, organizations like the Rochester Institute of Technology’s National Technical Institute for the Deaf (RIT/NTID), The Arc of the United States, RNID, and Team Gleason are building adaptive AI tools that solve real-world friction points for their communities. These organizations recognize the transformative potential for impact of AI tools that are natively fluent in the diverse ways humanity communicates.
Furthermore, this co-design approach drives economic empowerment and fosters employment opportunities within the disability community, ensuring that the people informing the technology are also rewarded for its success.
Our research direction: Designing for accessibility
In our early research, we found that a significant barrier to digital equity is the "accessibility gap", i.e., the delay between the release of a new feature and the creation of an assistive layer for it. To close this gap, we are shifting from reactive tools to agentic systems that are native to the interface.
Research pillar: Using multi-system agents to improve accessibility
Multimodal AI tools provide one of the most promising paths to building accessible interfaces. In specific prototypes, such as our work with web readability, we’ve tested a model where a central Orchestrator acts as a strategic reading manager.
Instead of a user navigating a complex maze of menus, the Orchestrator maintains shared context — understanding the document and making it more accessible by delegating the tasks to expert sub-agents.
- The Summarization Agent: Masters complex documents by breaking down information and delegating key tasks to expert sub-agents, making even the deepest insights clear and accessible.
- The Settings Agent: Handles UI adjustments, such as scaling text, dynamically.
In testing this modular approach, our research shows that users can interact with systems more intuitively, ensuring that specialized tasks are always handled by the right expert without the user needing to hunt for the "correct" button.
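To make this orchestration pattern concrete, here is a minimal Python sketch, assuming a keyword-based intent classifier as a stand-in for model-driven routing; the class names and the routing rule are illustrative and are not the actual NAI implementation.

```python
# Minimal sketch of an orchestrator delegating to expert sub-agents.
# SummarizationAgent, SettingsAgent and the keyword routing are
# hypothetical illustrations, not the actual NAI implementation.

class SummarizationAgent:
    def handle(self, request: str, document: str) -> str:
        # A real agent would call a multimodal model; we truncate instead.
        return f"Summary ({len(document)} chars): {document[:80]}..."


class SettingsAgent:
    def handle(self, request: str, document: str) -> str:
        # A real agent would adjust the live UI, e.g., scale the text.
        return "Applied setting: text size increased to 150%."


class Orchestrator:
    """Routes each user request to the sub-agent best suited to it."""

    def __init__(self):
        self.experts = {"summarize": SummarizationAgent(), "settings": SettingsAgent()}

    def classify_intent(self, request: str) -> str:
        # Stand-in for an LLM-based intent classifier.
        wants_settings = any(w in request.lower() for w in ("bigger", "font", "text size"))
        return "settings" if wants_settings else "summarize"

    def handle(self, request: str, document: str) -> str:
        return self.experts[self.classify_intent(request)].handle(request, document)


if __name__ == "__main__":
    orchestrator = Orchestrator()
    doc = "Natively Adaptive Interfaces replace static navigation with agent-driven modules."
    print(orchestrator.handle("Give me the key points of this page", doc))
    print(orchestrator.handle("Please make the text bigger", doc))
```

The point of the sketch is the shape of the interaction: the user states a goal once, and the orchestrator, not the user, decides which expert handles it.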
Toward multimodal fluency
Our research also focuses on moving beyond basic text-to-speech toward multimodal fluency. By leveraging Gemini’s ability to process voice, vision, and text simultaneously, we’ve built prototypes that can turn live video into immediate, interactive audio descriptions.
This isn't just about describing a scene; it’s about situational awareness. In our co-design sessions, we’ve observed how allowing users to interactively query their environment — asking for specific visual details as they happen — can reduce cognitive load and transform a passive experience into an active, conversational exploration.
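As a rough illustration of that interaction pattern, the sketch below keeps a short rolling window of camera frames and sends them, together with the user's question, to a multimodal model. The multimodal_model and text_to_speech functions are placeholders for the Gemini and speech components used in the prototypes, not their real APIs.

```python
# Sketch of interactive audio description over live video. The functions
# multimodal_model() and text_to_speech() are placeholders for the Gemini
# and speech components used in the prototypes.
import time
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float
    jpeg_bytes: bytes


def multimodal_model(frames: list, question: str) -> str:
    """Placeholder for a model that accepts recent frames plus a text question."""
    return f"(answer to {question!r} based on {len(frames)} recent frames)"


def text_to_speech(text: str) -> None:
    """Placeholder for a TTS engine; here we just print the spoken output."""
    print("SPOKEN:", text)


class LiveDescriber:
    """Keeps a short rolling window of frames and answers questions about it."""

    def __init__(self, window_seconds: float = 10.0):
        self.window_seconds = window_seconds
        self.frames = []

    def add_frame(self, jpeg_bytes: bytes) -> None:
        now = time.time()
        self.frames.append(Frame(now, jpeg_bytes))
        # Keep only frames inside the rolling window.
        self.frames = [f for f in self.frames if now - f.timestamp <= self.window_seconds]

    def ask(self, question: str) -> None:
        text_to_speech(multimodal_model(self.frames, question))


describer = LiveDescriber()
describer.add_frame(b"\xff\xd8...")  # a captured camera frame (JPEG bytes)
describer.ask("Is there a crosswalk signal in front of me?")
```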
Proven prototypes: The "vertex" of human interaction
We validated this architecture through rigorous prototyping, aiming to solve complex interaction challenges and identify opportunities for improvement. In these "vertex" moments, our research showed that multimodal AI tools could accurately interpret and respond to the nuanced, specific needs of users.
- StreetReaderAI: A virtual guide for blind and low-vision (BLV) users. Navigating physical spaces can be a significant barrier to social participation, and StreetReaderAI addresses this by employing two interactive AI subsystems: an AI Describer that constantly analyzes visual and geographic data, and an AI Chat that answers specific questions. Because the system maintains context, a user can walk past a landmark and later ask, "Wait, where was that bus stop?" The agent recalls the previous visual frame and provides precise guidance: "The bus stop is behind you, approximately 12 meters away." (A minimal sketch of this kind of context memory appears after this list.)
- Multimodal Agent Video Player (MAVP): With passive listening, standard Audio Descriptions (AD) provide a narrated track of visual elements, but they are often static. The MAVP prototype transforms video into an interactive, user-led dialogue. Built with Gemini models, MAVP allows users to verbally adjust descriptive detail in real time or pause to ask questions like, "What is the character wearing?" The system uses a two-stage pipeline: it first generates a "dense index" of visual descriptions offline, then uses retrieval-augmented generation (RAG) to provide fast, high-accuracy responses during playback. (The two-stage pipeline is sketched after this list.)
- Grammar Laboratory: RIT/NTID, with support from Google.org, is building Grammar Laboratory, a bilingual (American Sign Language and English) AI-powered learning platform that provides tutoring and feedback on students' English writing. It offers grammar instruction through multiple accessible formats, including: video explanations of English grammar rules delivered in ASL, captions in written English, spoken English narration, and written transcripts. Students interact with an adaptive AI tool that creates bespoke content and customizes their learning experience based on their interactions, ensuring that users can engage with the content in the format that best suits their language preferences and strengths. Grammar Laboratory was recently featured in a film produced for us by BBC StoryWorks Commercial Productions. (A small sketch of this kind of format adaptation also follows the list.)
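Returning to the StreetReaderAI example above, the sketch below shows one way an agent could keep a log of described landmarks and answer a later question about one of them. The landmark log, the substring matching, and the hard-coded direction are illustrative assumptions rather than the prototype's actual design.

```python
# Sketch of the context memory behind a question like "where was that bus
# stop?". The landmark log, substring matching, and hard-coded "behind you"
# direction are illustrative; a real system would use heading and mapping data.
import math


class RouteMemory:
    def __init__(self):
        self.history = []  # (label, x_meters, y_meters) in a local reference frame

    def log(self, label: str, x: float, y: float) -> None:
        self.history.append((label, x, y))

    def locate(self, query: str, user_x: float, user_y: float) -> str:
        # Stand-in for model-based matching of the question against past frames.
        matches = [h for h in self.history if query.lower() in h[0].lower()]
        if not matches:
            return "I have not seen that landmark on this walk."
        label, x, y = matches[-1]
        distance = math.hypot(x - user_x, y - user_y)
        return f"The {label} is behind you, approximately {distance:.0f} meters away."


memory = RouteMemory()
memory.log("bus stop", x=0.0, y=0.0)       # described while walking past it
memory.log("cafe entrance", x=5.0, y=0.0)
print(memory.locate("bus stop", user_x=12.0, user_y=0.0))
# -> The bus stop is behind you, approximately 12 meters away.
```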
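For the MAVP pipeline described above, the next sketch shows the two stages in miniature: an offline "dense index" of timestamped descriptions and a retrieval step at playback time. Keyword overlap stands in for a real embedding search, and answer_with_context stands in for the generation step of RAG.

```python
# Sketch of MAVP's two-stage idea: an offline "dense index" of timestamped
# descriptions, then retrieval at playback time. Keyword overlap replaces a
# real embedding search; answer_with_context() replaces the generation step.

# Stage 1 (offline): one visual description per segment of the video.
dense_index = [
    (0.0,  "A woman in a red raincoat waits at a train platform."),
    (12.5, "She boards the train and sits by the window."),
    (40.0, "A man in a grey suit sits down across from her."),
]


def retrieve(question: str, playback_time: float, k: int = 2):
    """Return the k indexed descriptions most relevant to the question,
    restricted to scenes the viewer has already seen."""
    seen = [(t, d) for t, d in dense_index if t <= playback_time]
    q_words = set(question.lower().split())
    scored = sorted(seen, key=lambda td: len(q_words & set(td[1].lower().split())), reverse=True)
    return [d for _, d in scored[:k]]


def answer_with_context(question: str, context: list) -> str:
    """Placeholder for the generation step of retrieval-augmented generation."""
    return f"(answer to {question!r} grounded in: {context})"


# Stage 2 (during playback): the user pauses at t = 45 s and asks a question.
question = "What is the woman wearing?"
print(answer_with_context(question, retrieve(question, playback_time=45.0)))
```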
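Finally, as a loose sketch of the adaptive side of Grammar Laboratory, the code below weights the four delivery formats named in the article by how long a learner engages with each, and leads the next lesson with the preferred one. The engagement-based weighting rule is an assumption made for illustration.

```python
# Sketch of adapting lesson delivery to a learner's observed preferences.
# The four formats come from the article; the engagement-weighting rule
# is an assumption made for illustration.
from collections import defaultdict

FORMATS = ["asl_video", "english_captions", "spoken_narration", "written_transcript"]


class FormatPreferences:
    def __init__(self):
        self.engagement = defaultdict(float)  # seconds spent in each format

    def record(self, fmt: str, seconds: float) -> None:
        self.engagement[fmt] += seconds

    def primary_format(self) -> str:
        # Lead with ASL video until there is enough signal to adapt.
        if not self.engagement:
            return "asl_video"
        return max(FORMATS, key=lambda f: self.engagement[f])


prefs = FormatPreferences()
prefs.record("asl_video", 180.0)
prefs.record("english_captions", 45.0)
print("Next lesson leads with:", prefs.primary_format())  # -> asl_video
```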
The curb-cut effect
Applications utilizing the NAI framework often experience a strong "curb-cut effect": the phenomenon wherein features designed for extreme constraints benefit a much broader group. Just as sidewalk ramps were originally designed for wheelchair users but improved life for parents with strollers and travelers with luggage, AI tools built with the NAI framework create superior experiences for many. For example:
- Universal utility: Voice interfaces built for blind users can be incredibly useful for sighted users who are multitasking.
- Synthesis tools: Tools designed to support those with learning disabilities can help busy professionals parse information more quickly.
- Personalized learning: AI-powered tutors built for deaf and hard of hearing users can create custom learning journeys for all students.
Conclusion: The golden age of access
We are entering a "golden age" of what is possible with AI for accessibility. With the adaptive power of multimodal AI, we have the opportunity to build user interfaces that adjust in real-time to the vast variety of human ability.
This era is about more than just using a device; it is about working directly with the communities who use these technologies. By building technology with and for the disability community, we can ignite a cycle of helpfulness that expands the horizon of what is possible by creating it.
Acknowledgements
Our work is made possible through the generous support of Google.org, whose commitment to our vision has been transformative. We are honored to work alongside dedicated teams from Google Research AI, Product For All (P4A), BBCWorks, Rochester Institute of Technology’s National Technical Institute for the Deaf (RIT/NTID), The Arc of the United States, RNID, and Team Gleason.