Coral NPU: A full-stack platform for Edge AI
Source: https://research.google/blog/coral-npu-a-full-stack-platform-for-edge-ai/
Summary:
Google releases Coral NPU, an open-source edge AI platform, to push intelligence onto end devices
[October 15, 2025] Billy Rutledge, Engineering Director at Google Research, today announced Coral NPU, a full-stack, open-source edge AI platform. The platform targets the three core challenges that keep low-power edge devices from running capable, always-on AI: the performance gap, ecosystem fragmentation, and privacy and security.
As generative AI reshapes the technology landscape, the industry's focus is shifting from intelligence in the cloud to intelligence on the device. Truly personal assistive AI, such as real-time translation and ambient sensing, must get past the compute limits of battery-constrained hardware. Coral NPU uses an AI-first architecture co-designed with Google DeepMind that inverts traditional chip design, placing the matrix compute engine above scalar compute to support all-day AI on wearables, AR glasses, and other edge devices.
The architecture is built on the open RISC-V instruction set and comprises three components: a scalar core, a vector execution unit, and a matrix execution unit. The base design delivers 512 billion operations per second while consuming only a few milliwatts. A unified toolchain supports mainstream ML frameworks such as TensorFlow and PyTorch, substantially lowering the barrier to development.
For privacy, the platform is integrating emerging security technologies such as CHERI, using hardware-level memory isolation to build trusted data sandboxes. Google has also announced a strategic partnership with IoT compute leader Synaptics, whose new Astra SL2610 line of IoT processors is the first to ship the Coral NPU architecture.
Industry observers note that the open platform should help resolve the long-standing fragmentation of the edge AI ecosystem, giving consumer and industrial IoT devices a standardized intelligence foundation and accelerating applications such as all-day ambient sensing, real-time audio processing, and low-power visual recognition.
Full text:
Coral NPU: A full-stack platform for Edge AI
October 15, 2025
Billy Rutledge, Engineering Director, Google Research
Introducing Coral NPU, a full-stack, open-source platform designed to address the core performance, fragmentation, and privacy challenges that limit powerful, always-on AI on low-power edge devices and wearables.
Generative AI has fundamentally reshaped our expectations of technology. We've seen the power of large-scale cloud-based models to create, reason and assist in incredible ways. However, the next great technological leap isn't just about making cloud models bigger; it's about embedding their intelligence directly into our immediate, personal environment. For AI to be truly assistive — proactively helping us navigate our day, translating conversations in real-time, or understanding our physical context — it must run on the devices we wear and carry. This presents a core challenge: embedding ambient AI onto battery-constrained edge devices, freeing them from the cloud to enable truly private, all-day assistive experiences.
To move from the cloud to personal devices, we must solve three critical problems:
- The performance gap: Complex, state-of-the-art machine learning (ML) models demand more compute, far exceeding the limited power, thermal, and memory budgets of an edge device.
- The fragmentation tax: Compiling and optimizing ML models for a diverse landscape of proprietary processors is difficult and costly, hindering consistent performance across devices.
- The user trust deficit: To be truly helpful, personal AI must prioritize the privacy and security of personal data and context.
Today we introduce Coral NPU, a full-stack platform that builds on our original work from Coral to provide hardware designers and ML developers with the tools needed to build the next generation of private, efficient edge AI devices. Co-designed in partnership with Google Research and Google DeepMind, Coral NPU is an AI-first hardware architecture built to enable the next generation of ultra-low-power, always-on edge AI. It offers a unified developer experience, making it easier to deploy applications like ambient sensing. It's specifically designed to enable all-day AI on wearable devices while minimizing battery usage and being configurable for higher performance use cases. We’ve released our documentation and tools so that developers and designers can start building today.
Coral NPU: An AI-first architecture
Developers building for low-power edge devices face a fundamental trade-off, choosing between general purpose CPUs and specialized accelerators. General-purpose CPUs offer crucial flexibility and broad software support but lack the domain-specific architecture for demanding ML workloads, making them less performant and power-inefficient. Conversely, specialized accelerators provide high ML efficiency but are inflexible, difficult to program, and ill-suited for general tasks.
This hardware problem is magnified by a highly fragmented software ecosystem. With starkly different programming models for CPUs and ML blocks, developers are often forced to use proprietary compilers and complex command buffers. This creates a steep learning curve and makes it difficult to combine the unique strengths of different compute units. Consequently, the industry lacks a mature, low-power architecture that can easily and effectively support multiple ML development frameworks.
The Coral NPU architecture directly addresses this by reversing traditional chip design. It prioritizes the ML matrix engine over scalar compute, optimizing architecture for AI from silicon up and creating a platform purpose-built for more efficient, on-device inference.
As a complete, reference neural processing unit (NPU) architecture, Coral NPU provides the building blocks for the next generation of energy-efficient, ML-optimized systems on chip (SoCs). The architecture is based on a set of RISC-V ISA compliant architectural IP blocks and is designed for minimal power consumption, making it ideal for always-on ambient sensing. The base design delivers performance in the 512 giga operations per second (GOPS) range while consuming just a few milliwatts, thus enabling powerful on-device AI for edge devices, hearables, AR glasses, and smartwatches.
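As a rough sanity check on what those two figures imply together: at an assumed draw of 5 mW (an illustrative number on our part; the design is only specified as "a few milliwatts"), 512 GOPS works out to about 512 / 0.005 ≈ 100,000 GOPS per watt, i.e., on the order of 100 TOPS/W. The exact efficiency will depend on workload and configuration, but this is the class of efficiency that always-on operation requires.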
The open and extensible architecture based on RISC-V gives SoC designers flexibility to modify the base design, or use it as a pre-configured NPU. The Coral NPU architecture includes the following components:
- A scalar core: A lightweight, C-programmable RISC-V frontend that manages data flow to the back-end cores, using a simple "run-to-completion" model for ultra-low power consumption and traditional CPU functions.
- A vector execution unit: A robust single instruction multiple data (SIMD) co-processor compliant with the RISC-V Vector instruction set (RVV) v1.0, enabling simultaneous operations on large data sets.
- A matrix execution unit: A highly efficient quantized outer product multiply-accumulate (MAC) engine purpose-built to accelerate fundamental neural network operations. Note that the matrix execution unit is still under development and will be released on GitHub later this year.
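To make the matrix unit's primitive concrete, here is a minimal NumPy sketch of a quantized outer-product multiply-accumulate: an int8 matrix multiply decomposed into rank-1 (outer-product) updates of a wide int32 accumulator. This illustrates the arithmetic pattern only; it is not Coral NPU code, and the function name is ours.

```python
import numpy as np

def outer_product_matmul_int8(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Int8 matmul as a sequence of outer-product MAC steps (illustrative only)."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N), dtype=np.int32)  # wide accumulator avoids int8 overflow
    for k in range(K):
        # One outer-product MAC step: a rank-1 update of the accumulator.
        acc += np.outer(A[:, k].astype(np.int32), B[k, :].astype(np.int32))
    return acc

A = np.random.randint(-128, 128, size=(4, 8), dtype=np.int8)
B = np.random.randint(-128, 128, size=(8, 3), dtype=np.int8)
assert np.array_equal(outer_product_matmul_int8(A, B),
                      A.astype(np.int32) @ B.astype(np.int32))
```

Accumulating in int32 is what makes storing weights and activations in int8 viable without overflow, which is the usual rationale for building a quantized MAC engine around this primitive.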
Unified developer experience
The Coral NPU architecture is a simple, C-programmable target that can seamlessly integrate with modern compilers like IREE and TFLM. This enables easy support for ML frameworks like TensorFlow, JAX, and PyTorch.
Coral NPU incorporates a comprehensive software toolchain, including specialized solutions like the TFLM compiler for TensorFlow, alongside a general-purpose MLIR compiler, C compiler, custom kernels, and a simulator. This provides developers with flexible pathways. For example, a model from a framework like JAX is first imported into the MLIR format using the StableHLO dialect. This intermediate file is then fed into the IREE compiler, which applies a hardware-specific plug-in to recognize the Coral NPU's architecture. From there, the compiler performs progressive lowering — a critical optimization step where the code is systematically translated through a series of dialects, moving closer to the machine's native language. After optimization, the toolchain generates a final, compact binary file ready for efficient execution on the edge device. This suite of industry-standard developer tools helps simplify the programming of ML models and can allow for a consistent experience across various hardware targets.
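As an illustration of that pathway, the sketch below exports a toy JAX function to StableHLO and compiles it with IREE's Python bindings. We cannot name the actual Coral NPU plug-in or target, so it compiles for IREE's generic "llvm-cpu" backend as a stand-in; the pip package name and that backend choice are assumptions, and a real flow would substitute the Coral NPU hardware plug-in.

```python
# pip install jax iree-base-compiler  (package names may vary by release)
import jax
import jax.numpy as jnp
from jax import export            # StableHLO export API in recent JAX releases
import iree.compiler as ireec     # IREE's Python compiler bindings

def model(x):
    # A toy stand-in for a real on-device model.
    return jnp.tanh(x @ jnp.ones((16, 8), dtype=jnp.float32))

# Step 1: import the JAX model into MLIR via the StableHLO dialect.
exported = export.export(jax.jit(model))(
    jax.ShapeDtypeStruct((1, 16), jnp.float32))
stablehlo_mlir = exported.mlir_module()

# Step 2: progressive lowering with IREE. A Coral NPU flow would select the
# hardware-specific plug-in here; "llvm-cpu" is only a placeholder backend.
compiled = ireec.compile_str(
    stablehlo_mlir,
    input_type="stablehlo",
    target_backends=["llvm-cpu"],
)

# Step 3: the result is a compact binary module ready for deployment.
with open("model.vmfb", "wb") as f:
    f.write(compiled)
```

From there, the IREE runtime (or a device-specific loader) executes the compiled module on the target.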
Coral NPU’s co-design process focuses on two key areas. First, the architecture efficiently accelerates the leading encoder-based architectures used in today's on-device vision and audio applications. Second, we are collaborating closely with the Gemma team to optimize Coral NPU for small transformer models, helping to ensure the accelerator architecture supports the next generation of generative AI at the edge.
This dual focus means Coral NPU is on track to be the first open, standards-based, low-power NPU designed to bring LLMs to wearables. For developers, this provides a single, validated path to deploy both current and future models with maximum performance at minimal power.
Target applications
Coral NPU is designed to enable ultra-low-power, always-on edge AI applications, particularly focused on ambient sensing systems. Its primary goal is to enable all-day AI experiences on wearables, mobile phones, and Internet of Things (IoT) devices while minimizing battery usage.
Potential use cases include:
- Contextual awareness: Detecting user activity (e.g., walking, running), proximity, or environment (e.g., indoors/outdoors, on-the-go) to enable "do-not-disturb" modes or other context-aware features.
- Audio processing: Voice and speech detection, keyword spotting, live translation, transcription, and audio-based accessibility features.
- Image processing: Person and object detection, facial recognition, gesture recognition, and low-power visual search.
- User interaction: Enabling control via hand gestures, audio cues, or other sensor-driven inputs.
Hardware-enforced privacy
A core principle of Coral NPU is building user trust through hardware-enforced security. Our architecture is being designed to support emerging technologies like CHERI, which provides fine-grained memory-level safety and scalable software compartmentalization. With this approach, we hope to enable sensitive AI models and personal data to be isolated in a hardware-enforced sandbox, mitigating memory-based attacks.
Building an ecosystem
Open hardware projects rely on strong partnerships to succeed. To that end, we’re collaborating with Synaptics, our first strategic silicon partner and a leader in embedded compute, wireless connectivity, and multimodal sensing for the IoT. Today, at their Tech Day, Synaptics announced their new Astra™ SL2610 line of AI-Native IoT Processors. This product line features their Torq™ NPU subsystem, the industry’s first production implementation of the Coral NPU architecture. The NPU’s design is transformer-capable and supports dynamic operators, enabling developers to build future-ready Edge AI systems for consumer and industrial IoT.
This partnership supports our commitment to a unified developer experience. The Synaptics Torq™ Edge AI platform is built on an open-source compiler and runtime based on IREE and MLIR. This collaboration is a significant step toward building a shared, open standard for intelligent, context-aware devices.
Solving core crises of the Edge
With Coral NPU, we are building a foundational layer for the future of personal AI. Our goal is to foster a vibrant ecosystem by providing a common, open-source, and secure platform for the industry to build upon. This empowers developers and silicon vendors to move beyond today's fragmented landscape and collaborate on a shared standard for edge computing, enabling faster innovation. Learn more about Coral NPU and start building today.
Acknowledgements
We would like to thank the core contributors and leadership team for this work, particularly Billy Rutledge, Ben Laurie, Derek Chow, Michael Hoang, Naveen Dodda, Murali Vijayaraghavan, Gregory Kielian, Matthew Wilson, Bill Luan, Divya Pandya, Preeti Singh, Akib Uddin, Stefan Hall, Alex Van Damme, David Gao, Lun Dong, Julian Mullings-Black, Roman Lewkow, Shaked Flur, Yenkai Wang, Reid Tatge, Tim Harvey, Tor Jeremiassen, Isha Mishra, Kai Yick, Cindy Liu, Bangfei Pan, Ian Field, Srikanth Muroor, Jay Yagnik, Avinatan Hassidim, and Yossi Matias.