Titans + MIRAS: Helping AI have long-term memory

Source: https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/
Summary:
Google Research introduces the Titans architecture and the MIRAS framework, giving AI models efficient "long-term memory"
Researchers at Google Research have announced a new advance: a neural network architecture named "Titans" and its theoretical foundation, the "MIRAS" framework. The work targets the memory and efficiency bottlenecks that today's large models hit when handling massive contexts such as very long documents and genomic data, and pushes AI models toward real-time learning and long-term memory.
Traditional Transformer models are constrained by the attention mechanism's sharply rising computational cost on long sequences. Recent linear recurrent and state-space models such as Mamba-2 improve efficiency, but their fixed-size memory compression struggles to preserve the rich information in very long sequences. The innovation of Titans and MIRAS is to combine the efficiency of RNNs with the accuracy of Transformers, letting a model dynamically update its core memory while running, without pausing for offline retraining.
Core mechanism: dynamic learning and capturing "surprise"
At the heart of the Titans architecture is a long-term memory module implemented as a deep neural network (a multi-layer perceptron), far more expressive than the fixed vector memory of traditional RNNs. The module does not just store information; it actively learns the deeper relationships and themes connecting the inputs.
The key addition is a "surprise metric". The mechanism mirrors human memory, which easily forgets routine information but vividly retains unexpected or important events. The model judges how important a piece of information is by computing the discrepancy (a gradient) between its current memory state and the new input:
- Low surprise: if the new input matches the model's expectations (e.g., "cat" appearing in a discussion about animals), it does not need to be committed to deep memory.
- High surprise: if the input clashes badly with the current context (e.g., a banana-peel image suddenly appearing in a financial report), it is flagged as important or anomalous and prioritized for long-term storage.
In addition, Titans uses momentum (combining the current input with recent context) and adaptive weight decay (acting as a "forgetting gate" that manages memory capacity) to keep its memory coherent and efficient.
The MIRAS framework: a unified view that moves beyond tradition
MIRAS provides a unified theoretical blueprint for sequence modeling, treating mainstream architectures (including Transformers and modern RNNs) as "associative memory modules" with different design choices. By defining four key design dimensions, memory architecture, attentional bias, retention gate, and memory algorithm, it offers systematic guidance for building new models.
More importantly, MIRAS pushes research beyond the mean squared error (MSE) paradigm that traditional models rely on. Based on the framework, the team developed three distinctive attention-free models:
- YAAD: uses the Huber loss, which is insensitive to outliers in the data (such as isolated typos) and makes the model more robust on noisy inputs.
- MONETA: explores stricter mathematical norms, aiming to build a more powerful and stable long-term memory system.
- MEMORA: constrains the memory state to behave like a strict probability map, keeping the integration of new information stable and controlled.
Experimental validation: strong performance, especially on very long contexts
Across a range of tests, Titans and the MIRAS-based variants show clear advantages:
- On standard language modeling (C4, WikiText) and commonsense reasoning tasks (HellaSwag, PIQA), their accuracy exceeds that of state-of-the-art linear recurrent models (such as Mamba-2) and Transformer++ models of comparable size.
- On BABILong, a benchmark built for reasoning over extremely long documents, Titans outperforms all baselines, including the far larger GPT-4, and scales its effective context window beyond 2 million tokens.
- Ablation studies confirm that the depth of the memory module is a key driver of performance: deeper memory networks achieve lower perplexity and scale better.
Industry significance
Titans and MIRAS mark an important step forward for sequence modeling. By letting AI models learn and memorize in real time while running, they break through the limits of fixed-size memory states. This offers an efficient solution for workloads that need extremely long contexts, such as genomics, long-document analysis, and long-horizon time-series forecasting, and the unified theoretical perspective opens a path toward a new generation of models that combine efficiency with expressive power.
Original article (English source):
Titans + MIRAS: Helping AI have long-term memory
December 4, 2025
Ali Behrouz, Student Researcher, Meisam Razaviyayn, Staff Researcher, and Vahab Mirrokni, VP and Google Fellow, Google Research
We introduce the Titans architecture and the MIRAS framework, which allow AI models to work much faster and handle massive contexts by updating their core memory while it's actively running.
The Transformer architecture revolutionized sequence modeling with its introduction of attention, a mechanism by which models look back at earlier inputs to prioritize relevant input data. However, computational cost increases drastically with sequence length, which limits the ability to scale Transformer-based models to extremely long contexts, such as those required for full-document understanding or genomic analysis.
The research community has explored various solutions, such as efficient linear recurrent neural networks (RNNs) and state space models (SSMs) like Mamba-2. These models offer fast, linear scaling by compressing context into a fixed-size state. However, this fixed-size compression cannot adequately capture the rich information in very long sequences.
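To make the scaling contrast concrete, the usual back-of-the-envelope comparison (a generic estimate, not a figure quoted from either paper) for sequence length n and model width d is:

```latex
\underbrace{O(n^{2}\, d)}_{\text{full attention over the sequence}}
\qquad \text{vs.} \qquad
\underbrace{O(n \cdot c_{\text{state}})}_{\text{one fixed-cost recurrent update per token}}
```

where c_state is the per-token cost of updating a state whose size does not grow with n; the price of that constant-size state is exactly the information loss described above.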
In two new papers, Titans and MIRAS, we introduce an architecture and theoretical blueprint that combine the speed of RNNs with the accuracy of transformers. Titans is the specific architecture (the tool), and MIRAS is the theoretical framework (the blueprint) for generalizing these approaches. Together, they advance the concept of test-time memorization, the ability of an AI model to maintain long-term memory by incorporating more powerful “surprise” metrics (i.e., unexpected pieces of information) while the model is running and without dedicated offline retraining.
The MIRAS framework, as demonstrated by Titans, introduces a meaningful shift toward real-time adaptation. Instead of compressing information into a static state, this architecture actively learns and updates its own parameters as data streams in. This crucial mechanism enables the model to incorporate new, specific details into its core knowledge instantly.
Titans: Learning new context on the fly
An effective learning system requires distinct yet interconnected memory modules, mirroring the human brain's separation of short-term and long-term memory.
While attention mechanisms excel at precise, short-term memory, Titans introduces a novel neural long-term memory module that, unlike the fixed-size vector or matrix memory in traditional RNNs, acts as a deep neural network (specifically, a multi-layer perceptron). This memory module provides significantly higher expressive power, allowing the model to summarize large volumes of information without losing important context. The model isn't simply taking notes; it's understanding and synthesizing the entire story.
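As a rough illustration of why a deep memory is more expressive than a fixed matrix, here is a minimal NumPy toy (illustrative only; dimensions and initialization are arbitrary, and this is not the paper's implementation) contrasting the two kinds of key-to-value memory:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # key/value dimensionality (toy choice)

# Matrix memory (traditional linear-RNN style): one linear map from keys to values.
W = rng.normal(size=(d, d)) * 0.01

def matrix_read(key):
    return key @ W  # can only represent linear key -> value associations

# Deep memory (Titans style, schematically): a small MLP with its own parameters.
W1, b1 = rng.normal(size=(d, 2 * d)) * 0.01, np.zeros(2 * d)
W2, b2 = rng.normal(size=(2 * d, d)) * 0.01, np.zeros(d)

def mlp_read(key):
    h = np.maximum(key @ W1 + b1, 0.0)  # non-linearity lets it store non-linear associations
    return h @ W2 + b2

key = rng.normal(size=(d,))
print(matrix_read(key).shape, mlp_read(key).shape)  # both return a d-dimensional "value"
```

The deep version can store associations a single matrix cannot represent, which is the expressive-power gap the paragraph above refers to.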
Crucially, Titans doesn’t just passively store data. It actively learns how to recognize and retain important relationships and conceptual themes that connect tokens across the entire input. A key aspect of this ability is what we call the “surprise metric”. In human psychology, we know we quickly and easily forget routine, expected events but remember things that break the pattern — unexpected, surprising, or highly emotional events.
In the context of Titans, the "surprise metric" is the model detecting a large difference between what it currently remembers and what the new input is telling it.
- Low surprise: If the new word is "cat" and the model's memory state already expects an animal word, the gradient (surprise) is low. It can safely skip memorizing the word "cat" in its permanent long-term state.
- High surprise: If the model's memory state is summarizing a serious financial report, and the new input is a picture of a banana peel (the unexpected event), the gradient (surprise) will be very high. This signals that the new input is important or anomalous, and it must be prioritized for permanent storage in the long-term memory module.
The model uses this internal error signal (the gradient) as a mathematical equivalent of saying, "This is unexpected and important!" This allows the Titans architecture to selectively update its long-term memory only with the most novel and context-breaking information, keeping the overall process fast and efficient.
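Below is a minimal NumPy sketch of that idea, using a toy linear memory for clarity (the actual Titans memory is a deep MLP, and the threshold, learning rate, and loss here are arbitrary illustrative choices): the gradient of the memory's reconstruction error on a new (key, value) pair acts as the surprise signal, only sufficiently surprising inputs trigger a write, and the memory adapts token by token at test time.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
M = rng.normal(size=(d, d)) * 0.1   # toy memory: maps keys to predicted values

def surprise_and_grad(M, key, value):
    """Squared reconstruction error of the memory on (key, value) and its gradient."""
    err = M @ key - value
    grad = np.outer(err, key)        # d(0.5 * ||M k - v||^2) / dM
    return np.linalg.norm(grad), grad

def maybe_write(M, key, value, threshold=1.0, lr=0.1):
    surprise, grad = surprise_and_grad(M, key, value)
    if surprise > threshold:         # only context-breaking inputs update long-term memory
        M = M - lr * grad
    return M, surprise

# Stream of inputs at test time: the memory adapts without any offline retraining.
for t in range(5):
    key, value = rng.normal(size=d), rng.normal(size=d)
    M, s = maybe_write(M, key, value)
    print(f"step {t}: surprise={s:.2f}")
```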
Titans refines this mechanism by incorporating two critical elements (the resulting update rule is written out schematically after the list):
- Momentum: The model considers both "momentary surprise" (the current input) and "past surprise" (the recent context flow). This ensures relevant subsequent information is also captured, even if those tokens are not individually surprising.
- Forgetting (weight decay): To manage the finite capacity of the memory when dealing with extremely long sequences, Titans employs an adaptive weight decay mechanism. This acts as a forgetting gate, allowing the model to discard information that is no longer needed.
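Putting surprise, momentum, and forgetting together, the memory update can be written schematically as follows (a paraphrase of the mechanism described above, with simplified notation; the coefficients are data-dependent and the loss is an associative reconstruction objective):

```latex
S_t = \eta_t\, S_{t-1} \;-\; \theta_t\, \nabla_{M}\, \ell(M_{t-1};\, x_t)   % past surprise + momentary surprise
M_t = (1 - \alpha_t)\, M_{t-1} \;+\; S_t                                   % forgetting gate, \alpha_t \in [0, 1]
```

A large gradient (high surprise) produces a large update, the momentum term S_t carries surprise across a few subsequent tokens, and the adaptive coefficient \alpha_t plays the role of the forgetting gate.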
MIRAS: A unified view of sequence modeling
Every major breakthrough in sequence modeling — from modern transformers to the new, lightning-fast linear RNNs — is essentially the same thing under the hood: a highly complex associative memory module.
Accordingly, what makes MIRAS both unique and practical is the way it views AI modeling. Instead of seeing diverse architectures, it sees different methods of solving the same problem: efficiently combining new information with old memories without letting the essential concepts be forgotten.
MIRAS defines a sequence model through four key design choices (an illustrative mapping of familiar architectures onto these axes follows the list):
- Memory architecture: The structure that stores information (e.g., a vector, matrix, or a deep multi-layer perceptron, like in Titans).
- Attentional bias: The internal learning objective the model optimizes that determines what it prioritizes.
- Retention gate: The memory regularizer. MIRAS reinterprets "forgetting mechanisms" as specific forms of regularization that balance new learning against retaining past knowledge.
- Memory algorithm: The optimization algorithm used to update the memory.
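As a loose illustration of how a model can be catalogued along these four axes, here is a small Python sketch; the concrete entries are an informal reading for illustration, not a table reproduced from the MIRAS paper:

```python
from dataclasses import dataclass

@dataclass
class SequenceMemoryDesign:
    """The four MIRAS design choices, captured as plain fields."""
    memory_architecture: str   # what stores information
    attentional_bias: str      # internal objective the memory optimizes
    retention_gate: str        # regularizer balancing new learning vs. retention
    memory_algorithm: str      # optimizer used to update the memory

# Rough, illustrative readings of two familiar designs.
linear_rnn = SequenceMemoryDesign(
    memory_architecture="fixed-size matrix state",
    attentional_bias="squared-error / dot-product similarity",
    retention_gate="constant or gated decay",
    memory_algorithm="single gradient step per token",
)

titans_style = SequenceMemoryDesign(
    memory_architecture="deep MLP (neural long-term memory)",
    attentional_bias="squared-error reconstruction of (key, value) pairs",
    retention_gate="adaptive weight decay (forgetting gate)",
    memory_algorithm="gradient descent with momentum",
)

print(linear_rnn)
print(titans_style)
```

Framing models this way is what lets MIRAS swap out individual axes (for example, the attentional bias) while keeping the rest fixed.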
Transcending the mean squared error paradigm
Virtually all successful existing sequence models rely on mean squared error (MSE) or dot-product similarity for both their bias and retention. This reliance can make models sensitive to outliers and limit their expressive power.
MIRAS transcends this limitation by providing a generative framework to explore a richer design space informed by the optimization and statistics literature. This allows for the creation of novel architectures with non-Euclidean objectives and regularization.
Using MIRAS, we created three specific attention-free models (a small numerical illustration of the outlier-robustness idea follows the list):
- YAAD: We designed this MIRAS variant to be less sensitive to major errors or "outliers" (like a single typo in a large document). It uses a gentler math penalty (Huber loss) for mistakes, so it doesn't overreact to one-off issues. This makes the model more robust when the input data is messy or inconsistent.
- MONETA: This model explores the use of more complex and strict mathematical penalties (called generalized norms). It investigates whether using these more disciplined rules for both what the model attends to and what it forgets can lead to a more powerful and stable long-term memory system overall.
- MEMORA: This model focuses on achieving the best possible memory stability by forcing its memory to act like a strict probability map. By using this constraint, it ensures that every time the memory state is updated, the changes are controlled and balanced. This guarantees a clean, stable process for integrating new information.
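To see numerically why a Huber-style penalty tames outliers, here is a small NumPy comparison (a generic illustration of the losses involved, with an arbitrary threshold; not YAAD's actual configuration): the squared error's gradient grows in proportion to the residual, while the Huber gradient is clipped beyond the threshold, so one corrupted token cannot dominate a memory update.

```python
import numpy as np

def mse_grad(residual):
    """Gradient of 0.5 * residual**2: grows without bound for large residuals."""
    return residual

def huber_grad(residual, delta=1.0):
    """Gradient of the Huber loss: clipped to +/- delta for large residuals."""
    return np.clip(residual, -delta, delta)

residuals = np.array([0.1, 0.5, 1.0, 5.0, 50.0])  # the last two mimic outliers/typos
print("MSE gradients:  ", mse_grad(residuals))
print("Huber gradients:", huber_grad(residuals))
# The outlier at 50.0 pulls an MSE-driven update 50x harder than a normal token;
# under the Huber penalty its pull is capped at delta.
```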
Experiments and results
We rigorously compared Titans along with MIRAS variants (YAAD, MONETA, MEMORA) against leading architectures, including Transformer++, Mamba-2, and Gated DeltaNet. We further validated versatility by testing Titans on genomic modeling (DNA) and time-series forecasting, proving the architecture generalizes effectively beyond text.
Across both standard language modeling datasets (C4, WikiText) and zero-shot reasoning tasks (HellaSwag, PIQA), our models consistently demonstrated higher accuracy and lower perplexity (a measure of how surprised an LLM is when looking at a piece of text).
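For reference, perplexity is conventionally the exponentiated average negative log-likelihood of the text under the model (a standard definition, not specific to these papers); lower is better:

```latex
\mathrm{PPL}(x_{1:N}) \;=\; \exp\!\Big(-\tfrac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\Big)
```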
The power of deep memory
Ablation studies clearly show that the depth of the memory architecture is crucial. When comparing long-term memory modules of the same size but different depths, modules with deeper memories consistently achieve lower perplexity in language modeling. Furthermore, they exhibit better scaling properties, maintaining performance as the sequence length increases significantly.
Language modeling and efficiency
In language modeling and commonsense reasoning tasks, Titans architectures outperform state-of-the-art linear recurrent models (such as Mamba-2 and Gated DeltaNet) and Transformer++ baselines of comparable sizes. The novel MIRAS variants (MONETA, YAAD, MEMORA) also achieve improved performance compared to these baselines, validating the benefit of exploring robust, non-MSE optimization mechanisms. Importantly, these models maintain efficient, parallelizable training and fast linear inference speeds.
Extreme long-context recall
The most significant advantage of these new architectures is their ability to handle extremely long contexts. This is highlighted in the BABILong benchmark, a task requiring reasoning across facts distributed in extremely long documents. In this challenging setting, Titans outperforms all baselines, including extremely large models like GPT-4, despite having many fewer parameters. Titans further demonstrates the capability to scale effectively to context window sizes larger than 2 million tokens.
Conclusion
The introduction of Titans and the MIRAS framework marks a significant advancement in sequence modeling. By employing deep neural networks as memory modules that learn to memorize as data is coming in, these approaches overcome the limitations of fixed-size recurrent states. Furthermore, MIRAS provides a powerful theoretical unification, revealing the connection between online optimization, associative memory, and architectural design. By moving beyond the standard Euclidean paradigm, this research opens the door to a new generation of sequence models that combine the efficiency of RNNs with the expressive power needed for the era of long-context AI.