A quantum trick helps trim bloated AI models.

Source: https://www.sciencenews.org/article/quantum-tensor-network-ai-model-relief
Summary:
Quantum physics helps AI models slim down: tensor networks may crack the energy problem of large models
Large language models like the one behind ChatGPT are straining global computing power and energy supplies. To lighten these behemoths, scientists are borrowing from quantum physics, where a mathematical tool called the tensor network has emerged as a key to the problem.
Tensor networks were first developed by physicists in the 1990s to describe the interactions of complex systems of quantum particles. Researchers have now found that the same tool can compress AI models efficiently, sharply reducing their storage and energy needs while preserving performance.
Physicist Román Orús of the Donostia International Physics Center in Spain and Multiverse Computing, the company he cofounded, have successfully applied tensor networks to large-model compression. Their CompactifAI technique, applied to the Llama 2 7B model, cuts storage needs by more than 90 percent and the parameter count by 70 percent, with an accuracy loss of only a few percent. A third-party report found that a compressed version of Llama 3.1 8B used 30 to 40 percent less energy.
Unlike conventional compression techniques such as pruning and quantization, which rely on trial and error, tensor networks rest on rigorous physics and mathematics: they compress by identifying and removing redundant correlations in the data, giving stronger performance guarantees. Studies show the compressed, smaller models can even perform better, because the compression filters out noise in the original training data. A tensor network–compressed GPT-2, for example, can run smoothly on tiny devices such as a Raspberry Pi.
More ambitious efforts aim to overhaul AI's foundations altogether. Physicist Miles Stoudenmire of the Flatiron Institute in New York and others are trying to bypass traditional neural networks entirely and build AI models on tensor networks from scratch. Such models could sidestep neural networks' long training times and black-box opacity. A team led by applied mathematician Yuehaw Khoo at the University of Chicago demonstrated that training a model with tensor network methods took only seconds, while a comparable neural network model took minutes.
Tensor networks also open a new path toward AI interpretability. Their clear mathematical structure helps reveal the logic behind a model's decisions, which is crucial for high-reliability applications such as medicine and energy.
Although neural networks still dominate most tasks, the efficiency, transparency and low energy use of tensor networks already offer an important alternative paradigm for AI. As research deepens, this tool born of quantum physics may become a key force in steering AI toward a more sustainable and trustworthy future.
Full translation:
A quantum trick helps trim bloated AI models
Tensor networks grapple with the complexities of both quantum particles and machine learning
A hunk of material bustles with electrons, one tickling another as they bop around. Quantifying how one particle jostles others in that scrum is so complicated that, beginning in the 1990s, physicists developed an esoteric mathematical structure called a tensor network just to describe it. A decade or so later, when quantum physicist Román Orús began studying tensor networks, he didn't envision applying them to the seemingly unrelated field of artificial intelligence.
But with the advent of enormous, energy-hogging large language models like those behind ChatGPT, "we realized that by using tensor networks we could address some of the bottlenecks," says Orús, of the Donostia International Physics Center in San Sebastián, Spain. Tensor networks can squeeze bloated AI models down to a more manageable size, cutting energy use and improving efficiency while preserving accuracy. That is Orús' aim at Multiverse Computing, the startup he cofounded. The prospect is appealing: AI currently consumes so much energy that tech companies are drawing up plans for a new generation of small nuclear power plants, and the power demands of AI data centers may already be helping to push up electricity prices in some areas.
Smaller models could also be embedded in personal devices such as cell phones or household appliances. Putting AI on the device itself, rather than running it through the cloud, means users would not need an internet connection to use it.
There are other ways to compress AI models, but tensor network proponents argue that the technique's grounding in physics and mathematics offers a stronger guarantee that a compressed model will perform as well as, or even better than, its larger original. "It seems like kind of a slam dunk every time people try it," says physicist and tensor network advocate Miles Stoudenmire of the Flatiron Institute in New York City.
But Stoudenmire wants to push tensor networks even further.
Most popular AI models today are built on artificial neural networks, a framework inspired by the neurons of the human brain. While Orús and colleagues work on recasting existing models as tensor networks, Stoudenmire and others aim to bypass neural networks entirely and build AI models on tensor networks from the start. Neural networks are powerful and flexible tools, but training them consumes vast amounts of energy and computing time, and the resulting models are black boxes whose inner workings are hard to understand. Building AI on a foundation of tensor networks instead could make training faster and more efficient, and the models easier to understand.
"Let the tensors breathe," Stoudenmire says. "I want to free them from the neural network and let them do their own thing … because I think they have a lot of latent power to offer."
How tensor networks are built
Tensor networks are physicists' answer to a thorny concept called the curse of dimensionality: as data grow more complex and involve more variables, their size explodes exponentially, becoming impossible for computers to store.
The building blocks of tensor networks are mathematical objects called tensors. Anyone who has used a spreadsheet can appreciate their power: a spreadsheet is essentially a matrix, an array of numbers in two dimensions, and a tensor extends that idea to many dimensions.
Say you want to record how 10 people rate 10 pizza toppings: Jared gives pepperoni a 10, Kate gives it a 3, and so on. That fills a 10-by-10 table. To record sauce types as well (white sauce, marinara, pesto), you would need a three-dimensional tensor: one number holds Kate's rating of a pepperoni pizza with red sauce, another holds Jared's rating of a mushroom pizza with white sauce.
With only a few variables (people, toppings, sauces), such tensors remain manageable. But run a massive pizza survey, polling 100,000 people on 100 toppings and 100 sauces, and you get a tensor with 1 billion numbers, which a computer can still store. Keep adding variables, though (crusts, cheeses and many other options on top of people, pizzas and sauces), and the tensor balloons rapidly.
Once a tensor involves more than a few tens of variables, Stoudenmire notes, storing it "would take as much memory as has ever been produced in the history of computing." That is the curse of dimensionality. It is a vexing problem for computer scientists, who routinely handle huge amounts of data, and quantum physicists face the same curse when describing many particles interacting in complex ways.
Enter tensor networks. Physicists harnessed the technique in the 1990s and 2000s to represent data by breaking one colossal tensor into smaller, more manageable tensors. These smaller tensors are linked by contractions, operations that merge two tensors into one.
Stoudenmire compares it to taking a giant sausage, too big for one person to cook, let alone eat, and twisting it into individual hot dogs sized for the grill. Tensor networks excel at representing correlations, connections between the results of different measurements. In the pizza survey, for example, people who like white mushrooms probably also like cremini mushrooms; the two responses are correlated. Tensor networks are an efficient way to represent data that contain such correlations.
In AI, there are also correlations among the billions of parameters that determine how a chatbot processes a user's prompt. Correlations in data often signal redundancy, and by eliminating that redundancy, tensor networks can compress a model without weakening it.
The synergy between AI and quantum physics also comes down to correlations. Quantum particles are paragons of correlation through entanglement, an effect that links the fates of two seemingly separate particles. "We are just finding here, in AI models, what we learned in physics, that correlations matter, period," Orús says. "Everything is about correlations."
A llama gets littler
Models based on neural networks typically already contain simple tensors, such as those spreadsheet-like matrices. The matrices and other tensors hold the parameters that tell the model how to process data via its nodes, the neuron-inspired components of the model. A deep learning model has many layers of nodes, each with associated tensors holding parameters. But tensor networks can represent data more efficiently than those individual tensors on their own.
Orús' startup has commercialized a tensor network–based compression technique for AI models called CompactifAI. Applied to the large language model Llama 2 7B, it reduces the memory needed to store the model by more than 90 percent, from about 27 gigabytes to about 2 gigabytes, and shrinks the parameter count from 7 billion to about 2 billion, a 70 percent reduction, with an accuracy drop of just a few percent. The results were presented last April at the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning in Bruges, Belgium.
A report by the European consulting firm Sopra Steria found that Multiverse's compressed version of a different model, Llama 3.1 8B, used 30 to 40 percent less energy than the original, depending on the length of the response. Other compression techniques, such as pruning and quantization, can also improve energy efficiency, but those methods rely on trial and error, says machine intelligence researcher Danilo Mandic of Imperial College London, so there is no guarantee that performance will be maintained or improved.
Tensor networks, designed to tease out hidden structure in data, let a model be compressed while still performing well. A compressed model can even surpass the original in accuracy, Mandic says, because large models are trained on swaths of internet data full of redundancies and irrelevant material that tensor network compression filters out. His team reported in a 2023 paper at arXiv.org that a tensor network–compressed version of OpenAI's GPT-2 performed as well as or better than the full-size model and could run on a Raspberry Pi, a cheap, credit card–size computer often used for computer science education.
Multiverse has continued to develop smaller models. Two released in August 2025, SuperFly and ChickenBrain, are named after animals with similarly simple neural architecture and are being marketed for personal devices and appliances such as refrigerators and washing machines. A clueless teenager, for example, could ask the washing machine which cycle to run.
Unleashing the potential of tensors
Compressing an AI model requires an original model to start from, and creating and training that original model is itself an energy-sapping process. Using tensor networks from the outset could ease the energy demands of that stage too.
Training the neural network behind an AI like ChatGPT requires a lengthy optimization process: tweaking parameters and checking performance to find the best values. That step typically relies on gradient descent, a method first devised in the 19th century. Stoudenmire likens it to wandering around the house hoping to catch a whiff of a plate of food: "It's not stupid, but it's pretty basic."
The most successful AI models today are still built on deep neural networks, but some researchers are developing a complementary alternative. Models based on tensor networks would do away with neural networks entirely, and with that groping-around-the-house style of optimization. "We don't want to use optimization at all," says applied mathematician Yuehaw Khoo of the University of Chicago. "This is the main selling point of using tensor networks over deep learning architecture, the possibility of completely bypassing the use of optimization."
To avoid optimization, tensor network methods can use a divide-and-conquer strategy: parts of the network are frozen while the rest is adjusted toward each solution. A related technique zooms in and out to home in on a solution: rather than roaming around searching for the plate of food, you might first sample the air of an entire floor for the scent, then search floor by floor and room by room in finer detail.
In these ways, tensor networks settle not on the location of the food but on the values of the network's parameters, and the techniques let models be trained within seconds. Siyao Yang, an applied mathematician in Khoo's group, once demoed training a tensor network–based model live during a scientific talk; it took four seconds, while a comparable neural network model took about six minutes, nearly 100 times as long.
But the divide-and-conquer strategy brings a limitation: the structure of the problem must be well enough understood to know how to divide it up. When searching for that plate of food, for instance, you need to know the layout of the house, with its floors and rooms. That makes the technique best suited to problems with known structure, such as systems governed by the laws of physics. Last August, researchers reported in Physical Review Materials that a tensor network–based AI can evaluate complex equations describing properties of materials such as copper, argon and tin.
The approach is useful in robotics too. In a paper published last July at arXiv.org, researchers at the Idiap Research Institute in Martigny, Switzerland, used tensor networks to teach two robotic arms to cooperatively manipulate a box.
Opening the AI black box
Tensor networks can also make AI more interpretable. Deep learning models, with their vast numbers of parameters, are notorious black boxes: it is hard to trace why a model gave a particular answer. "There's very little theoretical understanding about what's actually happening with deep learning," says computer scientist Rose Yu of the University of California, San Diego.
That opacity keeps neural networks out of tasks where a slipup would be disastrous. "You cannot employ a neural net to run your nuclear power plant if you don't understand how it works," Mandic says.
Yu has used tensor network methods to analyze data such as climate records and basketball players' shooting success from different spots on the court. Because tensor networks are mathematically well understood, she argues, they yield results that are easier to grasp. "Tensors, because they're tools that are very well understood from a theoretical perspective … may offer a new type of platform to study the behavior of deep networks, to understand the science behind deep learning."
Meanwhile, tech companies keep releasing bigger, more complex models trained on more data. "The current trend in AI seems to be [that] the ultimate answer to everything is just scaling," Yu says. But the era of improving performance simply by scaling up may be winding down, and tensor networks offer an alternative paradigm to explore. "Can we derive new insights from tensor networks that can help guide a new wave of development for AI?" Yu asks.
Neural networks still outperform tensor networks on most tasks. But Khoo suggests that may be partly because of the intense focus on neural networks over the past decade and the relative neglect of tensor networks. More investment in tensor network research could yield bigger returns. "With enough tuning, I'm pretty sure tensor networks can win."
English source:
A quantum trick helps trim bloated AI models
Tensor networks grapple with the complexities of both quantum particles and machine learning
A hunk of material bustles with electrons, one tickling another as they bop around. Quantifying how one particle jostles others in that scrum is so complicated that, beginning in the 1990s, physicists developed an esoteric mathematical structure called a tensor network just to describe it. A decade or so later, when quantum physicist Román Orús began studying tensor networks, he didn’t envision applying them to the seemingly unrelated concepts of artificial intelligence.
But with the advent of enormous, energy-hogging large language models like those behind ChatGPT, “we realized that by using tensor networks we could address some of the bottlenecks,” says Orús, of Donostia International Physics Center in San Sebastián, Spain. Tensor networks can help squish bloated AI models down to a more manageable size, cutting energy use and improving efficiency without sacrificing accuracy. That’s Orús’ aim in his work at Multiverse Computing, a startup he cofounded. It’s an appealing prospect: AI currently gobbles so much energy that tech companies are hatching plans for a future generation of small nuclear power plants. And the need to power AI data centers may already be helping to drive up electricity costs in some areas.
Smaller models also boast the potential to be crammed onto personal devices like cell phones or household appliances. The ability to put AI on the devices themselves — rather than running it through the cloud — means users wouldn’t need an internet connection to use the AI.
There are other ways to compress AI models. But tensor network proponents argue that the technique’s basis in physics and math can provide more of a guarantee that the compressed model will perform as well as — or even better than — its big sibling. “It seems like kind of a slam dunk every time people try it,” says physicist and tensor network enthusiast Miles Stoudenmire of the Flatiron Institute in New York City.
But Stoudenmire wants to push tensor networks even further.
Most popular AI models are based on a framework called an artificial neural network that is inspired by the neurons of the human brain. Whereas Orús and colleagues are recasting those existing models as tensor networks, Stoudenmire and others aim to make AI models that bypass neural networks entirely, basing them on tensor networks from the get-go. Neural networks are powerful and flexible tools. But training them demands lots of energy and computer time. And they produce AI models with inner workings that are difficult to comprehend. Starting from a tensor network foundation, instead, could make AI faster and easier to train and understand.
“Let the tensors breathe,” Stoudenmire says. “I want to free them from the neural network and let them do their own thing … because I think they have a lot of latent power to offer.”
How the tensor network sausage is made
Tensor networks are physicists’ answer to a hair-raising concept called the “curse of dimensionality.” It’s the idea that, as data become more complex and involve many variables, they dramatically explode in size, making computer storage impossible.
The building blocks of tensor networks are mathematical objects known as tensors. If you’ve ever used a spreadsheet, you may understand how powerful tensors can be. A spreadsheet is, effectively, a matrix, an array of numbers in two dimensions. Tensors generalize this idea to multiple dimensions.
Say you want to describe 10 people and their rankings for 10 possible pizza toppings. Jared gives pepperoni a 10, and Kate gives it a 3, and so on. You’d fill out a 10 by 10 spreadsheet.
But what if you wanted to describe not just toppings, but different sauce types, too: white sauce, marinara, pesto. What you’d need is an order-3 tensor. One number gives Kate’s ranking for a pizza with red sauce and pepperoni, another for Jared’s ranking of pizza with white sauce and mushrooms.
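To make the pizza example concrete, here is a minimal sketch in Python (with NumPy) of the jump from a spreadsheet-like matrix to an order-3 tensor; the specific scores filled in are made up for illustration.

```python
import numpy as np

people = ["Jared", "Kate"]            # a trimmed-down version of the survey in the text
toppings = ["pepperoni", "mushroom"]
sauces = ["red", "white", "pesto"]

# A spreadsheet is an order-2 tensor: one number per (person, topping) pair.
rankings_2d = np.zeros((len(people), len(toppings)))
rankings_2d[0, 0] = 10                # Jared gives pepperoni a 10
rankings_2d[1, 0] = 3                 # Kate gives it a 3

# Adding sauce as a third axis turns it into an order-3 tensor.
rankings_3d = np.zeros((len(people), len(toppings), len(sauces)))
rankings_3d[1, 0, 0] = 4              # Kate's score for red sauce + pepperoni (made-up value)
rankings_3d[0, 1, 1] = 7              # Jared's score for white sauce + mushrooms (made-up value)
print(rankings_3d.shape)              # (2, 2, 3): one slot for every combination
```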
When dealing with a small number of variables — people, pizza toppings, sauces — such tensors are manageable. If you did a massive pizza survey, polling 100,000 people with 100 choices for toppings and 100 sauces, that would result in a tensor with 1 billion numbers, easily storable on a computer. But once you start dealing with many variables — if on top of people, pizza and sauce you add crust, cheese and many other options — the size of a tensor quickly balloons.
Once a tensor has more than a few tens of variables, “it would take … as much memory as has ever been produced in the history of computing to store,” Stoudenmire says. That’s the curse of dimensionality. For computer scientists — who tend to huck around huge clods of data — it’s a vexing problem. And for quantum physicists, the curse rears its head when describing many particles interacting with one another in complex ways.
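A rough back-of-the-envelope calculation shows how quickly a single tensor outgrows any computer. The snippet below assumes, purely for illustration, a modest 10 options per variable and 8 bytes per stored number.

```python
# Rough growth of a single tensor as variables are added.
for num_variables in (3, 10, 25, 50):
    entries = 10 ** num_variables
    print(f"{num_variables} variables -> {entries:.0e} entries, about {entries * 8:.0e} bytes")
# 3 variables fit in kilobytes; around 25 variables the tensor would already need ~1e26 bytes,
# far beyond all storage ever manufactured; 50 is hopeless. That is the curse of dimensionality.
```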
Enter tensor networks. Harnessed by physicists in the 1990s and 2000s, they represent one colossal tensor by breaking it up into smaller, more manageable tensors. Those smaller tensors are linked by contractions, operations that combine two tensors into one.
Stoudenmire compares it to taking a giant sausage — too much for one person to cook, let alone eat — and twisting it in places to make perfectly portioned hot dogs, sized for the grill.
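The sketch below illustrates the hot-dog idea on a small, highly redundant tensor: it is split into a chain of small three-index cores by repeated singular value decompositions (one common way to build such a network, often called a tensor train) and then contracted back together. It is a toy illustration of the general technique, not any group's production code.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """Split one big tensor into a chain ("train") of small three-index cores via repeated SVDs."""
    dims = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(rank, dims[k], r))        # one "hot dog"
        mat = (np.diag(S[:r]) @ Vt[:r]).reshape(r * dims[k + 1], -1)
        rank = r
    cores.append(mat.reshape(rank, dims[-1], 1))                 # the last core
    return cores

def tt_contract(cores):
    """Contract the cores back together, merging neighbors pairwise into one big tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]                                        # drop the size-1 edge indices

# A 6 x 6 x 6 x 6 tensor built with heavy internal correlation (a simple product structure).
a, b, c, d = (np.random.rand(6) for _ in range(4))
big = np.einsum("i,j,k,l->ijkl", a, b, c, d)                     # 1,296 entries
cores = tt_decompose(big, max_rank=2)
print(np.allclose(big, tt_contract(cores)))                      # True: rebuilt from ~70 stored numbers
```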
Here’s how that sausage is made. Tensor networks are adept at representing correlations, connections between the results of different measurements. In a pizza survey, for example, people who like white mushrooms on their pizza probably also like cremini mushrooms — the two survey responses are correlated. Tensor networks are an efficient way of representing data that have correlations.
In AI, there are correlations between the billions of numbers called parameters that determine, for example, how a chatbot processes users’ prompts. And correlations in data can signal redundancy. By eliminating that redundancy, tensor networks can compress a model without weakening its power.
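As a cartoon of how correlation enables compression, the sketch below builds a hypothetical survey matrix whose answers are driven by just a few underlying taste profiles, then discards the redundancy with a truncated singular value decomposition, the simplest relative of a tensor network factorization. The data and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical survey: 1,000 people rate 50 toppings, but tastes follow just three
# underlying preference profiles, so the answers are strongly correlated.
profiles = rng.normal(size=(3, 50))
tastes = rng.normal(size=(1000, 3))
ratings = tastes @ profiles + 0.01 * rng.normal(size=(1000, 50))   # correlated data + a little noise

# A truncated SVD keeps the three strong correlation patterns and drops the redundancy.
U, S, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 3
approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
stored = U[:, :k].size + k + Vt[:k, :].size                        # 3,153 numbers instead of 50,000
print(stored, np.abs(ratings - approx).max())                      # error stays near the noise level
```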
The synergy between AI and quantum physics also comes down to correlations. Quantum particles are paragons of correlation through the effect of quantum entanglement, which links the fates of two seemingly distinct particles.
“We are just finding here, in AI models, what we learned in physics, that correlations matter, period,” Orús says. “Everything is about correlations.”
A llama gets littler
Models based on neural networks typically already contain simple tensors, such as those spreadsheet-like matrices. Matrices or other tensors hold the parameters that tell the model how to process data via nodes, individual components of the model that are inspired by neurons. In a deep learning model, there are multiple layers of nodes, with associated tensors that contain parameters. But tensor networks have the power to represent data more efficiently than those individual tensors on their own.
Orús’ startup has commercialized a tensor network–based compression technique for AI models, called CompactifAI. When applied to the large language model Llama 2 7B, CompactifAI reduces the memory required to store the model by more than 90 percent, going from about 27 gigabytes to about 2 gigabytes. It shrinks the number of parameters by 70 percent, taking it from 7 billion parameters to about 2 billion with an accuracy drop of just a few percent, Orús and colleagues reported in a paper presented last April at the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning in Bruges, Belgium.
[Chart: Total energy consumption of a compressed and uncompressed AI model. Multiverse’s compressed version of the large language model Llama 3.1 8B produced responses to 104 questions using less energy than the full-sized model; the energy saved was more dramatic for longer responses than for shorter ones.]
A report by the European consulting firm Sopra Steria found that Multiverse’s compressed version of a different model, Llama 3.1 8B, used about 30 to 40 percent less energy than the original version, depending on the length of the response.
Other methods for shrinking AI models can also improve energy efficiency. A technique called pruning removes the least important parameters or nodes from the model, and a method called quantization reduces the precision of the parameters, for example by going from decimal numbers to integers. But machine intelligence researcher Danilo Mandic of Imperial College London says those techniques are reliant on trial and error. “There is no guarantee of good or improved performance.”
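For comparison, here is a minimal sketch of what pruning and quantization do to a stand-in weight matrix; the 90 percent pruning level and 8-bit precision are arbitrary choices for illustration, not the settings of any particular tool.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(512, 512)).astype(np.float32)      # a stand-in weight matrix

# Pruning: zero out the smallest-magnitude parameters (here the bottom 90 percent).
threshold = np.quantile(np.abs(weights), 0.90)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0).astype(np.float32)

# Quantization: store each surviving value as an 8-bit integer instead of a 32-bit float.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)           # 4x fewer bytes per stored number
restored = quantized.astype(np.float32) * scale

# Both steps are lossy; how much accuracy survives must be checked empirically, by trial and error.
print(np.abs(weights - restored).mean())
```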
Tensor networks, designed to tease out the hidden structure in data, allow the model to be compressed and still perform well. Compressed models can even surpass the big ones in accuracy, Mandic says. That’s because big models are trained on large swaths of data from the internet, containing plenty of redundancies and irrelevances that get filtered out by the tensor network compression.
A tensor network–compressed version of OpenAI’s GPT-2 large language model performed similarly to or even better than full-size GPT-2, Mandic and colleagues reported in a paper published in 2023 at arXiv.org. And the mini model ran on a Raspberry Pi — a cheap, credit card–size computer often used for computer science education.
Multiverse has continued developing smaller models. Two released in August 2025 are named after animals that Multiverse says have similarly simple neural architecture in their brains. The company is marketing the models — SuperFly and ChickenBrain — for personal devices and appliances such as refrigerators and washing machines. For example, a clueless teenager could ask a washing machine which type of cycle to run.
Letting tensors breathe
To compress an AI model, you have to have one to start with. Creating and training that original model is itself an energy-sapping saga. Using tensor networks from the beginning could ease energy needs in that stage, too.
Training the neural network in an AI model like that of ChatGPT requires a lengthy process of optimization, tweaking parameters and checking the resulting performance, in order to find the best values for the parameters. This step typically relies on a process called gradient descent, originally devised in the 19th century. Stoudenmire likens it to looking for a plate of food in your house by wandering around hoping you can catch a whiff. “It’s not stupid, but it’s pretty basic.”
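A bare-bones version of gradient descent, the wandering-and-sniffing step described above, might look like this toy one-parameter example; the loss function, starting guess and step size are invented for illustration.

```python
# Gradient descent in miniature: repeatedly take a small step in the direction
# that lowers the loss, sniffing your way downhill.
def loss(w):
    return (w - 3.0) ** 2          # pretend the best parameter value is 3.0

def grad(w):
    return 2.0 * (w - 3.0)         # slope of the loss at w

w, step_size = 10.0, 0.1           # arbitrary starting guess and learning rate
for _ in range(100):
    w -= step_size * grad(w)       # one small downhill step per iteration
print(round(w, 4), loss(w))        # creeps toward 3.0, one sniff at a time
```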
Right now, deep neural networks are the basis for the most successful AI models out there. But some researchers are working to create an alternative that could complement that technology. Models based on tensor networks would eliminate the neural network entirely. And they could eliminate that process of optimization, that groping about the house searching for your forgotten leftovers. “We don’t want to use optimization at all,” says applied mathematician Yuehaw Khoo of the University of Chicago. “This is the main selling point of using tensor networks over deep learning architecture, the possibility of completely bypassing the use of optimization.”
To avoid the need for optimization, tensor network methods can use a “divide and conquer” strategy. Parts of the tensor network are frozen while others are adjusted to each solution.
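The following sketch shows the freeze-and-adjust idea in its simplest form, alternating least squares on a two-factor model: one factor is held fixed while the other is solved for exactly with plain linear algebra, then the roles swap. It is a generic illustration of the strategy on synthetic data, not the researchers' actual tensor network algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data with hidden low-rank structure (the kind of known structure these methods exploit).
A_true, B_true = rng.normal(size=(100, 4)), rng.normal(size=(4, 80))
data = A_true @ B_true

# Alternating least squares: freeze one factor, solve exactly for the other,
# then swap. No gradient descent anywhere.
A = rng.normal(size=(100, 4))
B = rng.normal(size=(4, 80))
for _ in range(10):
    B = np.linalg.lstsq(A, data, rcond=None)[0]         # freeze A, solve for B
    A = np.linalg.lstsq(B.T, data.T, rcond=None)[0].T   # freeze B, solve for A
print(np.abs(data - A @ B).max())                       # essentially zero after a few sweeps
```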
A related tensor network technique involves zooming in and out to help find a solution. For example, imagine that rather than roaming around searching for a plate of food, you could isolate individual floors. Maybe you sample the air of the entire first floor of the house all at once for the scent of food. If present, you zoom in, searching each room, then the different surfaces in the room.
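The zooming idea can be caricatured as a coarse-to-fine search: rule out half of the search space at a time rather than wandering step by step. The snippet below is only a cartoon of that logic, with a made-up target, not a tensor network method per se.

```python
# Coarse-to-fine search: test whole regions at once ("sniff an entire floor"),
# keep the promising half, and repeat.
def coarse_to_fine(region_contains_target, depth=20):
    lo, hi = 0.0, 1.0
    for _ in range(depth):
        mid = (lo + hi) / 2
        if region_contains_target(lo, mid):   # check the whole lower half in one go
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

target = 0.372                                         # a made-up value to hunt for
print(coarse_to_fine(lambda a, b: a <= target < b))    # homes in on 0.372 in 20 halvings
```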
In these ways, tensor networks can settle, not on the location of the food, but on values of parameters in the tensor network. The techniques mean models can be trained within seconds. In a scientific flex, Siyao Yang, an applied mathematician in Khoo’s group, demoed training a tensor network–based model in the middle of a scientific talk. It took four seconds. A similar model based on neural networks took about six minutes, nearly 100 times as long.
But the divide and conquer strategy also means that tensor networks have a limitation. They work best if the structure of the problem is well understood, in order to know how to divvy the problem up. For example, when searching for that plate of food, perhaps you know the layout of the house with its floors and rooms.
That makes the technique work best on problems that have some known structure, like those that are described by laws of physics. For example, an AI based on tensor networks can evaluate complex equations related to the properties of materials such as copper, argon and tin, researchers reported last August in Physical Review Materials.
AI based on tensor networks is useful in robotics too. In a paper published last July at arXiv.org, researchers at Idiap Research Institute in Martigny, Switzerland, used tensor networks to teach two robotic arms to manipulate a box.
Explaining the inner workings of AI
Tensor networks can also make for more understandable AI. Deep learning models, with their myriad parameters, are infamous for being black boxes, with little possibility to extract the reason behind a model’s response.
“There’s very little theoretical understanding about what’s actually happening with deep learning,” says computer scientist Rose Yu of the University of California, San Diego.
That obscurity holds neural networks back from tasks where a slipup would be disastrous. “You cannot employ a neural net to run your nuclear power plant if you don’t understand how it works,” Mandic says.
Yu has used tensor network methods to analyze information such as climate data and the shooting success of basketball players from different places on the court. The mathematically well understood tensor networks, she argues, lend themselves to results that are easier to grasp.
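As an illustration of the kind of analysis this enables, the sketch below applies a higher-order SVD (a basic Tucker-style factorization) to a hypothetical players-by-court-zones-by-games tensor of shooting data, yielding one small factor matrix per axis that can be inspected directly. The data are random stand-ins, not Yu's actual datasets or code.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical shooting data: success rates for 30 players x 14 court zones x 82 games.
shots = rng.random(size=(30, 14, 82))

def mode_factors(tensor, ranks):
    """Higher-order SVD: one small, inspectable factor matrix per axis of the data tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        U = np.linalg.svd(unfolded, full_matrices=False)[0]
        factors.append(U[:, :r])   # e.g., columns of the player factor group players with similar profiles
    return factors

player_f, zone_f, game_f = mode_factors(shots, ranks=(3, 3, 3))
print(player_f.shape, zone_f.shape, game_f.shape)   # (30, 3) (14, 3) (82, 3)
```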
“Tensors, because they’re tools that are very well understood from a theoretical perspective … may offer a new type of platform to study the behavior of deep networks, to understand the science behind deep learning,” Yu says.
Meanwhile, tech companies continue to release bigger, more complex models, trained on more data. “The current trend in AI seems to be [that] the ultimate answer to everything is just scaling,” Yu says.
But the era of improving performance simply by going bigger may be petering out. Tensor networks provide an alternative paradigm to explore. “Can we derive new insights from tensor networks that can help guide a new wave of development for AI?” Yu asks.
Neural networks still outperform tensor networks on most tasks. But perhaps, Khoo says, that’s partly due to the intense focus on neural networks over the past decade, and the relative neglect of tensor networks.
Putting more effort into tensor network research could mean we eventually get more out of them, Khoo says. “With enough tuning, I’m pretty sure tensor networks can win.”