AI as a research partner: Advancing theoretical computer science with AlphaEvolve
Summary:
The Google DeepMind research team recently reported progress in theoretical computer science. Using AlphaEvolve, its AI coding system, the team discovered new combinatorial structures that strengthen proofs of key results in computational complexity theory; the work is described in the paper "Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory".
The innovation lies in letting an AI system automatically optimize proof components within an established mathematical framework. Through an evolve-evaluate-refine loop, AlphaEvolve iteratively selects, from an initial population of code snippets, the code that generates better mathematical structures. This yielded two results: first, the NP-hardness-of-approximation bound for MAX-4-CUT was strengthened from 0.9883 to 0.987; second, for average-case hardness results on random graphs, the system constructed Ramanujan graphs on as many as 163 nodes, narrowing the gap between the upper and lower bounds on computational hardness to the third decimal place.
The team emphasizes that a core strength of the work is the guaranteed correctness of the resulting proofs. Unlike prompting a large language model to generate a proof directly, which is prone to logical errors, AlphaEvolve only generates mathematical structures that can be verified mechanically. An optimized verification algorithm sped up checking by a factor of 10,000, but every final structure was still re-confirmed with the original brute-force algorithm. This paradigm of discovering finite structures and lifting them through a universal framework opens a new path for AI-assisted research in mathematical theory.
Observers note that although this research is still at an early stage, it marks AI's transition from tool to research collaborator. With verification emerging as the key bottleneck, building more efficient automated verification pipelines will be an important topic for the field.
English source:
AI as a research partner: Advancing theoretical computer science with AlphaEvolve
September 30, 2025
Ansh Nadga, Student Researcher, and Abhradeep Thakurta, Staff Research Scientist, Google DeepMind, and Prabhakar Raghavan, Chief Technologist, Google
We invoke AlphaEvolve, an LLM-based coding agent, to find and verify combinatorial structures that improve results on the hardness of approximately solving certain optimization problems.
Recently, large language models (LLMs) have demonstrated surprising capabilities in competitive mathematics and competitive programming, achieving world-leading performance in both fields. However, their successes in mathematical discovery (proving novel theorems or uncovering new combinatorial structures) have been relatively few, with some notable exceptions [1, 2, 3]. Since mathematics and theoretical computer science demand absolute correctness, in our recent paper, "Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory", we demonstrate how an LLM-powered coding agent can help discover new mathematical structures that push the boundaries of our understanding of complexity theory (a sub-field of theoretical computer science).
Our work utilizes AlphaEvolve, a system developed at Google DeepMind that uses LLMs to iteratively evolve code. Through a feedback loop, AlphaEvolve starts with populations of code snippets, evaluates the structures those snippets produce, and uses an LLM to morph the most successful snippets toward better solutions. This approach led to new results in two distinct areas of complexity theory: 1) improving the state of the art for the limit on our ability to approximate the outcome (i.e., the "inapproximability") of the maximum cut problem with four parts (which we define as the MAX-4-CUT problem), and 2) tightening the bounds on the average-case hardness of certifying properties of random graphs.
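The evolve-evaluate-refine loop can be sketched in a few lines. Everything below is an illustrative stand-in (a toy scoring function and a random numeric mutation), not AlphaEvolve's actual implementation, in which the mutation step is an LLM rewriting whole code snippets:

```python
import random

def evolve(initial, score, mutate, generations=200, population=8, keep=4):
    """Generic evolve-evaluate-refine loop: rank candidates by score,
    keep the best, and refill the pool by mutating the survivors.
    In AlphaEvolve the mutate step is an LLM rewriting code; here it is
    an abstract callable so the loop itself stays visible."""
    pool = list(initial)
    for _ in range(generations):
        pool.sort(key=score, reverse=True)        # evaluate
        survivors = pool[:keep]                   # select
        pool = survivors + [mutate(random.choice(survivors))
                            for _ in range(population - len(survivors))]
    return max(pool, key=score)

# Toy usage: candidates are integers, the score rewards closeness to a
# target value, and mutation perturbs a survivor slightly.
random.seed(0)
best = evolve(initial=[0, 100],
              score=lambda x: -abs(x - 42),
              mutate=lambda x: x + random.randint(-5, 5))
```

Because survivors carry over between generations, the best candidate's score never decreases, so the loop steadily climbs toward better structures.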
AI-assisted mathematical research can operate in the following modes:
- A person invokes an LLM to summarize the literature, to chart a research plan towards new theorems, or to directly generate chunks of (or entire) proofs.
- A person uses AI-derived tools, such as AlphaEvolve, to generate better proof elements.
Our work falls in the second category, where we obtain better proof elements using AlphaEvolve that can be automatically verified by a computer program.
The power of lifting: From finite constructions to universal statements
A fundamental challenge in using AI for theoretical computer science research lies in the universal nature of the problems studied. An AI system might find a solution to a specific instance of a problem — say, the optimal route for a traveling salesman visiting 50 specific cities. However, computer scientists often seek theorems that hold true universally for all problem instances and sizes (denoted as ∀n).
How can we use AlphaEvolve to prove a universal statement? The answer lies in a technique known as "lifting" (see image below). If a proof is viewed as a long string, then one can take a chunk of the proof (corresponding to a certain finite structure), and evolve it to support a stronger universal statement, while keeping the interface to the rest of the proof intact. The advantage of this approach is that to certify overall correctness, one needs to only certify the correctness of the finite structure that has been evolved.
In complexity theory, researchers often use established proof frameworks that rely on the existence of specific, highly optimized finite structures. If a better structure can be found, the entire proof framework "lifts" this improvement to a better universal result.
A key example of this is a "gadget reduction." To prove that a target problem is computationally hard (intractable), researchers try to map a known intractable source problem to it, hence demonstrating that the target problem is at least as hard as the source problem. A gadget is a recipe for locally transforming a small piece of the source problem into a piece of the target problem. These gadgets are finite structures, and finding the optimal gadget is a painstaking process often done by hand.
By tasking AlphaEvolve with finding better gadgets, we were able to discover structures far more complex than those previously known. These finite discoveries, when plugged into the existing mathematical frameworks, immediately yield new universal theorems in complexity theory.
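Mechanically, a gadget reduction stamps the same finite recipe onto every constraint of the source instance, which is why improving the finite gadget improves the reduction at every input size at once. The sketch below illustrates only that stamping step; the three-node gadget in the usage example is invented for the demonstration, not one of the gadgets from the paper:

```python
from itertools import count

def apply_gadget(source_edges, gadget_edges, num_aux):
    """Stamp a fresh copy of a finite gadget onto every edge (u, v) of a
    source graph.  Gadget node 0 maps to u, node 1 maps to v, and nodes
    2..1+num_aux are auxiliary variables created anew for each copy."""
    fresh = count()                      # generator of globally unique aux ids
    target_edges = []
    for (u, v) in source_edges:
        nodes = [u, v] + [("aux", next(fresh)) for _ in range(num_aux)]
        for (a, b, w) in gadget_edges:   # gadget edges use local node indices
            target_edges.append((nodes[a], nodes[b], w))
    return target_edges

# Toy usage: a gadget with two terminals and one auxiliary node, applied to
# a triangle.  Each source edge becomes two weighted target edges.
triangle = [(0, 1), (1, 2), (2, 0)]
gadget = [(0, 2, 1.0), (1, 2, 1.0)]     # both terminals connect to the aux node
target = apply_gadget(triangle, gadget, num_aux=1)
```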
New theorems in complexity theory
We applied this methodology to the MAX-k-CUT problem. Given a graph (a network of nodes and edges), the goal is to partition the nodes into k distinct sets such that the number of edges crossing between different sets is maximized. This is a classic intractable (NP-hard) problem, meaning we do not expect to find efficient algorithms that solve it exactly. Therefore, we focused on approximation algorithms — those that efficiently find solutions guaranteed to be close to the optimum.
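For concreteness, here is a minimal brute-force formulation of MAX-k-CUT. Enumerating all k^n partitions is only feasible for tiny graphs, which is precisely why approximation algorithms matter for the general problem:

```python
from itertools import product

def cut_value(edges, parts):
    """Count the edges whose endpoints fall in different parts."""
    return sum(1 for (u, v) in edges if parts[u] != parts[v])

def max_k_cut(num_nodes, edges, k):
    """Exact MAX-k-CUT by exhausting all k**num_nodes assignments of
    nodes to parts -- exponential time, usable only for tiny graphs."""
    return max(cut_value(edges, parts)
               for parts in product(range(k), repeat=num_nodes))

# A 5-cycle: two parts can cut at most 4 of its 5 edges (an odd cycle is
# not bipartite), while three parts suffice to cut all 5.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
```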
The crucial question is: what is the limit of approximation?
MAX-4-CUT: A new state of the art
For MAX-4-CUT (partitioning into four sets), the previous best-known result proved that it is NP-hard to approximate the solution within a factor of 0.9883. AlphaEvolve was deployed to search for a new gadget reduction to MAX-4-CUT.
The system discovered an intricate gadget involving 19 variables (nodes) with a complex weighting scheme (some connections having up to 1429 times the weight of others). This discovery established a new inapproximability bound of 0.987.
This improvement may seem incremental, but in the mature field of hardness of approximation, such advances often require significant new techniques or combinatorial insights.
Average-case hardness and Ramanujan graphs
We also explored the hardness of problems on average, rather than in the worst case. Specifically, we studied the difficulty of certifying bounds on the MAX-2-CUT (as well as the maximum independent set) of sparse random graphs.
Prior work used computer assistance to find such graphs on up to 10 nodes. Improving their results requires finding more extremal Ramanujan graphs on many more nodes, which are exceedingly difficult to find and verify. AlphaEvolve successfully navigated this vast search space, discovering Ramanujan graphs with even larger cuts on as many as 163 nodes.
These discoveries significantly improved the lower bounds for average-case hardness. Furthermore, combined with new algorithmic progress (non-AI based), we were able to nearly settle the computational hardness of these questions, matching the upper and lower bounds to within the third decimal place.
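The property that makes these graphs extremal is spectral and can be checked directly: a connected d-regular graph is Ramanujan when every eigenvalue of its adjacency matrix other than the trivial eigenvalue d has absolute value at most 2*sqrt(d-1). Below is a stdlib-only sketch of that check, using classical Jacobi rotations for the eigenvalues (adequate for small graphs; the 163-node constructions from the paper are not reproduced here):

```python
import math

def adjacency(n, edges):
    a = [[0.0] * n for _ in range(n)]
    for (u, v) in edges:
        a[u][v] = a[v][u] = 1.0
    return a

def symmetric_eigenvalues(a, max_rotations=5000, tol=1e-9):
    """Eigenvalues of a symmetric matrix via classical Jacobi rotations:
    repeatedly zero the largest off-diagonal entry with a plane rotation."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_rotations):
        p, q, off = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > off:
                    p, q, off = i, j, abs(a[i][j])
        if off < tol:
            break
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):          # rotate rows p and q
            apk, aqk = a[p][k], a[q][k]
            a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
        for k in range(n):          # rotate columns p and q
            akp, akq = a[k][p], a[k][q]
            a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
    return sorted(a[i][i] for i in range(n))

def is_ramanujan(n, edges, d):
    """For a connected d-regular graph: every eigenvalue except the single
    trivial one (d itself) must have absolute value <= 2*sqrt(d - 1)."""
    eigs = symmetric_eigenvalues(adjacency(n, edges))
    nontrivial = eigs[:-1]          # drop the largest eigenvalue, d
    return all(abs(x) <= 2.0 * math.sqrt(d - 1) + 1e-6 for x in nontrivial)

# The 3-regular Petersen graph has eigenvalues 3, 1 (x5), -2 (x4), so it
# is Ramanujan: 2*sqrt(2) ~ 2.83 bounds both 1 and 2.
petersen = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
            (5, 7), (7, 9), (9, 6), (6, 8), (8, 5),
            (0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
```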
The crucial role of verified correctness
A critical distinction of this work is that the results come with proofs of correctness.
When an LLM is prompted to generate a mathematical proof directly, it often produces a proof sketch or an argument that requires substantial human intervention to verify and complete. Hallucinations or subtle errors can render the output useless. As mentioned earlier, the standard for correctness in math is absolute.
In contrast, the approach taken here uses AI to discover a structure within the proof, not the proof itself. The validity of the final theorem relies on two components: the correctness of the lifting framework, and the verification of the discovered structure. While the frameworks are sound, verifying the structures discovered by AlphaEvolve is computationally intensive.
Remarkably, AlphaEvolve achieved a 10,000x speedup in the verification process by implementing sophisticated branch-and-bound strategies and system-level optimizations. This massive speedup was the key enabler for the research, allowing the system to explore much larger and more complex gadgets.
Crucially, the final gadgets discovered were still verified using the original, brute-force algorithm, ensuring the absolute correctness of the theorems.
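The interplay described above, a fast pruned search for exploration plus a brute-force pass for the final guarantee, can be illustrated on a toy weighted MAX-CUT verifier. The bounding rule below (weight already cut plus the total weight of edges with an undecided endpoint) is a generic textbook bound, not the paper's strategy:

```python
from itertools import product

def brute_force_max_cut(n, edges):
    """Exhaustive check over all 2**n two-colourings: slow, but the kind
    of unimpeachable verifier used for the final confirmation."""
    return max(sum(w for (u, v, w) in edges if colours[u] != colours[v])
               for colours in product((0, 1), repeat=n))

def branch_and_bound_max_cut(n, edges):
    """Same answer, found faster: colour nodes one at a time and prune any
    branch whose optimistic bound cannot beat the best cut found so far."""
    best = 0.0
    colours = [None] * n

    def recurse(i):
        nonlocal best
        decided_cut = sum(w for (u, v, w) in edges
                          if colours[u] is not None and colours[v] is not None
                          and colours[u] != colours[v])
        open_weight = sum(w for (u, v, w) in edges
                          if colours[u] is None or colours[v] is None)
        if decided_cut + open_weight <= best:
            return                  # prune: even cutting every open edge loses
        if i == n:
            best = decided_cut      # the prune above guarantees an improvement
            return
        for c in (0, 1):
            colours[i] = c
            recurse(i + 1)
            colours[i] = None

    recurse(0)
    return best

# A weighted 5-cycle: any cut of a cycle crosses an even number of edges,
# so the best two-colouring leaves out only the cheapest edge (weight 1).
wcycle = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0), (3, 4, 4.0), (4, 0, 5.0)]
```

Cross-checking the pruned search against the brute-force verifier on the same instance mirrors, in miniature, how the final gadgets were re-verified.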
The future of AI-assisted theory
While these initial research findings are far from conclusive, they suggest that AI is poised to become a helpful collaborator in mathematical discovery. We have observed the models in AlphaEvolve generate intricate mathematical objects that at times exhibit nascent reasoning capabilities. However, as we transition into an era where proofs may increasingly be attributed to AI, the crucial task of verification is set to become a significant bottleneck.
Acknowledgments
We would like to thank Adam Zsolt Wagner, Swarat Chaudhuri, Pasin Manurangsi and Sushant Sachdeva for helping us during various stages of the project.
Notes
In mathematics, a statement is definitively true or false, with no intermediate state possible. This stands in contrast to several other applications of AI, such as essay-writing or artistic creation, which have subjective standards of correctness and do not need to be correct in an absolute sense.
A sparse random graph is generated by randomly adding edges between pairs of nodes, such that each node has exactly d neighbors for some small d.