Solving virtual machine puzzles: How AI is optimizing cloud computing

Published by qimuai · Compiled from the original


Source: https://research.google/blog/solving-virtual-machine-puzzles-how-ai-is-optimizing-cloud-computing/

Summary:

[Tech briefing] Google develops new AI scheduling algorithm that significantly improves cloud resource utilization

Google researchers recently unveiled LAVA, a novel AI scheduling system that continuously re-predicts virtual machine (VM) lifetimes to markedly improve resource efficiency in large cloud data centers. The system has already delivered measurable gains in production.

In cloud computing, data centers must handle VM scheduling requests several times every second, much like a never-ending game of Tetris. Because traditional scheduling algorithms cannot accurately foresee how long a VM will live, they often leave server resources idle or fragmented. LAVA, developed jointly by Google DeepMind and Google Research, breaks with convention by adopting a "continuous reprediction" mechanism: a machine learning model dynamically updates each VM's lifetime prediction as it runs, rather than relying on a single prediction made at creation time.

The system comprises three core algorithms. Non-invasive lifetime-aware scoring (NILAS), already deployed in Google's cluster manager Borg, adjusts host scores based on the repredicted exit times of running VMs. Lifetime-aware VM allocation (LAVA) introduces a strategy of mixing short- and long-lived VMs on the same host. Lifetime-aware rescheduling (LARS) reduces the number of VM migrations required during maintenance.

Notably, to ensure reliability, the team compiled the machine learning model directly into the scheduler binary, cutting median prediction latency to 9 microseconds, roughly 780 times faster than a conventional model-server setup. A host lifetime score cache further removes the prediction bottleneck in very large clusters.

Production data show that since deployment in early 2024, NILAS has increased the number of empty hosts in Google's data centers by 2.3 to 9.2 percentage points, with each percentage point roughly equivalent to 1% of a cluster's capacity. In some pilots, CPU stranding fell by about 3% and memory stranding by about 2%. Simulations suggest LAVA can add a further ~0.4 percentage points of improvement, and LARS could cut maintenance migrations by about 4.5%.

This advance marks a key step for machine learning in data center infrastructure optimization; its continuous-reprediction framework and model-system co-design offer a valuable template for future intelligent infrastructure management.


English source:

Solving virtual machine puzzles: How AI is optimizing cloud computing
October 17, 2025
Pratik Worah, Research Scientist, Google Research, and Martin Maas, Research Scientist, Google DeepMind
We present LAVA, a new scheduling algorithm that continuously re-predicts and adapts to the actual lifetimes of virtual machines to optimize resource efficiency in large cloud data centers.
Imagine a puzzle game similar to Tetris with pieces rapidly falling onto a stack. Some fit perfectly. Others don’t. The goal is to pack the blocks as tightly and efficiently as possible. This game is a loose analogy to the challenge faced by cloud data centers several times every second as they try to allocate processing jobs (called virtual machines or VMs) as efficiently as possible. But in this case, the “pieces” (or VMs) appear and disappear, some with a lifespan of only minutes, and others, days. In spite of the initially unknown VM lifespans, we still want to fill as much of the physical servers as possible with these VMs for the sake of efficiency. If only we knew the approximate lifespan of a job, we could clearly allocate much better.
At the scale of large data centers, efficient resource use is especially critical for both economic and environmental reasons. Poor VM allocation can lead to "resource stranding", where a server's remaining resources are too small or unbalanced to host new VMs, effectively wasting capacity. Poor VM allocation also reduces the number of "empty hosts", which are essential for tasks like system updates and provisioning large, resource-intensive VMs.
This classic bin packing problem is made more complex by this incomplete information about VM behavior. AI can help with this problem by using learned models to predict VM lifetimes. However, this often relies on a single prediction at the VM's creation. The challenge with this approach is that a single misprediction can tie up an entire host for an extended period, degrading efficiency.
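The cost of a single misprediction can be seen in a tiny sketch: a host is freed only when its last VM exits, so one VM that runs far past its predicted lifetime ties up the whole machine. The numbers below are purely illustrative.

```python
# Sketch: why one lifetime misprediction is costly.
# A host becomes empty only when its last VM exits, so the host's busy
# window is the maximum exit time across its VMs.

def host_free_time(vm_exit_times):
    """Hour at which the host becomes empty."""
    return max(vm_exit_times)

# Predicted: every VM on this host exits within an hour, freeing it quickly.
predicted_exits = [0.5, 0.8, 1.0]    # hours
# Actual: one "short-lived" VM turns out to run for a week.
actual_exits = [0.5, 0.8, 168.0]

print(host_free_time(predicted_exits))  # host expected free after 1 hour
print(host_free_time(actual_exits))     # one misprediction ties it up for 168 hours
```

A one-time prediction cannot recover from this; continuous reprediction, described next, can.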
In “LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions”, we introduce a trio of algorithms — non-invasive lifetime aware scoring (NILAS), lifetime-aware VM allocation (LAVA), and lifetime-aware rescheduling (LARS) — which are designed to solve the bin packing problem of efficiently fitting VMs onto physical servers. This system uses a process we call “continuous reprediction”, which means it doesn’t rely on the initial, one-time guess of a VM’s lifespan made at its creation. Instead, the model constantly and automatically updates its prediction for a VM's expected remaining lifetime as the VM continues to run.
The secret life of VMs: Repredictions and probability distributions
One of the key insights driving this research is the recognition that VM lifetimes are often unpredictable and follow a long-tailed distribution. For example, while the vast majority of VMs (88%) live for less than an hour, these short-lived VMs consume only a tiny fraction (2%) of the total resources. This means that the placement of a small number of long-lived VMs has a disproportionately large impact on overall resource efficiency.
Instead of trying to predict a single, average lifetime, which can be misleading for VMs with bi-modal or highly varied lifespans, we designed an ML model that predicts a probability distribution for a VM's lifetime. This approach, inspired by survival analysis, allows the model to capture the inherent uncertainty of a VM's behavior.
More importantly, our system uses this distribution to continuously update its predictions. We ask, “Given a VM has been running for five days, what is its expected remaining lifetime?” As a VM continues to run, the system gains more information, and its lifetime prediction becomes more accurate. Our algorithms are specifically co-designed to leverage these repredictions, actively responding to mispredictions and improving the accuracy over time.
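The conditional question above ("given a VM has run this long, how much longer will it run?") can be sketched with a toy discrete lifetime distribution. The probabilities below are hypothetical, chosen only to mimic the long tail described in the text; the production model predicts a full learned distribution.

```python
# Sketch: expected remaining lifetime of a VM, conditioned on its uptime.
# Hypothetical long-tailed distribution: P(lifetime == t hours).
lifetime_dist = {1: 0.88, 24: 0.07, 168: 0.04, 720: 0.01}

def expected_remaining_lifetime(uptime_hours: float) -> float:
    """E[lifetime - uptime | lifetime > uptime] under the discrete distribution."""
    surviving = {t: p for t, p in lifetime_dist.items() if t > uptime_hours}
    total = sum(surviving.values())
    if total == 0:
        return 0.0  # the VM has outlived every modeled lifetime
    return sum((t - uptime_hours) * p / total for t, p in surviving.items())

# A freshly created VM is most likely short-lived, so the expectation is small...
print(expected_remaining_lifetime(0))
# ...but a VM that has already run five days is almost certainly long-lived,
# so its expected remaining lifetime is far larger.
print(expected_remaining_lifetime(120))
```

This is the survival-analysis intuition in miniature: surviving longer shifts probability mass toward the long-lived modes, so the reprediction grows rather than shrinks with uptime.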
A new class of scheduling algorithms
With this new, more robust prediction model, we developed three novel algorithms to improve VM allocation.

  1. Non-Invasive Lifetime Aware Scheduling (NILAS)
    NILAS is a non-invasive algorithm that incorporates lifetime predictions into an existing scoring function. It ranks potential hosts for a new VM by considering the repredicted exit times of all existing VMs on that host. By prioritizing hosts where all VMs are expected to exit at a similar time, NILAS aims to create more empty machines. Our use of repredictions is less sensitive to prediction accuracy and allows NILAS to correct for errors. The NILAS algorithm has been deployed on our large-scale cluster manager, Borg, where it significantly improves VM allocation.
  2. Lifetime-Aware VM Allocation (LAVA)
    LAVA is a more fundamental departure from existing scheduling mechanisms. While NILAS aims to pack VMs with similar lifetimes, LAVA does the opposite: it puts shorter-lived VMs on hosts with one or more long-lived VMs. The goal is to fill in resource gaps with short-lived VMs that are at least an order of magnitude shorter than the host’s anticipated lifespan, so that they exit quickly without extending the host’s overall lifespan. LAVA also actively adapts to mispredictions by increasing a host’s anticipated lifespan if a VM outlives its expected deadline. Simulations show that this strategy minimizes fragmentation and ensures that hosts are eventually freed up.
  3. Lifetime-Aware Rescheduling (LARS)
    LARS uses our lifetime predictions to minimize VM disruptions during defragmentation and maintenance. When a host needs to be defragmented, LARS sorts the VMs on that host by their predicted remaining lifetime and migrates the longest-lived VMs first. Shorter-lived VMs exit naturally before migration. Simulations with LARS indicate it has the potential to reduce the total number of migrations required by around 4.5%.
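A minimal sketch of NILAS-style host ranking follows. It scores a candidate host by how tightly the repredicted exit times of its VMs (plus the new VM) cluster, preferring hosts likely to empty out all at once. This is a simplification under assumed interfaces; Borg's production scoring function has many more inputs.

```python
# Sketch of NILAS-style scoring (simplified, hypothetical names and numbers).
# Lower score is better: the spread between the earliest and latest predicted
# exits on the host after placing the new VM.

def host_score(existing_exit_predictions, new_vm_exit_prediction):
    exits = existing_exit_predictions + [new_vm_exit_prediction]
    return max(exits) - min(exits)

def pick_host(hosts, new_vm_exit_prediction):
    """hosts: host name -> repredicted exit times (hours) of its current VMs."""
    return min(hosts, key=lambda h: host_score(hosts[h], new_vm_exit_prediction))

hosts = {
    "host-a": [2.0, 3.0],      # VMs expected to exit soon
    "host-b": [100.0, 120.0],  # long-lived VMs
}
# A VM predicted to exit around hour 110 clusters best with other long-lived VMs.
print(pick_host(hosts, 110.0))
```

Because the exit times are repredicted rather than fixed at creation, a mispredicted VM simply shifts future scores instead of permanently corrupting the packing decision.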
Addressing the challenge of deployment at scale
Developing powerful models and algorithms is only one part of the solution. Getting them to work reliably at large scale required us to rethink our approach to model deployment.
A common practice is to serve machine learning models on dedicated inference servers. However, this would have created a circular dependency, as these servers would themselves run on our cluster scheduling system. A failure in the model serving layer could then cause a cascading failure in the scheduler itself, which is unacceptable for a mission-critical system.
Our solution was to compile the model directly into the Borg scheduler binary. This approach eliminated the circular dependency and ensured that the model was tested and rolled out with the same rigorous process as any other code change to the scheduler. This also yielded an additional benefit: the model's median latency is just 9 microseconds (µs), which is 780 times faster than a comparable approach that uses separate model servers. This low latency is crucial for running repredictions frequently and for using the model in performance-sensitive tasks, like maintenance and defragmentation.
We also found that for our largest zones, the number of required predictions could become a bottleneck. We addressed this by introducing a host lifetime score cache, which only updates predictions when a VM is added or removed from a host, or when a host's expected lifetime expires. This caching mechanism ensures high performance and allows us to deploy our system fleet-wide.
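The caching policy just described can be sketched as follows. The interfaces are hypothetical; the point is the two invalidation triggers: a VM being added or removed, and the host's expected lifetime expiring.

```python
# Sketch of a host lifetime score cache: the expensive model call runs only
# when a VM is added/removed or the host's expected lifetime has expired.

class HostScoreCache:
    def __init__(self, predict_fn):
        # predict_fn(host, vms) -> (score, expected_free_time); stands in for
        # the (expensive) compiled-in model call.
        self._predict = predict_fn
        self._cache = {}  # host -> (score, expiry_time)

    def score(self, host, vms, now):
        entry = self._cache.get(host)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: no model call
        score, expected_free_time = self._predict(host, vms)
        self._cache[host] = (score, expected_free_time)
        return score

    def invalidate(self, host):
        """Call when a VM is added to or removed from the host."""
        self._cache.pop(host, None)
```

With this policy, steady-state scoring of a large zone costs one dictionary lookup per host, and predictions are refreshed exactly when the host's composition or lifetime estimate changes.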
Results
Our NILAS algorithm has been running in Google's production data centers since early 2024. The results are clear and significant.
• Increased empty hosts: Our production pilots and fleet-wide rollouts have shown an increase in empty hosts by 2.3–9.2 percentage points (pp). This metric directly correlates with efficiency, as a 1 pp improvement is typically equivalent to saving 1% of a cluster's capacity.
• Reduced resource stranding: In some pilot experiments, NILAS reduced CPU stranding by approximately 3% and memory stranding by 2%. This means more of a host's resources are available to be used by new VMs.
Simulations running LAVA suggest it will provide a further ~0.4 pp improvement over NILAS. Similarly, simulations with LARS indicate that it has the potential to reduce the number of VM live migrations needed for maintenance by 4.5%.
Conclusion
We believe this work is a foundational step towards a future where data center management is increasingly optimized by machine learning systems. The techniques we developed, particularly the use of repredictions and the co-design of models and systems, are generalizable to other tasks. We have demonstrated that it is possible to integrate advanced machine learning techniques into the lowest layers of a system’s infrastructure stack without sacrificing reliability or latency, while still delivering significant efficiency gains.
Acknowledgements
LAVA is a large collaborative project that spanned multiple teams across Google, including Google Cloud, Google DeepMind, Google Research, and SystemsResearch@Google. Key contributors include Jianheng Ling, Pratik Worah, Yawen Wang, Yunchuan Kong, Anshul Kapoor, Chunlei Wang, Clifford Stein, Diwakar Gupta, Jason Behmer, Logan A. Bush, Prakash Ramanan, Rajesh Kumar, Thomas Chestna, Yajing Liu, Ying Liu, Ye Zhao, Kathryn S. McKinley, Meeyoung Park, and Martin Maas.
