Scheduling in a changing world: Maximizing throughput with time-varying capacity

Summary:
Google Research team proposes new scheduling algorithms to tackle dynamic capacity in cloud infrastructure
On February 11, 2026, Google Research scientist Manish Purohit and his collaborators published work that, in response to the constantly fluctuating machine availability of cloud environments, introduces a set of new, provably effective algorithms for scheduling jobs that cannot be interrupted. The research targets a core challenge: maximizing the throughput of batch jobs under dynamic resources.
Traditional scheduling models treat computing resources as static, but modern large-scale cloud computing is far more complex. Hardware failures, maintenance cycles, power constraints, and above all high-priority tasks claiming resources on demand leave a time-varying amount of "leftover" capacity for lower-priority batch jobs. Such batch jobs are often non-interruptible: if one is aborted because capacity drops, all progress is lost. Scheduling decisions are therefore critical: risk starting a long job immediately, or wait for a safer window at the risk of missing the deadline?
In "Non-preemptive Throughput Maximization under Time-varying Capacity", published at SPAA 2025, the team initiates the systematic study of maximizing job throughput (the total weight or number of successful jobs) in environments with fluctuating capacity. The work provides the first constant-factor approximation algorithms for several variants of the problem, laying a theoretical foundation for more robust schedulers in volatile cloud environments.
The scheduling model centers on four key job attributes: release time, deadline, processing time, and weight. The goal is to select a subset of jobs and run each one continuously, never exceeding the capacity available at any moment, so as to maximize the total weight of completed jobs.
The study covers two settings, offline and online:
- Offline (future information known): even though finding the optimal solution is NP-hard, simple greedy strategies perform well. For unit-weight jobs, a greedy algorithm that picks the job finishing earliest achieves a 1/2-approximation (it completes at least half as many jobs as the optimum). For jobs with different weights, a primal-dual framework yields a 1/4-approximation.
- Online (jobs arrive in real time, the future is unknown): standard non-preemptive algorithms have a competitive ratio approaching zero here, since a single mis-scheduled long job can block many subsequent short jobs. To make the problem tractable, the team studied two models that allow interruptions:
  - Interruption with restarts: an interrupted job may be retried. A variant of the earliest-finishing-job greedy still achieves a 1/2 competitive ratio.
  - Interruption without restarts: an interrupted job is discarded forever. In this strict model, the competitive ratio of any online algorithm can still approach zero. For the practically common scenario of a shared deadline (e.g., all data processing must finish before the nightly batch window), the team designed a novel constant-competitive algorithm. In the simple unit-capacity setting, the algorithm maintains a tentative schedule and, for each newly arriving job, takes the first applicable of four intuitive actions: insert it into an idle slot, replace a future job, interrupt the currently executing job, or discard the new job. Generalized to arbitrary capacity profiles, the algorithm achieves a 1/11 competitive ratio. Although this guarantees only about 9% of the optimal number of jobs in the worst case, it is the first constant competitive ratio in this strict model.
As cloud infrastructure grows ever more dynamic, static-capacity assumptions no longer hold. This research initiates the formal study of throughput maximization under time-varying capacity, bridging theoretical scheduling models and the fluctuating reality of data centers. Solid constant-factor results notwithstanding, the gap between the 1/11 competitive ratio in the online setting and the theoretical upper bound of 1/2 leaves room for improvement. Exploring randomized algorithms, or settings with imperfect knowledge of future capacity, may yield better solutions for practical applications.
This research was carried out by Manish Purohit, Zoya Svitkina, Erik Vee, and Joshua Wang of Google Research, together with Aniket Murhekar of the University of Illinois Urbana-Champaign.
English source:
Scheduling in a changing world: Maximizing throughput with time-varying capacity
February 11, 2026
Manish Purohit, Research Scientist, Google Research
We introduce new, provably effective algorithms for scheduling jobs without interruptions on cloud infrastructure when machine availability constantly fluctuates.
In the world of algorithmic job scheduling, computing resources are often viewed as static: a server has a fixed number of CPUs, or a cluster has a constant number of available machines. However, the reality of modern large-scale cloud computing is far more dynamic. Resources fluctuate constantly due to hardware failure, maintenance cycles, or power limitations.
More significantly, in tiered scheduling systems, high-priority tasks often claim resources on demand, leaving a time-varying amount of “leftover” capacity for lower-priority batch jobs. Imagine a restaurant where tables are reserved for VIPs at different times; scheduling regular customers on the remaining tables can become a complex puzzle.
When these low-priority jobs are non-preemptive — meaning they cannot be paused and resumed later — the stakes are high. If a job is interrupted because capacity drops, all progress is lost. The scheduler must decide: Do we start this long job now, risking a future capacity drop? Or do we wait for a safer window, potentially missing the deadline?
In “Non-preemptive Throughput Maximization under Time-varying Capacity”, presented at SPAA 2025, we initiate the study of maximizing throughput (total weight or number of successful jobs) in environments where the available capacity fluctuates over time. Our research provides the first constant-factor approximation algorithms for several variants of this problem (i.e., the gap between the algorithm's answer and the optimal answer is guaranteed to be a fixed, stable factor, regardless of how large the problem gets), offering a theoretical foundation for building more robust schedulers in volatile cloud environments.
Defining the scheduling problem
Our work focuses on designing a scheduling model that captures a number of key constraints. We consider a single machine (or cluster) with a capacity profile that varies over time. This profile dictates the maximum number of jobs that can run in parallel at any given moment. We are given a collection of potential jobs, each characterized by four key attributes:
- Release time: When the job becomes available to run
- Deadline: A hard deadline by which the job must finish
- Processing time: The duration for which the machine must work on the job
- Weight: The value gained if the job is successfully completed
The goal is to select a subset of jobs and schedule them so that each selected job runs continuously within its valid window. The critical constraint is that, at any time, the total number of running jobs must not exceed the current capacity. Our objective is to maximize the throughput, i.e., the total weight of all completed jobs.
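To make the model concrete, here is a minimal sketch in Python (the post itself contains no code). It assumes discrete time steps and a capacity profile given as a list whose length covers all deadlines; all names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    release: int   # earliest time the job may start
    deadline: int  # hard time by which it must finish
    length: int    # processing time; the job must run continuously
    weight: int    # value earned if the job completes

def is_feasible(schedule: dict[Job, int], capacity: list[int]) -> bool:
    """Check that each job runs inside [release, deadline] and that the
    number of concurrently running jobs never exceeds the capacity profile.
    `schedule` maps each selected job to its start time."""
    load = [0] * len(capacity)
    for job, start in schedule.items():
        if start < job.release or start + job.length > job.deadline:
            return False
        for t in range(start, start + job.length):
            load[t] += 1
    return all(load[t] <= capacity[t] for t in range(len(capacity)))

def throughput(schedule: dict[Job, int]) -> int:
    """Objective value: total weight of the completed (scheduled) jobs."""
    return sum(job.weight for job in schedule)
```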
We address this problem in two distinct environments:
- Offline: future job arrivals and capacity changes are known in advance.
- Online: jobs arrive in real time, and the scheduler must make irrevocable decisions without knowledge of future arrivals. The capacity profile is still known in advance.
Results for the offline setting
In the offline setting, where we can plan ahead, we find that simple strategies perform surprisingly well. Because finding the optimal schedule in this setting is NP-hard (i.e., there is no known shortcut to find the perfect answer), we focus on algorithms with rigorous approximation guarantees. We analyze a myopic strategy, called Greedy, that iteratively schedules the job that would finish earliest, and prove that this simple heuristic achieves a 1/2-approximation (in this context, often as good as it gets) when jobs have unit weights. This means that even in the worst case, with adversarially chosen jobs and capacity profiles, our simple algorithm is guaranteed to schedule at least half of the optimal number of jobs. This matches the guarantee enjoyed by Greedy on simpler, unit-capacity machines that can only run one job at a time. When jobs can have different weights, we use a primal-dual framework to achieve a 1/4-approximation.
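The greedy rule can be sketched directly on the model above. The following is a minimal, unoptimized rendering of the earliest-finish-first idea, not the paper's exact pseudocode; it reuses the Job type and conventions from the earlier sketch.

```python
def earliest_finish(job: Job, load: list[int], capacity: list[int]) -> int | None:
    """Earliest completion time of `job` given the load committed so far,
    or None if no feasible start time remains."""
    for start in range(job.release, job.deadline - job.length + 1):
        if all(load[t] < capacity[t] for t in range(start, start + job.length)):
            return start + job.length
    return None

def greedy_unit_weight(jobs: list[Job], capacity: list[int]) -> dict[Job, int]:
    """Iteratively commit to the job that would finish earliest; for
    unit-weight jobs this achieves a 1/2-approximation."""
    load, schedule, remaining = [0] * len(capacity), {}, list(jobs)
    while True:
        options = [(f, j) for j in remaining
                   if (f := earliest_finish(j, load, capacity)) is not None]
        if not options:
            return schedule
        finish, job = min(options, key=lambda p: p[0])
        schedule[job] = finish - job.length
        for t in range(finish - job.length, finish):
            load[t] += 1  # reserve capacity for the committed job
        remaining.remove(job)
```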
Results for the online setting
The real complexity lies in the online setting, where jobs arrive dynamically and the scheduler must make immediate, irrevocable decisions without knowing what jobs will arrive next. We quantify the performance of an online algorithm via its competitive ratio: the worst-case ratio between the throughput of our online algorithm and the throughput of an optimal algorithm that knows all the jobs a priori.
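In symbols, writing ALG(I) for the online algorithm's throughput on an instance I and OPT(I) for the clairvoyant optimum, this is the standard worst-case ratio (a formalization of the sentence above):

```latex
\mathrm{CR}(\mathrm{ALG}) \;=\; \inf_{I} \frac{\mathrm{ALG}(I)}{\mathrm{OPT}(I)}
```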
Standard non-preemptive algorithms fail completely here: their competitive ratio approaches zero. A single bad decision to schedule a long job can ruin the possibility of scheduling many future shorter jobs. If each completed job brings equal weight regardless of its length, completing many short jobs is far more profitable than completing one long job.
To make the online problem solvable and reflect real-world flexibility, we studied two models that allow an active job to be interrupted if a better opportunity arises (though only jobs restarted and later completed non-preemptively count as successful).
Interruption with restarts
In this model, an online algorithm is allowed to interrupt a currently executing job. While the partial work already performed on the interrupted job is lost, the job itself remains in the system and can be retried.
We found that the flexibility provided by allowing job restarts is highly beneficial. A variant of Greedy that iteratively schedules the job that finishes earliest continues to achieve a 1/2-competitive ratio, matching the result in the offline setting.
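A minimal simulation of this variant for a unit-capacity profile is sketched below, again reusing the Job type from the earlier sketch. It assumes integer time and a `horizon` exceeding every deadline; at each step it runs the released job that would finish earliest, interrupting the current job whenever a newcomer would finish strictly sooner.

```python
def greedy_with_restarts(jobs: list[Job], horizon: int) -> list[Job]:
    """Unit-capacity sketch of the interruption-with-restarts model.
    An interrupted job loses its partial work but returns to the pool,
    so it may be retried later from scratch."""
    pool, completed = list(jobs), []
    running, done_at = None, None
    for t in range(horizon):
        if running is not None and done_at == t:
            completed.append(running)  # finished without interruption
            running, done_at = None, None
        viable = [j for j in pool if j.release <= t and t + j.length <= j.deadline]
        best = min(viable, key=lambda j: j.length, default=None)
        if best is not None and (running is None or t + best.length < done_at):
            if running is not None:
                pool.append(running)   # restart allowed: keep the job around
            running, done_at = best, t + best.length
            pool.remove(best)
    return completed
```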
Interruption without restarts
In this stricter model, all work performed on the interrupted job is lost and the job itself is discarded forever. Unfortunately, we find that any online algorithm in this strict model can encounter a sequence of jobs that forces it into decisions preventing it from completing much more work in the future; once again, the competitive ratio of every online algorithm approaches zero. Analyzing these hard instances led us to focus on the practical scenario where all jobs share a common deadline (e.g., all data processing must finish by the nightly batch run). For such common-deadline instances, we devise novel constant-competitive algorithms. Our algorithm is intuitive, and we describe it here for the simple setting of a unit capacity profile, i.e., where we can schedule a single job at any time.
In this setting, our algorithm maintains a tentative schedule by assigning the jobs that have already arrived to disjoint time intervals. When a new job arrives, the algorithm modifies the tentative schedule by taking the first applicable of the following four actions (a code sketch follows the list):
- Add the job to the tentative schedule by placing it in an empty interval
- Replace another future job from the tentative schedule if the new job is significantly smaller
- Interrupt the currently executing job if the new job is smaller than the remaining time of the executing job
- Discard the newly arrived job
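The update rule can be sketched as follows for the unit-capacity, common-deadline case, reusing the Job type from the earlier sketch. The threshold `alpha` is a hypothetical stand-in for the paper's notion of "significantly smaller", whose exact value the post does not give; the helper names are illustrative, not the paper's.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    queue: list[Job] = field(default_factory=list)  # tentative future jobs
    running: Job | None = None                      # currently executing job
    started_at: int = 0                             # when `running` began

def on_arrival(new: Job, s: State, now: int, deadline: int,
               alpha: float = 2.0) -> str:
    """Apply the first applicable of the four actions to the new job."""
    run_left = (s.started_at + s.running.length - now) if s.running else 0
    committed = run_left + sum(j.length for j in s.queue)
    # 1) place the job in an empty interval if the schedule has room
    if now + committed + new.length <= deadline:
        s.queue.append(new)
        return "insert"
    # 2) replace a sufficiently larger future job from the tentative schedule
    bigger = [j for j in s.queue if j.length >= alpha * new.length]
    if bigger:
        s.queue.remove(max(bigger, key=lambda j: j.length))
        s.queue.append(new)
        return "replace"
    # 3) interrupt the running job (no restarts: the old job is lost forever)
    if s.running is not None and new.length < run_left:
        s.running, s.started_at = new, now
        return "interrupt"
    return "discard"  # 4) give up on the newly arrived job
```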
Our main result shows that a generalization of this algorithm to arbitrary capacity profiles gives the first constant competitive ratio for this problem. Formally, we obtain a competitive ratio of 1/11. While guaranteeing to schedule only ~9% of the optimal number of jobs sounds like a weak guarantee, this is a worst-case guarantee that holds even in the most adversarial situations.
Conclusion and future directions
As cloud infrastructure becomes more dynamic, the assumption of static capacity in scheduling algorithms no longer holds. This paper initiates the formal study of throughput maximization under time-varying capacity, bridging the gap between theoretical scheduling models and the fluctuating reality of data centers.
While we establish strong constant factor approximations, there is still room for growth. The gap between our 1/11 competitive ratio for the online setting and the theoretical upper bound of 1/2 suggests that more efficient algorithms may exist. Exploring randomized algorithms or scenarios with imperfect knowledge of future capacity could yield even better results for real-world applications.
Acknowledgments
This represents joint work with Aniket Murhekar (University of Illinois Urbana-Champaign), Zoya Svitkina (Google Research), Erik Vee (Google Research), and Joshua Wang (Google Research).
Article title: Scheduling in a changing world: Maximizing throughput with time-varying capacity
Article link: https://www.qimuai.cn/?post=3278