报告一
报告题目:A Generic approach to scheduling and checkpointing workflows
报 告 人:Frédéric Vivien professor
主 持 人:刘静 教授
报告时间:2018年6月20日 周三 13:30-15:00
报告地点:中北校区理科楼B1102
报告人简介:
Frédéric Vivien is a Professor at INRIA and leads the INRIA project-team ROMA which focuses on scheduling strategies and algorithm design for large-scale platforms. His main research interests are scheduling techniques and parallel algorithms for distributed and/or heterogeneous systems.
报告摘要:
This work deals with scheduling and checkpointing strategies to execute scientific workflows on failure-prone large-scale platforms. To the best of our knowledge, this work is the first to target fail-stop errors for arbitrary workflows. Most previous work addresses soft errors, which corrupt the task being executed by a processor but do not cause the entire memory of that processor to be lost, contrarily to fail-stop errors. We revisit classical mapping heuristics such as HEFT and MinMin and complement them with several checkpointing strategies. The objective is to derive an efficient trade-off between checkpointing every task (CkptAll), which is an overkill when failures are rare events, and checkpointing no task (CkptNone), which induces dramatic re-execution overhead even when only a few failures strike during execution. Contrarily to previous work, our approach applies to arbitrary workflows, not just special classes of dependence graphs such as M-SPGs (Minimal Series-Parallel Graphs). Extensive experiments report significant gain over both CkptAll and CkptNone, for a wide variety of workflows.
报告二
报告题目:Online Scheduling of Task Graphs on Hybrid Platforms
报 告 人:Louis-Claude Canon Scientific Researcher
主 持 人:刘静 教授
报告时间: 2018年6月20日 15:00-16:30
报告地点:中北校区理科楼B1102
报告人简介:
Louis-Claude Canon is an Research Fellow in the FEMTO-ST laboratory at Université Franche Comté in Besançon. He is on leave at ENS Lyon from 1996 to 1998. His research interests are scheduling, stochastic optimization, and reproducible research.
报告摘要:
Modern computing platforms commonly include accelerators. We target the problem of scheduling applications modeled as task graphs on hybrid platforms made of two types of resources, such as CPUs and GPUs. We consider that task graphs are uncovered dynamically, and that the scheduler has information only on the available tasks, i.e., tasks whose predecessors have all been completed. Each task can be processed by either a CPU or a GPU, and the corresponding processing times are known. Our study extends a previous 4 m/k-competitive online algorithm, where m is the number of CPUs and k the number of GPUs (m ≥ k). We prove p that no online algorithm can have a competitive ratio smaller than m/k. We also study how adding flexibility on task processing, such as task migration or spoliation, or increasing the knowledge of the scheduler by providing it with information on the task graph, influences the lower bound. We provide a (2 m/k + 1)-competitive algorithm as well as a tunable combination of a system-oriented heuristic and a competitive algorithm; this combination performs well in practice and has a competitive ratio in Θ( m/k). Finally, simulations on different sets of task graphs illustrate how the instance properties impact the performance of the studied algorithms and show that our proposed tunable algorithm performs the best among the online algorithms in almost all cases and has even performance close to an offline algorithm.