<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Batch Processing on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/tags/%E6%89%B9%E5%A4%84%E7%90%86/</link><description>Recent content in Batch Processing on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Sun, 12 Apr 2026 11:00:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/tags/%E6%89%B9%E5%A4%84%E7%90%86/index.xml" rel="self" type="application/rss+xml"/><item><title>Argo Workflows in Practice: Batch Processing and ML Pipelines</title><link>https://socake.github.io/posts/argo-workflows-practice/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/argo-workflows-practice/</guid><description>Argo Workflows is a Kubernetes-native workflow engine suited to batch processing and ML pipeline workloads. This post covers how it compares with Airflow and Temporal when choosing a tool, its core resource model, three end-to-end examples (DAG-based data processing, an ML training pipeline, and scheduled backups), resource control (Semaphore/Node Selector), event-driven triggering with Argo Events, and Prometheus monitoring plus troubleshooting of common issues.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/argo-workflows-practice/featured.jpg"/></item><item><title>Volcano Batch Scheduling in Practice: Gang Scheduling, Queues, and Preemption for AI Training Clusters</title><link>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</link><pubDate>Wed, 25 Mar 2026 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</guid><description>The default K8s scheduler handles AI training workloads poorly. Volcano brings HPC scheduling concepts into K8s: Gang Scheduling, Queue, Fairshare, Preemption, and topology affinity. This post explains how to put it into production on an AI training cluster.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/volcano-gpu-batch-scheduling/featured.jpg"/></item><item><title>Kueue Batch Scheduling in Practice: Making Kubernetes Truly Handle AI/HPC Workloads</title><link>https://socake.github.io/posts/kueue-batch-workload/</link><pubDate>Sat, 15 Mar 2025 09:40:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kueue-batch-workload/</guid><description>Put AI training jobs on Kubernetes and on day one you will find the native scheduler falls far short: no queues, no quota, no gang scheduling, no fair sharing, and muddled preemption semantics. Kueue is the official answer from sig-scheduling. It is closer to native Kubernetes than Volcano and more mature than a home-grown controller. These are real production notes.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kueue-batch-workload/featured.jpg"/></item></channel></rss>