<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GPU on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/tags/gpu/</link><description>Recent content in GPU on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Mon, 30 Mar 2026 09:08:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>Running Large Models with Ollama on K8s: Operating Local LLMs in Practice</title><link>https://socake.github.io/posts/ollama-kubernetes-llm/</link><pubDate>Mon, 30 Mar 2026 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ollama-kubernetes-llm/</guid><description>Deploying Ollama on Kubernetes to run local large models: from GPU scheduling, to falling back to CPU inference, to practical integration into operations scenarios, this post is a complete record of the pitfalls hit and the practices adopted.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ollama-kubernetes-llm/featured.jpg"/></item><item><title>Volcano Batch Scheduling in Practice: Gang Scheduling, Queues, and Preemption for AI Training Clusters</title><link>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</link><pubDate>Wed, 25 Mar 2026 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</guid><description>The default K8s scheduler is a poor fit for AI training. Volcano brings HPC scheduling concepts into K8s: Gang Scheduling, Queue, Fairshare, Preemption, and topology affinity. This post explains how to put it to work in an AI training cluster.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/volcano-gpu-batch-scheduling/featured.jpg"/></item><item><title>Serving LLMs in Production: vLLM Deployment and GPU Inference Optimization in Practice</title><link>https://socake.github.io/posts/llm-production-serving-vllm/</link><pubDate>Tue, 13 Jan 2026 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-production-serving-vllm/</guid><description>After the team moved Ollama into production, requests queued for more than 30 seconds at peak times and users reported that the AI features were unusable. This post documents our full migration to vLLM, including how PagedAttention and Continuous Batching work and the complete configuration for GPU deployment on Kubernetes.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-production-serving-vllm/featured.jpg"/></item><item><title>Kubernetes GPU Scheduling in Practice: Infrastructure for AI Training and Inference</title><link>https://socake.github.io/posts/kubernetes-gpu-scheduling/</link><pubDate>Wed, 05 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gpu-scheduling/</guid><description>GPUs are the core resource of AI infrastructure, and how efficiently they are scheduled and managed on Kubernetes directly affects training efficiency and inference cost. From low-level driver installation to high-level scheduling policy, this post covers building, monitoring, and optimizing K8s GPU infrastructure.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gpu-scheduling/featured.jpg"/></item></channel></rss>