Triton on 黄文卓 | DevOps Engineer

https://socake.github.io/tags/triton/
Recent content in Triton on 黄文卓 | DevOps Engineer
Author: Wenzhuo Huang (17691281867@163.com)
© 2026 Wenzhuo Huang
Last updated: Sun, 22 Mar 2026 09:15:00 +0800

Efficient Fine-Tuning with Unsloth in Practice: Peak Single-GPU QLoRA Performance and Internals
https://socake.github.io/posts/unsloth-efficient-finetuning/
Sun, 22 Mar 2026 09:15:00 +0800
Unsloth uses hand-written Triton kernels to push single-GPU LoRA fine-tuning speed and memory usage to the limit. This post explains how Unsloth works, how to combine it with LLaMA Factory and TRL, and the pitfalls encountered in real-world use.

Triton Inference Server in Production: Model Orchestration, Dynamic Batching, and Mixed Multi-Framework Deployment
https://socake.github.io/posts/triton-inference-server-production/
Wed, 11 Mar 2026 10:00:00 +0800
A clear walkthrough of Triton, NVIDIA's inference server, for readers new to it: model repository, backends, dynamic batching, ensembles, BLS, the Python backend, production monitoring, and a record of pitfalls encountered.