<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>CUDA on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/tags/cuda/</link><description>Recent content in CUDA on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Sat, 07 Mar 2026 14:20:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/tags/cuda/index.xml" rel="self" type="application/rss+xml"/><item><title>TensorRT-LLM Inference Acceleration in Practice: From Engine Compilation to Kernel Tuning</title><link>https://socake.github.io/posts/tensorrt-llm-inference/</link><pubDate>Sat, 07 Mar 2026 14:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tensorrt-llm-inference/</guid><description>TensorRT-LLM is a key part of NVIDIA's end-to-end inference stack. This post walks through the engine compilation flow, the plugin mechanism, quantization strategies, in-flight batching, kernel tuning, and the pitfalls encountered in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tensorrt-llm-inference/featured.jpg"/></item></channel></rss>