<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kubernetes on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/categories/kubernetes/</link><description>Recent content in Kubernetes on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Sun, 12 Apr 2026 11:00:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/categories/kubernetes/index.xml" rel="self" type="application/rss+xml"/><item><title>Kubernetes cgroup v2 迁移实践</title><link>https://socake.github.io/posts/kubernetes-cgroup-v2-migration/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-cgroup-v2-migration/</guid><description>K8s 1.25+ 默认启用 cgroup v2，MemoryQoS 和 PSI 等新特性只在 v2 支持。本文给出完整的节点迁移操作流程和常见问题解决方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-cgroup-v2-migration/featured.jpg"/></item><item><title>Kubernetes v1.33 新特性深度解读：GA 特性全览与升级指南</title><link>https://socake.github.io/posts/kubernetes-v133-features/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-v133-features/</guid><description>Kubernetes v1.33 带来了多项重量级 GA 特性，本文深入解读 In-Place Pod Vertical Scaling、原生 Sidecar Containers、Pod Scheduling Readiness、KMS v2 加密等核心变更，并提供实际可用的配置示例和生产升级建议。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-v133-features/featured.jpg"/></item><item><title>从 Ingress 迁移到 Gateway 
API：完整实操指南</title><link>https://socake.github.io/posts/ingress-to-gateway-api-migration/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ingress-to-gateway-api-migration/</guid><description>Gateway API 是 Kubernetes 官方下一代流量入口标准，解决了 Ingress 注解泛滥、跨实现不可移植等历史遗留问题。本文带你从零完成生产迁移。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ingress-to-gateway-api-migration/featured.jpg"/></item><item><title>故障排查实录：Terway CRD IPAM IP 泄漏导致 Pod 无法调度</title><link>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/</link><pubDate>Tue, 07 Apr 2026 09:54:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/</guid><description>一次真实的连锁故障：节点磁盘告警 → Pod 被驱逐 → Terway IPAM IP 未正常回收 → 节点 ENI IP 耗尽 → 新 Pod 无法调度。排查链路、根因分析与修复方案完整记录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/featured.jpg"/></item><item><title>云原生存储方案选型：EFS/EBS/OSS 实践</title><link>https://socake.github.io/docs/kubernetes/%E4%BA%91%E5%8E%9F%E7%94%9F%E5%AD%98%E5%82%A8%E6%96%B9%E6%A1%88/</link><pubDate>Tue, 09 Dec 2025 17:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/%E4%BA%91%E5%8E%9F%E7%94%9F%E5%AD%98%E5%82%A8%E6%96%B9%E6%A1%88/</guid><description>系统梳理 AWS EBS、EFS、S3 在 Kubernetes 中的使用方式，覆盖 StorageClass 配置、动态供给、性能测试与数据备份策略，附阿里云 NAS/OSS 对比。</description></item><item><title>AWS IAM 权限管理实践</title><link>https://socake.github.io/docs/kubernetes/aws-iam%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/aws-iam%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>从 IAM 核心概念到 IRSA/GitHub Actions OIDC 联合身份，再到权限边界与 SCP，系统梳理 AWS IAM 在生产环境的最佳实践。</description></item><item><title>AWS EKS 实战指南</title><link>https://socake.github.io/docs/kubernetes/aws-eks%E5%AE%9E%E6%88%98/</link><pubDate>Tue, 09 Dec 2025 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/aws-eks%E5%AE%9E%E6%88%98/</guid><description>覆盖 EKS 核心架构、eksctl/aws cli 常用操作、IRSA 原理与配置、VPC CNI 网络限制、升级流程及常见故障排查。</description></item><item><title>Helm 使用指南：从入门到生产实践</title><link>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</guid><description>Helm 从入门到生产实践：Chart 结构、values 覆盖、模板语法、--atomic/--wait 等生产参数，以及常用 Chart 安装示例。</description></item><item><title>Kubernetes Ingress 配置实践</title><link>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</guid><description>从 Ingress 概念到生产实践：nginx/traefik/ALB 选型对比、TLS 自动签发、canary 灰度发布、限速超时等常用 annotations 详解。</description></item><item><title>Kubernetes 安全加固实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</guid><description>K8s 安全加固从 Pod 到集群：SecurityContext 配置、网络策略隔离、Secret 安全管理、镜像漏洞扫描、RBAC 
最小权限原则的落地实践。</description></item><item><title>Kubernetes 故障排查 SOP</title><link>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</guid><description>从现象到根因的 K8s 故障排查全流程：Pod 异常状态、Node NotReady、Service 不通、存储挂载失败等场景的系统化排查方法。</description></item><item><title>Kubernetes 集群升级实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</guid><description>K8s 集群升级全流程：从版本兼容性检查、etcd 备份、EKS 托管升级命令，到节点蓝绿替换、PDB 配置、pluto 工具检测废弃 API，再到常见升级问题处理。</description></item><item><title>Kubernetes HPA/VPA 弹性伸缩配置</title><link>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</guid><description>从 HPA v2 到 KEDA 事件驱动伸缩，覆盖 CPU/内存/自定义指标配置、防抖参数调优、VPA 推荐器集成和生产级弹性伸缩最佳实践。</description></item><item><title>Kubernetes RBAC 权限管理实践</title><link>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>从 RBAC 核心概念到生产级多租户权限设计，涵盖 ServiceAccount 最小权限、kubectl auth can-i 排查和命名空间隔离实践。</description></item><item><title>Kubernetes 存储：PV/PVC/StorageClass 实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</link><pubDate>Tue, 09 Dec 2025 10:00:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</guid><description>从 PV/PVC 基础概念到生产级 CSI 配置，涵盖动态供给、StatefulSet 存储、AWS EBS/EFS、阿里云云盘/NAS 以及数据迁移实践。</description></item><item><title>Kubernetes 网络模型与 Service 详解</title><link>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</guid><description>从 K8s 网络基础模型到生产级 Service 配置，覆盖 CNI 插件对比、kube-proxy 模式选择、DNS 解析规则和排查思路。</description></item><item><title>Kubernetes 资源管理：requests/limits/QoS/配额</title><link>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</guid><description>从 CPU throttling 到内存 OOMKill，从 QoS 分类到驱逐优先级，系统梳理 Kubernetes 资源管理机制与生产调优实践。</description></item><item><title>Prometheus + Grafana + Loki 可观测性体系建设</title><link>https://socake.github.io/docs/kubernetes/%E5%8F%AF%E8%A7%82%E6%B5%8B%E6%80%A7%E5%BB%BA%E8%AE%BE/</link><pubDate>Mon, 08 Dec 2025 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/%E5%8F%AF%E8%A7%82%E6%B5%8B%E6%80%A7%E5%BB%BA%E8%AE%BE/</guid><description>记录在多套 K8s 集群上建立统一可观测性平台的实践经验，包含 Prometheus 采集配置、告警规则设计、Grafana Dashboard 组织方式，以及跨集群日志聚合的 Loki 部署方案。</description></item><item><title>Karpenter 弹性节点管理实战</title><link>https://socake.github.io/docs/kubernetes/karpenter-%E5%BC%B9%E6%80%A7%E8%8A%82%E7%82%B9/</link><pubDate>Mon, 08 Dec 2025 13:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/karpenter-%E5%BC%B9%E6%80%A7%E8%8A%82%E7%82%B9/</guid><description>Karpenter 替代 Cluster Autoscaler 的完整实践：NodePool 约束配置、EC2NodeClass 实例选型、consolidation 节点整合降本、Spot 实例容错，以及多套集群配置的组织方式。</description></item><item><title>kubectl 命令速查手册</title><link>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</link><pubDate>Mon, 08 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</guid><description>kubectl 实用命令手册，按场景分类整理，涵盖资源查看、Pod调试、日志查看、滚动更新、扩缩容、强制删除等高频操作。</description></item><item><title>Kubernetes 核心架构全景</title><link>https://socake.github.io/docs/kubernetes/kubernetes-%E6%A0%B8%E5%BF%83%E6%9E%B6%E6%9E%84/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubernetes-%E6%A0%B8%E5%BF%83%E6%9E%B6%E6%9E%84/</guid><description>深入理解 Kubernetes 控制面与工作节点各组件的职责与交互关系，结合生产环境实际经验，梳理核心资源对象与调度原理。</description></item><item><title>Kubernetes Operator 开发实战：Go + controller-runtime 完全指南</title><link>https://socake.github.io/posts/kubernetes-operator-development/</link><pubDate>Wed, 03 Dec 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-operator-development/</guid><description>用 Go + controller-runtime 开发生产级 Kubernetes Operator 的完整实战指南。以 DatabaseCluster Operator 为例，深入讲解 CRD 设计、Reconcile 模式、Status Conditions、Finalizer 防孤儿资源、Leader Election、指标暴露、Webhook 验证，以及 envtest + Kind 测试策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-operator-development/featured.jpg"/></item><item><title>Kubernetes 多租户方案深度对比：vCluster vs Capsule vs 
HNC</title><link>https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/</link><pubDate>Wed, 03 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/</guid><description>Namespace 级隔离远不够用。本文深入剖析 vCluster、Capsule、HNC 三种主流多租户方案的架构差异，给出完整的部署配置示例、隔离能力横向对比，以及 SaaS 平台、内部平台、开发环境三种场景下的选型建议。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/featured.jpg"/></item><item><title>零信任网络改造：从公网暴露到 Headscale VPN</title><link>https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/</link><pubDate>Sat, 22 Nov 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/</guid><description>从发现公网暴露的安全隐患开始，到用 Headscale 自建零信任网络，替代跳板机体系，实现 kubectl 和运维系统的 VPN 接入。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/featured.jpg"/></item><item><title>如何设计一个好的告警体系</title><link>https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/</link><pubDate>Tue, 18 Nov 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/</guid><description>从真实的告警噪音泛滥经历出发，分享如何用 SLI/SLO 重新设计告警体系，包括告警分级、规则设计原则、路由策略和复盘机制。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/featured.jpg"/></item><item><title>Kubernetes GPU 调度实战：AI 训练与推理基础设施</title><link>https://socake.github.io/posts/kubernetes-gpu-scheduling/</link><pubDate>Wed, 05 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com 
(Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gpu-scheduling/</guid><description>GPU 是 AI 基础设施的核心资源，如何在 Kubernetes 上高效调度和管理 GPU 直接影响训练效率和推理成本。本文从底层驱动安装到上层调度策略，完整覆盖 K8s GPU 基础设施的搭建、监控和优化实践。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gpu-scheduling/featured.jpg"/></item><item><title>CoreDNS 深度排障：K8s DNS 问题完全指南</title><link>https://socake.github.io/posts/coredns-troubleshooting-guide/</link><pubDate>Wed, 29 Oct 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/coredns-troubleshooting-guide/</guid><description>DNS 问题是 K8s 中最难定位的问题之一，因为它的失败往往是间歇性的、有延迟的，看起来像网络问题，实际上是 DNS 超时。本文记录了我在生产环境排查过的多类 DNS 故障，附详细的抓包分析和调优配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/coredns-troubleshooting-guide/featured.jpg"/></item><item><title>混沌工程实战：Chaos Mesh 在 K8s 中注入故障</title><link>https://socake.github.io/posts/chaos-mesh-practice/</link><pubDate>Sat, 13 Sep 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/chaos-mesh-practice/</guid><description>混沌工程不是破坏系统，而是在可控环境中提前暴露脆弱点。本文记录了我用 Chaos Mesh 在生产级 K8s 集群中设计并执行混沌演练的完整过程，包括安装、实验配置、Workflow 编排和游戏日流程设计。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/chaos-mesh-practice/featured.jpg"/></item><item><title>OPA/Kyverno：K8s 准入控制策略实战</title><link>https://socake.github.io/posts/opa-kyverno-admission-control/</link><pubDate>Thu, 11 Sep 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opa-kyverno-admission-control/</guid><description>没有准入控制的 K8s 集群就像一个没有门卫的机房——任何人都能随意进出。本文记录了我在多个生产集群部署 Kyverno 策略的实战经验，涵盖资源限制强制、镜像来源白名单、标签规范、以及与 OPA Gatekeeper 的对比选型思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/opa-kyverno-admission-control/featured.jpg"/></item><item><title>Kubernetes 成本优化实战：系统性降本的四条路径</title><link>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</link><pubDate>Mon, 18 Aug 2025 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</guid><description>真实的降本案例：从发现成本异常到分析根因，通过 Karpenter 节点弹性伸缩、资源请求规格治理、大机型收敛等手段，系统性降低 AWS EC2 成本。包含具体配置和执行思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/featured.jpg"/></item><item><title>平台工程实践：构建 Internal Developer Platform</title><link>https://socake.github.io/posts/platform-engineering-practice/</link><pubDate>Sun, 10 Aug 2025 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/platform-engineering-practice/</guid><description>平台工程不是给 DevOps 换个名字，而是把基础设施能力产品化——让开发者像用 SaaS 一样消费平台能力。这篇文章记录我们团队从 0 到 MVP 的六个月实践，包括 Backstage 落地、黄金路径设计、以及用 DORA 指标验证平台价值。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/platform-engineering-practice/featured.jpg"/></item><item><title>SLO/SLI/Error Budget 从理论到落地：SRE 可靠性工程实战</title><link>https://socake.github.io/posts/slo-sli-error-budget-practice/</link><pubDate>Fri, 01 Aug 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/slo-sli-error-budget-practice/</guid><description>从 SLI 指标选取到 Error Budget 消耗速率告警，系统讲解 SRE 可靠性工程体系的落地实践，包括 Prometheus recording rules 计算 SLI、多窗口 burn rate 告警规则配置、SLO 违规复盘流程，以及与开发团队的协作策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/slo-sli-error-budget-practice/featured.jpg"/></item><item><title>Kubernetes NetworkPolicy 
网络隔离实战</title><link>https://socake.github.io/posts/kubernetes-network-policy/</link><pubDate>Sun, 15 Jun 2025 09:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-network-policy/</guid><description>系统讲解 Kubernetes NetworkPolicy 的工作机制与生产实战配置，覆盖 deny-all 基础模板、常见隔离场景、Cilium 扩展、多租户设计、测试验证方法及常见陷阱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-network-policy/featured.jpg"/></item><item><title>Helm 工程化实践：从 Chart 设计到多环境管理</title><link>https://socake.github.io/posts/helm-engineering-practice/</link><pubDate>Sat, 14 Jun 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/helm-engineering-practice/</guid><description>基于生产踩坑经验，系统梳理 Helm Chart 结构设计、_helpers.tpl 复用技巧、多环境 values 管理策略、私有 Harbor 仓库推送流程，以及 --atomic 升级与回滚的正确姿势。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/helm-engineering-practice/featured.jpg"/></item><item><title>Karpenter 深度解析：下一代 K8s 节点自动扩缩</title><link>https://socake.github.io/posts/karpenter-deep-dive/</link><pubDate>Wed, 11 Jun 2025 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/karpenter-deep-dive/</guid><description>从 Cluster Autoscaler 迁移到 Karpenter 之后，集群扩容速度和节点利用率都有明显提升。本文详细拆解 Karpenter 的核心机制、关键配置项，以及在多套生产集群运行中踩过的坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/karpenter-deep-dive/featured.jpg"/></item><item><title>Istio Service Mesh 落地实战：从 Sidecar 注入到灰度发布</title><link>https://socake.github.io/posts/istio-service-mesh-practice/</link><pubDate>Fri, 06 Jun 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/istio-service-mesh-practice/</guid><description>记录 Istio Service Mesh 从零落地的完整过程，包括 
sidecar 注入原理、VirtualService 灰度发布流量切分、DestinationRule 熔断与负载均衡配置、PeerAuthentication mTLS 加固，以及用 istioctl analyze 排查常见问题。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/istio-service-mesh-practice/featured.jpg"/></item><item><title>GitOps 落地实战：ArgoCD + Kustomize 多环境管理</title><link>https://socake.github.io/posts/gitops-argocd/</link><pubDate>Tue, 03 Jun 2025 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/gitops-argocd/</guid><description>GitOps 不只是「把配置放 Git 里」，真正落地需要解决 overlay 结构设计、ApplicationSet 管理多集群、image updater 自动化，以及 sync wave、resource hook 这些细节。这篇文章记录我们团队从传统 CI/CD 迁移到 GitOps 的实际过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/gitops-argocd/featured.jpg"/></item><item><title>多集群 Kubernetes 运维：跨集群管理与统一可观测</title><link>https://socake.github.io/posts/multi-cluster-k8s-management/</link><pubDate>Wed, 21 May 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multi-cluster-k8s-management/</guid><description>从单集群到多集群，运维复杂度不是线性增加，而是指数级。这篇文章总结了我们管理跨地域、跨环境多套 K8s 集群的实际经验：如何用 ArgoCD ApplicationSet 统一部署、如何用 Thanos 聚合多集群指标、以及一次真实的跨集群迁移过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/multi-cluster-k8s-management/featured.jpg"/></item><item><title>业务上云实战：传统应用容器化迁移的踩坑与经验</title><link>https://socake.github.io/posts/kubernetes-migration-practice/</link><pubDate>Mon, 19 May 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-migration-practice/</guid><description>把一批跑在虚拟机上的 Java 应用迁移到 Kubernetes，踩过的坑比想象中多。本文记录整个迁移过程的关键决策和教训。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/kubernetes-migration-practice/featured.jpg"/></item><item><title>Kubernetes 集群升级策略：零停机升级的完整实践指南</title><link>https://socake.github.io/posts/kubernetes-upgrade-strategy/</link><pubDate>Wed, 14 May 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-upgrade-strategy/</guid><description>K8s 集群升级听起来简单，实际操作中坑很多：API 弃用导致的 Helm 失败、Admission Webhook 拦截升级流量、PDB 配置不当导致服务中断。这篇文章从真实的升级经验出发，给出一套可复用的零停机升级方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-upgrade-strategy/featured.jpg"/></item><item><title>K8s Gateway API：告别 Ingress，拥抱下一代流量路由</title><link>https://socake.github.io/posts/kubernetes-gateway-api/</link><pubDate>Mon, 12 May 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gateway-api/</guid><description>Gateway API 已经 GA，是时候认真考虑从 Ingress 迁移了。本文梳理 Gateway API 的设计理念、实际配置示例和迁移注意事项。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gateway-api/featured.jpg"/></item><item><title>Kubernetes 存储体系生产实践：PV/PVC/StorageClass 全解</title><link>https://socake.github.io/posts/kubernetes-storage-practice/</link><pubDate>Tue, 06 May 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-storage-practice/</guid><description>从存储基础概念到生产实战，覆盖 StorageClass 动态供给配置、AWS EBS 和 EFS CSI 驱动安装、StatefulSet 存储管理、PVC 在线扩容操作、跨 AZ 挂载失败排查，以及有状态服务数据迁移方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-storage-practice/featured.jpg"/></item><item><title>从 Nginx Ingress 迁移到 Traefik：为什么换，怎么换</title><link>https://socake.github.io/posts/traefik-vs-nginx-ingress/</link><pubDate>Sun, 27 Apr 2025 12:56:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/traefik-vs-nginx-ingress/</guid><description>从实际痛点出发，讲清楚 Traefik 和 Nginx Ingress 的本质区别，给出可直接参考的迁移路径和配置示例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/traefik-vs-nginx-ingress/featured.jpg"/></item><item><title>ETCD 运维实战：部署、备份恢复与 K8s 集群数据管理</title><link>https://socake.github.io/posts/etcd-ops-practice/</link><pubDate>Sun, 13 Apr 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/etcd-ops-practice/</guid><description>ETCD 是 Kubernetes 的命脉，所有集群状态都存储在这里。本文从实际运维角度梳理部署、备份、恢复和配置动态更新的完整操作链路，包含多个踩坑经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/etcd-ops-practice/featured.jpg"/></item><item><title>自研 Kubernetes Admission Webhook 开发实战：从零到生产</title><link>https://socake.github.io/posts/kubernetes-admission-webhook-dev/</link><pubDate>Sat, 12 Apr 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-admission-webhook-dev/</guid><description>Kubernetes 的 admission 体系是一个强大但脆弱的扩展点。webhook 挂了能让集群所有 Pod 创建卡死。写一个能上生产的 webhook 不难，但要让它在面对各种怪异请求、证书轮换、集群升级、大流量突发时都不挂，就是另一回事了。这是一份从零到生产的工程笔记。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-admission-webhook-dev/featured.jpg"/></item><item><title>Descheduler 深度实战：Kubernetes 自动再平衡的正确打开方式</title><link>https://socake.github.io/posts/descheduler-workload-rebalance/</link><pubDate>Sat, 22 Mar 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/descheduler-workload-rebalance/</guid><description>kube-scheduler 只在 Pod 创建那一刻做决策，之后集群状态变了它就不管了。几个月下来，你的集群会变成 hot node + cold node 混杂、同一个 Deployment 的 Pod 全挤在一个 node、failure-domain 完全失衡。Descheduler 
就是把调度决策后置、周期性重新评估的那只手。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/descheduler-workload-rebalance/featured.jpg"/></item><item><title>Kueue 批处理调度实战：让 Kubernetes 真正承担 AI/HPC 工作负载</title><link>https://socake.github.io/posts/kueue-batch-workload/</link><pubDate>Sat, 15 Mar 2025 09:40:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kueue-batch-workload/</guid><description>把 AI 训练任务塞进 Kubernetes，第一天你会发现原生调度器完全不够用：没有队列、没有 quota、没有 gang scheduling、没有公平共享、preemption 语义一塌糊涂。Kueue 是 sig-scheduling 官方给出的答案，它比 Volcano 更贴近 Kubernetes 原生、比自研 controller 更成熟。这是一份真实的生产笔记。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kueue-batch-workload/featured.jpg"/></item><item><title>Kubernetes 日志采集方案选型：从技术对比到生产落地</title><link>https://socake.github.io/posts/k8s-logging-solution/</link><pubDate>Tue, 25 Feb 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-logging-solution/</guid><description>记录我们团队从无到有建立 Kubernetes 日志采集系统的完整历程，最终选择 Fluent Bit + Fluentd + Elasticsearch 方案的技术依据，以及生产环境踩过的那些坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-logging-solution/featured.jpg"/></item><item><title>Kubernetes RBAC 安全加固实战：最小权限到 NetworkPolicy</title><link>https://socake.github.io/posts/kubernetes-rbac-security/</link><pubDate>Fri, 24 Jan 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-rbac-security/</guid><description>从真实安全事件出发，系统讲解 Kubernetes RBAC 最小权限设计、ClusterRole 与 Role 的适用场景、审计日志分析 RBAC 问题的方法，以及 NetworkPolicy 实现命名空间和 Pod 级别的网络隔离。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-rbac-security/featured.jpg"/></item><item><title>Kubernetes 
YAML 工程化：常用资源模板与生产最佳实践</title><link>https://socake.github.io/posts/kubernetes-yaml-patterns/</link><pubDate>Sun, 19 Jan 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-yaml-patterns/</guid><description>写好 Kubernetes YAML 不只是语法问题，更多是工程经验的沉淀。本文梳理了生产环境中常见的 YAML 反模式，并给出各类资源的完整可用模板。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-yaml-patterns/featured.jpg"/></item><item><title>Kubernetes 资源管理实战——QoS、ResourceQuota、VPA 体系化实践</title><link>https://socake.github.io/posts/kubernetes-resource-management/</link><pubDate>Thu, 16 Jan 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-resource-management/</guid><description>我在生产中见过太多因为资源配置不当导致的事故：不设 limits 的服务把节点内存吃光导致 OOM 驱逐、requests 设得过高导致 Pod 调度不上去、HPA 配置错误导致扩缩失灵。这篇文章把 K8s 资源管理体系从头到尾捋一遍，让你建立完整的资源治理思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-resource-management/featured.jpg"/></item><item><title>Kubernetes 网络深度解析——CNI、kube-proxy、NetworkPolicy 完全指南</title><link>https://socake.github.io/posts/kubernetes-networking-deep-dive/</link><pubDate>Fri, 10 Jan 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-networking-deep-dive/</guid><description>K8s 网络是很多工程师的知识盲区，平时不出问题就忽略，一出问题就完全不知道从哪下手。我在多次生产网络故障的排查中，深刻理解了 K8s 网络的每一层。这篇文章从 Pod 网络模型讲到 NetworkPolicy 实战，帮你建立完整的 K8s 网络知识体系。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-networking-deep-dive/featured.jpg"/></item><item><title>Kubernetes 从零开始：工程师视角的入门指南</title><link>https://socake.github.io/posts/kubernetes-beginner-guide/</link><pubDate>Sun, 20 Oct 2024 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/kubernetes-beginner-guide/</guid><description>Docker Compose 能运行多个容器，为什么还需要 Kubernetes？本文从这个问题出发，用类比的方式讲清楚 Pod/Deployment/Service/Ingress 等核心概念，给出最常用的 kubectl 命令和完整的入门部署示例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-beginner-guide/featured.jpg"/></item></channel></rss>