<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kubernetes on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/tags/kubernetes/</link><description>Recent content in Kubernetes on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Thu, 30 Apr 2026 11:00:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/tags/kubernetes/index.xml" rel="self" type="application/rss+xml"/><item><title>Playbook：每个 PR 一个独立环境——X-env header 路由 + 三层清理保障（深度版）</title><link>https://socake.github.io/playbook/per-pr-isolated-environment/</link><pubDate>Thu, 30 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/per-pr-isolated-environment/</guid><description>QA 共享环境是并行开发的最大瓶颈。本 Playbook 给出一套已经在多个业务服务上线、跑通端到端真实代码改动验证的 PR 隔离方案：feature 分支推送即触发 deploy.py 在独立 namespace 拉起 PR Pod，入口域名继续用 QA 域名，HTTPRoute 按 X-env header 把流量切到对应 PR Pod，关闭 PR + 24h cron + 容量水位三层清理避免泄漏。本版（v2 深度版）相对 v1 重点强化了可执行性：所有 yaml 是完整 manifest（含 namespace / RBAC / Secret），所有脚本都能 chmod +x 直接跑，每步含前置 / 执行 / 验证 / 回滚四件套，配 5 个完整踩坑修复 + 2 张 mermaid 图。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/per-pr-isolated-environment/featured.jpg"/></item><item><title>Playbook：中等规模公司的完整 DevOps 流程——从代码提交到生产部署的全链路设计</title><link>https://socake.github.io/playbook/end-to-end-devops-pipeline/</link><pubDate>Thu, 30 Apr 2026 10:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/end-to-end-devops-pipeline/</guid><description>中等规模公司的 DevOps 体系最常见的两个症状：工具碎片化（GitLab + Jenkins + 手工 kubectl）和阶段衔接断裂（PR 
慢、合并后部署延迟、监控滞后）。本文不讲入门概念，给一份真实可落地的全流程蓝图：开发者本机 → Git 提交 → 云效 / GitHub Actions CI（含 Schema Check 双 Stage）→ ECR/ACR → GitOps 仓库自动更新镜像 tag → ArgoCD 自动 sync → K8s 多集群部署 → Prometheus + Loki + 钉钉告警。每个环节标注用什么工具具体到版本号，关键集成点（ApplicationSet / Kustomize overlay / deploy.py）给完整可执行配置，配三个真实坑（GitOps 闭环缺口、deploy.py path-mode 切换混乱、多 ArgoCD 凭据路由），并给出 DORA 风格的 before/after 对比与采集脚本。可以把这篇当成整个 Playbook 系列的目录页。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/end-to-end-devops-pipeline/featured.jpg"/></item><item><title>Linux 火焰图实战：从采集到定位问题</title><link>https://socake.github.io/posts/linux-flame-graph-practice/</link><pubDate>Sun, 12 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-flame-graph-practice/</guid><description>CPU 飙高、响应慢、内存泄漏——这三类问题用火焰图都能快速定位。本文从怎么读火焰图开始，讲到 perf、async-profiler、py-spy 各自的适用场景，最后用一个真实的 Go 服务案例走完完整排查流程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/linux-flame-graph-practice/featured.jpg"/></item><item><title>OpenCost 实战：Kubernetes 成本可见性与多团队费用分摊</title><link>https://socake.github.io/posts/opencost-kubernetes-cost-visibility/</link><pubDate>Sun, 12 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opencost-kubernetes-cost-visibility/</guid><description>Kubernetes 成本不透明是 FinOps 落地的最大障碍。本文通过 OpenCost 构建完整的成本可见性体系，涵盖部署集成、云厂商价格接入、按团队分摊、Grafana 看板、超预算告警和自动周报推送，提供可直接复用的配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/opencost-kubernetes-cost-visibility/featured.jpg"/></item><item><title>Argo Workflows 工作流实战：批处理与 ML Pipeline</title><link>https://socake.github.io/posts/argo-workflows-practice/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/argo-workflows-practice/</guid><description>Argo Workflows 是 Kubernetes 原生的工作流引擎，适合批处理和 ML Pipeline 场景。本文涵盖与 Airflow/Temporal 的选型对比、核心资源模型、三个完整实战（DAG 数据处理、ML 训练 Pipeline、定时备份）、资源管控（Semaphore/Node Selector）、Argo Events 事件驱动触发，以及 Prometheus 监控和常见问题处理。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/argo-workflows-practice/featured.jpg"/></item><item><title>Kubernetes cgroup v2 迁移实践</title><link>https://socake.github.io/posts/kubernetes-cgroup-v2-migration/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-cgroup-v2-migration/</guid><description>K8s 1.25+ 默认启用 cgroup v2，MemoryQoS 和 PSI 等新特性只在 v2 支持。本文给出完整的节点迁移操作流程和常见问题解决方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-cgroup-v2-migration/featured.jpg"/></item><item><title>USE Method：系统性能分析方法论</title><link>https://socake.github.io/posts/use-method-performance-analysis/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/use-method-performance-analysis/</guid><description>随机尝试是性能排查的大敌。USE Method 用一个三维框架（使用率/饱和度/错误）把所有系统资源纳入统一分析体系，本文从原理到实战全面解析这套方法论，并提供 K8s 环境下的 PromQL 映射和工具链速查表。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/use-method-performance-analysis/featured.jpg"/></item><item><title>bpftrace 实战：线上问题排查的瑞士军刀</title><link>https://socake.github.io/posts/bpftrace-performance-debug/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/bpftrace-performance-debug/</guid><description>strace 太重、perf 太原始、BCC 工具集要装一堆依赖——bpftrace 是这三者之间的平衡点。本文用四个真实场景讲清楚 bpftrace 
的工作方式，帮你把它变成日常排查工具。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/bpftrace-performance-debug/featured.jpg"/></item><item><title>FinOps 实践：Kubernetes 成本治理体系建设</title><link>https://socake.github.io/posts/finops-kubernetes-cost-governance/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/finops-kubernetes-cost-governance/</guid><description>一套完整的 Kubernetes FinOps 落地路径：如何识别僵尸资源、配置成本分摊模型、利用 Karpenter 降低节点成本，以及如何将月账单从 $50k 压到 $30k。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/finops-kubernetes-cost-governance/featured.jpg"/></item><item><title>gRPC 微服务实践：协议、负载均衡与 Kubernetes 集成</title><link>https://socake.github.io/posts/grpc-microservices-practice/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grpc-microservices-practice/</guid><description>从协议原理到 Kubernetes 生产落地，系统梳理 gRPC 微服务的核心实践：Protobuf 向后兼容设计、拦截器链（日志/限流/OTel）、长连接负载不均问题（headless Service + round_robin vs Envoy L7）、健康检查 Probe 配置、以及 grpc-gateway REST 共存方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grpc-microservices-practice/featured.jpg"/></item><item><title>Kubernetes v1.33 新特性深度解读：GA 特性全览与升级指南</title><link>https://socake.github.io/posts/kubernetes-v133-features/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-v133-features/</guid><description>Kubernetes v1.33 带来了多项重量级 GA 特性，本文深入解读 In-Place Pod Vertical Scaling、原生 Sidecar Containers、Pod Scheduling Readiness、KMS v2 加密等核心变更，并提供实际可用的配置示例和生产升级建议。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/kubernetes-v133-features/featured.jpg"/></item><item><title>Service Mesh 技术选型：Istio vs Cilium vs Linkerd 深度对比</title><link>https://socake.github.io/posts/service-mesh-comparison/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/service-mesh-comparison/</guid><description>Istio、Cilium Service Mesh、Linkerd 三种方案各有侧重：Istio 功能最全但最重，Cilium 基于 eBPF 性能最优，Linkerd 最轻量最易运维。本文从架构、性能、功能、运维四个维度全面拆解，帮助架构师做出有数据支撑的选型决策。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/service-mesh-comparison/featured.jpg"/></item><item><title>从 Ingress 迁移到 Gateway API：完整实操指南</title><link>https://socake.github.io/posts/ingress-to-gateway-api-migration/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ingress-to-gateway-api-migration/</guid><description>Gateway API 是 Kubernetes 官方下一代流量入口标准，解决了 Ingress 注解泛滥、跨实现不可移植等历史遗留问题。本文带你从零完成生产迁移。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ingress-to-gateway-api-migration/featured.jpg"/></item><item><title>Flagger 渐进式交付实战：金丝雀、蓝绿、A/B 与 Istio/NGINX/Gateway API 集成</title><link>https://socake.github.io/posts/flagger-progressive-delivery/</link><pubDate>Sat, 11 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/flagger-progressive-delivery/</guid><description>传统的 kubectl apply 发布方式让风险集中在发布那一刻。Flagger 通过指标驱动的渐进式切流（Canary Analysis），把风险摊到整个发布过程，异常自动回滚。本文基于官方文档，系统讲解 Canary CR 的完整字段、三种策略的配置模板、与 Istio/NGINX Ingress/Gateway API 的集成、自定义指标分析、自动化回滚机制，以及与 Argo Rollouts 的选型对比。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/flagger-progressive-delivery/featured.jpg"/></item><item><title>Temporal 
分布式工作流引擎实战：Worker、Activity、重试语义与生产部署</title><link>https://socake.github.io/posts/temporal-workflow-engine/</link><pubDate>Wed, 08 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/temporal-workflow-engine/</guid><description>长流程业务编排历来头疼——状态机、定时器、补偿、幂等、失败恢复都要自己写。Temporal 用 event sourcing + 确定性 replay 把这些问题一次性解决。本文以 Go SDK 为主线，从编程模型、Workflow 确定性约束、Activity 重试、Signal/Query、child workflow、到生产集群部署、监控和容量规划，给出可直接落地的范式。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/temporal-workflow-engine/featured.jpg"/></item><item><title>故障排查实录：Terway CRD IPAM IP 泄漏导致 Pod 无法调度</title><link>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/</link><pubDate>Tue, 07 Apr 2026 09:54:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/</guid><description>一次真实的连锁故障：节点磁盘告警 → Pod 被驱逐 → Terway IPAM IP 未正常回收 → 节点 ENI IP 耗尽 → 新 Pod 无法调度。排查链路、根因分析与修复方案完整记录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/featured.jpg"/></item><item><title>Tetragon eBPF 运行时安全实战：进程/网络/文件策略、与 Falco 的对比</title><link>https://socake.github.io/posts/tetragon-runtime-security/</link><pubDate>Thu, 02 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tetragon-runtime-security/</guid><description>Kubernetes 运行时安全是传统 EDR 难以覆盖的盲区。Tetragon 用 eBPF 在内核态采集进程、网络、文件和系统调用事件，并能在内核就地阻断攻击动作。本文从架构原理出发，讲解 TracingPolicy 语法、典型攻击检测（反弹 shell、提权、敏感文件访问）、阻断机制、性能开销，以及它与 Falco 的差异。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tetragon-runtime-security/featured.jpg"/></item><item><title>Ollama 在 K8s 
上跑大模型：本地 LLM 的运维实践</title><link>https://socake.github.io/posts/ollama-kubernetes-llm/</link><pubDate>Mon, 30 Mar 2026 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ollama-kubernetes-llm/</guid><description>在 Kubernetes 上部署 Ollama 运行本地大模型，从 GPU 调度到 CPU 推理降级，再到运维场景的实际集成，记录完整的踩坑与实践过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ollama-kubernetes-llm/featured.jpg"/></item><item><title>GitHub Copilot 工程化使用：不只是代码补全</title><link>https://socake.github.io/posts/github-copilot-engineering/</link><pubDate>Sat, 28 Mar 2026 12:51:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/github-copilot-engineering/</guid><description>GitHub Copilot 不只是 Tab 补全。本文涵盖 Copilot Chat 的 /fix、/explain、/tests 命令、workspace 上下文、Copilot for CLI、在 Terraform/Dockerfile/K8s YAML 中的实际用法，以及提高补全命中率的技巧。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/github-copilot-engineering/featured.jpg"/></item><item><title>Volcano 批调度实战：AI 训练集群的 Gang Scheduling、队列与抢占</title><link>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</link><pubDate>Wed, 25 Mar 2026 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</guid><description>K8s 默认调度器对 AI 训练极不友好。Volcano 把 HPC 调度理念搬进 K8s：Gang Scheduling、Queue、Fairshare、Preemption、拓扑亲和。这篇讲清楚它在 AI 训练集群的落地。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/volcano-gpu-batch-scheduling/featured.jpg"/></item><item><title>FluxCD vs ArgoCD 深度对比与迁移实战：架构、语义、多租户与选型决策</title><link>https://socake.github.io/posts/fluxcd-vs-argocd-migration/</link><pubDate>Sun, 22 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/fluxcd-vs-argocd-migration/</guid><description>GitOps 的两条主流路线——FluxCD 与 ArgoCD——在架构、语义、运维成本和扩展性上有显著差异。本文基于官方文档和生产实战，按同步模型、应用抽象、多租户隔离、Helm 支持、可观测性、扩展机制逐项对比，给出选型决策树，并提供一套可复用的从 ArgoCD 迁移到 FluxCD 的操作手册。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/fluxcd-vs-argocd-migration/featured.jpg"/></item><item><title>Linux 内核网络参数深度调优：高并发场景实战</title><link>https://socake.github.io/posts/linux-kernel-network-tuning/</link><pubDate>Fri, 20 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-kernel-network-tuning/</guid><description>在高并发场景下，Linux 默认内核参数往往成为系统瓶颈。本文从原理出发，系统讲解 TCP backlog、TIME_WAIT、keepalive、内存缓冲区、conntrack、网卡队列（RSS/RPS/RFS）的调优方法，并提供 K8s 节点专属的 sysctl DaemonSet 方案和完整的压测验证流程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/linux-kernel-network-tuning/featured.jpg"/></item><item><title>Tekton Pipelines 企业级落地：从 Task 抽象到供应链签名</title><link>https://socake.github.io/posts/tekton-pipelines-production/</link><pubDate>Thu, 15 Jan 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tekton-pipelines-production/</guid><description>Jenkins 扛不动 K8s Native 的调度压力，GitLab Runner 又太 monolithic。Tekton 把 ‘CI job’ 拆成 Task + Pipeline + PipelineRun 三层 CRD，所有执行都是 Pod，天然贴合 K8s。本文讲清楚它在企业里该怎么用——以及怎么避免把它用成 YAML 地狱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tekton-pipelines-production/featured.jpg"/></item><item><title>LLM 生产服务化：vLLM 部署与 GPU 推理优化实战</title><link>https://socake.github.io/posts/llm-production-serving-vllm/</link><pubDate>Tue, 13 Jan 2026 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/llm-production-serving-vllm/</guid><description>团队把 Ollama 搬上生产后，高峰期请求排队超过 30 秒，用户纷纷反映 AI 功能不可用。这篇文章记录我们迁移到 vLLM 的全过程，包括 PagedAttention、Continuous Batching 原理，以及 Kubernetes GPU 部署的完整配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-production-serving-vllm/featured.jpg"/></item><item><title>高级运维/DevOps 工程师面试题精选：系统设计与深度考察</title><link>https://socake.github.io/posts/devops-senior-interview/</link><pubDate>Thu, 11 Dec 2025 12:51:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devops-senior-interview/</guid><description>高级运维面试考什么？本文整理 5 道系统设计题和 10 道深度技术题，每题给出答题框架。从监控体系设计到 K8s 调度器原理，从生产事故复盘到新技术引入决策，帮你建立完整的回答思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-senior-interview/featured.jpg"/></item><item><title>云原生存储方案选型：EFS/EBS/OSS 实践</title><link>https://socake.github.io/docs/kubernetes/%E4%BA%91%E5%8E%9F%E7%94%9F%E5%AD%98%E5%82%A8%E6%96%B9%E6%A1%88/</link><pubDate>Tue, 09 Dec 2025 17:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/%E4%BA%91%E5%8E%9F%E7%94%9F%E5%AD%98%E5%82%A8%E6%96%B9%E6%A1%88/</guid><description>系统梳理 AWS EBS、EFS、S3 在 Kubernetes 中的使用方式，覆盖 StorageClass 配置、动态供给、性能测试与数据备份策略，附阿里云 NAS/OSS 对比。</description></item><item><title>AWS EKS 实战指南</title><link>https://socake.github.io/docs/kubernetes/aws-eks%E5%AE%9E%E6%88%98/</link><pubDate>Tue, 09 Dec 2025 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/aws-eks%E5%AE%9E%E6%88%98/</guid><description>覆盖 EKS 核心架构、eksctl/aws cli 常用操作、IRSA 原理与配置、VPC CNI 网络限制、升级流程及常见故障排查。</description></item><item><title>Helm 使用指南：从入门到生产实践</title><link>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</link><pubDate>Tue, 
09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</guid><description>Helm 从入门到生产实践：Chart 结构、values 覆盖、模板语法、--atomic/--wait 等生产参数，以及常用 Chart 安装示例。</description></item><item><title>Kubernetes Ingress 配置实践</title><link>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</guid><description>从 Ingress 概念到生产实践：nginx/traefik/ALB 选型对比、TLS 自动签发、canary 灰度发布、限速超时等常用 annotations 详解。</description></item><item><title>Kubernetes 安全加固实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</guid><description>K8s 安全加固从 Pod 到集群：SecurityContext 配置、网络策略隔离、Secret 安全管理、镜像漏洞扫描、RBAC 最小权限原则的落地实践。</description></item><item><title>Kubernetes 故障排查 SOP</title><link>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</guid><description>从现象到根因的 K8s 故障排查全流程：Pod 异常状态、Node NotReady、Service 不通、存储挂载失败等场景的系统化排查方法。</description></item><item><title>Kubernetes 集群升级实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</guid><description>K8s 集群升级全流程：从版本兼容性检查、etcd 备份、EKS 托管升级命令，到节点蓝绿替换、PDB 配置、pluto 工具检测废弃 API，再到常见升级问题处理。</description></item><item><title>Go 运维工具开发实战</title><link>https://socake.github.io/docs/languages/go/go%E8%BF%90%E7%BB%B4%E5%B7%A5%E5%85%B7%E5%BC%80%E5%8F%91/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E8%BF%90%E7%BB%B4%E5%B7%A5%E5%85%B7%E5%BC%80%E5%8F%91/</guid><description>从零写一个 Go 运维工具：cobra CLI 框架、执行 kubectl 命令、调用 K8s API、配置 zap 日志、viper 配置管理，完整可运行的代码示例</description></item><item><title>Kubernetes HPA/VPA 弹性伸缩配置</title><link>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</guid><description>从 HPA v2 到 KEDA 事件驱动伸缩，覆盖 CPU/内存/自定义指标配置、防抖参数调优、VPA 推荐器集成和生产级弹性伸缩最佳实践。</description></item><item><title>Kubernetes RBAC 权限管理实践</title><link>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>从 RBAC 核心概念到生产级多租户权限设计，涵盖 ServiceAccount 最小权限、kubectl auth can-i 排查和命名空间隔离实践。</description></item><item><title>Kubernetes 存储：PV/PVC/StorageClass 实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</guid><description>从 PV/PVC 基础概念到生产级 CSI 配置，涵盖动态供给、StatefulSet 存储、AWS 
EBS/EFS、阿里云云盘/NAS 以及数据迁移实践。</description></item><item><title>Kubernetes 网络模型与 Service 详解</title><link>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</guid><description>从 K8s 网络基础模型到生产级 Service 配置，覆盖 CNI 插件对比、kube-proxy 模式选择、DNS 解析规则和排查思路。</description></item><item><title>Kubernetes 资源管理：requests/limits/QoS/配额</title><link>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</guid><description>从 CPU throttling 到内存 OOMKill，从 QoS 分类到驱逐优先级，系统梳理 Kubernetes 资源管理机制与生产调优实践。</description></item><item><title>Python 操作 Kubernetes：kubernetes-client 实战</title><link>https://socake.github.io/docs/languages/python/python%E6%93%8D%E4%BD%9Ckubernetes/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E6%93%8D%E4%BD%9Ckubernetes/</guid><description>系统介绍 Python kubernetes-client 的核心用法，从集群认证到资源操作，最终构建一个完整的 K8s 巡检脚本</description></item><item><title>ArgoCD + Kustomize GitOps 体系实践</title><link>https://socake.github.io/docs/kubernetes/argocd-gitops%E5%AE%9E%E8%B7%B5/</link><pubDate>Mon, 08 Dec 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/argocd-gitops%E5%AE%9E%E8%B7%B5/</guid><description>记录在多套 K8s 集群（AWS EKS + 阿里云 ACK）上落地 GitOps 的完整过程：目录结构设计、Kustomize overlay 环境差异管理、ArgoCD ApplicationSet 自动化、以及真实踩过的坑。</description></item><item><title>Karpenter 
弹性节点管理实战</title><link>https://socake.github.io/docs/kubernetes/karpenter-%E5%BC%B9%E6%80%A7%E8%8A%82%E7%82%B9/</link><pubDate>Mon, 08 Dec 2025 13:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/karpenter-%E5%BC%B9%E6%80%A7%E8%8A%82%E7%82%B9/</guid><description>Karpenter 替代 Cluster Autoscaler 的完整实践：NodePool 约束配置、EC2NodeClass 实例选型、consolidation 节点整合降本、Spot 实例容错，以及多套集群配置的组织方式。</description></item><item><title>kubectl 命令速查手册</title><link>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</link><pubDate>Mon, 08 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</guid><description>kubectl 实用命令手册，按场景分类整理，涵盖资源查看、Pod调试、日志查看、滚动更新、扩缩容、强制删除等高频操作。</description></item><item><title>GitHub Actions CI/CD 实战：从镜像构建到 K8s 部署</title><link>https://socake.github.io/docs/cicd/github-actions-%E5%AE%9E%E6%88%98/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/github-actions-%E5%AE%9E%E6%88%98/</guid><description>完整的 GitHub Actions CI/CD 流水线设计：Docker 多阶段构建优化、ECR 推送、Kustomize 更新 GitOps 仓库触发 ArgoCD 自动部署，以及多环境（QA/PRE/PROD）的分支策略。</description></item><item><title>Kubernetes 核心架构全景</title><link>https://socake.github.io/docs/kubernetes/kubernetes-%E6%A0%B8%E5%BF%83%E6%9E%B6%E6%9E%84/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubernetes-%E6%A0%B8%E5%BF%83%E6%9E%B6%E6%9E%84/</guid><description>深入理解 Kubernetes 控制面与工作节点各组件的职责与交互关系，结合生产环境实际经验，梳理核心资源对象与调度原理。</description></item><item><title>DevOps/运维工程师面试题精选：K8s、Linux、网络高频考点</title><link>https://socake.github.io/posts/devops-interview-questions/</link><pubDate>Sun, 07 Dec 2025 13:07:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devops-interview-questions/</guid><description>基于真实面试经验整理的运维/DevOps 面试题，覆盖 K8s 调度、故障排查、Linux 内核、网络协议等方向，附「面试官真正想考的点」，帮你把答案说到位。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-interview-questions/featured.jpg"/></item><item><title>Kubernetes Operator 开发实战：Go + controller-runtime 完全指南</title><link>https://socake.github.io/posts/kubernetes-operator-development/</link><pubDate>Wed, 03 Dec 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-operator-development/</guid><description>用 Go + controller-runtime 开发生产级 Kubernetes Operator 的完整实战指南。以 DatabaseCluster Operator 为例，深入讲解 CRD 设计、Reconcile 模式、Status Conditions、Finalizer 防孤儿资源、Leader Election、指标暴露、Webhook 验证，以及 envtest + Kind 测试策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-operator-development/featured.jpg"/></item><item><title>Kubernetes 多租户方案深度对比：vCluster vs Capsule vs HNC</title><link>https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/</link><pubDate>Wed, 03 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/</guid><description>Namespace 级隔离远不够用。本文深入剖析 vCluster、Capsule、HNC 三种主流多租户方案的架构差异，给出完整的部署配置示例、隔离能力横向对比，以及 SaaS 平台、内部平台、开发环境三种场景下的选型建议。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/featured.jpg"/></item><item><title>Kyverno 策略即代码实战：从准入到变异到生成的全场景落地</title><link>https://socake.github.io/posts/kyverno-policy-as-code/</link><pubDate>Fri, 28 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/kyverno-policy-as-code/</guid><description>一份基于 Kyverno 1.12+ 的生产落地笔记：覆盖 validate/mutate/generate/verifyImages 四种策略类型的实战用法、CEL 和 JMESPath 表达式语法、策略分层治理、PolicyException、性能调优和常见踩坑，并与 OPA Gatekeeper 做对比。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kyverno-policy-as-code/featured.jpg"/></item><item><title>Pod Security Standards 生产落地：从 PSP 到 PSA 的迁移实战</title><link>https://socake.github.io/posts/kubernetes-pod-security-standards/</link><pubDate>Fri, 21 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-pod-security-standards/</guid><description>一份从 PSP 迁移到 Pod Security Standards 的实战笔记：对比 Baseline 与 Restricted 两套 profile 的实际约束、Pod Security Admission 的三种 mode、如何一次性迁移 200+ 命名空间、和 Kyverno/OPA 互补使用的最佳实践，以及遗留业务 securityContext 改造的典型模式。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-pod-security-standards/featured.jpg"/></item><item><title>WebAssembly 在云原生中的应用：从浏览器到 K8s 数据面</title><link>https://socake.github.io/posts/webassembly-cloud-native/</link><pubDate>Sat, 08 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/webassembly-cloud-native/</guid><description>WebAssembly 在云原生领域的热度持续上涨，但很多讨论都停留在概念层面。这篇文章试图给出一个务实的视角：Wasm 在哪些云原生场景已经可以生产落地，在哪些场景还需要等待，以及和容器相比的真实差异。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/webassembly-cloud-native/featured.jpg"/></item><item><title>Istio Ambient Mode 无 Sidecar 服务网格实践</title><link>https://socake.github.io/posts/istio-ambient-mesh-practice/</link><pubDate>Sat, 08 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/istio-ambient-mesh-practice/</guid><description>Sidecar 
模式已经陪我们走了六七年，但它的问题也越来越难以忽视。Ambient Mode 不是缝缝补补，而是从架构层面重新设计了服务网格的数据面。本文从实际运维视角深入拆解 ztunnel + Waypoint 两层架构，并给出从 Sidecar 迁移到 Ambient 的完整路径。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/istio-ambient-mesh-practice/featured.jpg"/></item><item><title>Kubernetes GPU 调度实战：AI 训练与推理基础设施</title><link>https://socake.github.io/posts/kubernetes-gpu-scheduling/</link><pubDate>Wed, 05 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gpu-scheduling/</guid><description>GPU 是 AI 基础设施的核心资源，如何在 Kubernetes 上高效调度和管理 GPU 直接影响训练效率和推理成本。本文从底层驱动安装到上层调度策略，完整覆盖 K8s GPU 基础设施的搭建、监控和优化实践。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gpu-scheduling/featured.jpg"/></item><item><title>Cilium NetworkPolicy 与 L7 过滤生产落地实战</title><link>https://socake.github.io/posts/cilium-network-policy-production/</link><pubDate>Fri, 31 Oct 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cilium-network-policy-production/</guid><description>一份基于 Cilium 1.16+ 的生产落地笔记：讲清楚 Kubernetes NetworkPolicy 的局限、CiliumNetworkPolicy 的扩展能力、L7 HTTP/Kafka/DNS 过滤的真实用法、Hubble 可观测性、策略开发方法论，以及多集群 ClusterMesh 场景下的策略治理。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cilium-network-policy-production/featured.jpg"/></item><item><title>CoreDNS 深度排障：K8s DNS 问题完全指南</title><link>https://socake.github.io/posts/coredns-troubleshooting-guide/</link><pubDate>Wed, 29 Oct 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/coredns-troubleshooting-guide/</guid><description>DNS 问题是 K8s 中最难定位的问题之一，因为它的失败往往是间歇性的、有延迟的，看起来像网络问题，实际上是 DNS 超时。本文记录了我在生产环境排查过的多类 DNS 故障，附详细的抓包分析和调优配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/coredns-troubleshooting-guide/featured.jpg"/></item><item><title>TCP/IP 网络排障：抓包与连接问题诊断</title><link>https://socake.github.io/posts/tcp-network-troubleshooting/</link><pubDate>Tue, 21 Oct 2025 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tcp-network-troubleshooting/</guid><description>网络问题排查的核心是「眼见为实」，没有抓包的排障都是猜测。本文系统梳理了 tcpdump 的实战用法、TCP 连接状态机分析、conntrack 追踪，以及 Kubernetes 中 NodePort/LoadBalancer 的典型网络故障定位方法。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tcp-network-troubleshooting/featured.jpg"/></item><item><title>Elasticsearch 集群部署实战：ECK 在 K8s 上的生产级配置</title><link>https://socake.github.io/posts/elasticsearch-cluster-deployment/</link><pubDate>Fri, 19 Sep 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-cluster-deployment/</guid><description>从集群角色规划到 ECK Operator 落地，结合生产环境踩坑经验，完整讲解 Elasticsearch 在 Kubernetes 上的生产级部署方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-cluster-deployment/featured.jpg"/></item><item><title>eBPF 可观测性实践：Cilium 网络监控与 Tetragon 安全审计</title><link>https://socake.github.io/posts/ebpf-observability/</link><pubDate>Wed, 17 Sep 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ebpf-observability/</guid><description>eBPF 正在重塑云原生可观测性的底层基础。本文记录在 K8s 集群中落地 Cilium + Hubble 网络监控和 Tetragon 安全审计的实践经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ebpf-observability/featured.jpg"/></item><item><title>混沌工程实战：Chaos Mesh 在 K8s 中注入故障</title><link>https://socake.github.io/posts/chaos-mesh-practice/</link><pubDate>Sat, 13 Sep 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/chaos-mesh-practice/</guid><description>Chaos engineering is not about breaking systems; it is about exposing weak points early in a controlled environment. This article documents the full process of designing and running chaos drills with Chaos Mesh on a production-grade K8s cluster, including installation, experiment configuration, Workflow orchestration, and game day process design.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/chaos-mesh-practice/featured.jpg"/></item><item><title>OPA/Kyverno: K8s Admission Control Policies in Practice</title><link>https://socake.github.io/posts/opa-kyverno-admission-control/</link><pubDate>Thu, 11 Sep 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opa-kyverno-admission-control/</guid><description>A K8s cluster without admission control is like a server room without a guard: anyone can walk in and out. This article records my hands-on experience deploying Kyverno policies across several production clusters, covering enforced resource limits, image source allowlists, label conventions, and how to choose between Kyverno and OPA Gatekeeper.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/opa-kyverno-admission-control/featured.jpg"/></item><item><title>Supply Chain Security: Trivy Image Scanning + Cosign Signature Verification in Practice</title><link>https://socake.github.io/posts/trivy-cosign-supply-chain/</link><pubDate>Sat, 06 Sep 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/trivy-cosign-supply-chain/</guid><description>Are your images secure? This article maps the main attack surfaces of the container supply chain, walks step by step through building three layers of defense (Trivy scanning, Cosign signing, and K8s admission control), and includes a GitLab CI integration example.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/trivy-cosign-supply-chain/featured.jpg"/></item><item><title>Writing K8s Ops Tools in Go: client-go in Practice</title><link>https://socake.github.io/posts/go-kubernetes-client-tools/</link><pubDate>Mon, 25 Aug 2025 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/go-kubernetes-client-tools/</guid><description>kubectl solves 80% of day-to-day problems; the remaining 20% require writing your own tools. Using runnable Go code, this article shows how to use client-go to build ops tools such as bulk Deployment restarts, Pod resource reports, and stale ConfigMap cleanup, and how to package them as a CLI with cobra.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/go-kubernetes-client-tools/featured.jpg"/></item><item><title>AWS EKS in Production: Networking, Security, and Multi-Cluster Management</title><link>https://socake.github.io/posts/aws-eks-best-practices/</link><pubDate>Fri, 22 Aug 2025 12:51:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/aws-eks-best-practices/</guid><description>Two years of managing multiple EKS clusters came with plenty of pitfalls. This article organizes six core topics: network selection, IAM permissions, node management, cluster upgrades, security hardening, and cost control. Each comes with concrete configuration examples and problems actually encountered.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/aws-eks-best-practices/featured.jpg"/></item><item><title>Kubernetes Cost Optimization in Practice: Four Paths to Systematic Cost Reduction</title><link>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</link><pubDate>Mon, 18 Aug 2025 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</guid><description>A real cost-reduction case: from spotting a cost anomaly to root-cause analysis, then systematically lowering AWS EC2 spend through Karpenter node autoscaling, governance of resource request sizing, and consolidation of large instance types. Includes concrete configuration and the execution approach.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/featured.jpg"/></item><item><title>Cloud-Native Transformation in Practice: Lessons from Migrating Traditional Ops to K8s</title><link>https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/</link><pubDate>Thu, 14 Aug 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/</guid><description>A personal-experience piece recording the whole journey from traditional VM operations to Kubernetes: why we migrated, which pitfalls we hit along the way, how the team got through the learning curve, and which decisions look right in hindsight.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/featured.jpg"/></item><item><title>Platform Engineering in Practice: Building an Internal Developer 
Platform</title><link>https://socake.github.io/posts/platform-engineering-practice/</link><pubDate>Sun, 10 Aug 2025 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/platform-engineering-practice/</guid><description>Platform engineering is not DevOps under a new name; it turns infrastructure capabilities into a product so that developers consume them the way they would a SaaS. This article records our team's six months from zero to MVP, including rolling out Backstage, designing golden paths, and validating the platform's value with DORA metrics.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/platform-engineering-practice/featured.jpg"/></item><item><title>Cilium Hubble in Practice: Seeing Through Kubernetes Networking with eBPF</title><link>https://socake.github.io/posts/ebpf-network-observability-cilium-hubble/</link><pubDate>Wed, 30 Jul 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ebpf-network-observability-cilium-hubble/</guid><description>Cilium Hubble is the closest thing Kubernetes has to a switch mirror port. This article explains its architecture, key configuration, and how to read flows in production to locate network problems.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ebpf-network-observability-cilium-hubble/featured.jpg"/></item><item><title>Thanos in Practice: Unified Prometheus Monitoring and Long-Term Storage Across Multiple K8s Clusters</title><link>https://socake.github.io/posts/thanos-multi-cluster/</link><pubDate>Sat, 26 Jul 2025 11:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/thanos-multi-cluster/</guid><description>A record of migrating the standalone Prometheus instances of three EKS clusters to a unified Thanos monitoring stack, focusing on the selection decision, production configuration, and lessons from pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/thanos-multi-cluster/featured.jpg"/></item><item><title>Kubernetes NetworkPolicy: Network Isolation in Practice</title><link>https://socake.github.io/posts/kubernetes-network-policy/</link><pubDate>Sun, 15 Jun 2025 09:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-network-policy/</guid><description>A systematic walkthrough of 
Kubernetes NetworkPolicy: how it works and how to configure it in production, covering the deny-all base template, common isolation scenarios, Cilium extensions, multi-tenant design, testing and verification methods, and common pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-network-policy/featured.jpg"/></item><item><title>Helm Engineering in Practice: From Chart Design to Multi-Environment Management</title><link>https://socake.github.io/posts/helm-engineering-practice/</link><pubDate>Sat, 14 Jun 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/helm-engineering-practice/</guid><description>Drawing on production pitfalls, a systematic look at Helm Chart structure design, _helpers.tpl reuse techniques, strategies for managing multi-environment values, the push workflow for a private Harbor registry, and the right way to upgrade and roll back with --atomic.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/helm-engineering-practice/featured.jpg"/></item><item><title>Karpenter Deep Dive: Next-Generation K8s Node Autoscaling</title><link>https://socake.github.io/posts/karpenter-deep-dive/</link><pubDate>Wed, 11 Jun 2025 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/karpenter-deep-dive/</guid><description>After migrating from Cluster Autoscaler to Karpenter, both cluster scale-up speed and node utilization improved noticeably. This article breaks down Karpenter's core mechanisms, key configuration options, and the pitfalls hit while running it across several production clusters.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/karpenter-deep-dive/featured.jpg"/></item><item><title>Istio Service Mesh in Practice: From Sidecar Injection to Canary Releases</title><link>https://socake.github.io/posts/istio-service-mesh-practice/</link><pubDate>Fri, 06 Jun 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/istio-service-mesh-practice/</guid><description>A record of rolling out Istio Service Mesh from scratch, including how sidecar injection works, VirtualService traffic splitting for canary releases, DestinationRule circuit breaking and load balancing configuration, PeerAuthentication mTLS hardening, and using istioctl analyze to troubleshoot common problems.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/istio-service-mesh-practice/featured.jpg"/></item><item><title>GitOps in Practice: Multi-Environment Management with ArgoCD + Kustomize</title><link>https://socake.github.io/posts/gitops-argocd/</link><pubDate>Tue, 03 Jun 2025 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/gitops-argocd/</guid><description>GitOps is more than "putting config in Git". Making it work means solving overlay structure design, multi-cluster management with ApplicationSet, image updater automation, and details such as sync waves and resource hooks. This article records how our team actually migrated from traditional CI/CD to GitOps.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/gitops-argocd/featured.jpg"/></item><item><title>Advanced ArgoCD Patterns: ApplicationSet, Sync Waves, and Enterprise GitOps Practice</title><link>https://socake.github.io/posts/argocd-advanced-patterns/</link><pubDate>Tue, 27 May 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/argocd-advanced-patterns/</guid><description>From the four ApplicationSet Generators, to ordering database migrations with Sync Waves, to wiring Image Updater to ECR so it triggers the GitOps flow automatically, this article covers advanced ArgoCD usage and common pitfalls in enterprise multi-cluster environments.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/argocd-advanced-patterns/featured.jpg"/></item><item><title>Multi-Cluster Kubernetes Operations: Cross-Cluster Management and Unified Observability</title><link>https://socake.github.io/posts/multi-cluster-k8s-management/</link><pubDate>Wed, 21 May 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multi-cluster-k8s-management/</guid><description>Going from one cluster to many, operational complexity grows exponentially rather than linearly. This article summarizes our experience managing multiple K8s clusters across regions and environments: unified deployment with ArgoCD ApplicationSet, aggregating multi-cluster metrics with Thanos, and one real cross-cluster migration.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/multi-cluster-k8s-management/featured.jpg"/></item><item><title>Moving Workloads to the Cloud: Pitfalls and Lessons from Containerizing Legacy Applications</title><link>https://socake.github.io/posts/kubernetes-migration-practice/</link><pubDate>Mon, 19 May 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-migration-practice/</guid><description>Migrating a batch of Java applications from virtual machines to Kubernetes came with more pitfalls than expected. This article records the key decisions and lessons from the whole migration.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-migration-practice/featured.jpg"/></item><item><title>Kubernetes Cluster Upgrade Strategy: A Complete Practical Guide to Zero-Downtime Upgrades</title><link>https://socake.github.io/posts/kubernetes-upgrade-strategy/</link><pubDate>Wed, 14 May 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-upgrade-strategy/</guid><description>Upgrading a K8s cluster sounds simple but is full of pitfalls in practice: Helm failures caused by API deprecations, Admission Webhooks intercepting upgrade traffic, and service interruptions from misconfigured PDBs. Drawing on real upgrade experience, this article presents a reusable zero-downtime upgrade plan.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-upgrade-strategy/featured.jpg"/></item><item><title>K8s Gateway API: Say Goodbye to Ingress and Embrace Next-Generation Traffic Routing</title><link>https://socake.github.io/posts/kubernetes-gateway-api/</link><pubDate>Mon, 12 May 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gateway-api/</guid><description>Gateway API is GA, and it is time to seriously consider migrating from Ingress. This article covers Gateway API's design philosophy, practical configuration examples, and migration considerations.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gateway-api/featured.jpg"/></item><item><title>Kubernetes Storage in Production: PV/PVC/StorageClass Explained</title><link>https://socake.github.io/posts/kubernetes-storage-practice/</link><pubDate>Tue, 06 May 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/kubernetes-storage-practice/</guid><description>From storage fundamentals to production practice, covering StorageClass dynamic provisioning, installing the AWS EBS and EFS CSI drivers, StatefulSet storage management, online PVC expansion, troubleshooting cross-AZ mount failures, and data migration approaches for stateful services.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-storage-practice/featured.jpg"/></item><item><title>Migrating from Nginx Ingress to Traefik: Why and How</title><link>https://socake.github.io/posts/traefik-vs-nginx-ingress/</link><pubDate>Sun, 27 Apr 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/traefik-vs-nginx-ingress/</guid><description>Starting from real pain points, this article explains the fundamental differences between Traefik and Nginx Ingress and provides a migration path and configuration examples you can use as a reference.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/traefik-vs-nginx-ingress/featured.jpg"/></item><item><title>ETCD Operations in Practice: Deployment, Backup and Restore, and K8s Cluster Data Management</title><link>https://socake.github.io/posts/etcd-ops-practice/</link><pubDate>Sun, 13 Apr 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/etcd-ops-practice/</guid><description>ETCD is the lifeline of Kubernetes: all cluster state is stored there. From an operations perspective, this article walks through the full chain of deployment, backup, restore, and dynamic configuration updates, including several lessons from pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/etcd-ops-practice/featured.jpg"/></item><item><title>Building a Custom Kubernetes Admission Webhook: From Zero to Production</title><link>https://socake.github.io/posts/kubernetes-admission-webhook-dev/</link><pubDate>Sat, 12 Apr 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-admission-webhook-dev/</guid><description>The Kubernetes admission system is a powerful but fragile extension point. A webhook that goes down can stall all Pod creation in the cluster. Writing a webhook that can go to production is not hard; keeping it up through odd requests, certificate rotation, cluster upgrades, and traffic bursts is another matter. These are engineering notes from zero to production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/kubernetes-admission-webhook-dev/featured.jpg"/></item><item><title>Cluster API in Practice: Managing the Kubernetes Cluster Lifecycle Declaratively</title><link>https://socake.github.io/posts/cluster-api-infrastructure/</link><pubDate>Sat, 05 Apr 2025 14:15:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cluster-api-infrastructure/</guid><description>Building clusters with Terraform is the standard opening move, but once clusters multiply, the Terraform code volume and state management start to explode. Cluster API turns the "cluster" itself into a Kubernetes CRD: kubectl apply a Cluster object in the Management Cluster and you get a new cluster. It is an elegant way to have Kubernetes govern Kubernetes.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cluster-api-infrastructure/featured.jpg"/></item><item><title>KubeVirt in Production: A Complete Roadmap for Running Virtual Machines on Kubernetes</title><link>https://socake.github.io/posts/kubevirt-vm-on-kubernetes/</link><pubDate>Sat, 29 Mar 2025 10:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubevirt-vm-on-kubernetes/</guid><description>After Broadcom swallowed VMware, VMware alternatives landed on every infrastructure team's agenda. KubeVirt 1.8 is already a fairly mature option that runs real VMs inside Kubernetes: not lightweight containers, not microVMs, but full Windows/Linux VMs. These are notes from more than a year of hands-on use.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubevirt-vm-on-kubernetes/featured.jpg"/></item><item><title>Descheduler in Depth: The Right Way to Do Automatic Rebalancing in Kubernetes</title><link>https://socake.github.io/posts/descheduler-workload-rebalance/</link><pubDate>Sat, 22 Mar 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/descheduler-workload-rebalance/</guid><description>kube-scheduler decides only at the moment a Pod is created and ignores later changes in cluster state. After a few months your cluster ends up with a mix of hot and cold nodes, all Pods of one Deployment crowded onto a single node, and failure domains badly out of balance. Descheduler is the hand that revisits scheduling decisions after the fact and re-evaluates them periodically.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/descheduler-workload-rebalance/featured.jpg"/></item><item><title>Kueue Batch Scheduling in Practice: Making Kubernetes Truly Carry AI/HPC Workloads</title><link>https://socake.github.io/posts/kueue-batch-workload/</link><pubDate>Sat, 15 Mar 2025 09:40:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kueue-batch-workload/</guid><description>Put AI training jobs on Kubernetes and on day one you find the native scheduler falls far short: no queues, no quota, no gang scheduling, no fair sharing, and messy preemption semantics. Kueue is the official answer from sig-scheduling; it is closer to native Kubernetes than Volcano and more mature than a homegrown controller. These are notes from real production use.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kueue-batch-workload/featured.jpg"/></item><item><title>Prometheus Service Discovery Deep Dive: kubernetes_sd_configs in Practice</title><link>https://socake.github.io/posts/prometheus-service-discovery/</link><pubDate>Sat, 15 Mar 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-service-discovery/</guid><description>Maintaining Prometheus scrape targets by hand in a K8s environment is not realistic; kubernetes_sd_configs combined with relabel_configs is the core mechanism for solving this. This article explains the whole system thoroughly, from principles to practice.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-service-discovery/featured.jpg"/></item><item><title>vcluster Virtual Clusters in Practice: Multi-Tenancy a Hundred Times Stronger Than Namespaces</title><link>https://socake.github.io/posts/vcluster-virtual-cluster/</link><pubDate>Sat, 08 Mar 2025 15:10:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vcluster-virtual-cluster/</guid><description>A namespace is not an isolation boundary; it is only a naming convention. ClusterRole, CRD, webhook, and LimitRange all cut across namespaces. Real multi-tenancy requires each tenant to have its own kube-apiserver. vcluster makes this almost free: a complete Kubernetes control plane running inside a single namespace.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vcluster-virtual-cluster/featured.jpg"/></item><item><title>EFK 
Logging Stack in Practice: A Complete Fluent Bit + Fluentd + Elasticsearch Deployment</title><link>https://socake.github.io/posts/efk-logging-practice/</link><pubDate>Wed, 05 Mar 2025 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/efk-logging-practice/</guid><description>Explains why a two-tier Fluent Bit + Fluentd architecture is worth it, and provides complete ConfigMap configuration and an ES index template design you can use as a reference.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/efk-logging-practice/featured.jpg"/></item><item><title>Karmada Multi-Cluster Federation in Practice: Real-World Use of PropagationPolicy, OverridePolicy, and FailOver</title><link>https://socake.github.io/posts/karmada-multi-cluster/</link><pubDate>Sun, 02 Mar 2025 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/karmada-multi-cluster/</guid><description>If you run two or more Kubernetes clusters, shipping the same application across them will sooner or later become routine. Karmada is the most complete multi-cluster federation project among CNCF incubating projects, but its CRD design is deliberately restrained; using it well in production means sorting out four layers of semantics: resource propagation, per-cluster overrides, scheduling, and failover.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/karmada-multi-cluster/featured.jpg"/></item><item><title>Choosing a Kubernetes Log Collection Solution: From Technical Comparison to Production Rollout</title><link>https://socake.github.io/posts/k8s-logging-solution/</link><pubDate>Tue, 25 Feb 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-logging-solution/</guid><description>A record of how our team built a Kubernetes log collection system from scratch, the technical reasons we settled on Fluent Bit + Fluentd + Elasticsearch, and the pitfalls we hit in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-logging-solution/featured.jpg"/></item><item><title>ExternalDNS Multi-Cloud DNS Sync in Practice: From Route53 to Cloudflare to Alibaba Cloud DNS</title><link>https://socake.github.io/posts/external-dns-multi-provider/</link><pubDate>Sat, 22 Feb 2025 09:45:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/external-dns-multi-provider/</guid><description>Clicking DNS records into the Cloudflare console by hand inevitably breaks down as clusters and services grow. ExternalDNS is a controller that treats Kubernetes resources as the source of truth and the DNS provider as the executor. To use it well, though, you need to understand txtOwnerId, policy, the limits of each provider, and several pitfalls of sharing a zone across clusters.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/external-dns-multi-provider/featured.jpg"/></item><item><title>Secret Management in Practice: HashiCorp Vault + External Secrets Operator</title><link>https://socake.github.io/posts/vault-external-secrets/</link><pubDate>Thu, 20 Feb 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vault-external-secrets/</guid><description>base64 is not encryption. Starting from the risk of Secret leaks, this article gives a complete introduction to core Vault concepts, deployment on K8s, ESO integration configuration, and automatic rotation of dynamic database credentials.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vault-external-secrets/featured.jpg"/></item><item><title>cert-manager in Production: The Complete Path from Let's Encrypt to Enterprise Internal PKI</title><link>https://socake.github.io/posts/cert-manager-production/</link><pubDate>Sat, 15 Feb 2025 14:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cert-manager-production/</guid><description>cert-manager is standard on almost every Kubernetes cluster, but teams that actually run it in production all hit the same issues: blowing through Let's Encrypt rate limits, wildcard certificate renewals failing, internal services that need a private CA, and how to issue certificates for Istio / Gateway API. This piece turns a year of cert-manager operations pitfalls across 5 clusters into a hands-on manual.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cert-manager-production/featured.jpg"/></item><item><title>KEDA Event-Driven Autoscaling in Practice: From the Limits of HPA to Scaling on Real Business Signals</title><link>https://socake.github.io/posts/keda-event-driven-autoscaling/</link><pubDate>Sat, 08 Feb 2025 10:12:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/keda-event-driven-autoscaling/</guid><description>HPA only sees 
CPU and memory, but the real scaling signals in production are often Kafka lag, RabbitMQ queue depth, custom Prometheus metrics, or even cron. This article writes up KEDA's architecture, core CRDs, common scaler pitfalls, and operational actions as a senior engineer's memo: no theory, only the configuration that rescues you from alerts at 3 a.m.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/keda-event-driven-autoscaling/featured.jpg"/></item><item><title>GitLab CI/CD + Kubernetes: The Full Flow from Code Commit to Production Deployment</title><link>https://socake.github.io/posts/gitlab-ci-kubernetes/</link><pubDate>Sat, 01 Feb 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/gitlab-ci-kubernetes/</guid><description>From configuring the Kubernetes executor for GitLab Runner, to building images with kaniko instead of DinD, to deploying to production by updating a GitOps repository: a record of a complete CI/CD flow proven in a real AWS EKS environment.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/gitlab-ci-kubernetes/featured.jpg"/></item><item><title>Jenkins + Kubernetes: Dynamic Agent Builds and Pipeline Best Practices</title><link>https://socake.github.io/posts/jenkins-kubernetes-cicd/</link><pubDate>Sun, 26 Jan 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/jenkins-kubernetes-cicd/</guid><description>The resource waste and configuration sprawl of static Jenkins Slaves are solved at the root by dynamic Pod Agents on Kubernetes. This article records the full process of migrating Jenkins to K8s in a real production environment.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/jenkins-kubernetes-cicd/featured.jpg"/></item><item><title>Kubernetes RBAC Security Hardening in Practice: From Least Privilege to NetworkPolicy</title><link>https://socake.github.io/posts/kubernetes-rbac-security/</link><pubDate>Fri, 24 Jan 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-rbac-security/</guid><description>Starting from a real security incident, a systematic explanation of least-privilege Kubernetes RBAC design, when to use ClusterRole versus Role, how to analyze RBAC problems through audit logs, and how NetworkPolicy provides namespace-level and Pod-level network isolation.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/kubernetes-rbac-security/featured.jpg"/></item><item><title>Engineering Kubernetes YAML: Common Resource Templates and Production Best Practices</title><link>https://socake.github.io/posts/kubernetes-yaml-patterns/</link><pubDate>Sun, 19 Jan 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-yaml-patterns/</guid><description>Writing good Kubernetes YAML is less a matter of syntax than of accumulated engineering experience. This article catalogs common YAML anti-patterns seen in production and provides complete, usable templates for each kind of resource.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-yaml-patterns/featured.jpg"/></item><item><title>Kubernetes Resource Management in Practice: QoS, ResourceQuota, and VPA as a System</title><link>https://socake.github.io/posts/kubernetes-resource-management/</link><pubDate>Thu, 16 Jan 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-resource-management/</guid><description>I have seen too many production incidents caused by poor resource configuration: services without limits eating all node memory and triggering OOM evictions, requests set so high that Pods cannot be scheduled, and misconfigured HPAs that fail to scale. This article goes through K8s resource management from start to finish to help you build a complete approach to resource governance.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-resource-management/featured.jpg"/></item><item><title>Kubernetes Networking Deep Dive: A Complete Guide to CNI, kube-proxy, and NetworkPolicy</title><link>https://socake.github.io/posts/kubernetes-networking-deep-dive/</link><pubDate>Fri, 10 Jan 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-networking-deep-dive/</guid><description>K8s networking is a blind spot for many engineers: ignored while things work, and a mystery the moment something breaks. Troubleshooting repeated production network failures gave me a deep understanding of every layer of K8s networking. This article goes from the Pod network model to NetworkPolicy in practice to help you build a complete picture of K8s networking.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-networking-deep-dive/featured.jpg"/></item><item><title>Rook-Ceph on Kubernetes 
Operations in Practice: From Deployment to Failure Recovery</title><link>https://socake.github.io/posts/ceph-rook-kubernetes/</link><pubDate>Fri, 13 Dec 2024 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ceph-rook-kubernetes/</guid><description>When you need to offer block, file, and object storage on Kubernetes, Rook-Ceph has almost no alternatives. It is also the most complex of all K8s storage solutions. This article distills two years of operating a bare-metal Rook-Ceph production cluster, including postmortems of several times the cluster was pulled back from the brink.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ceph-rook-kubernetes/featured.jpg"/></item><item><title>Kubernetes from Scratch: A Beginner's Guide from an Engineer's Perspective</title><link>https://socake.github.io/posts/kubernetes-beginner-guide/</link><pubDate>Sun, 20 Oct 2024 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-beginner-guide/</guid><description>Docker Compose can run multiple containers, so why do you still need Kubernetes? Starting from that question, this article uses analogies to explain core concepts such as Pod/Deployment/Service/Ingress, and provides the most commonly used kubectl commands and a complete beginner deployment example.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-beginner-guide/featured.jpg"/></item></channel></rss>