<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>首页 on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/</link><description>Recent content in 首页 on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Thu, 30 Apr 2026 17:00:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/index.xml" rel="self" type="application/rss+xml"/><item><title>Playbook：多云告警体系合并实战 —— 从 200 条规则混战到统一治理</title><link>https://socake.github.io/playbook/multi-cloud-alerting-consolidation/</link><pubDate>Thu, 30 Apr 2026 17:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/multi-cloud-alerting-consolidation/</guid><description>做告警最常见的状态不是没告警，而是有两套甚至三套并行运行的告警系统，渠道交叉、规则重叠、silence 写得到处都是。本文给出从混乱状态收敛成统一治理的完整路径，包含可直接 1:1 复制部署的全量 yaml、脚本与配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/multi-cloud-alerting-consolidation/featured.jpg"/></item><item><title>Playbook：K8s 成本优化实战——Karpenter + 弹性占位 + 精细 NodePool 的组合拳</title><link>https://socake.github.io/playbook/k8s-cost-optimization-karpenter/</link><pubDate>Thu, 30 Apr 2026 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/k8s-cost-optimization-karpenter/</guid><description>Karpenter 不是开箱即用的省钱按钮。把它跑出真实收益，需要先做 NodePool 按 workload 分层，再处理 sandbox/gpu 这类不被 K8s 识别的工作负载，最后用 placeholder 占位 Pod 弥合「扩容慢但缩容快」的体验缺口。本文给出可直接 kubectl apply 的完整 yaml 与可 chmod +x 直接跑的脚本，覆盖安装、四类 NodePool、弹性占位、S3 Gateway Endpoint、MQ 降级、监控与告警。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/playbook/k8s-cost-optimization-karpenter/featured.jpg"/></item><item><title>Playbook：AWS Aurora 公网入口收紧的渐进路径——从 0.0.0.0/0 到零信任</title><link>https://socake.github.io/playbook/aurora-public-access-tightening/</link><pubDate>Thu, 30 Apr 2026 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/aurora-public-access-tightening/</guid><description>很多团队的生产 Aurora 长期挂着 0.0.0.0/0 全协议规则，加上几条来源不明的 IP 白名单。直接删规则会立刻打断跨 Region 服务和开发者本地调试，于是收紧工作年复一年被推迟。本文给出一条工程化路径：先用 Flow Logs + Athena + CloudTrail 摸清依赖，把跨 Region 业务切到 VPC Peering + Route53 Private Hosted Zone，再用 SSM Port Forwarding 替代开发者直连，最后原子切换 SG 并清理长尾白名单。每一步都给可直接执行的脚本和 IAM Policy。覆盖 4 个真实踩到的坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/aurora-public-access-tightening/featured.jpg"/></item><item><title>Playbook：自建 Headscale 零信任 Mesh，混合云内网访问的可执行落地方案</title><link>https://socake.github.io/playbook/zerotrust-mesh-headscale/</link><pubDate>Thu, 30 Apr 2026 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/zerotrust-mesh-headscale/</guid><description>数据库公网入口收紧后，开发调试需求仍然真实存在。SSM Port Forwarding 这类临时方案随着资源增加和团队扩大很快变得不可维护。Headscale + Tailscale 提供了一层统一的访问控制：单台 ECS 跑控制面，每个 K8s 集群部署 Subnet Router Pod，ACL 基于身份控制访问范围。本文给出从阿里云 ECS 创建命令、Caddyfile、完整 Headscale 配置、K8s 完整 manifest、运维脚本、客户端接入脚本到故障 runbook 的一整套可直接复制执行的工件，包含 5 个生产中真实踩到的坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/zerotrust-mesh-headscale/featured.jpg"/></item><item><title>Playbook：让 DDL 风险在合并前可见——CI/CD 双 Stage Schema Check 设计</title><link>https://socake.github.io/playbook/schema-check-dual-stage-pipeline/</link><pubDate>Thu, 30 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/playbook/schema-check-dual-stage-pipeline/</guid><description>很多团队把 schema diff 接进流水线后仍然出 DDL 事故——绿色构建 + warning 通知，没人读，等于没装。本文记录一套已经在 5 条主流水线（MySQL / PostgreSQL）上线两周的双 Stage 设计：pre stage 在 PR 阶段以 warning 模式跑，给开发者『提前修』的窗口；post stage 在合并到 PRE 后以 fail 模式跑，缺表/破坏性 DDL 直接阻塞 PRE → PROD 推进。给出完整 schema_check.py、ignore-rules.yaml、双 stage 云效 Flow YAML、GitHub Actions 等价实现、PR 评论机器人脚本、5 种 DDL 危险场景的 unit test、跨服务依赖图脚本，以及五个踩坑的完整修复与复现脚本。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/schema-check-dual-stage-pipeline/featured.jpg"/></item><item><title>Playbook：AWS MSK Serverless 迁回 Provisioned——什么时候、为什么、怎么迁</title><link>https://socake.github.io/playbook/msk-serverless-to-provisioned/</link><pubDate>Thu, 30 Apr 2026 13:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/msk-serverless-to-provisioned/</guid><description>MSK Serverless 看似按用量付费，实际上有一个常被忽视的最低消费层级：每个集群每月固定 $540 起、每个活跃消费者 IAM principal 还要按小时另收。对于流量长期远低于 1MB/s 的非生产环境，月费可以是同等吞吐 Provisioned 集群的 5-7 倍。本文记录将 4 个非生产环境从 MSK Serverless 迁回 Provisioned（kafka.t3.small × 2）的完整流程：成本计算脚本、aws kafka create-cluster 完整 JSON、IRSA 三 role 拆分、Java/Go/Python 三栈客户端配置、双集群双写五阶段切换、Schema Registry 导出导入、回滚脚本，以及踩过的多 IRSA、sarama、broker 数不可缩、Schema Registry 漏迁五个坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/msk-serverless-to-provisioned/featured.jpg"/></item><item><title>Playbook：K8s 集群三合一实战——QA / PRE / AI Sandbox 合并的完整可执行手册</title><link>https://socake.github.io/playbook/k8s-cluster-consolidation/</link><pubDate>Thu, 30 Apr 2026 13:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/k8s-cluster-consolidation/</guid><description>集群合并的好处显性，坏处隐性。本 Playbook 不再停留在『讲个思路』，每段 yaml 都是完整 manifest（含 Namespace / ServiceAccount / RBAC / Secret），每段脚本都能 chmod +x 
直接跑，每个步骤含前置 / 执行 / 验证 / 回滚四件套，并附一次真实事故的完整修复 SQL。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/k8s-cluster-consolidation/featured.jpg"/></item><item><title>Playbook：CI/CD 流水线模板化——3 个标准模板覆盖 80% 服务的端到端实战</title><link>https://socake.github.io/playbook/cicd-pipeline-templating/</link><pubDate>Thu, 30 Apr 2026 12:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/cicd-pipeline-templating/</guid><description>在 80+ 条流水线的体量下，每条服务自己拷一份 yaml 是工程债：字段命名漂移、改一次通知模板要改 80 处、新人不知道照哪条抄。本文把方案从「思路」推进到「拿来即用」：每个标准模板给完整 YAML（含 anchors / 变量组绑定 / 审批节点）、对应 GitHub Actions reusable workflow、Jenkins shared library；附 create-pipeline.sh 端到端脚本、变量组管理 API 调用、模板回归测试 dry-run；7 个云效官方文档不写的硬约束（schedule 不工作 / step envs 失效 / stage 间永远线性渲染等）每个含完整修复 + 通用结论。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/cicd-pipeline-templating/featured.jpg"/></item><item><title>Playbook：新建子环境的隔离 checklist——一次 ID 撞车污染 10 万条数据的事故复盘</title><link>https://socake.github.io/playbook/multi-environment-isolation-checklist/</link><pubDate>Thu, 30 Apr 2026 11:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/multi-environment-isolation-checklist/</guid><description>一个共用 RabbitMQ broker、共用 Aurora cluster、自增 id 都从 1 起步的新子环境上线 24 天，向已有环境的老用户项目里灌入了约 10 万条不属于他们的消息。本文复盘事故根因（4 件套同时成立才会爆雷），对比三种隔离方案的成本与风险，给出推荐架构（独立中间件 + 共享集群 + ID 起点错开），并把 7 条强制 checklist 沉淀为新子环境上线门槛，附完整可执行的 aws cli / kubectl / SQL / Go 中间件代码。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/multi-environment-isolation-checklist/featured.jpg"/></item><item><title>Playbook：每个 PR 一个独立环境——X-env header 路由 + 三层清理保障（深度版）</title><link>https://socake.github.io/playbook/per-pr-isolated-environment/</link><pubDate>Thu, 30 Apr 2026 11:00:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/per-pr-isolated-environment/</guid><description>QA 共享环境是并行开发的最大瓶颈。本 Playbook 给出一套已经在多个业务服务上线、跑通端到端真实代码改动验证的 PR 隔离方案：feature 分支推送即触发 deploy.py 在独立 namespace 拉起 PR Pod，入口域名继续用 QA 域名，HTTPRoute 按 X-env header 把流量切到对应 PR Pod，关闭 PR + 24h cron + 容量水位三层清理避免泄漏。本版（v2 深度版）相对 v1 重点强化了可执行性：所有 yaml 是完整 manifest（含 namespace / RBAC / Secret），所有脚本都能 chmod +x 直接跑，每步含前置 / 执行 / 验证 / 回滚四件套，配 5 个完整踩坑修复 + 2 张 mermaid 图。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/per-pr-isolated-environment/featured.jpg"/></item><item><title>Playbook：中等规模公司的完整 DevOps 流程——从代码提交到生产部署的全链路设计</title><link>https://socake.github.io/playbook/end-to-end-devops-pipeline/</link><pubDate>Thu, 30 Apr 2026 10:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/playbook/end-to-end-devops-pipeline/</guid><description>中等规模公司的 DevOps 体系最常见的两个症状：工具碎片化（GitLab + Jenkins + 手工 kubectl）和阶段衔接断裂（PR 慢、合并后部署延迟、监控滞后）。本文不讲入门概念，给一份真实可落地的全流程蓝图：开发者本机 → Git 提交 → 云效 / GitHub Actions CI（含 Schema Check 双 Stage）→ ECR/ACR → GitOps 仓库自动更新镜像 tag → ArgoCD 自动 sync → K8s 多集群部署 → Prometheus + Loki + 钉钉告警。每个环节标注用什么工具具体到版本号，关键集成点（ApplicationSet / Kustomize overlay / deploy.py）给完整可执行配置，配三个真实坑（GitOps 闭环缺口、deploy.py path-mode 切换混乱、多 ArgoCD 凭据路由），并给出 DORA 风格的 before/after 对比与采集脚本。可以把这篇当成整个 Playbook 系列的目录页。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/playbook/end-to-end-devops-pipeline/featured.jpg"/></item><item><title>Nacos 一文通：从零基础到生产精通的配置中心与服务发现实战</title><link>https://socake.github.io/posts/nacos-config-service-discovery-guide/</link><pubDate>Sat, 18 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/nacos-config-service-discovery-guide/</guid><description>Nacos 同时承担配置中心和服务注册发现两个核心职责，是 Spring Cloud 
Alibaba 生态的基石。本文系统梳理 Nacos 的数据模型、一致性协议、长轮询推送机制、临时实例健康检查、生产集群部署、多语言 SDK 接入、灰度发布、权限控制、常见故障排查（配置不生效/密码漂移/集群脑裂）以及云原生时代的定位，适合从入门到生产运维的完整参考。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/nacos-config-service-discovery-guide/featured.jpg"/></item><item><title>多云中间件横向速查与跨环境隔离实战</title><link>https://socake.github.io/posts/multi-cloud-middleware-and-isolation/</link><pubDate>Sat, 18 Apr 2026 13:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multi-cloud-middleware-and-isolation/</guid><description>做多云运维最容易的事就是把 AWS 那套思维原样搬到阿里云，然后在某次故障里发现选型完全错位。本文整理了一份 AWS↔阿里云中间件横向对照表，附上跨环境隔离强制 checklist 和高频运维命令速查，是我自己工作中反复回查的一份速记。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/multi-cloud-middleware-and-isolation/featured.jpg"/></item><item><title>Headscale 自建零信任 VPN：跨云多机房内网打通</title><link>https://socake.github.io/posts/headscale-zero-trust-vpn/</link><pubDate>Sun, 12 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/headscale-zero-trust-vpn/</guid><description>从 WireGuard 协议原理到 Headscale 完整部署，包括 DERP 自建、Subnet Router 配置、K8s 集成和 ACL 策略设计，用 Mesh VPN 替代传统堡垒机的完整实操指南。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/headscale-zero-trust-vpn/featured.jpg"/></item><item><title>Linux 火焰图实战：从采集到定位问题</title><link>https://socake.github.io/posts/linux-flame-graph-practice/</link><pubDate>Sun, 12 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-flame-graph-practice/</guid><description>CPU 飙高、响应慢、内存泄漏——这三类问题用火焰图都能快速定位。本文从怎么读火焰图开始，讲到 perf、async-profiler、py-spy 各自的适用场景，最后用一个真实的 Go 服务案例走完完整排查流程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/linux-flame-graph-practice/featured.jpg"/></item><item><title>MySQL 高可用实战：MGR + ProxySQL + Orchestrator 完整部署</title><link>https://socake.github.io/posts/mysql-ha-mgr-proxysql/</link><pubDate>Sun, 12 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mysql-ha-mgr-proxysql/</guid><description>详细讲解 MySQL 8.0 MGR 单主模式完整搭建过程、脑裂与 GTID 不一致处理方法、ProxySQL 读写分离配置和健康检查脚本、Orchestrator 自动故障转移与 ProxySQL 联动，以及 mysqld_exporter 监控集成。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mysql-ha-mgr-proxysql/featured.jpg"/></item><item><title>OpenCost 实战：Kubernetes 成本可见性与多团队费用分摊</title><link>https://socake.github.io/posts/opencost-kubernetes-cost-visibility/</link><pubDate>Sun, 12 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opencost-kubernetes-cost-visibility/</guid><description>Kubernetes 成本不透明是 FinOps 落地的最大障碍。本文通过 OpenCost 构建完整的成本可见性体系，涵盖部署集成、云厂商价格接入、按团队分摊、Grafana 看板、超预算告警和自动周报推送，提供可直接复用的配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/opencost-kubernetes-cost-visibility/featured.jpg"/></item><item><title>Argo Workflows 工作流实战：批处理与 ML Pipeline</title><link>https://socake.github.io/posts/argo-workflows-practice/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/argo-workflows-practice/</guid><description>Argo Workflows 是 Kubernetes 原生的工作流引擎，适合批处理和 ML Pipeline 场景。本文涵盖与 Airflow/Temporal 的选型对比、核心资源模型、三个完整实战（DAG 数据处理、ML 训练 Pipeline、定时备份）、资源管控（Semaphore/Node Selector）、Argo Events 事件驱动触发，以及 Prometheus 监控和常见问题处理。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/argo-workflows-practice/featured.jpg"/></item><item><title>Kubernetes cgroup v2 
迁移实践</title><link>https://socake.github.io/posts/kubernetes-cgroup-v2-migration/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-cgroup-v2-migration/</guid><description>K8s 1.25+ 默认启用 cgroup v2，MemoryQoS 和 PSI 等新特性只在 v2 支持。本文给出完整的节点迁移操作流程和常见问题解决方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-cgroup-v2-migration/featured.jpg"/></item><item><title>USE Method：系统性能分析方法论</title><link>https://socake.github.io/posts/use-method-performance-analysis/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/use-method-performance-analysis/</guid><description>随机尝试是性能排查的大敌。USE Method 用一个三维框架（使用率/饱和度/错误）把所有系统资源纳入统一分析体系，本文从原理到实战全面解析这套方法论，并提供 K8s 环境下的 PromQL 映射和工具链速查表。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/use-method-performance-analysis/featured.jpg"/></item><item><title>bpftrace 实战：线上问题排查的瑞士军刀</title><link>https://socake.github.io/posts/bpftrace-performance-debug/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/bpftrace-performance-debug/</guid><description>strace 太重、perf 太原始、BCC 工具集要装一堆依赖——bpftrace 是这三者之间的平衡点。本文用四个真实场景讲清楚 bpftrace 的工作方式，帮你把它变成日常排查工具。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/bpftrace-performance-debug/featured.jpg"/></item><item><title>FinOps 实践：Kubernetes 成本治理体系建设</title><link>https://socake.github.io/posts/finops-kubernetes-cost-governance/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/finops-kubernetes-cost-governance/</guid><description>一套完整的 Kubernetes FinOps 
落地路径：如何识别僵尸资源、配置成本分摊模型、利用 Karpenter 降低节点成本，以及如何将月账单从 $50k 压到 $30k。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/finops-kubernetes-cost-governance/featured.jpg"/></item><item><title>gRPC 微服务实践：协议、负载均衡与 Kubernetes 集成</title><link>https://socake.github.io/posts/grpc-microservices-practice/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grpc-microservices-practice/</guid><description>从协议原理到 Kubernetes 生产落地，系统梳理 gRPC 微服务的核心实践：Protobuf 向后兼容设计、拦截器链（日志/限流/OTel）、长连接负载不均问题（headless Service + round_robin vs Envoy L7）、健康检查 Probe 配置、以及 grpc-gateway REST 共存方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grpc-microservices-practice/featured.jpg"/></item><item><title>Kubernetes v1.33 新特性深度解读：GA 特性全览与升级指南</title><link>https://socake.github.io/posts/kubernetes-v133-features/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-v133-features/</guid><description>Kubernetes v1.33 带来了多项重量级 GA 特性，本文深入解读 In-Place Pod Vertical Scaling、原生 Sidecar Containers、Pod Scheduling Readiness、KMS v2 加密等核心变更，并提供实际可用的配置示例和生产升级建议。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-v133-features/featured.jpg"/></item><item><title>PostgreSQL 高可用实战：Patroni + HAProxy + etcd 完整部署指南</title><link>https://socake.github.io/posts/postgresql-ha-patroni/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/postgresql-ha-patroni/</guid><description>详解 Patroni 自动故障转移机制，手把手完成 etcd 三节点集群搭建、Patroni 完整配置（含 pg_hba.conf 托管）、HAProxy 读写分离配置，以及 kill primary 故障切换演练全过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/postgresql-ha-patroni/featured.jpg"/></item><item><title>Service Mesh 技术选型：Istio vs Cilium vs Linkerd 深度对比</title><link>https://socake.github.io/posts/service-mesh-comparison/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/service-mesh-comparison/</guid><description>Istio、Cilium Service Mesh、Linkerd 三种方案各有侧重：Istio 功能最全但最重，Cilium 基于 eBPF 性能最优，Linkerd 最轻量最易运维。本文从架构、性能、功能、运维四个维度全面拆解，帮助架构师做出有数据支撑的选型决策。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/service-mesh-comparison/featured.jpg"/></item><item><title>从 Ingress 迁移到 Gateway API：完整实操指南</title><link>https://socake.github.io/posts/ingress-to-gateway-api-migration/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ingress-to-gateway-api-migration/</guid><description>Gateway API 是 Kubernetes 官方下一代流量入口标准，解决了 Ingress 注解泛滥、跨实现不可移植等历史遗留问题。本文带你从零完成生产迁移。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ingress-to-gateway-api-migration/featured.jpg"/></item><item><title>Flagger 渐进式交付实战：金丝雀、蓝绿、A/B 与 Istio/NGINX/Gateway API 集成</title><link>https://socake.github.io/posts/flagger-progressive-delivery/</link><pubDate>Sat, 11 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/flagger-progressive-delivery/</guid><description>传统的 kubectl apply 发布方式让风险集中在发布那一刻。Flagger 通过指标驱动的渐进式切流（Canary Analysis），把风险摊到整个发布过程，异常自动回滚。本文基于官方文档，系统讲解 Canary CR 的完整字段、三种策略的配置模板、与 Istio/NGINX Ingress/Gateway API 的集成、自定义指标分析、自动化回滚机制，以及与 Argo Rollouts 的选型对比。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/flagger-progressive-delivery/featured.jpg"/></item><item><title>Temporal 
分布式工作流引擎实战：Worker、Activity、重试语义与生产部署</title><link>https://socake.github.io/posts/temporal-workflow-engine/</link><pubDate>Wed, 08 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/temporal-workflow-engine/</guid><description>长流程业务编排历来头疼——状态机、定时器、补偿、幂等、失败恢复都要自己写。Temporal 用 event sourcing + 确定性 replay 把这些问题一次性解决。本文以 Go SDK 为主线，从编程模型、Workflow 确定性约束、Activity 重试、Signal/Query、child workflow、到生产集群部署、监控和容量规划，给出可直接落地的范式。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/temporal-workflow-engine/featured.jpg"/></item><item><title>故障排查实录：Terway CRD IPAM IP 泄漏导致 Pod 无法调度</title><link>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/</link><pubDate>Tue, 07 Apr 2026 09:54:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/</guid><description>一次真实的连锁故障：节点磁盘告警 → Pod 被驱逐 → Terway IPAM IP 未正常回收 → 节点 ENI IP 耗尽 → 新 Pod 无法调度。排查链路、根因分析与修复方案完整记录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5-terway-ip%E6%B3%84%E6%BC%8F/featured.jpg"/></item><item><title>AutoGen 多 Agent 协作实战：从 Group Chat 到生产落地</title><link>https://socake.github.io/posts/autogen-multi-agent-practice/</link><pubDate>Mon, 06 Apr 2026 11:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/autogen-multi-agent-practice/</guid><description>AutoGen 把多 Agent 协作从玩具推向生产。本文讲清它的核心抽象 (Conversable Agent / Group Chat / 工具调用)，以及从 demo 到生产要处理的那些事。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/autogen-multi-agent-practice/featured.jpg"/></item><item><title>运维工程师的 AI 
工具实践</title><link>https://socake.github.io/posts/%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88ai%E5%B7%A5%E5%85%B7%E5%AE%9E%E8%B7%B5/</link><pubDate>Fri, 03 Apr 2026 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88ai%E5%B7%A5%E5%85%B7%E5%AE%9E%E8%B7%B5/</guid><description>从写 Shell 脚本、解读错误信息到辅助故障排查，分享运维工程师真实使用 AI 工具的高效场景、无效场景和 Prompt 技巧，以及各工具的适合场景。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88ai%E5%B7%A5%E5%85%B7%E5%AE%9E%E8%B7%B5/featured.jpg"/></item><item><title>LiteLLM 网关实战：多模型统一接入、限流、成本追踪与故障切换</title><link>https://socake.github.io/posts/litellm-gateway-proxy/</link><pubDate>Thu, 02 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/litellm-gateway-proxy/</guid><description>LiteLLM 是 LLM 多模型接入的事实标准。本文讲清它的 Proxy 模式部署、Model Config、Virtual Key、Router Fallback、成本追踪和踩坑实录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/litellm-gateway-proxy/featured.jpg"/></item><item><title>Tetragon eBPF 运行时安全实战：进程/网络/文件策略、与 Falco 的对比</title><link>https://socake.github.io/posts/tetragon-runtime-security/</link><pubDate>Thu, 02 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tetragon-runtime-security/</guid><description>Kubernetes 运行时安全是传统 EDR 难以覆盖的盲区。Tetragon 用 eBPF 在内核态采集进程、网络、文件和系统调用事件，并能在内核就地阻断攻击动作。本文从架构原理出发，讲解 TracingPolicy 语法、典型攻击检测（反弹 shell、提权、敏感文件访问）、阻断机制、性能开销，以及它与 Falco 的差异。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tetragon-runtime-security/featured.jpg"/></item><item><title>Ollama 在 K8s 上跑大模型：本地 LLM 的运维实践</title><link>https://socake.github.io/posts/ollama-kubernetes-llm/</link><pubDate>Mon, 30 
Mar 2026 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ollama-kubernetes-llm/</guid><description>在 Kubernetes 上部署 Ollama 运行本地大模型，从 GPU 调度到 CPU 推理降级，再到运维场景的实际集成，记录完整的踩坑与实践过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ollama-kubernetes-llm/featured.jpg"/></item><item><title>Ray Serve 模型部署实战：Deployment、DAG 编排与弹性伸缩</title><link>https://socake.github.io/posts/ray-serve-model-deployment/</link><pubDate>Sun, 29 Mar 2026 10:45:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ray-serve-model-deployment/</guid><description>Ray Serve 是被很多团队忽视的模型服务框架。它在复杂 DAG、异构资源、弹性伸缩上的表现远超单纯的 FastAPI。本文讲清它的核心抽象和生产落地。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ray-serve-model-deployment/featured.jpg"/></item><item><title>GitHub Copilot 工程化使用：不只是代码补全</title><link>https://socake.github.io/posts/github-copilot-engineering/</link><pubDate>Sat, 28 Mar 2026 12:51:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/github-copilot-engineering/</guid><description>GitHub Copilot不只是Tab补全。Copilot Chat的/fix /explain /tests命令、workspace上下文、Copilot for CLI、在Terraform/Dockerfile/K8s YAML中的实际用法，以及提高补全命中率的技巧。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/github-copilot-engineering/featured.jpg"/></item><item><title>Volcano 批调度实战：AI 训练集群的 Gang Scheduling、队列与抢占</title><link>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</link><pubDate>Wed, 25 Mar 2026 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</guid><description>K8s 默认调度器对 AI 训练极不友好。Volcano 把 HPC 调度理念搬进 K8s：Gang Scheduling、Queue、Fairshare、Preemption、拓扑亲和。这篇讲清楚它在 AI 
训练集群的落地。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/volcano-gpu-batch-scheduling/featured.jpg"/></item><item><title>Cursor AI 编程助手深度使用指南</title><link>https://socake.github.io/posts/cursor-ai-editor-guide/</link><pubDate>Wed, 25 Mar 2026 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cursor-ai-editor-guide/</guid><description>Cursor不是装了AI插件的VSCode，它重新设计了人机协作的交互模型。本文拆解Tab补全、@上下文引用、Composer、Agent模式、.cursorrules配置，并以重构运维脚本为例演示完整工作流。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cursor-ai-editor-guide/featured.jpg"/></item><item><title>ComfyUI + Stable Diffusion：工作流自动化图像生成</title><link>https://socake.github.io/posts/comfyui-stable-diffusion-workflow/</link><pubDate>Mon, 23 Mar 2026 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/comfyui-stable-diffusion-workflow/</guid><description>对比SDXL/FLUX/SD3生态选型，讲清楚ComfyUI vs WebUI如何选，然后深入ComfyUI安装、节点图工作流设计、常用节点配置，重点讲API无头调用和服务器端批量生成部署方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/comfyui-stable-diffusion-workflow/featured.jpg"/></item><item><title>FluxCD vs ArgoCD 深度对比与迁移实战：架构、语义、多租户与选型决策</title><link>https://socake.github.io/posts/fluxcd-vs-argocd-migration/</link><pubDate>Sun, 22 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/fluxcd-vs-argocd-migration/</guid><description>GitOps 的两条主流路线——FluxCD 与 ArgoCD——在架构、语义、运维成本和扩展性上有显著差异。本文基于官方文档和生产实战，按同步模型、应用抽象、多租户隔离、Helm 支持、可观测性、扩展机制逐项对比，给出选型决策树，并提供一套可复用的从 ArgoCD 迁移到 FluxCD 的操作手册。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/fluxcd-vs-argocd-migration/featured.jpg"/></item><item><title>Unsloth 高效微调实战：单卡 QLoRA 
的极致性能与内部原理</title><link>https://socake.github.io/posts/unsloth-efficient-finetuning/</link><pubDate>Sun, 22 Mar 2026 09:15:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/unsloth-efficient-finetuning/</guid><description>Unsloth 用手写 Triton kernel 把单卡 LoRA 微调速度和显存压到极致。本文讲清 Unsloth 的原理、和 LLaMA Factory/TRL 的组合用法，以及真实使用的坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/unsloth-efficient-finetuning/featured.jpg"/></item><item><title>Linux 内核网络参数深度调优：高并发场景实战</title><link>https://socake.github.io/posts/linux-kernel-network-tuning/</link><pubDate>Fri, 20 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-kernel-network-tuning/</guid><description>在高并发场景下，Linux 默认内核参数往往成为系统瓶颈。本文从原理出发，系统讲解 TCP backlog、TIME_WAIT、keepalive、内存缓冲区、conntrack、网卡队列（RSS/RPS/RFS）的调优方法，并提供 K8s 节点专属的 sysctl DaemonSet 方案和完整的压测验证流程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/linux-kernel-network-tuning/featured.jpg"/></item><item><title>FastGPT 知识库问答系统：从部署到应用</title><link>https://socake.github.io/posts/fastgpt-knowledge-base-practice/</link><pubDate>Fri, 20 Mar 2026 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/fastgpt-knowledge-base-practice/</guid><description>FastGPT是专注知识库问答的开源平台，相比Dify上手更快。本文覆盖MongoDB+PgVector部署、知识库创建与文档导入、Flow工作流配置、相似度阈值调优、API接入钉钉，以及运维知识库的实战案例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/fastgpt-knowledge-base-practice/featured.jpg"/></item><item><title>LLaMA Factory 微调工具链实战：从数据准备到 LoRA 合并的全流程</title><link>https://socake.github.io/posts/llamafactory-finetuning/</link><pubDate>Wed, 18 Mar 2026 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/llamafactory-finetuning/</guid><description>LLaMA Factory 把大模型微调的很多 trick 工程化了。本文按一个完整项目的节奏讲：数据、SFT、LoRA、DPO、合并、评估和常见坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llamafactory-finetuning/featured.jpg"/></item><item><title>容器镜像构建优化：BuildKit、多阶段构建与供应链安全</title><link>https://socake.github.io/posts/container-image-build-optimization/</link><pubDate>Wed, 18 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/container-image-build-optimization/</guid><description>深入剖析容器镜像构建优化的每个环节：BuildKit 并行构建与 Secrets 注入、Go/Python/Node.js 多阶段 Dockerfile 模板、--mount=type=cache 与远程缓存、Distroless vs Alpine 选型、dive 分析层内容，以及完整的供应链安全闭环（syft SBOM + Cosign 签名 + K8s 准入控制验签）。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/container-image-build-optimization/featured.jpg"/></item><item><title>ClickHouse 生产运维实战：集群部署、副本分片、性能调优与故障排查</title><link>https://socake.github.io/posts/clickhouse-ops-practice/</link><pubDate>Sun, 15 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/clickhouse-ops-practice/</guid><description>ClickHouse 高吞吐 OLAP 能力背后有一套独特的运维范式：ReplicatedMergeTree、ZooKeeper/Keeper、分布式表、物化视图、TTL、MergeTree 家族选型。本文按生产落地路径，从集群规划、副本分片、写入优化、查询调优、物化视图到慢查询排查，配套可直接复用的 SQL 与运维脚本。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/clickhouse-ops-practice/featured.jpg"/></item><item><title>SGLang 结构化生成实战：RadixAttention、约束解码与多轮对话优化</title><link>https://socake.github.io/posts/sglang-structured-generation/</link><pubDate>Sat, 14 Mar 2026 16:45:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sglang-structured-generation/</guid><description>SGLang 是被低估的 LLM 推理框架，RadixAttention 
对多轮对话和 Agent 场景收益巨大。本文讲清 SGLang 的核心机制、前端 DSL、约束解码、部署方式和踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sglang-structured-generation/featured.jpg"/></item><item><title>Dify 私有化部署与 RAG 应用构建实战</title><link>https://socake.github.io/posts/dify-self-hosted-rag-practice/</link><pubDate>Thu, 12 Mar 2026 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/dify-self-hosted-rag-practice/</guid><description>Dify是当前私有化部署最成熟的LLM应用构建平台。本文覆盖Docker Compose部署、多模型Provider配置、知识库创建与切片调优、RAG对话应用构建、工作流编排，以及API发布与生产监控。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/dify-self-hosted-rag-practice/featured.jpg"/></item><item><title>Triton Inference Server 生产部署：模型编排、动态批处理与多框架混部</title><link>https://socake.github.io/posts/triton-inference-server-production/</link><pubDate>Wed, 11 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/triton-inference-server-production/</guid><description>把 Triton 从一个陌生的 NVIDIA 推理服务器讲清楚：model repository、backend、动态批处理、ensemble、BLS、Python backend、生产监控和踩坑实录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/triton-inference-server-production/featured.jpg"/></item><item><title>多模态大模型实践：图像理解与视觉分析</title><link>https://socake.github.io/posts/multimodal-llm-vision-practice/</link><pubDate>Mon, 09 Mar 2026 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multimodal-llm-vision-practice/</guid><description>覆盖主流多模态模型选型对比、图像理解API调用方式、OCR/文档理解/图表解析等实际场景，以及一个完整的运维场景实战：用多模态模型自动分析Grafana截图并生成告警摘要。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/multimodal-llm-vision-practice/featured.jpg"/></item><item><title>Prompt Engineering 
完全指南：从入门到工程化</title><link>https://socake.github.io/posts/prompt-engineering-guide/</link><pubDate>Mon, 09 Mar 2026 11:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prompt-engineering-guide/</guid><description>Prompt Engineering 不是玄学，而是有规律可循的工程实践。从基础技巧到企业级工程化，本文覆盖提示词设计的完整方法论，包括 A/B 测试、版本管理、失效模式分析，以及在生产系统中管理提示词的最佳实践。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prompt-engineering-guide/featured.jpg"/></item><item><title>TensorRT-LLM 推理加速实战：从 engine 编译到 kernel 调优</title><link>https://socake.github.io/posts/tensorrt-llm-inference/</link><pubDate>Sat, 07 Mar 2026 14:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tensorrt-llm-inference/</guid><description>TensorRT-LLM 是 NVIDIA 端到端推理栈的关键一环，这篇把 engine 编译流程、plugin 机制、量化策略、inflight batching、kernel 调优和生产踩坑都梳理清楚。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tensorrt-llm-inference/featured.jpg"/></item><item><title>OpenAI API 工程化实践：从 Hello World 到生产</title><link>https://socake.github.io/posts/openai-api-engineering/</link><pubDate>Tue, 03 Mar 2026 11:41:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/openai-api-engineering/</guid><description>OpenAI API 是大多数 LLM 应用开发者的起点，但从 Hello World 到真正可靠的生产系统，中间有很多工程细节需要处理。本文覆盖 Function Calling、Structured Output、Batch API、Embeddings 的完整实践，以及速率限制、错误处理和成本控制的系统方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/openai-api-engineering/featured.jpg"/></item><item><title>vLLM 多机多卡分布式推理：Tensor Parallel 调优与踩坑实录</title><link>https://socake.github.io/posts/vllm-multi-node-distributed/</link><pubDate>Tue, 03 Mar 2026 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/vllm-multi-node-distributed/</guid><description>从单机 8 卡讲到多机多卡，把 vLLM 的 TP/PP 拆分、Ray 启动方式、NCCL 调优、PagedAttention 显存核算和常见翻车场景串成一条完整的落地路径。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vllm-multi-node-distributed/featured.jpg"/></item><item><title>MCP 协议实战：给 AI Agent 接上运维工具</title><link>https://socake.github.io/posts/mcp-protocol-devops/</link><pubDate>Fri, 27 Feb 2026 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mcp-protocol-devops/</guid><description>Model Context Protocol 让 AI 能够标准化地调用外部工具。本文用 Python 实现一个运维 MCP Server，接入 kubectl、Prometheus、Loki，让 AI 直接查集群状态。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mcp-protocol-devops/featured.jpg"/></item><item><title>Claude Code CLI 使用指南：AI 驱动的终端编程助手</title><link>https://socake.github.io/posts/claude-code-cli-guide/</link><pubDate>Thu, 26 Feb 2026 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/claude-code-cli-guide/</guid><description>Claude Code是Anthropic推出的终端AI编程助手，不同于编辑器插件，它在终端里直接操作文件、执行命令、理解整个代码库。本文覆盖安装配置、核心交互模式、CLAUDE.md自定义、K8s排障和自动化脚本场景。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/claude-code-cli-guide/featured.jpg"/></item><item><title>自动化发版实战：semantic-release、release-please、changesets 对比选型</title><link>https://socake.github.io/posts/release-automation-changelog/</link><pubDate>Wed, 25 Feb 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/release-automation-changelog/</guid><description>手动维护 CHANGELOG.md、手动打 git tag、手动写 release notes——这些都是十年前的工作方式。现代发版应该是：每次合并 PR 时工具自动决定下一个版本号、自动生成 changelog、自动打 tag、自动发布。本文讲清楚三种方案的差异和选型。</description><media:content 
xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/release-automation-changelog/featured.jpg"/></item><item><title>Claude API 开发完全指南：从调用到生产应用</title><link>https://socake.github.io/posts/claude-api-development-guide/</link><pubDate>Tue, 24 Feb 2026 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/claude-api-development-guide/</guid><description>Claude API 的设计哲学和 OpenAI 有些不同，但一旦理解其模式，就会发现它在长文本、代码生成和工具调用上非常可靠。本文覆盖从 SDK 配置到 Prompt Caching、Tool Use、Vision 的完整开发实践，以及生产中的错误处理与成本控制策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/claude-api-development-guide/featured.jpg"/></item><item><title>Embedding 模型选型与优化实战：从 BGE 到 OpenAI Embedding</title><link>https://socake.github.io/posts/embedding-model-selection-guide/</link><pubDate>Sat, 21 Feb 2026 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/embedding-model-selection-guide/</guid><description>系统对比 2026 年主流 Embedding 模型，从原理到工程实践，覆盖选型决策、缓存设计和批量优化</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/embedding-model-selection-guide/featured.jpg"/></item><item><title>Renovate 依赖升级机器人：从零到生产配置</title><link>https://socake.github.io/posts/renovate-bot-dependency-upgrade/</link><pubDate>Thu, 19 Feb 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/renovate-bot-dependency-upgrade/</guid><description>Dependabot 足够简单但能力单薄，Snyk 聚焦安全漏洞。Renovate 是介于两者之间的中庸选择：能升级一切、能分组、能调度、能自动合并、能 self-host。本文是完整的生产配置指南。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/renovate-bot-dependency-upgrade/featured.jpg"/></item><item><title>LangGraph 工作流编排：构建有状态的 AI 应用</title><link>https://socake.github.io/posts/langgraph-workflow-orchestration/</link><pubDate>Sun, 15 
Feb 2026 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/langgraph-workflow-orchestration/</guid><description>从LangChain Chain的局限出发，讲清楚LangGraph的状态机模型、Graph/Node/Edge的设计方式，以及条件分支、循环、人工介入、Checkpoint持久化的工程实现，最后用一个运维诊断工作流把所有概念串起来。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/langgraph-workflow-orchestration/featured.jpg"/></item><item><title>Langfuse：LLM 应用可观测性平台实战</title><link>https://socake.github.io/posts/langfuse-llm-observability/</link><pubDate>Sat, 14 Feb 2026 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/langfuse-llm-observability/</guid><description>讲清楚为什么LLM应用必须具备可观测性，以及如何用Langfuse从链路追踪、Prompt版本管理、评估实验到成本分析做到全覆盖，包含Docker自托管部署和Python SDK完整集成示例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/langfuse-llm-observability/featured.jpg"/></item><item><title>Terragrunt 规模化 Terraform 工程化：从 DRY 到 Stacks</title><link>https://socake.github.io/posts/terragrunt-terraform-at-scale/</link><pubDate>Sat, 14 Feb 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/terragrunt-terraform-at-scale/</guid><description>Terraform 写到 10 个 state 以上就开始痛苦：重复的 provider 配置、散落的变量、无法跨 state 引用、run-all 时的依赖混乱。Terragrunt 是 Terraform 的 wrapper，解决的就是&amp;lsquo;大规模&amp;rsquo;这个词——本文讲清楚它怎么用。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/terragrunt-terraform-at-scale/featured.jpg"/></item><item><title>LangChain 从入门到实战：构建 LLM 应用的工程框架</title><link>https://socake.github.io/posts/langchain-practical-guide/</link><pubDate>Mon, 09 Feb 2026 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/langchain-practical-guide/</guid><description>LangChain 是构建 LLM 
应用最流行的框架，但也是踩坑最多的框架之一。本文从 LCEL 表达式、ReAct Agent、LangGraph 工作流到生产部署，梳理真正有用的部分，并指出哪些功能实际工程中应该避免。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/langchain-practical-guide/featured.jpg"/></item><item><title>Pulumi vs Terraform vs OpenTofu：2026 年 IaC 选型深度对比</title><link>https://socake.github.io/posts/pulumi-vs-terraform/</link><pubDate>Mon, 09 Feb 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/pulumi-vs-terraform/</guid><description>2023 年之后 IaC 世界变了：HashiCorp 把 Terraform 改成 BSL，Linux Foundation 接管了 OpenTofu。Pulumi 依然在代码式 IaC 的路上坚持。团队选型时面对的不是 Terraform 一家独大，而是三条技术路线的真实对比。本文试图给出一个不偏不倚的答案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/pulumi-vs-terraform/featured.jpg"/></item><item><title>RAG 评估体系：RAGAS 指标与幻觉检测实践</title><link>https://socake.github.io/posts/rag-evaluation-ragas/</link><pubDate>Thu, 05 Feb 2026 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/rag-evaluation-ragas/</guid><description>RAG 系统上线后，&amp;lsquo;感觉回答质量还不错&amp;rsquo;不是一个可持续的评估方式。RAGAS 提供了一套可量化的评估框架，让你能追踪 Faithfulness、Answer Relevancy 等指标随时间的变化，并在每次改动后自动验证系统质量没有退化。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/rag-evaluation-ragas/featured.jpg"/></item><item><title>Advanced RAG：超越 Naive RAG 的高级检索增强技术</title><link>https://socake.github.io/posts/advanced-rag-techniques/</link><pubDate>Wed, 04 Feb 2026 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/advanced-rag-techniques/</guid><description>系统拆解 Naive RAG 的三类失败模式，提供混合检索、HyDE、查询改写、Parent-Child 分块等高级技术的完整实现</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/advanced-rag-techniques/featured.jpg"/></item><item><title>Earthly 在 Monorepo 的构建统一：Earthfile + Satellites 实战</title><link>https://socake.github.io/posts/earthly-buildfile-monorepo/</link><pubDate>Tue, 03 Feb 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/earthly-buildfile-monorepo/</guid><description>Bazel 复杂度太高，Makefile 表达力不够，Dockerfile 只能构建一个镜像——Earthly 填的就是这个缝：像 Dockerfile 一样熟悉，像 Makefile 一样组合，像 Bazel 一样可并发、可缓存、可复用。本文讲清楚它在 Monorepo 里的真实位置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/earthly-buildfile-monorepo/featured.jpg"/></item><item><title>大模型赋能运维：LLM 在故障排查和自动化中的实际应用</title><link>https://socake.github.io/posts/aiops-llm-devops/</link><pubDate>Sat, 31 Jan 2026 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/aiops-llm-devops/</guid><description>LLM 不能替代运维工程师，但确实能把重复性、低价值的工作自动化掉。本文分享我在实际工作中用 Claude 落地的几个场景。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/aiops-llm-devops/featured.jpg"/></item><item><title>AI Agent 设计模式：从单步到复杂工作流</title><link>https://socake.github.io/posts/ai-agent-design-patterns/</link><pubDate>Thu, 29 Jan 2026 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ai-agent-design-patterns/</guid><description>Agent不是更智能的ChatGPT调用，它是一个能自主规划和执行多步骤任务的循环系统。本文拆解ReAct推理循环、Tool调用设计原则、Multi-Agent协作模式、Human-in-the-loop设计，以及告警分析Agent和巡检Agent的实战实现。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ai-agent-design-patterns/featured.jpg"/></item><item><title>Nix + devcontainer：彻底终结 works on my machine</title><link>https://socake.github.io/posts/nix-devcontainer-reproducible-env/</link><pubDate>Wed, 28 Jan 2026 10:00:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/nix-devcontainer-reproducible-env/</guid><description>新同事入职第一天配环境要花一天，CI 和本地构建结果不一致，升级 Node 16 到 20 引发连锁故障——这些痛都源于&amp;lsquo;环境不是代码&amp;rsquo;。Nix 把工具链当成代码版本化，和 direnv/devcontainer 配合能做到 &amp;lsquo;git clone 后 10 秒进入完整可用环境&amp;rsquo;。本文是完整落地教程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/nix-devcontainer-reproducible-env/featured.jpg"/></item><item><title>LLM 应用安全：Prompt Injection 防御与 AI Guardrails 实战</title><link>https://socake.github.io/posts/llm-security-guardrails/</link><pubDate>Fri, 23 Jan 2026 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-security-guardrails/</guid><description>我们的 AI 客服系统曾被一个用户用一句话绕过所有限制，让它泄露了内部知识库的敏感信息。这篇文章系统梳理 LLM 应用的安全威胁模型，以及我们在生产系统中实施的防御层次。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-security-guardrails/featured.jpg"/></item><item><title>Dagger 实战：用代码而不是 YAML 编写 CI/CD</title><link>https://socake.github.io/posts/dagger-programmable-cicd/</link><pubDate>Wed, 21 Jan 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/dagger-programmable-cicd/</guid><description>每次迁移 CI 平台（Jenkins → GitLab → GitHub Actions → Tekton），业务流水线都要重写一遍。Dagger 的思路是：把流水线写成可移植的代码（Go/Python/TS），底层引擎负责执行和缓存，CI 平台只是调用方。本文讲清楚它怎么工作、什么时候值得引入。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/dagger-programmable-cicd/featured.jpg"/></item><item><title>LLM 成本优化实战：从 Token 预算到模型路由</title><link>https://socake.github.io/posts/llm-cost-optimization/</link><pubDate>Mon, 19 Jan 2026 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-cost-optimization/</guid><description>我们的 AI 功能上线第一个月，LLM API 
账单是 $18,000。通过模型路由、Prompt Caching 和 Batch API，第三个月降到了 $3,200。这篇文章记录具体怎么做到的。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-cost-optimization/featured.jpg"/></item><item><title>LLM Tool Use 完全指南：Function Calling 设计模式与生产实践</title><link>https://socake.github.io/posts/llm-tool-use-function-calling/</link><pubDate>Sun, 18 Jan 2026 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-tool-use-function-calling/</guid><description>从工程视角深入 LLM Tool Use：覆盖 OpenAI 与 Claude API 差异、工具 Schema 设计、并发调用、错误恢复，附完整运维助手代码示例</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-tool-use-function-calling/featured.jpg"/></item><item><title>Tekton Pipelines 企业级落地：从 Task 抽象到供应链签名</title><link>https://socake.github.io/posts/tekton-pipelines-production/</link><pubDate>Thu, 15 Jan 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tekton-pipelines-production/</guid><description>Jenkins 扛不动 K8s Native 的调度压力，GitLab Runner 又太 monolithic。Tekton 把 &amp;lsquo;CI job&amp;rsquo; 拆成 Task + Pipeline + PipelineRun 三层 CRD，所有执行都是 Pod，天然贴合 K8s。本文讲清楚它在企业里该怎么用——以及怎么避免把它用成 YAML 地狱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tekton-pipelines-production/featured.jpg"/></item><item><title>LLM 微调入门：LoRA 让大模型适配私有场景</title><link>https://socake.github.io/posts/llm-finetuning-lora-practice/</link><pubDate>Wed, 14 Jan 2026 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-finetuning-lora-practice/</guid><description>什么时候该微调、什么时候该用提示工程？本文给出决策框架，然后用Unsloth+QLoRA实战微调Qwen2.5-7B，覆盖数据格式、训练监控、权重合并、部署到vLLM测试，以及10个真实踩坑记录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/llm-finetuning-lora-practice/featured.jpg"/></item><item><title>LLM 生产服务化：vLLM 部署与 GPU 推理优化实战</title><link>https://socake.github.io/posts/llm-production-serving-vllm/</link><pubDate>Tue, 13 Jan 2026 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-production-serving-vllm/</guid><description>团队把 Ollama 搬上生产后，高峰期请求排队超过 30 秒，用户纷纷反映 AI 功能不可用。这篇文章记录我们迁移到 vLLM 的全过程，包括 PagedAttention、Continuous Batching 原理，以及 Kubernetes GPU 部署的完整配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-production-serving-vllm/featured.jpg"/></item><item><title>2026 大模型全景：主力模型横评与选型指南</title><link>https://socake.github.io/posts/llm-landscape-2025/</link><pubDate>Fri, 09 Jan 2026 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-landscape-2025/</guid><description>GPT-5.4、Claude Opus 4.6、Gemini 2.5 Pro、Llama 4 Scout、DeepSeek V3.2——2026年4月的大模型格局已经和一年前完全不同。本文从工程师视角梳理当前主力模型的真实规格与适用边界，给出场景化选型矩阵，并讨论开源追平闭源、推理模型标配化、agent workload 崛起这三个2026年的核心判断。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-landscape-2025/featured.jpg"/></item><item><title>ko 实战：无 Dockerfile 构建 Go 容器镜像的正确姿势</title><link>https://socake.github.io/posts/ko-go-image-build/</link><pubDate>Fri, 09 Jan 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ko-go-image-build/</guid><description>同样是构建 Go 镜像，用 Dockerfile + BuildKit 要 2-3 分钟，用 ko 只需要 5-20 秒。差距来自 ko 不走 daemon、不写 tar、直接把 Go 编译产物塞进 OCI manifest。本文讲清楚这套 &amp;lsquo;Dockerfile-less&amp;rsquo; 构建到底怎么落地到生产，以及什么时候不该用它。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ko-go-image-build/featured.jpg"/></item><item><title>BuildKit 缓存生产实战：从多阶段到远端 Registry 
Cache</title><link>https://socake.github.io/posts/buildkit-cache-production/</link><pubDate>Sat, 03 Jan 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/buildkit-cache-production/</guid><description>BuildKit 的缓存体系看似简单一行 --cache-to，实际生产里坑极多：mode=max 在多架构下的 manifest 行为、registry 后端每层 0.3s 的验证开销、cache mount 在 --cache-to=registry 下不被导出的限制、GHA 后端 10GB 上限……本文基于真实 CI 流水线的调优记录，给出一套可复制的生产配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/buildkit-cache-production/featured.jpg"/></item><item><title>基于 Error Budget 的 Prometheus 告警设计——燃烧率告警实战</title><link>https://socake.github.io/posts/prometheus-error-budget-alerting/</link><pubDate>Thu, 25 Dec 2025 10:40:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-error-budget-alerting/</guid><description>错误率告警有一个致命问题：它不告诉你问题有多紧急。1% 的错误率，持续 2 小时和持续 10 分钟，对 SLO 的威胁完全不同。燃烧率告警从 Error Budget 消耗速度出发，让每一次告警都携带&amp;quot;紧急程度&amp;quot;信息。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-error-budget-alerting/featured.jpg"/></item><item><title>告警带图实战：Grafana Render + 钉钉推送趋势图</title><link>https://socake.github.io/posts/prometheus-alert-with-image/</link><pubDate>Tue, 23 Dec 2025 09:54:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-alert-with-image/</guid><description>收到告警只有一行数字，还要登录 Grafana 才能看趋势图——这是告警体验最大的痛点之一。本文介绍如何将 Grafana Image Renderer 与 Alertmanager Webhook 结合，实现告警消息自动附带趋势图的完整方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-alert-with-image/featured.jpg"/></item><item><title>Prometheus 进程监控：process-exporter 实战与告警配置</title><link>https://socake.github.io/posts/prometheus-process-monitoring/</link><pubDate>Thu, 18 Dec 
2025 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-process-monitoring/</guid><description>K8s 有完善的 Pod 监控体系，但裸机和 VM 上运行的进程如何监控？本文介绍 process-exporter 的部署与配置实践，覆盖进程组匹配、核心指标、告警规则设计及实际踩坑经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-process-monitoring/featured.jpg"/></item><item><title>Kibana 实战：从日志查询到 Dashboard 可视化的完整指南</title><link>https://socake.github.io/posts/kibana-visualization-guide/</link><pubDate>Sat, 13 Dec 2025 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kibana-visualization-guide/</guid><description>Kibana 是我们 ELK 体系里使用频率最高的工具。这篇文章把我在实际运维中积累的 Kibana 使用技巧整理成体系，从 Discover 查询到 Dashboard 制作，再到 ILM 管理。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kibana-visualization-guide/featured.jpg"/></item><item><title>高级运维/DevOps 工程师面试题精选：系统设计与深度考察</title><link>https://socake.github.io/posts/devops-senior-interview/</link><pubDate>Thu, 11 Dec 2025 12:51:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devops-senior-interview/</guid><description>高级运维面试考什么？本文整理 5 道系统设计题和 10 道深度技术题，每题给出答题框架。从监控体系设计到 K8s 调度器原理，从生产事故复盘到新技术引入决策，帮你建立完整的回答思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-senior-interview/featured.jpg"/></item><item><title>Dockerfile 编写最佳实践</title><link>https://socake.github.io/docs/cicd/dockerfile%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5/</link><pubDate>Tue, 09 Dec 2025 17:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/dockerfile%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5/</guid><description>系统讲解 Dockerfile 每条指令的最佳用法、ENTRYPOINT vs CMD 的组合方式、PID 1 信号处理问题，附 Go 服务和 Python 
服务完整生产级示例。</description></item><item><title>云原生存储方案选型：EFS/EBS/OSS 实践</title><link>https://socake.github.io/docs/kubernetes/%E4%BA%91%E5%8E%9F%E7%94%9F%E5%AD%98%E5%82%A8%E6%96%B9%E6%A1%88/</link><pubDate>Tue, 09 Dec 2025 17:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/%E4%BA%91%E5%8E%9F%E7%94%9F%E5%AD%98%E5%82%A8%E6%96%B9%E6%A1%88/</guid><description>系统梳理 AWS EBS、EFS、S3 在 Kubernetes 中的使用方式，覆盖 StorageClass 配置、动态供给、性能测试与数据备份策略，附阿里云 NAS/OSS 对比。</description></item><item><title>AWS IAM 权限管理实践</title><link>https://socake.github.io/docs/kubernetes/aws-iam%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/aws-iam%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>从 IAM 核心概念到 IRSA/GitHub Actions OIDC 联合身份，再到权限边界与 SCP，系统梳理 AWS IAM 在生产环境的最佳实践。</description></item><item><title>发版回滚 SOP</title><link>https://socake.github.io/docs/cicd/%E5%8F%91%E7%89%88%E5%9B%9E%E6%BB%9Asop/</link><pubDate>Tue, 09 Dec 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/%E5%8F%91%E7%89%88%E5%9B%9E%E6%BB%9Asop/</guid><description>涵盖回滚判断标准、K8s/ArgoCD/配置各层回滚操作、数据库变更的前向修复 vs 回滚取舍，以及完整的值班人员操作 SOP 模板。</description></item><item><title>AWS EKS 实战指南</title><link>https://socake.github.io/docs/kubernetes/aws-eks%E5%AE%9E%E6%88%98/</link><pubDate>Tue, 09 Dec 2025 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/aws-eks%E5%AE%9E%E6%88%98/</guid><description>覆盖 EKS 核心架构、eksctl/aws cli 常用操作、IRSA 原理与配置、VPC CNI 网络限制、升级流程及常见故障排查。</description></item><item><title>多环境发版策略设计</title><link>https://socake.github.io/docs/cicd/%E5%A4%9A%E7%8E%AF%E5%A2%83%E5%8F%91%E7%89%88%E7%AD%96%E7%95%A5/</link><pubDate>Tue, 09 Dec 2025 15:00:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/%E5%A4%9A%E7%8E%AF%E5%A2%83%E5%8F%91%E7%89%88%E7%AD%96%E7%95%A5/</guid><description>覆盖环境划分标准、分支策略（GitFlow vs Trunk-based）、镜像 tag 策略、自动/手动审批节点、金丝雀发布、蓝绿部署，以及发版后验证 checklist。</description></item><item><title>Docker 镜像优化实践</title><link>https://socake.github.io/docs/cicd/docker%E9%95%9C%E5%83%8F%E4%BC%98%E5%8C%96/</link><pubDate>Tue, 09 Dec 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/docker%E9%95%9C%E5%83%8F%E4%BC%98%E5%8C%96/</guid><description>覆盖多阶段构建、基础镜像选型（alpine/distroless/scratch）、layer 缓存优化、BuildKit cache mount、漏洞扫描等实战技巧，附优化前后对比数据。</description></item><item><title>Helm 使用指南：从入门到生产实践</title><link>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</guid><description>Helm 从入门到生产实践：Chart 结构、values 覆盖、模板语法、--atomic/--wait 等生产参数，以及常用 Chart 安装示例。</description></item><item><title>Kubernetes Ingress 配置实践</title><link>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</guid><description>从 Ingress 概念到生产实践：nginx/traefik/ALB 选型对比、TLS 自动签发、canary 灰度发布、限速超时等常用 annotations 详解。</description></item><item><title>Kubernetes 安全加固实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</guid><description>K8s 安全加固从 Pod 到集群：SecurityContext 配置、网络策略隔离、Secret 安全管理、镜像漏洞扫描、RBAC 最小权限原则的落地实践。</description></item><item><title>Kubernetes 故障排查 SOP</title><link>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</guid><description>从现象到根因的 K8s 故障排查全流程：Pod 异常状态、Node NotReady、Service 不通、存储挂载失败等场景的系统化排查方法。</description></item><item><title>Kubernetes 集群升级实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</guid><description>K8s 集群升级全流程：从版本兼容性检查、etcd 备份、EKS 托管升级命令，到节点蓝绿替换、PDB 配置、pluto 工具检测废弃 API，再到常见升级问题处理。</description></item><item><title>Go 标准库速查：运维工程师常用</title><link>https://socake.github.io/docs/languages/go/go%E6%A0%87%E5%87%86%E5%BA%93%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E6%A0%87%E5%87%86%E5%BA%93%E9%80%9F%E6%9F%A5/</guid><description>不查文档快速写出对的代码——整理了运维场景最常用的 Go 标准库用法，每节都是可直接复制的代码片段</description></item><item><title>Go 并发编程：goroutine 与 channel 实践</title><link>https://socake.github.io/docs/languages/go/go%E5%B9%B6%E5%8F%91%E7%BC%96%E7%A8%8B/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E5%B9%B6%E5%8F%91%E7%BC%96%E7%A8%8B/</guid><description>用 Go 并发特性加速运维工具：批量检查服务状态、并发执行 SSH 命令、控制超时与取消，都在这篇文章里</description></item><item><title>Go 
错误处理最佳实践</title><link>https://socake.github.io/docs/languages/go/go%E9%94%99%E8%AF%AF%E5%A4%84%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E9%94%99%E8%AF%AF%E5%A4%84%E7%90%86/</guid><description>在运维工具中正确处理错误：错误包装与解包、可重试判断、统一错误输出格式、带上下文的错误信息，避免常见的错误处理反模式</description></item><item><title>Go 语言基础速查（运维向）</title><link>https://socake.github.io/docs/languages/go/go%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</guid><description>用 Go 写运维工具前必须掌握的语言基础，聚焦运维场景常用特性，配合实用代码示例</description></item><item><title>Go 运维工具开发实战</title><link>https://socake.github.io/docs/languages/go/go%E8%BF%90%E7%BB%B4%E5%B7%A5%E5%85%B7%E5%BC%80%E5%8F%91/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E8%BF%90%E7%BB%B4%E5%B7%A5%E5%85%B7%E5%BC%80%E5%8F%91/</guid><description>从零写一个 Go 运维工具：cobra CLI 框架、执行 kubectl 命令、调用 K8s API、配置 zap 日志、viper 配置管理，完整可运行的代码示例</description></item><item><title>Kubernetes HPA/VPA 弹性伸缩配置</title><link>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</guid><description>从 HPA v2 到 KEDA 事件驱动伸缩，覆盖 CPU/内存/自定义指标配置、防抖参数调优、VPA 推荐器集成和生产级弹性伸缩最佳实践。</description></item><item><title>Kubernetes RBAC 权限管理实践</title><link>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>从 RBAC 核心概念到生产级多租户权限设计，涵盖 ServiceAccount 最小权限、kubectl auth can-i 排查和命名空间隔离实践。</description></item><item><title>Kubernetes 存储：PV/PVC/StorageClass 实践</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</guid><description>从 PV/PVC 基础概念到生产级 CSI 配置，涵盖动态供给、StatefulSet 存储、AWS EBS/EFS、阿里云云盘/NAS 以及数据迁移实践。</description></item><item><title>Kubernetes 网络模型与 Service 详解</title><link>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</guid><description>从 K8s 网络基础模型到生产级 Service 配置，覆盖 CNI 插件对比、kube-proxy 模式选择、DNS 解析规则和排查思路。</description></item><item><title>Kubernetes 资源管理：requests/limits/QoS/配额</title><link>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</guid><description>从 CPU throttling 到内存 OOMKill，从 QoS 分类到驱逐优先级，系统梳理 Kubernetes 资源管理机制与生产调优实践。</description></item><item><title>Linux 磁盘与文件系统管理</title><link>https://socake.github.io/docs/linux/linux%E7%A3%81%E7%9B%98%E4%B8%8E%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%A3%81%E7%9B%98%E4%B8%8E%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F/</guid><description>从 fdisk 分区到 LVM 扩容快照，从 ext4 vs xfs 对比到 fsck 故障恢复，以及 /proc 
和 /sys 中与存储相关的关键路径速查。</description></item><item><title>Linux 进程管理与作业控制</title><link>https://socake.github.io/docs/linux/linux%E8%BF%9B%E7%A8%8B%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E8%BF%9B%E7%A8%8B%E7%AE%A1%E7%90%86/</guid><description>从 ps/pstree 进程查看到 kill/pkill 信号发送，从 nice/ionice 优先级调整到 screen/tmux 会话管理，结合 systemctl/journalctl 和 ulimit 资源控制。</description></item><item><title>Linux 网络命令速查</title><link>https://socake.github.io/docs/linux/linux%E7%BD%91%E7%BB%9C%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%BD%91%E7%BB%9C%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</guid><description>系统整理 Linux 网络排查工具链，包含 ss 连接状态过滤、tcpdump 过滤语法、iptables NAT 配置、curl 响应时间分析及 DNS 工具使用方法。</description></item><item><title>Linux 系统性能排查手册</title><link>https://socake.github.io/docs/linux/linux%E7%B3%BB%E7%BB%9F%E6%80%A7%E8%83%BD%E6%8E%92%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%B3%BB%E7%BB%9F%E6%80%A7%E8%83%BD%E6%8E%92%E6%9F%A5/</guid><description>覆盖 top/htop/mpstat/vmstat/iostat/sar 等核心命令，结合 iowait/softirq/CPU 窃取等指标含义，提供完整排查流程和组合命令速查。</description></item><item><title>Linux 用户权限与安全管理</title><link>https://socake.github.io/docs/linux/linux%E7%94%A8%E6%88%B7%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%94%A8%E6%88%B7%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>从 useradd/usermod 用户管理到 SUID/SGID 特殊权限，从 sudoers 配置到 fail2ban 防暴力破解，覆盖 Linux 系统安全加固的核心操作。</description></item><item><title>Python 操作 Kubernetes：kubernetes-client 
实战</title><link>https://socake.github.io/docs/languages/python/python%E6%93%8D%E4%BD%9Ckubernetes/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E6%93%8D%E4%BD%9Ckubernetes/</guid><description>系统介绍 Python kubernetes-client 的核心用法，从集群认证到资源操作，最终构建一个完整的 K8s 巡检脚本</description></item><item><title>Python 基础速查（运维向）</title><link>https://socake.github.io/docs/languages/python/python%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</guid><description>运维工程师必备的 Python 基础知识速查，从变量类型到标准库，聚焦实际使用场景</description></item><item><title>Python 网络编程与 HTTP 请求</title><link>https://socake.github.io/docs/languages/python/python%E7%BD%91%E7%BB%9C%E4%B8%8Ehttp/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E7%BD%91%E7%BB%9C%E4%B8%8Ehttp/</guid><description>从 requests 基础到 httpx 异步，再到并发健康检查脚本，覆盖运维工程师日常 HTTP 操作场景</description></item><item><title>Python 系统与文件操作实战</title><link>https://socake.github.io/docs/languages/python/python%E7%B3%BB%E7%BB%9F%E6%93%8D%E4%BD%9C/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E7%B3%BB%E7%BB%9F%E6%93%8D%E4%BD%9C/</guid><description>深入讲解 Python 系统操作，含 subprocess 进程管理、psutil 系统监控，以及一个完整的生产级日志清理脚本</description></item><item><title>Python 自动化运维脚本实战</title><link>https://socake.github.io/docs/languages/python/python%E8%87%AA%E5%8A%A8%E5%8C%96%E8%84%9A%E6%9C%AC/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E8%87%AA%E5%8A%A8%E5%8C%96%E8%84%9A%E6%9C%AC/</guid><description>系统化讲解 Python 自动化运维脚本的标准结构，包含命令行解析、日志、配置、告警和并发执行的完整最佳实践</description></item><item><title>Vim 速查手册</title><link>https://socake.github.io/docs/linux/vim%E9%80%9F%E6%9F%A5%E6%89%8B%E5%86%8C/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/vim%E9%80%9F%E6%9F%A5%E6%89%8B%E5%86%8C/</guid><description>覆盖 Vim 四种模式、所有移动方式、宏录制与寄存器、.vimrc 推荐配置，以及批量删除空行、注释多行、列操作等运维高频场景。</description></item><item><title>Prometheus + Grafana + Loki 可观测性体系建设</title><link>https://socake.github.io/docs/kubernetes/%E5%8F%AF%E8%A7%82%E6%B5%8B%E6%80%A7%E5%BB%BA%E8%AE%BE/</link><pubDate>Mon, 08 Dec 2025 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/%E5%8F%AF%E8%A7%82%E6%B5%8B%E6%80%A7%E5%BB%BA%E8%AE%BE/</guid><description>记录在多套 K8s 集群上建立统一可观测性平台的实践经验，包含 Prometheus 采集配置、告警规则设计、Grafana Dashboard 组织方式，以及跨集群日志聚合的 Loki 部署方案。</description></item><item><title>ArgoCD + Kustomize GitOps 体系实践</title><link>https://socake.github.io/docs/kubernetes/argocd-gitops%E5%AE%9E%E8%B7%B5/</link><pubDate>Mon, 08 Dec 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/argocd-gitops%E5%AE%9E%E8%B7%B5/</guid><description>记录在多套 K8s 集群（AWS EKS + 阿里云 ACK）上落地 GitOps 的完整过程：目录结构设计、Kustomize overlay 环境差异管理、ArgoCD ApplicationSet 自动化、以及真实踩过的坑。</description></item><item><title>Karpenter 弹性节点管理实战</title><link>https://socake.github.io/docs/kubernetes/karpenter-%E5%BC%B9%E6%80%A7%E8%8A%82%E7%82%B9/</link><pubDate>Mon, 08 Dec 2025 13:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/karpenter-%E5%BC%B9%E6%80%A7%E8%8A%82%E7%82%B9/</guid><description>Karpenter 替代 Cluster Autoscaler 
的完整实践：NodePool 约束配置、EC2NodeClass 实例选型、consolidation 节点整合降本、Spot 实例容错，以及多套集群配置的组织方式。</description></item><item><title>kubectl 命令速查手册</title><link>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</link><pubDate>Mon, 08 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</guid><description>kubectl 实用命令手册，按场景分类整理，涵盖资源查看、Pod调试、日志查看、滚动更新、扩缩容、强制删除等高频操作。</description></item><item><title>GitHub Actions CI/CD 实战：从镜像构建到 K8s 部署</title><link>https://socake.github.io/docs/cicd/github-actions-%E5%AE%9E%E6%88%98/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/github-actions-%E5%AE%9E%E6%88%98/</guid><description>完整的 GitHub Actions CI/CD 流水线设计：Docker 多阶段构建优化、ECR 推送、Kustomize 更新 GitOps 仓库触发 ArgoCD 自动部署，以及多环境（QA/PRE/PROD）的分支策略。</description></item><item><title>Kubernetes 核心架构全景</title><link>https://socake.github.io/docs/kubernetes/kubernetes-%E6%A0%B8%E5%BF%83%E6%9E%B6%E6%9E%84/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubernetes-%E6%A0%B8%E5%BF%83%E6%9E%B6%E6%9E%84/</guid><description>深入理解 Kubernetes 控制面与工作节点各组件的职责与交互关系，结合生产环境实际经验，梳理核心资源对象与调度原理。</description></item><item><title>Shell 脚本运维速查手册</title><link>https://socake.github.io/docs/languages/shell/shell-%E8%BF%90%E7%BB%B4%E9%80%9F%E6%9F%A5/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/shell/shell-%E8%BF%90%E7%BB%B4%E9%80%9F%E6%9F%A5/</guid><description>Shell 
运维速查手册，包含文本处理（awk/sed/grep）、进程排查、网络诊断、批量操作模板，以及实用的脚本编写规范。</description></item><item><title>DevOps/运维工程师面试题精选：K8s、Linux、网络高频考点</title><link>https://socake.github.io/posts/devops-interview-questions/</link><pubDate>Sun, 07 Dec 2025 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devops-interview-questions/</guid><description>基于真实面试经验整理的运维/DevOps 面试题，覆盖 K8s 调度、故障排查、Linux 内核、网络协议等方向，附「面试官真正想考的点」，帮你把答案说到位。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-interview-questions/featured.jpg"/></item><item><title>SLSA 软件供应链等级实施：从 L1 到 L3 的工程化路径</title><link>https://socake.github.io/posts/supply-chain-slsa-framework/</link><pubDate>Fri, 05 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/supply-chain-slsa-framework/</guid><description>一份 SLSA v1.0 框架的实战落地笔记：讲清楚 Build Track 从 L1 到 L3 的具体要求、用 GitHub Actions 官方 generator 和 Tekton Chains 生成 provenance、用 slsa-verifier 和 Kyverno 做验证、以及和前面 Sigstore/Kyverno/Cosign 的整合。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/supply-chain-slsa-framework/featured.jpg"/></item><item><title>阿里云 SDK 运维自动化：ECS/ACK/RDS 资源管理与巡检脚本</title><link>https://socake.github.io/posts/aliyun-sdk-ops/</link><pubDate>Thu, 04 Dec 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/aliyun-sdk-ops/</guid><description>用阿里云 Python SDK 实现 ECS 实例查询与监控、ACK 节点状态检查、RDS 慢查询巡检，整合成 HTML 格式巡检报告自动推送钉钉。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/aliyun-sdk-ops/featured.jpg"/></item><item><title>Docker模板</title><link>https://socake.github.io/docs/docker/docker%E6%A8%A1%E6%9D%BF/</link><pubDate>Wed, 03 Dec 2025 22:57:03 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/docker/docker%E6%A8%A1%E6%9D%BF/</guid><description>Docker模板（如Dockerfile和docker-compose.yml）是容器化应用的蓝图。它们将应用的环境、依赖和配置代码化，实现了一次编写、处处运行的自动化部署</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/docs/docker/docker%E6%A8%A1%E6%9D%BF/featured.jpg"/></item><item><title>完整安装包下载</title><link>https://socake.github.io/docs/linux/linux%E5%91%BD%E4%BB%A4/%E5%AE%8C%E6%95%B4%E5%AE%89%E8%A3%85%E5%8C%85%E4%B8%8B%E8%BD%BD/</link><pubDate>Wed, 03 Dec 2025 22:53:28 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E5%91%BD%E4%BB%A4/%E5%AE%8C%E6%95%B4%E5%AE%89%E8%A3%85%E5%8C%85%E4%B8%8B%E8%BD%BD/</guid><description>在生产环境或内网服务器中，直接联网安装软件往往不可行。本文将详细介绍在Linux环境下离线下载软件包及其完整依赖的几种核心方法</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/docs/linux/linux%E5%91%BD%E4%BB%A4/%E5%AE%8C%E6%95%B4%E5%AE%89%E8%A3%85%E5%8C%85%E4%B8%8B%E8%BD%BD/featured.png"/></item><item><title>Docker存储及镜像制作</title><link>https://socake.github.io/docs/docker/docker%E5%AD%98%E5%82%A8%E5%8F%8A%E9%95%9C%E5%83%8F%E5%88%B6%E4%BD%9C/</link><pubDate>Wed, 03 Dec 2025 22:26:23 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/docker/docker%E5%AD%98%E5%82%A8%E5%8F%8A%E9%95%9C%E5%83%8F%E5%88%B6%E4%BD%9C/</guid><description>数据持久化和自定义镜像是Docker进阶使用的关键。本文将介绍如何通过卷(Volumes)和绑定挂载(Bind Mounts)管理数据，以及如何从零开始编写Dockerfile来构建符合自己需求的应用镜像</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/docs/docker/docker%E5%AD%98%E5%82%A8%E5%8F%8A%E9%95%9C%E5%83%8F%E5%88%B6%E4%BD%9C/featured.jpg"/></item><item><title>Docker基本使用</title><link>https://socake.github.io/docs/docker/docker%E5%9F%BA%E6%9C%AC%E4%BD%BF%E7%94%A8/</link><pubDate>Wed, 03 Dec 2025 22:26:23 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/docker/docker%E5%9F%BA%E6%9C%AC%E4%BD%BF%E7%94%A8/</guid><description>本文涵盖日常最常用的Docker命令。从拉取镜像、启动容器，到查看日志、进入容器内部调试，你将掌握容器生命周期的基本管理操作</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/docs/docker/docker%E5%9F%BA%E6%9C%AC%E4%BD%BF%E7%94%A8/featured.jpg"/></item><item><title>Docker简介</title><link>https://socake.github.io/docs/docker/docker%E7%AE%80%E4%BB%8B/</link><pubDate>Wed, 03 Dec 2025 22:26:23 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/docker/docker%E7%AE%80%E4%BB%8B/</guid><description>Docker是一个开源的容器化平台。它彻底改变了软件的打包、分发和运行方式，使应用及其运行环境成为一个轻量级、可移植的“容器”，从而解决了“在本地环境能运行，在其他环境却失败”的经典难题</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/docs/docker/docker%E7%AE%80%E4%BB%8B/featured.jpg"/></item><item><title>Kubernetes Operator 开发实战：Go + controller-runtime 完全指南</title><link>https://socake.github.io/posts/kubernetes-operator-development/</link><pubDate>Wed, 03 Dec 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-operator-development/</guid><description>用 Go + controller-runtime 开发生产级 Kubernetes Operator 的完整实战指南。以 DatabaseCluster Operator 为例，深入讲解 CRD 设计、Reconcile 模式、Status Conditions、Finalizer 防孤儿资源、Leader Election、指标暴露、Webhook 验证，以及 envtest + Kind 测试策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-operator-development/featured.jpg"/></item><item><title>Kubernetes 多租户方案深度对比：vCluster vs Capsule vs HNC</title><link>https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/</link><pubDate>Wed, 03 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/</guid><description>Namespace 级隔离远不够用。本文深入剖析 
vCluster、Capsule、HNC 三种主流多租户方案的架构差异，给出完整的部署配置示例、隔离能力横向对比，以及 SaaS 平台、内部平台、开发环境三种场景下的选型建议。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-multitenancy-deep-dive/featured.jpg"/></item><item><title>网站收藏</title><link>https://socake.github.io/resources/website/</link><pubDate>Mon, 01 Dec 2025 00:00:00 +0000</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/resources/website/</guid><description/></item><item><title>基础设施即代码：Terraform 入门与实践</title><link>https://socake.github.io/posts/%E5%9F%BA%E7%A1%80%E8%AE%BE%E6%96%BD%E5%8D%B3%E4%BB%A3%E7%A0%81/</link><pubDate>Sun, 30 Nov 2025 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E5%9F%BA%E7%A1%80%E8%AE%BE%E6%96%BD%E5%8D%B3%E4%BB%A3%E7%A0%81/</guid><description>从 IaC 解决的本质问题出发，系统介绍 Terraform 的核心概念和工作流，重点覆盖 State 管理、模块化最佳实践，以及常见陷阱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E5%9F%BA%E7%A1%80%E8%AE%BE%E6%96%BD%E5%8D%B3%E4%BB%A3%E7%A0%81/featured.jpg"/></item><item><title>Kyverno 策略即代码实战：从准入到变异到生成的全场景落地</title><link>https://socake.github.io/posts/kyverno-policy-as-code/</link><pubDate>Fri, 28 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kyverno-policy-as-code/</guid><description>一份基于 Kyverno 1.12+ 的生产落地笔记：覆盖 validate/mutate/generate/verifyImages 四种策略类型的实战用法、CEL 和 JMESPath 表达式语法、策略分层治理、PolicyException、性能调优和常见踩坑，并与 OPA Gatekeeper 做对比。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kyverno-policy-as-code/featured.jpg"/></item><item><title>零信任网络改造：从公网暴露到 Headscale VPN</title><link>https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/</link><pubDate>Sat, 22 Nov 2025 13:37:00 +0800</pubDate><author>17691281867@163.com 
(Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/</guid><description>从发现公网暴露的安全隐患开始，到用 Headscale 自建零信任网络，替代跳板机体系，实现 kubectl 和运维系统的 VPN 接入。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/featured.jpg"/></item><item><title>Pod Security Standards 生产落地：从 PSP 到 PSA 的迁移实战</title><link>https://socake.github.io/posts/kubernetes-pod-security-standards/</link><pubDate>Fri, 21 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-pod-security-standards/</guid><description>一份从 PSP 迁移到 Pod Security Standards 的实战笔记：对比 Baseline 与 Restricted 两套 profile 的实际约束、Pod Security Admission 的三种 mode、如何一次性迁移 200+ 命名空间、和 Kyverno/OPA 互补使用的最佳实践，以及遗留业务 securityContext 改造的典型模式。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-pod-security-standards/featured.jpg"/></item><item><title>如何设计一个好的告警体系</title><link>https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/</link><pubDate>Tue, 18 Nov 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/</guid><description>从真实的告警噪音泛滥经历出发，分享如何用 SLI/SLO 重新设计告警体系，包括告警分级、规则设计原则、路由策略和复盘机制。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/featured.jpg"/></item><item><title>大模型核心概念：工程师需要理解的 LLM 基础</title><link>https://socake.github.io/posts/llm-core-concepts/</link><pubDate>Mon, 17 Nov 2025 11:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/llm-core-concepts/</guid><description>同事第一次用 GPT-4 API 
写代码时问我：为什么我发了一段中文，token 消耗比英文多那么多？为什么模型有时候会一本正经地胡说八道？这篇文章把我认为工程师必须理解的 LLM 概念系统整理了一遍，不涉及 Transformer 数学，只讲对你写代码有帮助的部分。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/llm-core-concepts/featured.jpg"/></item><item><title>密钥自动轮换实战：Vault、AWS Secrets Manager 与 SOPS 的工程化方案</title><link>https://socake.github.io/posts/secret-rotation-automation/</link><pubDate>Fri, 14 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/secret-rotation-automation/</guid><description>一份来自生产环境的密钥轮换实战笔记：对比 Vault dynamic secret、AWS Secrets Manager 原生 rotation、SOPS + GitOps 三种方案的适用场景，给出数据库、Kafka SASL、TLS 证书、API key 的完整轮换工作流，并分享 ESO 同步、rotation 风暴、灰度发布等真实踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/secret-rotation-automation/featured.jpg"/></item><item><title>RAG 系统设计与实战：检索增强生成完全指南</title><link>https://socake.github.io/posts/rag-system-design-practice/</link><pubDate>Tue, 11 Nov 2025 11:41:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/rag-system-design-practice/</guid><description>RAG（检索增强生成）是目前企业落地 LLM 最主流的方式。本文覆盖 RAG 系统的完整设计：文档处理管线、分块策略、向量检索与关键词混合检索、Rerank 重排序、上下文压缩，以及用 RAGAS 框架评估 RAG 质量，最后分享生产环境踩坑记录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/rag-system-design-practice/featured.jpg"/></item><item><title>WebAssembly 在云原生中的应用：从浏览器到 K8s 数据面</title><link>https://socake.github.io/posts/webassembly-cloud-native/</link><pubDate>Sat, 08 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/webassembly-cloud-native/</guid><description>WebAssembly 在云原生领域的热度持续上涨，但很多讨论都停留在概念层面。这篇文章试图给出一个务实的视角：Wasm 在哪些云原生场景已经可以生产落地，在哪些场景还需要等待，以及和容器相比的真实差异。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/webassembly-cloud-native/featured.jpg"/></item><item><title>Istio Ambient Mode 无 Sidecar 服务网格实践</title><link>https://socake.github.io/posts/istio-ambient-mesh-practice/</link><pubDate>Sat, 08 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/istio-ambient-mesh-practice/</guid><description>Sidecar 模式已经陪我们走了六七年，但它的问题也越来越难以忽视。Ambient Mode 不是缝缝补补，而是从架构层面重新设计了服务网格的数据面。本文从实际运维视角深入拆解 ztunnel + Waypoint 两层架构，并给出从 Sidecar 迁移到 Ambient 的完整路径。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/istio-ambient-mesh-practice/featured.jpg"/></item><item><title>用 WireGuard 构建多云 mesh VPN：从点对点到全网互联</title><link>https://socake.github.io/posts/wireguard-mesh-vpn/</link><pubDate>Fri, 07 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/wireguard-mesh-vpn/</guid><description>一份从实战出发的 WireGuard mesh VPN 笔记：讲清楚为什么不用 IPSec/OpenVPN、手写配置 vs Netmaker vs Tailscale 的选型对比、AWS 与阿里云跨云 mesh 的真实部署方案、MTU 与 NAT 穿透的踩坑，以及自动化密钥分发与监控方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/wireguard-mesh-vpn/featured.jpg"/></item><item><title>Milvus 向量数据库实战：从部署到生产应用</title><link>https://socake.github.io/posts/milvus-vector-database-practice/</link><pubDate>Thu, 06 Nov 2025 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/milvus-vector-database-practice/</guid><description>覆盖向量数据库选型对比（Milvus/Qdrant/Weaviate/pgvector）、Milvus Standalone与Cluster部署、Collection Schema设计、HNSW/IVF_FLAT索引调优、混合搜索实战，以及生产环境常见问题处理。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/milvus-vector-database-practice/featured.jpg"/></item><item><title>Kubernetes GPU 调度实战：AI 
训练与推理基础设施</title><link>https://socake.github.io/posts/kubernetes-gpu-scheduling/</link><pubDate>Wed, 05 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gpu-scheduling/</guid><description>GPU 是 AI 基础设施的核心资源，如何在 Kubernetes 上高效调度和管理 GPU 直接影响训练效率和推理成本。本文从底层驱动安装到上层调度策略，完整覆盖 K8s GPU 基础设施的搭建、监控和优化实践。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gpu-scheduling/featured.jpg"/></item><item><title>Python 操作 Elasticsearch：从索引管理到复杂聚合查询</title><link>https://socake.github.io/posts/python-elasticsearch-client/</link><pubDate>Tue, 04 Nov 2025 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-elasticsearch-client/</guid><description>从客户端初始化到批量操作、scroll 查询、聚合统计，一篇文章搞定 Python 操作 Elasticsearch 的高频场景。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-elasticsearch-client/featured.jpg"/></item><item><title>Python 定时任务工程化：APScheduler 与 Celery Beat 实战对比</title><link>https://socake.github.io/posts/python-scheduled-tasks/</link><pubDate>Sat, 01 Nov 2025 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-scheduled-tasks/</guid><description>APScheduler 和 Celery Beat 是 Python 定时任务的两大主流方案。本文从使用场景出发，对比两者的架构差异、适用边界，并介绍 K8s CronJob 作为第三条路的价值，帮你在项目里选对工具。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-scheduled-tasks/featured.jpg"/></item><item><title>Cilium NetworkPolicy 与 L7 过滤生产落地实战</title><link>https://socake.github.io/posts/cilium-network-policy-production/</link><pubDate>Fri, 31 Oct 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cilium-network-policy-production/</guid><description>一份基于 Cilium 1.16+ 
的生产落地笔记：讲清楚 Kubernetes NetworkPolicy 的局限、CiliumNetworkPolicy 的扩展能力、L7 HTTP/Kafka/DNS 过滤的真实用法、Hubble 可观测性、策略开发方法论，以及多集群 ClusterMesh 场景下的策略治理。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cilium-network-policy-production/featured.jpg"/></item><item><title>CoreDNS 深度排障：K8s DNS 问题完全指南</title><link>https://socake.github.io/posts/coredns-troubleshooting-guide/</link><pubDate>Wed, 29 Oct 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/coredns-troubleshooting-guide/</guid><description>DNS 问题是 K8s 中最难定位的问题之一，因为它的失败往往是间歇性的、有延迟的，看起来像网络问题，实际上是 DNS 超时。本文记录了我在生产环境排查过的多类 DNS 故障，附详细的抓包分析和调优配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/coredns-troubleshooting-guide/featured.jpg"/></item><item><title>SBOM 生成与 Dependency-Track 漏洞管理实战</title><link>https://socake.github.io/posts/sbom-dependency-track/</link><pubDate>Fri, 24 Oct 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sbom-dependency-track/</guid><description>一份基于生产环境的 SBOM 实战指南：讲清楚 CycloneDX 与 SPDX 的格式差异、Syft/cdxgen/Trivy 三款主流生成器的对比，部署 Dependency-Track 4.12 做持续漏洞监测，通过策略违规自动化处置 CVE，并分享 SBOM 消费链路上的真实踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sbom-dependency-track/featured.jpg"/></item><item><title>k6 压测实战：从脚本编写到性能分析</title><link>https://socake.github.io/posts/k6-load-testing-practice/</link><pubDate>Tue, 21 Oct 2025 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k6-load-testing-practice/</guid><description>压测不是跑一个脚本看能不能撑住，而是通过有设计的负载模型暴露系统瓶颈。本文记录了我用 k6 做生产级性能测试的完整实践：脚本设计、阈值配置、与 Grafana 集成，以及几个典型性能问题的定位过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/k6-load-testing-practice/featured.jpg"/></item><item><title>TCP/IP 网络排障：抓包与连接问题诊断</title><link>https://socake.github.io/posts/tcp-network-troubleshooting/</link><pubDate>Tue, 21 Oct 2025 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tcp-network-troubleshooting/</guid><description>网络问题排查的核心是「眼见为实」，没有抓包的排障都是猜测。本文系统梳理了 tcpdump 的实战用法、TCP 连接状态机分析、conntrack 追踪，以及 Kubernetes 中 NodePort/LoadBalancer 的典型网络故障定位方法。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tcp-network-troubleshooting/featured.jpg"/></item><item><title>Sigstore/Cosign 镜像签名实战：从 keyless 签名到准入策略验证</title><link>https://socake.github.io/posts/sigstore-cosign-signing-workflow/</link><pubDate>Fri, 17 Oct 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sigstore-cosign-signing-workflow/</guid><description>一份 Sigstore 生产化落地笔记：讲清楚 Fulcio/Rekor/Cosign 三件套的工作原理，演示 GitHub Actions 和 GitLab CI 下的 keyless 签名流水线，对接 Kyverno/Policy Controller 做准入验证，并分享签名验证性能、Rekor 不可用降级、多签策略等真实运维经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sigstore-cosign-signing-workflow/featured.jpg"/></item><item><title>Vector 日志处理管道：高性能日志采集与转换实践</title><link>https://socake.github.io/posts/vector-log-pipeline/</link><pubDate>Tue, 14 Oct 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vector-log-pipeline/</guid><description>从架构对比到 K8s DaemonSet 落地，结合 VRL 实战示例和踩坑经验，讲透 Vector 在日志采集管道中的应用。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vector-log-pipeline/featured.jpg"/></item><item><title>Filebeat + Logstash 日志采集管道：大规模日志处理实战</title><link>https://socake.github.io/posts/filebeat-logstash-pipeline/</link><pubDate>Fri, 10 Oct 2025 
10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/filebeat-logstash-pipeline/</guid><description>大流量日志场景下，Fleet 直写 ES 会出现严重写入堆积。本文记录了我们从 Fleet 切换到 Filebeat + Kafka + Logstash 管道的全过程，重点讲 Logstash pipeline 配置和性能调优。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/filebeat-logstash-pipeline/featured.jpg"/></item><item><title>SPIFFE/SPIRE 工作负载身份实战：零信任网络的身份基石</title><link>https://socake.github.io/posts/spiffe-spire-workload-identity/</link><pubDate>Fri, 10 Oct 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/spiffe-spire-workload-identity/</guid><description>一份从生产部署出发的 SPIFFE/SPIRE 实战笔记：讲清楚 SVID、节点证明、工作负载证明、信任域联邦这些核心概念，用 Kubernetes + Istio + 非 K8s 工作负载的混合场景展示 SPIRE 如何统一身份，并分享升级、备份、Agent 崩溃等真实运维踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/spiffe-spire-workload-identity/featured.jpg"/></item><item><title>ELK 集群监控：用 Prometheus + Grafana 监控 Elasticsearch 健康</title><link>https://socake.github.io/posts/elk-prometheus-monitoring/</link><pubDate>Wed, 08 Oct 2025 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elk-prometheus-monitoring/</guid><description>Kibana 内置的 Stack Monitoring 免费功能有限，告警媒介也受商业授权约束。我们最终选择 Prometheus + Grafana 方案监控 ELK 集群，这篇文章记录完整的落地过程和踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elk-prometheus-monitoring/featured.jpg"/></item><item><title>Elasticsearch 备份与恢复：快照管理与跨集群迁移实践</title><link>https://socake.github.io/posts/elasticsearch-backup-restore/</link><pubDate>Fri, 03 Oct 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-backup-restore/</guid><description>Snapshot API 配置、S3 IRSA 
认证、定时快照脚本，以及跨集群迁移三种方案的对比与实战踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-backup-restore/featured.jpg"/></item><item><title>Falco 运行时安全实战：从规则开发到生产级调优</title><link>https://socake.github.io/posts/falco-runtime-security-deep/</link><pubDate>Fri, 03 Oct 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/falco-runtime-security-deep/</guid><description>一份来自生产环境的 Falco 实战笔记：从 eBPF 驱动选型、规则开发方法论、误报治理，到与 Falcosidekick、Loki、SIEM 的告警联动，覆盖 0.40/0.41/0.42 三个版本的关键变更与真实踩坑案例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/falco-runtime-security-deep/featured.jpg"/></item><item><title>Elasticsearch 查询实战：从 URI Search 到 DSL 复杂聚合</title><link>https://socake.github.io/posts/elasticsearch-dsl-query/</link><pubDate>Wed, 01 Oct 2025 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-dsl-query/</guid><description>ES 查询是每个运维必须掌握的技能。这篇文章从 URI Search 快速上手，到 DSL bool 查询、聚合分析，再到运维常用的 _cat API，配合真实排障场景整理成一篇实战手册。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-dsl-query/featured.jpg"/></item><item><title>Prometheus 高基数治理实战：从 8 亿 series 到可控增长</title><link>https://socake.github.io/posts/metric-cardinality-governance/</link><pubDate>Sun, 28 Sep 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/metric-cardinality-governance/</guid><description>高基数是 Prometheus 生态里最常见的性能杀手。这篇把「为什么发生、怎么发现、怎么治理」讲清楚，并给出一套可推广的组织治理方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/metric-cardinality-governance/featured.jpg"/></item><item><title>Elasticsearch 索引策略：ILM 
生命周期管理与写入性能优化</title><link>https://socake.github.io/posts/elasticsearch-index-optimization/</link><pubDate>Wed, 24 Sep 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-index-optimization/</guid><description>ILM 四阶段配置、rollover 策略、bulk 写入调优，以及分片数规划和 mapping 爆炸的避坑指南。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-index-optimization/featured.jpg"/></item><item><title>On-Call 轮值管理实战：从告警疲劳到可持续值班</title><link>https://socake.github.io/posts/oncall-rotation-management/</link><pubDate>Wed, 24 Sep 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/oncall-rotation-management/</guid><description>On-call 不是福利也不是惩罚，是一份职责。把它做成可持续的工程实践，比任何高级监控工具都重要。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/oncall-rotation-management/featured.jpg"/></item><item><title>Elasticsearch 集群部署实战：ECK 在 K8s 上的生产级配置</title><link>https://socake.github.io/posts/elasticsearch-cluster-deployment/</link><pubDate>Fri, 19 Sep 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-cluster-deployment/</guid><description>从集群角色规划到 ECK Operator 落地，结合生产环境踩坑经验，完整讲解 Elasticsearch 在 Kubernetes 上的生产级部署方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-cluster-deployment/featured.jpg"/></item><item><title>eBPF 可观测性实践：Cilium 网络监控与 Tetragon 安全审计</title><link>https://socake.github.io/posts/ebpf-observability/</link><pubDate>Wed, 17 Sep 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ebpf-observability/</guid><description>eBPF 正在重塑云原生可观测性的底层基础。本文记录在 K8s 集群中落地 Cilium + Hubble 网络监控和 Tetragon 
安全审计的实践经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ebpf-observability/featured.jpg"/></item><item><title>混沌工程实战：Chaos Mesh 在 K8s 中注入故障</title><link>https://socake.github.io/posts/chaos-mesh-practice/</link><pubDate>Sat, 13 Sep 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/chaos-mesh-practice/</guid><description>混沌工程不是破坏系统，而是在可控环境中提前暴露脆弱点。本文记录了我用 Chaos Mesh 在生产级 K8s 集群中设计并执行混沌演练的完整过程，包括安装、实验配置、Workflow 编排和游戏日流程设计。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/chaos-mesh-practice/featured.jpg"/></item><item><title>Backstage 开发者门户实战：构建内部开发者平台</title><link>https://socake.github.io/posts/backstage-developer-portal/</link><pubDate>Fri, 12 Sep 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/backstage-developer-portal/</guid><description>当团队规模超过 50 人，服务数量超过 100 个，「配置漂移」和「信息孤岛」就成了真实痛点。Backstage 是解决这个问题的平台工程利器。本文从部署到定制，完整拆解如何用 Backstage 构建真正能用起来的内部开发者平台。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/backstage-developer-portal/featured.jpg"/></item><item><title>OPA/Kyverno：K8s 准入控制策略实战</title><link>https://socake.github.io/posts/opa-kyverno-admission-control/</link><pubDate>Thu, 11 Sep 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opa-kyverno-admission-control/</guid><description>没有准入控制的 K8s 集群就像一个没有门卫的机房——任何人都能随意进出。本文记录了我在多个生产集群部署 Kyverno 策略的实战经验，涵盖资源限制强制、镜像来源白名单、标签规范、以及与 OPA Gatekeeper 的对比选型思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/opa-kyverno-admission-control/featured.jpg"/></item><item><title>故障响应与 Blameless 
复盘：让每一次事故都变成组织资产</title><link>https://socake.github.io/posts/incident-response-postmortem/</link><pubDate>Wed, 10 Sep 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/incident-response-postmortem/</guid><description>事故响应不是英雄主义，是一套可重复的流程。把流程、模板、文化讲清楚，让每次事故都能沉淀成组织资产。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/incident-response-postmortem/featured.jpg"/></item><item><title>供应链安全：Trivy 镜像扫描 + Cosign 签名验证实践</title><link>https://socake.github.io/posts/trivy-cosign-supply-chain/</link><pubDate>Sat, 06 Sep 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/trivy-cosign-supply-chain/</guid><description>你的镜像安全吗？本文梳理容器供应链的主要攻击面，手把手演示 Trivy 扫描、Cosign 签名、K8s 准入控制三层防护的搭建过程，并给出 GitLab CI 集成示例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/trivy-cosign-supply-chain/featured.jpg"/></item><item><title>混沌工程 GameDay 实战指南：从第一次演练到常态化故障注入</title><link>https://socake.github.io/posts/chaos-engineering-gameday/</link><pubDate>Wed, 27 Aug 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/chaos-engineering-gameday/</guid><description>别把混沌工程理解成随便 kill pod。真正有价值的是一套假设驱动的演练方法论：演练前写下假设，演练中验证，复盘后改进系统和流程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/chaos-engineering-gameday/featured.jpg"/></item><item><title>用 Go 写 K8s 运维工具：client-go 实战</title><link>https://socake.github.io/posts/go-kubernetes-client-tools/</link><pubDate>Mon, 25 Aug 2025 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/go-kubernetes-client-tools/</guid><description>kubectl 能解决 80% 的日常问题，剩下 20% 需要你自己写工具。本文用实际可运行的 Go 代码，展示如何用 client-go 构建批量重启 Deployment、Pod 资源报告、过期 ConfigMap 
清理等运维工具，并用 cobra 封装成 CLI。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/go-kubernetes-client-tools/featured.jpg"/></item><item><title>AWS EKS 生产实践：网络、安全与多集群管理</title><link>https://socake.github.io/posts/aws-eks-best-practices/</link><pubDate>Fri, 22 Aug 2025 12:51:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/aws-eks-best-practices/</guid><description>管理多套 EKS 集群两年下来，踩了不少坑。本文系统整理网络选型、IAM 权限、节点管理、集群升级、安全加固和成本控制这六个核心话题，每个话题都有具体配置示例和实际遇到的问题。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/aws-eks-best-practices/featured.jpg"/></item><item><title>DevSecOps 安全左移实践：从代码到生产的全链路安全</title><link>https://socake.github.io/posts/devsecops-practice/</link><pubDate>Wed, 20 Aug 2025 10:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devsecops-practice/</guid><description>安全不是最后一道关卡，而是嵌入每个研发环节的连续过程。本文从代码静态分析、依赖漏洞扫描、镜像安全、K8s 运行时防护到供应链签名，逐层拆解 DevSecOps 的完整实施路径，并给出一个可落地的流水线设计。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devsecops-practice/featured.jpg"/></item><item><title>Kubernetes 成本优化实战：系统性降本的四条路径</title><link>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</link><pubDate>Mon, 18 Aug 2025 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</guid><description>真实的降本案例：从发现成本异常到分析根因，通过 Karpenter 节点弹性伸缩、资源请求规格治理、大机型收敛等手段，系统性降低 AWS EC2 成本。包含具体配置和执行思路。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/featured.jpg"/></item><item><title>云原生转型实践：从传统运维到 K8s 
的迁移经验</title><link>https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/</link><pubDate>Thu, 14 Aug 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/</guid><description>这是一篇个人经验向的文章，记录了从传统虚拟机运维转向 Kubernetes 的全过程：为什么要迁移、迁移中踩了哪些坑、团队如何度过学习曲线，以及回头看哪些事情当时做对了。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/featured.jpg"/></item><item><title>Kiali 服务网格可观测性实战：从拓扑图到告警联动</title><link>https://socake.github.io/posts/kiali-service-mesh-observability/</link><pubDate>Tue, 12 Aug 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kiali-service-mesh-observability/</guid><description>Kiali 不只是画拓扑图的工具，它是服务网格的诊断中心。本文把 Kiali 2.x 在生产中的配置、用法、踩坑都写清楚。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kiali-service-mesh-observability/featured.jpg"/></item><item><title>平台工程实践：构建 Internal Developer Platform</title><link>https://socake.github.io/posts/platform-engineering-practice/</link><pubDate>Sun, 10 Aug 2025 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/platform-engineering-practice/</guid><description>平台工程不是给 DevOps 换个名字，而是把基础设施能力产品化——让开发者像用 SaaS 一样消费平台能力。这篇文章记录我们团队从 0 到 MVP 的六个月实践，包括 Backstage 落地、黄金路径设计、以及用 DORA 指标验证平台价值。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/platform-engineering-practice/featured.jpg"/></item><item><title>SLO/SLI/Error Budget 从理论到落地：SRE 可靠性工程实战</title><link>https://socake.github.io/posts/slo-sli-error-budget-practice/</link><pubDate>Fri, 01 Aug 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/slo-sli-error-budget-practice/</guid><description>从 SLI 指标选取到 Error Budget 消耗速率告警，系统讲解 SRE 可靠性工程体系的落地实践，包括 Prometheus recording rules 计算 SLI、多窗口 burn rate 告警规则配置、SLO 违规复盘流程，以及与开发团队的协作策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/slo-sli-error-budget-practice/featured.jpg"/></item><item><title>Cilium Hubble 实战：用 eBPF 看透 Kubernetes 网络</title><link>https://socake.github.io/posts/ebpf-network-observability-cilium-hubble/</link><pubDate>Wed, 30 Jul 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ebpf-network-observability-cilium-hubble/</guid><description>Cilium Hubble 是 Kubernetes 下最接近交换机镜像端口的东西。本文讲清楚它的架构、关键配置和生产上如何读 flow 定位网络问题。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ebpf-network-observability-cilium-hubble/featured.jpg"/></item><item><title>VictoriaMetrics：比 Prometheus 更省资源的监控存储方案</title><link>https://socake.github.io/posts/victoriametrics-prometheus/</link><pubDate>Mon, 28 Jul 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/victoriametrics-prometheus/</guid><description>Prometheus 撑不住了？本文对比 VictoriaMetrics 与 Prometheus 的核心差异，介绍 remote_write 无缝迁移方案，以及 VM 在资源占用、压缩率、查询性能上的实际提升。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/victoriametrics-prometheus/featured.jpg"/></item><item><title>Thanos 实战：多 K8s 集群 Prometheus 统一监控与长期存储</title><link>https://socake.github.io/posts/thanos-multi-cluster/</link><pubDate>Sat, 26 Jul 2025 11:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/thanos-multi-cluster/</guid><description>记录我们将三套 EKS 集群的独立 Prometheus 迁移到 Thanos 统一监控体系的全过程，重点覆盖选型决策、生产配置和踩坑总结。</description><media:content 
xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/thanos-multi-cluster/featured.jpg"/></item><item><title>OpenTelemetry 落地实践：统一采集 Traces、Metrics、Logs</title><link>https://socake.github.io/posts/opentelemetry-practice/</link><pubDate>Sun, 20 Jul 2025 11:41:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opentelemetry-practice/</guid><description>从为什么选 OpenTelemetry 讲起，给出 DaemonSet + Gateway 的 Collector 部署架构、关键配置和实际踩坑记录。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/opentelemetry-practice/featured.jpg"/></item><item><title>Grafana Tempo 大规模分布式追踪实战：从 OTel 接入到 TraceQL 调优</title><link>https://socake.github.io/posts/grafana-tempo-distributed-tracing/</link><pubDate>Wed, 16 Jul 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grafana-tempo-distributed-tracing/</guid><description>Tempo 是目前最便宜的分布式追踪后端。本文把架构、接入、TraceQL、tail sampling、成本优化、事故案例都串起来，供团队直接抄作业。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grafana-tempo-distributed-tracing/featured.jpg"/></item><item><title>可观测性三支柱实战：Metrics/Logs/Traces 联动</title><link>https://socake.github.io/posts/observability-three-pillars/</link><pubDate>Mon, 14 Jul 2025 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/observability-three-pillars/</guid><description>监控告诉你系统挂了，可观测性告诉你为什么挂。本文从三支柱的核心差异出发，讲透 Prometheus+Loki+Tempo 的联动排障流程，覆盖 OpenTelemetry 采集标准、Exemplar 原理与配置，以及可观测性建设的优先级策略。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/observability-three-pillars/featured.jpg"/></item><item><title>DORA 指标与平台工程效能度量：用数据驱动 DevOps 改进</title><link>https://socake.github.io/posts/dora-metrics-platform-engineering/</link><pubDate>Sat, 12 Jul 2025 
12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/dora-metrics-platform-engineering/</guid><description>DORA 四个指标不是考核工具，是诊断工具。从 CI/CD 流水线和 Incident 系统采集数据，找到部署频率低、前置时间长的真实原因，然后用平台工程手段系统性改进。本文给出采集方案、Grafana 看板设计和常见误用陷阱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/dora-metrics-platform-engineering/featured.jpg"/></item><item><title>分布式链路追踪实战：Jaeger 与 Tempo 选型对比</title><link>https://socake.github.io/posts/distributed-tracing-jaeger-tempo/</link><pubDate>Thu, 10 Jul 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/distributed-tracing-jaeger-tempo/</guid><description>系统梳理 Jaeger 与 Tempo 的架构差异与适用场景，结合 OpenTelemetry SDK 插桩、TraceQL 查询、采样策略和 Traces/Metrics/Logs 关联，给出可落地的生产实战方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/distributed-tracing-jaeger-tempo/featured.jpg"/></item><item><title>On-Call 工程实践：从告警响应到 Runbook 设计</title><link>https://socake.github.io/posts/on-call-engineering-practice/</link><pubDate>Tue, 08 Jul 2025 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/on-call-engineering-practice/</guid><description>好的 On-Call 体系不是让人 24 小时盯着屏幕，而是让每一次叫醒都有价值。从告警质量到 Runbook 设计，从轮班制度到数据驱动改进，这篇文章是我们团队在生产环境打磨 3 年的实践总结。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/on-call-engineering-practice/featured.jpg"/></item><item><title>SRE 故障管理全生命周期：从响应到复盘</title><link>https://socake.github.io/posts/sre-incident-management/</link><pubDate>Sat, 05 Jul 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sre-incident-management/</guid><description>故障处理不只是技术问题，更是协作和信息流问题。这篇文章完整梳理了从故障触发到 Post-Mortem 归档的每个环节，包括 IC 角色的意义、15 分钟定界框架，以及如何让 Post-Mortem 
真正推动改进而不是走过场。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sre-incident-management/featured.jpg"/></item><item><title>Pyroscope 持续性能剖析生产实战：给每一行代码一个性能画像</title><link>https://socake.github.io/posts/pyroscope-continuous-profiling/</link><pubDate>Wed, 02 Jul 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/pyroscope-continuous-profiling/</guid><description>为什么 metrics/logs/traces 之外还需要 profiling，它解决的是什么问题，Pyroscope 的架构是什么，怎样以 2%~5% overhead 把它铺到整个 K8s 集群。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/pyroscope-continuous-profiling/featured.jpg"/></item><item><title>Crossplane：用 GitOps 方式管理云资源（AWS/阿里云）</title><link>https://socake.github.io/posts/crossplane-gitops-cloud/</link><pubDate>Thu, 26 Jun 2025 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/crossplane-gitops-cloud/</guid><description>Crossplane 把 AWS RDS、S3、EKS 变成 K8s CRD，用 GitOps 方式持续协调云资源状态。记录从概念到落地的实践过程和踩坑经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/crossplane-gitops-cloud/featured.jpg"/></item><item><title>SRE 核心理念：从运维思维到可靠性工程</title><link>https://socake.github.io/posts/sre-concepts-and-principles/</link><pubDate>Thu, 26 Jun 2025 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sre-concepts-and-principles/</guid><description>SRE 不是给运维换了个更好听的名字。它是一套用软件工程思维解决可靠性问题的方法论。本文从 Error Budget 切入，覆盖 SLI/SLO 制定、Toil 识别、On-call 设计、故障复盘文化，以及从传统运维转型 SRE 的实际路径。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sre-concepts-and-principles/featured.jpg"/></item><item><title>OpenTofu 实战：开源 Terraform 管理 AWS 
和阿里云基础设施</title><link>https://socake.github.io/posts/opentofu-terraform-practice/</link><pubDate>Wed, 18 Jun 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/opentofu-terraform-practice/</guid><description>Terraform 改协议了，OpenTofu 是开源的替代。本文介绍 OpenTofu 核心概念，并给出创建 AWS EKS 和阿里云 ACK 的完整配置示例，以及 State 管理、Module 复用和 Atlantis GitOps 集成方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/opentofu-terraform-practice/featured.jpg"/></item><item><title>Grafana Mimir 长期指标存储实战：从单集群 Prometheus 到 10 亿级 series</title><link>https://socake.github.io/posts/grafana-mimir-long-term-metrics/</link><pubDate>Wed, 18 Jun 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grafana-mimir-long-term-metrics/</guid><description>从一套 Prometheus HA pair 起步，一路扩到跨三地多活 Mimir，把 series 数从千万推到十亿级。本文把架构、配置、监控、事故按顺序讲清楚。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grafana-mimir-long-term-metrics/featured.jpg"/></item><item><title>Kubernetes NetworkPolicy 网络隔离实战</title><link>https://socake.github.io/posts/kubernetes-network-policy/</link><pubDate>Sun, 15 Jun 2025 09:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-network-policy/</guid><description>系统讲解 Kubernetes NetworkPolicy 的工作机制与生产实战配置，覆盖 deny-all 基础模板、常见隔离场景、Cilium 扩展、多租户设计、测试验证方法及常见陷阱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-network-policy/featured.jpg"/></item><item><title>Helm 工程化实践：从 Chart 设计到多环境管理</title><link>https://socake.github.io/posts/helm-engineering-practice/</link><pubDate>Sat, 14 Jun 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/helm-engineering-practice/</guid><description>基于生产踩坑经验，系统梳理 Helm Chart 结构设计、_helpers.tpl 复用技巧、多环境 values 管理策略、私有 Harbor 仓库推送流程，以及 --atomic 升级与回滚的正确姿势。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/helm-engineering-practice/featured.jpg"/></item><item><title>Karpenter 深度解析：下一代 K8s 节点自动扩缩</title><link>https://socake.github.io/posts/karpenter-deep-dive/</link><pubDate>Wed, 11 Jun 2025 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/karpenter-deep-dive/</guid><description>从 Cluster Autoscaler 迁移到 Karpenter 之后，集群扩容速度和节点利用率都有明显提升。本文详细拆解 Karpenter 的核心机制、关键配置项，以及在多套生产集群运行中踩过的坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/karpenter-deep-dive/featured.jpg"/></item><item><title>Istio Service Mesh 落地实战：从 Sidecar 注入到灰度发布</title><link>https://socake.github.io/posts/istio-service-mesh-practice/</link><pubDate>Fri, 06 Jun 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/istio-service-mesh-practice/</guid><description>记录 Istio Service Mesh 从零落地的完整过程，包括 sidecar 注入原理、VirtualService 灰度发布流量切分、DestinationRule 熔断与负载均衡配置、PeerAuthentication mTLS 加固，以及用 istioctl analyze 排查常见问题。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/istio-service-mesh-practice/featured.jpg"/></item><item><title>Loki 架构深度解析：从写入路径到 PB 级日志查询优化</title><link>https://socake.github.io/posts/loki-architecture-deep-dive/</link><pubDate>Thu, 05 Jun 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/loki-architecture-deep-dive/</guid><description>围绕 Loki 3.x 架构拆解写入、索引、查询三条链路，给出 schema_config、compactor、bloom、TSDB 的可直接复用配置，并复盘两次线上事故带来的调参经验。</description><media:content 
xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/loki-architecture-deep-dive/featured.jpg"/></item><item><title>GitOps 落地实战：ArgoCD + Kustomize 多环境管理</title><link>https://socake.github.io/posts/gitops-argocd/</link><pubDate>Tue, 03 Jun 2025 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/gitops-argocd/</guid><description>GitOps 不只是「把配置放 Git 里」，真正落地需要解决 overlay 结构设计、ApplicationSet 管理多集群、image updater 自动化，以及 sync wave、resource hook 这些细节。这篇文章记录我们团队从传统 CI/CD 迁移到 GitOps 的实际过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/gitops-argocd/featured.jpg"/></item><item><title>ArgoCD 高级模式：ApplicationSet、Sync Waves 与 GitOps 企业级实践</title><link>https://socake.github.io/posts/argocd-advanced-patterns/</link><pubDate>Tue, 27 May 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/argocd-advanced-patterns/</guid><description>从 ApplicationSet 的四种 Generator 到 Sync Waves 控制数据库迁移顺序，再到 Image Updater 打通 ECR 自动触发 GitOps 流程，这篇文章覆盖 ArgoCD 在企业级多集群环境下的高级用法和常见陷阱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/argocd-advanced-patterns/featured.jpg"/></item><item><title>多集群 Kubernetes 运维：跨集群管理与统一可观测</title><link>https://socake.github.io/posts/multi-cluster-k8s-management/</link><pubDate>Wed, 21 May 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multi-cluster-k8s-management/</guid><description>从单集群到多集群，运维复杂度不是线性增加，而是指数级。这篇文章总结了我们管理跨地域、跨环境多套 K8s 集群的实际经验：如何用 ArgoCD ApplicationSet 统一部署、如何用 Thanos 聚合多集群指标、以及一次真实的跨集群迁移过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/multi-cluster-k8s-management/featured.jpg"/></item><item><title>业务上云实战：传统应用容器化迁移的踩坑与经验</title><link>https://socake.github.io/posts/kubernetes-migration-practice/</link><pubDate>Mon, 19 May 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-migration-practice/</guid><description>把一批跑在虚拟机上的 Java 应用迁移到 Kubernetes，踩过的坑比想象中多。本文记录整个迁移过程的关键决策和教训。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-migration-practice/featured.jpg"/></item><item><title>Kubernetes 集群升级策略：零停机升级的完整实践指南</title><link>https://socake.github.io/posts/kubernetes-upgrade-strategy/</link><pubDate>Wed, 14 May 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-upgrade-strategy/</guid><description>K8s 集群升级听起来简单，实际操作中坑很多：API 弃用导致的 Helm 失败、Admission Webhook 拦截升级流量、PDB 配置不当导致服务中断。这篇文章从真实的升级经验出发，给出一套可复用的零停机升级方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-upgrade-strategy/featured.jpg"/></item><item><title>K8s Gateway API：告别 Ingress，拥抱下一代流量路由</title><link>https://socake.github.io/posts/kubernetes-gateway-api/</link><pubDate>Mon, 12 May 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-gateway-api/</guid><description>Gateway API 已经 GA，是时候认真考虑从 Ingress 迁移了。本文梳理 Gateway API 的设计理念、实际配置示例和迁移注意事项。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-gateway-api/featured.jpg"/></item><item><title>Kubernetes 存储体系生产实践：PV/PVC/StorageClass 全解</title><link>https://socake.github.io/posts/kubernetes-storage-practice/</link><pubDate>Tue, 06 May 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/kubernetes-storage-practice/</guid><description>从存储基础概念到生产实战，覆盖 StorageClass 动态供给配置、AWS EBS 和 EFS CSI 驱动安装、StatefulSet 存储管理、PVC 在线扩容操作、跨 AZ 挂载失败排查，以及有状态服务数据迁移方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-storage-practice/featured.jpg"/></item><item><title>从 Nginx Ingress 迁移到 Traefik：为什么换，怎么换</title><link>https://socake.github.io/posts/traefik-vs-nginx-ingress/</link><pubDate>Sun, 27 Apr 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/traefik-vs-nginx-ingress/</guid><description>从实际痛点出发，讲清楚 Traefik 和 Nginx Ingress 的本质区别，给出可直接参考的迁移路径和配置示例。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/traefik-vs-nginx-ingress/featured.jpg"/></item><item><title>RabbitMQ 运维实战：集群部署、消费者可靠性与监控体系</title><link>https://socake.github.io/posts/rabbitmq-ops-practice/</link><pubDate>Tue, 22 Apr 2025 14:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/rabbitmq-ops-practice/</guid><description>系统梳理 RabbitMQ 运维核心技能：Quorum Queue 集群部署与镜像队列对比、生产配置调优、消费者 prefetch 与死信队列配置、基于 Management API 和 rabbitmq_exporter 的监控体系，以及消息堆积、脑裂等常见故障的处理方案。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/rabbitmq-ops-practice/featured.jpg"/></item><item><title>Celery 异步任务详解：任务队列、重试策略与分布式部署</title><link>https://socake.github.io/posts/celery-async-tasks/</link><pubDate>Tue, 22 Apr 2025 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/celery-async-tasks/</guid><description>从 Celery 架构到 K8s 部署，覆盖任务定义、重试策略、队列路由、Beat 定时任务和 Flower 监控，附完整的生产部署配置。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/celery-async-tasks/featured.jpg"/></item><item><title>ETCD 运维实战：部署、备份恢复与 K8s 集群数据管理</title><link>https://socake.github.io/posts/etcd-ops-practice/</link><pubDate>Sun, 13 Apr 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/etcd-ops-practice/</guid><description>ETCD 是 Kubernetes 的命脉，所有集群状态都存储在这里。本文从实际运维角度梳理部署、备份、恢复和配置动态更新的完整操作链路，包含多个踩坑经验。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/etcd-ops-practice/featured.jpg"/></item><item><title>自研 Kubernetes Admission Webhook 开发实战：从零到生产</title><link>https://socake.github.io/posts/kubernetes-admission-webhook-dev/</link><pubDate>Sat, 12 Apr 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-admission-webhook-dev/</guid><description>Kubernetes 的 admission 体系是一个强大但脆弱的扩展点。webhook 挂了能让集群所有 Pod 创建卡死。写一个能上生产的 webhook 不难，但要让它在面对各种怪异请求、证书轮换、集群升级、大流量突发时都不挂，就是另一回事了。这是一份从零到生产的工程笔记。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-admission-webhook-dev/featured.jpg"/></item><item><title>数据库运维实践：MySQL 高可用与 PostgreSQL 调优经验</title><link>https://socake.github.io/posts/database-ops-practice/</link><pubDate>Tue, 08 Apr 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/database-ops-practice/</guid><description>数据库运维不复杂，但细节多、出问题代价大。本文整理了 MySQL 主从复制、慢查询分析、PostgreSQL 连接池这几个高频话题的实战经验，以及一些日常运维 SQL 备忘。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/database-ops-practice/featured.jpg"/></item><item><title>Kafka 运维实战：消息堆积排查、分区再平衡与监控体系</title><link>https://socake.github.io/posts/kafka-ops-practice/</link><pubDate>Mon, 07 Apr 2025 11:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/kafka-ops-practice/</guid><description>系统梳理 Kafka 运维核心技能：消费者延迟监控告警、消息堆积根因分析、分区扩容规划、Rebalance 风暴处理，以及 KEDA 基于 lag 自动扩缩的配置实践。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kafka-ops-practice/featured.jpg"/></item><item><title>Cluster API 实战：用声明式的方式管理 Kubernetes 集群的生命周期</title><link>https://socake.github.io/posts/cluster-api-infrastructure/</link><pubDate>Sat, 05 Apr 2025 14:15:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cluster-api-infrastructure/</guid><description>用 Terraform 建集群是起手式，但集群一旦多起来 Terraform 的代码量和状态管理开始爆炸。Cluster API 把「集群」本身做成了 Kubernetes CRD——你在 Management Cluster 里 kubectl apply 一个 Cluster 对象，就能得到一个新集群。这是 Kubernetes 治理 Kubernetes 的一种优雅解法。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cluster-api-infrastructure/featured.jpg"/></item><item><title>MongoDB 运维入门：部署、备份与生产性能调优</title><link>https://socake.github.io/posts/mongodb-ops-practice/</link><pubDate>Mon, 31 Mar 2025 11:41:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mongodb-ops-practice/</guid><description>MongoDB 运维从选型到调优：何时选 MongoDB、Replica Set 三节点部署、索引设计、mongodump 备份，以及 wiredTiger、连接池、大文档等生产踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mongodb-ops-practice/featured.jpg"/></item><item><title>KubeVirt 生产实战：在 Kubernetes 上跑虚拟机的完整路线</title><link>https://socake.github.io/posts/kubevirt-vm-on-kubernetes/</link><pubDate>Sat, 29 Mar 2025 10:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubevirt-vm-on-kubernetes/</guid><description>Broadcom 吃掉 VMware 之后，VMware 替代方案成了所有基础设施团队的议题。KubeVirt 1.8 已经是个相当成熟的选择，能在 Kubernetes 里跑真正的 VM——不是轻量容器、不是 microVM，是完整的 Windows/Linux 
VM。这是一年多的实战笔记。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubevirt-vm-on-kubernetes/featured.jpg"/></item><item><title>Alertmanager Webhook 开发：自定义告警处理与 API 集成</title><link>https://socake.github.io/posts/alertmanager-webhook-api/</link><pubDate>Tue, 25 Mar 2025 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/alertmanager-webhook-api/</guid><description>Alertmanager 内置的通知渠道不支持钉钉、飞书等国内工具，Webhook 是扩展告警通知的标准方式。本文用 Python Flask 实现完整的 Webhook 接收器，涵盖消息格式化、降噪去重、Alertmanager API 集成和 K8s 部署。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/alertmanager-webhook-api/featured.jpg"/></item><item><title>Descheduler 深度实战：Kubernetes 自动再平衡的正确打开方式</title><link>https://socake.github.io/posts/descheduler-workload-rebalance/</link><pubDate>Sat, 22 Mar 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/descheduler-workload-rebalance/</guid><description>kube-scheduler 只在 Pod 创建那一刻做决策，之后集群状态变了它就不管了。几个月下来，你的集群会变成 hot node + cold node 混杂、同一个 Deployment 的 Pod 全挤在一个 node、failure-domain 完全失衡。Descheduler 就是把调度决策后置、周期性重新评估的那只手。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/descheduler-workload-rebalance/featured.jpg"/></item><item><title>Alertmanager 完全指南：路由、抑制、静默与多渠道通知</title><link>https://socake.github.io/posts/alertmanager-routing-config/</link><pubDate>Sat, 22 Mar 2025 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/alertmanager-routing-config/</guid><description>告警太多和告警太少一样有害。Alertmanager 的路由、抑制、分组机制是控制告警噪声的核心手段，本文从一个真实的多环境告警体系出发，讲清楚每个配置的意图和陷阱。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/alertmanager-routing-config/featured.jpg"/></item><item><title>Grafana API 自动化：用代码管理 Dashboard、数据源和告警</title><link>https://socake.github.io/posts/grafana-api-automation/</link><pubDate>Tue, 18 Mar 2025 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grafana-api-automation/</guid><description>手动点 UI 管理 Grafana Dashboard 在多环境场景下是噩梦。用 API 把 Dashboard 代码化，实现版本控制和环境同步，才是正确姿势。本文提供完整的 Python 工具脚本和实战踩坑。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grafana-api-automation/featured.jpg"/></item><item><title>PostgreSQL 运维实战：配置调优、连接池、慢查询与高可用</title><link>https://socake.github.io/posts/postgresql-ops-practice/</link><pubDate>Tue, 18 Mar 2025 10:15:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/postgresql-ops-practice/</guid><description>系统梳理 PostgreSQL 运维核心技能：从 shared_buffers、WAL 参数调优，到 PgBouncer 事务模式配置；从 pg_stat_statements 慢查询分析到 PITR 时间点恢复；以及主从流复制、膨胀表清理和 Prometheus 监控指标的完整实践。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/postgresql-ops-practice/featured.jpg"/></item><item><title>Kueue 批处理调度实战：让 Kubernetes 真正承担 AI/HPC 工作负载</title><link>https://socake.github.io/posts/kueue-batch-workload/</link><pubDate>Sat, 15 Mar 2025 09:40:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kueue-batch-workload/</guid><description>把 AI 训练任务塞进 Kubernetes，第一天你会发现原生调度器完全不够用：没有队列、没有 quota、没有 gang scheduling、没有公平共享、preemption 语义一塌糊涂。Kueue 是 sig-scheduling 官方给出的答案，它比 Volcano 更贴近 Kubernetes 原生、比自研 controller 更成熟。这是一份真实的生产笔记。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kueue-batch-workload/featured.jpg"/></item><item><title>Prometheus 服务发现深度解析：kubernetes_sd_configs 
实战</title><link>https://socake.github.io/posts/prometheus-service-discovery/</link><pubDate>Sat, 15 Mar 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-service-discovery/</guid><description>在 K8s 环境里手动维护 Prometheus scrape targets 是不现实的，kubernetes_sd_configs 配合 relabel_configs 是解决这个问题的核心机制。本文从原理到实践，把这套体系讲透。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-service-discovery/featured.jpg"/></item><item><title>vcluster 虚拟集群实战：比 namespace 强一百倍的多租户方案</title><link>https://socake.github.io/posts/vcluster-virtual-cluster/</link><pubDate>Sat, 08 Mar 2025 15:10:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vcluster-virtual-cluster/</guid><description>namespace 不是隔离边界，它只是一层命名约定。ClusterRole、CRD、webhook、LimitRange 全都穿透 namespace。真正的多租户需要每个租户有自己的 kube-apiserver。vcluster 让这件事便宜到几乎免费——一个 namespace 里起一个完整的 Kubernetes 控制平面。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vcluster-virtual-cluster/featured.jpg"/></item><item><title>Elastic Agent + Fleet：下一代统一日志采集管理实践</title><link>https://socake.github.io/posts/elastic-agent-fleet/</link><pubDate>Thu, 06 Mar 2025 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elastic-agent-fleet/</guid><description>Filebeat + Metricbeat + Auditbeat 三个 Agent 各管一摊，配置分散难以维护。Elastic Agent 将它们统一为一个 All-in-One Agent，配合 Fleet 实现中央化管理。本文记录从部署到踩坑的完整实践过程。</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elastic-agent-fleet/featured.jpg"/></item><item><title>EFK 日志系统实战：Fluent Bit + Fluentd + Elasticsearch 完整部署</title><link>https://socake.github.io/posts/efk-logging-practice/</link><pubDate>Wed, 05 Mar 2025 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/efk-logging-practice/</guid><description>Explains why a two-tier Fluent Bit + Fluentd architecture is needed, and provides complete ConfigMap configuration and an ES index template design you can use as a direct reference.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/efk-logging-practice/featured.jpg"/></item><item><title>Zookeeper Operations in Practice: Cluster Deployment, Tuning, and Troubleshooting</title><link>https://socake.github.io/posts/zookeeper-ops-practice/</link><pubDate>Wed, 05 Mar 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/zookeeper-ops-practice/</guid><description>A systematic walkthrough of core Zookeeper production operations skills: ZNode types and the Watcher mechanism, the ZAB election algorithm, deployment configuration for 3- and 5-node clusters, JVM and zoo.cfg tuning, hands-on diagnosis with four-letter-word commands, handling of common failures, and Zookeeper's relationship to Kafka KRaft mode and its place in cloud-native scenarios.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/zookeeper-ops-practice/featured.jpg"/></item><item><title>Karmada Multi-Cluster Federation in Practice: Real-World Use of PropagationPolicy, OverridePolicy, and FailOver</title><link>https://socake.github.io/posts/karmada-multi-cluster/</link><pubDate>Sun, 02 Mar 2025 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/karmada-multi-cluster/</guid><description>If you have two or more Kubernetes clusters, shipping the same application across clusters will sooner or later become part of your daily routine. Karmada is the most complete multi-cluster federation project among the CNCF incubating projects, but its CRD design is fairly restrained; to use it well in production you need to sort out four layers of semantics: resource propagation, per-cluster overrides, scheduling, and failover.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/karmada-multi-cluster/featured.jpg"/></item><item><title>Choosing a Kubernetes Log Collection Solution: From Technical Comparison to Production Rollout</title><link>https://socake.github.io/posts/k8s-logging-solution/</link><pubDate>Tue, 25 Feb 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-logging-solution/</guid><description>Documents our team's complete journey of building a Kubernetes log collection system from scratch, the technical reasons for ultimately choosing Fluent Bit + Fluentd + Elasticsearch, and the pitfalls we hit in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/k8s-logging-solution/featured.jpg"/></item><item><title>ExternalDNS Multi-Cloud DNS Sync in Practice: From Route53 to Cloudflare to Alibaba Cloud DNS</title><link>https://socake.github.io/posts/external-dns-multi-provider/</link><pubDate>Sat, 22 Feb 2025 09:45:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/external-dns-multi-provider/</guid><description>Clicking DNS records into the Cloudflare console by hand inevitably breaks down as clusters and services grow. ExternalDNS is a controller that treats Kubernetes resources as the source of truth and the DNS provider as the executor. To use it well, though, you need to understand txtOwnerId, policy, the limitations of each provider, and several pitfalls of sharing a zone across clusters.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/external-dns-multi-provider/featured.jpg"/></item><item><title>Secret Management in Practice: HashiCorp Vault + External Secrets Operator</title><link>https://socake.github.io/posts/vault-external-secrets/</link><pubDate>Thu, 20 Feb 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vault-external-secrets/</guid><description>base64 is not encryption. Starting from the risk of Secret leakage, this article gives a complete introduction to core Vault concepts, deployment on K8s, ESO integration configuration, and automatic rotation of dynamic database credentials.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vault-external-secrets/featured.jpg"/></item><item><title>Consul Service Registration and Discovery: From Getting Started to Production-Grade Health Checks</title><link>https://socake.github.io/posts/consul-service-discovery/</link><pubDate>Tue, 18 Feb 2025 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/consul-service-discovery/</guid><description>In the microservices era, managing dynamic IPs and service health is unavoidable. Consul offers a complete service discovery solution; this article takes a hands-on view of its core usage and production pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/consul-service-discovery/featured.jpg"/></item><item><title>Harbor Registry Production Operations: High Availability, Security Scanning, and CI/CD Integration</title><link>https://socake.github.io/posts/harbor-registry-ops/</link><pubDate>Tue, 18 Feb 2025 
09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/harbor-registry-ops/</guid><description>Starting from Harbor's architecture, this article systematically covers high-availability deployment options in production, image security scanning policies, cross-region replication configuration, permission model design, and integration with Jenkins/GitLab CI, with a troubleshooting handbook and Prometheus monitoring configuration attached.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/harbor-registry-ops/featured.jpg"/></item><item><title>cert-manager in Production: The Complete Path from Let's Encrypt to Internal Enterprise PKI</title><link>https://socake.github.io/posts/cert-manager-production/</link><pubDate>Sat, 15 Feb 2025 14:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cert-manager-production/</guid><description>cert-manager is standard equipment on almost every Kubernetes cluster, but teams that actually run it in production all hit the same issues: blowing through Let&amp;rsquo;s Encrypt rate limits, wildcard certificate renewals failing, internal services needing a private CA, and how to issue certificates for Istio / Gateway API. This article turns the pitfalls I hit over a year of operating cert-manager on 5 clusters into a hands-on handbook.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cert-manager-production/featured.jpg"/></item><item><title>Ansible Batch Operations Automation: From Ad-Hoc Commands to Engineered Roles</title><link>https://socake.github.io/posts/ansible-ops-automation/</link><pubDate>Wed, 12 Feb 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ansible-ops-automation/</guid><description>Three traits make Ansible a powerful tool for batch Linux operations: it is agentless, it pushes over SSH, and it is idempotent. From basic usage to engineering with Roles, this article lays out complete approaches and lessons learned for the most frequent day-to-day operations scenarios.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ansible-ops-automation/featured.jpg"/></item><item><title>CI/CD Pipeline Design: Engineering Practices from Code Commit to Automated Deployment</title><link>https://socake.github.io/posts/cicd-pipeline-design/</link><pubDate>Sun, 09 Feb 2025 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cicd-pipeline-design/</guid><description>A good CI/CD pipeline is not just one that runs; it is fast, reliable, and has clear boundaries. This article ranges from build caching to GitOps 
division of responsibilities and from multi-branch strategy to troubleshooting, collecting the engineering practices I have used again and again in real projects.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cicd-pipeline-design/featured.jpg"/></item><item><title>KEDA Event-Driven Autoscaling in Practice: From the Limits of HPA to Scaling on Real Business Signals</title><link>https://socake.github.io/posts/keda-event-driven-autoscaling/</link><pubDate>Sat, 08 Feb 2025 10:12:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/keda-event-driven-autoscaling/</guid><description>Out of the box, HPA only looks at CPU and memory, but the real scaling signals in production are often Kafka lag, RabbitMQ queue depth, custom Prometheus metrics, or even cron. This article condenses KEDA's architecture, core CRDs, common scaler pitfalls, and operational actions into a senior engineer's memo: no theory, only which configurations can rescue you from alerts at 3 a.m.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/keda-event-driven-autoscaling/featured.jpg"/></item><item><title>GitLab CI/CD + Kubernetes: The Full Flow from Code Commit to Production Deployment</title><link>https://socake.github.io/posts/gitlab-ci-kubernetes/</link><pubDate>Sat, 01 Feb 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/gitlab-ci-kubernetes/</guid><description>From configuring the Kubernetes executor for GitLab Runner, to building images with kaniko instead of DinD, to completing production deployment by updating a GitOps repository: a record of a full CI/CD flow that runs end to end in a real AWS EKS environment.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/gitlab-ci-kubernetes/featured.jpg"/></item><item><title>Jenkins + Kubernetes: Dynamic Agent Builds and Pipeline Best Practices</title><link>https://socake.github.io/posts/jenkins-kubernetes-cicd/</link><pubDate>Sun, 26 Jan 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/jenkins-kubernetes-cicd/</guid><description>The resource waste and configuration sprawl of static Jenkins Slaves are fundamentally resolved by the dynamic Pod Agent model on Kubernetes. This article documents the complete process of migrating Jenkins to K8s in a real production environment.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/jenkins-kubernetes-cicd/featured.jpg"/></item><item><title>Kubernetes RBAC Security Hardening in Practice: From Least Privilege to NetworkPolicy</title><link>https://socake.github.io/posts/kubernetes-rbac-security/</link><pubDate>Fri, 24 Jan 2025 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-rbac-security/</guid><description>Starting from a real security incident, this article systematically explains least-privilege design for Kubernetes RBAC, when to use ClusterRole versus Role, how to analyze RBAC problems with audit logs, and how NetworkPolicy provides network isolation at the namespace and Pod level.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-rbac-security/featured.jpg"/></item><item><title>Doris vs. StarRocks: Notes from a Serious Production Selection</title><link>https://socake.github.io/posts/columnar-warehouse-doris-starrocks/</link><pubDate>Wed, 22 Jan 2025 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/columnar-warehouse-doris-starrocks/</guid><description>Doris and StarRocks share an origin, look alike, and yet each has its own leanings. Which to choose is not a question of &amp;quot;which is better&amp;quot; but of &amp;quot;which fits our scenario better&amp;quot;. This article is an in-depth comparison written after more than a year of operating two OLAP clusters (one Doris, one StarRocks), and I hope it helps you skip months of research and pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/columnar-warehouse-doris-starrocks/featured.jpg"/></item><item><title>Engineering Kubernetes YAML: Common Resource Templates and Production Best Practices</title><link>https://socake.github.io/posts/kubernetes-yaml-patterns/</link><pubDate>Sun, 19 Jan 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-yaml-patterns/</guid><description>Writing good Kubernetes YAML is not just a matter of syntax; it is mostly accumulated engineering experience. This article catalogs the YAML anti-patterns commonly seen in production and provides complete, usable templates for each resource type.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-yaml-patterns/featured.jpg"/></item><item><title>Kubernetes Resource Management Hands-On: QoS, ResourceQuota, and VPA 
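as a Systematic Practice</title><link>https://socake.github.io/posts/kubernetes-resource-management/</link><pubDate>Thu, 16 Jan 2025 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-resource-management/</guid><description>I have seen too many production incidents caused by poor resource configuration: services without limits eating all node memory and triggering OOM evictions, requests set so high that Pods cannot be scheduled, and misconfigured HPA that breaks scaling. This article walks through the K8s resource management system from start to finish so you can build a complete approach to resource governance.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-resource-management/featured.jpg"/></item><item><title>Kubernetes Networking Deep Dive: The Complete Guide to CNI, kube-proxy, and NetworkPolicy</title><link>https://socake.github.io/posts/kubernetes-networking-deep-dive/</link><pubDate>Fri, 10 Jan 2025 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-networking-deep-dive/</guid><description>K8s networking is a blind spot for many engineers: ignored while things work, and a complete mystery the moment something breaks. Troubleshooting multiple production network failures gave me a deep understanding of every layer of K8s networking. This article goes from the Pod network model to NetworkPolicy in practice to help you build a complete picture of K8s networking.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-networking-deep-dive/featured.jpg"/></item><item><title>Database Change Management: The Complete Engineering Path from gh-ost to Flyway</title><link>https://socake.github.io/posts/database-change-management/</link><pubDate>Wed, 08 Jan 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/database-change-management/</guid><description>Many teams treat &amp;quot;database change management&amp;quot; as a few SQL statements plus a ticket, when in fact it is the area with the least engineering rigor. On one side, developers casually write an ALTER that locks production; on the other, DBAs watch the progress bar by hand and pray nothing goes wrong. This article presents the DB change management best practices I have distilled at three levels (tooling, process, and organization), each with solutions you can put into practice directly.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/database-change-management/featured.jpg"/></item><item><title>Vitess in Practice: The Road to Scaling MySQL Horizontally to Petabytes</title><link>https://socake.github.io/posts/vitess-mysql-sharding/</link><pubDate>Tue, 24 Dec 2024 14:00:00 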
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vitess-mysql-sharding/</guid><description>When a single MySQL database can no longer cope and you do not want to switch to TiDB or PG, Vitess becomes the last option. It keeps MySQL compatibility, uses vtgate as the sharding proxy, and uses VReplication for online resharding. It sounds great, but the Vitess learning curve is startlingly steep. This article is my comprehensive set of notes after several months of researching Vitess and getting a 4-shard cluster running in staging.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vitess-mysql-sharding/featured.jpg"/></item><item><title>Technical Growth for Operations Engineers: Planning the Path from Executor to Architect</title><link>https://socake.github.io/posts/devops-career-growth/</link><pubDate>Sun, 22 Dec 2024 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devops-career-growth/</guid><description>An operations engineer grows not by piling up tools but by moving up levels of understanding. This article records my observations and reflections on that path: which moments lead to real advancement, and which habits of thought keep people stuck in place.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-career-growth/featured.jpg"/></item><item><title>Troubleshooting Methodology: From Symptom to Root Cause</title><link>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5%E6%96%B9%E6%B3%95%E8%AE%BA/</link><pubDate>Tue, 17 Dec 2024 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5%E6%96%B9%E6%B3%95%E8%AE%BA/</guid><description>Good troubleshooting relies on method, not intuition. This article summarizes the troubleshooting framework I distilled from multiple production incidents: from building a timeline, to prioritizing hypotheses, to recognizing and avoiding cognitive traps.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5%E6%96%B9%E6%B3%95%E8%AE%BA/featured.jpg"/></item><item><title>Rook-Ceph on Kubernetes Operations in Practice: From Deployment to Failure Recovery</title><link>https://socake.github.io/posts/ceph-rook-kubernetes/</link><pubDate>Fri, 13 Dec 2024 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ceph-rook-kubernetes/</guid><description>When you need to provide block, file, and object storage on Kubernetes, Rook-Ceph 
is a solution with almost no alternatives. But its complexity is also the highest of any K8s storage option. This article distills two years of operating a bare-metal Rook-Ceph production cluster, including postmortems of several times I pulled the cluster back from the brink.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ceph-rook-kubernetes/featured.jpg"/></item><item><title>Lessons from SRE Practice: The Mindset Shift from Operations to SRE</title><link>https://socake.github.io/posts/sre%E5%AE%9E%E8%B7%B5%E5%BF%83%E5%BE%97/</link><pubDate>Wed, 11 Dec 2024 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sre%E5%AE%9E%E8%B7%B5%E5%BF%83%E5%BE%97/</guid><description>SRE is not operations with a new title; it is a methodology for solving reliability problems with a software engineering mindset. This article records the shifts that struck me most along the way.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sre%E5%AE%9E%E8%B7%B5%E5%BF%83%E5%BE%97/featured.jpg"/></item><item><title>Building Observability: From Prometheus Collection to Grafana Alerting Integration</title><link>https://socake.github.io/posts/prometheus-grafana/</link><pubDate>Fri, 06 Dec 2024 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-grafana/</guid><description>Observability is not about installing a few monitoring tools; it is about being able to locate the root cause quickly when the system fails. From collection architecture to PromQL to alert routing, this article covers problems we actually ran into in production, such as cardinality explosions and alert noise.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-grafana/featured.jpg"/></item><item><title>MinIO Distributed Object Storage in Production: From Erasure Code to Multi-Tenancy</title><link>https://socake.github.io/posts/minio-distributed-storage/</link><pubDate>Mon, 02 Dec 2024 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/minio-distributed-storage/</guid><description>Self-hosting object storage used to be a hassle, until MinIO perfected the combination of S3 API + Erasure Code + simple deployment. These are my operations notes from three production MinIO clusters, covering the whole chain from hardware selection to firefighting failures. It also discusses what community-edition users should do after MinIO adjusted its commercialization strategy in 2024.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/minio-distributed-storage/featured.jpg"/></item><item><title>Integrating Python with 
Prometheus: Automating Queries for Monitoring Data and Alert Status</title><link>https://socake.github.io/posts/python-prometheus-monitoring/</link><pubDate>Mon, 25 Nov 2024 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-prometheus-monitoring/</guid><description>Call the Prometheus HTTP API directly from Python to implement service liveness checks and daily availability reports, then hook into DingTalk to push a cluster health summary automatically every day.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-prometheus-monitoring/featured.jpg"/></item><item><title>Python Async Programming in Practice: Using asyncio in AI Applications</title><link>https://socake.github.io/posts/python-async-programming/</link><pubDate>Fri, 22 Nov 2024 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-async-programming/</guid><description>AI applications are inherently I/O-bound: waiting for LLM responses, waiting for vector database retrieval, waiting for multiple tool calls to return. Synchronous code is a performance killer here. This article goes from how the event loop works to practical AI application patterns, focusing on concurrent calls with asyncio.gather, handling SSE streaming output, and troubleshooting common pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-async-programming/featured.jpg"/></item><item><title>MongoDB Sharded Clusters in Practice: The Full Path from Shard Key Design to Chunk Balancing</title><link>https://socake.github.io/posts/mongodb-sharding-practice/</link><pubDate>Wed, 20 Nov 2024 15:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mongodb-sharding-practice/</guid><description>Many teams treat MongoDB sharding as &amp;quot;set a shard key and you are done&amp;quot;, only to find six months after launch that 80% of the data sits on one shard, the balancer moves tens of GB a day yet never catches up, and some collection has jumbo chunks that cannot be split. This article collects my experience from several MongoDB sharded clusters, in the hope of sparing you some detours before you shard.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mongodb-sharding-practice/featured.jpg"/></item><item><title>Python Operations Automation: Engineering Practices from Scripts to Complete Tools</title><link>https://socake.github.io/posts/python-devops-automation/</link><pubDate>Tue, 12 Nov 2024 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/python-devops-automation/</guid><description>A systematic walkthrough of engineering methods for Python operations automation: managing AWS resources with boto3, using the Kubernetes Python SDK, choosing between the Click and Typer CLI frameworks, batch database operations scripts, DingTalk Webhook integration, and practical experience with type annotations and error handling.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-devops-automation/featured.jpg"/></item><item><title>Redis Cluster Scaling and Data Migration in Practice: From SETSLOT to Atomic Slot Migration</title><link>https://socake.github.io/posts/redis-cluster-migration/</link><pubDate>Fri, 08 Nov 2024 10:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/redis-cluster-migration/</guid><description>Many teams treat Redis Cluster as an &amp;quot;out-of-the-box&amp;quot; distributed Redis, until they need to scale or migrate data and discover that the SETSLOT protocol has a dozen or so states, client redirection during migration either fails to take effect or turns into a storm, a stuck migrate cannot be interrupted, and big keys bring the migration to its knees. This article goes through all the scaling, migration, and firefighting I have done on several hundred-billion-scale clusters.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/redis-cluster-migration/featured.jpg"/></item><item><title>Redis Operations in Practice: Persistence Configuration, Cluster Modes, and Production Monitoring</title><link>https://socake.github.io/posts/redis-ops-practice/</link><pubDate>Wed, 06 Nov 2024 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/redis-ops-practice/</guid><description>Redis operations look simple, but only when something goes wrong in production do you learn how deep the water runs. This article covers core operations topics including persistence, clustering, monitoring, and failure handling.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/redis-ops-practice/featured.jpg"/></item><item><title>MySQL Backup and Restore in Practice: A Complete Solution from mysqldump to XtraBackup</title><link>https://socake.github.io/posts/mysql-backup-restore/</link><pubDate>Fri, 01 Nov 2024 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mysql-backup-restore/</guid><description>From mysqldump to XtraBackup, and from full backups to binlog-based point-in-time recovery, this article covers the complete body of knowledge for MySQL backup and restore, including production pitfalls and an automated verification scheme.</description><media:content 
xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mysql-backup-restore/featured.jpg"/></item><item><title>Taming PostgreSQL Bloat: Tuning autovacuum to What You Actually Need</title><link>https://socake.github.io/posts/postgresql-vacuum-bloat-tuning/</link><pubDate>Tue, 29 Oct 2024 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/postgresql-vacuum-bloat-tuning/</guid><description>Most PostgreSQL DBAs understand autovacuum only as far as &amp;quot;it runs by itself&amp;quot;, but once bloat sets in they discover that the defaults are nowhere near enough for modern hardware, the dozens of autovacuum_* parameters each govern their own corner, and when something goes wrong there is no obvious place to look. This article collects my experience of taming bloat on several PG clusters, from MVCC principles to parameter tuning, and from monitoring to emergency response.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/postgresql-vacuum-bloat-tuning/featured.jpg"/></item><item><title>The Complete Nginx Operations Guide: Reverse Proxy, Load Balancing, HTTPS, and Rate Limiting</title><link>https://socake.github.io/posts/nginx-ops-complete/</link><pubDate>Thu, 24 Oct 2024 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/nginx-ops-complete/</guid><description>You know how to install Nginx, but do you really know how to use it? Starting from the configuration structure, this article fully covers reverse proxying, load-balancing strategies, Let&amp;rsquo;s Encrypt certificates, rate-limiting configuration, log analysis, and performance tuning, with troubleshooting for common 502 and SSL failures.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/nginx-ops-complete/featured.jpg"/></item><item><title>Kubernetes from Scratch: A Beginner's Guide from an Engineer's Perspective</title><link>https://socake.github.io/posts/kubernetes-beginner-guide/</link><pubDate>Sun, 20 Oct 2024 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-beginner-guide/</guid><description>Docker Compose can run multiple containers, so why do you still need Kubernetes? Starting from that question, this article uses analogies to explain core concepts such as Pod/Deployment/Service/Ingress, and provides the most commonly used kubectl commands and a complete introductory deployment example.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-beginner-guide/featured.jpg"/></item><item><title>MySQL Deep Tuning: From Buffer Pool 
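to Lock Waits, a Production Handbook</title><link>https://socake.github.io/posts/mysql-performance-tuning-deep-dive/</link><pubDate>Fri, 18 Oct 2024 14:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mysql-performance-tuning-deep-dive/</guid><description>Have you ever followed an online tutorial, set innodb_buffer_pool_size to 75%, turned off the query cache, enabled innodb_file_per_table, and then told yourself &amp;quot;that is all there is to MySQL tuning&amp;quot;? Real tuning is a continuous process of observing, hypothesizing, verifying, and rolling back. This article collects the parameter-tuning experience I have built up over the past few years on a dozen or so MySQL instances, and every item can be traced to concrete metrics and business impact.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mysql-performance-tuning-deep-dive/featured.jpg"/></item><item><title>Git Workflow in Practice: Branching Strategies and Team Collaboration Conventions</title><link>https://socake.github.io/posts/git-workflow-practice/</link><pubDate>Thu, 10 Oct 2024 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/git-workflow-practice/</guid><description>After five years of using Git, my biggest takeaway is that workflow problems are fundamentally team collaboration problems, not tooling problems. This article compares three strategies (Git Flow / GitHub Flow / Trunk-Based) and covers frequently raised topics such as branch naming, Commit Messages, the philosophy of rebase, handling large refactoring branches, and conflict resolution.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/git-workflow-practice/featured.jpg"/></item><item><title>TiDB in Production: End-to-End Experience from Placement Rules to TiKV Tuning</title><link>https://socake.github.io/posts/tidb-production-practice/</link><pubDate>Sat, 05 Oct 2024 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/tidb-production-practice/</guid><description>Getting TiDB running as a &amp;quot;distributed MySQL&amp;quot; is not hard. The hard part is keeping TiKV free of jitter under highly concurrent writes, keeping PD scheduling from hurting the business, and keeping cross-datacenter replicas alive under an RPO=0 requirement. This article lays out the pitfalls I hit, the parameters I tuned, and the SOPs I set on several TiDB clusters over the past two years; it is not a tutorial but a field manual you can copy directly.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/tidb-production-practice/featured.jpg"/></item><item><title>Shell Scripting in Practice: Bash 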
Operations Automation from Basics to Engineering</title><link>https://socake.github.io/posts/shell-script-automation/</link><pubDate>Wed, 02 Oct 2024 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/shell-script-automation/</guid><description>Shell scripts are an SRE's number-one productivity tool. Starting from the syntax essentials, this article covers common operations patterns such as batch operations, log rotation, and health checks, moves on to getopts, signal handling with trap, and approaches to engineering scripts, and finally summarizes classic pitfalls such as quoting hell and variable scope.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/shell-script-automation/featured.jpg"/></item><item><title>Docker Compose Local Development Workflow: Best Practices for Setting Up Multi-Service Environments</title><link>https://socake.github.io/posts/docker-compose-dev-workflow/</link><pubDate>Fri, 27 Sep 2024 12:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/docker-compose-dev-workflow/</guid><description>Use Docker Compose to build a complete local environment with a database, cache, and message queue, with healthcheck to enforce startup order, bind mounts for hot reloading, and the override pattern to separate development and production configuration. This article covers all the key details and common pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/docker-compose-dev-workflow/featured.jpg"/></item><item><title>Docker Best Practices: From Dockerfile to Production Deployment</title><link>https://socake.github.io/posts/docker-best-practices/</link><pubDate>Sat, 21 Sep 2024 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/docker-best-practices/</guid><description>Multi-stage builds, missing .dockerignore entries, running as non-root, build cache optimization, and entrypoint/cmd signal handling: problems actually hit in production, broken down one by one with concrete Dockerfile examples.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/docker-best-practices/featured.jpg"/></item><item><title>Linux System Administration Essentials: System-Level Knowledge Every DevOps Engineer Should Have</title><link>https://socake.github.io/posts/linux-system-admin-devops/</link><pubDate>Mon, 16 Sep 2024 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-system-admin-devops/</guid><description>After years of 
DevOps work, I am increasingly convinced that Linux system-level knowledge is the foundation of all troubleshooting. When a Kubernetes Pod is killed for no apparent reason, a Java service suddenly stops responding, or a disk IO spike makes the whole machine stall, the diagnosis ultimately comes down to the system layer. This article systematically reviews the system administration skills I use most in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/linux-system-admin-devops/featured.jpg"/></item><item><title>Linux Performance Tuning in Practice: A Systematic Approach to Diagnosing CPU, Memory, and IO Bottlenecks</title><link>https://socake.github.io/posts/linux-performance-tuning/</link><pubDate>Sun, 08 Sep 2024 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-performance-tuning/</guid><description>From choosing the toolchain to hands-on diagnosis, this article lays out a complete methodology for Linux performance tuning: analyzing CPU context switches and softirqs, reading OOM logs, choosing an IO scheduler, handling TCP TIME_WAIT, and the particular effects of cgroup limits in container environments.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/linux-performance-tuning/featured.jpg"/></item><item><title>About Me</title><link>https://socake.github.io/posts/authors/</link><pubDate>Sun, 08 Sep 2024 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/authors/</guid><description/><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/authors/featured.png"/></item><item><title>Changelog</title><link>https://socake.github.io/changelog/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/changelog/</guid><description/></item><item><title>My Digital Bookshelf</title><link>https://socake.github.io/books/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/books/</guid><description/></item><item><title>Learning Roadmap</title><link>https://socake.github.io/roadmap/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/roadmap/</guid><description/></item><item><title>Support Us</title><link>https://socake.github.io/sponsor-diy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/sponsor-diy/</guid><description/></item></channel></rss>