<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>运维 on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/tags/%E8%BF%90%E7%BB%B4/</link><description>Recent content in 运维 on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Sat, 18 Apr 2026 14:00:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/tags/%E8%BF%90%E7%BB%B4/index.xml" rel="self" type="application/rss+xml"/><item><title>Nacos in One Article: Configuration Center and Service Discovery in Practice, from Zero to Production Mastery</title><link>https://socake.github.io/posts/nacos-config-service-discovery-guide/</link><pubDate>Sat, 18 Apr 2026 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/nacos-config-service-discovery-guide/</guid><description>Nacos serves as both a configuration center and a service registry/discovery system, and is a cornerstone of the Spring Cloud Alibaba ecosystem. This article systematically covers Nacos's data model, consistency protocols, long-polling push mechanism, health checks for ephemeral instances, production cluster deployment, multi-language SDK integration, canary releases, access control, troubleshooting of common failures (configuration not taking effect, password drift, cluster split-brain), and its positioning in the cloud-native era. It is intended as a complete reference from getting started through production operations.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/nacos-config-service-discovery-guide/featured.jpg"/></item><item><title>AI Tools in Practice for Ops Engineers</title><link>https://socake.github.io/posts/%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88ai%E5%B7%A5%E5%85%B7%E5%AE%9E%E8%B7%B5/</link><pubDate>Fri, 03 Apr 2026 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88ai%E5%B7%A5%E5%85%B7%E5%AE%9E%E8%B7%B5/</guid><description>From writing Shell scripts and interpreting error messages to assisting with troubleshooting: where AI tools actually help ops engineers, where they do not, prompt techniques, and which scenarios each tool suits.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/%E8%BF%90%E7%BB%B4%E5%B7%A5%E7%A8%8B%E5%B8%88ai%E5%B7%A5%E5%85%B7%E5%AE%9E%E8%B7%B5/featured.jpg"/></item><item><title>Running Large Models with Ollama on K8s: Operating Local LLMs in Practice</title><link>https://socake.github.io/posts/ollama-kubernetes-llm/</link><pubDate>Mon, 30 Mar 2026 09:08:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ollama-kubernetes-llm/</guid><description>Deploying Ollama on Kubernetes to run local large models, from GPU scheduling to falling back to CPU inference to real integration with ops workflows: a complete record of the pitfalls and practices.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ollama-kubernetes-llm/featured.jpg"/></item><item><title>Multimodal LLMs in Practice: Image Understanding and Visual Analysis</title><link>https://socake.github.io/posts/multimodal-llm-vision-practice/</link><pubDate>Mon, 09 Mar 2026 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multimodal-llm-vision-practice/</guid><description>Covers a selection comparison of mainstream multimodal models, how to call image-understanding APIs, practical scenarios such as OCR, document understanding, and chart parsing, plus a complete ops case study: using a multimodal model to automatically analyze Grafana screenshots and generate alert summaries.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/multimodal-llm-vision-practice/featured.jpg"/></item><item><title>MCP in Practice: Connecting Ops Tools to AI Agents</title><link>https://socake.github.io/posts/mcp-protocol-devops/</link><pubDate>Fri, 27 Feb 2026 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mcp-protocol-devops/</guid><description>The Model Context Protocol lets AI call external tools in a standardized way. This article implements an ops MCP Server in Python that connects kubectl, Prometheus, and Loki so the AI can query cluster state directly.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mcp-protocol-devops/featured.jpg"/></item><item><title>LLMs for Ops: Practical Applications in Troubleshooting and Automation</title><link>https://socake.github.io/posts/aiops-llm-devops/</link><pubDate>Sat, 31 Jan 2026 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/aiops-llm-devops/</guid><description>LLMs cannot replace ops engineers, but they can automate repetitive, low-value work. This article shares several scenarios I have put into practice with Claude in my day-to-day work.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/aiops-llm-devops/featured.jpg"/></item><item><title>Prometheus Process Monitoring: process-exporter in Practice and Alert Configuration</title><link>https://socake.github.io/posts/prometheus-process-monitoring/</link><pubDate>Thu, 18 Dec 2025 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-process-monitoring/</guid><description>K8s has a mature Pod monitoring stack, but how do you monitor processes running on bare metal and VMs? This article covers deploying and configuring process-exporter, including process group matching, core metrics, alert rule design, and pitfalls encountered in practice.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-process-monitoring/featured.jpg"/></item><item><title>Release Rollback SOP</title><link>https://socake.github.io/docs/cicd/%E5%8F%91%E7%89%88%E5%9B%9E%E6%BB%9Asop/</link><pubDate>Tue, 09 Dec 2025 16:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/cicd/%E5%8F%91%E7%89%88%E5%9B%9E%E6%BB%9Asop/</guid><description>Covers rollback decision criteria, rollback procedures at the K8s, ArgoCD, and configuration layers, the trade-off between forward fixes and rollbacks for database changes, and a complete SOP template for on-call staff.</description></item><item><title>Helm Guide: From Getting Started to Production Practice</title><link>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/helm%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97/</guid><description>Helm from getting started to production: Chart structure, values overrides, template syntax, production flags such as --atomic and --wait, and installation examples for commonly used Charts.</description></item><item><title>Kubernetes Ingress Configuration in Practice</title><link>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/ingress%E9%85%8D%E7%BD%AE%E5%AE%9E%E8%B7%B5/</guid><description>From Ingress concepts to production practice: comparing nginx, traefik, and ALB, automatic TLS certificate issuance, canary releases, and a detailed look at common annotations such as rate limiting and timeouts.</description></item><item><title>Kubernetes Security Hardening in Practice</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AE%89%E5%85%A8%E5%8A%A0%E5%9B%BA/</guid><description>K8s security hardening from Pod to cluster: SecurityContext configuration, network policy isolation, secure Secret management, image vulnerability scanning, and putting the RBAC least-privilege principle into practice.</description></item><item><title>Kubernetes Troubleshooting SOP</title><link>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5sop/</guid><description>The full K8s troubleshooting flow from symptom to root cause: systematic methods for scenarios such as abnormal Pod states, Node NotReady, unreachable Services, and storage mount failures.</description></item><item><title>Kubernetes Cluster Upgrades in Practice</title><link>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</link><pubDate>Tue, 09 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E9%9B%86%E7%BE%A4%E5%8D%87%E7%BA%A7/</guid><description>The full K8s cluster upgrade flow: from version compatibility checks, etcd backups, and EKS managed upgrade commands, to blue-green node replacement, PDB configuration, and detecting deprecated APIs with pluto, through to handling common upgrade problems.</description></item><item><title>Go Standard Library Quick Reference: What Ops Engineers Use Most</title><link>https://socake.github.io/docs/languages/go/go%E6%A0%87%E5%87%86%E5%BA%93%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E6%A0%87%E5%87%86%E5%BA%93%E9%80%9F%E6%9F%A5/</guid><description>Write correct code quickly without looking up the docs: a collection of the Go 
standard library usages most common in ops scenarios, where every section is a copy-paste-ready code snippet</description></item><item><title>Go Concurrency: goroutines and channels in Practice</title><link>https://socake.github.io/docs/languages/go/go%E5%B9%B6%E5%8F%91%E7%BC%96%E7%A8%8B/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E5%B9%B6%E5%8F%91%E7%BC%96%E7%A8%8B/</guid><description>Speed up ops tools with Go's concurrency features: batch service status checks, concurrent SSH command execution, and timeout and cancellation control are all covered in this article</description></item><item><title>Go Error Handling Best Practices</title><link>https://socake.github.io/docs/languages/go/go%E9%94%99%E8%AF%AF%E5%A4%84%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E9%94%99%E8%AF%AF%E5%A4%84%E7%90%86/</guid><description>Handling errors correctly in ops tools: error wrapping and unwrapping, deciding whether an error is retryable, a unified error output format, and error messages with context, while avoiding common error-handling anti-patterns</description></item><item><title>Go Language Basics Quick Reference (for Ops)</title><link>https://socake.github.io/docs/languages/go/go%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</guid><description>The language basics you need before writing ops tools in Go, focused on features commonly used in ops scenarios, with practical code examples</description></item><item><title>Building Ops Tools in Go: A Hands-On Guide</title><link>https://socake.github.io/docs/languages/go/go%E8%BF%90%E7%BB%B4%E5%B7%A5%E5%85%B7%E5%BC%80%E5%8F%91/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/go/go%E8%BF%90%E7%BB%B4%E5%B7%A5%E5%85%B7%E5%BC%80%E5%8F%91/</guid><description>Writing a Go ops tool from scratch: the cobra CLI framework, executing kubectl commands, calling the K8s API, configuring zap logging, and managing configuration with viper, with complete runnable code examples</description></item><item><title>Kubernetes HPA/VPA Autoscaling Configuration</title><link>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-hpa%E5%BC%B9%E6%80%A7%E4%BC%B8%E7%BC%A9/</guid><description>From HPA v2 to event-driven scaling with KEDA, covering CPU, memory, and custom metric configuration, tuning anti-flapping parameters, VPA recommender integration, and production-grade autoscaling best practices.</description></item><item><title>Kubernetes RBAC Permission Management in Practice</title><link>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-rbac%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>From core RBAC concepts to production-grade multi-tenant permission design, covering least privilege for ServiceAccounts, troubleshooting with kubectl auth can-i, and namespace isolation practices.</description></item><item><title>Kubernetes Storage: PV/PVC/StorageClass in Practice</title><link>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E5%AD%98%E5%82%A8pvc/</guid><description>From basic PV/PVC concepts to production-grade CSI configuration, covering dynamic provisioning, StatefulSet storage, AWS EBS/EFS, Alibaba Cloud disks and NAS, and data migration practices.</description></item><item><title>Kubernetes Networking Model and Services Explained</title><link>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E7%BD%91%E7%BB%9C%E4%B8%8Eservice/</guid><description>From the basic K8s networking model to production-grade Service configuration, covering a CNI plugin comparison, kube-proxy mode selection, DNS resolution rules, and troubleshooting approaches.</description></item><item><title>Kubernetes Resource Management: requests/limits/QoS/Quotas</title><link>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/k8s-%E8%B5%84%E6%BA%90%E7%AE%A1%E7%90%86/</guid><description>From CPU throttling to memory OOMKill, and from QoS classes to eviction priority: a systematic walkthrough of Kubernetes 
resource management mechanisms and production tuning practices.</description></item><item><title>Linux Disk and File System Management</title><link>https://socake.github.io/docs/linux/linux%E7%A3%81%E7%9B%98%E4%B8%8E%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%A3%81%E7%9B%98%E4%B8%8E%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F/</guid><description>From fdisk partitioning to LVM expansion and snapshots, from an ext4 vs xfs comparison to fsck failure recovery, plus a quick reference to the key storage-related paths under /proc and /sys.</description></item><item><title>Linux Process Management and Job Control</title><link>https://socake.github.io/docs/linux/linux%E8%BF%9B%E7%A8%8B%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E8%BF%9B%E7%A8%8B%E7%AE%A1%E7%90%86/</guid><description>From viewing processes with ps/pstree to sending signals with kill/pkill, and from adjusting priority with nice/ionice to managing sessions with screen/tmux, together with systemctl/journalctl and resource control via ulimit.</description></item><item><title>Linux Networking Commands Quick Reference</title><link>https://socake.github.io/docs/linux/linux%E7%BD%91%E7%BB%9C%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%BD%91%E7%BB%9C%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</guid><description>A systematic rundown of the Linux network troubleshooting toolchain, including filtering connection states with ss, tcpdump filter syntax, iptables NAT configuration, response time analysis with curl, and how to use DNS tools.</description></item><item><title>Linux System Performance Troubleshooting Handbook</title><link>https://socake.github.io/docs/linux/linux%E7%B3%BB%E7%BB%9F%E6%80%A7%E8%83%BD%E6%8E%92%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%B3%BB%E7%BB%9F%E6%80%A7%E8%83%BD%E6%8E%92%E6%9F%A5/</guid><description>Covers core commands such as top, htop, mpstat, vmstat, iostat, and sar, explains what metrics like iowait, softirq, and CPU steal mean, and provides a complete troubleshooting flow plus a quick reference of command combinations.</description></item><item><title>Linux 
User Permissions and Security Management</title><link>https://socake.github.io/docs/linux/linux%E7%94%A8%E6%88%B7%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/linux%E7%94%A8%E6%88%B7%E6%9D%83%E9%99%90%E7%AE%A1%E7%90%86/</guid><description>From user management with useradd/usermod to the SUID/SGID special permissions, and from sudoers configuration to brute-force protection with fail2ban, covering the core operations of Linux system security hardening.</description></item><item><title>Working with Kubernetes from Python: kubernetes-client in Practice</title><link>https://socake.github.io/docs/languages/python/python%E6%93%8D%E4%BD%9Ckubernetes/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E6%93%8D%E4%BD%9Ckubernetes/</guid><description>A systematic introduction to the core usage of the Python kubernetes-client, from cluster authentication to resource operations, ending with a complete K8s inspection script</description></item><item><title>Python Basics Quick Reference (for Ops)</title><link>https://socake.github.io/docs/languages/python/python%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E5%9F%BA%E7%A1%80%E9%80%9F%E6%9F%A5/</guid><description>A quick reference to the Python fundamentals every ops engineer needs, from variable types to the standard library, focused on real-world usage</description></item><item><title>Python Network Programming and HTTP Requests</title><link>https://socake.github.io/docs/languages/python/python%E7%BD%91%E7%BB%9C%E4%B8%8Ehttp/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E7%BD%91%E7%BB%9C%E4%B8%8Ehttp/</guid><description>From requests basics to async httpx to a concurrent health-check script, covering the HTTP tasks ops engineers handle day to day</description></item><item><title>Python System and File Operations in Practice</title><link>https://socake.github.io/docs/languages/python/python%E7%B3%BB%E7%BB%9F%E6%93%8D%E4%BD%9C/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E7%B3%BB%E7%BB%9F%E6%93%8D%E4%BD%9C/</guid><description>An in-depth look at system operations in Python, including process management with subprocess, system monitoring with psutil, and a complete production-grade log cleanup script</description></item><item><title>Python Ops Automation Scripts in Practice</title><link>https://socake.github.io/docs/languages/python/python%E8%87%AA%E5%8A%A8%E5%8C%96%E8%84%9A%E6%9C%AC/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/python/python%E8%87%AA%E5%8A%A8%E5%8C%96%E8%84%9A%E6%9C%AC/</guid><description>A systematic explanation of the standard structure of Python ops automation scripts, with complete best practices for command-line parsing, logging, configuration, alerting, and concurrent execution</description></item><item><title>Vim Quick Reference</title><link>https://socake.github.io/docs/linux/vim%E9%80%9F%E6%9F%A5%E6%89%8B%E5%86%8C/</link><pubDate>Tue, 09 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/linux/vim%E9%80%9F%E6%9F%A5%E6%89%8B%E5%86%8C/</guid><description>Covers Vim's four modes, all the motion commands, macro recording and registers, a recommended .vimrc, and high-frequency ops scenarios such as bulk-deleting blank lines, commenting out multiple lines, and column editing.</description></item><item><title>kubectl Command Quick Reference</title><link>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</link><pubDate>Mon, 08 Dec 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/kubernetes/kubectl-%E5%91%BD%E4%BB%A4%E9%80%9F%E6%9F%A5/</guid><description>A practical kubectl command handbook organized by scenario, covering frequent operations such as viewing resources, debugging Pods, viewing logs, rolling updates, scaling, and force deletion.</description></item><item><title>Shell Scripting Ops Quick Reference</title><link>https://socake.github.io/docs/languages/shell/shell-%E8%BF%90%E7%BB%B4%E9%80%9F%E6%9F%A5/</link><pubDate>Mon, 08 Dec 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/docs/languages/shell/shell-%E8%BF%90%E7%BB%B4%E9%80%9F%E6%9F%A5/</guid><description>Shell 
ops quick reference, including text processing (awk/sed/grep), process troubleshooting, network diagnostics, batch operation templates, and practical script-writing conventions.</description></item><item><title>Selected DevOps/Ops Engineer Interview Questions: Frequently Tested K8s, Linux, and Networking Topics</title><link>https://socake.github.io/posts/devops-interview-questions/</link><pubDate>Sun, 07 Dec 2025 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/devops-interview-questions/</guid><description>Ops/DevOps interview questions compiled from real interview experience, covering K8s scheduling, troubleshooting, the Linux kernel, network protocols, and more, with notes on "what the interviewer is really testing" to help you give answers that hit the mark.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-interview-questions/featured.jpg"/></item><item><title>Ops Automation with the Alibaba Cloud SDK: ECS/ACK/RDS Resource Management and Inspection Scripts</title><link>https://socake.github.io/posts/aliyun-sdk-ops/</link><pubDate>Thu, 04 Dec 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/aliyun-sdk-ops/</guid><description>Using the Alibaba Cloud Python SDK to query and monitor ECS instances, check ACK node status, and inspect RDS slow queries, then combining the results into an HTML inspection report that is automatically pushed to DingTalk.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/aliyun-sdk-ops/featured.jpg"/></item><item><title>Zero Trust Network Overhaul: From Public Exposure to Headscale VPN</title><link>https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/</link><pubDate>Sat, 22 Nov 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/</guid><description>Starting from the discovery of security risks caused by public exposure, through building a self-hosted zero trust network with Headscale that replaces the jump-host setup and provides VPN access for kubectl and ops systems.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E9%9B%B6%E4%BF%A1%E4%BB%BB%E7%BD%91%E7%BB%9C%E5%AE%9E%E8%B7%B5/featured.jpg"/></item><item><title>How to Design a Good Alerting System</title><link>https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/</link><pubDate>Tue, 18 Nov 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/</guid><description>Drawing on a real experience of being flooded with alert noise, this article shares how to redesign an alerting system around SLIs/SLOs, including alert severity levels, rule design principles, routing strategies, and a review process.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E5%91%8A%E8%AD%A6%E4%BD%93%E7%B3%BB%E8%AE%BE%E8%AE%A1/featured.jpg"/></item><item><title>Working with Elasticsearch from Python: From Index Management to Complex Aggregation Queries</title><link>https://socake.github.io/posts/python-elasticsearch-client/</link><pubDate>Tue, 04 Nov 2025 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-elasticsearch-client/</guid><description>From client initialization to bulk operations, scroll queries, and aggregation statistics: one article covering the most common scenarios for working with Elasticsearch from Python.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-elasticsearch-client/featured.jpg"/></item><item><title>Engineering Scheduled Tasks in Python: A Practical Comparison of APScheduler and Celery Beat</title><link>https://socake.github.io/posts/python-scheduled-tasks/</link><pubDate>Sat, 01 Nov 2025 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-scheduled-tasks/</guid><description>APScheduler and Celery Beat are the two mainstream options for scheduled tasks in Python. Starting from usage scenarios, this article compares their architectural differences and the limits of where each applies, and explains the value of K8s CronJob as a third option, to help you pick the right tool for your project.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-scheduled-tasks/featured.jpg"/></item><item><title>Vector Log Processing Pipelines: High-Performance Log Collection and Transformation in Practice</title><link>https://socake.github.io/posts/vector-log-pipeline/</link><pubDate>Tue, 14 Oct 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vector-log-pipeline/</guid><description>From an architecture comparison to a K8s DaemonSet rollout, with hands-on VRL examples and lessons from pitfalls, a thorough look at using Vector in log collection pipelines.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/vector-log-pipeline/featured.jpg"/></item><item><title>Filebeat + Logstash Log Collection Pipeline: Large-Scale Log Processing in Practice</title><link>https://socake.github.io/posts/filebeat-logstash-pipeline/</link><pubDate>Fri, 10 Oct 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/filebeat-logstash-pipeline/</guid><description>Under high log volume, having Fleet write directly to ES leads to severe write backlogs. This article documents our entire switch from Fleet to a Filebeat + Kafka + Logstash pipeline, focusing on Logstash pipeline configuration and performance tuning.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/filebeat-logstash-pipeline/featured.jpg"/></item><item><title>Elasticsearch Backup and Restore: Snapshot Management and Cross-Cluster Migration in Practice</title><link>https://socake.github.io/posts/elasticsearch-backup-restore/</link><pubDate>Fri, 03 Oct 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-backup-restore/</guid><description>Snapshot API configuration, S3 authentication via IRSA, scheduled snapshot scripts, and a comparison of three cross-cluster migration approaches along with pitfalls from real use.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-backup-restore/featured.jpg"/></item><item><title>Elasticsearch Queries in Practice: From URI Search to Complex DSL Aggregations</title><link>https://socake.github.io/posts/elasticsearch-dsl-query/</link><pubDate>Wed, 01 Oct 2025 09:17:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-dsl-query/</guid><description>Querying ES is a skill every ops engineer needs. This article goes from a quick start with URI Search, to DSL bool queries and aggregations, to the _cat APIs commonly used in ops, organized into a practical handbook built around real troubleshooting scenarios.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-dsl-query/featured.jpg"/></item><item><title>Elasticsearch Index Strategy: ILM Lifecycle Management and Write Performance Optimization</title><link>https://socake.github.io/posts/elasticsearch-index-optimization/</link><pubDate>Wed, 24 Sep 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/elasticsearch-index-optimization/</guid><description>Configuring the four ILM phases, rollover strategy, bulk write tuning, and a guide to avoiding pitfalls in shard count planning and mapping explosion.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-index-optimization/featured.jpg"/></item><item><title>Elasticsearch Cluster Deployment in Practice: Production-Grade ECK Configuration on K8s</title><link>https://socake.github.io/posts/elasticsearch-cluster-deployment/</link><pubDate>Fri, 19 Sep 2025 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/elasticsearch-cluster-deployment/</guid><description>From cluster role planning to rolling out the ECK Operator, a complete walkthrough of a production-grade Elasticsearch deployment on Kubernetes, informed by pitfalls encountered in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/elasticsearch-cluster-deployment/featured.jpg"/></item><item><title>Kubernetes Cost Optimization in Practice: Four Paths to Systematic Cost Reduction</title><link>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</link><pubDate>Mon, 18 Aug 2025 13:07:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/</guid><description>A real cost-reduction case: from spotting a cost anomaly to analyzing the root cause, then systematically lowering AWS EC2 costs through Karpenter node autoscaling, governance of resource request sizing, and consolidating large instance types. Includes concrete configurations and the execution approach.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E6%88%98/featured.jpg"/></item><item><title>Cloud-Native Transformation in Practice: Lessons from Migrating from Traditional Ops to K8s</title><link>https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/</link><pubDate>Thu, 14 Aug 2025 12:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/</guid><description>A personal-experience article documenting the move from traditional VM-based operations to Kubernetes 
from start to finish: why we migrated, which pitfalls we hit along the way, how the team got through the learning curve, and, looking back, which things we got right at the time.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E4%BA%91%E5%8E%9F%E7%94%9F%E8%BD%AC%E5%9E%8B%E7%BB%8F%E9%AA%8C/featured.jpg"/></item><item><title>VictoriaMetrics: A More Resource-Efficient Monitoring Storage Solution than Prometheus</title><link>https://socake.github.io/posts/victoriametrics-prometheus/</link><pubDate>Mon, 28 Jul 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/victoriametrics-prometheus/</guid><description>Is Prometheus struggling to keep up? This article compares the core differences between VictoriaMetrics and Prometheus, presents a seamless migration path via remote_write, and describes the actual improvements VictoriaMetrics delivers in resource usage, compression ratio, and query performance.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/victoriametrics-prometheus/featured.jpg"/></item><item><title>Core SRE Concepts: From an Ops Mindset to Reliability Engineering</title><link>https://socake.github.io/posts/sre-concepts-and-principles/</link><pubDate>Thu, 26 Jun 2025 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sre-concepts-and-principles/</guid><description>SRE is not just a nicer-sounding name for ops. It is a methodology for solving reliability problems with a software engineering mindset. Starting from the Error Budget, this article covers defining SLIs/SLOs, identifying Toil, designing On-call, a postmortem culture, and a practical path for moving from traditional ops to SRE.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sre-concepts-and-principles/featured.jpg"/></item><item><title>Kubernetes Cluster Upgrade Strategy: A Complete Practical Guide to Zero-Downtime Upgrades</title><link>https://socake.github.io/posts/kubernetes-upgrade-strategy/</link><pubDate>Wed, 14 May 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-upgrade-strategy/</guid><description>Upgrading a K8s cluster sounds simple, but in practice there are many pitfalls: Helm failures caused by API deprecations, Admission Webhooks intercepting upgrade traffic, and service disruptions caused by misconfigured PDBs. Drawing on real upgrade experience, this article presents a reusable zero-downtime upgrade plan.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/kubernetes-upgrade-strategy/featured.jpg"/></item><item><title>RabbitMQ Operations in Practice: Cluster Deployment, Consumer Reliability, and Monitoring</title><link>https://socake.github.io/posts/rabbitmq-ops-practice/</link><pubDate>Tue, 22 Apr 2025 14:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/rabbitmq-ops-practice/</guid><description>A systematic rundown of core RabbitMQ operations skills: Quorum Queue cluster deployment compared with mirrored queues, production configuration tuning, consumer prefetch and dead-letter queue configuration, a monitoring stack built on the Management API and rabbitmq_exporter, and how to handle common failures such as message backlogs and split-brain.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/rabbitmq-ops-practice/featured.jpg"/></item><item><title>Celery Asynchronous Tasks Explained: Task Queues, Retry Strategies, and Distributed Deployment</title><link>https://socake.github.io/posts/celery-async-tasks/</link><pubDate>Tue, 22 Apr 2025 09:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/celery-async-tasks/</guid><description>From Celery's architecture to deployment on K8s, covering task definition, retry strategies, queue routing, scheduled tasks with Beat, and monitoring with Flower, with a complete production deployment configuration.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/celery-async-tasks/featured.jpg"/></item><item><title>ETCD Operations in Practice: Deployment, Backup and Restore, and K8s Cluster Data Management</title><link>https://socake.github.io/posts/etcd-ops-practice/</link><pubDate>Sun, 13 Apr 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/etcd-ops-practice/</guid><description>ETCD is the lifeline of Kubernetes: all cluster state is stored there. From a hands-on operations perspective, this article walks through the complete chain of deployment, backup, restore, and dynamic configuration updates, including several lessons learned from pitfalls.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/etcd-ops-practice/featured.jpg"/></item><item><title>Database Operations in Practice: MySQL High Availability and PostgreSQL Tuning Experience</title><link>https://socake.github.io/posts/database-ops-practice/</link><pubDate>Tue, 08 Apr 2025 13:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/database-ops-practice/</guid><description>Database operations are not complicated, but there are many details and mistakes are costly. This article collects hands-on experience with several frequent topics (MySQL primary-replica replication, slow query analysis, and PostgreSQL connection pooling) along with a cheat sheet of SQL for day-to-day operations.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/database-ops-practice/featured.jpg"/></item><item><title>Kafka Operations in Practice: Diagnosing Message Backlogs, Partition Rebalancing, and Monitoring</title><link>https://socake.github.io/posts/kafka-ops-practice/</link><pubDate>Mon, 07 Apr 2025 11:37:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kafka-ops-practice/</guid><description>A systematic rundown of core Kafka operations skills: monitoring and alerting on consumer lag, root cause analysis of message backlogs, planning partition expansion, handling Rebalance storms, and configuring KEDA to autoscale based on lag.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kafka-ops-practice/featured.jpg"/></item><item><title>Getting Started with MongoDB Operations: Deployment, Backup, and Production Performance Tuning</title><link>https://socake.github.io/posts/mongodb-ops-practice/</link><pubDate>Mon, 31 Mar 2025 11:41:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mongodb-ops-practice/</guid><description>MongoDB operations from selection to tuning: when to choose MongoDB, three-node Replica Set deployment, index design, backups with mongodump, and production pitfalls involving wiredTiger, connection pools, and large documents.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mongodb-ops-practice/featured.jpg"/></item><item><title>The Complete Alertmanager Guide: Routing, Inhibition, Silences, and Multi-Channel Notifications</title><link>https://socake.github.io/posts/alertmanager-routing-config/</link><pubDate>Sat, 22 Mar 2025 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/alertmanager-routing-config/</guid><description>Too many alerts are just as harmful as too few. Alertmanager's routing, inhibition, and grouping mechanisms are the core means of controlling alert noise. Starting from a real multi-environment alerting setup, this article explains the intent and pitfalls of each configuration option.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/alertmanager-routing-config/featured.jpg"/></item><item><title>Grafana API Automation: Managing 
Dashboards, Data Sources, and Alerts with Code</title><link>https://socake.github.io/posts/grafana-api-automation/</link><pubDate>Tue, 18 Mar 2025 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grafana-api-automation/</guid><description>Managing Grafana Dashboards by clicking through the UI is a nightmare in multi-environment setups. Turning Dashboards into code through the API, with version control and cross-environment sync, is the right approach. This article provides complete Python tool scripts and pitfalls from real use.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grafana-api-automation/featured.jpg"/></item><item><title>PostgreSQL Operations in Practice: Configuration Tuning, Connection Pooling, Slow Queries, and High Availability</title><link>https://socake.github.io/posts/postgresql-ops-practice/</link><pubDate>Tue, 18 Mar 2025 10:15:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/postgresql-ops-practice/</guid><description>A systematic rundown of core PostgreSQL operations skills: from tuning shared_buffers and WAL parameters to configuring PgBouncer in transaction mode; from slow query analysis with pg_stat_statements to point-in-time recovery (PITR); plus complete practices for primary-standby streaming replication, cleaning up bloated tables, and Prometheus monitoring metrics.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/postgresql-ops-practice/featured.jpg"/></item><item><title>Prometheus Service Discovery in Depth: kubernetes_sd_configs in Practice</title><link>https://socake.github.io/posts/prometheus-service-discovery/</link><pubDate>Sat, 15 Mar 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/prometheus-service-discovery/</guid><description>Manually maintaining Prometheus scrape targets in a K8s environment is not realistic; kubernetes_sd_configs combined with relabel_configs is the core mechanism for solving this. This article explains the whole system thoroughly, from principles to practice.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/prometheus-service-discovery/featured.jpg"/></item><item><title>Zookeeper Operations in Practice: Cluster Deployment, Tuning, and Troubleshooting</title><link>https://socake.github.io/posts/zookeeper-ops-practice/</link><pubDate>Wed, 05 Mar 2025 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/zookeeper-ops-practice/</guid><description>A systematic walkthrough of core Zookeeper production operations skills: ZNode types and the Watcher mechanism, the ZAB election algorithm, deployment configuration for 3- and 5-node clusters, JVM and zoo.cfg tuning, hands-on diagnosis with four-letter commands, handling common failures, as well as its relationship to Kafka's KRaft mode and its role in cloud-native scenarios.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/zookeeper-ops-practice/featured.jpg"/></item><item><title>Choosing a Kubernetes Log Collection Solution: From Technical Comparison to Production Rollout</title><link>https://socake.github.io/posts/k8s-logging-solution/</link><pubDate>Tue, 25 Feb 2025 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/k8s-logging-solution/</guid><description>An account of how our team built a Kubernetes log collection system from scratch, the technical reasons we ultimately chose Fluent Bit + Fluentd + Elasticsearch, and the pitfalls we hit in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/k8s-logging-solution/featured.jpg"/></item><item><title>Secret Management in Practice: HashiCorp Vault + External Secrets Operator</title><link>https://socake.github.io/posts/vault-external-secrets/</link><pubDate>Thu, 20 Feb 2025 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/vault-external-secrets/</guid><description>base64 is not encryption. Starting from the risk of Secret leakage, this post gives a full introduction to Vault's core concepts, deployment on K8s, ESO integration and configuration, and automatic rotation of dynamic database credentials.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/vault-external-secrets/featured.jpg"/></item><item><title>Consul Service Registration and Discovery: From Getting Started to Production-Grade Health Checks</title><link>https://socake.github.io/posts/consul-service-discovery/</link><pubDate>Tue, 18 Feb 2025 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/consul-service-discovery/</guid><description>In the microservices era, managing dynamic IPs and service health is unavoidable. Consul provides a complete service discovery solution; this post walks through its core usage and production pitfalls from a hands-on perspective.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/consul-service-discovery/featured.jpg"/></item><item><title>Harbor Registry Production Operations: High Availability, Security Scanning, and CI/CD Integration</title><link>https://socake.github.io/posts/harbor-registry-ops/</link><pubDate>Tue, 18 Feb 2025 09:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/harbor-registry-ops/</guid><description>Starting from Harbor's architecture, this post systematically covers high-availability deployment options for production, image security scanning strategies, cross-region replication configuration, permission model design, and integration with Jenkins/GitLab CI, with a troubleshooting handbook and Prometheus monitoring configuration included.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/harbor-registry-ops/featured.jpg"/></item><item><title>Ansible Batch Operations Automation: From Ad-Hoc Commands to Engineering with Roles</title><link>https://socake.github.io/posts/ansible-ops-automation/</link><pubDate>Wed, 12 Feb 2025 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ansible-ops-automation/</guid><description>Three traits (agentless operation, SSH push, and idempotency) make Ansible a powerful tool for batch Linux operations. From introductory usage to Role-based engineering practice, this post lays out complete approaches and pitfalls for the most frequent day-to-day operations scenarios.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ansible-ops-automation/featured.jpg"/></item><item><title>Engineering Kubernetes YAML: Common Resource Templates and Production Best Practices</title><link>https://socake.github.io/posts/kubernetes-yaml-patterns/</link><pubDate>Sun, 19 Jan 2025 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-yaml-patterns/</guid><description>Writing good Kubernetes YAML is not just a matter of syntax; it is largely accumulated engineering experience. This post catalogs YAML anti-patterns commonly seen in production and provides complete, ready-to-use templates for each resource type.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-yaml-patterns/featured.jpg"/></item><item><title>Technical Growth for Ops Engineers: Planning the Path from Executor to Architect</title><link>https://socake.github.io/posts/devops-career-growth/</link><pubDate>Sun, 22 Dec 2024 09:52:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
Huang)</author><guid>https://socake.github.io/posts/devops-career-growth/</guid><description>An ops engineer's growth is not a pile of tools but a leap in levels of thinking. This post records my observations and reflections on this path: which moments lead to real advancement, and which habitual ways of thinking keep people stuck in place.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/devops-career-growth/featured.jpg"/></item><item><title>A Troubleshooting Methodology: From Symptom to Root Cause</title><link>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5%E6%96%B9%E6%B3%95%E8%AE%BA/</link><pubDate>Tue, 17 Dec 2024 12:27:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5%E6%96%B9%E6%B3%95%E8%AE%BA/</guid><description>Good troubleshooting relies on method, not intuition. This post summarizes the troubleshooting framework I distilled from multiple production incidents: from building a timeline to prioritizing hypotheses, and on to recognizing and avoiding cognitive traps.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/%E6%95%85%E9%9A%9C%E6%8E%92%E6%9F%A5%E6%96%B9%E6%B3%95%E8%AE%BA/featured.jpg"/></item><item><title>Lessons from SRE Practice: The Mindset Shift from Ops to SRE</title><link>https://socake.github.io/posts/sre%E5%AE%9E%E8%B7%B5%E5%BF%83%E5%BE%97/</link><pubDate>Wed, 11 Dec 2024 11:26:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/sre%E5%AE%9E%E8%B7%B5%E5%BF%83%E5%BE%97/</guid><description>SRE is not ops with a new title; it is a methodology for solving reliability problems with a software engineering mindset. This post records the shifts that struck me most in practice.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sre%E5%AE%9E%E8%B7%B5%E5%BF%83%E5%BE%97/featured.jpg"/></item><item><title>Integrating Python with Prometheus: Automating Queries for Monitoring Data and Alert Status</title><link>https://socake.github.io/posts/python-prometheus-monitoring/</link><pubDate>Mon, 25 Nov 2024 11:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-prometheus-monitoring/</guid><description>Call the Prometheus HTTP API directly from Python to implement service liveness checks and daily availability reports, then hook into DingTalk to push a cluster health summary automatically every day.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/python-prometheus-monitoring/featured.jpg"/></item><item><title>Python Operations Automation: Engineering Practice from Scripts to Complete Tools</title><link>https://socake.github.io/posts/python-devops-automation/</link><pubDate>Tue, 12 Nov 2024 11:01:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/python-devops-automation/</guid><description>A systematic walkthrough of engineering methods for Python operations automation: managing AWS resources with boto3, using the Kubernetes Python SDK, choosing between the Click and Typer CLI frameworks, batch database operations scripts, DingTalk Webhook integration, and practical experience with type annotations and error handling.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/python-devops-automation/featured.jpg"/></item><item><title>Redis Operations in Practice: Persistence Configuration, Cluster Mode, and Production Monitoring</title><link>https://socake.github.io/posts/redis-ops-practice/</link><pubDate>Wed, 06 Nov 2024 10:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/redis-ops-practice/</guid><description>Redis operations look simple, but you only learn how deep the water is when something breaks in production. This post covers core operations topics including persistence, clustering, monitoring, and failure handling.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/redis-ops-practice/featured.jpg"/></item><item><title>MySQL Backup and Restore in Practice: A Complete Solution from mysqldump to XtraBackup</title><link>https://socake.github.io/posts/mysql-backup-restore/</link><pubDate>Fri, 01 Nov 2024 11:33:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/mysql-backup-restore/</guid><description>From mysqldump to XtraBackup, and from full backups to binlog-based point-in-time recovery, this post covers the full body of knowledge on MySQL backup and restore, including production pitfalls and an automated verification scheme.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/mysql-backup-restore/featured.jpg"/></item><item><title>The Complete Nginx Operations Guide: Reverse Proxy, Load Balancing, HTTPS, and Rate Limiting</title><link>https://socake.github.io/posts/nginx-ops-complete/</link><pubDate>Thu, 24 Oct 2024 12:06:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo 
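Huang)</author><guid>https://socake.github.io/posts/nginx-ops-complete/</guid><description>You know how to install Nginx, but do you really know how to use it? Starting from the configuration structure, this post fully covers reverse proxying, load balancing strategies, Let's Encrypt certificates, rate limiting configuration, log analysis, and performance tuning, with troubleshooting for common 502 and SSL failures.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/nginx-ops-complete/featured.jpg"/></item><item><title>Shell Scripting in Practice: Bash Operations Automation from Basics to Engineering</title><link>https://socake.github.io/posts/shell-script-automation/</link><pubDate>Wed, 02 Oct 2024 13:03:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/shell-script-automation/</guid><description>Shell scripts are an SRE's number-one productivity tool. Starting from syntax essentials, this post covers common operations patterns such as batch operations, log rotation, and health checks, moves on to getopts, signal handling with trap, and approaches to script engineering, and closes with classic pitfalls such as quoting hell and variable scope.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/shell-script-automation/featured.jpg"/></item><item><title>Docker Best Practices: From Dockerfile to Production Deployment</title><link>https://socake.github.io/posts/docker-best-practices/</link><pubDate>Sat, 21 Sep 2024 09:56:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/docker-best-practices/</guid><description>Multi-stage builds, missing .dockerignore entries, running as non-root, build cache optimization, and entrypoint/cmd signal handling: issues actually hit in production, broken down one by one with concrete Dockerfile examples.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/docker-best-practices/featured.jpg"/></item><item><title>Linux System Administration Essentials: System-Level Knowledge Every DevOps Engineer Should Have</title><link>https://socake.github.io/posts/linux-system-admin-devops/</link><pubDate>Mon, 16 Sep 2024 13:36:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-system-admin-devops/</guid><description>After many years in DevOps, I am increasingly convinced that Linux system-level knowledge is the foundation of all troubleshooting. When a Kubernetes Pod is killed for no apparent reason, a Java service suddenly stops responding, or a disk IO spike makes the whole machine sluggish, the diagnosis ultimately comes down to the system level. This post systematically goes through the system administration skills I use most in production.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 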
url="https://socake.github.io/posts/linux-system-admin-devops/featured.jpg"/></item><item><title>Linux Performance Tuning in Practice: A Systematic Method for Diagnosing CPU, Memory, and IO Bottlenecks</title><link>https://socake.github.io/posts/linux-performance-tuning/</link><pubDate>Sun, 08 Sep 2024 13:50:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/linux-performance-tuning/</guid><description>From toolchain selection to hands-on diagnosis, this post lays out a complete methodology for Linux performance tuning: analyzing CPU context switches and soft interrupts, reading OOM logs, choosing an IO scheduler, handling TCP TIME_WAIT, and the particular effects of cgroup limits in container environments.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/linux-performance-tuning/featured.jpg"/></item></channel></rss>