<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Cloud Native on 黄文卓 | DevOps Engineer</title><link>https://socake.github.io/categories/%E4%BA%91%E5%8E%9F%E7%94%9F/</link><description>Recent content in Cloud Native on 黄文卓 | DevOps Engineer</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor><webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster><copyright>© 2026 Wenzhuo Huang</copyright><lastBuildDate>Sat, 18 Apr 2026 13:00:00 +0800</lastBuildDate><atom:link href="https://socake.github.io/categories/%E4%BA%91%E5%8E%9F%E7%94%9F/index.xml" rel="self" type="application/rss+xml"/><item><title>Multi-Cloud Middleware Quick Reference and Cross-Environment Isolation in Practice</title><link>https://socake.github.io/posts/multi-cloud-middleware-and-isolation/</link><pubDate>Sat, 18 Apr 2026 13:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/multi-cloud-middleware-and-isolation/</guid><description>The easiest mistake in multi-cloud operations is to carry the AWS mindset over to Alibaba Cloud unchanged, then discover during an incident that the service choices were completely mismatched. This post compiles a side-by-side AWS↔Alibaba Cloud middleware comparison table, plus a mandatory cross-environment isolation checklist and a quick reference of frequently used operations commands. It is the cheat sheet I keep returning to in my own work.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/multi-cloud-middleware-and-isolation/featured.jpg"/></item><item><title>Argo Workflows in Practice: Batch Processing and ML Pipelines</title><link>https://socake.github.io/posts/argo-workflows-practice/</link><pubDate>Sun, 12 Apr 2026 11:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/argo-workflows-practice/</guid><description>Argo Workflows is a Kubernetes-native workflow engine well suited to batch processing and ML pipelines. This post covers how it compares with Airflow and Temporal, the core resource model, three complete worked examples (DAG data processing, an ML training pipeline, and scheduled backups), resource control (Semaphore/Node Selector), event-driven triggering with Argo Events, and Prometheus monitoring and troubleshooting of common problems.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/argo-workflows-practice/featured.jpg"/></item><item><title>gRPC Microservices in Practice: Protocol, Load Balancing, and Kubernetes Integration</title><link>https://socake.github.io/posts/grpc-microservices-practice/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/grpc-microservices-practice/</guid><description>From protocol fundamentals to production rollout on Kubernetes, a systematic walkthrough of core gRPC microservice practices: backward-compatible Protobuf design, interceptor chains (logging/rate limiting/OTel), uneven load on long-lived connections (headless Service + round_robin vs Envoy L7), health check Probe configuration, and running REST alongside gRPC with grpc-gateway.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/grpc-microservices-practice/featured.jpg"/></item><item><title>Service Mesh Selection: An In-Depth Comparison of Istio vs Cilium vs Linkerd</title><link>https://socake.github.io/posts/service-mesh-comparison/</link><pubDate>Sun, 12 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/service-mesh-comparison/</guid><description>Istio, Cilium Service Mesh, and Linkerd each have a different focus: Istio is the most feature-complete but the heaviest, Cilium is built on eBPF and offers the best performance, and Linkerd is the lightest and easiest to operate. This post breaks them down across four dimensions (architecture, performance, features, and operations) to help architects make a data-backed choice.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/service-mesh-comparison/featured.jpg"/></item><item><title>Flagger Progressive Delivery in Practice: Canary, Blue/Green, A/B, and Istio/NGINX/Gateway API Integration</title><link>https://socake.github.io/posts/flagger-progressive-delivery/</link><pubDate>Sat, 11 Apr 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/flagger-progressive-delivery/</guid><description>The traditional kubectl apply release concentrates risk in the moment of release. Flagger uses metric-driven progressive traffic shifting (Canary Analysis) to spread that risk across the whole rollout and rolls back automatically on anomalies. Based on the official documentation, this post systematically covers the full set of Canary CR fields, configuration templates for the three strategies, integration with Istio/NGINX Ingress/Gateway API, custom metric analysis, the automated rollback mechanism, and how it compares with Argo Rollouts.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" 
url="https://socake.github.io/posts/flagger-progressive-delivery/featured.jpg"/></item><item><title>Volcano Batch Scheduling in Practice: Gang Scheduling, Queues, and Preemption for AI Training Clusters</title><link>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</link><pubDate>Wed, 25 Mar 2026 15:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/volcano-gpu-batch-scheduling/</guid><description>The default K8s scheduler is a poor fit for AI training. Volcano brings HPC scheduling ideas into K8s: Gang Scheduling, Queue, Fairshare, Preemption, and topology affinity. This post explains how to put it to work in an AI training cluster.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/volcano-gpu-batch-scheduling/featured.jpg"/></item><item><title>FluxCD vs ArgoCD In-Depth Comparison and Migration Guide: Architecture, Semantics, Multi-Tenancy, and Selection</title><link>https://socake.github.io/posts/fluxcd-vs-argocd-migration/</link><pubDate>Sun, 22 Mar 2026 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/fluxcd-vs-argocd-migration/</guid><description>The two mainstream GitOps approaches, FluxCD and ArgoCD, differ significantly in architecture, semantics, operational cost, and extensibility. Drawing on official documentation and production experience, this post compares them item by item across sync model, application abstraction, multi-tenant isolation, Helm support, observability, and extension mechanisms, offers a selection decision tree, and provides a reusable runbook for migrating from ArgoCD to FluxCD.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/fluxcd-vs-argocd-migration/featured.jpg"/></item><item><title>Kyverno Policy as Code in Practice: Admission, Mutation, and Generation Across All Scenarios</title><link>https://socake.github.io/posts/kyverno-policy-as-code/</link><pubDate>Fri, 28 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kyverno-policy-as-code/</guid><description>Production notes based on Kyverno 1.12+: hands-on usage of the four policy types (validate/mutate/generate/verifyImages), CEL and JMESPath expression syntax, layered policy governance, PolicyException, performance tuning, and common pitfalls, along with a comparison against OPA Gatekeeper.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kyverno-policy-as-code/featured.jpg"/></item>
<item><title>Pod Security Standards in Production: Migrating from PSP to PSA</title><link>https://socake.github.io/posts/kubernetes-pod-security-standards/</link><pubDate>Fri, 21 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/kubernetes-pod-security-standards/</guid><description>Hands-on notes on migrating from PSP to Pod Security Standards: the actual constraints of the Baseline and Restricted profiles compared, the three modes of Pod Security Admission, how to migrate 200+ namespaces in one pass, best practices for using it alongside Kyverno/OPA, and typical patterns for retrofitting securityContext in legacy workloads.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/kubernetes-pod-security-standards/featured.jpg"/></item><item><title>WebAssembly in Cloud Native: From the Browser to the K8s Data Plane</title><link>https://socake.github.io/posts/webassembly-cloud-native/</link><pubDate>Sat, 08 Nov 2025 14:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/webassembly-cloud-native/</guid><description>Interest in WebAssembly across the cloud native space keeps growing, but much of the discussion stays at the conceptual level. This post tries to offer a pragmatic view: where Wasm is already production-ready in cloud native scenarios, where it still needs time, and how it really differs from containers.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/webassembly-cloud-native/featured.jpg"/></item><item><title>Istio Ambient Mode: Sidecar-Free Service Mesh in Practice</title><link>https://socake.github.io/posts/istio-ambient-mesh-practice/</link><pubDate>Sat, 08 Nov 2025 10:00:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/istio-ambient-mesh-practice/</guid><description>The sidecar model has been with us for six or seven years, but its problems are getting harder to ignore. Ambient Mode is not a patch job; it redesigns the service mesh data plane at the architectural level. From a hands-on operations perspective, this post takes apart the two-layer ztunnel + Waypoint architecture and lays out a complete path for migrating from Sidecar to Ambient.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/istio-ambient-mesh-practice/featured.jpg"/></item><item><title>eBPF Observability in Practice: Cilium Network Monitoring and Tetragon Security Auditing</title><link>https://socake.github.io/posts/ebpf-observability/</link><pubDate>Wed, 17 Sep 2025 12:36:00 
+0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/ebpf-observability/</guid><description>eBPF is reshaping the foundations of cloud native observability. This post documents hands-on experience deploying Cilium + Hubble network monitoring and Tetragon security auditing in a K8s cluster.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/ebpf-observability/featured.jpg"/></item><item><title>Crossplane: Managing Cloud Resources the GitOps Way (AWS/Alibaba Cloud)</title><link>https://socake.github.io/posts/crossplane-gitops-cloud/</link><pubDate>Thu, 26 Jun 2025 12:44:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/crossplane-gitops-cloud/</guid><description>Crossplane turns AWS RDS, S3, and EKS into K8s CRDs and continuously reconciles cloud resource state the GitOps way. This post records the journey from concept to rollout, including the pitfalls hit along the way.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/crossplane-gitops-cloud/featured.jpg"/></item><item><title>Karmada Multi-Cluster Federation in Practice: How PropagationPolicy, OverridePolicy, and FailOver Are Really Used</title><link>https://socake.github.io/posts/karmada-multi-cluster/</link><pubDate>Sun, 02 Mar 2025 11:20:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/karmada-multi-cluster/</guid><description>If you run two or more Kubernetes clusters, deploying the same application across them will sooner or later become routine. Karmada is the most complete multi-cluster federation project among CNCF incubating projects, but its CRD design is fairly restrained; to use it well in production you need to sort out four layers of semantics: resource propagation, per-cluster overrides, scheduling, and failover.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/karmada-multi-cluster/featured.jpg"/></item>
<item><title>ExternalDNS Multi-Cloud DNS Sync in Practice: From Route53 to Cloudflare to Alibaba Cloud DNS</title><link>https://socake.github.io/posts/external-dns-multi-provider/</link><pubDate>Sat, 22 Feb 2025 09:45:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/external-dns-multi-provider/</guid><description>Clicking DNS records into the Cloudflare console by hand inevitably breaks down as clusters and services grow. ExternalDNS is a controller that treats Kubernetes resources as the source of truth and the DNS provider as the executor. To use it well, though, you need to understand the respective limitations of txtOwnerId, policy, and provider, as well as several pitfalls of sharing a zone across clusters.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/external-dns-multi-provider/featured.jpg"/></item><item><title>cert-manager in Production: The Full Path from Let's Encrypt to an Internal Enterprise PKI</title><link>https://socake.github.io/posts/cert-manager-production/</link><pubDate>Sat, 15 Feb 2025 14:30:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/cert-manager-production/</guid><description>cert-manager is standard equipment in almost every Kubernetes cluster, but teams that actually run it in production all hit the same issues: blowing through Let's Encrypt rate limits, wildcard certificate renewals failing, internal services needing a private CA, and how to issue certificates for Istio / Gateway API. This post turns the pitfalls I hit over a year of operating cert-manager across 5 clusters into a practical runbook.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/cert-manager-production/featured.jpg"/></item><item><title>KEDA Event-Driven Autoscaling in Practice: From the Limits of HPA to Scaling on Real Business Signals</title><link>https://socake.github.io/posts/keda-event-driven-autoscaling/</link><pubDate>Sat, 08 Feb 2025 10:12:00 +0800</pubDate><author>17691281867@163.com (Wenzhuo Huang)</author><guid>https://socake.github.io/posts/keda-event-driven-autoscaling/</guid><description>Out of the box, HPA only looks at CPU/memory, but the real scaling signals in production are often Kafka lag, RabbitMQ queue depth, custom Prometheus metrics, or even cron. This post writes up KEDA's architecture, core CRDs, pitfalls of common scalers, and operational procedures as a senior engineer's memo: no theory, only the configurations that can rescue you from alerts at 3 a.m.</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/keda-event-driven-autoscaling/featured.jpg"/></item></channel></rss>