跳过正文
Backstage 开发者门户实战:构建内部开发者平台

Backstage 开发者门户实战:构建内部开发者平台

·2146 字·11 分钟·
目录
68 - 这篇文章属于一个选集。

团队 10 人的时候,口耳相传就够了——新人入职,老人带一天就能上手。团队 100 人的时候,口耳相传是灾难:谁知道 payment-service 用的是哪个 Kafka topic?前端该用哪个 API Gateway 地址?新建一个 Go 微服务需要配哪些 CI/CD 变量?这些知识分散在 Confluence、Slack、脑子里,每个人都在重复解答同样的问题。

Backstage 是 Spotify 开源的内部开发者平台(IDP)框架,干的就是把这些零散的知识和工具塞进一个入口。下面直接上落地过程。

为什么需要 IDP
#

配置漂移与知识孤岛
#

以下场景是否熟悉?

  • 生产环境某个服务的 Deployment 配置跟 Git 仓库不一致,没人知道是谁改的
  • 新建服务时,每个团队的 CI/CD 流水线配置各不相同,有的忘了加健康检查,有的忘了配告警
  • 某个关键服务的文档上次更新是两年前,实际行为早已改变,新人踩坑
  • 要找某个 API 的 owner,需要问一圈才能找到负责人

这些问题的本质是缺乏单一可信信息源(Single Source of Truth)。Backstage 的 Software Catalog 解决信息孤岛,Scaffolder 模板解决配置漂移,TechDocs 解决文档腐化。

IDP 带来的可量化价值
#

根据 DORA 2023 报告,使用内部开发者平台的团队相比未使用的团队:

  • 部署频率高 2.1 倍
  • 变更失败率低 22%
  • 新服务从 0 到上线时间缩短 60%+

Spotify 在推广 Backstage 内部使用后,开发者调研满意度提升了 35%,新人 onboarding 时间从 2 周缩短到 3 天。


Backstage 核心概念
#

Software Catalog
#

Catalog 是 Backstage 的核心,存储所有"软件实体"(Entity)的元数据:

  • Component:服务、库、网站、数据管道等
  • API:服务暴露的接口(OpenAPI/AsyncAPI/GraphQL)
  • Resource:数据库、S3 bucket、消息队列等基础设施
  • Group:团队或部门
  • User:开发者个人信息
  • System:相关组件的集合(如"支付系统"包含多个 Component)
  • Domain:业务领域(如"订单域"“用户域”)

每个实体通过 catalog-info.yaml 描述,存在对应的代码仓库里。

Scaffolder Templates
#

脚手架模板允许用户通过表单界面,一键创建符合团队规范的新服务——包括代码仓库、CI/CD 流水线、K8s 配置、监控告警规则一次性生成到位。这是从根源解决配置漂移的方案。

TechDocs
#

基于 MkDocs 的文档系统,文档以 Markdown 格式存在代码仓库里(文档即代码),Backstage 负责构建和展示,自动关联到对应的 Component。

Plugins
#

Backstage 的扩展机制。官方提供了 100+ 插件,社区还有更多。插件可以在 Catalog 页面增加 Tab,提供额外的上下文信息。


部署
#

本地快速体验
#

用 npx 快速启动(需要 Node.js 18+):

npx @backstage/create-app@latest
# 输入项目名称,如 my-backstage

cd my-backstage
yarn install
yarn dev

浏览器访问 http://localhost:3000 即可看到 Backstage 界面。

这个方式适合评估和开发,生产环境需要 Docker 镜像化后部署到 K8s。

生产环境:K8s + Helm 部署
#

构建镜像:

# packages/backend/Dockerfile
FROM node:18-bookworm-slim

# 安装依赖
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends python3 g++ build-essential && \
    rm -rf /var/lib/apt/lists/*

USER node

WORKDIR /app

COPY --chown=node:node yarn.lock package.json packages/backend/dist/bundle.tar.gz ./

RUN tar xzf bundle.tar.gz && \
    yarn install --frozen-lockfile --production --network-timeout 300000

CMD ["node", "packages/backend", "--config", "app-config.yaml", "--config", "app-config.production.yaml"]

Helm 部署:

helm repo add backstage https://backstage.github.io/charts
helm repo update

helm install backstage backstage/backstage \
  --namespace backstage \
  --create-namespace \
  --values values.yaml

values.yaml 核心配置:

backstage:
  image:
    registry: my-registry.com
    repository: backstage
    tag: "1.0.0"
  
  appConfig:
    app:
      baseUrl: https://backstage.company.com
    
    backend:
      baseUrl: https://backstage.company.com
      database:
        client: pg
        connection:
          host: ${POSTGRES_HOST}
          port: 5432
          user: ${POSTGRES_USER}
          password: ${POSTGRES_PASSWORD}
          database: backstage
    
    auth:
      providers:
        github:
          development:
            clientId: ${GITHUB_CLIENT_ID}
            clientSecret: ${GITHUB_CLIENT_SECRET}
    
    catalog:
      providers:
        github:
          myOrg:
            organization: "my-github-org"
            catalogPath: "/catalog-info.yaml"
            filters:
              branch: "main"
            schedule:
              frequency: { minutes: 30 }
              timeout: { minutes: 3 }

postgresql:
  enabled: true
  auth:
    password: ${POSTGRES_PASSWORD}

ingress:
  enabled: true
  host: backstage.company.com
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
  tls:
    - secretName: backstage-tls
      hosts:
        - backstage.company.com

Software Catalog 配置
#

catalog-info.yaml 规范
#

每个代码仓库根目录放一个 catalog-info.yaml

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  title: "支付服务"
  description: "处理订单支付、退款、对账的核心服务"
  annotations:
    # 关联 GitHub 仓库
    github.com/project-slug: "my-org/payment-service"
    # 关联 ArgoCD 应用
    argocd/app-name: "payment-service-prod"
    # 关联 Grafana 仪表盘
    grafana/dashboard-selector: "service=payment-service"
    # 关联 PagerDuty 服务
    pagerduty.com/service-id: "PXXXXXX"
    # 关联 Kubernetes 部署
    backstage.io/kubernetes-id: payment-service
    backstage.io/kubernetes-namespace: production
    # TechDocs
    backstage.io/techdocs-ref: dir:.
  tags:
    - go
    - payment
    - kafka
  links:
    - url: https://grafana.company.com/d/payment
      title: Grafana 监控
      icon: dashboard
    - url: https://runbook.company.com/payment-service
      title: Runbook
      icon: docs
spec:
  type: service
  lifecycle: production      # experimental / deprecated / production
  owner: group:payments-team
  system: payment-system
  dependsOn:
    - resource:default/payment-db
    - component:default/order-service
  providesApis:
    - payment-api

API 实体:

apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payment-api
  description: "支付服务 REST API"
  annotations:
    backstage.io/techdocs-ref: dir:.
spec:
  type: openapi
  lifecycle: production
  owner: group:payments-team
  definition:
    $text: ./openapi.yaml   # 引用本仓库的 OpenAPI spec 文件

Resource 实体(数据库):

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: payment-db
  description: "支付服务 PostgreSQL 数据库"
spec:
  type: database
  owner: group:payments-team
  system: payment-system

批量导入 GitHub 组织仓库
#

手动在每个仓库添加 catalog-info.yaml 是起步阶段的做法。规模大了需要自动化发现:

# app-config.yaml
catalog:
  providers:
    github:
      myOrg:
        organization: "my-github-org"
        # 自动扫描所有仓库
        catalogPath: "/catalog-info.yaml"
        filters:
          # 只扫描非归档仓库
          visibility: ["public", "private"]
        schedule:
          frequency: { hours: 1 }
          timeout: { minutes: 5 }

对于已有几百个仓库但没有 catalog-info.yaml 的情况,可以用脚本批量创建 PR:

#!/usr/bin/env python3
"""批量为 GitHub 组织仓库生成 catalog-info.yaml"""

import os
import base64
from github import Github

g = Github(os.environ["GITHUB_TOKEN"])
org = g.get_organization("my-github-org")

TEMPLATE = """apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: {repo_name}
  description: "{description}"
  annotations:
    github.com/project-slug: "my-github-org/{repo_name}"
  tags: []
spec:
  type: service
  lifecycle: production
  owner: group:default/unknown
"""

for repo in org.get_repos():
    if repo.archived:
        continue
    
    # 检查是否已有 catalog-info.yaml
    try:
        repo.get_contents("catalog-info.yaml")
        print(f"{repo.name}: 已存在,跳过")
        continue
    except Exception:
        pass
    
    content = TEMPLATE.format(
        repo_name=repo.name,
        description=repo.description or f"{repo.name} service"
    )
    
    # 创建 PR
    default_branch = repo.default_branch
    main_ref = repo.get_git_ref(f"heads/{default_branch}")
    
    branch_name = "add-backstage-catalog"
    try:
        repo.create_git_ref(
            f"refs/heads/{branch_name}",
            main_ref.object.sha
        )
    except Exception:
        pass
    
    repo.create_file(
        "catalog-info.yaml",
        "chore: add Backstage catalog config",
        content,
        branch=branch_name
    )
    
    repo.create_pull(
        title="Add Backstage catalog-info.yaml",
        body="自动生成 Backstage catalog 配置,请 review 后合并",
        head=branch_name,
        base=default_branch
    )
    print(f"{repo.name}: 已创建 PR")

脚手架模板:一键创建服务
#

模板结构
#

Scaffolder Template 由三部分组成:

  1. Parameters:用户填写的表单字段
  2. Steps:执行的操作(获取代码、生成文件、发布到 GitHub、注册到 Catalog)
  3. Output:完成后展示给用户的链接
# template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: create-go-service
  title: "创建 Go 微服务"
  description: "一键创建符合公司规范的 Go 微服务,包含 CI/CD 流水线、K8s 配置和监控告警"
  tags:
    - recommended
    - go
    - microservice
spec:
  owner: group:platform-team
  type: service

  parameters:
    - title: "服务基本信息"
      required: [name, description, owner]
      properties:
        name:
          title: 服务名称
          type: string
          description: "小写字母和连字符,如 payment-service"
          pattern: "^[a-z][a-z0-9-]*[a-z0-9]$"
          maxLength: 50
          ui:autofocus: true
        description:
          title: 服务描述
          type: string
          maxLength: 200
        owner:
          title: 负责团队
          type: string
          ui:field: OwnerPicker
          ui:options:
            allowArbitraryValues: false

    - title: "技术选型"
      properties:
        httpPort:
          title: HTTP 端口
          type: integer
          default: 8080
        enableKafka:
          title: 是否使用 Kafka
          type: boolean
          default: false
        enablePostgres:
          title: 是否使用 PostgreSQL
          type: boolean
          default: false
        deployEnvs:
          title: 部署环境
          type: array
          items:
            type: string
            enum: [dev, staging, production]
          uniqueItems: true
          ui:widget: checkboxes

    - title: "代码仓库"
      required: [repoOrg]
      properties:
        repoOrg:
          title: GitHub 组织
          type: string
          default: my-github-org
        repoVisibility:
          title: 仓库可见性
          type: string
          default: private
          enum: [public, private, internal]

  steps:
    # 从模板目录拉取骨架代码
    - id: fetch-base
      name: 初始化代码模板
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          httpPort: ${{ parameters.httpPort }}
          enableKafka: ${{ parameters.enableKafka }}
          enablePostgres: ${{ parameters.enablePostgres }}

    # 创建 GitHub 仓库并推送代码
    - id: publish
      name: 创建 GitHub 仓库
      action: publish:github
      input:
        allowedHosts: ["github.com"]
        description: ${{ parameters.description }}
        repoUrl: "github.com?owner=${{ parameters.repoOrg }}&repo=${{ parameters.name }}"
        defaultBranch: main
        repoVisibility: ${{ parameters.repoVisibility }}
        gitCommitMessage: "feat: initial service scaffold"
        topics:
          - go
          - microservice

    # 注册到 Backstage Catalog
    - id: register
      name: 注册到 Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: "/catalog-info.yaml"

    # 触发 CI 初始化
    - id: github-actions
      name: 配置 GitHub Actions
      action: github:actions:dispatch
      input:
        repoUrl: ${{ steps.publish.output.remoteUrl }}
        workflowId: init-service.yml
        branchOrTagName: main
        workflowInputs:
          service_name: ${{ parameters.name }}
          deploy_envs: ${{ parameters.deployEnvs | join(',') }}

  output:
    links:
      - title: 打开代码仓库
        url: ${{ steps.publish.output.remoteUrl }}
        icon: github
      - title: 查看 Catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
      - title: 查看 CI 流水线
        url: ${{ steps.publish.output.remoteUrl }}/actions
        icon: launch

骨架代码模板(skeleton)
#

模板的 skeleton/ 目录包含实际的文件模板,使用 Nunjucks 语法插值:

skeleton/
├── catalog-info.yaml
├── go.mod
├── main.go
├── .github/
│   └── workflows/
│       └── ci.yml
├── k8s/
│   ├── deployment.yaml
│   └── service.yaml
└── docs/
    └── index.md

skeleton/catalog-info.yaml

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: "${{ values.description }}"
  annotations:
    github.com/project-slug: "my-github-org/${{ values.name }}"
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: experimental
  owner: ${{ values.owner }}

skeleton/k8s/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${{ values.name }}
  labels:
    app: ${{ values.name }}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ${{ values.name }}
  template:
    metadata:
      labels:
        app: ${{ values.name }}
    spec:
      containers:
        - name: ${{ values.name }}
          image: my-registry.com/${{ values.name }}:latest
          ports:
            - containerPort: ${{ values.httpPort }}
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /healthz
              port: ${{ values.httpPort }}
          readinessProbe:
            httpGet:
              path: /readyz
              port: ${{ values.httpPort }}

Kubernetes 插件
#

K8s 插件让开发者不需要直接使用 kubectl,就能在 Backstage 界面查看服务的部署状态、Pod 日志、HPA 状态等。

安装配置
#

# 前端插件
yarn --cwd packages/app add @backstage/plugin-kubernetes

# 后端插件
yarn --cwd packages/backend add @backstage/plugin-kubernetes-backend

app-config.yaml 中配置集群信息:

kubernetes:
  serviceLocatorMethod:
    type: "multiTenant"
  clusterLocatorMethods:
    - type: "config"
      clusters:
        - url: https://k8s-prod.company.com
          name: production
          authProvider: "serviceAccount"
          skipTLSVerify: false
          skipMetricsLookup: false
          serviceAccountToken: ${K8S_PROD_SA_TOKEN}
          caData: ${K8S_PROD_CA_DATA}
        - url: https://k8s-staging.company.com
          name: staging
          authProvider: "serviceAccount"
          serviceAccountToken: ${K8S_STAGING_SA_TOKEN}
          caData: ${K8S_STAGING_CA_DATA}
  customResources:
    - group: "argoproj.io"
      apiVersion: "v1alpha1"
      plural: "rollouts"

为 Backstage 创建专用的 ServiceAccount 和 RBAC:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backstage-read-only
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "configmaps", "limitranges",
                "resourcequotas", "endpoints", "events", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backstage-read-only
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: backstage-read-only
subjects:
  - kind: ServiceAccount
    name: backstage
    namespace: backstage

服务关联 K8s 资源,在 catalog-info.yaml 中添加注解:

annotations:
  backstage.io/kubernetes-id: payment-service
  backstage.io/kubernetes-namespace: production
  backstage.io/kubernetes-label-selector: "app=payment-service"

配置后,payment-service 的 Catalog 页面会出现"Kubernetes"Tab,展示所有集群中该服务的 Pod 状态、Deployment 滚动更新进度、最新日志。


TechDocs:文档即代码
#

配置 TechDocs
#

catalog-info.yaml 中添加注解:

annotations:
  backstage.io/techdocs-ref: dir:.

在代码仓库根目录添加 mkdocs.yml

site_name: "支付服务文档"
site_description: "支付服务开发、运维文档"

nav:
  - 首页: index.md
  - 架构设计:
    - 整体架构: architecture/overview.md
    - 数据库设计: architecture/database.md
    - Kafka 消息格式: architecture/kafka-events.md
  - 运维手册:
    - 部署流程: ops/deployment.md
    - 告警处理: ops/alerting.md
    - 故障排查: ops/troubleshooting.md
  - API 文档:
    - REST API: api/rest.md
    - 错误码: api/error-codes.md

plugins:
  - techdocs-core

文档目录结构:

docs/
├── index.md                  # 服务概览
├── architecture/
│   ├── overview.md
│   ├── database.md
│   └── kafka-events.md
├── ops/
│   ├── deployment.md
│   ├── alerting.md
│   └── troubleshooting.md
└── api/
    ├── rest.md
    └── error-codes.md

生产环境 TechDocs 存储
#

开发环境 TechDocs 可以本地构建,生产环境推荐使用 S3 存储预构建的文档:

# app-config.yaml
techdocs:
  builder: "external"
  generator:
    runIn: "local"
  publisher:
    type: "awsS3"
    awsS3:
      bucketName: my-company-techdocs
      region: us-east-1
      accountId: "123456789012"

在 CI 中预构建文档(以 GitHub Actions 为例):

name: Publish TechDocs

on:
  push:
    branches: [main]
    paths:
      - docs/**
      - mkdocs.yml
      - catalog-info.yaml

jobs:
  publish-techdocs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install TechDocs CLI
        run: pip install mkdocs-techdocs-core==1.3.3
      - name: Install @techdocs/cli
        run: npm install -g @techdocs/cli
      - name: Publish TechDocs to S3
        run: |
          techdocs-cli publish \
            --publisher-type awsS3 \
            --storage-name my-company-techdocs \
            --entity default/component/payment-service
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: us-east-1

自定义 Plugin 开发
#

前端插件:展示自定义信息
#

假设需要在每个服务的 Catalog 页面展示该服务的当前 SLO 状态。

创建插件:

yarn backstage-cli new --select plugin
# 输入插件名:slo-status

生成的插件结构:

plugins/slo-status/
├── src/
│   ├── components/
│   │   └── SloStatusCard/
│   │       ├── SloStatusCard.tsx
│   │       └── index.ts
│   ├── plugin.ts
│   └── index.ts
├── package.json
└── README.md

SloStatusCard.tsx

import React from 'react';
import {
  InfoCard,
  Progress,
  StatusOK,
  StatusError,
  StatusWarning,
} from '@backstage/core-components';
import { useEntity } from '@backstage/plugin-catalog-react';
import { useApi } from '@backstage/core-plugin-api';
import { sloApiRef } from '../../api';

export const SloStatusCard = () => {
  const { entity } = useEntity();
  const sloApi = useApi(sloApiRef);
  const serviceName = entity.metadata.name;

  const { value: sloData, loading, error } = useAsync(
    () => sloApi.getSloStatus(serviceName),
    [serviceName]
  );

  if (loading) return <Progress />;
  if (error) return <div>无法加载 SLO 数据</div>;

  const { availability, errorBudget, status } = sloData!;

  const StatusIcon = status === 'ok' ? StatusOK :
                     status === 'warning' ? StatusWarning : StatusError;

  return (
    <InfoCard title="SLO 状态" subheader={`服务: ${serviceName}`}>
      <Grid container spacing={2}>
        <Grid item xs={6}>
          <Typography variant="h6">可用性</Typography>
          <Typography variant="h4" color={availability >= 99.9 ? 'primary' : 'error'}>
            {availability.toFixed(3)}%
          </Typography>
          <Typography variant="body2" color="textSecondary">
            目标: 99.9%
          </Typography>
        </Grid>
        <Grid item xs={6}>
          <Typography variant="h6">错误预算剩余</Typography>
          <Typography variant="h4">
            {errorBudget.toFixed(1)}%
          </Typography>
          <LinearProgress
            variant="determinate"
            value={errorBudget}
            color={errorBudget > 50 ? 'primary' : errorBudget > 20 ? 'secondary' : 'error'}
          />
        </Grid>
        <Grid item xs={12}>
          <Chip
            icon={<StatusIcon />}
            label={status === 'ok' ? '正常' : status === 'warning' ? '警告' : '违规'}
            color={status === 'ok' ? 'primary' : 'default'}
          />
        </Grid>
      </Grid>
    </InfoCard>
  );
};

注册插件到 Catalog 实体页面:

// packages/app/src/components/catalog/EntityPage.tsx
import { SloStatusCard } from '@internal/plugin-slo-status';

const serviceEntityPage = (
  <EntityLayout>
    <EntityLayout.Route path="/" title="概览">
      <Grid container spacing={3}>
        <Grid item xs={12} md={6}>
          <EntityAboutCard variant="gridItem" />
        </Grid>
        <Grid item xs={12} md={6}>
          <SloStatusCard />  {/* 添加自定义卡片 */}
        </Grid>
        <Grid item xs={12}>
          <EntityHasSystemsCard variant="gridItem" />
        </Grid>
      </Grid>
    </EntityLayout.Route>
    {/* 其他 Tab... */}
  </EntityLayout>
);

后端插件:自定义 API
#

前端插件调用的 SLO 数据需要后端插件提供 API:

yarn backstage-cli new --select backend-plugin
# 输入插件名:slo-status-backend

router.ts

import { Router } from 'express';
import { CatalogClient } from '@backstage/catalog-client';
import { PrometheusClient } from './prometheus';

export async function createRouter(options: RouterOptions): Promise<Router> {
  const router = Router();
  const prometheus = new PrometheusClient(options.config);

  router.get('/slo/:serviceName', async (req, res) => {
    const { serviceName } = req.params;

    try {
      // 从 Prometheus 查询 SLO 数据
      const availability = await prometheus.query(
        `avg_over_time(
          (1 - rate(http_requests_total{service="${serviceName}",status=~"5.."}[5m])
          / rate(http_requests_total{service="${serviceName}"}[5m]))[30d:5m]
        ) * 100`
      );

      const errorBudgetUsed = await prometheus.query(
        `(1 - avg_over_time(
          (1 - rate(http_requests_total{service="${serviceName}",status=~"5.."}[5m])
          / rate(http_requests_total{service="${serviceName}"}[5m]))[30d:5m]
        )) / 0.001 * 100`
      );

      const avail = parseFloat(availability);
      const budgetRemaining = 100 - parseFloat(errorBudgetUsed);

      res.json({
        availability: avail,
        errorBudget: budgetRemaining,
        status: avail >= 99.9 ? 'ok' : avail >= 99.0 ? 'warning' : 'error',
      });
    } catch (error) {
      res.status(500).json({ error: 'Failed to fetch SLO data' });
    }
  });

  return router;
}

与现有工具集成
#

ArgoCD 集成
#

安装 ArgoCD 插件后,Catalog 页面会显示 ArgoCD 应用的同步状态、健康状态、最近部署历史。

yarn --cwd packages/app add @roadiehq/backstage-plugin-argo-cd
yarn --cwd packages/backend add @roadiehq/backstage-plugin-argo-cd-backend

app-config.yaml 配置:

argocd:
  baseUrl: https://argocd.company.com
  token: ${ARGOCD_TOKEN}
  waitCycles: 25
  appLocatorMethods:
    - type: 'config'
      instances:
        - name: argocd
          url: https://argocd.company.com
          token: ${ARGOCD_TOKEN}

catalog-info.yaml 关联 ArgoCD 应用:

annotations:
  argocd/app-name: "payment-service-prod"
  # 多应用(多集群部署)
  argocd/app-name: "payment-service-prod,payment-service-staging"

Grafana 集成
#

yarn --cwd packages/app add @k-phoen/backstage-plugin-grafana
# app-config.yaml
grafana:
  domain: https://grafana.company.com
  unifiedAlerting: true
# catalog-info.yaml
annotations:
  grafana/dashboard-selector: "service=payment-service"
  grafana/alert-label-selector: "service=payment-service"

Slack 集成
#

让 Backstage 知道每个服务对应哪个 Slack 频道,开发者可以直接跳转:

# catalog-info.yaml
metadata:
  links:
    - url: https://my-company.slack.com/channels/payment-team
      title: Slack 频道
      icon: chat

或通过 PagerDuty 插件,直接在 Backstage 触发告警或查看 on-call 排班:

annotations:
  pagerduty.com/service-id: "PXXXXXX"
  pagerduty.com/integration-key: ${PAGERDUTY_INTEGRATION_KEY}

推广与落地
#

如何说服团队使用
#

直接说"用 Backstage 吧"很难推动。更有效的方式是从痛点入手:

找准第一个高价值场景。 对于工程基础建设薄弱的团队,通常最痛的是新服务创建——一个后端服务从立项到第一次生产部署,可能需要在 10 个地方配置,花 1-2 天。做一个好用的 Scaffolder 模板,让这个过程缩短到 10 分钟,这就是立竿见影的价值。

让 Catalog 先成为"黄页"。 不要一开始就追求大而全,先把所有服务的基本信息(owner、关联仓库、Grafana 链接)录入进去,让大家养成"找服务信息就上 Backstage 查"的习惯。

运维团队先用起来。 给 on-call 工程师配置 K8s 插件和 PagerDuty 集成,让他们处理告警的时候能直接在 Backstage 看 Pod 状态,减少切换工具的摩擦。

如何衡量 IDP 价值
#

量化价值是持续获得资源投入的前提:

开发者体验指标(通过季度问卷):

  • “找到我不熟悉的服务信息需要多长时间?”
  • “创建一个新服务需要多长时间?”
  • “你对 Backstage 的满意度(1-10分)”

客观指标:

新服务创建时间 = 从 Scaffolder 提交到第一次生产部署的时长
服务信息完整率 = 有完整 catalog-info.yaml 的服务数 / 总服务数
文档新鲜度 = 过去 3 个月内有更新的文档 / 总文档数
Catalog 月活 = 每月使用 Backstage 的独立用户数

典型成功案例:

  • 某团队引入 Scaffolder 后,新服务 onboarding 时间从 2 天 → 20 分钟
  • Catalog 上线后,“这个服务谁负责?“类型的 Slack 问题减少了 80%
  • TechDocs 统一后,服务文档覆盖率从 30% 提升到 85%

持续运营
#

Backstage 不是部署完就一劳永逸的工具。需要有专人(平台工程师或基础设施工程师)持续维护:

  • 定期检查 Catalog 数据质量(过期、orphan 实体清理)
  • 跟进 Backstage 版本升级(官方每 2 周发一个小版本)
  • 收集开发者反馈,持续迭代插件和模板
  • 建立 Backstage 使用文档和培训材料

一个健康的 IDP 应该是团队工程文化的一部分,而不是强制使用的工具。当开发者发现"在 Backstage 上做事比绕过它更方便"的时候,推广就成功了。

Wenzhuo Huang
作者
Wenzhuo Huang
搞运维的工程师,写代码的运维人。专注 Kubernetes、AWS、GitOps 与基础设施可靠性。这个博客既是我的技术笔记本,也是我踩过的坑的受害者档案。
68 - 这篇文章属于一个选集。

相关文章

DORA 指标与平台工程效能度量:用数据驱动 DevOps 改进

·747 字·4 分钟
DORA 四个指标不是考核工具,是诊断工具。从 CI/CD 流水线和 Incident 系统采集数据,找到部署频率低、前置时间长的真实原因,然后用平台工程手段系统性改进。本文给出采集方案、Grafana 看板设计和常见误用陷阱。

平台工程实践:构建 Internal Developer Platform

·1055 字·5 分钟
平台工程不是给 DevOps 换个名字,而是把基础设施能力产品化——让开发者像用 SaaS 一样消费平台能力。这篇文章记录我们团队从 0 到 MVP 的六个月实践,包括 Backstage 落地、黄金路径设计、以及用 DORA 指标验证平台价值。