Cluster Proportional Autoscaler | 山山仙人 Blog

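Cluster Proportional Autoscaler (CPA) is an autoscaler for Kubernetes that adjusts the replica count of a target workload (a Deployment, ReplicaSet, or ReplicationController) in proportion to the size of the cluster, measured as the number of schedulable nodes and CPU cores. It does not look at Pod counts or load metrics; its purpose is to keep cluster-level services sized to the cluster as nodes are added or removed, which avoids both under-provisioning and wasted resources.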

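How it works

CPA watches the number of schedulable nodes and CPU cores in the cluster and sets the target's replica count from a predefined linear or ladder configuration. Key points:

Linear mode: the replica count grows in proportion to the node count or core count
Ladder mode: the replica count is looked up from a table of threshold ranges
Typical use: autoscaling system components such as CoreDNS and Metrics Server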

Helm installation

Add the Helm repository

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler
helm repo update

Install CPA

helm install cluster-proportional-autoscaler \
  cluster-proportional-autoscaler/cluster-proportional-autoscaler \
  --namespace kube-system \
  --create-namespace \
  -f values.yaml

Quick install example

Configure autoscaling for CoreDNS:

helm install coredns-autoscaler \
  cluster-proportional-autoscaler/cluster-proportional-autoscaler \
  --namespace kube-system \
  --set config.linear.coresPerReplica=256 \
  --set config.linear.nodesPerReplica=16 \
  --set config.linear.min=2 \
  --set config.linear.max=10 \
  --set options.target="deployment/coredns"

Configuration

Example values.yaml

# Image settings
image:
  repository: registry.k8s.io/cpa/cluster-proportional-autoscaler
  tag: v1.8.9
  pullPolicy: IfNotPresent

# Resource requests and limits
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 20m
    memory: 32Mi

# Scaling mode: linear or ladder
config:
  # Linear mode settings
  linear:
    coresPerReplica: 256      # CPU cores per replica
    nodesPerReplica: 16       # nodes per replica
    min: 2                    # minimum replica count
    max: 10                   # maximum replica count
    preventSinglePointFailure: true  # keep at least 2 replicas when the cluster has more than one node
    includeUnschedulableNodes: true  # also count unschedulable (cordoned) nodes

# CPA runtime options
options:
  namespace: kube-system      # namespace of the target resource
  target: "deployment/coredns"  # target resource (deployment/xxx, replicaset/xxx, or replicationcontroller/xxx)
  pollPeriodSeconds: 10       # polling interval in seconds
  logtostderr: true          # write logs to stderr
  v: 2                       # log verbosity (0-4)

# ServiceAccount settings
serviceAccount:
  create: true
  name: cluster-proportional-autoscaler
  annotations: { }

# Node selector, tolerations, and affinity
nodeSelector: { }
tolerations: [ ]
affinity: { }

Ladder mode configuration

config:
  ladder:
    nodesToReplicas:
      - [ 1, 1 ]      # 1 node: 1 replica
      - [ 2, 2 ]      # 2-4 nodes: 2 replicas
      - [ 5, 3 ]      # 5-9 nodes: 3 replicas
      - [ 10, 5 ]     # 10+ nodes: 5 replicas
    coresToReplicas:
      - [ 1, 1 ]
      - [ 64, 3 ]
      - [ 128, 5 ]
      - [ 256, 7 ]

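Parameter reference

Core parameters

Parameter	Description	Default	Example
`options.target`	Type and name of the target resource	Required	`deployment/coredns`
`options.namespace`	Namespace of the target resource	`default`	`kube-system`
`options.pollPeriodSeconds`	Polling interval in seconds	`10`	`30`

Linear mode parameters

Parameter	Description	Formula / behavior
`config.linear.coresPerReplica`	CPU cores per replica	`replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))`
`config.linear.nodesPerReplica`	Nodes per replica	Same as above
`config.linear.min`	Minimum replica count	The result is never lower than this value
`config.linear.max`	Maximum replica count	The result is never higher than this value
`config.linear.preventSinglePointFailure`	Avoid a single point of failure	When `true`, keeps at least 2 replicas if the cluster has more than one node
`config.linear.includeUnschedulableNodes`	Whether unschedulable nodes are counted	Defaults to `false`; when `true`, cordoned nodes are counted as well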
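
The linear rules in the table can be sketched in a few lines of Python. This is only an illustration of the formula, not CPA's actual code: the function name `linear_replicas` and its argument names are made up for this example, and the exact order in which CPA applies the single-point-of-failure guard and the min/max bounds is an assumption.

```python
import math


def linear_replicas(nodes, cores, cores_per_replica, nodes_per_replica,
                    min_replicas=1, max_replicas=None,
                    prevent_single_point_failure=False):
    """Estimate the replica count that linear mode would choose."""
    from_cores = math.ceil(cores / cores_per_replica)
    from_nodes = math.ceil(nodes / nodes_per_replica)
    replicas = max(from_cores, from_nodes)

    # Assumption: the guard only applies when the cluster has more than one node.
    if prevent_single_point_failure and nodes > 1:
        replicas = max(replicas, 2)

    # Clamp to the configured bounds (max first, then min).
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return max(replicas, min_replicas)


# 32 nodes and 512 cores with coresPerReplica=256, nodesPerReplica=16
print(linear_replicas(32, 512, 256, 16, min_replicas=2, max_replicas=10))
```

Keeping the two ratios separate makes it easy to see which dimension (cores or nodes) is driving the result for a given cluster.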

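Ladder mode parameters

Parameter	Description	Example
`config.ladder.nodesToReplicas`	Mapping from node count to replica count	`[[1,1], [5,3], [10,5]]`
`config.ladder.coresToReplicas`	Mapping from CPU core count to replica count	`[[1,1], [128,5], [256,7]]`

Notes on ladder mode:

Each entry [threshold, replicas] applies when the resource count is >= threshold; if several entries match, the one with the largest threshold wins
If both nodesToReplicas and coresToReplicas are configured, the larger of the two results is used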
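
The lookup described above can be sketched as follows. The helper names `ladder_lookup` and `ladder_replicas` are hypothetical, and falling back to the first entry when the value is below every threshold is a choice made for this sketch rather than documented CPA behavior.

```python
from bisect import bisect_right


def ladder_lookup(value, ladder):
    """Return the replicas of the entry with the largest threshold <= value."""
    steps = sorted(ladder)                      # sort entries by threshold
    thresholds = [threshold for threshold, _ in steps]
    index = bisect_right(thresholds, value)     # number of thresholds <= value
    if index == 0:
        # Sketch-only choice: below the first threshold, use the first entry.
        return steps[0][1]
    return steps[index - 1][1]


def ladder_replicas(nodes, cores, nodes_to_replicas=None, cores_to_replicas=None):
    """Combine both ladders by taking the larger result."""
    results = []
    if nodes_to_replicas:
        results.append(ladder_lookup(nodes, nodes_to_replicas))
    if cores_to_replicas:
        results.append(ladder_lookup(cores, cores_to_replicas))
    return max(results, default=0)


print(ladder_lookup(7, [[1, 1], [5, 3], [10, 5]]))
```

A binary search is not needed for ladders this short, but it keeps the "largest threshold not exceeding the value" rule explicit.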

Usage examples

Example 1: Autoscaling CoreDNS

# coredns-autoscaler-values.yaml
config:
  linear:
    coresPerReplica: 256
    nodesPerReplica: 16
    min: 2
    max: 20
    preventSinglePointFailure: true

options:
  namespace: kube-system
  target: "deployment/coredns"
  pollPeriodSeconds: 10

resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 20m
    memory: 32Mi

Install:

helm install coredns-autoscaler \
  cluster-proportional-autoscaler/cluster-proportional-autoscaler \
  -f coredns-autoscaler-values.yaml \
  --namespace kube-system

Example 2: Using ladder mode

# metrics-server-autoscaler-values.yaml
config:
  ladder:
    nodesToReplicas:
      - [ 1, 1 ]
      - [ 5, 2 ]
      - [ 20, 3 ]
      - [ 50, 5 ]
      - [ 100, 10 ]

options:
  namespace: kube-system
  target: "deployment/metrics-server"
  pollPeriodSeconds: 30

Example 3: Combining several options

config:
  linear:
    coresPerReplica: 128
    nodesPerReplica: 8
    min: 3
    max: 50
    preventSinglePointFailure: true
    includeUnschedulableNodes: false  # do not count unschedulable nodes

options:
  namespace: kube-system
  target: "deployment/kube-state-metrics"
  pollPeriodSeconds: 15
  v: 2

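Calculation examples

Linear mode calculation examples

Example 1: Standard linear calculation

Cluster state:

Nodes: 32
Total CPU cores: 512

CPA configuration:

config:
  linear:
    coresPerReplica: 256
    nodesPerReplica: 16
    min: 2
    max: 10

Calculation:

Step 1 - replicas from cores:
  ceil(512 / 256) = ceil(2) = 2

Step 2 - replicas from nodes:
  ceil(32 / 16) = ceil(2) = 2

Step 3 - take the larger value:
  max(2, 2) = 2

Step 4 - apply the min/max bounds:
  max(min(2, 10), 2) = 2

Final replica count: 2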

Example 2: Hitting the minimum

Cluster state:

Nodes: 3
Total CPU cores: 48

CPA configuration:

config:
  linear:
    coresPerReplica: 256
    nodesPerReplica: 16
    min: 5
    max: 20

Calculation:

Step 1 - replicas from cores:
  ceil(48 / 256) = ceil(0.1875) = 1

Step 2 - replicas from nodes:
  ceil(3 / 16) = ceil(0.1875) = 1

Step 3 - take the larger value:
  max(1, 1) = 1

Step 4 - apply the min/max bounds:
  max(min(1, 20), 5) = max(1, 5) = 5

Final replica count: 5 (raised to min)

Example 3: Hitting the maximum

Cluster state:

Nodes: 200
Total CPU cores: 3200

CPA configuration:

config:
  linear:
    coresPerReplica: 128
    nodesPerReplica: 8
    min: 2
    max: 15
    preventSinglePointFailure: false

Calculation:

Step 1 - replicas from cores:
  ceil(3200 / 128) = ceil(25) = 25

Step 2 - replicas from nodes:
  ceil(200 / 8) = ceil(25) = 25

Step 3 - take the larger value:
  max(25, 25) = 25

Step 4 - apply the min/max bounds:
  max(min(25, 15), 2) = max(15, 2) = 15

Final replica count: 15 (capped at max)

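Example 4: Preventing a single point of failure

Cluster state:

Nodes: 2
Total CPU cores: 32

CPA configuration:

config:
  linear:
    coresPerReplica: 64
    nodesPerReplica: 4
    min: 1
    max: 10
    preventSinglePointFailure: true

Calculation:

Step 1 - replicas from cores:
  ceil(32 / 64) = ceil(0.5) = 1

Step 2 - replicas from nodes:
  ceil(2 / 4) = ceil(0.5) = 1

Step 3 - take the larger value:
  max(1, 1) = 1

Step 4 - apply the single-point-of-failure rule:
  preventSinglePointFailure = true, the cluster has more than one node (2), and the result is 1
  so the result is raised to 2

Step 5 - apply the min/max bounds:
  max(min(2, 10), 1) = 2

Final replica count: 2 (raised by preventSinglePointFailure)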

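Ladder mode calculation examples

Example 5: Ladder based on node count

Cluster state:

Nodes: 35
Total CPU cores: 560

CPA configuration:

config:
  ladder:
    nodesToReplicas:
      - [ 1, 1 ]      # 1-4 nodes → 1 replica
      - [ 5, 2 ]      # 5-19 nodes → 2 replicas
      - [ 20, 3 ]     # 20-49 nodes → 3 replicas
      - [ 50, 5 ]     # 50-99 nodes → 5 replicas
      - [ 100, 10 ]   # 100+ nodes → 10 replicas

Calculation:

Step 1 - find the step that matches the node count:
  Current node count: 35
  Lookup rule: choose the entry with the largest threshold that is <= 35

  [1, 1]   → 1 <= 35 ✓
  [5, 2]   → 5 <= 35 ✓
  [20, 3]  → 20 <= 35 ✓
  [50, 5]  → 50 <= 35 ✗
  [100, 10] → 100 <= 35 ✗

  Select [20, 3] (the last entry that matches)

Final replica count: 3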

Example 6: Ladder based on CPU core count

Cluster state:

Nodes: 10
Total CPU cores: 160

CPA configuration:

config:
  ladder:
    coresToReplicas:
      - [ 1, 1 ]      # 1-63 cores → 1 replica
      - [ 64, 2 ]     # 64-127 cores → 2 replicas
      - [ 128, 4 ]    # 128-255 cores → 4 replicas
      - [ 256, 8 ]    # 256-511 cores → 8 replicas
      - [ 512, 16 ]   # 512+ cores → 16 replicas

Calculation:

Step 1 - find the step that matches the core count:
  Current core count: 160
  Lookup rule: choose the entry with the largest threshold that is <= 160

  [1, 1]    → 1 <= 160 ✓
  [64, 2]   → 64 <= 160 ✓
  [128, 4]  → 128 <= 160 ✓
  [256, 8]  → 256 <= 160 ✗
  [512, 16] → 512 <= 160 ✗

  Select [128, 4] (the last entry that matches)

Final replica count: 4

Example 7: Node and core ladders configured together

Cluster state:

Nodes: 15
Total CPU cores: 240

CPA configuration:

config:
  ladder:
    nodesToReplicas:
      - [ 1, 1 ]
      - [ 10, 3 ]
      - [ 20, 5 ]
      - [ 50, 10 ]
    coresToReplicas:
      - [ 1, 1 ]
      - [ 128, 4 ]
      - [ 256, 8 ]
      - [ 512, 16 ]

Calculation:

Step 1 - replicas from nodes:
  Current node count: 15
  Ladder lookup:
    [1, 1]  → 1 <= 15 ✓
    [10, 3] → 10 <= 15 ✓
    [20, 5] → 20 <= 15 ✗
    [50, 10] → 50 <= 15 ✗
  Result: 3

Step 2 - replicas from cores:
  Current core count: 240
  Ladder lookup:
    [1, 1]   → 1 <= 240 ✓
    [128, 4] → 128 <= 240 ✓
    [256, 8] → 256 <= 240 ✗
    [512, 16] → 512 <= 240 ✗
  Result: 4

Step 3 - take the larger value:
  max(3, 4) = 4

Final replica count: 4

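Comparison across cluster sizes

Cluster size	Nodes	CPU cores	Linear mode result	Ladder mode result	Notes
Small	5	80	2	1	Linear result is 1 before bounds, raised to min=2
Medium	30	480	4	2	Linear mode scales up faster
Large	100	1600	13	10	Linear is still below max; ladder is on the [100, 10] step
Very large	500	8000	20 (max)	16	Linear result is 63 before bounds, capped at max=20

Configuration used:

Linear mode: coresPerReplica=128, nodesPerReplica=8, min=2, max=20
Ladder mode (nodesToReplicas): [[1,1], [10,2], [50,5], [100,10], [200,16]]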
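
To recompute the comparison for other cluster sizes, a small script along these lines may help. It is a sketch under two assumptions: the ladder is read as a nodesToReplicas mapping, and the helper names (`linear`, `ladder`, `compare`) are invented for this example.

```python
import math

# Settings taken from the configuration listed above.
LINEAR = {"cores_per_replica": 128, "nodes_per_replica": 8, "min": 2, "max": 20}
LADDER = [(1, 1), (10, 2), (50, 5), (100, 10), (200, 16)]  # assumed nodesToReplicas

CLUSTERS = [
    ("small", 5, 80),
    ("medium", 30, 480),
    ("large", 100, 1600),
    ("very large", 500, 8000),
]


def linear(nodes, cores, cfg=LINEAR):
    """Linear mode: larger of the two ratios, clamped to [min, max]."""
    raw = max(math.ceil(cores / cfg["cores_per_replica"]),
              math.ceil(nodes / cfg["nodes_per_replica"]))
    return max(min(raw, cfg["max"]), cfg["min"])


def ladder(nodes, steps=LADDER):
    """Ladder mode: replicas of the last step whose threshold is <= nodes."""
    matched = [replicas for threshold, replicas in steps if threshold <= nodes]
    return matched[-1] if matched else steps[0][1]


def compare(clusters=CLUSTERS):
    """Return (name, linear result, ladder result) for each cluster."""
    return [(name, linear(n, c), ladder(n)) for name, n, c in clusters]


for name, linear_result, ladder_result in compare():
    print(f"{name:<12} linear={linear_result:<3} ladder={ladder_result}")
```

Editing `CLUSTERS`, `LINEAR`, or `LADDER` is enough to try a different sizing policy before deploying it.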

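Monitoring and verification

View CPA logs

kubectl logs -n kube-system -l app=cluster-proportional-autoscaler -f

Check the target's replica count

kubectl get deployment coredns -n kube-system

Verify autoscaling

Simulate adding nodes:

# After adding nodes to the cluster, watch the replica count change
kubectl get deployment coredns -n kube-system -w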
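
To know what replica count to expect while watching, it helps to count the nodes and cores the same way CPA does. The sketch below reads the JSON produced by `kubectl get nodes -o json`. Several things are assumptions here: that CPA reads `status.allocatable.cpu`, that fractional cores are rounded up, and the helper names `cpu_to_millicores` and `cluster_size`; the sample node list is made up.

```python
import json


def cpu_to_millicores(quantity):
    """Convert a Kubernetes CPU quantity such as "16" or "15890m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cluster_size(node_list, include_unschedulable=False):
    """Return (nodes, cores) from a parsed `kubectl get nodes -o json` document."""
    nodes = 0
    millicores = 0
    for item in node_list.get("items", []):
        cordoned = item.get("spec", {}).get("unschedulable", False)
        if cordoned and not include_unschedulable:
            continue  # skip cordoned nodes unless they are explicitly included
        nodes += 1
        # Assumption: allocatable CPU is the value that counts.
        millicores += cpu_to_millicores(item["status"]["allocatable"]["cpu"])
    # Assumption: a fractional total is rounded up to a whole core.
    return nodes, -(-millicores // 1000)


# Hypothetical sample: two schedulable nodes and one cordoned node.
sample = json.loads("""
{"items": [
  {"spec": {}, "status": {"allocatable": {"cpu": "16"}}},
  {"spec": {}, "status": {"allocatable": {"cpu": "15890m"}}},
  {"spec": {"unschedulable": true}, "status": {"allocatable": {"cpu": "16"}}}
]}
""")
print(cluster_size(sample))
```

On a real cluster, save the node list with `kubectl get nodes -o json > nodes.json`, load it with `json.load`, and pass the result to `cluster_size`; the two numbers can then be fed into the linear or ladder calculation.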

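Best practices

Set min and max deliberately: too few replicas risks an outage, too many wastes resources
Enable preventSinglePointFailure: recommended for critical services so that a multi-node cluster always runs at least 2 replicas
Choose a suitable polling interval: tune pollPeriodSeconds to the cluster size; larger clusters can use a longer interval
Monitor resource usage: review the target service's resource consumption regularly and adjust the CPA configuration
Combine with HPA: CPA scales with cluster size and HPA scales with load, so they complement each other; avoid pointing both at the same workload, since each would overwrite the other's replica count

Uninstall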

helm uninstall coredns-autoscaler -n kube-system
