Karpenter — AliCloud
AliCloud Karpenter uses the karpenter-provider-alibabacloud maintained by CloudPilot AI. The API surface mirrors the AWS provider — the same NodePool CRD is used, but the node template CRD is ECSNodeClass instead of EC2NodeClass.
Bundle source:
stacks/alicloud/karpenter
Differences from AWS
| Concept | AWS | AliCloud |
|---|---|---|
| Node template CRD | EC2NodeClass | ECSNodeClass |
| IAM / identity | IAM Role (Pod Identity) | RAM Role |
| Spot interruption | SQS queue | Cloud Monitor events |
| Subnet selection | Tag-based (subnetSelectorTerms) | Tag- or ID-based (vSwitchSelectorTerms) |
| Security groups | Tag-based (securityGroupSelectorTerms) | Tag- or ID-based (securityGroupSelectorTerms) |
| AMI family | AL2023, Bottlerocket, etc. | AliyunLinux, ContainerOS |
| Instance category label | karpenter.k8s.aws/instance-category | karpenter.k8s.alibabacloud.com/ecs-instance-category |
Infrastructure Components
infra group (runs first via tofu-module):
- Creates a RAM role with ECS and pricing API permissions
- Associates the role with the karpenter service account via RRSA (RAM Roles for Service Accounts)
- Installs the Karpenter Helm chart with the role ARN injected
kubernetes group (runs after infra):
- Applies a default ECSNodeClass (node template)
- Applies all NodePool definitions inline via the control plane UI
ECSNodeClass
The ECSNodeClass is the AliCloud equivalent of the AWS EC2NodeClass. It selects VSwitches and security groups (by tags or explicit IDs) and specifies the RAM role nodes will assume.
```yaml
apiVersion: karpenter.k8s.alibabacloud.com/v1alpha1
kind: ECSNodeClass
metadata:
  name: karpenter-nodeclass-default
spec:
  amiFamily: AliyunLinux
  systemDisk:
    size: 100
    category: cloud_essd
    encrypted: true
    deleteWithInstance: true
  vSwitchSelectorTerms:
    - tags:
        Name: "{{ cluster_name }}-private*" # private VSwitches only
  securityGroupSelectorTerms:
    - tags:
        Name: "{{ cluster_name }}-node"
  ramRoleName: "{{ cluster_name }}-karpenter-node-role"
  userData: ""
```
Key points:
- VSwitch and security group selection is tag-based — same pattern as AWS
- {{ cluster_name }} is resolved by the cluster agent at apply time
- System disk uses cloud_essd (equivalent to gp3) with encryption enabled
- No SQS — AliCloud interruption handling is built into the provider via Cloud Monitor
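As the comparison table notes, the selector terms can also take explicit IDs instead of tags. A hedged sketch, assuming the "id" field mirrors the AWS provider's selector-term schema; the IDs below are placeholders, not real resources:

```yaml
# Alternative selection by explicit ID (placeholder IDs)
vSwitchSelectorTerms:
  - id: vsw-xxxxxxxxxxxxxxxx   # private VSwitch, zone A
  - id: vsw-yyyyyyyyyyyyyyyy   # private VSwitch, zone B
securityGroupSelectorTerms:
  - id: sg-xxxxxxxxxxxxxxxx    # node security group
```

ID-based selection is useful when VSwitches are not tagged consistently, at the cost of hard-coding infrastructure identifiers into the node template.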
NodePools
NodePool specs are identical in structure to AWS. Only the instance family labels differ.
airflow-worker
Used by Airflow KubernetesExecutor pods and Spark driver/executor pods.
| Setting | Value |
|---|---|
| Instance family | ecs.g7 (general compute, equiv. to m5) |
| Architecture | amd64 |
| Capacity | Spot + On-Demand |
| Consolidation | WhenEmpty after 2m |
| Taint | airflow-worker: NoSchedule |
```yaml
requirements:
  - key: karpenter.k8s.alibabacloud.com/ecs-instance-category
    operator: In
    values: [g]
  - key: karpenter.k8s.alibabacloud.com/ecs-instance-family
    operator: In
    values: [ecs.g7]
  - key: karpenter.sh/capacity-type
    operator: In
    values: [spot, on-demand]
```
trino-xsmall
Used by Trino coordinator and worker pods.
| Setting | Value |
|---|---|
| Instance family | ecs.r7 (memory-optimized, equiv. to r5) |
| Architecture | amd64 |
| Capacity | On-Demand + Spot |
| Consolidation | WhenEmptyOrUnderutilized after 5m |
| Taint | trino-xsmall: NoSchedule |
```yaml
requirements:
  - key: karpenter.k8s.alibabacloud.com/ecs-instance-category
    operator: In
    values: [r]
  - key: karpenter.k8s.alibabacloud.com/ecs-instance-family
    operator: In
    values: [ecs.r7]
  - key: karpenter.sh/capacity-type
    operator: In
    values: [on-demand, spot]
```
jupyterhub-small
Used by JupyterHub single-user notebook servers.
| Setting | Value |
|---|---|
| Instance family | ecs.t6.large–2xlarge (burstable, equiv. to t3) |
| Architecture | amd64 |
| Capacity | Spot + On-Demand |
| Consolidation | WhenEmptyOrUnderutilized after 5m |
| Taint | jupyterhub-small: NoSchedule |
```yaml
requirements:
  - key: karpenter.k8s.alibabacloud.com/ecs-instance-category
    operator: In
    values: [t]
  - key: karpenter.k8s.alibabacloud.com/ecs-instance-family
    operator: In
    values: [ecs.t6]
  - key: karpenter.sh/capacity-type
    operator: In
    values: [spot, on-demand]
```
Deploying via bundle.yaml
Deployment follows the same stack workflow as AWS. The bundle's primary input is the target cluster resource (an ACK cluster); the remaining inputs are listed below.
AliCloud-specific inputs required at deploy time:
| Input | Description |
|---|---|
| cluster | Target ACK cluster resource |
| region | AliCloud region (e.g. cn-hangzhou) |
| vswitch_ids | List of private VSwitch IDs for node placement |
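Putting the inputs together, a deploy-time values stanza might look like the sketch below. The exact bundle.yaml schema is an assumption here, and the cluster name and VSwitch IDs are placeholders:

```yaml
# Hypothetical deploy-time inputs (schema and values are illustrative)
inputs:
  cluster: my-ack-cluster        # target ACK cluster resource
  region: cn-hangzhou
  vswitch_ids:
    - vsw-xxxxxxxxxxxxxxxx
    - vsw-yyyyyyyyyyyyyyyy
```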