Volcano Advanced Practice (Part 3) - (Multi-Cluster / Colocation) Scheduling
Continuing from the earlier installments, this part covers further advanced scheduling strategies:
Volcano Quick Start (Part 1): introduces the resource scheduling and management challenges Kubernetes faces in large language model workloads, the core concepts and components of Volcano, and some basic job examples. https://blog.csdn.net/weixin_39403185/article/details/147494858?spm=1011.2124.3001.6209
4. Volcano Advanced Features in Practice
4.1 Network Topology-Aware Scheduling
Covered in Volcano Advanced Practice (Part 2): a detailed walkthrough of network topology-aware scheduling and load-aware rescheduling, simulated on an 8-node Kubernetes environment. https://blog.csdn.net/weixin_39403185/article/details/147523903?spm=1011.2124.3001.6209
4.2 Load-Aware Scheduling
Also covered in Volcano Advanced Practice (Part 2), linked above.
4.3 Multi-Cluster Scheduling
a) Prepare nodes: 9 ECS instances (2 vCPU / 8 GiB each)
For the ECS node provisioning steps, see:
Qwen2.5 7B Minimal Fine-Tuning: walks through fine-tuning Qwen2.5 7B, packaging the model, and publishing it to Hugging Face. https://blog.csdn.net/weixin_39403185/article/details/147115232?spm=1001.2014.3001.5501
b) Three Kubernetes clusters & Karmada base environment
Reference:
Karmada multi-cluster management (too long to cover here; I will publish it separately later)
c) Deploying volcano-global
The path to karmada-apiserver.config comes from your Karmada installation.
I use kubecm here to manage kubeconfig contexts.
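For reference, a minimal kubecm workflow might look like the following (subcommand names per the kubecm docs; the kubeconfig path is the one from my Karmada install):
## Merge an extra kubeconfig into ~/.kube/config
kubecm add -f /root/karmada-space/data/karmada-apiserver.config
## List contexts / switch interactively
kubecm list
kubecm switch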
- Install Volcano on all clusters
kubectl --context kubernetes-admin@kubernetes apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
kubectl --context kubernetes-admin@kubernetes-02 apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
kubectl --context kubernetes-admin@kubernetes-03 apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
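Before wiring things together, it is worth confirming that the Volcano components are Running in each member cluster, for example:
kubectl --context kubernetes-admin@kubernetes get pods -n volcano-system
kubectl --context kubernetes-admin@kubernetes-02 get pods -n volcano-system
kubectl --context kubernetes-admin@kubernetes-03 get pods -n volcano-system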
- Deploy Kubernetes Reflector and configure the secret
kubectl --context kubernetes-admin@kubernetes -n kube-system apply -f https://github.com/emberstack/kubernetes-reflector/releases/download/v7.1.262/reflector.yaml
## Create the karmada-webhook-config secret
kubectl --context kubernetes-admin@kubernetes create secret generic karmada-webhook-config \
  --from-file=kubeconfig=/root/karmada-space/data/karmada-apiserver.config \
  --namespace=karmada-system
## Inspect the karmada-webhook-config secret
kubectl --context kubernetes-admin@kubernetes get secret karmada-webhook-config -n karmada-system
## Share the secret with other namespaces via Reflector
kubectl --context kubernetes-admin@kubernetes annotate secret/karmada-webhook-config \
  reflector.v1.k8s.emberstack.com/reflection-allowed="true" \
  reflector.v1.k8s.emberstack.com/reflection-auto-namespaces="volcano-global" \
  reflector.v1.k8s.emberstack.com/reflection-auto-enabled="true" \
  --namespace=karmada-system
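Once the volcano-global namespace exists (it is created in the next step), Reflector should mirror the secret into it automatically; a quick sanity check:
kubectl --context kubernetes-admin@kubernetes get secret karmada-webhook-config -n volcano-global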
- Deploy the volcano-global core components
git clone https://github.com/volcano-sh/volcano-global.git
cd volcano-global
## Create the namespace
# volcano-global-admission-init
# volcano-global-controller-manager
# volcano-global-webhook-manager
kubectl --context karmada-apiserver apply -f docs/deploy/volcano-global-namespace.yaml
## Deploy
kubectl --context kubernetes-admin@kubernetes apply -f docs/deploy/volcano-global-namespace.yaml
kubectl --context kubernetes-admin@kubernetes apply -f docs/deploy/volcano-global-controller-manager.yaml
kubectl --context kubernetes-admin@kubernetes apply -f docs/deploy/volcano-global-webhook-manager.yaml
# Apply the webhook configuration.
kubectl --context karmada-apiserver apply -f docs/deploy/volcano-global-webhooks.yaml
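At this point a quick status check helps, assuming the components land in the volcano-global namespace created above:
kubectl --context kubernetes-admin@kubernetes get pods -n volcano-global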
Watch out, there is a pitfall here:
## With imagePullPolicy: Never the image pull fails; use imagePullPolicy: IfNotPresent instead
- Deploy the volcano-global CRDs
# crd batch.volcano.sh_jobs
# crd scheduling.volcano.sh_queues
# crd bus.volcano.sh_commands
kubectl --context karmada-apiserver apply -f https://github.com/volcano-sh/volcano/raw/release-1.10/installer/helm/chart/volcano/crd/bases/batch.volcano.sh_jobs.yaml
kubectl --context karmada-apiserver apply -f https://github.com/volcano-sh/volcano/raw/release-1.10/installer/helm/chart/volcano/crd/bases/scheduling.volcano.sh_queues.yaml
## Resource interpreter customizations: one for Job, one for Queue
kubectl --context karmada-apiserver apply -f docs/deploy/vcjob-resource-interpreter-customization.yaml
kubectl --context karmada-apiserver apply -f docs/deploy/queue-resource-interpreter-customization.yaml
## Propagate all queues to the member clusters
kubectl --context karmada-apiserver apply -f docs/deploy/volcano-global-all-queue-propagation.yaml
kubectl --context karmada-apiserver label clusterpropagationpolicy volcano-global-all-queue-propagation resourcetemplate.karmada.io/deletion-protected=Always
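To confirm the propagation policy exists and carries the deletion-protection label:
kubectl --context karmada-apiserver get clusterpropagationpolicy volcano-global-all-queue-propagation --show-labels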
d) Job scheduling test
cat <<EOF | kubectl --context karmada-apiserver apply -f -
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test
spec:
  reclaimable: true
  weight: 1
---
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mindspore-cpu
spec:
  minAvailable: 2
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  maxRetry: 5
  queue: test
  tasks:
    - replicas: 3
      name: "pod"
      template:
        spec:
          containers:
            - command: ["/bin/bash", "-c", "python /tmp/lenet.py"]
              image: lyd911/mindspore-cpu-example:0.2.0
              imagePullPolicy: IfNotPresent
              name: mindspore-cpu-job
              resources:
                limits:
                  cpu: "0.25"
                requests:
                  cpu: "0.25"
          restartPolicy: OnFailure
---
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mindspore-cpu
spec:
  resourceSelectors:
    - apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      name: mindspore-cpu
  placement:
    replicaScheduling:
      replicaDivisionPreference: Aggregated
      replicaSchedulingType: Divided
EOF
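With Divided placement, the 3 replicas should be split across the member clusters. One way to watch this, assuming Volcano's usual volcano.sh/job-name pod label (adjust if your version labels pods differently):
kubectl --context karmada-apiserver get vcjob mindspore-cpu
kubectl --context kubernetes-admin@kubernetes get pods -l volcano.sh/job-name=mindspore-cpu
kubectl --context kubernetes-admin@kubernetes-02 get pods -l volcano.sh/job-name=mindspore-cpu
kubectl --context kubernetes-admin@kubernetes-03 get pods -l volcano.sh/job-name=mindspore-cpu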
Adjust the scheduling policy and observe again:
cat <<EOF | kubectl --context karmada-apiserver apply -f -
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: mindspore-cpu
spec:
  resourceSelectors:
    - apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      name: mindspore-cpu
  placement:
    replicaScheduling:
      replicaSchedulingType: Duplicated
EOF
You can see that the Duplicated (full replication) mode schedules the job 1:1 to every member cluster.
4.4 Online/Offline Colocation Scheduling
Five nodes: 2 running Ubuntu and 3 running openEuler. As far as I can tell, Volcano's colocation features do not yet support Ubuntu or CentOS; if later releases add support, all the better.
a) Base environment preparation
Note 1: the cgroup driver must be set to cgroupfs
## kubelet change
/var/lib/kubelet/config.yaml
cgroupDriver: cgroupfs    # change from systemd
## containerd change
/etc/containerd/config.toml
SystemdCgroup = false
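Both changes only take effect after restarting the runtime and kubelet on each affected node:
systemctl restart containerd
systemctl restart kubelet
## The nodes should come back Ready afterwards
kubectl get nodes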
Note 2: Ubuntu and CentOS kernels lack memcg_qos_enable / sched_prio_load_balance_enabled
Three openEuler nodes were added for the tests.
b) Deploy Volcano with colocation support
## Install Volcano
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace \
  --set custom.colocation_enable=true
## Install the agent
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-agent-development.yaml
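Check that the volcano-agent DaemonSet came up (at this stage it may still be scheduled to every node; the nodeSelector is added below):
kubectl get ds -n volcano-system volcano-agent
kubectl get pods -n volcano-system -o wide | grep agent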
Enable the colocation and oversubscription switches on the openEuler nodes:
kubectl label node openeuler-01 volcano.sh/oversubscription=true
kubectl label node openeuler-01 volcano.sh/colocation=true
kubectl label node openeuler-02 volcano.sh/oversubscription=true
kubectl label node openeuler-02 volcano.sh/colocation=true
kubectl label node openeuler-03 volcano.sh/oversubscription=true
kubectl label node openeuler-03 volcano.sh/colocation=true
## To turn a node back off, remove the labels again:
kubectl label node openeuler-01 volcano.sh/oversubscription-
kubectl label node openeuler-01 volcano.sh/colocation-
## Pin the agent DaemonSet to the labeled nodes
kubectl edit ds -n volcano-system volcano-agent
# add under spec.template.spec:
#   nodeSelector:
#     volcano.sh/oversubscription: "true"
c) CPU Burst experiment
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: stress-pod-on-te3
spec:
  containers:
    - name: stress-container
      image: polinux/stress
      command: ["stress"]
      args: ["--cpu", "2", "--vm", "1", "--vm-bytes", "1G", "--timeout", "600s"]
      resources:
        requests: { cpu: "1500m", memory: "1Gi" }
        limits: { cpu: "2000m", memory: "1.5Gi" }
  nodeSelector:
    volcano.sh/oversubscription: "true"
    kubernetes.io/hostname: "openeuler-03"
EOF
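To see whether the burst takes effect, you can watch the pod's CPU usage climb past its 2000m limit (needs metrics-server), or look for the CFS burst knob inside the pod cgroup on the node; the path below assumes the cgroupfs driver and a kernel with CFS burst support:
kubectl top pod stress-pod-on-te3
## On openeuler-03:
find /sys/fs/cgroup/cpu/kubepods -name cpu.cfs_burst_us | head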
d) Dynamic resource oversubscription experiment
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: online-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: online-demo
  template:
    metadata:
      labels:
        app: online-demo
      annotations:
        volcano.sh/qos-level: "HLS"
    spec:
      containers:
        - name: container-1
          image: polinux/stress
          imagePullPolicy: IfNotPresent
          command: ["stress", "--cpu", "6"]
          resources:
            requests:
              cpu: 3
      nodeSelector:
        volcano.sh/oversubscription: "true"
        kubernetes.io/hostname: "openeuler-03"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: offline-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: offline-demo
  template:
    metadata:
      labels:
        app: offline-demo
      annotations:
        volcano.sh/qos-level: "BE"
    spec:
      containers:
        - name: container-1
          image: nginx:latest
          resources:
            requests:
              kubernetes.io/batch-cpu: 1000
              kubernetes.io/batch-memory: 9000000000
      nodeSelector:
        volcano.sh/oversubscription: "true"
        kubernetes.io/hostname: "openeuler-03"
EOF
kubectl get nodes openeuler-03 -o json | jq '.status | {capacity: .capacity, allocatable: .allocatable}'
watch -d "kubectl get node openeuler-03 -o json | jq '.status | {capacity, allocatable}'"
After the workloads are deployed, you can see the oversubscribed (batch) resources have been deducted from the node's allocatable.
Issue log:
The agent logs showed that the cgroup path expected by the code did not match the actual cgroup path on the system:
## Expected path
/host/sys/fs/cgroup/cpu/kubepods/cpuacct.usage
## Actual path
/sys/fs/cgroup/cpu/kubepods.slice/kubepods-burstable.slice/
The root cause: cgroupDriver should be set to cgroupfs. After changing it, the problem went away.
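A quick way to confirm both halves of the configuration agree on a node:
grep cgroupDriver /var/lib/kubelet/config.yaml   # expect: cgroupfs
grep SystemdCgroup /etc/containerd/config.toml   # expect: false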
References:
Cloud Native Colocation | Volcano: https://volcano.sh/zh/docs/colocation/#%E8%87%AA%E5%AE%9A%E4%B9%89%E5%BC%80%E5%8F%91%E8%B6%85%E5%8D%96%E7%AD%96%E7%95%A5
Multi-Cluster AI Job Scheduling | Volcano: https://volcano.sh/zh/docs/multi_cluster_scheduling/
volcano-global: https://github.com/volcano-sh/volcano-global