
OpenShift AI - Building Containerized Models with ModelCar to Speed Up Model Scaling

backend 2025/8/24 8:07:53

OpenShift / RHEL / DevSecOps Index
Note: this article was verified on OpenShift 4.18 + OpenShift AI 2.19.

Table of Contents

What Is ModelCar
Building the Model Image
Using the Model Image in OpenShift AI
- Deploying the Model
- Comparing Scale-Up Speed
References

What Is ModelCar

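KServe's typical model initialization approach is to fetch the model from an S3 bucket. Because the model files are downloaded on every initialization, this works for small models but becomes a performance bottleneck for large ones, since it significantly delays startup during autoscaling.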

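ModelCar is KServe's answer to this problem. Its main advantages are:

- The model files are already in the container image. When the image is cached on the node, the model files do not have to be downloaded again, which significantly reduces model startup latency.
- Pods running the same model on a node share the same image, so model data does not have to be downloaded into each pod, which reduces local disk usage.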

Building the Model Image

Create download_model.py, the script that downloads the model.

$ cat > download_model.py << EOF
from huggingface_hub import snapshot_download

# Specify the Hugging Face repository containing the model
model_repo = "Qwen/Qwen2.5-0.5B-Instruct"
snapshot_download(
    repo_id=model_repo,
    local_dir="/models",
    allow_patterns=["*.safetensors", "*.json", "*.txt"],
)
EOF
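The `allow_patterns` argument restricts the download to files whose names match glob-style patterns. As a rough illustration only, the sketch below uses Python's standard `fnmatch` module to approximate which files such patterns would keep. The file listing is hypothetical, and the actual filtering happens inside `huggingface_hub`, so treat this as an approximation rather than the library's implementation.

```python
from fnmatch import fnmatch

# Same patterns as in download_model.py
ALLOW_PATTERNS = ["*.safetensors", "*.json", "*.txt"]


def is_allowed(filename, patterns=ALLOW_PATTERNS):
    """Return True if the file name matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Hypothetical repository listing, for illustration only
repo_files = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "merges.txt",
    "README.md",
    ".gitattributes",
]

kept = [name for name in repo_files if is_allowed(name)]
print(kept)
```

Files such as the README are skipped, which keeps the final image limited to the weights, configuration, and tokenizer data.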

Create the Containerfile used to build the image.

$ cat > Containerfile << EOF
FROM registry.access.redhat.com/ubi9/python-311:latest as base
USER root
RUN pip install huggingface-hub
# Download the model file from hugging face
COPY download_model.py .
RUN python download_model.py

# Final image containing only the essential model files
FROM registry.access.redhat.com/ubi9/ubi-micro:9.4
# Copy the model files from the base container
COPY --from=base /models /models
USER 1001
EOF

Build the image that contains the model.

podman build . -t modelcar-example:latest --platform linux/amd64

Push the image to the registry.

$ podman images localhost/modelcar-example
REPOSITORY                  TAG         IMAGE ID      CREATED         SIZE
localhost/modelcar-example  latest      ae4aac72bb2c  59 minutes ago  1.02 GB
$ podman push localhost/modelcar-example quay.io/your-registry/modelcar-example:latest

Using the Model Image in OpenShift AI

Deploying the Model

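Create a connection that uses the image as its source, as shown in the figure.
Deploy the model using that connection. Set Deployment mode to Advanced, which runs the model in Serverless mode, and set Number of model server replicas to deploy to 0, so that the initial replica count is zero.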
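For readers who prefer manifests over the dashboard, the settings above correspond roughly to a KServe InferenceService whose `storageUri` uses the `oci://` scheme. The following is a minimal sketch that builds such a manifest as JSON. The runtime name `vllm-runtime` and the model format `vLLM` are assumptions for illustration, not values taken from this article; use whatever serving runtime exists in your project.

```python
import json


def build_inference_service(name, image, runtime, model_format="vLLM", min_replicas=0):
    """Return a KServe InferenceService manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            # Serverless mode (called "Advanced" in the dashboard)
            "annotations": {"serving.kserve.io/deploymentMode": "Serverless"},
        },
        "spec": {
            "predictor": {
                # Zero initial replicas, matching the dashboard setting above
                "minReplicas": min_replicas,
                "model": {
                    "modelFormat": {"name": model_format},
                    "runtime": runtime,
                    # ModelCar images are referenced with the oci:// scheme
                    "storageUri": f"oci://{image}",
                },
            }
        },
    }


manifest = build_inference_service(
    name="modelcar-example",
    image="quay.io/your-registry/modelcar-example:latest",
    runtime="vllm-runtime",  # hypothetical runtime name
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `oc apply -f`, assuming the named serving runtime is available in the target namespace.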

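Comparing Scale-Up Speed

Using the S3-based model deployment described in "OpenShift AI - Running LLMs on OpenShift and OpenShift AI" as the baseline, the ibm-granite/granite-3.2-2b-instruct model was deployed in both ModelCar and S3 modes in the same environment, and both deployments were scaled up at the same time. Test results:

- Scale-up time in ModelCar mode: 1 min 12 s, clearly faster.
- Scale-up time in S3 mode: 2 min 22 s.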
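To put the two measurements in proportion, the short calculation below converts them to seconds and derives the time saved and the relative speedup. It uses only the two figures reported above; the helper name `to_seconds` is introduced here for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds


modelcar_s = to_seconds(1, 12)  # ModelCar mode: 1 min 12 s
s3_s = to_seconds(2, 22)        # S3 mode: 2 min 22 s

saved_s = s3_s - modelcar_s     # absolute time saved
reduction = saved_s / s3_s      # fraction of the S3 time saved
speedup = s3_s / modelcar_s     # how many times faster ModelCar was

print(f"ModelCar: {modelcar_s} s, S3: {s3_s} s")
print(f"Saved: {saved_s} s ({reduction:.1%} less time), speedup: {speedup:.2f}x")
```

In this single test, ModelCar roughly halved the scale-up time. The result comes from one run with one model, so it should be read as indicative rather than as a general benchmark.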

References

https://developers.redhat.com/articles/2025/01/30/build-and-deploy-modelcar-container-openshift-ai#modelcar_containers_pros_and_cons
https://github.com/redhat-ai-services/modelcar-catalog
https://opendatahub.io/docs/serving-models/
https://github.com/redhat-ai-services/modelcar-catalog/tree/main/modelcar-images/qwen2.5-0.5b-instruc
