Running LLM-Research/Mistral-7B-Instruct-v0.3 with vLLM via docker-compose
Download the model
modelscope download --model LLM-Research/Mistral-7B-Instruct-v0.3 --cache_dir e:\ai\vllm\models\Research
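Note that ModelScope's cache replaces the "." characters in a model directory name with "___", which is why the compose file below mounts a path ending in Mistral-7B-Instruct-v0___3 rather than v0.3. A minimal sketch of that naming rule (the helper name is ours, not part of the ModelScope API):

```python
def modelscope_dir_name(model_id: str) -> str:
    # Hypothetical helper: illustrates how ModelScope derives the on-disk
    # cache directory name from a model ID by replacing "." with "___".
    return model_id.replace(".", "___")

print(modelscope_dir_name("LLM-Research/Mistral-7B-Instruct-v0.3"))
# -> LLM-Research/Mistral-7B-Instruct-v0___3
```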
Run: to serve the local model, set the environment variable below so vLLM does not try to reach the Hugging Face Hub:
HF_HUB_OFFLINE: 1
services:
  vllm-Research:
    container_name: vllm-Research
    restart: "no"
    image: vllm/vllm-openai:latest
    runtime: nvidia
    ipc: host
    environment:
      TZ: Asia/Shanghai
      HF_HUB_OFFLINE: 1
      CUDA_VISIBLE_DEVICES: 0
    volumes:
      - e:\ai\vllm\models\Research:/models
    command: [
      "--model", "/models/LLM-Research/Mistral-7B-Instruct-v0___3",
      "--served_model_name", "LLM-Research/Mistral-7B-Instruct-v0.3",
      "--gpu_memory_utilization", "0.90",
      "--max_model_len", "672",
      "--tensor-parallel-size", "1"
    ]
    ports:
      - 8002:8000
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [ gpu ]
              count: all
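With the container up, vLLM exposes an OpenAI-compatible API on host port 8002 (mapped from container port 8000). A sketch of the request body for the chat completions endpoint; note the "model" field must match the value passed to --served_model_name, not the mount path:

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# The payload fields follow the OpenAI chat API; max_tokens is an arbitrary
# example value here.
payload = {
    "model": "LLM-Research/Mistral-7B-Instruct-v0.3",  # must equal --served_model_name
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
print(json.dumps(payload))
```

You would POST this JSON to http://localhost:8002/v1/chat/completions, e.g. with curl -H "Content-Type: application/json" -d @payload.json.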