Loading a LoRA adapter with vLLM
Download the Hugging Face model
Install the package
pip install huggingface_hub -i https://pypi.tuna.tsinghua.edu.cn/simple
Download
from huggingface_hub import snapshot_download
sql_lora_path = snapshot_download(repo_id="Djs07/qwen2.5-1.5b-lora")
The files are stored under ~/.cache/huggingface/hub/.
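To find the adapter inside the cache, it helps to know how the hub cache names its folders: the repo id is prefixed with "models--", its "/" becomes "--", and each downloaded revision sits under "snapshots/<commit hash>". The sketch below rebuilds that path with the standard library only. The helper name cached_snapshot_dir is hypothetical and written for illustration. In practice the value returned by snapshot_download is already this path, so printing it is usually enough.

```python
from pathlib import Path


def cached_snapshot_dir(repo_id: str, revision: str, cache_root: Path) -> Path:
    """Build the expected snapshot directory for a model repo in the hub cache.

    Hypothetical helper for illustration; it only assembles a path and does
    not check that the directory exists.
    """
    # "Djs07/qwen2.5-1.5b-lora" -> "models--Djs07--qwen2.5-1.5b-lora"
    repo_folder = "models--" + repo_id.replace("/", "--")
    return cache_root / repo_folder / "snapshots" / revision


# Default cache location named in this note.
cache_root = Path.home() / ".cache" / "huggingface" / "hub"
print(
    cached_snapshot_dir(
        "Djs07/qwen2.5-1.5b-lora",
        "8d7d20b1cbb95e7de29abe404e900c106fa8c8cb",
        cache_root,
    )
)
```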
Start the server
Copy the LoRA model directory into the current directory first, then run:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --enable-lora --lora-modules Qwen-Lora=models--Djs07--qwen2.5-1.5b-lora/snapshots/8d7d20b1cbb95e7de29abe404e900c106fa8c8cb/
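The --lora-modules value is a name=path pair: the name on the left is what clients later pass as the model, and the path on the right points at the adapter snapshot directory. As a rough sketch, the same command can be assembled in Python so the pair is built in one place. The function build_serve_command is a hypothetical helper, and only the flags already used in this note appear in it.

```python
import shlex
from typing import List


def build_serve_command(base_model: str, lora_name: str, lora_path: str) -> List[str]:
    """Assemble the vllm serve argument list used in this note (hypothetical helper)."""
    return [
        "vllm", "serve", base_model,
        "--enable-lora",
        # name=path: the name is what requests use in their "model" field.
        "--lora-modules", f"{lora_name}={lora_path}",
    ]


cmd = build_serve_command(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "Qwen-Lora",
    "models--Djs07--qwen2.5-1.5b-lora/snapshots/8d7d20b1cbb95e7de29abe404e900c106fa8c8cb/",
)
# Print a shell-ready command line; pass cmd to subprocess.run to launch it instead.
print(shlex.join(cmd))
```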
Test
Set the model field to the name configured above (Qwen-Lora):
curl http://172.17.0.3:10000/v1/completions -H "Content-Type: application/json" -d '{ "model": "Qwen-Lora", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0 }'
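The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl call: the host, port, and model name are copied from it and should be adjusted to your own deployment, and the function names build_completion_request and send_completion are hypothetical.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 7, temperature: float = 0):
    """Build a POST request for the /v1/completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,  # the LoRA name given to --lora-modules
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Address and model name taken from the curl example; assumed to match your server.
request = build_completion_request("http://172.17.0.3:10000", "Qwen-Lora", "San Francisco is a")
```

With the server running, send_completion(request) returns the parsed response. Since the endpoint follows the OpenAI completions format, the generated text should be under choices[0]["text"].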