当前位置：首页 > news >正文

vllm启动Qwen/Qwen3-Coder-30B-A3B-Instruct并支持工具调用

news 2025/8/5 13:51:00

阿里云千问团队在2025年8月1号推出了Qwen3-Coder系列的一个比较小参数版本Qwen/Qwen3-Coder-30B-A3B-Instruct 具有较强的性能并且对显存要求没那么高。那么我们应该如何在本地启动并使用这个模型开发Agent呢？

众所周知，Agent相比于LLM的最大的区别就是Agent可以使用各种各样的工具，所以我们想使用Qwen3-Coder-30B-A3B-Instruct开发Agent的前提就是让它支持工具调用。

硬件参数

显卡：H20(96G VRAM)

模型下载

安装modelscope

pip install modelscope

下载模型

modelscope download --model Qwen/Qwen3-Coder-30B-A3B-Instruct

注：模型文件大小在39GB左右，请提供足够的磁盘空间

VLLM模型服务

安装vllm>=0.10.0

推荐使用uv安装

pip install uv
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

VLLM服务配置

model: Qwen/Qwen3-Coder-30B-A3B-Instruct
served_model_name: qwen3-coder
host: 0.0.0.0
port: 8000
tensor-parallel-size: 1
gpu-memory-utilization: 0.9
api-key: your-api-key
disable-fastapi-docs: true
enable-auto-tool-choice: true
tool-call-parser: qwen3_coder
max-model-len: 32768

将这份配置保存为qwen3-coder.yml

启动VLLM模型服务

vllm serve --config vllm-coder.yml

观察日志输出，如果出现和上图一样的日志就代表模型服务启动成功了

工具调用测试

工具调用测试我准备使用openai-agents[litellm]测试

安装openai-agents-python

uv pip install "openai-agents[litellm]"

测试代码

import asyncio
from agents.extensions.models.litellm_model import LitellmModel
from agents import Agent, function_tool, Runnerqwen3_coder = LitellmModel(base_url="http://localhost:8000/v1",  # vllm 服务地址api_key="your-api-key",  # 注意这里只是方便演示，不推荐将api-key直接写到代码中，应该使用环境变量的方式os.getenv("API_KEY")model="openai/qwen3-coder",
)@function_tool
def get_user_info(user_id: str):"""get user info toolArgs:user_id: strreturn user info in dict"""return {"name": "Jack", "age": 18, "id": user_id}agent = Agent(name="Your Agent",instructions="使用工具完成用户任务",model=qwen3_coder,tools=[get_user_info],
)async def main():result = await Runner.run(agent, input="我的用户ID是1234，我是谁？")print(result.final_output)if __name__ == "__main__":asyncio.run(main())

将代码保存成agent.py文件