How to make DeepSeek-R1-Distill-Qwen-32B support function calling
Description:
DeepSeek-R1 does not support function calling out of the box. So how do you make an offline deployment support function calling?
Symptom:
vLLM: 0.9.1
2x A6000 GPUs
The DeepSeek-R1-Distill-Qwen-32B model is served with vLLM 0.9.1:
vllm serve "/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B" \--host "0.0.0.0" \--port 9400 \--gpu-memory-utilization 0.9 \--max-model-len 8192 \ --served-model-name "qwen32b" \--tensor-parallel-size 2
Calling it with the code below returns no function call: at best the model emits the tool call as plain text inside message.content, which the script then has to regex-parse.
````python
from openai import OpenAI
import json
import re

base_url = "http://127.0.0.1:9400/v1"
client = OpenAI(api_key="EMPTY", base_url=base_url)

def get_weather(location: str, unit: str):
    return f"{location}的气温是27 {unit},晴天。"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "获取给定位置的当前天气",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "城市和州,例如“北京市海淀区”"},
                "unit": {"type": "string", "enum": ["摄氏度", "华氏度"]},
            },
            "required": ["location", "unit"],
        },
    },
}]

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "北京市房山区的天气怎么样?"}],
    tools=tools,
    tool_choice="auto",
)
content = response.choices[0].message.content

# The tool call, if any, arrives as plain text, so extract the
# ```json ... ``` block with a regex.
match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
if match:
    json_str = match.group(1)
    try:
        tool_call = json.loads(json_str)
        func_name = tool_call["name"]
        args = tool_call["arguments"]
        # Registry of all callable functions.
        available_functions = {"get_weather": get_weather}
        if func_name in available_functions:
            # Call the function with the model-supplied arguments.
            response = available_functions[func_name](**args)
            print("Call succeeded, result:", response)
        else:
            print(f"Error: no function named {func_name}")
            print(content)
    except json.JSONDecodeError:
        print("Error: could not parse the JSON content")
    except KeyError as e:
        print(f"Missing required field: {e}")
else:
    print("No JSON content found")
````
Cause:
Inspecting the prompt that actually reaches the model shows no tools at all: the chat template never injects any tool-related content. For example:
```text
prompt: '<|begin▁of▁sentence|><|User|>北京市房山区的天气怎么样?<|Assistant|><think>\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8180, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
```
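The distilled model ships with DeepSeek-R1's chat template, and that template never references a `tools` variable, so whatever the client passes in `tools` is silently dropped when the prompt is rendered. A quick way to confirm this, assuming the model files are available locally and transformers is installed:

```python
from transformers import AutoTokenizer

# Load the tokenizer that vLLM also uses; its bundled Jinja chat template
# is what renders the final prompt.
tok = AutoTokenizer.from_pretrained("/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B")

# The stock template contains no tool handling at all.
print("tools" in (tok.chat_template or ""))  # expected: False
```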
Solution:
Write a chat template that injects the tool definitions into the prompt. The core part looks like this:
```jinja
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
```
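Before relaunching the server, the new template can be sanity-checked offline with transformers. A minimal sketch, assuming the full qwen32_nonthinking.jinja also renders the user/assistant turns (the snippet above shows only its tool-injection core):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B")
with open("/mnt/data/models/qwen32_nonthinking.jinja") as f:
    template = f.read()

# Same schema shape as the tools list in the client script above.
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "获取给定位置的当前天气",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"},
                                  "unit": {"type": "string"}},
                   "required": ["location", "unit"]}}}]

rendered = tok.apply_chat_template(
    [{"role": "user", "content": "北京市房山区的天气怎么样?"}],
    tools=tools,
    chat_template=template,       # override the bundled template
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)  # the <tools> block should now appear in the output
```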
Then relaunch vLLM with the following command:
vllm serve "/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B" \--host "0.0.0.0" \--port 9400 \--gpu-memory-utilization 0.9 \--served-model-name "qwen32b" \--max-model-len 8192 \--enable-auto-tool-choice \--tool-call-parser "hermes" \--chat-template "/mnt/data/models/qwen32_nonthinking.jinja" \--chat-template-content-format "auto" \--tensor-parallel-size 2
--tool-call-parser "hermes"

Note: this flag selects the tool-call parser type. hermes is a parser for the structured tool-call format emitted by Hermes-style models. Use it when the model is expected to return tool calls such as:

```json
{"name": "get_weather", "arguments": {"location": "..."}}
```

inside <tool_call></tool_call> tags, which the parser then surfaces as standard tool_calls fields in the API response.
Meaning of each flag in the launch command:

Parameter | Meaning |
---|---|
Model path | Folder path of the model to load |
--host | IP address the server binds to |
--port | Port the server listens on |
--gpu-memory-utilization | Upper bound on GPU memory utilization |
--served-model-name | Model name exposed to clients |
--max-model-len | Maximum context length (in tokens) |
--enable-auto-tool-choice | Enables automatic tool calling |
--tool-call-parser | Parser type for tool-call output |
--chat-template | Custom prompt template file |
--chat-template-content-format | How message content is rendered into the template |
--tensor-parallel-size | Number of GPUs used for tensor-parallel inference |
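With the server relaunched, the client no longer needs to regex-parse fenced JSON out of message.content. Below is a minimal sketch of the full round trip, executing the returned tool call and feeding its result back for a final answer. It assumes the complete qwen32_nonthinking.jinja (only its tool-injection core is shown above) also renders assistant tool_calls and tool-role messages the way Qwen's stock template does; if it does not, the second request will not include the tool result in the prompt.

```python
import json
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:9400/v1")

def get_weather(location: str, unit: str):
    return f"{location}的气温是27 {unit},晴天。"

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "获取给定位置的当前天气",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"},
                                  "unit": {"type": "string", "enum": ["摄氏度", "华氏度"]}},
                   "required": ["location", "unit"]}}}]

messages = [{"role": "user", "content": "北京市房山区的天气怎么样?"}]
first = client.chat.completions.create(
    model="qwen32b", messages=messages, tools=tools, tool_choice="auto"
)
msg = first.choices[0].message

if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    result = get_weather(**args)
    # Echo the assistant turn, then attach the tool output via tool_call_id.
    messages.append({"role": "assistant",
                     "content": msg.content or "",
                     "tool_calls": [call.model_dump()]})
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="qwen32b", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```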
Inspecting the model's prompt again, the tools now appear:
```text
prompt: '<|im_start|>system\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{"type": "function", "function": {"name": "get_weather", "description": "获取给定位置的当前天气", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "城市和州,例如“北京市海淀区”"}, "unit": {"type": "string", "enum": ["摄氏度", "华氏度"]}}, "required": ["location", "unit"]}}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call><|im_end|>\n<|im_start|>user\n北京市房山区的天气怎么样?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7972, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None, prompt_adapter_request: None.
```
The result is as follows: