How to make DeepSeek-R1-Distill-Qwen-32B support function calling
Description:
DeepSeek-R1 does not support function calling out of the box. So how do you make an offline deployment support function calling?
Symptom:
vLLM: 0.9.1
2x A6000 GPUs
The DeepSeek-R1-Distill-Qwen-32B model is served with vLLM 0.9.1:
vllm serve "/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B" \--host "0.0.0.0" \--port 9400 \--gpu-memory-utilization 0.9 \--max-model-len 8192 \ --served-model-name "qwen32b" \--tensor-parallel-size 2
Calling it with the code below returns no function call: at best the model emits the tool call as plain text inside message.content, which the script then has to regex-parse.
````python
from openai import OpenAI
import json
import re

base_url = "http://127.0.0.1:9400/v1"
client = OpenAI(api_key="EMPTY", base_url=base_url)

def get_weather(location: str, unit: str):
    return f"{location}的气温是27 {unit},晴天。"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "获取给定位置的当前天气",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "城市和州,例如“北京市海淀区”"},
                "unit": {"type": "string", "enum": ["摄氏度", "华氏度"]},
            },
            "required": ["location", "unit"],
        },
    },
}]

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "北京市房山区的天气怎么样?"}],
    tools=tools,
    tool_choice="auto",
)
content = response.choices[0].message.content

# The tool call, if any, arrives as plain text, so extract the
# ```json ... ``` block with a regex.
match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
if match:
    json_str = match.group(1)
    try:
        tool_call = json.loads(json_str)
        func_name = tool_call["name"]
        args = tool_call["arguments"]
        # Registry of all callable functions.
        available_functions = {"get_weather": get_weather}
        if func_name in available_functions:
            # Call the function with the model-supplied arguments.
            response = available_functions[func_name](**args)
            print("Call succeeded, result:", response)
        else:
            print(f"Error: no function named {func_name}")
            print(content)
    except json.JSONDecodeError:
        print("Error: could not parse the JSON content")
    except KeyError as e:
        print(f"Missing required field: {e}")
else:
    print("No JSON content found")
````
Cause:
Inspecting the prompt that actually reaches the model shows no tools at all: the chat template never injects any tool-related content. For example:
```text
prompt: '<|begin▁of▁sentence|><|User|>北京市房山区的天气怎么样?<|Assistant|><think>\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=8180, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
```
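The distilled model ships with DeepSeek-R1's chat template, and that template never references a `tools` variable, so whatever the client passes in `tools` is silently dropped when the prompt is rendered. A quick way to confirm this, assuming the model files are available locally and transformers is installed:

```python
from transformers import AutoTokenizer

# Load the tokenizer that vLLM also uses; its bundled Jinja chat template
# is what renders the final prompt.
tok = AutoTokenizer.from_pretrained("/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B")

# The stock template contains no tool handling at all.
print("tools" in (tok.chat_template or ""))  # expected: False
```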
Solution:
Write a chat template that injects the tool definitions into the prompt. The core part looks like this:
```jinja
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
```
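Before relaunching the server, the new template can be sanity-checked offline with transformers. A minimal sketch, assuming the full qwen32_nonthinking.jinja also renders the user/assistant turns (the snippet above shows only its tool-injection core):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B")
with open("/mnt/data/models/qwen32_nonthinking.jinja") as f:
    template = f.read()

# Same schema shape as the tools list in the client script above.
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "获取给定位置的当前天气",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"},
                                  "unit": {"type": "string"}},
                   "required": ["location", "unit"]}}}]

rendered = tok.apply_chat_template(
    [{"role": "user", "content": "北京市房山区的天气怎么样?"}],
    tools=tools,
    chat_template=template,       # override the bundled template
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)  # the <tools> block should now appear in the output
```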
Then relaunch vLLM with the following command:
vllm serve "/mnt/data/models/DeepSeek-R1-Distill-Qwen-32B" \--host "0.0.0.0" \--port 9400 \--gpu-memory-utilization 0.9 \--served-model-name "qwen32b" \--max-model-len 8192 \--enable-auto-tool-choice \--tool-call-parser "hermes" \--chat-template "/mnt/data/models/qwen32_nonthinking.jinja" \--chat-template-content-format "auto" \--tensor-parallel-size 2
--tool-call-parser "hermes"

Note: this flag selects the tool-call parser type. hermes is a parser for the structured tool-call format emitted by Hermes-style models. Use it when the model is expected to return tool calls such as:

```json
{"name": "get_weather", "arguments": {"location": "..."}}
```

inside <tool_call></tool_call> tags, which the parser then surfaces as standard tool_calls fields in the API response.
Meaning of each flag in the launch command:

Parameter | Meaning |
---|---|
Model path | Folder path of the model to load |
--host | IP address the server binds to |
--port | Port the server listens on |
--gpu-memory-utilization | Upper bound on GPU memory utilization |
--served-model-name | Model name exposed to clients |
--max-model-len | Maximum context length (in tokens) |
--enable-auto-tool-choice | Enables automatic tool calling |
--tool-call-parser | Parser type for tool-call output |
--chat-template | Custom prompt template file |
--chat-template-content-format | How message content is rendered into the template |
--tensor-parallel-size | Number of GPUs used for tensor-parallel inference |
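With the server relaunched, the client no longer needs to regex-parse fenced JSON out of message.content. Below is a minimal sketch of the full round trip, executing the returned tool call and feeding its result back for a final answer. It assumes the complete qwen32_nonthinking.jinja (only its tool-injection core is shown above) also renders assistant tool_calls and tool-role messages the way Qwen's stock template does; if it does not, the second request will not include the tool result in the prompt.

```python
import json
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:9400/v1")

def get_weather(location: str, unit: str):
    return f"{location}的气温是27 {unit},晴天。"

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "获取给定位置的当前天气",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"},
                                  "unit": {"type": "string", "enum": ["摄氏度", "华氏度"]}},
                   "required": ["location", "unit"]}}}]

messages = [{"role": "user", "content": "北京市房山区的天气怎么样?"}]
first = client.chat.completions.create(
    model="qwen32b", messages=messages, tools=tools, tool_choice="auto"
)
msg = first.choices[0].message

if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
    result = get_weather(**args)
    # Echo the assistant turn, then attach the tool output via tool_call_id.
    messages.append({"role": "assistant",
                     "content": msg.content or "",
                     "tool_calls": [call.model_dump()]})
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="qwen32b", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```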
Inspecting the model's prompt again, the tools now appear:
```text
prompt: '<|im_start|>system\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{"type": "function", "function": {"name": "get_weather", "description": "获取给定位置的当前天气", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "城市和州,例如“北京市海淀区”"}, "unit": {"type": "string", "enum": ["摄氏度", "华氏度"]}}, "required": ["location", "unit"]}}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call><|im_end|>\n<|im_start|>user\n北京市房山区的天气怎么样?<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.6, top_p=0.95, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=7972, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None, prompt_adapter_request: None.
```
The result is as follows: