
Deploying Qwen2.5-14B and 32B on a Server and Exposing Them as an API

news 2025/7/4 18:27:54

Models:

Qwen/Qwen2.5-14B: ModelScope community

Qwen/Qwen2.5-32B: ModelScope community

Before downloading the models, I wanted to check how stable ModelScope's hosted API is.

from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='<MODELSCOPE_SDK_TOKEN>',  # ModelScope Token
)

# set extra_body for thinking control
extra_body = {
    # enable thinking, set to False to disable
    "enable_thinking": True,
    # use thinking_budget to control num of tokens used for thinking
    # "thinking_budget": 4096
}

response = client.chat.completions.create(
    model='Qwen/Qwen3-32B',  # ModelScope Model-Id
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
    stream=True,
    extra_body=extra_body,
)

done_thinking = False
for chunk in response:
    thinking_chunk = chunk.choices[0].delta.reasoning_content
    answer_chunk = chunk.choices[0].delta.content
    # Truthiness checks skip both '' and None, so "None" is never printed
    if thinking_chunk:
        print(thinking_chunk, end='', flush=True)
    elif answer_chunk:
        if not done_thinking:
            print('\n\n === Final Answer ===\n')
            done_thinking = True
        print(answer_chunk, end='', flush=True)

from openai import OpenAI

client = OpenAI(
    base_url='https://api-inference.modelscope.cn/v1/',
    api_key='<MODELSCOPE_SDK_TOKEN>',  # ModelScope Token
)

# set extra_body for thinking control
extra_body = {
    # enable thinking, set to False to disable
    "enable_thinking": True,
    # use thinking_budget to control num of tokens used for thinking
    # "thinking_budget": 4096
}

response = client.chat.completions.create(
    model='Qwen/Qwen3-14B',  # ModelScope Model-Id
    messages=[{'role': 'user', 'content': 'Which is larger, 9.9 or 9.11?'}],
    stream=True,
    extra_body=extra_body,
)

done_thinking = False
for chunk in response:
    thinking_chunk = chunk.choices[0].delta.reasoning_content
    answer_chunk = chunk.choices[0].delta.content
    # Truthiness checks skip both '' and None, so "None" is never printed
    if thinking_chunk:
        print(thinking_chunk, end='', flush=True)
    elif answer_chunk:
        if not done_thinking:
            print('\n\n === Final Answer ===\n')
            done_thinking = True
        print(answer_chunk, end='', flush=True)

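I expect to run about 3k records through it. The API is usable, with one catch: the Qwen/Qwen3-14B model only supports stream mode, so the stream parameter must be enabled to access it. A non-streaming call fails with the error below.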

🔁 Qwen14B responded as follows: {'error': "Error: Error code: 400 - {'error': {'code': 'invalid_parameter_error', 'message': 'This model only support stream mode, please enable the stream parameter to access the model. ', 'param': None, 'type': 'invalid_request_error'}, 'request_id': 'e44ce9cc-2731-4956-86ff-0bce173ae68d'}"}
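Because the model can only be called in stream mode, code that wants a single complete reply has to join the streamed pieces itself. The following is a minimal sketch of that step, not the author's code: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the `choices[0].delta.reasoning_content` / `choices[0].delta.content` shape used in the scripts above so the helper can be tried offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed deltas into (reasoning_text, answer_text)."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is a provider extension; it may be absent or None
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if delta.content:
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(reasoning=None, content=None):
    """Hypothetical stand-in for a streamed chunk, for offline testing only."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning="compare "),
    fake_chunk(reasoning="decimals"),
    fake_chunk(content="9.9 "),
    fake_chunk(content="is larger"),
]
reasoning_text, answer_text = collect_stream(demo)
print(answer_text)
```

With a real client, the iterable returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `demo`.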

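I then modified the code so that the streamed output is processed correctly. The next step is to run all 3k records and check for errors or missing results, which will show how stable the API is.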

import re

# NOTE: zhipu_client is not defined in this excerpt. It is assumed to be an
# OpenAI-compatible client configured for ModelScope, like `client` above.


def validate_explanation(term1, term2, prompt, score, explanation):
    print("🔎 Calling Qwen14B to validate the explanation...")
    try:
        # set extra_body for thinking control
        extra_body = {
            # enable thinking, set to False to disable
            "enable_thinking": True,
            # use thinking_budget to control num of tokens used for thinking
            # "thinking_budget": 4096
        }
        glm4_response = zhipu_client.chat.completions.create(
            model="Qwen/Qwen3-14B",
            messages=[
                {
                    "role": "system",
                    # The output format specification that follows this
                    # sentence is not shown in the source.
                    "content": (
                        "Your role is to supervise how DeepSeek uses the "
                        "supplementary material in the 'prompt'. Review the "
                        "user input and output the analysis strictly in the "
                        "following format:"
                    ),
                },
                {
                    "role": "user",
                    "content": f"Term 1: {term1}, Term 2: {term2}, Prompt: {prompt}, "
                               f"Score: {score}, Explanation: {explanation}",
                },
            ],
            stream=True,
            extra_body=extra_body,
        )

        full_output = ""
        done_thinking = False
        for chunk in glm4_response:
            thinking_chunk = chunk.choices[0].delta.reasoning_content
            output = chunk.choices[0].delta.content
            # Truthiness checks skip '' and None, so `full_output += None`
            # can no longer raise a TypeError
            if thinking_chunk:
                print(thinking_chunk, end='', flush=True)
            elif output:
                if not done_thinking:
                    print('\n\n === Final Answer ===\n')
                    done_thinking = True
                full_output += output
        print(f"🧠 Qwen14B returned: {full_output}")

        # Strip a Markdown JSON fence if the model wrapped its answer in one
        full_output = full_output.strip()
        if full_output.startswith("```json"):
            full_output = full_output[7:-3].strip()

        match = re.search(
            r'{\s*"a":\s*(true|false),\s*"b":\s*(true|false),\s*"c":\s*(true|false),'
            r'\s*"flag":\s*(true|false),\s*"reply":\s*"(.*?)"\s*}',
            full_output,
            re.DOTALL,
        )
        if match:
            a, b, c, flag, reply = match.groups()
            return {
                "a": a == "true",
                "b": b == "true",
                "c": c == "true",
                "flag": flag == "true",
                "reply": reply.strip(),
            }
        return {"error": "Invalid Qwen14B response format: Unable to extract fields."}
    except Exception as e:
        return {"error": f"Error: {e}"}
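To see whether the 3k-record run has errors or omissions, each record's outcome needs to be tracked. The sketch below is one possible way to do that, not the author's script. `run_batch` is a hypothetical helper. It assumes each record is a dict of keyword arguments and that the validator reports failure with an `"error"` key, as `validate_explanation` does. `flaky_validate` is a stub that only simulates a transient failure.

```python
import time


def run_batch(records, validate, max_retries=3, backoff_seconds=1.0):
    """Validate every record, retrying failures; return (results, failed_ids)."""
    results, failed_ids = {}, []
    for record_id, record in records.items():
        for attempt in range(1, max_retries + 1):
            outcome = validate(**record)
            if "error" not in outcome:
                results[record_id] = outcome
                break
            if attempt < max_retries:
                # Linear backoff before the next attempt
                time.sleep(backoff_seconds * attempt)
        else:
            # The loop never hit `break`: every attempt returned an error
            failed_ids.append(record_id)
    return results, failed_ids


calls = {"n": 0}


def flaky_validate(term1, term2):
    """Stub validator: fails once transiently, and always fails for 'bad'."""
    calls["n"] += 1
    if term1 == "bad":
        return {"error": "always fails"}
    if calls["n"] == 1:
        return {"error": "transient"}
    return {"flag": True, "reply": f"{term1}/{term2}"}


records = {
    1: {"term1": "a", "term2": "b"},
    2: {"term1": "bad", "term2": "x"},
}
results, failed_ids = run_batch(records, flaky_validate, max_retries=2, backoff_seconds=0)
print(f"ok={len(results)} failed={failed_ids}")
```

After a run, `len(results) + len(failed_ids)` should equal the number of input records. Any shortfall means a record was dropped, and `failed_ids` lists the ones worth re-running.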


http://www.xdnf.cn/news/282907.html