Summary of recent MindIE errors
Qwen2.5-3B-Instruct fails to start because the MindIE version is too old
I first deployed Qwen2.5-3B-Instruct with MindIE 1.0 RC2, and MindIE failed at startup with:
AssertionError: weight lm_head.weight does not exist
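A similar error has been reported on the Ascend community forum: https://www.hiascend.com/forum/thread-0243177238338974080-1-1.html
The community analysis is that the tie_word_embeddings parameter in config.json causes it: RC2 is an old release, and its source code has no special handling for the case where this parameter is true. Either set the parameter to false or upgrade MindIE.
I resolved it by upgrading to MindIE 1.0.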
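For anyone who cannot upgrade, the other workaround (setting tie_word_embeddings to false) can be scripted. The following is a minimal sketch, not something I ran against RC2: the function name and the example path are made up for illustration, and it is worth backing up config.json first.

```python
import json
from pathlib import Path


def set_tie_word_embeddings(config_path, value=False):
    """Rewrite tie_word_embeddings in a model's config.json.

    Returns the previous value (None if the key was absent).
    The helper name is hypothetical; only the key name comes from the forum thread.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("tie_word_embeddings")
    config["tie_word_embeddings"] = value
    # Write the file back with the other keys unchanged.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return previous


# Example call (hypothetical path):
# set_tie_word_embeddings("/data/models/Qwen2.5-3B-Instruct/config.json")
```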
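MindIE container fails to start because of the permissions on /usr/local/bin
Symptom: if the container is started with -e ASCEND_VISIBLE_DEVICES=x to select the NPU, startup fails. If --device=/dev/davinci0 is used instead and every required resource is mapped into the container by hand, the container starts normally and so does MindIE.
The error message is: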
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook 1:/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-hook,/usr/local/Ascend/Ascend-Docker-Runtime/ascend-docker-hook err: error running hook: exit status 255, stdout: , stderr: runc start prestart-hook ...verify parameters valid and parse runtime optionssetup container config ...prepare necessary configenter container's mount namespacedo mountingPlease check the write permission!
failed to mount src: /usr/local/bin/npu-smi.failed to mount dev.failed to do file mounting for.failed to mount files.failed to do mounting.failed to setup container., ContainerId: 3df2fc6c6f6f700262281e2a7c01cb06986fec9243038215570873a8ae8d096e: unknown.
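The root cause was that the mode of /usr/local/bin on the host had been accidentally changed to 0775; changing it back to 0755 fixed the problem.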
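A quick way to spot this kind of drift is to read the directory's permission bits and compare them with the expected mode. This is a small sketch under my own naming (ensure_dir_mode is not part of any Ascend tool); actually changing /usr/local/bin requires root.

```python
import os
import stat


def ensure_dir_mode(path, expected=0o755, fix=False):
    """Return the current permission bits of path.

    If fix is True and the bits differ from expected, chmod the path
    back to expected. The helper name is hypothetical.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current != expected and fix:
        os.chmod(path, expected)
    return current


# Example: report the mode without changing anything.
# print(oct(ensure_dir_mode("/usr/local/bin")))
```

Calling it with fix=True has the same effect as running chmod 0755 on the directory.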
qwen2_vl_7b inference fails on MindIE 2.0
With qwen2_vl_7b deployed on MindIE 2.0, requests return this error:
{"error":"Failed to get engine response.","error_type":"Incomplete Generation"}
Fix: add these environment variables:
export MATMUL_ND_NZ_ENABLE=0
export ENABLE_ACLNN_MATMUL_BACKEND=0
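A similar issue is discussed here: https://www.hiascend.com/forum/thread-0201178897583547128-1-1.html
Analysis: Qwen2-series models may have MATMUL format conversion and the ACLNN interface call enabled, so both switches need to be turned off when running Qwen-series models on a single card. The mindie 1.0.0 image and images from mindie 2.0.T6 onward are not affected and can be used as-is.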
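If the service is launched from a Python wrapper rather than a shell, the same two variables can be placed in the child process environment. A minimal sketch: the helper name is mine, and the daemon path in the comment is a placeholder, not a path taken from this setup.

```python
import os


def mindie_env(base=None):
    """Copy an environment and switch off the two matmul options named above."""
    env = dict(os.environ if base is None else base)
    env["MATMUL_ND_NZ_ENABLE"] = "0"
    env["ENABLE_ACLNN_MATMUL_BACKEND"] = "0"
    return env


# Example (placeholder command, adjust to the real daemon location):
# import subprocess
# subprocess.Popen(["/path/to/mindieservice_daemon"], env=mindie_env())
```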
MindIE 1.0 does not support the standard OpenAI API format for multimodal models
The standard OpenAI API passes an image in this format:
"content": [{"type": "text", "text": text},{"type": "image_url","image_url": {"url": f"data:image/jpeg;base64,{base64_str}"}}]
MindIE 1.0, however, accepts this format:
"content": [{"type": "text", "text": text},{"type": "image_url","image_url": f"{base64_str}"}]
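This makes access from third-party components that speak the standard OpenAI API, such as Dify, very awkward.
MindIE 2.0 already supports the standard OpenAI format, so how can MindIE 1.0 be made to support it as well?
After some experimenting, this worked: find the file mies_tokenizer-0.0.1-py3-none-any.whl inside a MindIE 2.0 container, copy it into the MindIE 1.0 container, and force-reinstall it.
After mies_tokenizer is updated, MindIE accepts both of the image formats shown above.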
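To test both formats from a client, the content list can be built with a small helper. This is only a sketch: build_image_content and its openai_standard flag are names I made up, and the image is assumed to be JPEG as in the examples above.

```python
import base64


def build_image_content(text, image_bytes, openai_standard=True):
    """Build the "content" list for a text-plus-image chat message.

    openai_standard=True  -> image_url is {"url": "data:image/jpeg;base64,..."}
    openai_standard=False -> image_url is the bare base64 string (MindIE 1.0 style)
    """
    base64_str = base64.b64encode(image_bytes).decode("ascii")
    if openai_standard:
        image_url = {"url": f"data:image/jpeg;base64,{base64_str}"}
    else:
        image_url = base64_str
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": image_url},
    ]


# Example (hypothetical file name):
# with open("cat.jpg", "rb") as f:
#     content = build_image_content("Describe this picture", f.read())
```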
MindIE 2.0 fails to deploy qwen3 because transformers is too old
Deploying qwen3 on MindIE 2.0 fails at startup.
Log file: /root/mindie/log/debug/mindie_llm_652_20250506200646506.log. Its contents:
2025-05-06 20:06:46.506 654 LLM log default format: [yyyy-mm-dd hh:mm:ss.uuuuuu] [processid] [threadid] [llm] [loglevel] [file:line] [status code] msg
[2025-05-06 20:06:46.506] [652] [281472858840928] [llm] [INFO] [llm_manager_impl.cpp:76] LLMRuntime init success!
[2025-05-06 20:06:51.008] [652] [281472858840928] [llm] [ERROR] [model_deploy_config.cpp:102] Failed to get vocab size from tokenizer wrapper with exception: ValueError: safe_get_tokenizer_from_pretrained failed. Please check the input parameters model_path and kwargs. If the input parameters are valid and the required files exist in model_path, make sure the folder's owner has execute permission. Otherwise, please check the function stack for detailed exception information.
At:
/usr/local/Ascend/atb-models/atb_llm/models/base/model_utils.py(251): wrapper
/usr/local/Ascend/atb-models/atb_llm/models/qwen2/router_qwen2.py(84): get_tokenizer
/usr/local/Ascend/atb-models/atb_llm/models/base/router.py(160): tokenizer
/usr/local/Ascend/atb-models/atb_llm/runner/tokenizer_wrapper.py(20): __init__
/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_tokenizer_wrapper.py(8): __init__
/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py(28): get_tokenizer_wrapper
[2025-05-06 20:06:51.008] [652] [281472858840928] [llm] [ERROR] [llm_manager_impl.cpp:784] Config manager init exception: ValueError: safe_get_tokenizer_from_pretrained failed. Please check the input parameters model_path and kwargs. If the input parameters are valid and the required files exist in model_path, make sure the folder's owner has execute permission. Otherwise, please check the function stack for detailed exception information.
At:
/usr/local/Ascend/atb-models/atb_llm/models/base/model_utils.py(251): wrapper
/usr/local/Ascend/atb-models/atb_llm/models/qwen2/router_qwen2.py(84): get_tokenizer
/usr/local/Ascend/atb-models/atb_llm/models/base/router.py(160): tokenizer
/usr/local/Ascend/atb-models/atb_llm/runner/tokenizer_wrapper.py(20): __init__
/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_tokenizer_wrapper.py(8): __init__
/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py(28): get_tokenizer_wrapper
Fix: upgrade transformers from 4.44 to 4.46.
Note: deploying qwen2_vl_7b on MindIE 1.0 recently ran into a similar problem, and it also took a transformers upgrade to fix it.
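Since the log above never mentions transformers, it can save time to check the installed version before digging further. A rough sketch: the 4.46 threshold is simply the version that worked for me, not a documented minimum, and the helper names are my own.

```python
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '4.46.0' into (4, 46, 0).

    Parsing stops at the first component that does not start with a digit,
    so '4.46.0.dev0' becomes (4, 46, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed, minimum="4.46"):
    """True if the installed version is older than minimum (assumed threshold)."""
    return version_tuple(installed) < version_tuple(minimum)


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


# Example:
# v = installed_transformers_version()
# if v is not None and needs_upgrade(v):
#     print(f"transformers {v} is older than 4.46, consider upgrading")
```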