
[Study Notes] Errors encountered during QLoRA fine-tuning with Xtuner

Problem 1: running QLoRA fine-tuning with Xtuner fails with No module named 'triton.ops'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/output/xtuner/xtuner/tools/train.py", line 392, in <module>
    main()
  File "/output/xtuner/xtuner/tools/train.py", line 381, in main
    runner = Runner.from_cfg(cfg)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 234, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 123, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/output/xtuner/xtuner/model/sft.py", line 97, in __init__
    self.llm = self.build_llm_from_cfg(
  File "/output/xtuner/xtuner/model/sft.py", line 143, in build_llm_from_cfg
    llm = self._build_from_cfg_or_module(llm)
  File "/output/xtuner/xtuner/model/sft.py", line 296, in _build_from_cfg_or_module
    return BUILDER.build(cfg_or_mod)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 123, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3620, in from_pretrained
    hf_quantizer.validate_environment(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_8bit.py", line 77, in validate_environment
    from ..integrations import validate_bnb_backend_availability
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1805, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1819, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
No module named 'triton.ops'

Analyzing the cause from the traceback:

The error occurs while the model is being loaded with BitsAndBytes (8-bit quantization), so the first thing to check is whether the bitsandbytes library is installed correctly.
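A quick way to narrow this down outside of Xtuner is to try the imports directly. The commands below are a sketch of such a check (version numbers are illustrative); on an affected environment, the bitsandbytes import itself may already fail with the same 'triton.ops' error:

# Check that triton and bitsandbytes import cleanly (versions will vary)
python -c "import triton; print(triton.__version__)"
python -c "import bitsandbytes as bnb; print(bnb.__version__)"
python -c "import torch; print(torch.__version__)"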

A version mismatch seemed likely, but the error persisted after reinstalling. Searching around eventually revealed that this is a conflict between PyTorch 2.6+ and the installed bitsandbytes version.

Solution: install the older pytorch==2.5.1. Open xtuner's requirements file and change the torch version there.
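With GNU sed, the pin might look like this. This is only a sketch: it assumes the torch entry lives in requirements/runtime.txt, which can differ between xtuner versions, so check requirements.txt as well:

# Pin torch to 2.5.1 (file path and pattern are assumptions; adjust to
# wherever your xtuner checkout declares its torch dependency)
sed -i -E 's/^torch([<>=!~ ].*)?$/torch==2.5.1/' requirements/runtime.txt
grep -n '^torch' requirements/runtime.txt   # confirm the new pin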

Then run the xtuner installation once more:

# Run inside the xtuner source directory
pip install -e .
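Afterwards, it is worth confirming that the downgrade actually took effect; this sanity check is my own addition, not part of the original log:

# Verify the environment after reinstalling
python -c "import torch; print(torch.__version__)"            # expect 2.5.1
python -c "import bitsandbytes as bnb; print(bnb.__version__)"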

Problem 2: running QLoRA fine-tuning with Xtuner fails with KeyError: 'qwen'

Traceback (most recent call last):
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1071, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 773, in __getitem__
    raise KeyError(key)
KeyError: 'qwen'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/output/xtuner/xtuner/tools/train.py", line 392, in <module>
    main()
  File "/output/xtuner/xtuner/tools/train.py", line 381, in main
    runner = Runner.from_cfg(cfg)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 462, in from_cfg
    runner = cls(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 429, in __init__
    self.model = self.build_model(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/runner/runner.py", line 836, in build_model
    model = MODELS.build(model)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 234, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 123, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/output/xtuner/xtuner/model/sft.py", line 97, in __init__
    self.llm = self.build_llm_from_cfg(
  File "/output/xtuner/xtuner/model/sft.py", line 142, in build_llm_from_cfg
    llm = self._dispatch_lm_model_cfg(llm_cfg, max_position_embeddings)
  File "/output/xtuner/xtuner/model/sft.py", line 281, in _dispatch_lm_model_cfg
    llm_cfg = AutoConfig.from_pretrained(
  File "/usr/local/envs/xtuner/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1073, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `qwen` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

Cause: the installed version of the Transformers library does not support this latest Qwen3 model.

Solution: the Qwen3 model's configuration file states that it requires transformers 4.51.0, but the xtuner framework I'm using only supports transformers 4.48.0, so I cannot install the latest version. Instead, change the model_type field in the checkpoint's config.json to qwen2.
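A minimal sketch of that edit is below; the checkpoint path is a placeholder, and backing up config.json first is my own suggestion. Setting model_type to qwen2 makes AutoConfig resolve to the Qwen2 config class, which transformers 4.48.0 does recognize:

# Patch model_type in the checkpoint's config.json (path is a placeholder)
cd /path/to/qwen-checkpoint
cp config.json config.json.bak        # keep a backup of the original
python - <<'EOF'
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["model_type"] = "qwen2"           # was "qwen"; recognized by transformers 4.48

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
EOF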

With that change, training starts successfully.