Fine-tuning chatglm3-6b with LLaMA-Factory fails with KeyError: 'instruction'

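I had run into this error before: fine-tuning chatglm3-6b with LLaMA-Factory raised KeyError: 'instruction'. That time, the cause was that a small number of records in the dataset had a different format from the rest.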

Later I hit KeyError: 'instruction' again, but this time none of the records had a different format.

The underlying reason is that LLaMA-Factory only accepts datasets in specific formats, for example:

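{"instruction": "Describe the principles of object-oriented programming (OOP).", "input": "OOP principles include encapsulation, inheritance, polymorphism, and abstraction, which promote organized and maintainable code.", "output": "Feedback: You have a good understanding of the principles of object-oriented programming. In your development experience, how have these principles guided the way you write code?"}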
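Since the first occurrence of this error came from a few differently formatted records, it can help to scan the dataset before training. The sketch below assumes the dataset file is a single JSON array of objects and that every record should carry the three keys used in the sample above; find_bad_records and check_file are hypothetical helper names, not part of LLaMA-Factory.

```python
import json
from pathlib import Path

# Keys used by the sample record; adjust if your dataset uses different names.
REQUIRED_KEYS = ("instruction", "input", "output")


def find_bad_records(records, required_keys=REQUIRED_KEYS):
    """Return (index, missing_keys) for every record that lacks a required key."""
    problems = []
    for i, record in enumerate(records):
        missing = [k for k in required_keys if k not in record]
        if missing:
            problems.append((i, missing))
    return problems


def check_file(path):
    """Load a dataset file (assumed to be one JSON array) and report bad records."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_bad_records(records)


if __name__ == "__main__":
    sample = [
        {"instruction": "a", "input": "b", "output": "c"},
        {"question": "a", "answer": "c"},  # a differently formatted record
    ]
    for index, missing in find_bad_records(sample):
        print(f"record {index} is missing {missing}")
```

Running it on a real file is just check_file("your_dataset.json"); an empty list means every record has all three keys.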

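What keys like instruction are called doesn't matter. What matters is registering the dataset in dataset_info.json in LLaMA-Factory's data folder and describing the column mapping there. This is the crucial part: internally LLaMA-Factory works with keys such as prompt and query, so you have to state whether instruction corresponds to prompt (the instruction), query (the question/input), or response (the answer).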

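The entry below goes inside the top-level object of dataset_info.json. file_name is your dataset file, file_sha1 is the unique hash generated for that file, and columns describes the mapping: each internal key on the left (prompt, query, response, history) points to the field name used in your data file on the right. Include "history" only if your data actually has it. JSON does not allow comments, so keep notes like these out of the real file:

"self_cognition": {
  "file_name": "self_cognition.json",
  "file_sha1": "eca3d89fa38b35460d6627cefdc101feef507eb5",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "history": "history"
  }
}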

With the dataset registered this way, the error no longer occurs.
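To make the direction of the columns mapping concrete, here is a hypothetical sketch of what such an entry means: each internal key is filled from the named field of a record. This is an illustration under that assumption, not LLaMA-Factory's actual loading code, and to_internal is a made-up helper.

```python
# Hypothetical illustration of a "columns" mapping; NOT LLaMA-Factory's source code.
# Keys on the left are the internal names, values are the field names in your file.
COLUMNS = {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
}


def to_internal(record, columns=COLUMNS):
    """Build a dict keyed by internal names, reading each value from the mapped field."""
    internal = {}
    for internal_key, file_key in columns.items():
        # A plain dict lookup raises KeyError(file_key) when the record lacks that field,
        # e.g. KeyError: 'instruction' if a record has no "instruction" key.
        internal[internal_key] = record[file_key]
    return internal


if __name__ == "__main__":
    raw = {"instruction": "Describe OOP.", "input": "Encapsulation, inheritance", "output": "Good answer."}
    print(to_internal(raw))
    # {'prompt': 'Describe OOP.', 'query': 'Encapsulation, inheritance', 'response': 'Good answer.'}

    try:
        to_internal({"question": "Describe OOP.", "answer": "Good answer."})
    except KeyError as err:
        print("KeyError:", err)  # KeyError: 'instruction'
```

In this sketch, a field name on the right that doesn't match your file produces a KeyError naming that missing field, which is why the mapping has to describe your file's actual keys.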

Here is the code for generating the unique hash (the file_sha1 value):

import hashlib


def calculate_sha1(file_path):
    sha1 = hashlib.sha1()
    try:
        with open(file_path, 'rb') as file:
            while True:
                data = file.read(8192)  # Read in chunks to handle large files
                if not data:
                    break
                sha1.update(data)
        return sha1.hexdigest()
    except FileNotFoundError:
        return "File not found."


# Usage example
file_path = 'test3.json'  # Replace with your file path
sha1_hash = calculate_sha1(file_path)
print("SHA-1 Hash:", sha1_hash)
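Computing the hash and editing dataset_info.json by hand can be combined into one step. The sketch below is a convenience script under stated assumptions: the data file sits in the same data folder as dataset_info.json (so file_name is just the base name), and sha1_of and register_dataset are hypothetical helpers, not LLaMA-Factory functions. Unlike calculate_sha1 above, sha1_of raises an error for a missing file instead of returning a string, so a wrong path cannot end up written into the file as the hash.

```python
import hashlib
import json
from pathlib import Path


def sha1_of(path, chunk_size=8192):
    """SHA-1 of a file, read in chunks; raises FileNotFoundError if the path is wrong."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps reading until EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


def register_dataset(info_path, name, data_file, columns):
    """Add (or overwrite) one entry in dataset_info.json and return it.

    Assumption: data_file lives in the same data folder as dataset_info.json,
    so only its base name is stored as file_name.
    """
    info_path = Path(info_path)
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {
        "file_name": Path(data_file).name,
        "file_sha1": sha1_of(data_file),
        "columns": columns,
    }
    # ensure_ascii=False keeps any non-ASCII text readable in the file.
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info[name]


# Example (hypothetical paths; adjust to your LLaMA-Factory data folder):
# register_dataset("data/dataset_info.json", "self_cognition",
#                  "data/self_cognition.json",
#                  {"prompt": "instruction", "query": "input", "response": "output"})
```

The script rewrites the whole dataset_info.json (existing entries are kept, but the formatting is normalized), so it is worth keeping a backup of the original file before running it.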

P.S. Previous post: KeyError: 'instruction' when splitting a dataset in Python

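This article was submitted by an internet user. The views expressed are the author's own and do not represent this site's position. This site only provides information storage space, does not own the rights, and bears no related legal responsibility. If reposting, please credit the source: http://www.xdnf.cn/news/1425153.html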
