
Training Your Own NL-to-SQL Model

news 2025/5/28 1:58:17

Training a small T5 model on a laptop for natural language to SQL (NL-to-SQL).
Tested and working.

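This article describes how to train a small T5 model on a laptop for the natural-language-to-SQL task. It covers: 1) creating a Python 3.9 environment and installing the required dependencies; 2) downloading the wikisql dataset and the T5-small model through a Hugging Face mirror; 3) implementing a data preprocessing function that maps natural-language questions to SQL query strings; 4) optimizing training, including truncating conditions and processing data in batches to reduce memory use. The experiment shows that the approach is feasible with limited compute, which makes it a reasonable starting point for individual developers and small projects.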

##############################################

Create the environment (Python 3.9 recommended)

##############################################

# list all conda environments
conda env list

# deactivate the current environment, then remove the old one
conda deactivate
conda remove --name py312_test --all

# create a new environment
conda create -n py39_test python=3.9
conda activate py39_test

# requirements.txt: see the end of the article
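
After activating the new environment, it can be worth confirming that the interpreter really is Python 3.9 before installing anything. The following is a minimal sketch; the helper name `is_expected_python` is hypothetical and does not come from the original article.

```python
import sys


def is_expected_python(version_info, expected=(3, 9)):
    """Return True when the major.minor part of version_info equals `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


# Report on the interpreter that is running this script.
print(
    f"Running Python {sys.version_info.major}.{sys.version_info.minor}; "
    f"matches 3.9: {is_expected_python(sys.version_info)}"
)
```

If the check reports a different version, the `py39_test` environment is probably not the active one.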

##############################################

Installation

##############################################
#pip
pip install torch transformers pandas datasets

#curl: check that the mirror is reachable (response headers only)
curl -I https://hf-mirror.com/datasets/Salesforce/wikisql
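
The same reachability check can be done from Python, which helps on machines without `curl`. This is a sketch using only the standard library; the function names `build_head_request` and `mirror_status` are hypothetical, and the URL is the one from the `curl` command above.

```python
import urllib.request

MIRROR_URL = "https://hf-mirror.com/datasets/Salesforce/wikisql"


def build_head_request(url):
    """Build a HEAD request, which asks for headers only (like `curl -I`)."""
    return urllib.request.Request(url, method="HEAD")


def mirror_status(url=MIRROR_URL, timeout=10):
    """Send the HEAD request and return the HTTP status code.

    Needs network access. urlopen raises urllib.error.HTTPError for
    4xx/5xx responses and urllib.error.URLError when the host is unreachable.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return resp.status
```

Calling `print(mirror_status())` should print a success status code when the mirror is reachable; an exception suggests a network or proxy problem to sort out before downloading the dataset.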

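First, try downloading the WikiSQL dataset

import os
# set the mirror endpoint before importing datasets so that it takes effect
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from datasets import load_dataset

# Pin a revision ("refs/convert/parquet" is the Parquet branch, which the author describes as officially maintained and stable)
dataset = load_dataset(
    "Salesforce/wikisql",
    trust_remote_code=True,
    revision="refs/convert/parquet"
)  # the dataset is cached under C:\Users\ASUS\.cache\huggingface\datasets
print(dataset["train"][0])  # inspect the record structure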

Install torch

pip uninstall numpy -y
pip install numpy==1.26.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torch==2.1.0 --extra-index-url https://download.pytorch.org/whl/cpu -i https://pypi.tuna.tsinghua.edu.cn/simple

import numpy as np
print(f"NumPy version: {np.__version__}")  # should print 1.26.0
import torch
print(f"PyTorch version: {torch.__version__}")  # prints 2.1.0+cpu
print(f"CPU-only: {not torch.cuda.is_available()}")  # prints True (when no GPU is available)
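
The version check can also be done without importing the heavy packages, by reading the installed package metadata. This is a sketch under the assumption that the pins are the ones used in the `pip` commands above; `matches_pin` and `check_pins` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip commands above.
PINS = {"numpy": "1.26.0", "torch": "2.1.0"}


def matches_pin(installed, pinned):
    """Compare versions, ignoring a local build suffix such as '+cpu'."""
    return installed.split("+", 1)[0] == pinned


def check_pins(pins=PINS):
    """Return {package: True/False/None}; None means the package is not installed."""
    result = {}
    for name, pinned in pins.items():
        try:
            result[name] = matches_pin(version(name), pinned)
        except PackageNotFoundError:
            result[name] = None
    return result


print(check_pins())
```

Stripping the `+cpu` suffix matters because the CPU wheel reports itself as `2.1.0+cpu`, which would otherwise fail a plain string comparison against `2.1.0`.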

#install sentencepiece
pip install sentencepiece
