
Training LoRA Models on Amazon Linux

news 2025/8/11 13:13:09

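1. Basic approach

Amazon Linux (especially Amazon Linux 2) differs from the common Ubuntu family mainly in these ways:

The default Python is old (Amazon Linux 2 typically ships 2.7/3.7), so you need to install a newer version yourself
Some build tools (gcc, cmake, etc.) must be installed separately, and the system glibc is older than on current Ubuntu releases
Some Python packages (e.g. PyTorch, xformers) need prebuilt wheels built for your CUDA version; compiling them from source is extremely slow

Goals:

Python ≥ 3.10
A PyTorch build that matches your CUDA version
A LoRA training framework (Kohya_ss or sd-scripts recommended)
Dataset preparation and running the training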

2. Environment setup

2.1 Update the system

```bash
sudo yum update -y
```

2.2 Install required tools

```bash
sudo yum groupinstall "Development Tools" -y
sudo yum install git wget unzip bzip2 make cmake -y
```
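To confirm these tools actually landed on PATH, a small Python check can help. The list below is just the packages named above plus `nvcc`, which only appears after the CUDA step in section 4 (and may live in /usr/local/cuda/bin rather than on PATH). This is a sketch; `missing_tools` is a hypothetical helper, not part of any package.

```python
import shutil

# Tools from step 2.2, plus nvcc from the CUDA step (illustrative list)
REQUIRED_TOOLS = ["gcc", "make", "cmake", "git", "wget", "unzip", "bzip2", "nvcc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which() cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```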

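3. Install Python 3.10+

The Python that ships with Amazon Linux 2 is too old, so install a newer version. Which Python topics amazon-linux-extras offers (and the resulting package names) depends on the release, so check with `amazon-linux-extras list` first; if no python3.10 topic is available, build Python 3.10 from source instead.

```bash
sudo amazon-linux-extras enable python3.10
sudo yum install python3.10 python3.10-devel -y
python3.10 -m ensurepip
python3.10 -m pip install --upgrade pip
```

Verify:

```bash
python3.10 --version
```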
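The same requirement can be enforced inside a setup script so it fails early when run with the old system Python. A minimal sketch; `python_ok` is a hypothetical helper name, and the 3.10 floor comes from the goals in section 1.

```python
import sys

MIN_VERSION = (3, 10)  # goal from section 1


def python_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if python_ok() else "too old, need >= %d.%d" % MIN_VERSION
    print(sys.version.split()[0], status)
```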

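4. Install CUDA / cuDNN

If you are on an AWS GPU instance (e.g. p3/p4/g4dn), the simplest route is an AWS AMI that already includes the NVIDIA driver, or AWS's official driver installation guide. To install manually:

```bash
# Install the driver and CUDA Toolkit
# (these packages come from NVIDIA's CUDA yum repository, which must be added first)
sudo yum install -y cuda
sudo yum install -y libcudnn8 libcudnn8-devel
```

Reboot after installation, then check:

```bash
nvidia-smi
nvcc --version
```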
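The GPU's memory size matters later for the settings in 8.3. As a sketch, assuming `nvidia-smi`'s CSV query mode (`--query-gpu=name,memory.total --format=csv,noheader,nounits`, which reports memory in MiB), its output can be parsed like this; `parse_gpus` and `query_gpus` are hypothetical helpers.

```python
import subprocess


def parse_gpus(csv_text):
    """Parse 'name, memory.total' lines (MiB) into (name, GiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit so only the last comma separates the memory column
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem_mib) / 1024))
    return gpus


def query_gpus():
    """Run nvidia-smi (must be on PATH) and return the parsed GPU list."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpus(out)
```

On the instance, `print(query_gpus())` lists each GPU with its memory in GiB.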

5. Install PyTorch and xformers

Choose the wheels that match your CUDA version (CUDA 11.8 in this example):

```bash
python3.10 -m pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
python3.10 -m pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118
```
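The `cuXYZ` part of the index URL is the CUDA major and minor version with the dot removed (cu118 for 11.8). A sketch that derives it from `nvcc --version` output, assuming the usual `release X.Y,` line; `cuda_from_nvcc` and `torch_index_url` are hypothetical helpers. It only builds the URL: whether PyTorch publishes wheels for a given torch/CUDA pair still has to be checked on pytorch.org.

```python
import re


def cuda_from_nvcc(nvcc_output):
    """Extract 'X.Y' from `nvcc --version` text (the 'release X.Y,' part), or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def torch_index_url(cuda_version):
    """Build a wheel index URL such as .../whl/cu118 from '11.8' (patch level ignored)."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", cuda_version.strip())
    if match is None:
        raise ValueError(f"unrecognized CUDA version: {cuda_version!r}")
    major, minor = match.groups()
    return f"https://download.pytorch.org/whl/cu{major}{minor}"
```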

Check:

```bash
python3.10 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
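A slightly fuller check that also reports the GPU name and whether xformers imports, written so it does not crash when a package is missing. `describe_env` is a hypothetical helper; run it with the same interpreter (or venv) you will train with.

```python
import importlib


def describe_env():
    """Report torch/xformers versions and CUDA visibility; missing packages give None."""
    info = {"torch": None, "cuda": False, "device": None, "xformers": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.cuda.is_available()
    if info["cuda"]:
        info["device"] = torch.cuda.get_device_name(0)
    try:
        info["xformers"] = importlib.import_module("xformers").__version__
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```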

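6. Install a LoRA training tool

6.1 kohya-ss sd-scripts (recommended, most complete)

```bash
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip wheel
# A fresh venv does not see packages installed into the system Python,
# so install PyTorch/xformers (step 5) again inside it
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
# accelerate needs a one-time configuration before `accelerate launch`
accelerate config
```

6.2 Alternatively, use WebUI with the additional-networks extension

This extension lets WebUI load LoRA networks when generating images, which is useful for testing a trained LoRA; the training itself is still done with sd-scripts.

```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui/extensions
git clone https://github.com/kohya-ss/sd-webui-additional-networks.git
```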

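7. Dataset preparation (the key step)

7.1 Dataset size

Minimum: 10~20 high-quality images (LoRA can work with a very small dataset)
Recommended: 30~50 images; for a character or a specific style, more is better, as long as no irrelevant images slip in

7.2 Image requirements

Resolution as close as possible to the training resolution (e.g. 512×512 or 768×768)
PNG and JPG both work, but avoid heavily compressed JPGs
The subject should be clear and clean, with little clutter or background distraction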
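Resolution is the easiest of these requirements to check automatically. A sketch that flags images whose shorter side is below the training resolution; treating "shorter side < target" as too small is an assumption of this example, and `audit_sizes` / `read_sizes` are hypothetical helpers (Pillow is only needed for `read_sizes`).

```python
import os

IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def audit_sizes(sizes, target=512):
    """Given {filename: (width, height)}, list files whose shorter side is below target."""
    return sorted(name for name, (w, h) in sizes.items() if min(w, h) < target)


def read_sizes(img_dir):
    """Collect image sizes with Pillow (imported here so audit_sizes works without it)."""
    from PIL import Image

    sizes = {}
    for name in os.listdir(img_dir):
        if name.lower().endswith(IMAGE_EXTS):
            with Image.open(os.path.join(img_dir, name)) as img:
                sizes[name] = img.size
    return sizes
```

Usage: `audit_sizes(read_sizes("dataset"), target=512)` returns the names of images that are probably too small.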

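7.3 File naming (optional)

Consistent naming makes the files easier to manage:

```text
dataset/
  subject_0001.png
  subject_0002.png
  subject_0003.png
```

When sd-scripts is pointed at `--train_data_dir` directly (as in section 8, without a dataset config file), it looks for images in subfolders named `<repeats>_<name>`, e.g. `dataset/10_subject/subject_0001.png`, and skips folders without the repeat-count prefix. In that case, run the tagging tools below on that subfolder.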
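Renaming can be scripted. sd-scripts' folder-based dataset layout uses subfolders named `<repeats>_<name>`, so this sketch copies images into such a folder with sequential names, bringing along a same-stem .txt caption if one exists. `organize` is a hypothetical helper, and its default of 10 repeats is only an illustrative value.

```python
import os
import shutil

IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def organize(src_dir, dst_root, name, repeats=10):
    """Copy images to dst_root/<repeats>_<name>/<name>_0001.ext, ... and return new paths.

    A caption file with the same stem (.txt) is copied alongside if present.
    """
    dst_dir = os.path.join(dst_root, f"{repeats}_{name}")
    os.makedirs(dst_dir, exist_ok=True)
    images = sorted(f for f in os.listdir(src_dir) if f.lower().endswith(IMAGE_EXTS))
    created = []
    for index, file in enumerate(images, start=1):
        stem, ext = os.path.splitext(file)
        new_stem = f"{name}_{index:04d}"
        new_path = os.path.join(dst_dir, new_stem + ext.lower())
        shutil.copy2(os.path.join(src_dir, file), new_path)
        caption = os.path.join(src_dir, stem + ".txt")
        if os.path.exists(caption):
            shutil.copy2(caption, os.path.join(dst_dir, new_stem + ".txt"))
        created.append(new_path)
    return created
```

For example, `organize("raw_images", "dataset", "subject")` produces `dataset/10_subject/subject_0001.png` and so on.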

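7.4 Generate training tags

LoRA training relies on **prompts (tags)** to guide what it learns, so ideally every image has its own .txt tag file.
If you don't want to write them by hand, use an automatic tagger (WD14 Tagger recommended):

Method 1: sd-scripts' bundled tagging script

```bash
# Activate the sd-scripts virtual environment
source venv/bin/activate
# Install the tagger's dependencies. The script does not use transformers; depending on
# the sd-scripts version it loads the model with TensorFlow or ONNX Runtime, so check the
# imports at the top of the script and install those as well
pip install pillow
# Run WD14 auto-tagging (writes dataset/imgxxx.txt next to each image)
python finetune/tag_images_by_wd14_tagger.py \
  --batch_size 4 \
  --repo_id SmilingWolf/wd-v1-4-vit-tagger-v2 \
  dataset
```

The generated .txt looks like:

```text
1girl, solo, long hair, smile, blue eyes, dress
```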

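Method 2: Auto-tagging in WebUI

If you also have Stable Diffusion WebUI installed, you can use the WD14 Tagger extension instead of running the script by hand.
Install it:

```bash
cd stable-diffusion-webui/extensions
git clone https://github.com/toriato/stable-diffusion-webui-wd14-tagger.git
```

Then tag the images in bulk with the extension's batch mode in WebUI.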

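Method 3: A simplified tagging script of your own

The wd-v1-4-vit-tagger-v2 model on Hugging Face is published as an ONNX model plus a tag list (selected_tags.csv) rather than as a transformers checkpoint, so `AutoModelForImageClassification` cannot load it; a few lines with onnxruntime do the job instead:

```python
# pip install onnxruntime huggingface_hub pillow numpy
import csv
import os

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from PIL import Image

model_id = "SmilingWolf/wd-v1-4-vit-tagger-v2"
img_dir = "dataset"
threshold = 0.35  # minimum confidence for a tag to be kept (a commonly used value)

# The repo ships model.onnx plus selected_tags.csv (columns include name and category)
session = ort.InferenceSession(hf_hub_download(model_id, "model.onnx"), providers=["CPUExecutionProvider"])
with open(hf_hub_download(model_id, "selected_tags.csv"), newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
inp = session.get_inputs()[0]
size = inp.shape[1]  # NHWC input, e.g. (batch, 448, 448, 3)

for file in sorted(os.listdir(img_dir)):
    if not file.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    img = Image.open(os.path.join(img_dir, file)).convert("RGB")
    # Pad to a white square, resize, and convert RGB -> BGR float32 in the 0-255 range
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (255, 255, 255))
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    arr = np.asarray(canvas.resize((size, size), Image.BICUBIC), dtype=np.float32)
    batch = np.ascontiguousarray(arr[:, :, ::-1][None])
    probs = session.run(None, {inp.name: batch})[0][0]  # one probability per tag
    # Skip rating tags (category "9") and turn underscores into spaces
    tags = [r["name"].replace("_", " ") for r, p in zip(rows, probs)
            if r["category"] != "9" and p >= threshold]
    with open(os.path.join(img_dir, os.path.splitext(file)[0] + ".txt"), "w", encoding="utf-8") as f:
        f.write(", ".join(tags))
```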

Tip: add a dedicated token, such as mycharname, to every image's tags; the trained LoRA can then be triggered with that word.
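Before training, it is worth checking that every image has a caption and that the trigger word is present in each one. A sketch with two hypothetical helpers: `missing_captions` lists images without a same-stem .txt, and `add_trigger` inserts (or moves) the trigger word to the front of each caption. Putting it first is an assumption of this example; it pairs well with sd-scripts' `--keep_tokens=1` if you also enable `--shuffle_caption`.

```python
import os

IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def missing_captions(img_dir, caption_ext=".txt"):
    """Image files in img_dir that have no caption file with the same stem."""
    names = set(os.listdir(img_dir))
    return sorted(
        f for f in names
        if f.lower().endswith(IMAGE_EXTS) and os.path.splitext(f)[0] + caption_ext not in names
    )


def add_trigger(img_dir, trigger, caption_ext=".txt"):
    """Put `trigger` first in every caption; returns how many files were changed."""
    changed = 0
    for name in sorted(os.listdir(img_dir)):
        if not name.endswith(caption_ext):
            continue
        path = os.path.join(img_dir, name)
        with open(path, encoding="utf-8") as f:
            tags = [t.strip() for t in f.read().split(",") if t.strip()]
        if tags and tags[0] == trigger:
            continue  # already in place, leave the file untouched
        tags = [trigger] + [t for t in tags if t != trigger]
        with open(path, "w", encoding="utf-8") as f:
            f.write(", ".join(tags))
        changed += 1
    return changed
```

Running `add_trigger` a second time changes nothing, so it is safe to re-run after re-tagging some images.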

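8. Training the LoRA (Python 3.12 + AL2023)

If you run this on AL2023 with Python 3.12, note that torch 2.1.2 from step 5 has no Python 3.12 wheels (3.12 support arrived in PyTorch 2.2), so choose a newer PyTorch and a matching xformers for your CUDA version.

8.1 Basic command

Assuming you train with sd-scripts, with the dataset in dataset/ and stable-diffusion-v1-5 as the base model:

```bash
# --caption_extension=".txt" makes sd-scripts read the WD14 .txt captions (its default is .caption)
accelerate launch train_network.py \
  --pretrained_model_name_or_path="model/stable-diffusion-v1-5" \
  --train_data_dir="dataset" \
  --caption_extension=".txt" \
  --resolution=512,512 \
  --output_dir="lora_out" \
  --logging_dir="logs" \
  --network_module=networks.lora \
  --text_encoder_lr=5e-5 \
  --unet_lr=1e-4 \
  --network_dim=128 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --train_batch_size=2 \
  --max_train_steps=10000 \
  --save_every_n_epochs=1 \
  --mixed_precision="fp16" \
  --cache_latents \
  --optimizer_type="AdamW8bit"
```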
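To see what `--max_train_steps=10000` means in epochs (and therefore how many files `--save_every_n_epochs=1` will write), a rough calculation helps. This sketch assumes steps per epoch ≈ ceil(images × repeats / batch size), ignoring aspect-ratio bucketing and gradient accumulation; both functions are hypothetical helpers.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size):
    """Approximate optimizer steps per epoch: each image is seen `repeats` times."""
    return math.ceil(num_images * repeats / batch_size)


def epochs_needed(max_train_steps, per_epoch):
    """Epochs required to reach max_train_steps (the last one may be partial)."""
    return math.ceil(max_train_steps / per_epoch)
```

For example, 30 images with 10 repeats at batch size 2 give 150 steps per epoch, so 10000 steps is about 67 epochs and, with `--save_every_n_epochs=1`, roughly 67 saved files; lowering `--max_train_steps` or raising `--save_every_n_epochs` keeps that manageable.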

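8.2 Key parameters

| Parameter | Purpose | Suggested value |
| --- | --- | --- |
| `--network_dim` | LoRA dimension (rank); higher captures finer detail but makes the file larger | 64~128 |
| `--train_batch_size` | Batch size (lower it if you run out of VRAM) | 1~2 |
| `--max_train_steps` | Total training steps | 2k~10k |
| `--text_encoder_lr` | Text encoder learning rate | 5e-5 |
| `--unet_lr` | UNet learning rate | 1e-4 |
| `--mixed_precision` | Mixed precision | `fp16` (saves VRAM) |
| `--optimizer_type` | Optimizer | `AdamW8bit` (saves VRAM) |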
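With this many flags, keeping them in a Python dict and generating the command can be easier to maintain than a long shell line. A sketch: the dict mirrors the 8.1 command, `build_command` is a hypothetical helper, and the convention that `True` means a bare flag (like `--cache_latents`) while `False`/`None` omits it is this example's own.

```python
import shlex

# Values from the 8.1 command; edit here instead of in a long shell line
PARAMS = {
    "pretrained_model_name_or_path": "model/stable-diffusion-v1-5",
    "train_data_dir": "dataset",
    "caption_extension": ".txt",
    "output_dir": "lora_out",
    "network_module": "networks.lora",
    "network_dim": 128,
    "unet_lr": 1e-4,
    "text_encoder_lr": 5e-5,
    "train_batch_size": 2,
    "max_train_steps": 10000,
    "mixed_precision": "fp16",
    "optimizer_type": "AdamW8bit",
    "cache_latents": True,  # True -> bare flag
}


def build_command(params, script="train_network.py"):
    """Turn a parameter dict into an `accelerate launch` argument list."""
    args = ["accelerate", "launch", script]
    for key, value in params.items():
        if value is True:
            args.append(f"--{key}")
        elif value is not False and value is not None:
            args.append(f"--{key}={value}")
    return args


if __name__ == "__main__":
    print(shlex.join(build_command(PARAMS)))
```

Passing the list straight to `subprocess.run(build_command(PARAMS), check=True)` avoids shell quoting issues altogether.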

8.3 VRAM-saving tips

With ≤ 8 GB of VRAM:
- --network_dim=64
- --train_batch_size=1
- --gradient_checkpointing
- enable --xformers

With ≥ 16 GB of VRAM:
- --network_dim=128 or higher is fine
- raise the batch size to 2~4
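The two tiers above can be encoded as a small lookup, for instance fed with the VRAM reported by `nvidia-smi`. A sketch: `vram_flags` is a hypothetical helper, the text does not cover 8 to 16 GB so the middle tier here is an assumption, and for ≥ 16 GB it picks the lower end (2) of the suggested batch range.

```python
def vram_flags(vram_gb):
    """Map GPU memory in GB to the sd-scripts flags suggested in 8.3."""
    if vram_gb <= 8:
        return ["--network_dim=64", "--train_batch_size=1",
                "--gradient_checkpointing", "--xformers"]
    if vram_gb >= 16:
        return ["--network_dim=128", "--train_batch_size=2"]
    # 8-16 GB is not covered by the tips above; this middle tier is an assumption
    return ["--network_dim=128", "--train_batch_size=1", "--xformers"]
```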