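Python Day 43
Task:
Find an image dataset on Kaggle, train a CNN on it, and visualize the result with Grad-CAM.
Advanced: split the project into multiple files.
Project structure overview: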
grad_cam_project/
├── data/
│   ├── train/
│   │   ├── cats/
│   │   └── dogs/
│   └── validation/
│       ├── cats/
│       └── dogs/
├── main.py           # Entry point: dispatches training and visualization
├── data_loader.py    # Data loading and preprocessing
├── model_builder.py  # Model construction
├── trainer.py        # Model training
├── grad_cam.py       # Core Grad-CAM logic
├── utils.py          # Helper functions (image handling, plotting)
└── requirements.txt  # Project dependencies
Steps:
- Prepare the environment and dataset
- Create the file structure
- Write requirements.txt
- Write data_loader.py
- Write model_builder.py
- Write trainer.py
- Write utils.py
- Write grad_cam.py
- Write main.py
- Run and test
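1. Prepare the environment and dataset
Environment:
Make sure Python is installed along with the following libraries: tensorflow, numpy, matplotlib, and opencv-python (imported as cv2).
Dataset:
Download the "Dogs vs. Cats" dataset from Kaggle.
Link: https://www.kaggle.com/c/dogs-vs-cats/data
After downloading, organize the data directory as follows: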
grad_cam_project/
├── data/
│   ├── train/
│   │   ├── cats/
│   │   │   ├── cat.0.jpg
│   │   │   ├── cat.1.jpg
│   │   │   └── ...
│   │   └── dogs/
│   │       ├── dog.0.jpg
│   │       ├── dog.1.jpg
│   │       └── ...
│   └── validation/
│       ├── cats/
│       │   ├── cat.2000.jpg
│       │   └── ...
│       └── dogs/
│           ├── dog.2000.jpg
│           └── ...
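You can manually pick a random subset of the images in train.zip as the validation set and sort them into cats and dogs folders. For example, take 2,000 images from each class for validation and keep the rest for training.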
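If you would rather not sort the files by hand, the split can be scripted. The following is a minimal sketch, assuming train.zip has been extracted into a single flat folder whose files are named cat.N.jpg and dog.N.jpg. The function name split_dataset, its parameters, and the raw/train path in the usage note are illustrative and not part of the original project. It copies files instead of moving them, so the extracted folder stays intact.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, dst_dir, val_per_class=2000, seed=42):
    """Copy cat.*.jpg / dog.*.jpg from src_dir into train/ and validation/ folders.

    Returns a dict such as {"cats": (n_train, n_val), "dogs": (n_train, n_val)}.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for prefix, class_dir in (("cat", "cats"), ("dog", "dogs")):
        files = sorted(src.glob(f"{prefix}.*.jpg"))
        rng.shuffle(files)
        n_val = min(val_per_class, len(files))
        subsets = {"validation": files[:n_val], "train": files[n_val:]}
        for subset, subset_files in subsets.items():
            out = dst / subset / class_dir
            out.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, out / f.name)
        counts[class_dir] = (len(files) - n_val, n_val)
    return counts
```

For example, split_dataset("raw/train", "data") would fill data/train and data/validation, where raw/train is a hypothetical location for the extracted archive.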
2. Create the file structure
In your working directory, create a grad_cam_project folder, then create the files and subfolders shown above inside it.
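3. Write requirements.txt
tensorflow==2.10.0  # Example only; pin the TensorFlow 2.x version you actually use
numpy
matplotlib
opencv-python
scikit-learn  # For splitting the dataset; optional if you already split it by hand
Install the dependencies:
pip install -r requirements.txt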
4. Write data_loader.py
This file loads and preprocesses the image data.
# grad_cam_project/data_loader.py
import os
import sys

import tensorflow as tf


def load_data(data_dir, img_size, batch_size):
    """Load and preprocess image data with ImageDataGenerator.

    Args:
        data_dir (str): Dataset root directory (e.g., 'data').
        img_size (tuple): Image size as (height, width).
        batch_size (int): Batch size.

    Returns:
        tuple: (train_generator, validation_generator, num_classes, class_names).
    """
    image_height, image_width = img_size

    # Paths
    train_dir = os.path.join(data_dir, 'train')
    validation_dir = os.path.join(data_dir, 'validation')
    if not os.path.exists(train_dir) or not os.path.exists(validation_dir):
        print(f"Error: 'train' or 'validation' directories not found in {data_dir}.")
        print("Please ensure your data is structured like: data/train/cats, data/train/dogs, etc.")
        sys.exit(1)

    # Data augmentation and preprocessing for the training set
    train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1./255,          # Scale pixel values to [0, 1]
        rotation_range=20,       # Random rotation of up to 20 degrees
        width_shift_range=0.2,   # Random horizontal shift
        height_shift_range=0.2,  # Random vertical shift
        shear_range=0.2,         # Shear transform
        zoom_range=0.2,          # Random zoom
        horizontal_flip=True,    # Random horizontal flip
        fill_mode='nearest'      # How newly created pixels are filled
    )
    # The validation set gets no augmentation, only rescaling
    validation_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode='binary'  # Cats vs. dogs is a binary problem
    )
    validation_generator = validation_datagen.flow_from_directory(
        validation_dir,
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode='binary'
    )

    num_classes = len(train_generator.class_indices)
    class_names = list(train_generator.class_indices.keys())
    print(f"Found {num_classes} classes: {class_names}")
    return train_generator, validation_generator, num_classes, class_names


if __name__ == '__main__':
    # Example usage
    DATA_DIR = 'data'  # Assumes the data lives in the data folder under the project root
    IMG_SIZE = (150, 150)
    BATCH_SIZE = 32
    print(f"Loading data from {DATA_DIR}...")
    train_gen, val_gen, num_classes, class_names = load_data(DATA_DIR, IMG_SIZE, BATCH_SIZE)
    print(f"Number of training samples: {train_gen.samples}")
    print(f"Number of validation samples: {val_gen.samples}")
    print(f"Class names: {class_names}")
    # Pull a single batch to inspect the data shapes
    for data_batch, labels_batch in train_gen:
        print("Data batch shape:", data_batch.shape)
        print("Labels batch shape:", labels_batch.shape)
        break
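With class_mode='binary', it helps to know which folder becomes label 0 and which becomes label 1, because the sigmoid output is later read as the probability of label 1. When no explicit class list is passed, flow_from_directory infers classes from the subfolder names in alphanumeric order. The sketch below only mimics that ordering in plain Python so it can be checked without TensorFlow; infer_class_indices is a hypothetical helper, not a Keras function.

```python
import os


def infer_class_indices(directory):
    """Map each subfolder name to a label index in sorted (alphanumeric) order."""
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}
```

For the layout in this project the result would be {'cats': 0, 'dogs': 1}, so a sigmoid output close to 1 means "dogs". The authoritative mapping is still train_generator.class_indices.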
5. Write model_builder.py
This file builds the CNN model. We use a pretrained MobileNetV2 as the feature extractor and add a custom classification layer on top of it.
# grad_cam_project/model_builder.py
import tensorflow as tf
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def build_model(input_shape, num_classes):
    """Build an image classifier based on MobileNetV2.

    Args:
        input_shape (tuple): Input image shape (height, width, channels).
        num_classes (int): Number of classes.

    Returns:
        tf.keras.Model: The compiled model.
    """
    # Load the pretrained MobileNetV2 without its top (classification) layers
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,
        weights='imagenet'
    )
    # Freeze the base model so its weights are not updated during training
    base_model.trainable = False

    # Add the custom classification head on top of the base model
    x = base_model.output
    x = GlobalAveragePooling2D()(x)  # Global average pooling: one value per feature map

    if num_classes == 2:
        # Binary classification: a single sigmoid unit
        predictions = Dense(1, activation='sigmoid')(x)
        loss_fn = 'binary_crossentropy'
    else:
        # Multi-class classification: num_classes softmax units
        predictions = Dense(num_classes, activation='softmax')(x)
        loss_fn = 'sparse_categorical_crossentropy'  # Use this when labels are integers

    model = Model(inputs=base_model.input, outputs=predictions)

    # Compile the model
    model.compile(
        optimizer=Adam(learning_rate=0.0001),
        loss=loss_fn,
        metrics=['accuracy']
    )
    model.summary()
    return model


if __name__ == '__main__':
    # Example usage
    INPUT_SHAPE = (150, 150, 3)
    NUM_CLASSES = 2  # Cats vs. dogs
    print("Building model...")
    model = build_model(INPUT_SHAPE, NUM_CLASSES)
    print("Model built successfully.")
6. Write trainer.py
This file contains the training logic.
# grad_cam_project/trainer.py
import os

import tensorflow as tf


def train_model(model, train_generator, validation_generator, epochs, model_save_path='saved_model'):
    """Train the model.

    Args:
        model (tf.keras.Model): The model to train.
        train_generator (tf.keras.preprocessing.image.DirectoryIterator): Training data generator.
        validation_generator (tf.keras.preprocessing.image.DirectoryIterator): Validation data generator.
        epochs (int): Number of training epochs.
        model_save_path (str): Directory where the model is saved.

    Returns:
        tf.keras.callbacks.History: The training history object.
    """
    # Create the directory for saved models
    os.makedirs(model_save_path, exist_ok=True)
    checkpoint_filepath = os.path.join(model_save_path, 'best_model.h5')

    # Define the callbacks
    callbacks = [
        tf.keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_filepath,
            save_best_only=True,     # Keep only the best model
            monitor='val_accuracy',  # Watch validation accuracy
            mode='max',              # Higher accuracy is better
            verbose=1
        ),
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',      # Watch validation loss
            patience=5,              # Stop if it has not improved for 5 epochs
            verbose=1,
            restore_best_weights=True  # Restore the best weights when stopping
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.2,
            patience=3,
            verbose=1,
            min_lr=0.000001
        )
    ]

    print(f"Starting training for {epochs} epochs...")
    history = model.fit(
        train_generator,
        epochs=epochs,
        validation_data=validation_generator,
        callbacks=callbacks
    )
    print("Training finished.")
    print(f"Best model saved to {checkpoint_filepath}")
    return history


if __name__ == '__main__':
    # Example usage (requires data_loader and model_builder)
    import matplotlib.pyplot as plt
    from data_loader import load_data
    from model_builder import build_model

    DATA_DIR = 'data'
    IMG_SIZE = (150, 150)
    BATCH_SIZE = 32
    EPOCHS = 10  # For demonstration; real training may need more

    print("Loading data for trainer example...")
    train_gen, val_gen, num_classes, class_names = load_data(DATA_DIR, IMG_SIZE, BATCH_SIZE)
    print("Building model for trainer example...")
    model = build_model(IMG_SIZE + (3,), num_classes)  # (H, W, C)
    print("Starting training example...")
    history = train_model(model, train_gen, val_gen, EPOCHS)

    # Plot the training history
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Training Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.legend()
    plt.title('Accuracy over Epochs')
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.legend()
    plt.title('Loss over Epochs')
    plt.show()
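To build intuition for how the two patience settings above interact, here is a simplified pure-Python simulation of the bookkeeping that EarlyStopping (patience=5) and ReduceLROnPlateau (factor=0.2, patience=3) perform on val_loss. It is a sketch and not the Keras implementation: it ignores min_delta and cooldown, and simulate_callbacks is a hypothetical name.

```python
def simulate_callbacks(val_losses, lr=1e-4, stop_patience=5,
                       lr_patience=3, factor=0.2, min_lr=1e-6):
    """Return (stopped_epoch, lr_history) for a list of per-epoch validation losses.

    lr_history[i] is the learning rate in effect after epoch i's callbacks ran.
    stopped_epoch is None if training would run through every epoch.
    """
    best = float("inf")
    stop_wait = lr_wait = 0
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: both counters start over
            best = loss
            stop_wait = lr_wait = 0
        else:
            stop_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)  # shrink the rate, but not below min_lr
                lr_wait = 0
        lr_history.append(lr)
        if stop_wait >= stop_patience:
            return epoch, lr_history
    return None, lr_history
```

With these settings the learning rate is cut after three epochs without improvement, which gives the model two more epochs at the lower rate before early stopping triggers.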
7. Write utils.py
General-purpose helper functions.
# grad_cam_project/utils.py
import cv2  # OpenCV for image processing
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.preprocessing import image


def load_and_preprocess_image(img_path, target_size=(224, 224)):
    """Load one image and preprocess it to match the model's input.

    Args:
        img_path (str): Path to the image file.
        target_size (tuple): Target size as (height, width).

    Returns:
        tuple: (original_img, img_array)
            original_img (PIL.Image.Image): The loaded image, resized to target_size.
            img_array (numpy.ndarray): Preprocessed array ready to feed to the model.
    """
    img = image.load_img(img_path, target_size=target_size)
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)  # Add the batch dimension
    img_array = img_array / 255.0  # Scale to [0, 1], as in training
    return img, img_array


def display_grad_cam(img, heatmap, alpha=0.4):
    """Overlay the Grad-CAM heatmap on the original image and show both.

    Args:
        img (PIL.Image.Image or numpy.ndarray): The original image.
        heatmap (numpy.ndarray): Heatmap normalized to [0, 1].
        alpha (float): Opacity of the heatmap.
    """
    # Convert the input to a uint8 RGB array
    if isinstance(img, np.ndarray):
        img = (img * 255).astype(np.uint8)  # Assumes a NumPy array scaled to [0, 1]
    else:  # PIL image
        img = np.array(img)
    if img.ndim == 3 and img.shape[2] == 4:  # RGBA: drop the alpha channel
        img = cv2.cvtColor(img, cv2.COLOR_RGBA2RGB)

    # Resize the heatmap to the image size
    heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
    # Scale to 0-255 and apply a color map
    heatmap = np.uint8(255 * heatmap)
    heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
    # applyColorMap returns BGR, while the image and matplotlib use RGB
    heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)

    # Overlay the heatmap on the original image.
    # cv2.addWeighted(src1, alpha, src2, beta, gamma) computes
    # dst = src1 * alpha + src2 * beta + gamma;
    # here src1 is the heatmap and src2 is the original image.
    superimposed_img = cv2.addWeighted(heatmap, alpha, img, 1 - alpha, 0)

    # Show the images
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(img)
    plt.title('Original Image')
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.imshow(superimposed_img)
    plt.title('Grad-CAM Heatmap')
    plt.axis('off')
    plt.show()
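The overlay step relies on the weighted-sum formula dst = src1 * alpha + src2 * beta + gamma used by cv2.addWeighted. As an illustration only, the same arithmetic can be written in NumPy (with beta = 1 - alpha and gamma = 0), which makes the rounding and the clamping to the 0-255 range explicit. The name blend is hypothetical, and the sketch assumes both inputs are uint8 RGB arrays of the same shape.

```python
import numpy as np


def blend(heatmap_rgb, image_rgb, alpha=0.4):
    """Per-pixel weighted sum of two uint8 images, rounded and clamped to [0, 255]."""
    mixed = (alpha * heatmap_rgb.astype(np.float32)
             + (1 - alpha) * image_rgb.astype(np.float32))
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)
```

A larger alpha makes the heatmap dominate; alpha=0.4 keeps the underlying image clearly visible.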
8. Write grad_cam.py
This file contains the core Grad-CAM implementation.
# grad_cam_project/grad_cam.py
import tensorflow as tf


def generate_grad_cam(model, img_array, layer_name, pred_index=None):
    """Generate a Grad-CAM heatmap.

    Args:
        model (tf.keras.Model): The trained model.
        img_array (numpy.ndarray): Preprocessed image array of shape (1, H, W, C).
        layer_name (str): Name of the target convolutional layer (usually the last one).
        pred_index (int, optional): Class index to explain. If None, the predicted class is used.

    Returns:
        numpy.ndarray: Heatmap in [0, 1] at the spatial size of the target layer's
            feature map (resize it to the image size before overlaying).
    """
    # 1. Build a model with the same input that outputs both the target
    #    layer's feature maps and the final prediction
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output]
    )

    # 2. Record the forward pass with GradientTape
    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(img_array)
        if predictions.shape[-1] == 1:  # Binary classification (sigmoid output)
            prob = predictions[:, 0]
            if pred_index is None:
                pred_index = int(float(prob[0]) >= 0.5)
            # For class 1 explain p; for class 0 explain 1 - p
            class_channel = prob if pred_index == 1 else 1 - prob
        else:  # Multi-class classification (softmax output)
            if pred_index is None:
                pred_index = tf.argmax(predictions[0])
            class_channel = predictions[:, pred_index]

    # 3. Gradient of the class score with respect to the target layer's output
    grads = tape.gradient(class_channel, conv_output)

    # 4. Global-average-pool the gradients to get one weight per feature map
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))  # Average over batch, H, W

    # 5. Weight each feature map and sum over channels
    conv_output = conv_output[0]  # First (only) image in the batch: (H, W, C)
    heatmap = conv_output @ pooled_grads[..., tf.newaxis]  # Weighted sum via matmul
    heatmap = tf.squeeze(heatmap)  # Drop the size-1 axis

    # 6. ReLU: keep only the regions that contribute positively
    heatmap = tf.maximum(heatmap, 0)

    # 7. Normalize to [0, 1], avoiding division by zero
    max_heatmap = tf.reduce_max(heatmap)
    if max_heatmap > 0:
        heatmap = heatmap / max_heatmap
    return heatmap.numpy()


def get_last_conv_layer_name(model):
    """Return the name of the last convolutional layer in the model, or None.

    Note: this may need adjusting for other architectures. For the MobileNetV2
    model used here the last convolution is 'Conv_1'; the 'out_relu' activation
    that follows it is another common Grad-CAM target.
    """
    for layer in reversed(model.layers):
        # Frozen layers count too: the last feature-extraction layer is usually frozen
        if isinstance(layer, (tf.keras.layers.Conv2D, tf.keras.layers.DepthwiseConv2D)):
            return layer.name
    return None  # No suitable convolutional layer found


if __name__ == '__main__':
    # Example usage (requires utils.py and a trained model)
    import os
    import sys
    from utils import load_and_preprocess_image, display_grad_cam

    # Assumes you have already trained and saved a model
    MODEL_PATH = 'saved_model/best_model.h5'
    if not os.path.exists(MODEL_PATH):
        print(f"Error: Model not found at {MODEL_PATH}.")
        print("Please run `python main.py --mode train` first to train and save a model.")
        sys.exit(1)
    print(f"Loading model from {MODEL_PATH}...")
    model = tf.keras.models.load_model(MODEL_PATH)

    # Example images: replace these with paths that actually exist in your dataset
    TEST_IMAGE_PATHS = [
        'data/validation/cats/cat.2000.jpg',  # Example cat image
        'data/validation/dogs/dog.2000.jpg',  # Example dog image
    ]
    TARGET_IMG_SIZE = (150, 150)  # Model input size
    class_names = ['cat', 'dog']  # Assumes 'cat' -> 0, 'dog' -> 1

    # Find the name of the last convolutional layer
    last_conv_layer_name = get_last_conv_layer_name(model)
    if last_conv_layer_name is None:
        print("Could not automatically determine the last convolutional layer. Please specify manually.")
        # For MobileNetV2, 'block_16_project' (the projection layer of the last
        # inverted-residual block) or the later 'Conv_1' are reasonable choices.
        # The better approach is to check model.summary() for the last
        # Conv2D or DepthwiseConv2D layer.
        last_conv_layer_name = 'block_16_project'
    print(f"Using '{last_conv_layer_name}' as target layer for Grad-CAM.")

    # Visualize the cat image, then the dog image
    for path in TEST_IMAGE_PATHS:
        print(f"\nProcessing {path}...")
        original_img, img_array = load_and_preprocess_image(path, target_size=TARGET_IMG_SIZE)
        predictions = model.predict(img_array)
        predicted_class_index = int(predictions[0][0] >= 0.5)  # Binary classification
        predicted_label = class_names[predicted_class_index]
        print(f"Predicted: {predicted_label} (Score: {predictions[0][0]:.4f})")
        heatmap = generate_grad_cam(model, img_array, last_conv_layer_name,
                                    pred_index=predicted_class_index)
        display_grad_cam(original_img, heatmap)
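Steps 4 to 7 of generate_grad_cam can be checked by hand on a tiny example. The sketch below repeats them in NumPy for a single image, assuming the feature maps and their gradients are already available as arrays of shape (H, W, C). In other words, it computes the channel weights as the spatial mean of the gradients, takes the weighted sum of the feature maps, applies ReLU, and normalizes. The function name grad_cam_from_arrays is illustrative.

```python
import numpy as np


def grad_cam_from_arrays(feature_maps, grads):
    """Grad-CAM heatmap from feature maps and gradients, both of shape (H, W, C)."""
    weights = grads.mean(axis=(0, 1))                  # one weight per channel
    heatmap = np.maximum(feature_maps @ weights, 0.0)  # weighted channel sum, then ReLU
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap     # normalize unless all zeros


# Two 2x2 feature maps: channel 0 pushes the score up, channel 1 pushes it down
feature_maps = np.stack(
    [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[4.0, 3.0], [2.0, 1.0]])], axis=-1
)
grads = np.stack([np.full((2, 2), 1.0), np.full((2, 2), -0.5)], axis=-1)
heatmap = grad_cam_from_arrays(feature_maps, grads)
```

Here the channel weights are 1.0 and -0.5, so the top-left cell (1 - 0.5 * 4 = -1) is zeroed by the ReLU, and the bottom-right cell (4 - 0.5 * 1 = 3.5) becomes the peak with value 1 after normalization.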
9. Write main.py
This is the project's entry point; it dispatches training and Grad-CAM visualization.
# grad_cam_project/main.py
import argparse
import os

import tensorflow as tf

# Import functions from the other project files
from data_loader import load_data
from grad_cam import generate_grad_cam, get_last_conv_layer_name
from model_builder import build_model
from trainer import train_model
from utils import load_and_preprocess_image, display_grad_cam

# Configuration
DATA_DIR = 'data'
IMG_SIZE = (150, 150)  # Image size; 128x128 or larger is usually recommended for MobileNetV2
BATCH_SIZE = 32
EPOCHS = 10  # For demonstration; real training may need more (50+ suggested)
MODEL_SAVE_DIR = 'saved_model'
MODEL_PATH = os.path.join(MODEL_SAVE_DIR, 'best_model.h5')


def main():
    parser = argparse.ArgumentParser(description="Train CNN and visualize with Grad-CAM.")
    parser.add_argument('--mode', type=str, default='train', choices=['train', 'visualize'],
                        help="Choose mode: 'train' for training, 'visualize' for Grad-CAM.")
    parser.add_argument('--image_path', type=str, default=None,
                        help="Path to the image for Grad-CAM visualization. Required if mode is 'visualize'.")
    parser.add_argument('--target_layer', type=str, default=None,
                        help="Name of the target convolutional layer for Grad-CAM. E.g., 'block_16_project' for MobileNetV2.")
    args = parser.parse_args()

    if args.mode == 'train':
        print("\n--- Starting Training Mode ---")
        train_generator, validation_generator, num_classes, class_names = load_data(
            DATA_DIR, IMG_SIZE, BATCH_SIZE)
        # Make sure the output directory exists
        os.makedirs(MODEL_SAVE_DIR, exist_ok=True)
        model = build_model(IMG_SIZE + (3,), num_classes)  # (H, W, C)
        train_model(model, train_generator, validation_generator, EPOCHS, MODEL_SAVE_DIR)
        print("Training completed and model saved.")

    elif args.mode == 'visualize':
        print("\n--- Starting Visualization Mode ---")
        if not os.path.exists(MODEL_PATH):
            print(f"Error: Model not found at {MODEL_PATH}. Please run 'python main.py --mode train' first.")
            return
        if args.image_path is None or not os.path.exists(args.image_path):
            print("Error: --image_path is required and must be a valid path for visualization mode.")
            return
        print(f"Loading model from {MODEL_PATH}...")
        model = tf.keras.models.load_model(MODEL_PATH)

        # The data generators are loaded again here only to obtain class_names.
        # If that information is saved together with the model, read it from there instead.
        _, _, _, class_names = load_data(DATA_DIR, IMG_SIZE, BATCH_SIZE)

        print(f"Processing image: {args.image_path}")
        original_img, img_array = load_and_preprocess_image(args.image_path, target_size=IMG_SIZE)
        predictions = model.predict(img_array)
        if model.output_shape[-1] == 1:  # Binary classification (sigmoid output)
            predicted_class_index = int(predictions[0][0] >= 0.5)
            prediction_score = predictions[0][0]
        else:  # Multi-class classification (softmax output)
            predicted_class_index = int(tf.argmax(predictions[0]).numpy())
            prediction_score = predictions[0][predicted_class_index]
        predicted_label = class_names[predicted_class_index]
        print(f"Predicted: {predicted_label} (Score: {prediction_score:.4f})")

        target_layer_name = args.target_layer
        if target_layer_name is None:
            target_layer_name = get_last_conv_layer_name(model)
            if target_layer_name is None:
                print("Could not automatically determine the last convolutional layer. "
                      "Please specify manually using --target_layer.")
                print("For MobileNetV2, try 'block_16_project'.")
                return
            print(f"Automatically determined target layer: '{target_layer_name}'")

        print(f"Generating Grad-CAM for layer: '{target_layer_name}'")
        heatmap = generate_grad_cam(model, img_array, target_layer_name,
                                    pred_index=predicted_class_index)
        display_grad_cam(original_img, heatmap)
        print("Grad-CAM visualization completed.")


if __name__ == '__main__':
    main()
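The comment in the visualization branch notes that class_names could be stored with the model instead of being recovered by rebuilding the data generators. One way to do this, sketched under the assumption that a small JSON file next to best_model.h5 is acceptable, is shown below. The file name class_names.json and both helper functions are hypothetical and not part of the original project.

```python
import json
import os


def save_class_names(class_names, model_dir, filename="class_names.json"):
    """Write the class names (in label-index order) next to the saved model."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(class_names), f)
    return path


def load_class_names(model_dir, filename="class_names.json"):
    """Read the class names back; index i corresponds to label i."""
    with open(os.path.join(model_dir, filename), encoding="utf-8") as f:
        return json.load(f)
```

Training mode could call save_class_names(class_names, MODEL_SAVE_DIR) after load_data, and visualization mode could then call load_class_names(MODEL_SAVE_DIR) without touching the dataset.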
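10. Run and test
A. Train the model:
python main.py --mode train
This starts training and saves the best model (by val_accuracy) to saved_model/best_model.h5. Training can take a while, depending on the size of your dataset and your hardware.
B. Visualize Grad-CAM:
After training finishes, pick a cat or dog image to visualize.
For example, if your data/validation/cats directory contains cat.2000.jpg:
python main.py --mode visualize --image_path data/validation/cats/cat.2000.jpg --target_layer block_16_project
Note: --target_layer block_16_project is specific to MobileNetV2. If you use a different model, or want to look at another layer, find the convolutional layer's name in the output of model.summary(). The get_last_conv_layer_name function tries to find a layer automatically, but specifying one by hand is safer.
After the command runs, a window opens showing the original image next to the image with the heatmap overlaid. The heatmap highlights the image regions the model focused on when making its prediction.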
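Code notes and further considerations:
- Data loading (data_loader.py): ImageDataGenerator simplifies data loading and preprocessing and supports data augmentation, which is important for reducing overfitting.
- Model construction (model_builder.py): Uses transfer learning. MobileNetV2 is pretrained on ImageNet and extracts strong general-purpose features. We freeze its convolutional base and train only the classifier on top, which greatly shortens training and usually gives better accuracy than training from scratch on a dataset of this size.
- Training (trainer.py):
  - ModelCheckpoint: saves the model that performs best during training (highest val_accuracy).
  - EarlyStopping: stops training early once val_loss stops improving, which saves time and limits overfitting.
  - ReduceLROnPlateau: lowers the learning rate when val_loss plateaus, which helps the model converge.
- Grad-CAM (grad_cam.py):
  - The core idea is to use the gradient of the target class score (or the top predicted class) with respect to the last convolutional layer's feature maps to compute the importance of each feature map.
  - The feature maps are weighted by these importances, summed over channels, and passed through a ReLU to obtain the heatmap.
  - The heatmap is small, so it has to be upsampled to the original image size.
  - get_last_conv_layer_name is a helper that tries to find the last convolutional layer automatically, but specifying the layer by hand is usually more reliable.
- Utilities (utils.py): Wraps the shared logic for image loading, preprocessing, and heatmap visualization, which keeps the core logic in grad_cam.py clear.
- Main program (main.py): Uses argparse to handle command-line arguments, so the program can switch between training and visualization modes.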
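The upsampling step mentioned in the Grad-CAM notes above is handled in utils.py by cv2.resize, whose default interpolation is bilinear. To show what upsampling does to a small heatmap, here is a sketch that swaps in the simpler nearest-neighbor method, written in NumPy. It assumes the target size is an integer multiple of the heatmap size; upsample_nearest is a hypothetical helper, and the sizes in the usage note are illustrative.

```python
import numpy as np


def upsample_nearest(heatmap, factor):
    """Repeat every heatmap cell `factor` times along both spatial axes."""
    return np.repeat(np.repeat(heatmap, factor, axis=0), factor, axis=1)
```

For instance, a 5x5 heatmap upsampled with factor=30 becomes 150x150. Nearest-neighbor output looks blocky, which is why the bilinear resize in display_grad_cam produces the smoother overlays usually shown for Grad-CAM.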