IV. CV_GoogLeNet

backend 2025/7/19 8:15:35

IV. GoogLeNet

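GoogLeNet makes the network deeper and also changes its structure: it introduces a component called Inception to replace the classic convolution-plus-activation building block used by earlier networks.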

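1. The Inception block

The basic convolutional block in GoogLeNet is called the Inception block, and its structure is relatively complex.

An Inception block has four parallel paths. The first three use convolution layers with window sizes of $1\times 1$, $3\times 3$, and $5\times 5$ to extract information at different spatial scales; the second and third paths first apply a $1\times 1$ convolution to reduce the number of input channels and so lower model complexity. The fourth path uses a $3\times 3$ max-pooling layer followed by a $1\times 1$ convolution layer that changes the number of channels. All four paths use suitable padding so that the output height and width match the input. Finally, the outputs of the four paths are concatenated along the channel dimension and passed on to the next layer.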

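(1) $1\times 1$ convolution

Its only difference from larger kernels is that it ignores the relationships between neighbouring positions in the feature map: each output value depends only on the channels at the same spatial position.

What a $1\times 1$ convolution does

Enables interaction and information integration across channels
Reduces (or increases) the number of channels, which lowers the number of network parameters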
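The two points above can be illustrated with a small sketch in plain Python (no TensorFlow needed). The helper names `conv_params` and `pointwise_conv` and the channel counts 192, 16, and 32 are assumptions chosen for illustration, not part of the original code. The sketch treats a $1\times 1$ convolution as a per-pixel weighted sum over channels, and compares the parameter count of a direct $5\times 5$ convolution with that of a $1\times 1$ bottleneck followed by a $5\times 5$ convolution.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Number of trainable parameters in a k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def pointwise_conv(pixel, weights):
    """Apply a 1x1 convolution at a single spatial position.

    pixel:   list of c_in channel values at that position
    weights: c_out rows, each holding c_in weights
    Returns the c_out output channel values for that position.
    """
    return [sum(w * v for w, v in zip(row, pixel)) for row in weights]


# Mix 3 input channels into 2 output channels at one pixel.
pixel = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0, 0.0],   # output channel 0 copies input channel 0
           [0.5, 0.5, 0.5]]   # output channel 1 is half the sum of all inputs
mixed = pointwise_conv(pixel, weights)

# Direct 5x5 convolution from 192 to 32 channels (hypothetical sizes).
direct = conv_params(5, 192, 32)
# 1x1 bottleneck down to 16 channels, then 5x5 convolution up to 32 channels.
bottleneck = conv_params(1, 192, 16) + conv_params(5, 16, 32)

print(mixed)       # [1.0, 3.0]
print(direct)      # 153632
print(bottleneck)  # 15920
```

Under these assumed sizes, the bottleneck variant needs roughly a tenth of the parameters of the direct convolution.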

The Inception module can be implemented in tf.keras, with the number of kernels in each convolution layer controlled by parameters.

import tensorflow as tf

# Define the Inception module
class Inception(tf.keras.layers.Layer):
    def __init__(self, c1, c2, c3, c4):
        super().__init__()
        # Path 1: 1x1 convolution
        self.p1_1 = tf.keras.layers.Conv2D(c1, kernel_size=1, activation='relu', padding='same')
        # Path 2: 1x1 convolution, then 3x3 convolution
        self.p2_1 = tf.keras.layers.Conv2D(c2[0], kernel_size=1, activation='relu', padding='same')
        self.p2_2 = tf.keras.layers.Conv2D(c2[1], kernel_size=3, activation='relu', padding='same')
        # Path 3: 1x1 convolution, then 5x5 convolution
        self.p3_1 = tf.keras.layers.Conv2D(c3[0], kernel_size=1, activation='relu', padding='same')
        self.p3_2 = tf.keras.layers.Conv2D(c3[1], kernel_size=5, activation='relu', padding='same')
        # Path 4: 3x3 max pooling, then 1x1 convolution
        self.p4_1 = tf.keras.layers.MaxPool2D(pool_size=3, padding='same', strides=1)
        self.p4_2 = tf.keras.layers.Conv2D(c4, kernel_size=1, activation='relu', padding='same')

    # Forward pass
    def call(self, x):
        p1 = self.p1_1(x)
        p2 = self.p2_2(self.p2_1(x))
        p3 = self.p3_2(self.p3_1(x))
        p4 = self.p4_2(self.p4_1(x))
        # Concatenate the outputs along the channel dimension
        outputs = tf.concat([p1, p2, p3, p4], axis=-1)
        return outputs

Specify the channel counts to instantiate the Inception module.

Inception(64, (96, 128), (16, 32), 32)  # number of kernels in each convolution layer
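Because the four paths are concatenated along the channel dimension, the number of output channels of a block is the sum of the kernel counts of the last layer on each path. A minimal sketch of that arithmetic follows; the helper name `inception_out_channels` is hypothetical and is not part of the original code.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four paths of an Inception block.

    c1: kernels of the 1x1 path
    c2: (1x1 reduction kernels, 3x3 kernels)
    c3: (1x1 reduction kernels, 5x5 kernels)
    c4: kernels of the 1x1 convolution after max pooling
    """
    # Only the last layer of each path contributes to the concatenation;
    # the 1x1 reduction layers (c2[0], c3[0]) are internal to their paths.
    return c1 + c2[1] + c3[1] + c4


out_channels = inception_out_channels(64, (96, 128), (16, 32), 32)
print(out_channels)  # 256
```

The same helper reproduces the channel sums quoted for the later modules, for example `inception_out_channels(128, (128, 192), (32, 96), 64)` gives 480.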

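2. The GoogLeNet model

GoogLeNet is built mainly from Inception blocks.

The network is divided into five modules (B1, B2, B3, B4, B5). Between modules, a $3\times 3$ max-pooling layer with stride 2 reduces the output height and width.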
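To see how the feature-map size shrinks through the network, the following sketch traces the height (and width) after each stride-2 layer. It assumes a 224×224 input and `padding='same'`, as in the code given for each module in this section; with 'same' padding the output size is the input size divided by the stride, rounded up. The helper name `same_out_size` is hypothetical.

```python
import math


def same_out_size(size, stride):
    """Output height/width of a conv or pooling layer with 'same' padding."""
    return math.ceil(size / stride)


size = 224
trace = [size]
# Stride-2 layers in order: B1 convolution, B1 pooling,
# B2 pooling, B3 pooling, B4 pooling.
for _ in range(5):
    size = same_out_size(size, 2)
    trace.append(size)

print(trace)  # [224, 112, 56, 28, 14, 7]
```

Under these assumptions the B4 Inception blocks work on 14×14 feature maps and B5 on 7×7 feature maps.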

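(1) Module B1

The first module uses a 64-channel $7\times 7$ convolution layer with stride 2, followed by a $3\times 3$ max-pooling layer with stride 2.

# Define the model input
inputs = tf.keras.Input(shape=(224, 224, 3), name='input')
# B1 module: 7x7 convolution, then 3x3 max pooling
x = tf.keras.layers.Conv2D(64, kernel_size=7, strides=2, padding='same', activation='relu')(inputs)
x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)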

(2) Module B2

The second module uses two convolution layers: first a 64-channel $1\times 1$ convolution layer, then a $3\times 3$ convolution layer that triples the number of channels to 192. A max-pooling layer follows.

# B2 module
x = tf.keras.layers.Conv2D(64, kernel_size=1, padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(192, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPool2D(pool_size=3, padding='same', strides=2)(x)

(3) Module B3

The third module chains two complete Inception blocks. The first Inception block outputs $64 + 128 + 32 + 32 = 256$ channels, and the second increases the output to $128 + 192 + 96 + 64 = 480$ channels.

# B3 module
x = Inception(64, (96, 128), (16, 32), 32)(x)
x = Inception(128, (128, 192), (32, 96), 64)(x)
x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)

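(4) Module B4

The fourth module is more complex. It chains five Inception blocks, whose output channel counts are

$192 + 208 + 48 + 64 = 512$

$160 + 224 + 64 + 64 = 512$

$128 + 256 + 64 + 64 = 512$

$112 + 288 + 64 + 64 = 528$

$256 + 320 + 128 + 128 = 832$

Auxiliary classifiers are also added, one after the first Inception block and one before the last Inception block. Experiments showed that the intermediate layers of the network already have strong discriminative power; to make use of the abstract features in those layers, multi-layer classifiers are attached to some of them, as shown in the figure below: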

Auxiliary classifier implementation:

def aux_classifier(x, filter_size):
    # x: input tensor
    # filter_size: [number of kernels in the conv layer, number of units in the dense layer]
    # Average pooling layer
    x = tf.keras.layers.AveragePooling2D(pool_size=5, strides=3, padding='same')(x)
    # 1x1 convolution layer
    x = tf.keras.layers.Conv2D(filters=filter_size[0], kernel_size=1, strides=1,
                               padding='valid', activation='relu')(x)
    # Flatten
    x = tf.keras.layers.Flatten()(x)
    # Fully connected layer
    x = tf.keras.layers.Dense(units=filter_size[1], activation='relu')(x)
    # Softmax output layer
    x = tf.keras.layers.Dense(units=10, activation='softmax')(x)
    return x
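As a rough check of the shapes inside the auxiliary classifier, the sketch below computes the size of the flattened vector and the parameter count of the first dense layer. It assumes a 14×14 feature map at the point where the classifier is attached (which follows from a 224×224 input and four stride-2 layers before B4) and the arguments `[128, 1024]` used in the B4 code. The helper name `pool_out_size` is hypothetical. The 'valid' case is shown only for comparison with the 'same' padding used in the code above.

```python
import math


def pool_out_size(size, pool, stride, padding):
    """Output height/width of a pooling layer for 'same' or 'valid' padding."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - pool) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


feature_size = 14          # assumed input height/width of the classifier
conv_filters, dense_units = 128, 1024

side_same = pool_out_size(feature_size, pool=5, stride=3, padding="same")
side_valid = pool_out_size(feature_size, pool=5, stride=3, padding="valid")

# The 1x1 convolution keeps the spatial size and sets the channel count,
# so the flattened length is side * side * conv_filters.
flat_same = side_same * side_same * conv_filters
flat_valid = side_valid * side_valid * conv_filters

# Weights plus biases of the first dense layer for the 'same' case.
dense_params = flat_same * dense_units + dense_units

print(side_same, flat_same)    # 5 3200
print(side_valid, flat_valid)  # 4 2048
print(dense_params)            # 3277824
```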

B4 module implementation

# B4 module
# Inception
x = Inception(192, (96, 208), (16, 48), 64)(x)
# Auxiliary output 1
aux_output_1 = aux_classifier(x, [128, 1024])
# Inception
x = Inception(160, (112, 224), (24, 64), 64)(x)
# Inception
x = Inception(128, (128, 256), (24, 64), 64)(x)
# Inception
x = Inception(112, (144, 288), (32, 64), 64)(x)
# Auxiliary output 2
aux_output_2 = aux_classifier(x, [128, 1024])
# Inception
x = Inception(256, (160, 320), (32, 128), 128)(x)
# Max pooling
x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)

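(5) Module B5

This module chains two Inception blocks, whose output channel counts are

$256 + 320 + 128 + 128 = 832$

$384 + 384 + 128 + 128 = 1024$

The output layer follows directly. The module uses a global average pooling (GAP) layer to reduce the height and width of every channel to 1. The output, now a two-dimensional array, is then fed to a fully connected layer whose number of outputs equals the number of label classes.

Global average pooling (GAP) layer

GAP is used in place of a fully connected layer: for each channel of the feature map, all pixel values are summed and averaged, which gives the GAP result. That result is then passed to the rest of the network.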
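The averaging described above can be sketched in plain Python on a tiny feature map. The function name `global_average_pool` and the 2×2×2 example values are assumptions for illustration; in the model itself this step is done by `tf.keras.layers.GlobalAvgPool2D`.

```python
def global_average_pool(feature_map):
    """Average every channel of a feature map over height and width.

    feature_map: nested list indexed as [height][width][channel]
    Returns a list with one mean value per channel.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [total / (height * width) for total in totals]


# A 2 x 2 feature map with 2 channels.
fmap = [[[1.0, 10.0], [2.0, 20.0]],
        [[3.0, 30.0], [4.0, 40.0]]]
pooled = global_average_pool(fmap)
print(pooled)  # [2.5, 25.0]
```

Each channel collapses to a single number, so a feature map with 1024 channels becomes a vector of 1024 values regardless of its height and width. GAP itself has no trainable parameters.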

Implementation:

# B5 module
# Inception
x = Inception(256, (160, 320), (32, 128), 128)(x)
# Inception
x = Inception(384, (192, 384), (48, 128), 128)(x)
# GAP
x = tf.keras.layers.GlobalAvgPool2D()(x)
# Output layer
main_outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

(6) Final model

Build the GoogLeNet model and inspect its structure with summary:

# Create the model with Model, specifying the input and the outputs
model = tf.keras.Model(inputs=inputs, outputs=[main_outputs, aux_output_1, aux_output_2])
model.summary()

3. Handwritten digit recognition

(1) Loading the data

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Reshape to N H W C
train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1], train_images.shape[2], 1))
test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1], test_images.shape[2], 1))

# Define two helpers that randomly draw a subset of samples for the demo
def get_train(size):
    index = np.random.randint(0, np.shape(train_images)[0], size)
    resize_images = tf.image.resize_with_pad(train_images[index], 224, 224)
    # Repeat the grayscale channel 3 times to match the model input (224, 224, 3)
    resize_images = tf.image.grayscale_to_rgb(resize_images)
    return resize_images.numpy(), train_labels[index]

def get_test(size):
    index = np.random.randint(0, np.shape(test_images)[0], size)
    resize_images = tf.image.resize_with_pad(test_images[index], 224, 224)
    # Repeat the grayscale channel 3 times to match the model input (224, 224, 3)
    resize_images = tf.image.grayscale_to_rgb(resize_images)
    return resize_images.numpy(), test_labels[index]

# Get the training and test samples
train_image, train_label = get_train(256)
test_image, test_label = get_test(128)

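(2) Compiling the model

The model has three outputs (the main output and two auxiliary outputs), so a weight must be given for each output's loss.

# Optimizer, loss function, evaluation metric, and per-output loss weights
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss=tf.keras.losses.sparse_categorical_crossentropy,
              metrics=['accuracy'],
              loss_weights=[1, 0.3, 0.3])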
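The effect of the loss weights can be sketched for a single sample: the total loss is the weighted sum of the cross-entropy of each output, with the main output counted fully and each auxiliary output scaled by 0.3. The function names and the predicted probabilities below are made up for illustration; only the weights `[1, 0.3, 0.3]` come from the code above.

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy of one sample given an integer label and class probabilities."""
    return -math.log(probs[label])


def weighted_total_loss(label, outputs, weights):
    """Weighted sum of the per-output losses for one sample."""
    return sum(w * sparse_categorical_crossentropy(label, probs)
               for probs, w in zip(outputs, weights))


label = 2
main_probs = [0.1, 0.1, 0.8]   # hypothetical main output
aux1_probs = [0.2, 0.3, 0.5]   # hypothetical auxiliary output 1
aux2_probs = [0.3, 0.3, 0.4]   # hypothetical auxiliary output 2

total = weighted_total_loss(label,
                            [main_probs, aux1_probs, aux2_probs],
                            [1, 0.3, 0.3])
print(round(total, 4))
```

The auxiliary classifiers therefore contribute to the gradients during training without dominating the main output.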

(3) Training the model

# Train on the resized subset returned by get_train
model.fit(train_image, train_label, batch_size=128, epochs=3, verbose=1, validation_split=0.1)

(4) Evaluating the model

# Evaluate on the resized subset returned by get_test
model.evaluate(test_image, test_label, verbose=1)

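4. Later versions

(1) InceptionV2

InceptionV2 splits large convolution kernels into small ones: the $5\times 5$ convolution of V1 is replaced by two $3\times 3$ convolutions, which increases network depth while reducing the number of parameters.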
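The saving can be checked with a short calculation. The sketch below assumes 64 input and 64 output channels (hypothetical values) and ignores biases. It also confirms that two stacked stride-1 $3\times 3$ convolutions cover the same $5\times 5$ receptive field as the layer they replace. The function names are illustrative.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Number of weights (without biases) in a kh x kw convolution."""
    return kh * kw * c_in * c_out


def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions.

    Each layer with kernel size k widens the field by k - 1.
    """
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


channels = 64  # assumed channel count for input and output
single_5x5 = conv_weights(5, 5, channels, channels)
two_3x3 = 2 * conv_weights(3, 3, channels, channels)

print(single_5x5)                       # 102400
print(two_3x3)                          # 73728
print(stacked_receptive_field([3, 3]))  # 5
```

With equal channel counts the ratio is 18 to 25, so the two $3\times 3$ layers use 28% fewer weights.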

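(2) InceptionV3

An $n\times n$ convolution is split into a $1\times n$ convolution and an $n\times 1$ convolution. For example, a $3\times 3$ convolution is carried out as a $1\times 3$ convolution followed by a $3\times 1$ convolution, which lowers both the parameter count and the amount of computation.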
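A similar calculation shows the saving from this factorization. The sketch assumes 64 channels for the input, the intermediate feature map, and the output (hypothetical values) and ignores biases. The function names are illustrative.

```python
def full_conv_weights(n, c_in, c_out):
    """Weights (without biases) of an n x n convolution."""
    return n * n * c_in * c_out


def factorized_conv_weights(n, c_in, c_mid, c_out):
    """Weights of a 1 x n convolution followed by an n x 1 convolution."""
    # 1 x n maps c_in -> c_mid channels, n x 1 maps c_mid -> c_out channels.
    return n * c_in * c_mid + n * c_mid * c_out


channels = 64  # assumed channel count at every stage
full_3 = full_conv_weights(3, channels, channels)
fact_3 = factorized_conv_weights(3, channels, channels, channels)
full_7 = full_conv_weights(7, channels, channels)
fact_7 = factorized_conv_weights(7, channels, channels, channels)

print(full_3, fact_3)  # 36864 24576
print(full_7, fact_7)  # 200704 57344
```

With equal channel counts the weight ratio is $2n : n^2$, so the saving grows with the kernel size: one third for $n = 3$ and about 71% for $n = 7$.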
