Stacked small convolution kernels
Unified architecture design
| Module | Layer type | Output size | Key parameters |
|---|---|---|---|
| Block 1 | Conv3×3 + ReLU ×2 | 224×224×64 | Padding=1 |
| | MaxPool2×2 | 112×112×64 | Stride=2 |
| Block 2 | Conv3×3 + ReLU ×2 | 112×112×128 | Padding=1 |
| | MaxPool2×2 | 56×56×128 | Stride=2 |
| Block 3 | Conv3×3 + ReLU ×3 | 56×56×256 | Padding=1 |
| | MaxPool2×2 | 28×28×256 | Stride=2 |
| Block 4 | Conv3×3 + ReLU ×3 | 28×28×512 | Padding=1 |
| | MaxPool2×2 | 14×14×512 | Stride=2 |
| Block 5 | Conv3×3 + ReLU ×3 | 14×14×512 | Padding=1 |
| | MaxPool2×2 | 7×7×512 | Stride=2 |
| Classifier | FC4096 + ReLU + Dropout | 4096 | Dropout=0.5 |
| | FC4096 + ReLU + Dropout | 4096 | Dropout=0.5 |
| | FC1000 + Softmax | 1000 | Output probabilities |
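The output sizes in the table follow from the usual size formula, out = (in + 2·padding − kernel) / stride + 1. The sketch below traces a 224×224 input through the five blocks in plain Python; the helper names `conv_out` and `trace_shapes` are made up for this illustration and are not part of any library.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, number of 3x3 convolutions, output channels) for the five blocks
BLOCKS = [("Block 1", 2, 64), ("Block 2", 2, 128), ("Block 3", 3, 256),
          ("Block 4", 3, 512), ("Block 5", 3, 512)]


def trace_shapes(size: int = 224):
    """Return [(block, size after convs, size after pool, channels), ...]."""
    rows = []
    for name, n_convs, channels in BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel, stride 1, padding 1: the size is unchanged
            size = conv_out(size, kernel=3, stride=1, padding=1)
        after_conv = size
        # 2x2 max pooling with stride 2: the size is halved
        size = conv_out(size, kernel=2, stride=2)
        rows.append((name, after_conv, size, channels))
    return rows


for name, after_conv, after_pool, ch in trace_shapes():
    print(f"{name}: conv {after_conv}x{after_conv}x{ch} -> pool {after_pool}x{after_pool}x{ch}")
```

Five halvings take 224 down to 7, which is why the first fully connected layer expects 512×7×7 inputs.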
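Structural characteristics:
- 13 convolutional layers + 3 fully connected layers in total
- Every convolutional layer uses a 3×3 kernel with stride 1 and padding=1, so spatial resolution is unchanged within a block
- The fully connected layers hold about 89% of the parameters (roughly 124M of 138M), so they dominate model size; most of the computation is in the convolutional layers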
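The share of parameters held by the fully connected layers can be checked by hand. The following is a minimal sketch in plain Python, assuming the standard VGG16 layout (3×3 convolutions with bias, a 512×7×7 flattened input, 1000 output classes); the function names are illustrative only.

```python
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, kernel=3):
    """Weights plus biases of all convolutional layers in cfg."""
    total = 0
    for v in cfg:
        if v == "M":  # pooling layers have no parameters
            continue
        total += kernel * kernel * in_channels * v + v
        in_channels = v
    return total


def fc_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))


conv_total = conv_params(VGG16_CFG)                    # 14,714,688
fc_total = fc_params([512 * 7 * 7, 4096, 4096, 1000])  # 123,642,856
share = fc_total / (conv_total + fc_total)

print(f"conv: {conv_total:,}  fc: {fc_total:,}  fc share: {share:.1%}")
```

The first fully connected layer alone (25088 × 4096 weights) accounts for over 100M parameters, which is the main reason later architectures replaced it with global pooling.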
import torch.nn as nn


def make_layers(cfg: list):  # e.g. [64, 'M', 128, 'M', ...]
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':  # pooling layer
            layers += [nn.MaxPool2d(2, stride=2)]
        else:  # convolutional layer
            layers += [
                nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)
            ]
            in_channels = v
    return nn.Sequential(*layers)


# VGG16 configuration
vgg16_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']


class VGG(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = make_layers(vgg16_cfg)
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))  # adaptive pooling
        self.classifier = nn.Sequential(
            nn.Linear(512*7*7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )
        self._init_weights()  # initialize weights

    def _init_weights(self):
        # He (Kaiming) initialization for the convolution weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.flatten(start_dim=1)  # (N, 512, 7, 7) -> (N, 512*7*7)
        return self.classifier(x)  # logits; softmax is applied by the loss or at inference
- nn.AdaptiveAvgPool2d((7, 7)): compatible with arbitrary input sizes
- vgg16_bn: adds nn.BatchNorm2d after each convolution
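Both points can be sketched in a few lines. The code below is an illustration rather than the torchvision implementation: `make_layers_bn` is a hypothetical variant of the layer builder that inserts `nn.BatchNorm2d` between each convolution and its ReLU, and the configuration is shortened to two blocks so the demo stays light. It then feeds two different input sizes through the adaptive pooling layer.

```python
import torch
import torch.nn as nn


def make_layers_bn(cfg, batch_norm=True):
    """Build Conv -> (BatchNorm) -> ReLU stacks with 'M' marking max pooling."""
    layers = []
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
            if batch_norm:
                layers.append(nn.BatchNorm2d(v))  # normalize each of the v channels
            layers.append(nn.ReLU(inplace=True))
            in_channels = v
    return nn.Sequential(*layers)


short_cfg = [64, 64, "M", 128, 128, "M"]  # assumption: first two blocks only
features = make_layers_bn(short_cfg).eval()
pool = nn.AdaptiveAvgPool2d((7, 7))

shapes = {}
with torch.no_grad():
    for side in (64, 96):  # two different input resolutions
        x = torch.randn(1, 3, side, side)
        shapes[side] = tuple(pool(features(x)).shape)

print(shapes)  # both entries end in 7x7 regardless of the input size
```

Because the pooled map is always 7×7, the first fully connected layer sees a fixed-length input even when the image is not 224×224.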
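import torch
from torchvision import models

# Load the base model (pretrained weights are downloaded automatically).
# torchvision versions before 0.13 use models.vgg16(pretrained=True) instead.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Load the variant with batch normalization
vgg_bn = models.vgg16_bn(weights=models.VGG16_BN_Weights.IMAGENET1K_V1)

# Freeze the feature-extraction layers
for param in vgg.features.parameters():
    param.requires_grad = False

# Replace the last classifier layer to fit the new task
vgg.classifier[6] = nn.Linear(4096, 10)  # 10-class task

# Train only the classifier
optimizer = torch.optim.Adam(vgg.classifier.parameters(), lr=0.001)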
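To see what freezing does during a training step without downloading any weights, the sketch below applies the same recipe to a tiny stand-in network. `TinyVGG` is hypothetical and only mirrors the features / avgpool / classifier layout; the data is random.

```python
import torch
import torch.nn as nn


class TinyVGG(nn.Module):
    """Hypothetical stand-in with the same three-part layout as VGG."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(8 * 7 * 7, 32),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        x = self.avgpool(self.features(x))
        return self.classifier(x.flatten(start_dim=1))


model = TinyVGG()

# Freeze the feature extractor and swap the last layer for a 10-class head
for param in model.features.parameters():
    param.requires_grad = False
model.classifier[3] = nn.Linear(32, 10)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=0.001)
frozen_before = model.features[0].weight.clone()

# One training step on random data
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", n_trainable)
print("frozen conv has grad:", model.features[0].weight.grad is not None)
```

Frozen parameters receive no gradient and are not passed to the optimizer, so only the classifier weights move.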
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),      # resize the shorter side to 256
    transforms.CenterCrop(224),  # standard VGG input size
    transforms.ToTensor(),       # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])  # ImageNet statistics
])
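The last two steps of the pipeline are simple per-channel arithmetic: ToTensor scales 8-bit values to [0, 1], and Normalize computes (x − mean) / std. The plain-Python sketch below reproduces that arithmetic for a single RGB pixel; the helper names are illustrative and not part of torchvision.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Apply the ToTensor scaling and Normalize step to one RGB pixel."""
    scaled = [c / 255.0 for c in rgb_uint8]  # 0..255 -> 0..1
    return [(c - m) / s for c, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


def denormalize_pixel(normalized):
    """Invert normalize_pixel, e.g. to display a preprocessed image."""
    return [round((c * s + m) * 255)
            for c, m, s in zip(normalized, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

After normalization the inputs fall roughly in the range −2.1 to 2.6, which matches the statistics the pretrained weights were trained with.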
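# Inspect the structure of the feature layers
print(vgg.features[0:5])  # prints the layer sequence of Block 1 (indices 0-4)
# Extract intermediate features (up to and including the 3rd pooling layer, index 16)
feature_extractor = nn.Sequential(*list(vgg.features.children())[:17])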
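When choosing a slice end for `vgg.features`, it helps to know where each pooling layer sits. The sketch below derives the indices from the VGG16 configuration in plain Python, assuming the non-BN layout in which every convolution is followed by one ReLU; `layer_names` is a helper invented for this example.

```python
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def layer_names(cfg):
    """Expand the config into the flat layer order used by nn.Sequential."""
    names = []
    for v in cfg:
        if v == "M":
            names.append("pool")
        else:
            names.extend(["conv", "relu"])
    return names


names = layer_names(VGG16_CFG)
pool_indices = [i for i, name in enumerate(names) if name == "pool"]

print(len(names))     # 31 layers in total
print(pool_indices)   # [4, 9, 16, 23, 30]

# Slice end that includes the k-th pooling layer (1-based)
k = 3
slice_end = pool_indices[k - 1] + 1
print(slice_end)      # 17
```

Since Python slices exclude the end index, a slice that should include the third pooling layer ends at 17, not 16.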
Innovation value:
Engineering practice points:
- torchvision pretrained models speed up development
Limitations and improvements:
Official implementation references:
- torchvision.models.vgg source code
- Pretrained weight configuration