transfer

(Repost) Official Transfer Learning Tutorial


Translated and adapted from: http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html#training-the-model

Prerequisite reading: Deep Learning with PyTorch: A 60 Minute Blitz (translation) - Zhihu column


In practice, very few people train an entire convolutional network from scratch (with randomly initialized parameters), because it is hard to obtain a training set of sufficient size and the network easily overfits. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images in 1000 classes) and then use that ConvNet either as an initialization or as a fixed feature extractor.

There are two transfer learning scenarios:

Finetuning the convnet:

Instead of random initialization, we initialize the network with the weights of a pretrained network; the rest of training proceeds as usual. (This approach is usually called finetuning.)

ConvNet as fixed feature extractor:

We freeze the weights of the whole network except the final fully connected layer, randomly re-initialize that layer, and then train, updating only the fully connected layer's parameters. (This amounts to using the pretrained network as a fixed feature extractor. A minimal sketch contrasting the two scenarios follows.)
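Before walking through the full tutorial code, here is a minimal sketch of how the two scenarios differ in code. This is an illustrative summary, not part of the original tutorial; it assumes a torchvision ResNet-18 and two target classes, as in the example below.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

num_classes = 2  # e.g. ants vs. bees in this tutorial

# Scenario 1: finetuning -- replace the head, then optimize *all* parameters.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Scenario 2: fixed feature extractor -- freeze everything first,
# then replace the head (newly constructed modules default to requires_grad=True)
# and optimize only the head's parameters.
model = models.resnet18(pretrained=True)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)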


Required packages:

# License: BSD
# Author: Sasank Chilamkurthy
from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import copy
import os

plt.ion()  # interactive mode

Loading the data

We will use the torchvision and torch.utils.data packages to load the data.

The problem we are solving is training a model to classify ants and bees. We have about 120 training images each for ants and bees, and 75 validation images for each class. Normally this would be a very small dataset to train on from scratch, so we use transfer learning to get better results. (The dataset is a very small subset of ImageNet.)

Download the dataset (linked from the original tutorial page) and extract it into the project directory. (The code below sets up data augmentation and normalization.)
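datasets.ImageFolder expects one sub-directory per class, so after extraction the tree should look roughly like this (file names are illustrative):

hymenoptera_data/
    train/
        ants/xxx.jpg, ...
        bees/yyy.jpg, ...
    val/
        ants/zzz.jpg, ...
        bees/www.jpg, ...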

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'hymenoptera_data'
dsets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
         for x in ['train', 'val']}
dset_loaders = {x: torch.utils.data.DataLoader(dsets[x], batch_size=4,
                                               shuffle=True, num_workers=4)
                for x in ['train', 'val']}
dset_sizes = {x: len(dsets[x]) for x in ['train', 'val']}
dset_classes = dsets['train'].classes

use_gpu = torch.cuda.is_available()
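As a quick sanity check (a hypothetical snippet, not part of the original tutorial), you can print what was just loaded:

print(dset_classes)   # ['ants', 'bees']
print(dset_sizes)     # e.g. {'train': 244, 'val': 153}
print('GPU available:', use_gpu)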

Visualizing a few images

Let's visualize a few training images so as to get a better feel for the data augmentation.

def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean  # undo the Normalize transform before displaying
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

# Get a batch of training data
inputs, classes = next(iter(dset_loaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[dset_classes[x] for x in classes])

[Figure: a grid of augmented training images with their class labels]

Training the model

Now let's write a general function to train a model. Here we will illustrate:

1. Scheduling the learning rate

2. Saving the best model

In the code below, the argument

lr_scheduler(optimizer, epoch)

is a function that modifies the optimizer so that the learning rate follows the schedule we specify.

def train_model(model, criterion, optimizer, lr_scheduler, num_epochs=25):
    since = time.time()

    best_model = model
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                optimizer = lr_scheduler(optimizer, epoch)
                model.train(True)   # Set model to training mode
            else:
                model.train(False)  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for data in dset_loaders[phase]:
                # get the inputs
                inputs, labels = data

                # wrap them in Variable
                if use_gpu:
                    inputs, labels = Variable(inputs.cuda()), \
                        Variable(labels.cuda())
                else:
                    inputs, labels = Variable(inputs), Variable(labels)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)

                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

                # statistics
                running_loss += loss.data[0]
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dset_sizes[phase]
            epoch_acc = running_corrects / dset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model = copy.deepcopy(model)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    return best_model

Learning rate scheduler

Let's create the learning rate scheduler. It decays the learning rate multiplicatively every few epochs.

def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    """Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
    lr = init_lr * (0.1 ** (epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer
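As a side note, more recent PyTorch releases ship an equivalent built-in scheduler. The sketch below (optional, assuming such a version is installed, and that optimizer_ft is the SGD optimizer created further down) reproduces the same decay; note that train_model above expects a callable of the form lr_scheduler(optimizer, epoch), so the loop would need a small change to use it.

from torch.optim import lr_scheduler

# Multiply the learning rate by gamma=0.1 every step_size=7 epochs,
# matching exp_lr_scheduler above.
scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
# Then call scheduler.step() once per epoch inside the training loop.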

Visualizing model predictions

def visualize_model(model, num_images=6):
    images_so_far = 0
    fig = plt.figure()

    for i, data in enumerate(dset_loaders['val']):
        inputs, labels = data
        if use_gpu:
            inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
        else:
            inputs, labels = Variable(inputs), Variable(labels)

        outputs = model(inputs)
        _, preds = torch.max(outputs.data, 1)

        for j in range(inputs.size()[0]):
            images_so_far += 1
            ax = plt.subplot(num_images // 2, 2, images_so_far)
            ax.axis('off')
            ax.set_title('predicted: {}'.format(dset_classes[preds[j]]))
            imshow(inputs.cpu().data[j])

            if images_so_far == num_images:
                return
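One small caveat (my note, not part of the original tutorial): visualize_model does not switch the network to evaluation mode itself, so it is safest to do that explicitly on the model you pass in before calling it:

model.train(False)  # evaluation mode: dropout off, batch-norm uses running statistics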

Finetuning the convnet

Load a pretrained model and reset the final fully connected layer.

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)

if use_gpu:
    model_ft = model_ft.cuda()

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

Train and evaluate

This takes around 15-25 minutes on CPU. On a GPU it takes less than a minute.

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

Output:

Epoch 0/24
----------
LR is set to 0.001
train Loss: 0.1504 Acc: 0.6762
val Loss: 0.0756 Acc: 0.8627
Epoch 1/24
----------
train Loss: 0.1742 Acc: 0.7664
val Loss: 0.1215 Acc: 0.8301
Epoch 2/24
----------
train Loss: 0.1583 Acc: 0.7500
val Loss: 0.1291 Acc: 0.8039
Epoch 3/24
----------
train Loss: 0.1580 Acc: 0.7541
val Loss: 0.0714 Acc: 0.8824
Epoch 4/24
----------
train Loss: 0.1135 Acc: 0.8115
val Loss: 0.1669 Acc: 0.7778
Epoch 5/24
----------
train Loss: 0.1102 Acc: 0.8279
val Loss: 0.0687 Acc: 0.9020
Epoch 6/24
----------
train Loss: 0.0814 Acc: 0.8730
val Loss: 0.0660 Acc: 0.9281
Epoch 7/24
----------
LR is set to 0.0001
train Loss: 0.1015 Acc: 0.8238
val Loss: 0.0612 Acc: 0.9281
Epoch 8/24
----------
train Loss: 0.0848 Acc: 0.8525
val Loss: 0.0614 Acc: 0.9281
Epoch 9/24
----------
train Loss: 0.1072 Acc: 0.8033
val Loss: 0.0620 Acc: 0.9150
Epoch 10/24
----------
train Loss: 0.0570 Acc: 0.9139
val Loss: 0.0616 Acc: 0.9216
Epoch 11/24
----------
train Loss: 0.0781 Acc: 0.8689
val Loss: 0.0657 Acc: 0.9085
Epoch 12/24
----------
train Loss: 0.0800 Acc: 0.8525
val Loss: 0.0595 Acc: 0.9020
Epoch 13/24
----------
train Loss: 0.0724 Acc: 0.8689
val Loss: 0.0536 Acc: 0.9150
Epoch 14/24
----------
LR is set to 1.0000000000000003e-05
train Loss: 0.0778 Acc: 0.8770
val Loss: 0.0519 Acc: 0.9216
Epoch 15/24
----------
train Loss: 0.0574 Acc: 0.9057
val Loss: 0.0577 Acc: 0.9216
Epoch 16/24
----------
train Loss: 0.0701 Acc: 0.8934
val Loss: 0.0631 Acc: 0.9216
Epoch 17/24
----------
train Loss: 0.0970 Acc: 0.8361
val Loss: 0.0536 Acc: 0.9216
Epoch 18/24
----------
train Loss: 0.0639 Acc: 0.8975
val Loss: 0.0655 Acc: 0.9150
Epoch 19/24
----------
train Loss: 0.0699 Acc: 0.9016
val Loss: 0.0494 Acc: 0.9281
Epoch 20/24
----------
train Loss: 0.0765 Acc: 0.8648
val Loss: 0.0540 Acc: 0.9346
Epoch 21/24
----------
LR is set to 1.0000000000000002e-06
train Loss: 0.0854 Acc: 0.8484
val Loss: 0.0562 Acc: 0.9216
Epoch 22/24
----------
train Loss: 0.0822 Acc: 0.8852
val Loss: 0.0513 Acc: 0.9216
Epoch 23/24
----------
train Loss: 0.0561 Acc: 0.9139
val Loss: 0.0634 Acc: 0.9085
Epoch 24/24
----------
train Loss: 0.0645 Acc: 0.8975
val Loss: 0.0507 Acc: 0.9281
Training complete in 1m 3s
Best val Acc: 0.934641

Visualization:

visualize_model(model_ft)

[Figure: validation images with labels predicted by the finetuned model]

ConvNet as fixed feature extractor

Here we need to freeze all of the network except the final layer. We set

requires_grad == False

to freeze the parameters so that the gradients are not computed in backward().
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

if use_gpu:
    model_conv = model_conv.cuda()

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of the final layer are being optimized,
# as opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
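To confirm that only the new layer will be updated, a quick check (an illustrative snippet, not part of the original tutorial) is:

trainable = [name for name, p in model_conv.named_parameters() if p.requires_grad]
print(trainable)   # expected: ['fc.weight', 'fc.bias']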

Train and evaluate

On CPU this takes roughly half the time of the previous scenario, because gradients do not need to be computed for most of the network. The forward pass, however, does still need to be computed for every layer.

model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)

Output:

Epoch 0/24
----------
LR is set to 0.001
train Loss: 0.1510 Acc: 0.6762
val Loss: 0.0555 Acc: 0.9281
Epoch 1/24
----------
train Loss: 0.0988 Acc: 0.8074
val Loss: 0.0454 Acc: 0.9412
Epoch 2/24
----------
train Loss: 0.1206 Acc: 0.7828
val Loss: 0.0967 Acc: 0.9020
Epoch 3/24
----------
train Loss: 0.1620 Acc: 0.7377
val Loss: 0.1390 Acc: 0.8039
Epoch 4/24
----------
train Loss: 0.1440 Acc: 0.7705
val Loss: 0.0607 Acc: 0.9216
Epoch 5/24
----------
train Loss: 0.1181 Acc: 0.7992
val Loss: 0.0762 Acc: 0.8758
Epoch 6/24
----------
train Loss: 0.1536 Acc: 0.7664
val Loss: 0.0454 Acc: 0.9608
Epoch 7/24
----------
LR is set to 0.0001
train Loss: 0.1109 Acc: 0.8074
val Loss: 0.0480 Acc: 0.9346
Epoch 8/24
----------
train Loss: 0.1045 Acc: 0.8115
val Loss: 0.0465 Acc: 0.9477
Epoch 9/24
----------
train Loss: 0.0973 Acc: 0.8238
val Loss: 0.0488 Acc: 0.9477
Epoch 10/24
----------
train Loss: 0.0723 Acc: 0.8730
val Loss: 0.0560 Acc: 0.9281
Epoch 11/24
----------
train Loss: 0.0867 Acc: 0.8525
val Loss: 0.0436 Acc: 0.9542
Epoch 12/24
----------
train Loss: 0.0941 Acc: 0.8443
val Loss: 0.0448 Acc: 0.9412
Epoch 13/24
----------
train Loss: 0.1037 Acc: 0.8074
val Loss: 0.0414 Acc: 0.9542
Epoch 14/24
----------
LR is set to 1.0000000000000003e-05
train Loss: 0.0874 Acc: 0.8320
val Loss: 0.0413 Acc: 0.9477
Epoch 15/24
----------
train Loss: 0.0893 Acc: 0.8484
val Loss: 0.0412 Acc: 0.9412
Epoch 16/24
----------
train Loss: 0.0585 Acc: 0.9098
val Loss: 0.0587 Acc: 0.9085
Epoch 17/24
----------
train Loss: 0.0708 Acc: 0.8770
val Loss: 0.0483 Acc: 0.9346
Epoch 18/24
----------
train Loss: 0.0915 Acc: 0.8361
val Loss: 0.0417 Acc: 0.9542
Epoch 19/24
----------
train Loss: 0.0751 Acc: 0.8648
val Loss: 0.0441 Acc: 0.9477
Epoch 20/24
----------
train Loss: 0.0717 Acc: 0.8852
val Loss: 0.0478 Acc: 0.9412
Epoch 21/24
----------
LR is set to 1.0000000000000002e-06
train Loss: 0.0865 Acc: 0.8279
val Loss: 0.0439 Acc: 0.9608
Epoch 22/24
----------
train Loss: 0.0764 Acc: 0.8443
val Loss: 0.0523 Acc: 0.9346
Epoch 23/24
----------
train Loss: 0.0790 Acc: 0.8648
val Loss: 0.0446 Acc: 0.9477
Epoch 24/24
----------
train Loss: 0.0850 Acc: 0.8566
val Loss: 0.0426 Acc: 0.9477
Training complete in 0m 37s
Best val Acc: 0.960784


Visualization:

visualize_model(model_conv)

plt.ioff()
plt.show()

[Figure: validation images with labels predicted by the fixed-feature-extractor model]