MATLAB Neural Networks in Ten Lectures (3): Deep Networks / CNN

1. What is deep learning?

Deep learning is a branch of machine learning that teaches computers to do what comes
naturally to humans: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. Deep learning is especially suited for image recognition, which is important for solving problems such as facial recognition, motion detection, and many advanced driver assistance technologies such as autonomous
driving, lane detection, pedestrian detection, and autonomous parking.

Deep learning uses neural networks to learn useful representations of features directly from data. Neural networks combine multiple nonlinear processing layers, using simple elements operating in parallel and inspired by biological nervous systems. Deep learning models can achieve state-of-the-art accuracy in object classification, sometimes exceeding human-level performance.  


Many deep learning applications use image files, sometimes millions of them.
To efficiently access many image files for deep learning, MATLAB provides the imageDatastore
function (a short usage sketch follows the list below). Use this function to:

1. Automatically read batches of images for faster processing in machine learning and computer vision applications
2. Import data from image collections that are too large to fit in memory
3. Label your image data automatically based on folder names
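
A minimal sketch of this workflow, assuming the images are organized in subfolders named after their classes (the folder name 'myImages' below is only an illustration):

% Create a datastore over a folder of images; labels are taken from subfolder names
imds = imageDatastore('myImages', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

countEachLabel(imds)       % number of images per class
img = readimage(imds,1);   % images are read from disk on demand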

2. Try Deep Learning in Just 10 Lines

camera = webcam; % Connect to the camera
net = alexnet;   % AlexNet is a pretrained convolutional neural network (CNN) that
                 % has been trained on more than a million images and can classify
                 % images into 1000 object categories
while true
    im = snapshot(camera);        % Take a picture
    image(im);                    % Show the picture
    im = imresize(im,[227 227]);  % Resize the picture for AlexNet
    label = classify(net,im);     % Classify the picture
    title(char(label));           % Show the class label
    drawnow
end  % Ctrl + C to end the program

3. Transfer Learning

    Transfer learning is commonly used in deep learning applications. We can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is much faster and easier than training from scratch. We can quickly make the network learn a new task using a smaller number of training images. The advantage of transfer learning is that the pretrained network has already learned a rich set of features that can be applied to a wide range of other similar tasks (i.e., start from an already-trained network and fine-tune it with our own data).
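
A hedged sketch of this workflow (the folder name, class count, and training options below are illustrative assumptions, and the images are assumed to already be 227-by-227-by-3 as AlexNet expects):

% Start from the pretrained AlexNet and replace its final layers
net = alexnet;
layersTransfer = net.Layers(1:end-3);   % drop the last fully connected, softmax, and output layers
numClasses = 5;                         % assumed number of new classes

layers = [layersTransfer
          fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20)
          softmaxLayer
          classificationLayer];

% Fine-tune on our own labeled images (folder layout as in the imageDatastore example)
imds = imageDatastore('myNewImages','IncludeSubfolders',true,'LabelSource','foldernames');
options = trainingOptions('sgdm','InitialLearnRate',1e-4,'MaxEpochs',6);
netTransfer = trainNetwork(imds,layers,options);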

    3.1 Train Classifiers Using Features Extracted from Pretrained Networks

    Feature extraction allows us to use the power of pretrained networks without investing time and effort into training. Feature extraction can be the fastest way to use deep learning. We extract learned features from a pretrained network, and use those features to train a classifier.
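
For example, a minimal sketch of this idea with AlexNet (imdsTrain is an assumed image datastore whose images already match the 227-by-227-by-3 input size, and 'fc7' is one of AlexNet's later fully connected layers):

% Use a deep layer's activations as off-the-shelf features and train a simple classifier on them
net = alexnet;
featuresTrain = activations(net,imdsTrain,'fc7','OutputAs','rows'); % one feature row per image
classifier = fitcecoc(featuresTrain,imdsTrain.Labels);              % multiclass SVM on the features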

    3.2 Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on the Cloud

    3.3 Constructing Deep Networks Using Autoencoders (learning hidden features)

[X,T] = wine_dataset;
hiddenSize = 10;
autoenc1 = trainAutoencoder(X,hiddenSize,...
            'L2WeightRegularization',0.001,...
            'SparsityRegularization',4,...
            'SparsityProportion',0.05,...
            'DecoderTransferFunction','purelin');
features1 = encode(autoenc1,X); % Extract the features in the hidden layer.

% Train a second autoencoder using the features from the first autoencoder.
% Do not scale the data.
hiddenSize = 10;
autoenc2 = trainAutoencoder(features1,hiddenSize,...
            'L2WeightRegularization',0.001,...
            'SparsityRegularization',4,...
            'SparsityProportion',0.05,...
            'DecoderTransferFunction','purelin',...
            'ScaleData',false);
features2 = encode(autoenc2,features1); % Extract the features in the hidden layer.

% Train a softmax layer for classification using the features, features2, 
% from the second autoencoder, autoenc2.
softnet = trainSoftmaxLayer(features2,T,'LossFunction','crossentropy');

% Stack the encoders and the softmax layer to form a deep network.
deepnet = stack(autoenc1,autoenc2,softnet);

% Train the deep network on the wine data.
deepnet = train(deepnet,X,T);

% Estimate the wine types using the deep network, deepnet.
wine_type = deepnet(X);

% Plot the confusion matrix.
plotconfusion(T,wine_type);

4. Several Classic Deep Networks

1. AlexNet

    AlexNet has learned rich feature representations for a wide range of images. We can apply this rich feature learning to a wide range of image classification problems using transfer learning and feature extraction. The AlexNet model is trained on more than a million images and can classify images into 1000 object categories, which gives it strong classification performance. The training images are a subset of the ImageNet database, which is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). AlexNet won ILSVRC 2012, achieving the highest classification performance that year. AlexNet has 8 layers with learnable weights: 5 convolutional layers and 3 fully connected layers.

2. VGG-16 and VGG-19

      We can use VGG-16 and VGG-19 for classification, transfer learning, and feature extraction. VGG-16 and VGG-19 are both trained on the ILSVRC data set. VGG-16 has 16 layers with learnable weights: 13 convolutional layers and 3 fully connected layers. VGG-19 has 19 layers with learnable weights: 16 convolutional layers and 3 fully connected layers. In both networks, all convolutional layers have filters of size 3-by-3. VGG networks are larger and typically slower than AlexNet, but more accurate on the original ILSVRC data set.

3. GoogLeNet

      GoogLeNet won the ILSVRC in 2014. GoogLeNet is smaller and typically faster than VGG networks, and smaller and more accurate than AlexNet on the original ILSVRC data set. GoogLeNet is 22 layers deep. It has a more complex structure than AlexNet and VGG networks with some layers having inputs from or outputs to multiple layers. However, when performing classification and transfer learning, this more complicated internal structure does not significantly change the way you use the network. Use classify to classify new images and trainNetwork to perform transfer learning.

4. ResNet-50

      The residual connections of ResNets enable training of very deep networks. ResNet-50 is deeper, larger, and slower than GoogLeNet, but more accurate on the original ILSVRC data set. As the name suggests, ResNet-50 is 50 layers deep. Use classify to classify new images. 

5. importCaffeNetwork (Import Networks Trained in Caffe)

      There are many pretrained networks available in the Caffe Model Zoo. Locate and download the desired .prototxt and .caffemodel files and use importCaffeNetwork to import the pretrained network into MATLAB.
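
For example (the file names below are placeholders for the downloaded model definition and weights):

% Import a Caffe model definition (.prototxt) and its weights (.caffemodel)
net = importCaffeNetwork('deploy.prototxt','weights.caffemodel');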

6. importKerasNetwork (Import Networks Trained in Keras)

      We can import the network and weights either from the same HDF5 (.h5) file or from separate HDF5 and JSON (.json) files.
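
For example (file names are placeholders):

% Network architecture and weights in a single HDF5 file
net = importKerasNetwork('model.h5');

% Architecture in JSON, weights in a separate HDF5 file
net = importKerasNetwork('model.json','WeightFile','weights.h5');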

5. Learning about Convolutional Neural Networks

    5.1 Fundamentals of Convolutional Neural Networks


The neurons in each layer of a ConvNet are arranged in a 3-D manner, transforming a 3-D input to a 3-D output. For example, for an image input, the first layer (input layer) holds the images as 3-D inputs, with the dimensions being height, width, and the color channels of the image. The neurons in the first convolutional layer connect to regions of these images and transform them into a 3-D output. The hidden units (neurons) in each layer learn nonlinear combinations of the original inputs, which is called feature extraction. These learned features, also known as activations, from one layer become the inputs for the next layer. Finally, the learned features become the inputs to the classifier or the regression function at the end of the network.

We can concatenate the layers of a convolutional neural network in MATLAB in the following way:

% defining the layers of our network
layers = [imageInputLayer([28 28 1])
          convolution2dLayer(5,20)
          reluLayer
          maxPooling2dLayer(2,'Stride',2)
          fullyConnectedLayer(10)
          softmaxLayer
          classificationLayer];

% specify the training options using the trainingOptions function.
options = trainingOptions('sgdm');

% train the network with training data using the trainNetwork function
convnet = trainNetwork(data,layers,options);

    5.2 Specify Layers of Convolutional Neural Networks

     The first step of creating and training a new convolutional neural network (ConvNet) is to define the network architecture.

      We can define the layers of a convolutional neural network in MATLAB in an array format, for example:

layers = [  imageInputLayer([28 28 1])
            convolution2dLayer(3,16,'Padding',1) % pad the image borders
            batchNormalizationLayer
            reluLayer
            maxPooling2dLayer(2,'Stride',2)
            convolution2dLayer(3,32,'Padding',1)
            batchNormalizationLayer
            reluLayer
            fullyConnectedLayer(10)  % fully connected layer
            softmaxLayer
            classificationLayer ];
      1. Image Input Layer

          The image input layer defines the size of the input images of a convolutional neural network and contains the raw pixel values of the images. We can add an input layer using the imageInputLayer function. Specify the image size using the inputSize argument. The size of an image corresponds to the height, width, and the number of color channels of that image. For example, for a grayscale image the number of channels is 1, and for a color image it is 3.
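
For example (the sizes below are illustrative; 227-by-227-by-3 happens to match AlexNet's input):

inputGray  = imageInputLayer([28 28 1]);    % 28-by-28 grayscale images, 1 channel
inputColor = imageInputLayer([227 227 3]);  % 227-by-227 RGB images, 3 channels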

      2. Convolution Layer

         Filters and Stride: A convolutional layer consists of neurons that connect to subregions of the input images or the outputs of the layer before it. A convolutional layer learns the features localized by these regions while scanning through an image. We can specify the size of these regions using the filterSize input argument when creating the layer with the convolution2dLayer function.

          Suppose that the input image is a 28-by-28-by-3 color image. For a convolutional layer with 16 filters and a filter size of 8-by-8, the number of weights per filter is 8*8*3 = 192, and the total number of parameters in the layer is (192+1)*16 = 3088. Assuming the stride is 4 in each direction and there is no zero padding, each feature map is 6-by-6 ((28 - 8 + 0)/4 + 1 = 6). Then, the total number of neurons in the layer is 6*6*16 = 576.
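
A convolutional layer matching this worked example could be created as follows (a standalone sketch, not tied to any particular network):

% 16 filters of size 8-by-8, stride 4 in each direction, no zero padding
layer = convolution2dLayer(8,16,'Stride',4,'Padding',0);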

          Learning Parameters: We can also adjust the learning rates and regularization parameters for this layer using the related name-value pair arguments when defining the convolutional layer. If we choose not to adjust them, trainNetwork uses the global training parameters defined by the trainingOptions function.

    3. Batch Normalization Layer

        Use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed up network training and reduce the sensitivity to network initialization. The layer first normalizes the activations of each channel by subtracting the mini-batch mean and dividing by the mini-batch standard deviation. Then, the layer shifts the input by an offset β and scales it by a scale factor γ. β and γ are themselves learnable parameters that are updated during network training. Create a batch normalization layer using batchNormalizationLayer.

      Batch normalization layers normalize the activations and gradients propagating through a neural network, making network training an easier optimization problem. To take full advantage of this fact, we can try increasing the learning rate. Since the optimization problem is easier, the parameter updates can be larger and the network can learn faster. We can also try reducing the L2 and dropout regularization. With batch normalization layers, the activations of a specific image are not deterministic, but instead depend on which images happen to appear in the same mini-batch. To take full advantage of this regularizing effect, try shuffling the training data before every training epoch. To specify how often to shuffle the data during training, use the 'Shuffle' name-value pair argument of trainingOptions.
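
A minimal sketch of training options along these lines (the specific learning rate is an arbitrary illustration):

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.05, ...     % batch normalization usually tolerates a larger learning rate
    'Shuffle','every-epoch');        % reshuffle the training data before each epoch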

      4. ReLU Layer

          Convolutional and batch normalization layers are usually followed by a nonlinear activation function such as a rectified linear unit (ReLU), specified by a ReLU layer. Create a ReLU layer using the reluLayer function. A ReLU layer performs a threshold operation on each element, where any input value less than zero is set to zero, that is, f(x) = x for x >= 0 and f(x) = 0 for x < 0 (equivalently, f(x) = max(x, 0)).
      5. Cross Channel Normalization (Local Response Normalization) Layer

          This layer performs a channel-wise local response normalization. It usually follows the ReLU activation layer. Create this layer using the crossChannelNormalizationLayer function. This layer replaces each element with a normalized value it obtains using the elements from a certain number of neighboring channels (elements in the normalization window).
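
For example:

% Normalize each element over a window of 5 neighboring channels
layer = crossChannelNormalizationLayer(5);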

      6. Max- and Average-Pooling Layers

     Max- and average-pooling layers follow the convolutional layers for down-sampling, hence reducing the number of connections to the following layers (usually a fully connected layer). They do not perform any learning themselves, but reduce the number of parameters to be learned in the following layers. They also help reduce overfitting. Create these layers using the maxPooling2dLayer and averagePooling2dLayer functions.

        A max-pooling layer returns the maximum values of rectangular regions of its input. The size of the rectangular regions is determined by the poolSize argument of maxPooling2dLayer. For example, if poolSize equals [2,3], then the layer returns the maximum value in regions of height 2 and width 3.

         The maxPooling2dLayer and averagePooling2dLayer functions scan through the input horizontally and vertically in step sizes we can specify using the 'Stride' name-value pair argument of either function. If the poolSize is smaller than or equal to the Stride, then the pooling regions do not overlap.
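
For example, matching the [2,3] pooling regions mentioned above:

% 2-by-3 pooling regions, moved in steps of 2 pixels horizontally and vertically
layer = maxPooling2dLayer([2 3],'Stride',2);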

      7. Dropout Layer

          A dropout layer randomly sets the layer’s input elements to zero with a given probability. Create a dropout layer using the dropoutLayer function.

      8. Fully Connected Layer

         The convolutional (and down-sampling) layers are followed by one or more fully connected layers. Create a fully connected layer using the fullyConnectedLayer function.

      As the name suggests, all neurons in a fully connected layer connect to all the neurons in the previous layer. This layer combines all of the features (local information) learned by the previous layers across the image to identify the larger patterns. For classification problems, the last fully connected layer combines the features to classify the images. This is the reason that the outputSize argument of the last fully connected layer of the network is equal to the number of classes of the data set. For regression problems, the output size must be equal to the number of response variables.

      9. Output Layers

         For classification problems, a softmax layer and then a classification layer must follow the final fully connected layer. We can create these layers using the softmaxLayer and classificationLayer functions, respectively.


The output unit activation function is the softmax function: y_r(x) = exp(a_r(x)) / Σ_j exp(a_j(x)), where 0 ≤ y_r ≤ 1 and the outputs sum to 1, so they can be interpreted as class probabilities.

      10. Regression Layer

        We can also use ConvNets for regression problems, where the target (output) variable is continuous. In such cases, a regression output layer must follow the final fully connected layer. We can create a regression layer using the regressionLayer function. 
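
A minimal sketch of the end of a regression network (assuming a single continuous response variable):

layersEnd = [fullyConnectedLayer(1)   % one output for the single response
             regressionLayer];        % uses mean squared error by default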


The default loss function for a regression layer is the mean squared error: MSE = (1/R) Σ_{i=1}^{R} (t_i - y_i)^2, where R is the number of responses, t_i is the target value, and y_i is the network's prediction for response i.
