Beginner's Guide to Building Artificial Neural Networks using Keras in Python
Why Keras, not TensorFlow?
If you are asking, "Should I use Keras OR TensorFlow?", you are asking the wrong question.
When I first started my deep-learning journey, I kept thinking these two are completely separate entities. Well, as of mid-2017, they are not! Keras, a neural network API, is now fully integrated within TensorFlow. What does that mean?
It means you have a choice between using the high-level Keras API, or the low-level TensorFlow API. High-level APIs provide more functionality within a single command and are easier to use (in comparison with low-level APIs), which makes them usable even for non-tech people. The low-level APIs allow the advanced programmer to manipulate functions within a module at a very granular level, thus allowing custom implementation for novel solutions.
Note: For the purpose of this tutorial, we will be using Keras only!
Let's dive right into the coding
We begin by installing Keras onto our machine. As I said before, Keras is integrated within TensorFlow, so all you have to do is run pip install tensorflow in your terminal (for macOS) to access Keras in your Jupyter notebook.
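If you want to confirm the install worked, a quick sanity check like this (not part of the original walkthrough) should run without errors:

import tensorflow as tf
from tensorflow import keras

print(tf.__version__)  # confirms TensorFlow (and thus Keras) is available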
Dataset
We will be working with a loan-application dataset. It has two predictor features, a continuous variable age and a categorical variable area (rural vs. urban), and one binary outcome variable application_outcome, which can take values 0 (approved) or 1 (rejected).
import pandas as pd

df = pd.read_csv('loan.csv')[['age', 'area', 'application_outcome']]
df.head()
Preprocessing the data
We will be scaling age between 0 and 1 using MinMaxScaler, and label encoding the area and application_outcome features using LabelEncoder from the Sklearn toolkit. We are doing this so we can bring all the input features onto the same scale, which helps the network train smoothly.
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from itertools import chain

# Scaling the Age column
scaler = MinMaxScaler(feature_range=(0, 1))
a = scaler.fit_transform(df.age.values.reshape(-1, 1))
x1 = list(chain(*a))

# Encoding the Area, Application Outcome columns
le = LabelEncoder()
x2 = le.fit_transform(df.area.values)
y = le.fit_transform(df.application_outcome)

# Updating the df
df.age = x1
df.area = x2
df.application_outcome = y

df.head()
If you read the Keras documentation, you will see that it requires the input data to be of type NumPy array. So that is what we are going to do now!
scaled_train_samples = df[['age', 'area']].values
train_labels = df.application_outcome.values

type(scaled_train_samples)  # numpy.ndarray
Generating the model architecture
There are two ways to build Keras models: sequential (most basic one) and functional (for complex networks).
We will be creating a Sequential model which is a linear stack of layers. That is, the sequential API allows you to create models layer-by-layer. It is great for developing deep learning models in most cases.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Model architecture
model_m = Sequential([
    Dense(units=8, input_shape=(2,), activation='relu'),
    Dense(units=16, activation='relu'),
    Dense(units=2, activation='softmax')
])
Here, the first dense layer is actually the second layer overall (because the actual first layer is the input layer fed from our original data) but the first "hidden" layer. It has 8 units/neurons/nodes, and the choice of 8 is arbitrary!
The input_shape parameter is something you must assign based on your dataset. Intuitively speaking, it is the shape of the input data that the network should expect. I like to think of it as: "what is the shape of a single row of data that I am feeding into the neural network?"
In our case, a single row of the input looks like [0.914, 0]. That is, it is 1-dimensional. Thus, the input_shape parameter will be the tuple (2, ), where 2 refers to the number of features in your dataset (age and area). The input layer would thus expect a one-dimensional array with 2 elements as input. It would produce 8 outputs in return.
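As a quick sketch (the printed values are illustrative), you can verify this directly on the arrays we built earlier:

print(scaled_train_samples.shape)  # (n_samples, 2): one row per loan application
print(scaled_train_samples[0])     # e.g. array([0.914, 0.]): a single 1-D input row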
If we were dealing with, say, black-and-white 2x3 pixel images (as we will in our next tutorial on Convolutional Neural Networks), we would see that a single row of the input (or the vector representation of a single image) looks like [[0, 1, 0], [0, 0, 1]], where 0 means the pixel is bright and 1 means the pixel is dark. That is, it is 2-dimensional. Subsequently, the input_shape parameter will be equal to (2, 3).
Note: In our case, our input shape has only one dimension, so you don't necessarily need to give it as a tuple. Instead, you can give input_dim as a scalar number. So, in our model, where our input layer has 2 elements, we can use either of these two:
input_shape=(2,) (the comma is necessary when you have only one dimension)
input_dim=2
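A minimal sketch showing the two declarations are interchangeable (the model names here are illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Both first layers expect rows with 2 features and produce 8 outputs
m1 = Sequential([Dense(units=8, input_shape=(2,), activation='relu')])
m2 = Sequential([Dense(units=8, input_dim=2, activation='relu')])

print(m1.layers[0].get_weights()[0].shape)  # (2, 8)
print(m2.layers[0].get_weights()[0].shape)  # (2, 8)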
A popular misconception surrounding the input shape parameter is that it must include the total number of input samples that we are feeding to our neural network (10,000 in our case).
The number of rows in your training data is not part of the input shape of the network, because the training process feeds the network one batch at a time (that is, batch_size samples per step).
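To make this concrete, here is a small sketch (the numbers assume the 10,000-row dataset mentioned above) of how the sample count relates to batching rather than to input_shape:

n_samples = scaled_train_samples.shape[0]  # e.g. 10000 rows
batch_size = 10
steps_per_epoch = n_samples // batch_size  # 1000 batches per epoch, each of shape (10, 2)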
The second "hidden" layer is another dense layer with the same activation function as the first hidden layer, i.e. 'relu'. An activation function ensures the values that are passed on lie within a tunable, expected range. The Rectified Linear Unit (or relu) function returns the value provided as input directly, or 0.0 if the input is 0.0 or less.
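The relu function is simple enough to sketch in a couple of lines of NumPy (this is an illustration of the math, not how Keras implements it internally):

import numpy as np

def relu(x):
    # returns x for positive inputs, 0.0 otherwise
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]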
You might be wondering why we didn't specify the input_shape parameter for this layer. After all, Keras needs to know the shape of a layer's inputs in order to be able to create its weights. The truth is,
There is no need to specify the input_shape parameter for the second (or any subsequent) hidden layer, as Keras automatically infers the number of input nodes from the architecture (i.e. the units and particularities of the previous layer).
Finally, the third and last layer in our sequential model (the output layer) is another dense layer with a softmax activation function. The softmax function returns the output probabilities for both classes: approved (output = 0) and rejected (output = 1).
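Softmax can likewise be sketched in NumPy (again, an illustration rather than the Keras internals); note that the outputs always sum to 1, which is why they can be read as class probabilities:

import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0])))  # approximately [0.731 0.269], sums to 1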
This is what the model summary looks like:
model_m.summary()
Preparing the model for training
from tensorflow.keras.optimizers import Adam

model_m.compile(optimizer=Adam(learning_rate=0.0001),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
Before we start training our model with actual data, we must compile the model with certain parameters. Here, we will be using the Adam optimizer.
Available choices of optimizers include SGD, Adadelta, Adagrad, etc.
The loss parameter specifies that the cross-entropy loss should be monitored at each iteration. The metrics parameter indicates that we want to judge our model based on accuracy.
Training and validating the model
# Training the model
model_m.fit(x=scaled_train_samples,
            y=train_labels,
            batch_size=10,
            epochs=30,
            validation_split=0.1,
            shuffle=True,
            verbose=2)
The x and y parameters are pretty intuitive: NumPy arrays of the predictor and outcome variables, respectively. batch_size specifies how many samples are included in one batch. epochs=30 means the model is going to train on all of the data 30 times. verbose=2 controls how much is printed during training; at this setting, Keras prints one summary line per epoch.
We create a validation set on the fly using validation_split=0.1, i.e. reserving 10% of the training data during each epoch and holding it out of training. This helps check the generalizability of our model: by taking a subset of the training set, the model learns only from the training data but is evaluated on the validation data.
Keep in mind that the validation split occurs BEFORE the training set is shuffled, i.e. only the training set is shuffled AFTER the validation set has been taken out. If you had all the rejected loan applications at the end of the dataset, it could mean your validation set misrepresents the classes. So you MUST shuffle the data yourself rather than relying on Keras to do it for you!
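A minimal way to do that shuffling yourself, using Sklearn's shuffle helper (one option among several):

from sklearn.utils import shuffle

# shuffle samples and labels together BEFORE fit(), so the 10% validation
# split is drawn from a well-mixed dataset
scaled_train_samples, train_labels = shuffle(scaled_train_samples, train_labels)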
This is what the first five epochs look like:
This is what the last five epochs look like:
As you can see, we started with a high loss (0.66) and low accuracy (0.57) on the validation set during the first epoch. Gradually, we were able to decrease the loss (0.24) and improve the accuracy (0.93) on the validation set by the last epoch.
Making inferences on the test set
We preprocess the previously unseen test set in a manner similar to the training set and save it in scaled_test_samples. The corresponding labels are stored in test_labels.
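A sketch of what that preprocessing might look like, assuming a hypothetical loan_test.csv file and reusing the scaler fitted on the training data (the categorical columns are re-encoded here to mirror the training steps above; in production you would instead reuse encoders fitted on the training data):

test_df = pd.read_csv('loan_test.csv')[['age', 'area', 'application_outcome']]

# reuse the fitted MinMaxScaler for age; re-encode the categorical columns
test_df.age = list(chain(*scaler.transform(test_df.age.values.reshape(-1, 1))))
test_df.area = le.fit_transform(test_df.area.values)

scaled_test_samples = test_df[['age', 'area']].values
test_labels = le.fit_transform(test_df.application_outcome)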
predictions = model_m.predict(x=scaled_test_samples,
                              batch_size=10,
                              verbose=0)
Make sure to pick the exact same batch_size as used in the training process.
Since our last layer had a softmax activation function, the predictions include the output probabilities for both classes (on the left we have the probability of class 0, i.e. approved, and on the right, class 1, i.e. rejected).
There are a couple of ways to proceed from here. You could choose an arbitrary threshold value, say 0.7, and only if the probability of class 0 (i.e. approved) exceeds 0.7, should you choose to approve the loan application. Alternatively, you could pick the class with the highest probability as the final prediction. For instance, based on the above screenshot the model predicts a loan will be approved with a 2% probability but will be rejected with a 97% probability. Thus, the final inference should be that person’s loan is rejected. We will be doing the latter.
import numpy as np

# get the index of the prediction with the highest probability
rounded_pred = np.argmax(predictions, axis=1)
rounded_pred
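If you instead wanted the arbitrary-threshold approach described above, a sketch could look like this (0.7 is an illustrative cutoff, not a tuned value):

# approve (class 0) only when its probability exceeds 0.7; otherwise reject (class 1)
threshold_pred = np.where(predictions[:, 0] > 0.7, 0, 1)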
Saving and Loading a Keras model
To save everything from the trained model:
model_m.save('models/LoanOutcome_model.h5')
We have essentially saved EVERYTHING from our trained model:
1. the architecture (layers, number of neurons, etc.)
2. the weights learned
3. the training configuration (optimizer, loss)
4. the state of the optimizer (allows for easy retraining)
To load the model we just saved:
from tensorflow.keras.models import load_model
new_model = load_model('models/LoanOutcome_model.h5')
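Once loaded, the model is immediately usable; a quick inspection (not in the original walkthrough) confirms everything came back:

new_model.summary()                  # same architecture as model_m
print(len(new_model.get_weights()))  # same learned weights
print(new_model.optimizer)           # optimizer (with its state) restored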
To save only the architecture:
json_string = model_m.to_json()
To reconstruct a new model from previously-stored architecture:
from tensorflow.keras.models import model_from_json
model_architecture = model_from_json(json_string)
To save only the weights:
model_m.save_weights('weights/LoanOutcome_weights.h5')
To use the weights with another model (note that the new model's layer shapes must match those of the model that saved the weights, or load_weights will fail):
# the layer shapes here match model_m, as required by load_weights
model2 = Sequential([
    Dense(units=8, input_shape=(2,), activation='relu'),
    Dense(units=16, activation='relu'),
    Dense(units=2, activation='softmax')
])

# retrieving the saved weights
model2.load_weights('weights/LoanOutcome_weights.h5')
And there we have it. We have successfully built our first ANN, trained, validated, and tested it, and saved it for future use. In the next post, we will work our way through a Convolutional Neural Network (CNN) to tackle an image classification task.
Until then :)
Translated from: https://towardsdatascience.com/beginners-guide-to-building-artificial-neural-networks-using-keras-in-python-bdc4989dab00