Deep Learning: Building a Text Classification Model with TensorFlow 2.0

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import tensorflow as tf

from tensorflow import keras

print(tf.__version__)

2.0.0-alpha0

imdb = keras.datasets.imdb
vocab_size = 10000
index_from = 3
# num_words: maximum size of the vocabulary; index_from: real words are indexed starting from this offset (defaults to 3)
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = vocab_size, index_from = index_from)
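
A quick sanity check (a minimal sketch; the printed values assume the standard IMDB 25000/25000 split):

# Each review is still a variable-length list of word ids at this point.
print(train_data.shape, train_labels.shape)    # expected: (25000,) (25000,)
print(len(train_data[0]), len(train_data[1]))  # review lengths differ
print(train_labels[:5])                        # labels are 0 (negative) / 1 (positive)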

word_index = imdb.get_word_index()
print(len(word_index)) 

88584
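
The raw index maps each word to an id starting at 1, in order of frequency, which is why the +3 shift below is needed to free ids 0-3 for special tokens (a quick illustration):

print(word_index['the'])  # 1: the most frequent word, before the shift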

word_index = {k:(v+3) for k, v in word_index.items()}
word_index['<PAD>'] = 0      # padding token
word_index['<START>'] = 1    # start-of-sequence marker
word_index['<UNK>'] = 2      # unknown (out-of-vocabulary) word
word_index['<UNUSED>'] = 3

reverse_word_index = dict([(value, key) for key, value in word_index.items()])

def decode_review(text_ids):
    return ' '.join([reverse_word_index.get(word_id, "<UNK>") for word_id in text_ids])
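
A usage sketch (the exact decoded text depends on the sample; train_data here is still in its un-padded list form):

print(decode_review(train_data[0]))  # reconstructs the first review as text
print(train_labels[0])               # 1 = positive, 0 = negative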

max_length = 500

# keras.preprocessing.sequence.pad_sequences: the inputs are variable-length, so padding is used to align them
train_data = keras.preprocessing.sequence.pad_sequences(
    train_data, # list of list
    value = word_index['<PAD>'],  # value used for the padding positions
    padding = 'post',             # 'post' pads at the end of the sentence, 'pre' pads at the beginning
    maxlen = max_length)

test_data = keras.preprocessing.sequence.pad_sequences(
    test_data, # list of list
    value = word_index['<PAD>'],
    padding = 'post', # post, pre
    maxlen = max_length)
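
After padding, every review is exactly max_length ids long, so the dataset becomes a regular 2-D array (a quick check; shapes assume the standard split):

print(train_data.shape)  # expected: (25000, 500)
print(test_data.shape)   # expected: (25000, 500)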

embedding_dim = 16
batch_size = 512

bi_rnn_model = keras.models.Sequential([
    keras.layers.Embedding(vocab_size, embedding_dim, input_length = max_length),
    keras.layers.Bidirectional(keras.layers.LSTM(units = 32, return_sequences = False)),
    keras.layers.Dense(32, activation = 'relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])

bi_rnn_model.summary()
# binary_crossentropy: cross-entropy loss for binary classification
bi_rnn_model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, 500, 16)           160000    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 64)                12544     
_________________________________________________________________
dense_4 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 33        
=================================================================
Total params: 174,657
Trainable params: 174,657
Non-trainable params: 0
_________________________________________________________________
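
The parameter counts in the summary can be verified by hand; Keras counts an LSTM as 4 gates, each with input weights, recurrent weights, and a bias, and the Bidirectional wrapper doubles that:

# Embedding:      vocab_size * embedding_dim = 10000 * 16           = 160,000
# Bidirectional:  2 * 4 * (units*(input_dim + units) + units)
#                 = 2 * 4 * (32*(16 + 32) + 32)                     = 12,544
# Dense(32):      concatenated fwd+bwd state is 64-d: 64*32 + 32    = 2,080
# Dense(1):       32*1 + 1                                          = 33
#                 Total                                             = 174,657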

history = bi_rnn_model.fit(train_data, train_labels, epochs = 30, batch_size = batch_size, validation_split = 0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/30
20000/20000 [==============================] - 6s 307us/sample - loss: 0.6858 - accuracy: 0.5631 - val_loss: 0.6281 - val_accuracy: 0.6702
Epoch 2/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.4969 - accuracy: 0.7776 - val_loss: 0.4603 - val_accuracy: 0.7960
Epoch 3/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.3552 - accuracy: 0.8547 - val_loss: 0.3183 - val_accuracy: 0.8732
Epoch 4/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.2202 - accuracy: 0.9201 - val_loss: 0.2940 - val_accuracy: 0.8834
Epoch 5/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.1672 - accuracy: 0.9452 - val_loss: 0.2979 - val_accuracy: 0.8882
Epoch 6/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.1301 - accuracy: 0.9603 - val_loss: 0.3593 - val_accuracy: 0.8812
Epoch 7/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.1052 - accuracy: 0.9693 - val_loss: 0.3746 - val_accuracy: 0.8770
Epoch 8/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0886 - accuracy: 0.9754 - val_loss: 0.3826 - val_accuracy: 0.8736
Epoch 9/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0824 - accuracy: 0.9772 - val_loss: 0.3935 - val_accuracy: 0.8826
Epoch 10/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0675 - accuracy: 0.9822 - val_loss: 0.4086 - val_accuracy: 0.8726
Epoch 11/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.0873 - accuracy: 0.9715 - val_loss: 0.5023 - val_accuracy: 0.8718
Epoch 12/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.0613 - accuracy: 0.9829 - val_loss: 0.4441 - val_accuracy: 0.8796
Epoch 13/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0534 - accuracy: 0.9857 - val_loss: 0.4314 - val_accuracy: 0.8684
Epoch 14/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.0512 - accuracy: 0.9864 - val_loss: 0.5086 - val_accuracy: 0.8728
Epoch 15/30
20000/20000 [==============================] - 6s 281us/sample - loss: 0.0375 - accuracy: 0.9920 - val_loss: 0.5097 - val_accuracy: 0.8728
Epoch 16/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0384 - accuracy: 0.9915 - val_loss: 0.5945 - val_accuracy: 0.8666
Epoch 17/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0331 - accuracy: 0.9929 - val_loss: 0.5916 - val_accuracy: 0.8714
Epoch 18/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0263 - accuracy: 0.9955 - val_loss: 0.6204 - val_accuracy: 0.8724
Epoch 19/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0234 - accuracy: 0.9961 - val_loss: 0.5929 - val_accuracy: 0.8722
Epoch 20/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0193 - accuracy: 0.9972 - val_loss: 0.6174 - val_accuracy: 0.8706
Epoch 21/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0173 - accuracy: 0.9976 - val_loss: 0.6405 - val_accuracy: 0.8696
Epoch 22/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0157 - accuracy: 0.9977 - val_loss: 0.6853 - val_accuracy: 0.8682
Epoch 23/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0161 - accuracy: 0.9973 - val_loss: 0.7232 - val_accuracy: 0.8564
Epoch 24/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0751 - accuracy: 0.9758 - val_loss: 0.4803 - val_accuracy: 0.8574
Epoch 25/30
20000/20000 [==============================] - 6s 279us/sample - loss: 0.0354 - accuracy: 0.9898 - val_loss: 0.6191 - val_accuracy: 0.8686
Epoch 26/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0196 - accuracy: 0.9955 - val_loss: 0.6793 - val_accuracy: 0.8628
Epoch 27/30
20000/20000 [==============================] - 6s 283us/sample - loss: 0.0150 - accuracy: 0.9974 - val_loss: 0.7201 - val_accuracy: 0.8654
Epoch 28/30
20000/20000 [==============================] - 6s 287us/sample - loss: 0.0137 - accuracy: 0.9974 - val_loss: 0.7434 - val_accuracy: 0.8648
Epoch 29/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.0232 - accuracy: 0.9936 - val_loss: 0.9976 - val_accuracy: 0.8578
Epoch 30/30
20000/20000 [==============================] - 6s 280us/sample - loss: 0.1400 - accuracy: 0.9535 - val_loss: 0.4958 - val_accuracy: 0.8592
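
The log shows classic overfitting: training accuracy climbs toward 1.0 while the validation loss bottoms out around epoch 4-5 and then rises. Plotting the history makes this obvious (a small helper sketch using the pandas/matplotlib imports from the top of the notebook):

def plot_learning_curves(history, label, epochs, min_value, max_value):
    # Plot a training metric and its validation counterpart together.
    data = {}
    data[label] = history.history[label]
    data['val_' + label] = history.history['val_' + label]
    pd.DataFrame(data).plot(figsize = (8, 5))
    plt.grid(True)
    plt.axis([0, epochs, min_value, max_value])
    plt.show()

plot_learning_curves(history, 'accuracy', 30, 0, 1)
plot_learning_curves(history, 'loss', 30, 0, 1)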

bi_rnn_model.evaluate(test_data, test_labels, batch_size = batch_size)

25000/25000 [==============================] - 2s 99us/sample - loss: 0.5320 - accuracy: 0.8460

[0.5319758009338379, 0.846]
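
A standard remedy for the overfitting seen above (a hedged sketch, not part of the original run; it would need a freshly built model to start training from scratch) is to stop once the validation loss stops improving:

# Stop after val_loss fails to improve for 3 consecutive epochs and
# roll back to the weights from the best epoch.
early_stopping = keras.callbacks.EarlyStopping(
    monitor = 'val_loss', patience = 3, restore_best_weights = True)

history = bi_rnn_model.fit(
    train_data, train_labels,
    epochs = 30, batch_size = batch_size,
    validation_split = 0.2,
    callbacks = [early_stopping])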
