Neural Networks for Machine Learning: Programming Assignment 2: Learning Word Representations

Warning: The hard deadline has passed. You can attempt it, but you will not get credit for it. You are welcome to try it as a learning exercise.

We are now ready to start using neural nets for solving real problems!

In this assignment we will design a neural net language model. The model will learn to predict the next word given the previous three words. The network looks like this:

[Figure 1: the neural net language model, with an embedding layer for the three context words, a hidden layer, and a softmax output over the vocabulary]
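Before opening the starter code, it may help to see what forward pass the figure implies. Below is a minimal Octave sketch; the variable names, shapes, and the choice of logistic hidden units are my assumptions for illustration — fprop.m and the README define the real ones.

```
% Minimal Octave sketch of the forward pass (assumed names and shapes;
% the real ones are defined in fprop.m and the README).
% V = vocabulary size, N = 3 context words, D = embedding size,
% H = hidden units, B = mini-batch size. input_batch is N x B (word indices).
function output_probs = fprop_sketch(input_batch, ...
    word_embedding_weights, ...  % V x D
    embed_to_hid_weights, ...    % (N*D) x H
    hid_to_output_weights, ...   % H x V
    hid_bias, output_bias)       % H x 1 and V x 1

  batchsize = size(input_batch, 2);

  % Embedding layer: look up each context word's row and stack the
  % three embeddings into one (N*D) x B matrix.
  embedding_layer_state = reshape( ...
      word_embedding_weights(reshape(input_batch, 1, []), :)', ...
      [], batchsize);

  % Hidden layer: logistic units (an assumption; check fprop.m).
  inputs_to_hid = embed_to_hid_weights' * embedding_layer_state ...
      + repmat(hid_bias, 1, batchsize);
  hidden_layer_state = 1 ./ (1 + exp(-inputs_to_hid));

  % Output layer: softmax over the vocabulary, with the usual
  % max-subtraction for numerical stability.
  inputs_to_softmax = hid_to_output_weights' * hidden_layer_state ...
      + repmat(output_bias, 1, batchsize);
  inputs_to_softmax = bsxfun(@minus, inputs_to_softmax, max(inputs_to_softmax));
  output_probs = exp(inputs_to_softmax);
  output_probs = bsxfun(@rdivide, output_probs, sum(output_probs));
end
```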

To get started, download one of the following archives:
assignment2.tar.gz
or
assignment2.zip
or download each file individually:

  • README.txt
  • train.m
  • data.mat
  • raw_sentences.txt
  • fprop.m
  • word_distance.m
  • display_nearest_word.m
  • predict_next_word.m
  • load_data.m

The starter code implements a basic framework for training neural nets with mini-batch gradient descent. Your job is to write the code that completes the forward and backward propagation. See the README file for a description of the dataset and the starter code, and for instructions on how to run it.
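For reference, the core of one mini-batch step with momentum looks like the sketch below. Every name and default value here is a placeholder assumption (the questions below vary a learning rate around 0.1 and a momentum around 0.9, but check train.m for the actual defaults and variable names).

```
% One mini-batch parameter update with momentum (illustrative sketch).
% train.m keeps one such momentum buffer per weight matrix.
learning_rate = 0.1;   % assumed default; see train.m
momentum      = 0.9;   % assumed default; Question 8 varies this
batchsize     = 100;   % assumed default mini-batch size

W       = randn(250, 50);   % e.g. the word embedding weights
W_delta = zeros(size(W));   % running momentum buffer
W_grad  = randn(size(W));   % gradient from backprop (dummy values here)

% Momentum folds a decaying average of past gradients into each step.
W_delta = momentum * W_delta + W_grad / batchsize;
W       = W - learning_rate * W_delta;
```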
This sample_output shows you what output to expect once everything is implemented correctly. 
Once you have implemented the required code and have the model running, answer the following questions. 
Happy coding!

In accordance with the Coursera Honor Code, I (刘欣欣) certify that the answers here are my own work.

Question 1

Train a model with a 50-dimensional embedding space, a 200-dimensional hidden layer, and the default settings for all other hyperparameters. What is the average training set cross entropy reported by the training program after 10 epochs? Please provide a numeric answer (three decimal places). [4 points]
Answer for Question 1

Question 2

Train a model for 10 epochs with a 50-dimensional embedding space, a 200-dimensional hidden layer, a learning rate of 0.0001, and the default settings for all other hyperparameters. What do you observe? [3 points]
Cross Entropy on the training and validation set decreases very rapidly.
Cross Entropy on the training and validation set decreases very slowly.
Cross Entropy on the validation set fluctuates wildly and eventually diverges.
Cross Entropy on the training set fluctuates wildly and eventually diverges.

Question 3

If all weights and biases in this network were set to zero and no training were performed, what would be the average cross entropy on the training set? Please provide a numeric answer (three decimal places). [3 points]
Answer for Question 3
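A sanity check you can do without any training: with every weight and bias at zero, all softmax inputs are equal, so the network predicts a uniform distribution over the vocabulary. Assuming the 250-word vocabulary described in the README, and assuming the training program reports cross entropy in nats (natural log), the value is:

```
% A uniform softmax over a V-word vocabulary gives cross entropy
% -log(1/V) = log(V) for every training case.
V = 250;      % assumed vocabulary size; see the README
ce = log(V)   % natural log; prints ce = 5.5215
```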

Question 4

Train three models, each with a 50-dimensional embedding space and a 200-dimensional hidden layer.
  • Model A: Learning rate = 0.001
  • Model B: Learning rate = 0.1
  • Model C: Learning rate = 10.0
Use the default settings for all other hyperparameters. Which model gives the lowest training set cross entropy after 1 epoch? [3 points]
Model A
Model C
Model B

Question 5

In the models trained in Question 4, which one gives the lowest training set cross entropy after 10 epochs? [2 points]
Model A
Model B
Model C

Question 6

Train each of the following models:
  • Model A: 5-dimensional embedding, 100-dimensional hidden layer
  • Model B: 50-dimensional embedding, 10-dimensional hidden layer
  • Model C: 50-dimensional embedding, 200-dimensional hidden layer
  • Model D: 100-dimensional embedding, 5-dimensional hidden layer
Use default values for all other hyperparameters. Which model gives the best training set cross entropy after 10 epochs of training? [3 points]
Model A
Model C
Model B
Model D

Question 7

In the models trained in Question 6, which one gives the best validation set cross entropy after 10 epochs of training? [2 points]
Model B
Model C
Model D
Model A

Question 8

Train three models, each with a 50-dimensional embedding space and a 200-dimensional hidden layer.
  • Model A: Momentum = 0.0
  • Model B: Momentum = 0.5
  • Model C: Momentum = 0.9
Use the default settings for all other hyperparameters. Which model gives the lowest validation set cross entropy after 5 epochs? [3 points]
Model B
Model C
Model A

Question 9

Train a model with a 50-dimensional embedding layer and a 200-dimensional hidden layer for 10 epochs. Use default values for all other hyperparameters. Which words are among the 10 closest words to the word 'day'? (See the usage sketch after the answer options.) [2 points]
'during'
'today'
'year'
'week'
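The starter files word_distance.m and display_nearest_word.m are the intended tools for this and the following questions. A usage sketch, assuming signatures of train(epochs), display_nearest_word(word, model, k), and word_distance(word1, word2, model) — confirm the exact signatures in the files themselves:

```
% Assumed entry points; check the .m files for the real signatures.
model = train(10);                       % train for 10 epochs (assumption)
display_nearest_word('day', model, 10);  % print the 10 nearest words
word_distance('day', 'week', model)      % distance between two embeddings
```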

Question 10

In the model trained in Question 9, why is the word 'percent' close to 'dr.' even though they have very different contexts and are not expected to be close in word embedding space? [2 points]
Both words occur too frequently.
The model is not capable of separating them in embedding space, even if it got a much larger training set.
We trained the model with too large a learning rate.
Both words occur very rarely, so their embedding weights get updated very few times and remain close to their initialization.

Question 11

In the model trained in Question 9, why is 'he' close to 'she' even though they refer to completely different genders? [2 points]
The model does not care about gender. It puts them close because if 'he' occurs in a 4-gram, it is very likely that substituting it by 'she' will also make a sensible 4-gram.
Both words occur very rarely, so their embedding weights get updated very few times and remain close to their initialization.
They often occur close by in sentences.
They differ by only one letter.

Question 12

In conclusion, what kind of words does the model put close to each other in embedding space? Choose the most appropriate answer. [3 points]
Words that belong to similar topics. A topic is a semantic categorization (like 'sports', 'art', 'business', 'computers' etc).
Words that are such that if one word occurs in a 4-gram replacing it with the other also creates a sensible 4-gram.
Words that have a lot of letters in common.
Words that occur close in an alphabetical sort.
