Assignment 2: Learning word representations.

#######################################
Neural Networks for Machine Learning
Programming Assignment 2
Learning word representations.
#######################################

In this assignment, you will design a neural net language model that will
learn to predict the next word, given the previous three words.

The data set consists of 4-grams (a 4-gram is a sequence of 4 adjacent words
in a sentence). These 4-grams were extracted from a large collection of text.
The 4-grams are chosen so that all the words involved come
from a small vocabulary of 250 words. Note that for the purposes of this
assignment, special characters such as commas, full stops, and parentheses
are also considered words. The training set consists of 372,550 4-grams. The
validation and test sets have 46,568 4-grams each.

### GETTING STARTED. ###
Look at the file raw_sentences.txt. It contains the raw sentences from which
these 4-grams were extracted. Take a look at the kind of sentences we are
dealing with here. They are fairly simple ones.

To load the data set, go to an Octave terminal and cd to the directory where
the downloaded data is located. Type

> load data.mat

This will load a struct called 'data' with 4 fields in it.
You can see them by typing

> fieldnames(data)

'data.vocab' contains the vocabulary of 250 words. The training, validation,
and test sets are in 'data.trainData', 'data.validData' and 'data.testData'
respectively. To see the list of words in the vocabulary, type

> data.vocab

'data.trainData' is a matrix of size 372550 x 4. This means there are 372550
training cases and 4 words per training case. Each entry is an integer that is
the index of a word in the vocabulary. So each row represents a sequence of 4
words. 'data.validData' and 'data.testData' are similar. They contain
46,568 4-grams each. All three need to be separated into inputs and targets,
and the training set needs to be split into mini-batches. The file load_data.m
provides code for doing that. To run it, type:

> [train_x, train_t, valid_x, valid_t, test_x, test_t, vocab] = load_data(100);

This will load the data, separate it into inputs and targets, and make
mini-batches of size 100 for the training set.
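
If you are curious what this splitting looks like, here is a minimal sketch
of the kind of reshaping load_data.m might perform. The variable names are
illustrative only; the real file may differ in its details:

  % Illustrative only -- not the actual contents of load_data.m.
  D = data.trainData;                       % 372550 x 4 matrix of word indices
  N = 100;                                  % mini-batch size
  M = floor(size(D, 1) / N);                % number of complete mini-batches
  D = D(1:N*M, :);                          % drop the few leftover cases
  inputs  = reshape(D(:, 1:3)', 3, N, M);   % first 3 words of each 4-gram
  targets = reshape(D(:, 4)',  1, N, M);    % 4th word is the prediction target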

train.m implements the function that trains a neural net language model.
To run the training, execute the following:

> model = train(1);

This will train the model for one epoch (one pass through the training set).
Currently, the training is not implemented, so the cross entropy will not
decrease. You have to fill in parts of the code in fprop.m and train.m.
Once the code is correctly filled in, you will see that the cross entropy
starts decreasing. At that point, try changing the hyperparameters (number
of epochs, number of hidden units, learning rates, momentum, etc.) and see
what effect that has on the training and validation cross entropy. The
questions in the assignment will ask you to try out specific values of these.
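
For reference, the momentum hyperparameter controls an update of the
following general form. This is a toy, self-contained sketch of momentum-based
gradient descent, not the actual code in train.m, whose variable names and
structure may differ:

  % Toy illustration of a momentum update; the gradients here are random
  % placeholders standing in for the back-propagated gradients.
  weights  = randn(5, 3);             % some weight matrix
  delta    = zeros(size(weights));    % velocity, starts at zero
  momentum = 0.9;                     % momentum coefficient
  epsilon  = 0.1;                     % learning rate
  for t = 1:10
    gradient = randn(size(weights));          % placeholder gradient
    delta    = momentum * delta + gradient;   % accumulate velocity
    weights  = weights - epsilon * delta;     % take a step
  end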

The training method will output a 'model' (a struct containing weights,
biases, and a list of words). Now it's time to play around with the learned
model and answer the questions in the assignment.

### DESCRIPTION OF THE NETWORK. ###
The network consists of an input layer, an embedding layer, a hidden layer,
and an output layer. The input layer consists of three word indices. The same
'word_embedding_weights' are used to map each index to a distributed feature
representation. These mapped features constitute the embedding layer. This layer
is connected to the hidden layer, which in turn is connected to the output
layer. The output layer is a softmax over the 250 words.
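
To make the architecture concrete, here is a self-contained sketch of the
forward pass just described, assuming a logistic hidden layer. The layer
sizes (50 embedding dimensions, 200 hidden units) and all variable names
are assumptions for illustration; the real fprop.m may differ:

  % Sketch of the forward pass; weights are random for illustration.
  vocab_size = 250;  numhid1 = 50;  numhid2 = 200;  batchsize = 100;
  word_embedding_weights = 0.01 * randn(vocab_size, numhid1);
  embed_to_hid_weights   = 0.01 * randn(3 * numhid1, numhid2);
  hid_to_output_weights  = 0.01 * randn(numhid2, vocab_size);
  input_batch = randi(vocab_size, 3, batchsize);  % 3 word indices per case

  % Embedding layer: look up each word's feature vector and stack the
  % three vectors belonging to each training case.
  embedding = reshape(word_embedding_weights(input_batch(:), :)', ...
                      3 * numhid1, batchsize);
  % Hidden layer: logistic units.
  hidden = 1 ./ (1 + exp(-embed_to_hid_weights' * embedding));
  % Output layer: softmax over the 250 words.
  scores = hid_to_output_weights' * hidden;
  scores = scores - max(scores);                % for numerical stability
  probs  = exp(scores) ./ sum(exp(scores));     % 250 x batchsize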

### THINGS YOU SEE WHEN THE MODEL IS TRAINING. ###
As the model trains, it prints out some numbers that tell you how well the
training is going.
(1) The model shows the average per-case cross entropy (CE) obtained
on the training set. The average CE is computed every 100 mini-batches. The
average CE over the entire training set is reported at the end of every epoch.
(A sketch of how this per-case CE is computed appears after this list.)

(2) After every 1000 mini-batches of training, the model is run on the
validation set. Recall that the validation set consists of data that is not
used for training. It is used to see how well the model does on unseen data.
The cross entropy on the validation set is reported.

(3) At the end of training, the model is run on both the validation set and
the test set, and the cross entropy on both is reported.
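
As promised above, here is how a per-case cross entropy of this kind can be
computed from the softmax output. This is a self-contained sketch with fake
data, not the exact code in train.m:

  % Fake softmax output and targets, just to show the CE computation.
  vocab_size = 250;  batchsize = 100;
  probs = rand(vocab_size, batchsize);
  probs = probs ./ sum(probs);                  % columns now sum to 1
  target_idx = randi(vocab_size, 1, batchsize); % true next-word indices
  tiny = exp(-30);                              % guards against log(0)
  idx = sub2ind(size(probs), target_idx, 1:batchsize);
  CE  = -mean(log(probs(idx) + tiny))           % average per-case CE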

You are welcome to change these numbers (100 and 1000) to see the CEs more
frequently if you want to.

### SOME USEFUL FUNCTIONS. ###
These functions are meant to be used for analyzing the model after training
is done.
display_nearest_words.m : This method displays the words closest to a
given word in the word representation space.
word_distance.m : This method computes the distance between two given
words.
predict_next_word.m : This method produces some predictions for the next
word given 3 previous words.
Take a look at the documentation inside these functions to see how to use them.
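
For example, after training you might try calls along these lines. The exact
argument lists shown here are an assumption; check the documentation inside
each file for the real ones:

> model = train(1);
> display_nearest_words('day', model, 10)           % 10 nearest words to 'day'
> predict_next_word('life', 'in', 'the', model, 3)  % top 3 predictions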

### THINGS TO TRY. ###
Choose some words from the vocabulary and make a list. Find the words that
the model thinks are close to the words in this list (for example, find the
words closest to 'companies', 'president', 'day', 'could', etc.). Do the
outputs make sense?

Pick three words from the vocabulary that go well together (for example,
'government of united', 'city of new', 'life in the', 'he is the', etc.). Use
the model to predict the next word. Does the model give sensible predictions?

Which words would you expect to be closer together than others? For example,
'he' should be closer to 'she' than to 'federal', and 'companies' should be
closer to 'business' than to 'political'. Find the distances using the model.
Do the distances that the model predicts make sense?
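
For instance, to test the expectation above, you could compare distances
along these lines (again, the argument order is an assumption; see the
documentation in word_distance.m):

> word_distance('he', 'she', model)       % expected: relatively small
> word_distance('he', 'federal', model)   % expected: larger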

You are welcome to try other things with this model and post any interesting
observations on the forums!