Overall, this paper, Efficient Estimation of Word Representations in Vector Space (Mikolov et al., arXiv 2013), compares the computational cost of training across model architectures and builds on the idea that the NNLM can be split into two steps: first train the word vectors, then use the trained vectors inside the NNLM.
Many techniques have been proposed for estimating continuous representations of words, including the well-known Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).
The authors argue that neural networks perform better than LSA at preserving linear regularities among words, and that LDA becomes computationally very expensive on large data sets.
In this paper, the computational time complexity of training is defined as follows. The training complexity is proportional to

O = E * T * Q

where E is the number of training epochs, T is the number of words in the training set, and Q is defined further for each model architecture.
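As a rough illustration (my own sketch, not code from the paper), the snippet below plugs the per-architecture Q terms reported in the paper into O = E * T * Q. The concrete numbers (V, D, N, H, C, E, T) are made-up placeholder values, not the paper's settings.

```python
# Compare training complexity O = E * T * Q across architectures, using the
# per-architecture Q terms given in the paper. All concrete numbers below
# (V, D, N, H, C, E, T) are placeholder values chosen only for illustration.
import math

V = 1_000_000    # vocabulary size
D = 300          # word vector dimensionality
N = 5            # NNLM context size (previous words)
H = 500          # NNLM hidden layer size
C = 10           # skip-gram maximum context distance
E = 3            # training epochs
T = 100_000_000  # words in the training corpus

def q_nnlm(hierarchical_softmax=True):
    # NNLM: Q = N*D + N*D*H + H*V; hierarchical softmax replaces V with log2(V)
    out = math.log2(V) if hierarchical_softmax else V
    return N * D + N * D * H + H * out

def q_cbow():
    # CBOW: Q = N*D + D*log2(V)
    return N * D + D * math.log2(V)

def q_skipgram():
    # Skip-gram: Q = C * (D + D*log2(V))
    return C * (D + D * math.log2(V))

for name, q in [("NNLM", q_nnlm()), ("CBOW", q_cbow()), ("Skip-gram", q_skipgram())]:
    print(f"{name:9s} Q = {q:,.0f}   O = E*T*Q = {E * T * q:,.0f}")
```

Even with hierarchical softmax, the NNLM's N*D*H term keeps it far more expensive than CBOW or skip-gram, which is the point the paper is making.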
They point out that the computational cost depends heavily on the number of output units and the number of hidden layer units: the output layer is the most expensive part, followed by the hidden layer.
So they implemented hierarchical softmax with a Huffman tree to reduce the number of output units that have to be evaluated, but the hidden layer then remained the bottleneck.
Finally, they removed the hidden layer altogether, so their models depend mostly on the efficiency of the softmax normalization.
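To see why a Huffman tree helps, here is a minimal sketch (my own illustration, not the paper's code) that builds Huffman code lengths over made-up word frequencies: frequent words get short codes, so the expected number of output-node evaluations per training word drops to roughly log2(V) instead of V.

```python
# Minimal Huffman-code sketch: frequent words get short codes, so the expected
# number of output-node evaluations per training word is close to log2(V)
# rather than the full vocabulary size V required by a flat softmax.
import heapq

def huffman_code_lengths(freqs):
    """freqs: dict word -> count. Returns dict word -> code length (tree depth)."""
    # Heap items: (total_count, tie_breaker, {word: depth_so_far})
    heap = [(c, i, {w: 0}) for i, (w, c) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {w: d + 1 for w, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, counter, merged))
        counter += 1
    return heap[0][2]

# Toy frequencies (placeholders): common words are frequent, rare words are not.
freqs = {"the": 5000, "of": 3000, "word": 800, "vector": 600, "huffman": 50, "zeugma": 5}
lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())
avg = sum(freqs[w] * lengths[w] for w in freqs) / total
print(lengths)  # frequent words -> shorter codes
print(f"average code length = {avg:.2f} (vs. V = {len(freqs)} output units with a flat softmax)")
```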
The resulting models are called CBOW (continuous bag-of-words) and skip-gram.
Let's look at an example they used for testing syntactic and semantic questions: to find the word that is similar to "small" in the same sense as "biggest" is similar to "big", compute X = vector("biggest") - vector("big") + vector("small") and search for the word whose vector is closest to X by cosine distance.
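As a small illustration of how such a question can be answered with trained vectors, here is a toy sketch (my own, with made-up 4-dimensional vectors, not the paper's evaluation code):

```python
# Toy analogy evaluation: X = vec("biggest") - vec("big") + vec("small"), then
# pick the vocabulary word whose vector has the highest cosine similarity to X.
# The 4-dimensional vectors below are made up purely for illustration.
import numpy as np

vectors = {
    "big":      np.array([0.9, 0.1, 0.0, 0.2]),
    "biggest":  np.array([0.9, 0.8, 0.0, 0.2]),
    "small":    np.array([-0.9, 0.1, 0.1, 0.2]),
    "smallest": np.array([-0.9, 0.8, 0.1, 0.2]),
    "word":     np.array([0.0, 0.0, 1.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = vectors["biggest"] - vectors["big"] + vectors["small"]
candidates = {w: cosine(x, v) for w, v in vectors.items()
              if w not in ("biggest", "big", "small")}
print(max(candidates, key=candidates.get))  # expected: "smallest"
```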
Some of the resulting word vectors were made available for future research and comparison:
- Senna
- Metaoptimize-wordreprs
- RNNLM
- AI stanford
Their idea for a new model of continuous word representations comes from the observation that a neural network language model can be successfully trained in two steps: first, continuous word vectors are learned using a simple model, and then the N-gram NNLM is trained on top of these distributed representations of words.
They introduce two models based on this idea: one is called CBOW and the other is skip-gram.
CBOW: predicts the middle word from the history and future words around it; the weight matrix between the input and the projection layer is shared for all word positions, in the same way as in the NNLM (see the sketch after these two descriptions).
Skip-gram: uses each current word as input to a log-linear classifier with a continuous projection layer and predicts words within a certain range before and after the current word.
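To make the difference concrete, here is a small sketch (my own illustration, not the paper's implementation) that builds training examples for both models from a toy sentence: CBOW averages the context word vectors through one shared input matrix to predict the middle word, while skip-gram uses the current word to predict each word within a randomly chosen distance up to C, the random-window trick the paper describes. The corpus, window size, and dimensionality are placeholders.

```python
# Toy illustration of how CBOW and skip-gram form training examples.
import random
import numpy as np

sentence = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(sentence))
word2id = {w: i for i, w in enumerate(vocab)}

C = 2   # maximum context distance (placeholder)
D = 8   # word vector dimensionality (placeholder)
rng = random.Random(0)
# Shared input -> projection matrix, used for every context position.
W_in = np.random.default_rng(0).normal(scale=0.1, size=(len(vocab), D))

# CBOW: the averaged projection of the surrounding words predicts the middle word.
for pos, center in enumerate(sentence):
    context = [sentence[j] for j in range(max(0, pos - C), min(len(sentence), pos + C + 1)) if j != pos]
    projection = W_in[[word2id[w] for w in context]].mean(axis=0)
    # a log-linear classifier over `projection` would now predict `center`
    print("CBOW    :", context, "->", center)

# Skip-gram: the current word predicts each word within a random distance r <= C.
for pos, center in enumerate(sentence):
    r = rng.randint(1, C)
    for j in range(max(0, pos - r), min(len(sentence), pos + r + 1)):
        if j != pos:
            print("Skip-gram:", center, "->", sentence[j])
```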
If you want to download the data set, visit here.
The paper: Efficient Estimation of Word Representations in Vector Space (Mikolov et al., arXiv 2013)
References
- Paper
- Quora
- Reference
- How to use HTML for an alert
- My GitHub repository with the semantic and syntactic test set translated into Korean