This is a brief summary, written for my own study and organization, of the paper On Using Very Large Target Vocabulary for Neural Machine Translation (Jean et al., ACL and IJCNLP 2015).

Despite its recent success, neural machine translation is limited in its ability to handle a large target vocabulary.

So, they propose a new algorithm to resolve this problem in neural network-based machine translation.

One of the main difficulties in training a neural machine translation model is the computational complexity involved in computing the target word probability.
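To see where the cost comes from, the target word probability can be written as a standard softmax. The symbols below are my own shorthand and not necessarily the paper's notation: \(h_t\) is the decoder's output representation at step \(t\), \(w_k\) and \(b_k\) are the output weight vector and bias for the \(k\)-th target word, and \(V\) is the target vocabulary.

\[
p(y_t = k \mid y_{<t}, x) = \frac{\exp\left(w_k^\top h_t + b_k\right)}{\sum_{j \in V} \exp\left(w_j^\top h_t + b_j\right)}
\]

The denominator needs one dot product per word in \(V\), so the cost of every training step grows linearly with the vocabulary size.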

They describe two model-specific approaches to the issue of a large target vocabulary.

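  • The first approach is to stochastically approximate the target word probability.

  • The second approach is to cluster the target words into multiple classes, or hierarchies, and factorize the target probability into a class probability and an intra-class word probability.

Other than these model-specific approaches, there exist translation-specific approaches. A translation-specific approach exploits the properties of the rare target words.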

They use a model-specific approach that allows them to train a neural machine translation model with a very large target vocabulary.

In other words, they use only a small subset of the target vocabulary at each update.

Once training is over, their method can use the full target vocabulary to compute the output probability of each target word.
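The difference between the two regimes can be illustrated with a small sketch. This is a simplified illustration, not the authors' implementation: it only shows normalizing over a subset versus the full vocabulary and leaves out the sampling-based derivation in the paper. The function name `softmax_over` and the toy scores are hypothetical.

```python
import math


def softmax_over(scores, word_ids):
    """Normalize the scores over the given word ids only."""
    word_ids = list(word_ids)
    # Subtract the maximum score for numerical stability.
    top = max(scores[i] for i in word_ids)
    exps = {i: math.exp(scores[i] - top) for i in word_ids}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical unnormalized scores for a toy 6-word target vocabulary.
scores = [2.0, 0.5, -1.0, 1.5, 0.0, -0.5]

# During training: normalize over a small subset that contains the
# correct word (id 3 here), so only 3 terms enter the sum.
subset = [0, 3, 4]
p_train = softmax_over(scores, subset)

# After training: normalize over the full vocabulary.
p_full = softmax_over(scores, range(len(scores)))
```

The subset sum has a fixed number of terms regardless of how large the full vocabulary is, which is what keeps the cost per update bounded.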

In their approach, since the number of parameters updated for each sentence pair cannot be controlled, they partition the training corpus and define a subset \(V'\) of the target vocabulary for each partition prior to training.

That is, before training begins, they sequentially examine each target sentence in the training corpus and accumulate unique target words until the number of unique target words reaches the predefined threshold τ.

The accumulated vocabulary will be used for this partition of the corpus during training.

They repeat this until the end of the training set is reached.
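The partitioning procedure described above can be sketched as follows. This is a minimal sketch under my own assumptions, not the authors' code: sentences are assumed to be already tokenized into lists of words, a partition is closed just before a sentence would push its vocabulary past the threshold, and the name `partition_corpus` is hypothetical.

```python
def partition_corpus(target_sentences, tau):
    """Split sentences into consecutive partitions, each with its own vocabulary.

    A partition is closed before a sentence would push its vocabulary
    above tau unique words. A single sentence with more than tau unique
    words still gets a partition of its own.
    """
    partitions = []
    current_sents, current_vocab = [], set()
    for sent in target_sentences:
        new_words = set(sent) - current_vocab
        if current_sents and len(current_vocab) + len(new_words) > tau:
            # Threshold would be exceeded: store this partition, start a new one.
            partitions.append((current_sents, current_vocab))
            current_sents, current_vocab = [], set()
            new_words = set(sent)
        current_sents.append(sent)
        current_vocab |= new_words
    if current_sents:
        # Store the last, possibly smaller, partition.
        partitions.append((current_sents, current_vocab))
    return partitions


# Toy corpus of tokenized target sentences (made up for illustration).
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "bird", "flew"],
    ["a", "cat", "flew"],
]
parts = partition_corpus(corpus, tau=4)
for sents, vocab in parts:
    print(len(sents), sorted(vocab))
```

Each partition's vocabulary then plays the role of the subset used when training on the sentences of that partition.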

For decoding, they also use the most likely target words as a shortlist, which they call a candidate list.
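A rough sketch of decoding with a candidate list is given below. How the list is built here (a set of frequent target words plus the top few dictionary translations of each source word) is my assumption for illustration, and the names `build_candidate_list`, `pick_next_word`, and `k_prime` are hypothetical.

```python
def build_candidate_list(source_words, frequent_targets, dictionary, k_prime):
    """Collect frequent target words plus up to k_prime translations per source word."""
    candidates = set(frequent_targets)
    for word in source_words:
        candidates.update(dictionary.get(word, [])[:k_prime])
    return candidates


def pick_next_word(scores, candidates):
    """Return the highest-scoring word, considering only the candidate list."""
    return max(candidates, key=lambda w: scores.get(w, float("-inf")))


# Toy data, made up for illustration.
scores = {"the": 0.1, "cat": 2.0, "chat": 3.0, "dog": 1.0}
dictionary = {"chat": ["cat", "kitty"], "chien": ["dog"]}

candidates = build_candidate_list(
    source_words=["chat"],
    frequent_targets=["the"],
    dictionary=dictionary,
    k_prime=1,
)
best = pick_next_word(scores, candidates)
```

In this toy case the word with the highest raw score, "chat", is not in the candidate list, so the decoder picks "cat" instead. Restricting the search this way trades a small risk of missing the best word for a much cheaper decoding step.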

For the results of their experiments, see On Using Very Large Target Vocabulary for Neural Machine Translation (Jean et al., ACL and IJCNLP 2015).

Reference