While I was studying Korean natural language processing with neural networks, I was looking for an architecture for my work.

So I read this paper, Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (Wu et al., arXiv 2016), and it taught me how to deal with a sequence of data.

This paper presents an end-to-end model for neural machine translation. In my case, I wondered about the architecture of the neural network, specifically how to use LSTMs for translation.

I was inspired by this paper.

Their basic neural machine translation architecture consists of three parts: an encoder, an attention mechanism, and a decoder.

Wu et al., arXiv 2016
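To make the attention step concrete, here is a minimal NumPy sketch. Note that it uses simple dot-product scoring, whereas GNMT's attention function is a small feed-forward network; the softmax-and-weighted-sum mechanics are the same. All names here are illustrative, not from the paper.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, softmax the scores over time, and return the weighted
    sum of encoder states (the context vector)."""
    scores = encoder_states @ decoder_state        # shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over time steps
    context = weights @ encoder_states             # shape (hidden,)
    return context, weights

# Toy example: 3 encoder time steps, hidden size 4.
enc = np.array([[1., 0., 0., 0.],
                [0., 1., 0., 0.],
                [0., 0., 1., 0.]])
dec = np.array([0., 5., 0., 0.])   # "looks like" the 2nd encoder state
ctx, w = attention(dec, enc)
print(w.argmax())  # attends mostly to time step 1
```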

They also used residual connections, like this:

Wu et al., arXiv 2016
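The residual idea itself is just one addition: a stacked layer's output is its input plus whatever the layer computes, so the layer only has to learn the difference and gradients keep a clean identity path through deep stacks. A tiny sketch, with a stand-in transform instead of a real LSTM layer:

```python
import numpy as np

def lstm_layer_stub(x):
    """Hypothetical stand-in for an LSTM layer's output (a real layer
    would also carry recurrent state); here just a fixed nonlinearity."""
    return np.tanh(x)

def residual_layer(x):
    # Residual connection: output = input + layer(input).
    return x + lstm_layer_stub(x)

x = np.array([0.5, -1.0, 2.0])
y = residual_layer(x)
```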

In addition, for generating feature vectors, they used a bi-directional LSTM in the encoder, so that the representation at each position captures context from both sides of the sentence.

Wu et al., arXiv 2016
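A rough sketch of the bi-directional idea, using a plain (non-gated) recurrence as a stand-in for the LSTM: run once left-to-right, once right-to-left, and concatenate the states, so each position sees both its left and right context. (In GNMT, only the bottom encoder layer is bi-directional.)

```python
import numpy as np

def simple_rnn(xs, h0, W, U):
    """Minimal non-gated recurrence standing in for an LSTM."""
    h, out = h0, []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out

def bidirectional(xs, h0, W, U):
    fwd = simple_rnn(xs, h0, W, U)
    bwd = simple_rnn(xs[::-1], h0, W, U)[::-1]  # re-align to original order
    # Concatenate forward and backward states at each position.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
d = 3
xs = [rng.standard_normal(d) for _ in range(5)]
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
outs = bidirectional(xs, np.zeros(d), W, U)
print(outs[0].shape)  # (6,): forward and backward states concatenated
```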

Beyond the points above, they said that for computational speed they applied parallelism across the layers.

But I am not going to go through that in detail here.

Another problem particular to NLP is out-of-vocabulary words: rare words are troublesome in an open-vocabulary setting.

In their case, they used sub-word units via a wordpiece model.

• Word: Jet makers feud over seat width with big orders at stake
• Wordpieces: _J et _makers _fe ud _over _seat _width _with _big _orders _at _stake

The _ sign is a special marker for the beginning of a word.
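At inference time, a trained wordpiece vocabulary is typically applied with greedy longest-match-first segmentation. Here is a sketch with a hypothetical toy vocabulary (the paper's actual vocabulary is learned from data, not hand-written):

```python
def wordpiece_segment(word, vocab):
    """Greedily take the longest vocabulary entry matching at the current
    position; fall back to an unknown token if nothing matches."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:            # no piece matches here
            return ["<unk>"]
        pieces.append(word[start:end])
        start = end
    return pieces

# Hypothetical toy vocabulary; "_" marks the beginning of a word, as above.
vocab = {"_J", "et", "_fe", "ud", "_makers", "_over"}
print(wordpiece_segment("_Jet", vocab))   # ['_J', 'et']
print(wordpiece_segment("_feud", vocab))  # ['_fe', 'ud']
```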

If you want to know the details, read Section 4, Segmentation Approaches.

Reference

Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv:1609.08144, 2016.