While I was studying Korean natural language processing with neural networks, I was looking for an architecture for my work.

So I read this paper, Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (Wu et al., arXiv 2016), and it taught me how to deal with a sequence of data.

This paper presents an end-to-end model for neural machine translation. In my case, I wondered about the architecture of the neural network, specifically how to use LSTMs for translation.

I was inspired by this paper.

Their basic neural machine translation architecture consists of three parts: an encoder, an attention mechanism, and a decoder.

Wu et al., arXiv 2016
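To make the attention step concrete, here is a minimal NumPy sketch. Note that it uses simple dot-product scoring, whereas GNMT's attention function is a small feed-forward network; the softmax-and-weighted-sum mechanics are the same. All names here are illustrative, not from the paper.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, softmax the scores over time, and return the weighted
    sum of encoder states (the context vector)."""
    scores = encoder_states @ decoder_state        # shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over time steps
    context = weights @ encoder_states             # shape (hidden,)
    return context, weights

# Toy example: 3 encoder time steps, hidden size 4.
enc = np.array([[1., 0., 0., 0.],
                [0., 1., 0., 0.],
                [0., 0., 1., 0.]])
dec = np.array([0., 5., 0., 0.])   # "looks like" the 2nd encoder state
ctx, w = attention(dec, enc)
print(w.argmax())  # attends mostly to time step 1
```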

They also used residual connections, like this:

Wu et al., arXiv 2016
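The residual idea itself is just one addition: a stacked layer's output is its input plus whatever the layer computes, so the layer only has to learn the difference and gradients keep a clean identity path through deep stacks. A tiny sketch, with a stand-in transform instead of a real LSTM layer:

```python
import numpy as np

def lstm_layer_stub(x):
    """Hypothetical stand-in for an LSTM layer's output (a real layer
    would also carry recurrent state); here just a fixed nonlinearity."""
    return np.tanh(x)

def residual_layer(x):
    # Residual connection: output = input + layer(input).
    return x + lstm_layer_stub(x)

x = np.array([0.5, -1.0, 2.0])
y = residual_layer(x)
```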

In addition, for generating feature vectors, they used a bi-directional LSTM in the encoder, so that the representation at each position captures context from both sides of the sentence.

Wu et al., arXiv 2016
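A rough sketch of the bi-directional idea, using a plain (non-gated) recurrence as a stand-in for the LSTM: run once left-to-right, once right-to-left, and concatenate the states, so each position sees both its left and right context. (In GNMT, only the bottom encoder layer is bi-directional.)

```python
import numpy as np

def simple_rnn(xs, h0, W, U):
    """Minimal non-gated recurrence standing in for an LSTM."""
    h, out = h0, []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out

def bidirectional(xs, h0, W, U):
    fwd = simple_rnn(xs, h0, W, U)
    bwd = simple_rnn(xs[::-1], h0, W, U)[::-1]  # re-align to original order
    # Concatenate forward and backward states at each position.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
d = 3
xs = [rng.standard_normal(d) for _ in range(5)]
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
outs = bidirectional(xs, np.zeros(d), W, U)
print(outs[0].shape)  # (6,): forward and backward states concatenated
```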

Beyond the points above, they said that for computational speed they applied parallelism across the layers.

But I am not going to go through that in detail here.

Another problem particular to NLP is out-of-vocabulary words: rare words are troublesome in an open-vocabulary setting.

In their case, they used sub-word units via a wordpiece model.

• Word: Jet makers feud over seat width with big orders at stake
• Wordpieces: _J et _makers _fe ud _over _seat _width _with _big _orders _at _stake

The _ sign is a special marker for the beginning of a word.
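At inference time, a trained wordpiece vocabulary is typically applied with greedy longest-match-first segmentation. Here is a sketch with a hypothetical toy vocabulary (the paper's actual vocabulary is learned from data, not hand-written):

```python
def wordpiece_segment(word, vocab):
    """Greedily take the longest vocabulary entry matching at the current
    position; fall back to an unknown token if nothing matches."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:            # no piece matches here
            return ["<unk>"]
        pieces.append(word[start:end])
        start = end
    return pieces

# Hypothetical toy vocabulary; "_" marks the beginning of a word, as above.
vocab = {"_J", "et", "_fe", "ud", "_makers", "_over"}
print(wordpiece_segment("_Jet", vocab))   # ['_J', 'et']
print(wordpiece_segment("_feud", vocab))  # ['_fe', 'ud']
```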

If you want to know the details, read Section 4, Segmentation Approaches.

Reference

Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv:1609.08144, 2016.