This is a brief summary, written for my own study and organization, of Character-based Neural Machine Translation (Ling et al., arXiv 2015), a paper I read.

This paper proposes a neural machine translation model that learns to encode and decode at the character level.

In other words, they use a composition model that builds word embeddings out of character embeddings in both the input and output layers.

First of all, their model is built on the attention-based neural translation model presented by Bahdanau et al. (2015), illustrated below:

Figure: the attention-based translation model (Ling et al., arXiv 2015).
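To make the attention mechanism concrete, here is a minimal PyTorch sketch of Bahdanau-style additive attention. This is my own illustration under assumed names and dimensions, not the authors' code: the decoder state is scored against every encoder state, and the resulting alignment weights produce a context vector.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention (illustrative sketch, assumed dimensions)."""
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)  # transforms the decoder state s_{i-1}
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)  # transforms each encoder state h_j
        self.v = nn.Linear(attn_dim, 1, bias=False)        # scores e_ij = v^T tanh(W s + U h_j)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim), enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        alpha = torch.softmax(scores, dim=1)           # alignment weights over source positions
        context = (alpha * enc_states).sum(dim=1)      # weighted sum of encoder states
        return context, alpha.squeeze(-1)
```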

Since the attention-based neural machine translation model is word-based, it suffers from the softmax bottleneck: at every step the output softmax must score every word in a large vocabulary.
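As a rough illustration of why this matters (my own back-of-the-envelope numbers, not figures from the paper), the output projection feeding the softmax grows linearly with the vocabulary size:

```python
# Illustrative only: assumed hidden size and vocabulary sizes, not values from the paper.
hidden_dim = 512
word_vocab = 80_000   # a typical word-level vocabulary
char_vocab = 100      # a typical character inventory

# Weight count of the output projection feeding the softmax (bias omitted).
print(hidden_dim * word_vocab)  # 40,960,000 parameters for a word-level softmax
print(hidden_dim * char_vocab)  # 51,200 parameters for a character-level softmax
```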

To resolve this problem, they adapt the word-based neural machine translation model to operate over character sequences rather than word sequences.

However, they retain the notion of words when composing character embeddings in both the input and output layers.

In the input layer, they use a bidirectional LSTM to compose a word's character sequence into a word embedding, as shown below:

Figure: character-to-word composition in the input layer (Ling et al., arXiv 2015).

Their composition model builds a representation of each word from its characters by reading them left-to-right with a forward LSTM and right-to-left with a backward LSTM.

The word representation is obtained by combining the final states of the forward and backward LSTMs, as in the sketch below.
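Here is a minimal PyTorch sketch of this character-to-word composition. The class name, dimensions, and the single linear layer over the concatenated final states are my assumptions for illustration; the paper combines the two final states with separate learned matrices, which is equivalent.

```python
import torch
import torch.nn as nn

class C2W(nn.Module):
    """Compose a word embedding from its characters with a BiLSTM (illustrative sketch)."""
    def __init__(self, n_chars, char_dim=50, hidden_dim=150, word_dim=300):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim, bidirectional=True, batch_first=True)
        # linear combination of the final forward and backward states -> word vector
        self.proj = nn.Linear(2 * hidden_dim, word_dim)

    def forward(self, char_ids):
        # char_ids: (batch, word_len) character indices of one word per row
        # (padding handling omitted for brevity)
        chars = self.char_emb(char_ids)              # (batch, word_len, char_dim)
        _, (h_n, _) = self.bilstm(chars)             # h_n: (2, batch, hidden_dim)
        fwd_last, bwd_last = h_n[0], h_n[1]          # final state of each direction
        return self.proj(torch.cat([fwd_last, bwd_last], dim=-1))  # (batch, word_dim)
```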

In the output layer, they also work at the character level, in a similar spirit to the input layer: each target word is generated character by character with an LSTM. The figure below shows the generation of words from characters in the output layer:

Figure: character-level word generation in the output layer (Ling et al., arXiv 2015).

The distinction between the input and output layers is that the output side is additionally conditioned on the attentional alignment and the output context.
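The sketch below is my own rough interpretation of this output-side generator, with assumed names and dimensions rather than the paper's implementation: an LSTM emits one character at a time, conditioned on a vector that concatenates the decoder (output) state with the attentional context.

```python
import torch
import torch.nn as nn

class CharWordGenerator(nn.Module):
    """Generate a target word character by character, conditioned on the word-level
    decoder state and attention context (illustrative sketch)."""
    def __init__(self, n_chars, char_dim=50, cond_dim=600, hidden_dim=300):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # input at each step: previous character embedding + conditioning vector
        self.lstm = nn.LSTMCell(char_dim + cond_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_chars)    # softmax over characters, not words

    def forward(self, cond, target_chars):
        # cond: (batch, cond_dim) = decoder state concatenated with attention context
        # target_chars: (batch, word_len) gold characters (teacher forcing)
        h = cond.new_zeros(cond.size(0), self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(target_chars.size(1)):
            x = torch.cat([self.char_emb(target_chars[:, t]), cond], dim=-1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))               # scores for the next character
        return torch.stack(logits, dim=1)            # (batch, word_len, n_chars)
```

At decoding time, the gold characters would be replaced by the model's own previous predictions until an end-of-word symbol is produced.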

A detailed analysis of the experiments can be found in Character-based Neural Machine Translation (Ling et al., arXiv 2015).

Reference

Ling, W., Trancoso, I., Dyer, C., and Black, A. W. Character-based Neural Machine Translation. arXiv, 2015.

Bahdanau, D., Cho, K., and Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2015.