This is a brief summary, written for my own study and organization, of the paper Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation (Ling et al., EMNLP 2015).

The paper proposes composing word representations from character embeddings instead of using a word lookup table, which cannot generate representations for previously unseen words.

The authors state the motivation as follows:

Although models based on word lookup tables are often observed to learn that cats, kings, and queens exist in roughly the same linear correspondences to each other as cat, king, and queen do, the model does not represent the fact that adding an s at the end of the words is evidence for this transformation. This means that word lookup tables cannot generate representations for previously unseen words, such as Frenchification, even if the components, French and -ification, are observed in other contexts.

They use a bidirectional LSTM to read the sequence of characters that makes up each word and compose it into a vector representation of the word.

Their model assumes that each character type is associated with a vector, and the LSTM parameters encode both idiosyncratic lexical and regular morphological knowledge.
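To make the idea concrete, here is a minimal NumPy sketch of a character-to-word composition: each character is looked up in a table, a forward and a backward LSTM read the sequence, and the two final states are mixed into one word vector. Several things here are my own assumptions rather than details taken from the summary above: the class name `C2WSketch`, the helper names, all dimensions, the random untrained weights, the plain LSTM cell (the paper's exact cell equations may differ), and the choice to combine the two final states with a linear layer each plus a bias.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random parameters for one LSTM direction (hypothetical helper)."""
    return {
        "W": rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }

def lstm_final_state(params, inputs, hidden_dim):
    """Run a plain LSTM over inputs of shape (m, input_dim); return the last hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x in inputs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

class C2WSketch:
    """Illustrative character-to-word composer with untrained random weights."""

    def __init__(self, alphabet, char_dim=8, hidden_dim=16, word_dim=10, seed=0):
        rng = np.random.default_rng(seed)
        self.char_to_index = {ch: i for i, ch in enumerate(alphabet)}
        self.hidden_dim = hidden_dim
        # One vector per character type (the character lookup table).
        self.char_table = rng.normal(scale=0.1, size=(len(alphabet), char_dim))
        self.fwd = init_lstm(char_dim, hidden_dim, rng)
        self.bwd = init_lstm(char_dim, hidden_dim, rng)
        # Assumed combination: one linear map per direction plus a bias.
        self.D_f = rng.normal(scale=0.1, size=(word_dim, hidden_dim))
        self.D_b = rng.normal(scale=0.1, size=(word_dim, hidden_dim))
        self.b_d = np.zeros(word_dim)

    def embed(self, word):
        """Compose a word vector from the characters of `word`."""
        chars = self.char_table[[self.char_to_index[ch] for ch in word]]
        s_f = lstm_final_state(self.fwd, chars, self.hidden_dim)        # left to right
        s_b = lstm_final_state(self.bwd, chars[::-1], self.hidden_dim)  # right to left
        return self.D_f @ s_f + self.D_b @ s_b + self.b_d

model = C2WSketch("abcdefghijklmnopqrstuvwxyz")
vec = model.embed("frenchification")  # works even though no word table contains this word
```

Because the word vector is computed from characters, any string over the alphabet gets a representation, which is the property a word lookup table lacks. With untrained weights the vectors are of course meaningless; the sketch only shows the data flow.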

The C2W model (i.e., their compositional character-to-word model) is based on a bidirectional LSTM, as follows:

Ling et al., EMNLP 2015

As shown in the figure above, the input of the C2W model is defined over an alphabet of characters \(C\).

The input word \(w\) is decomposed into a sequence of characters \(c_1, \dots, c_m\), where \(m\) is the length of \(w\).

Each \(c_i\) is defined as a one-hot vector \(1_{c_i}\), with a one at the index of \(c_i\) in the alphabet \(C\).

They define a projection layer \(P_C \in \mathbb R^{d_C \times |C|}\), where \(d_C\) is the number of parameters for each character in the character set \(C\).

In other words, \(P_C\) is simply a character lookup table: multiplying it by \(1_{c_i}\) selects the column that holds the embedding of \(c_i\).
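The equivalence between the projection layer and a lookup table can be checked numerically. The snippet below is a small sketch with a toy four-letter alphabet and \(d_C = 3\); the alphabet, the dimension, the random values in `P_C`, and the function names are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical toy alphabet C and embedding size d_C (illustrative values only).
alphabet = ["a", "c", "s", "t"]
char_to_index = {ch: i for i, ch in enumerate(alphabet)}
d_C = 3

rng = np.random.default_rng(0)
P_C = rng.normal(size=(d_C, len(alphabet)))  # projection layer, shape (d_C, |C|)

def one_hot(ch):
    """Return the one-hot vector 1_c for character ch over the alphabet."""
    v = np.zeros(len(alphabet))
    v[char_to_index[ch]] = 1.0
    return v

def embed_by_matmul(word):
    """Embed each character as P_C @ 1_c; the result has shape (m, d_C)."""
    return np.stack([P_C @ one_hot(ch) for ch in word])

def embed_by_lookup(word):
    """Embed each character by selecting its column of P_C."""
    idx = [char_to_index[ch] for ch in word]
    return P_C[:, idx].T

word = "cats"
by_matmul = embed_by_matmul(word)
by_lookup = embed_by_lookup(word)
```

Both functions return the same \(m \times d_C\) matrix for a word of length \(m\), so in practice the one-hot multiplication is implemented as an index into the table.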

They evaluated the C2W model on two tasks: language modeling and POS tagging.

The following is for language modeling:

Ling et al., EMNLP 2015

The following is for POS tagging:

Ling et al., EMNLP 2015

The detailed results can be found in [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation. Ling et al. EMNLP 2015](https://www.aclweb.org/anthology/D15-1176/)

Reference