This is a brief summary of the paper Learning Character-level Representations for Part-of-Speech Tagging (Santos and Zadrozny, ICML 2014), written to help me study and organize what I read.

Unlike the word embeddings, which are pretrained, the character embeddings are not pretrained; they are learned during training.

They concatenate the character-level and word-level embeddings for each word; the character-level (intra-word) information is used to extract morphological and word-shape information.
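As a rough sketch of this concatenation (in PyTorch, with placeholder dimensions rather than the paper's hyperparameters), the joint representation of a word could look like the following, where the character-level vector is built by the CNN described next:

```python
import torch
import torch.nn as nn

# Hypothetical sizes (not the paper's): 10k-word vocabulary,
# 100-dim word embeddings, 50-dim character-level representations.
VOCAB_SIZE, WORD_DIM, CHAR_REPR_DIM = 10_000, 100, 50

word_embedding = nn.Embedding(VOCAB_SIZE, WORD_DIM)  # pretrained in the paper

def joint_word_vector(word_id: torch.Tensor, char_repr: torch.Tensor) -> torch.Tensor:
    """Concatenate the word embedding with the character-level representation."""
    return torch.cat([word_embedding(word_id), char_repr], dim=-1)

# Example: one word id and a (placeholder) character-level vector.
u = joint_word_vector(torch.tensor(42), torch.zeros(CHAR_REPR_DIM))
print(u.shape)  # torch.Size([150])
```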

To construct a word-level representation from characters, they use a convolutional neural network (CNN) as follows:

(Figure: the character-level CNN architecture, from Santos and Zadrozny, ICML 2014.)
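A minimal sketch of such a character-level CNN: character embeddings are convolved over a window of characters and then max-pooled over time to give a fixed-size vector per word. The alphabet size, embedding dimension, window size, and filter count below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

# Placeholder hyperparameters (assumed, not from the paper).
ALPHABET_SIZE, CHAR_EMB_DIM, CHAR_WINDOW, NUM_FILTERS = 70, 10, 3, 50

char_embedding = nn.Embedding(ALPHABET_SIZE, CHAR_EMB_DIM)
char_conv = nn.Conv1d(CHAR_EMB_DIM, NUM_FILTERS, kernel_size=CHAR_WINDOW, padding=1)

def char_level_representation(char_ids: torch.Tensor) -> torch.Tensor:
    """Map a word's character ids (shape: [num_chars]) to a fixed-size vector."""
    emb = char_embedding(char_ids)            # (num_chars, CHAR_EMB_DIM)
    emb = emb.transpose(0, 1).unsqueeze(0)    # (1, CHAR_EMB_DIM, num_chars)
    conv = char_conv(emb)                     # (1, NUM_FILTERS, num_chars)
    return conv.max(dim=2).values.squeeze(0)  # max over time -> (NUM_FILTERS,)

r_wch = char_level_representation(torch.tensor([5, 12, 7, 30, 2]))
print(r_wch.shape)  # torch.Size([50])
```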

The next layer takes, for each word in a successive window centered on the target word, the concatenation of its word embedding and its character-level representation, and uses this window to score the target word's tags.
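A possible sketch of this window layer, again with assumed sizes for the joint vectors, window, hidden layer, and tag set (none taken from the paper):

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the paper's settings).
JOINT_DIM, WINDOW, HIDDEN, NUM_TAGS = 150, 5, 300, 45

hidden_layer = nn.Linear(JOINT_DIM * WINDOW, HIDDEN)
score_layer = nn.Linear(HIDDEN, NUM_TAGS)

def tag_scores(joint_vectors: torch.Tensor, n: int) -> torch.Tensor:
    """Score every tag for word n from a window centered on it.

    joint_vectors: (sentence_length, JOINT_DIM), the per-word concatenations
    of word embedding and character-level representation.
    """
    half = WINDOW // 2
    # Pad the sentence so the window is well defined at sentence edges.
    padded = torch.nn.functional.pad(joint_vectors, (0, 0, half, half))
    window = padded[n : n + WINDOW].reshape(-1)            # (JOINT_DIM * WINDOW,)
    return score_layer(torch.tanh(hidden_layer(window)))   # (NUM_TAGS,)

sentence = torch.randn(8, JOINT_DIM)   # 8 words, each with a joint vector
print(tag_scores(sentence, 0).shape)   # torch.Size([45])
```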

Finally, they compute a structured, sentence-level cost in which the inference depends on the neighboring tags, rather than scoring each word's tag independently.
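One way to sketch such a structured sentence-level cost is with a learned tag-transition matrix and a forward-algorithm log-partition (a sentence-level log-likelihood). The code below is an illustrative implementation under that assumption, not necessarily the paper's exact formulation; the tag-set size is a placeholder.

```python
import torch

# Learnable tag-transition scores A[prev, next]; size is illustrative.
NUM_TAGS = 45
transitions = torch.nn.Parameter(torch.zeros(NUM_TAGS, NUM_TAGS))

def sentence_nll(emissions: torch.Tensor, gold_tags: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the gold tag sequence.

    emissions: (T, NUM_TAGS) per-word tag scores from the network.
    gold_tags: (T,) gold tag indices.
    """
    T = emissions.size(0)
    # Score of the gold path: emissions along the path plus transitions.
    gold = emissions[torch.arange(T), gold_tags].sum()
    gold = gold + transitions[gold_tags[:-1], gold_tags[1:]].sum()
    # Log-partition over all tag paths (forward algorithm in log space).
    alpha = emissions[0]                                    # (NUM_TAGS,)
    for t in range(1, T):
        alpha = emissions[t] + torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0)
    log_partition = torch.logsumexp(alpha, dim=0)
    return log_partition - gold

loss = sentence_nll(torch.randn(8, NUM_TAGS), torch.randint(0, NUM_TAGS, (8,)))
print(loss)
```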

Reference

Cícero dos Santos and Bianca Zadrozny. Learning Character-level Representations for Part-of-Speech Tagging. Proceedings of the 31st International Conference on Machine Learning (ICML 2014).