This is a brief summary of the paper Semi-supervised sequence tagging with bidirectional language models (Peters et al., ACL 2017), which I read and studied, written to help me organize it.

They use token (i.e., word) representations that are sensitive to the surrounding context.

To build this context-sensitive representation, they use a language model pre-trained on unlabeled text; the representation it produces is called the LM embedding in their paper.
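As a rough sketch (module names and dimensions below are my own assumptions, not the paper's code), the LM embedding for each token is the concatenation of the hidden states of a forward and a backward language model, both pre-trained separately on unlabeled text and kept frozen while training the tagger:

```python
import torch
import torch.nn as nn

class BiLMEmbedder(nn.Module):
    """Toy stand-in for a pre-trained bidirectional LM.
    In the paper the forward and backward LMs are trained separately;
    here both are untrained placeholders just to show the shapes."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.fwd_lm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.bwd_lm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    @torch.no_grad()  # the LM stays frozen while the tagger is trained
    def forward(self, token_ids):
        x = self.embed(token_ids)
        h_fwd, _ = self.fwd_lm(x)                        # left-to-right context
        h_bwd, _ = self.bwd_lm(torch.flip(x, dims=[1]))  # right-to-left context
        h_bwd = torch.flip(h_bwd, dims=[1])              # re-align to token order
        # LM embedding: concatenate both directions per token
        return torch.cat([h_fwd, h_bwd], dim=-1)         # (batch, seq, 2*hidden)

tokens = torch.randint(0, 10000, (2, 10))  # fake batch of token ids
h_lm = BiLMEmbedder()(tokens)
print(h_lm.shape)  # torch.Size([2, 10, 512])
```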

They show how the LM embedding component is used in the figure below:

[Figure: using the LM embedding in the tagger, from Peters et al., ACL 2017]
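Concretely, if h_k is the first-layer output of the tagger's biRNN for token k and h_k^LM is that token's LM embedding, the input to the second biRNN layer is the concatenation [h_k ; h_k^LM].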

The overall architecture of their TagLM is as follows:

[Figure: overview of the TagLM architecture, from Peters et al., ACL 2017]
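Putting it together, here is a minimal sketch of the TagLM pipeline under my own assumed dimensions. The paper's tagger additionally uses character-level token representations and a CRF output layer, which I omit here for brevity; the key point is that the frozen LM embedding is injected between the two biRNN layers:

```python
import torch
import torch.nn as nn

class TagLMSketch(nn.Module):
    """Simplified TagLM: a two-layer biRNN tagger with the (frozen)
    LM embedding concatenated after the first layer."""
    def __init__(self, vocab_size=10000, emb_dim=100,
                 hidden_dim=100, lm_dim=512, num_tags=17):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn1 = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # the second layer consumes first-layer output + LM embedding
        self.rnn2 = nn.LSTM(2 * hidden_dim + lm_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, h_lm):
        x = self.embed(token_ids)
        h1, _ = self.rnn1(x)                # first biRNN layer
        h1 = torch.cat([h1, h_lm], dim=-1)  # inject the LM embedding here
        h2, _ = self.rnn2(h1)               # second biRNN layer
        return self.proj(h2)                # per-token tag scores

tokens = torch.randint(0, 10000, (2, 10))
h_lm = torch.randn(2, 10, 512)  # stands in for the biLM output above
scores = TagLMSketch()(tokens, h_lm)
print(scores.shape)  # torch.Size([2, 10, 17])
```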

Though the idea is simple, the resulting performance surpassed the previous state-of-the-art methods at the time (e.g., on CoNLL 2003 NER and CoNLL 2000 chunking).

Reference

Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. 2017. Semi-supervised sequence tagging with bidirectional language models. In Proceedings of ACL 2017.