This paper, Bidirectional LSTM-CRF Models for Sequence Tagging (Huang et al., arXiv 2015), describes how to use a BiLSTM+CRF model for sequence tagging in NLP tasks.

Normally, if you run into a sequence tagging problem, you would think of an RNN.

That is because the key point of the problem is the sequence itself.

So this paper implements four models: an LSTM network, a BiLSTM network, an LSTM-CRF network, and a BiLSTM-CRF network.

First, an LSTM network processes information from left to right:

[Figure: an LSTM network (Huang et al., arXiv 2015)]

As you may already know from the RNN structure, an LSTM utilizes the information from previous steps together with the current input.
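
Below is a minimal sketch of this left-to-right processing in PyTorch. The layer sizes and the emission projection are illustrative assumptions of mine, not the paper's actual setup.

```python
import torch
import torch.nn as nn

# A minimal sketch of a unidirectional LSTM tagger; sizes are
# illustrative, not the paper's hyperparameters.
embedding_dim, hidden_dim, num_tags = 100, 128, 9

lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
emission = nn.Linear(hidden_dim, num_tags)   # per-token tag scores

x = torch.randn(1, 5, embedding_dim)         # (batch, seq_len, embedding_dim)
outputs, (h_n, c_n) = lstm(x)                # a single left-to-right pass
tag_scores = emission(outputs)               # (1, 5, num_tags)
```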

So an LSTM utilizes only past information at each time step, but a BiLSTM is different, as follows:

Second, a bidirectional LSTM network looks like this:

[Figure: a bidirectional LSTM network (Huang et al., arXiv 2015)]

A BiLSTM has two LSTMs: one is a forward LSTM and the other is a backward LSTM.

The operation of the two LSTMs is the same; only the direction of information flow differs.

Let’s see how to take advantage of a BiLSTM to extract information.

There are two ways to extract information: one uses only the final states, and the other uses the sequence of outputs at each time step.

First, use the final states (outputs), which summarize the information of the forward and backward passes respectively:

OR

Second, use the contextual representations of the forward and backward passes at each time step.

As you can see, for a sequence labeling problem, we need to use the contextual representations.
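
Here is a small PyTorch sketch of both extraction methods; all sizes and names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim = 100, 128
bilstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True,
                 bidirectional=True)

x = torch.randn(1, 5, embedding_dim)           # (batch, seq_len, embedding_dim)
outputs, (h_n, c_n) = bilstm(x)

# (1) Final states only: one summary vector per direction, useful for
#     whole-sentence tasks such as classification.
summary = torch.cat([h_n[0], h_n[1]], dim=-1)  # (1, 2 * hidden_dim)

# (2) Contextual representation at every time step: forward and backward
#     outputs concatenated per token, which is what sequence labeling needs.
contextual = outputs                           # (1, 5, 2 * hidden_dim)
```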

Conditional Random Field

A conditional random field (CRF) is a useful probabilistic graphical model.

It considers sentence-level tag sequence information.

[Figure: a CRF network (Huang et al., arXiv 2015)]
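
To make "sentence level" concrete, here is a toy sketch of how a CRF scores an entire tag sequence: per-token emission scores plus learned tag-to-tag transition scores. The names and sizes are mine, not the paper's.

```python
import torch

num_tags, seq_len = 5, 4
emissions = torch.randn(seq_len, num_tags)     # per-token tag scores
transitions = torch.randn(num_tags, num_tags)  # transitions[i][j]: score of tag i -> tag j

def sequence_score(tags):
    # Score a whole tag sequence: emissions at each step plus the
    # transition between every pair of adjacent tags.
    score = emissions[0, tags[0]]
    for t in range(1, seq_len):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score

# Two candidate tag sequences for the same sentence can score differently
# purely because of the transitions between their tags.
print(sequence_score([0, 1, 1, 2]), sequence_score([0, 2, 1, 2]))
```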

Let’s see the combination of LSTM and CRF.

First, an LSTM with a CRF layer:

[Figure: an LSTM-CRF network (Huang et al., arXiv 2015)]

Second, a bidirectional LSTM with a CRF layer:

[Figure: a BiLSTM-CRF network (Huang et al., arXiv 2015)]
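
Putting the pieces together, here is a rough sketch of how a BiLSTM-CRF could be wired up. It is only an illustration under my own assumptions; the CRF forward algorithm for training and Viterbi decoding are omitted, and the names are not the paper's.

```python
import torch
import torch.nn as nn

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        self.bilstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.emission = nn.Linear(2 * hidden_dim, num_tags)
        # CRF part: tag-to-tag transition scores, trained jointly with
        # the BiLSTM (loss and decoding omitted in this sketch).
        self.transitions = nn.Parameter(torch.randn(num_tags, num_tags))

    def forward(self, tokens):
        # The BiLSTM supplies per-token emission scores; a full CRF layer
        # would combine them with self.transitions at the sentence level.
        outputs, _ = self.bilstm(self.embed(tokens))
        return self.emission(outputs)

model = BiLSTMCRF(vocab_size=1000, embedding_dim=100, hidden_dim=128, num_tags=9)
scores = model(torch.randint(0, 1000, (1, 5)))   # (1, 5, 9)
```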

Note that in this paper, the LSTM is a variant, like a peephole LSTM.

They use the cell state as an input to the input, output, and forget gates.

In particular, the weight matrices from the cell state to the gate vectors are diagonal, so each element of a gate vector only receives input from the corresponding element of the cell vector.
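
Based on the equations in the paper, one step of this peephole LSTM could look like the numpy sketch below. Because the cell-to-gate weight matrices are diagonal, they reduce to elementwise products with vectors (w_ci, w_cf, w_co here); all sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
W_xi, W_xf, W_xc, W_xo = (rng.standard_normal((d_h, d_in)) for _ in range(4))
W_hi, W_hf, W_hc, W_ho = (rng.standard_normal((d_h, d_h)) for _ in range(4))
w_ci, w_cf, w_co = (rng.standard_normal(d_h) for _ in range(3))  # diagonal peepholes
b_i, b_f, b_c, b_o = (np.zeros(d_h) for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    i = sigmoid(W_xi @ x + W_hi @ h_prev + w_ci * c_prev + b_i)   # input gate
    f = sigmoid(W_xf @ x + W_hf @ h_prev + w_cf * c_prev + b_f)   # forget gate
    c = f * c_prev + i * np.tanh(W_xc @ x + W_hc @ h_prev + b_c)  # new cell state
    o = sigmoid(W_xo @ x + W_ho @ h_prev + w_co * c + b_o)        # output gate sees c_t
    return o * np.tanh(c), c

h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h))
```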

Additionally, they used the BIO2 annotation standard for the chunking and NER tasks.
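
For example, under BIO2 every entity starts with a B- tag, even when it does not follow another entity of the same type; this is my own illustrative sentence, not one from the paper:

```python
tokens = ["John", "lives", "in", "New", "York", "."]
tags   = ["B-PER", "O",    "O",  "B-LOC", "I-LOC", "O"]
```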

They also use a connection trick for the features, like this:

[Figure: the features connection trick (Huang et al., arXiv 2015)]
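
As a sketch of my reading of this trick (an assumption, not the paper's code): the word features go through the BiLSTM, while the engineered spelling and context features are connected directly to the output layer, and both contribute to the per-token tag scores.

```python
import torch
import torch.nn as nn

hidden_dim, feat_dim, num_tags = 128, 20, 9   # illustrative dimensions

to_tags_from_lstm = nn.Linear(2 * hidden_dim, num_tags)
to_tags_from_feats = nn.Linear(feat_dim, num_tags)  # direct feature -> output connection

bilstm_out = torch.randn(1, 5, 2 * hidden_dim)      # BiLSTM outputs for 5 tokens
features = torch.randn(1, 5, feat_dim)              # spelling/context features

tag_scores = to_tags_from_lstm(bilstm_out) + to_tags_from_feats(features)
```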

They evaluated the robustness of the models with respect to engineered features (spelling and context features).

So they trained their models with word features only (spelling and context features removed).

They argued that:

  • The CRF model relies heavily on engineered features to obtain good performance.

  • On the other hand, the LSTM, BiLSTM, and BiLSTM-CRF models are more robust and are less affected by the removal of the engineered features.

Reference

Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF Models for Sequence Tagging. arXiv:1508.01991.