This is a brief summary, written to organize my own notes, of the paper Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition (Misawa et al., SCLeM 2017), which I read and studied.

This paper addresses the Japanese NER task by applying what was, at the time, the state-of-the-art neural model to Japanese.

Most neural NER models had been evaluated on English, so the authors verified whether such a model also works well for Japanese by comparing it with the conventional method for Japanese NER.

In English NER, a CNN layer helps achieve higher performance because it can capture sub-word information such as capitalization, suffixes, and prefixes.

However, the authors argue that a CNN has trouble extracting sub-word information in Japanese.

The reason is that Japanese words tend to be shorter than English words and Japanese characters have no capitalization.
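To make this concrete, here is a small sketch of hand-written features that stand in for the kind of cues a character CNN can pick up in English. The function name `subword_features` and the example words are my own illustration, not something taken from the paper.

```python
def subword_features(word, n=3):
    """Hand-written analogues of sub-word cues (illustrative only)."""
    return {
        "capitalized": word[:1].isupper(),  # always False for kanji/kana
        "prefix": word[:n],
        "suffix": word[-n:],
        "length": len(word),
    }


# English words expose distinct capitalization, prefix, and suffix cues.
# For a short Japanese word, prefix and suffix collapse into the word itself.
for w in ["Washington", "washing", "東京", "米"]:
    print(w, subword_features(w))
```

For a one-character word such as 米, the prefix and suffix are both the whole word, so these features add nothing beyond the word identity.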

They also point out that Japanese has a boundary conflict problem: when only part of a word composes an entity, a model that predicts one tag per word cannot extract it. They therefore argue for a character-based model rather than a word-based one.
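The boundary conflict can be sketched in a few lines of Python. The sentence, its segmentation, and the helper names `char_spans` and `boundary_conflicts` are hypothetical choices of mine for illustration. I assume a morphological analyzer keeps 訪米 ("visit to the US") as a single word, while the entity is only the character 米.

```python
def char_spans(words):
    """Return (start, end) character offsets for each word, end exclusive."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def boundary_conflicts(words, entities):
    """Return entities whose character span does not align with word boundaries.

    entities: list of (start, end, label) character offsets, end exclusive.
    """
    spans = char_spans(words)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return [ent for ent in entities if ent[0] not in starts or ent[1] not in ends]


# Hypothetical segmentation: 訪米 is one word, but only 米 is a LOCATION.
words = ["首相", "が", "訪米", "し", "た"]
entities = [(4, 5, "LOCATION")]  # character offsets of 米 in 首相が訪米した
conflicts = boundary_conflicts(words, entities)
print(conflicts)
```

With this segmentation the entity starts in the middle of a word, so no sequence of word-level tags can mark it exactly. If the analyzer had split 訪 and 米 into separate words, the conflict would disappear.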

They propose a character-based model that predicts a tag for each character while also using word embeddings, as follows:

Misawa et al., SCLeM 2017
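As a rough sketch of the input side of such a model, each character can be paired with the word that contains it, and the two embeddings concatenated before they are fed to the BiLSTM-CRF. The code below only illustrates that pairing and the character-level tags. The function names, the toy embedding values, and the BIO tagging scheme are my assumptions, and the exact feature construction and tag scheme in the paper may differ.

```python
def char_level_inputs(words):
    """Pair every character with the word that contains it."""
    return [(ch, w) for w in words for ch in w]


def char_bio_tags(n_chars, entities):
    """Character-level BIO tags from (start, end, label) spans, end exclusive."""
    tags = ["O"] * n_chars
    for start, end, label in entities:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def concat_embeddings(pairs, char_emb, word_emb):
    """Concatenate each character vector with the vector of its word."""
    return [char_emb[ch] + word_emb[w] for ch, w in pairs]


# Hypothetical sentence and segmentation, with 米 as a LOCATION entity.
words = ["首相", "が", "訪米", "し", "た"]
pairs = char_level_inputs(words)
tags = char_bio_tags(len(pairs), [(4, 5, "LOCATION")])

# Toy embeddings (2-dim characters, 3-dim words); real ones would be learned.
char_emb = {ch: [0.1 * i, 0.2 * i] for i, ch in enumerate(sorted({c for c, _ in pairs}))}
word_emb = {w: [1.0 * i, 0.5 * i, 0.0] for i, w in enumerate(words)}
vectors = concat_embeddings(pairs, char_emb, word_emb)

for (ch, w), tag in zip(pairs, tags):
    print(ch, w, tag)
```

Because tags are assigned per character, 米 can be labeled as an entity even though it sits inside the word 訪米, while the word embedding still tells the model which word the character came from.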

Note(Abstract): Recently, neural models have shown superior performance over conventional models in NER tasks. These models use CNN to extract sub-word information along with RNN to predict a tag for each word. However, these models have been tested almost entirely on English texts. It remains unclear whether they perform similarly in other languages. They worked on Japanese NER using neural models and discovered two obstacles of the state-of-the-art model. First, CNN is unsuitable for extracting Japanese sub-word information. Secondly, a model predicting a tag for each word cannot extract an entity when a part of a word composes an entity. The contributions of this work are (1) verifying the effectiveness of the state-of-the-art NER model for Japanese, (2) proposing a neural model for predicting a tag for each character using word and character information.

Download URL:
The paper: Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition (Misawa et al., SCLeM 2017)

Reference

Paper
- SCLeM Version: Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition (Misawa et al., SCLeM 2017)
