This is a brief summary of Linguistic Knowledge and Transferability of Contextual Representations (Liu et al., NAACL 2019), written to help me study and organize the paper.

The paper investigates what linguistic knowledge is captured by contextual word representations (CWRs) such as ELMo and BERT, and how transferable those representations are.

The authors state that their analysis reveals the following insights:

  1. Linear models trained on top of frozen CWRs are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge. In these cases, the authors show that task-trained contextual features greatly help with encoding the requisite knowledge (see the probing sketch after this list).
  2. The first layer output of long short-term memory (LSTM) recurrent neural networks is consistently the most transferable, whereas it is the middle layers for transformers.
  3. Higher layers in LSTMs are more task-specific (and thus less general), while the transformer layers do not exhibit this same monotonic increase in task-specificity.
  4. Language model pretraining yields representations that are more transferable in general than eleven other candidate pretraining tasks, though pretraining on related tasks yields the strongest results for individual end tasks.

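To make insight 1 concrete, below is a minimal sketch of what training a linear model on top of frozen CWRs can look like. This is not the authors' code: it assumes the Hugging Face `transformers` library, uses BERT as the frozen encoder, and invents a toy per-token labeling task purely for illustration.

```python
# Minimal linear-probe sketch (not the paper's code): train only a linear
# layer on top of a frozen contextual encoder. Assumes the Hugging Face
# `transformers` library; the model name and toy task are illustrative.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")
encoder.eval()
for p in encoder.parameters():      # frozen CWRs: the encoder is never updated
    p.requires_grad = False

num_labels = 5                      # size of a hypothetical tag set
probe = nn.Linear(encoder.config.hidden_size, num_labels)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():               # contextual features come from the frozen encoder
    hidden = encoder(**inputs).last_hidden_state        # (1, seq_len, hidden_size)

# Fake per-token labels, sized to match the tokenized sequence (toy data only).
labels = torch.randint(0, num_labels, (1, hidden.shape[1]))

logits = probe(hidden)              # only the linear probe receives gradients
loss = nn.functional.cross_entropy(logits.view(-1, num_labels), labels.view(-1))
loss.backward()
optimizer.step()
```

The paper's contrast is between this frozen setup and task-trained contextual features, which help most on tasks that need fine-grained linguistic knowledge.
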
In summary, bidirectional LSTM language model pretraining yields representations that are more transferable in general than the other candidate pretraining tasks (i.e., supervised tasks).

Also, for LSTM-based ELMo the most transferable, linguistically informative representations appear in the lower layers, whereas for transformers they appear in the middle layers.

Unlike transformers, LSTM-based ELMo encodes more task-specific information in its higher layers than in its lower layers.
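
The layer-wise observations above (insights 2 and 3) come from probing each layer's output separately. Below is a minimal sketch of how per-layer representations could be extracted for that kind of analysis; it assumes the Hugging Face `transformers` library and a BERT checkpoint, whereas the paper's LSTM results were obtained with ELMo-style models.

```python
# Sketch of layer-wise feature extraction for probing (illustrative only):
# request all hidden states from a transformer encoder so that a separate
# probe can be trained on each layer. Assumes the Hugging Face `transformers`
# library; the paper's LSTM-based ELMo experiments used a different setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

inputs = tokenizer("Contextual representations vary by layer.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: (embedding output, layer 1, ..., layer 12) for BERT-base.
for layer_idx, layer_repr in enumerate(outputs.hidden_states):
    # Each entry is (batch, seq_len, hidden_size); a probe would be trained per layer.
    print(f"layer {layer_idx}: {tuple(layer_repr.shape)}")
```

Training the linear probe from the earlier sketch on each layer separately is the kind of comparison behind the "LSTM lower layers vs. transformer middle layers" finding.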

Reference

Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. Linguistic Knowledge and Transferability of Contextual Representations. In Proceedings of NAACL-HLT 2019.