This is a brief summary I wrote to study and organize the paper MASS: Sequence to Sequence Pre-training for Language Generation (Song et al., arXiv 2019).
The following is the material for my paper seminar on MASS, which I composed myself.
I hope it helps anyone who wants to understand what MASS is and what pre-training means in the natural language processing field.
MASS: Sequence to Sequence Pre-training for Language Generation (Song et al., arXiv 2019)
For a detailed analysis of the experiments, see MASS: Sequence to Sequence Pre-training for Language Generation (Song et al., arXiv 2019).
Note(Abstract):
Pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018), have achieved great success in language understanding by transferring knowledge from rich-resource pre-training task to the low/zero-resource downstream tasks. Inspired by the success of BERT, they propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation. MASS adopts the encoder-decoder framework to reconstruct a sentence fragment given the remaining part of the sentence: its encoder takes a sentence with randomly masked fragment (several consecutive tokens) as input, and its decoder tries to predict this masked fragment. In this way, MASS can jointly train the encoder and decoder to develop the capability of representation extraction and language modeling. By further fine-tuning on a variety of zero/low-resource language generation tasks, including neural machine translation, text summarization and conversational response generation (3 tasks and totally 8 datasets), MASS achieves significant improvements over baselines without pre-training or with other pretraining methods.
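To make the masking scheme in the abstract concrete, below is a minimal Python sketch of how one MASS-style training example could be built: a consecutive fragment of the sentence is replaced by mask tokens on the encoder side, and the decoder is trained to reconstruct that fragment. The function name `mass_mask`, the `[MASK]` string, and the 50% default mask ratio are my own illustration choices based on the paper's description, not the authors' implementation.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol; a real implementation would use a special token id

def mass_mask(tokens, mask_ratio=0.5, seed=None):
    """Build one MASS-style pre-training example from a list of tokens."""
    rng = random.Random(seed)
    m = len(tokens)
    k = max(1, round(m * mask_ratio))   # fragment length (roughly 50% of the sentence in the paper)
    u = rng.randint(0, m - k)           # random start position of the masked fragment
    fragment = tokens[u:u + k]

    # Encoder sees the sentence with the consecutive fragment replaced by [MASK] tokens.
    encoder_input = tokens[:u] + [MASK] * k + tokens[u + k:]
    # Decoder predicts the fragment; its input is the fragment shifted right by one
    # position (teacher forcing), so each prediction conditions only on earlier
    # fragment tokens plus the encoder representation.
    decoder_target = fragment
    decoder_input = [MASK] + fragment[:-1]
    return encoder_input, decoder_input, decoder_target

tokens = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
enc_in, dec_in, dec_tgt = mass_mask(tokens, seed=0)
print(enc_in)    # sentence with a consecutive block of [MASK] tokens
print(dec_in)    # shifted fragment fed to the decoder
print(dec_tgt)   # the masked fragment the decoder must reconstruct
```

Because the encoder must encode the unmasked context and the decoder must generate the masked fragment, this single objective trains both sides of the encoder-decoder jointly, which is the point emphasized in the abstract.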
Download URL:
The paper: MASS: Sequence to Sequence Pre-training for Language Generation (Song et al., arXiv 2019)
Reference
- Paper: MASS: Sequence to Sequence Pre-training for Language Generation (Song et al., arXiv 2019)
- How to use HTML for alerts
- For your information