This is a brief summary of the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Lan et al., ICLR 2020), which I read and am writing up to organize it for my own study.

They propose a lite version of the BERT-based model by using two parameter-reduction techniques.

There are several ways to reduce the number of parameters, such as:

  • Pruning
  • Weight sharing
  • Quantization
  • Low-rank approximation
  • Sparse regularization
  • Distillation

In the ALBERT model they propose, they used weight sharing and a factorization of the embedding parameters (a rough code sketch follows the list below):

  • Factorized embedding parameterization: the large vocabulary embedding matrix (V × H) is decomposed into two smaller matrices (V × E and E × H), with E much smaller than H.

  • Cross-layer parameter sharing: all Transformer layers share the same parameters.
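
As a rough illustration of these two ideas, here is a minimal PyTorch sketch (the class names, sizes, and the use of nn.TransformerEncoderLayer are my own assumptions, not the paper's actual code):

```python
import torch
import torch.nn as nn


class FactorizedEmbedding(nn.Module):
    """Factorized embedding parameterization.

    Instead of one large (vocab_size x hidden_size) embedding table,
    use a small (vocab_size x embedding_size) lookup followed by a
    projection (embedding_size x hidden_size), with E much smaller than H.
    """

    def __init__(self, vocab_size=30000, embedding_size=128, hidden_size=768):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)  # V x E
        self.projection = nn.Linear(embedding_size, hidden_size)         # E x H

    def forward(self, input_ids):
        return self.projection(self.word_embeddings(input_ids))


class SharedLayerEncoder(nn.Module):
    """Cross-layer parameter sharing: one Transformer layer reused N times."""

    def __init__(self, hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        # Only ONE set of layer parameters exists; it is applied num_layers times.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.num_layers = num_layers

    def forward(self, hidden_states):
        for _ in range(self.num_layers):
            hidden_states = self.shared_layer(hidden_states)
        return hidden_states


# Rough size of the word-embedding table alone (bias terms ignored):
#   BERT-style:   30000 * 768               ~= 23.0M parameters
#   ALBERT-style: 30000 * 128 + 128 * 768   ~=  3.9M parameters
embeddings = FactorizedEmbedding()
encoder = SharedLayerEncoder()
hidden = encoder(embeddings(torch.randint(0, 30000, (2, 16))))  # (batch, seq, hidden)
```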

They also replaced the next-sentence prediction (NSP) loss used in the BERT paper with a different loss.

It is an inter-sentence coherence loss, implemented as sentence-order prediction (SOP): the positive example is two consecutive segments in their original order, and the negative example is the same two segments with their order swapped.
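
As a small sketch of the idea (the function name and label values are my own, not from the paper), SOP training pairs might be built like this:

```python
import random


def make_sop_example(segment_a, segment_b):
    """Build one sentence-order prediction (SOP) example.

    segment_a and segment_b are two consecutive text segments from the
    same document.  Positive example (label 1): keep the original order.
    Negative example (label 0): swap the two segments.
    """
    if random.random() < 0.5:
        return (segment_a, segment_b), 1   # original order -> coherent
    return (segment_b, segment_a), 0       # swapped order  -> incoherent


pair, label = make_sop_example("He went to the store.", "He bought some milk.")
print(pair, label)
```

Unlike NSP, both the positive and the negative example come from the same document, so the model cannot solve the task with topic cues alone and has to learn discourse-level coherence.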

Reference

  • Lan et al., ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, ICLR 2020. https://arxiv.org/abs/1909.11942