This is a brief summary, written to help me study and organize what I read, of the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (Sun et al., ACL 2020).

They propose a task-agnostic BERT for resource-limited devices, trained with a layer-wise knowledge distillation technique, as follows:

Sun et al., ACL 2020
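
To make the idea of layer-wise knowledge distillation concrete, here is a minimal sketch of one common form of it: the student is penalized, layer by layer, for the mean squared error between its hidden states and the teacher's. This is an illustration under my own assumptions, not the authors' code. The function names `mse` and `layerwise_distillation_loss` are hypothetical, the hidden states are plain nested lists with toy values, and the teacher and student are assumed to have the same number of layers and the same hidden size. A real setup would use a deep learning framework with tensors and gradients.

```python
def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists (tokens x hidden)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def layerwise_distillation_loss(teacher_layers, student_layers):
    """Average the per-layer MSE between teacher and student hidden states.

    Both arguments are lists with one entry per layer; each entry is a
    tokens x hidden nested list. Returns (mean loss, list of per-layer losses).
    """
    if len(teacher_layers) != len(student_layers):
        raise ValueError("teacher and student must have the same number of layers")
    per_layer = [mse(t, s) for t, s in zip(teacher_layers, student_layers)]
    return sum(per_layer) / len(per_layer), per_layer


# Toy example (assumed values): 2 layers, 2 tokens, hidden size 2.
teacher = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [1.0, 1.0]],
]
student = [
    [[1.0, 2.0], [3.0, 6.0]],  # differs from the teacher in one value
    [[0.5, 0.5], [1.0, 1.0]],  # matches the teacher exactly
]

loss, per_layer = layerwise_distillation_loss(teacher, student)
print("per-layer losses:", per_layer)
print("total loss:", loss)
```

The per-layer list shows which layer the student still fails to match, which is the point of supervising every layer rather than only the final output.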

For the detailed experimental analysis, see MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (Sun et al., ACL 2020).

Reference