This is a brief summary of the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., NAACL 2019), which I read and studied. I wrote it to organize my own understanding of the paper.

The following is the material for my presentation at the paper seminar in my class.

Reference