This is a brief summary of the paper "Language Models are Unsupervised Multitask Learners" (Radford et al.), which I read and studied; these notes simply arrange the main points for my own study.

This paper extends GPT by scaling the model up, arguing that a language model with sufficient capacity can learn to perform tasks without explicit supervision.

That is, they train the language model on text from a variety of domains using WebText, a dataset scraped from outbound links on Reddit (they deliberately avoided raw Common Crawl data because of its quality issues).

They also evaluate the language model on natural language understanding tasks in a zero-shot setting.
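The zero-shot idea is that a task can be posed as plain text and solved by comparing language-model likelihoods, with no task-specific training. Below is a minimal sketch of that scoring trick; it uses a toy count-based bigram model as a stand-in for GPT-2, and the corpus, prompt format, and `zero_shot_sentiment` helper are all my own illustrative assumptions, not from the paper.

```python
import math
from collections import Counter

class BigramLM:
    """Toy add-one-smoothed bigram LM (a stand-in for a real model like GPT-2)."""
    def __init__(self, corpus):
        self.bigrams = Counter()
        self.unigrams = Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split()
            for a, b in zip(tokens, tokens[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
        self.vocab_size = len({t for s in corpus for t in s.split()}) + 1

    def log_prob(self, text):
        tokens = ["<s>"] + text.split()
        lp = 0.0
        for a, b in zip(tokens, tokens[1:]):
            # add-one smoothing so unseen bigrams get nonzero probability
            lp += math.log((self.bigrams[(a, b)] + 1) /
                           (self.unigrams[a] + self.vocab_size))
        return lp

# hypothetical "web text": task demonstrations embedded in ordinary sentences
corpus = [
    "the film was wonderful positive",
    "the film was terrible negative",
    "the acting was wonderful positive",
    "the plot was terrible negative",
]
lm = BigramLM(corpus)

def zero_shot_sentiment(review):
    # no classifier is trained: just pick the label the LM finds more likely
    return max(["positive", "negative"],
               key=lambda label: lm.log_prob(review + " " + label))

print(zero_shot_sentiment("the film was wonderful"))  # -> positive
```

A bigram model only sees the immediately preceding word, so the label is placed right after the sentiment word here; a large Transformer LM conditions on the whole prompt, which is what makes the paper's richer prompts (e.g. appending "TL;DR:" for summarization) work.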

They also use Byte Pair Encoding (BPE) over bytes as the input representation.
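BPE builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. Here is a minimal sketch of that merge loop on a toy word-frequency table; the corpus and helper names are my own, and the naive `str.replace` is fine for this toy data but can over-merge across symbol boundaries in general (the paper's byte-level variant also differs in operating on raw bytes).

```python
from collections import Counter

def get_pair_counts(vocab):
    # count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # replace the chosen pair "a b" with the merged symbol "ab" everywhere
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# toy corpus: words pre-split into characters, with their frequencies
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
for _ in range(3):
    best = max(get_pair_counts(vocab), key=get_pair_counts(vocab).get)
    vocab = merge_pair(best, vocab)

print(vocab)  # frequent pairs like ("e","s") and ("es","t") are now merged
```

After three merges the suffix "est" has become a single symbol, which is exactly how BPE ends up with subword units shared across "newest" and "widest".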

Reference

Radford et al., "Language Models are Unsupervised Multitask Learners" (2019).