This post is a brief summary of a paper I read out of study and curiosity, LLaMa: Open and Efficient Foundation Language Models (Touvron et al., arXiv 2023), so I briefly arrange its content here.

LLaMa is a collection of foundation language models ranging from 7B to 65B parameters.

They focus on the observation that, for a given compute budget, the best performance is not achieved by the largest models but by smaller models trained on more data.

Furthermore, they point out that the prior work of Hoffmann et al. (2022) determines how to best scale the dataset and model size for a particular training compute budget, but that it disregards the inference budget.

From the perspective of serving a model, given a target level of performance, the preferred model is not the fastest to train but the fastest at inference.

So, their work focuses on training a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used.
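To get a feel for this trade-off, here is a minimal illustrative sketch, not from the paper itself. It assumes the common rule of thumb of roughly 20 training tokens per parameter derived from Hoffmann et al. (2022), the rough approximations of about 6 * N * D FLOPs for training and about 2 * N FLOPs per generated token at inference, and uses the training token counts reported in the LLaMa paper (1.0T tokens for 7B/13B, 1.4T tokens for 33B/65B).

```python
# Rough, illustrative sketch (assumptions, not the paper's methodology):
# compare a Chinchilla-style "compute-optimal" token count (~20 tokens per
# parameter, a rule of thumb from Hoffmann et al. 2022) against the token
# counts LLaMa was actually trained on, plus rough cost estimates.

TOKENS_PER_PARAM_OPTIMAL = 20  # rule-of-thumb ratio, assumed for illustration

# (parameter count, training tokens) as reported in the LLaMa paper
llama_models = {
    "LLaMA-7B": (7e9, 1.0e12),
    "LLaMA-13B": (13e9, 1.0e12),
    "LLaMA-33B": (33e9, 1.4e12),
    "LLaMA-65B": (65e9, 1.4e12),
}

for name, (n_params, n_tokens) in llama_models.items():
    # "compute-optimal" data size under the ~20 tokens/param heuristic
    chinchilla_tokens = TOKENS_PER_PARAM_OPTIMAL * n_params
    # rough training cost and per-token inference cost approximations
    train_flops = 6 * n_params * n_tokens
    infer_flops_per_token = 2 * n_params
    print(
        f"{name}: trained on {n_tokens/1e12:.1f}T tokens "
        f"(~{n_tokens / chinchilla_tokens:.1f}x the heuristic-optimal amount), "
        f"~{train_flops:.2e} training FLOPs, "
        f"~{infer_flops_per_token:.2e} FLOPs per generated token"
    )
```

Under these assumptions, the smaller models are trained well past the "compute-optimal" point, which costs more at training time, but the per-token inference cost depends only on the model size, which is the property the authors care about when serving models at scale.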

For the detailed experiments and explanations, refer to the paper LLaMa: Open and Efficient Foundation Language Models (Touvron et al., arXiv 2023).