This post briefly summarizes a paper I read for study and out of curiosity, titled Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., arXiv 2023).

Rafailov et al., arXiv 2023

For detailed experiments and explanations, refer to the paper, Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., arXiv 2023).