This post is a brief summary of a paper I read out of curiosity and for my own study: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., arXiv 2023).
For the detailed experiments and explanations, refer to the paper itself, Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., arXiv 2023).