Meta-Rewarding Language Models - Self-Improving Alignment with LLM-as-a-Meta-Judge
Meta-Rewarding
This post briefly summarizes the paper Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge (Wu et al., arXiv 2024), which I read for my own study and out of curiosity.