Length-Controlled AlpacaEval - A Simple Way to Debias Automatic Evaluators

alpacaeval 2.0

Posted on July 20, 2024

This post is a brief summary of a paper I read out of study and curiosity: Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators (Dubois et al., arXiv 2024).

The paper focuses on operationalizing the question “what would the AlpacaEval metric be if the outputs of all models had the same length as those of the baseline?” as a simple regression-based estimator.

In other words, automated evaluation metrics such as AlpacaEval produce their quality estimates through a combination of direct effects, which measure the quality of the model response, and indirect effects, which are mediated by spurious variables such as output length. Length control aims to keep the former and remove the latter.

(Figure: Dubois et al., arXiv 2024)

Length control is done via regression. The evaluator's preference is modeled with three terms:
- Model identity
- Length of output
- Instruction difficulty
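
The regression idea can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the authors' implementation: it uses synthetic data, fits only a model term (`theta`) and a length term (`phi`), and leaves out the instruction-difficulty term for brevity. The `tanh` of the standardized length difference follows my reading of the paper; the function names, the plain gradient-descent fit, and all numbers are hypothetical. The length-controlled win rate is the prediction with the length term set to zero, i.e. as if the model and the baseline had produced outputs of equal length.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def length_feature(len_model, len_base):
    """Squashed, standardized length difference between model and baseline."""
    diff = np.asarray(len_model, dtype=float) - np.asarray(len_base, dtype=float)
    return np.tanh(diff / np.std(diff))


def fit_glm(pref, z, n_steps=5000, lr=0.5):
    """Fit pref ~ sigmoid(theta + phi * z) by gradient descent on cross-entropy.

    pref: evaluator preference for the model over the baseline, in [0, 1].
    z:    length feature, one value per instruction.
    """
    theta, phi = 0.0, 0.0
    for _ in range(n_steps):
        err = sigmoid(theta + phi * z) - pref  # gradient w.r.t. the logit
        theta -= lr * err.mean()
        phi -= lr * (err * z).mean()
    return theta, phi


# Synthetic setup (assumption): the model is exactly as good as the baseline
# (true_theta = 0) but writes longer answers, and the evaluator likes length.
rng = np.random.default_rng(0)
n = 2000
len_base = rng.normal(1000, 150, n)
len_model = len_base + rng.normal(100, 200, n)
z = length_feature(len_model, len_base)
true_theta, true_phi = 0.0, 1.5
pref = sigmoid(true_theta + true_phi * z)  # noiseless soft preferences

theta, phi = fit_glm(pref, z)

raw_win_rate = 100 * pref.mean()      # inflated by the length bias
lc_win_rate = 100 * sigmoid(theta)    # length term zeroed out

print(f"raw win rate: {raw_win_rate:.1f}")
print(f"length-controlled win rate: {lc_win_rate:.1f}")
```

In this toy setup the raw win rate rewards the longer model even though its true quality matches the baseline, while the length-controlled estimate stays close to 50.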

(Figure: Dubois et al., arXiv 2024)

For detailed experiments and explanations, refer to the paper: Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators (Dubois et al., arXiv 2024).

Download URL:
The paper: Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators (Dubois et al., arXiv 2024)

Reference

Paper
- arXiv version: Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators (Dubois et al., arXiv 2024)

Tags: LLM, LLMEval, Reward