This post is a brief summary of a paper I read out of study and curiosity: Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., arXiv 2023). Below, I briefly arrange its main content.

The authors propose improving the truthfulness of LLMs by intervening on attention heads at inference time.

(Figure from Li et al., arXiv 2023)

ITI (Inference-Time Intervention) can be written as a small modification of multi-head attention (MHA): for a selected set of attention heads, the head output is shifted along a learned "truthful" direction, scaled by an intervention strength alpha and the standard deviation sigma of activations along that direction, before being projected back into the residual stream:

(Figure from Li et al., arXiv 2023)
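To make the mechanism concrete, here is a minimal PyTorch sketch of the intervention step inside a toy multi-head attention layer. This is my own illustration, not the authors' released code: the layer shapes, weight names, and hyperparameter values are assumptions, and only the line that adds alpha * sigma * theta to the selected heads corresponds to the intervention described in the paper.

```python
# Toy sketch of an ITI-style intervention inside one multi-head attention layer.
# Not the authors' code: shapes, weight names, and the toy forward pass are my
# assumptions; only the "alpha * sigma * theta" shift mirrors the paper's idea.
import torch
import torch.nn.functional as F

def mha_with_iti(x, Wq, Wk, Wv, Wo, n_heads, directions, sigmas, alpha):
    """x: (seq, d_model) residual-stream input.
    directions: {head_idx: unit vector (d_head,)} learned "truthful" directions.
    sigmas:     {head_idx: float} std of activations along each direction.
    alpha:      intervention strength."""
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (seq, d_model) -> (n_heads, seq, d_head)
        return t.view(seq, n_heads, d_head).transpose(0, 1)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    attn = torch.softmax(q @ k.transpose(-1, -2) / d_head ** 0.5, dim=-1)
    head_out = attn @ v                                   # (n_heads, seq, d_head)

    # ITI step: shift the chosen heads' activations along their truthful
    # direction, scaled by alpha and that direction's activation std.
    for h, theta in directions.items():
        head_out[h] = head_out[h] + alpha * sigmas[h] * theta

    merged = head_out.transpose(0, 1).reshape(seq, d_model)
    return x + merged @ Wo                                # residual connection

# Toy usage with random weights and a random "truthful" direction for head 1.
torch.manual_seed(0)
d_model, n_heads, seq = 16, 4, 3
x = torch.randn(seq, d_model)
Wq, Wk, Wv, Wo = (0.1 * torch.randn(d_model, d_model) for _ in range(4))
theta = F.normalize(torch.randn(d_model // n_heads), dim=0)
out = mha_with_iti(x, Wq, Wk, Wv, Wo, n_heads, {1: theta}, {1: 1.0}, alpha=15.0)
print(out.shape)  # torch.Size([3, 16])
```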

Note, however, that ITI is a supervised method: labeled data is needed to find the truthful heads and directions, and the intervention itself is a form of activation editing applied at inference time.
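As a rough sketch of that supervised step, the snippet below trains a linear probe on synthetic per-head activations and extracts a direction and scale from it. The data, variable names, and the choice of the probe's weight vector as the direction are my placeholders; in the paper, heads are ranked and selected by probe accuracy, and their best-performing variant uses the difference of class means ("mass mean shift") as the intervention direction instead.

```python
# Rough sketch of the supervised probing step, using synthetic activations.
# Dataset, names, and the probe-weight direction are placeholders, not the
# paper's exact recipe.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, d_head = 200, 64

# Fake per-head activations: truthful statements (label 1) are shifted along a
# hidden direction so the probe has something to find.
hidden_dir = rng.standard_normal(d_head)
hidden_dir /= np.linalg.norm(hidden_dir)
labels = rng.integers(0, 2, n_samples)
acts = rng.standard_normal((n_samples, d_head)) + np.outer(labels, 2.0 * hidden_dir)

# Train a linear probe; probe accuracy is what ranks heads, and the top-K heads
# are the ones that get intervened on at inference time.
probe = LogisticRegression(max_iter=1000).fit(acts[:150], labels[:150])
accuracy = probe.score(acts[150:], labels[150:])

# One simple choice of intervention direction is the normalized probe weights;
# the shift scale sigma is the activation std along that direction.
theta = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
sigma = acts.dot(theta).std()
print(f"probe accuracy: {accuracy:.2f}, sigma: {sigma:.2f}")
```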

For the detailed empirical analysis and experiments, refer to the paper: Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., arXiv 2023).

Reference

Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. arXiv:2306.03341, 2023.