This post is a brief summary of a paper I read for my own study and out of curiosity: Unifying Multimodal Retrieval via Document Screenshot Embedding (Ma et al., EMNLP 2024).

Ma et al. EMNLP 2024

For the detailed experiments and explanations, refer to the paper, Unifying Multimodal Retrieval via Document Screenshot Embedding (Ma et al., EMNLP 2024).

Reference