I'm Hyunyoung2

An Image is Worth 16x16 Words - Transformers for Image Recognition at Scale

ViT

Posted on April 17, 2025

This post is a brief summary of a paper that I read for study and out of curiosity. It briefly outlines the content of the paper, titled An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., arXiv 2021), that I read and... [Read More]

Tags: LLM, VLM

A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Qwen-VL

Posted on April 17, 2025

This post is a brief summary of a paper that I read for study and out of curiosity. It briefly outlines the content of the paper, titled Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (Bai et al., arXiv 2023), that I read and studied. [Read More]

Tags: LLM, VLM

Visual Instruction Tuning

LLaVA

Posted on April 17, 2025

This post is a brief summary of a paper that I read for study and out of curiosity. It briefly outlines the content of the paper, titled Visual Instruction Tuning (Liu et al., arXiv 2023), that I read and studied. [Read More]

Tags: LLM, VLM

Unifying Multimodal Retrieval via Document Screenshot Embedding

DSE

Posted on April 17, 2025

This post is a brief summary of a paper that I read for study and out of curiosity. It briefly outlines the content of the paper, titled Unifying Multimodal Retrieval via Document Screenshot Embedding (Ma et al., EMNLP 2024), that I read and studied. [Read More]

Tags: LLM, Retrieval, Multi-Modal

Measuring and Narrowing the Compositionality Gap in Language Models

Self-ask

Posted on April 10, 2025

This post is a brief summary of a paper that I read for study and out of curiosity. It briefly outlines the content of the paper, titled Measuring and Narrowing the Compositionality Gap in Language Models (Press et al., arXiv 2023), that I read and studied. [Read More]

Tags: LLM, RAG, Prompt