This post is a brief summary of a paper I read for study and out of curiosity: Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (Li et al., NeurIPS 2023).

The authors propose a new dataset called BIRD (a BIg bench for laRge-scale Database grounded in text-to-SQL tasks).

They argue that existing datasets, such as WikiSQL and Spider, focus on database schemas with only a few rows of database values, which leaves a gap between academic study and real-world applications. To reduce this gap, they present the BIRD benchmark.

With this benchmark, they highlight two challenges in the Text2SQL task: grounding external knowledge and comprehending database values.
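To make database value comprehension a bit more concrete, here is a small sketch of my own (it is not taken from the paper). The table, column names, rows, and the question are all invented for illustration. The question mentions "California", but the stored values use the abbreviation `CA`, so a query written from the schema alone finds nothing.

```python
import sqlite3

# Hypothetical toy table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("CA",), ("NY",)],
)


def count_customers(state_value):
    """Count customers whose stored state equals state_value."""
    row = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE state = ?", (state_value,)
    ).fetchone()
    return row[0]


# Question: "How many customers are in California?"
schema_only = count_customers("California")  # ignores how values are stored
value_aware = count_customers("CA")          # matches the stored abbreviation

print(schema_only, value_aware)
```

The schema is identical in both calls; only knowledge of the stored values (or an external hint that California is written as `CA`) leads to the right answer.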

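In addition, they address the efficiency of the SQL generated by Text2SQL models, not only its correctness, which matters in industry. For this, the paper introduces the Valid Efficiency Score (VES) metric.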
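As a rough illustration of why efficiency is a separate concern from correctness, the sketch below runs two queries that return the same rows and times each one. This is my own toy example, not the paper's evaluation procedure: the `orders` table, its contents, and the `run_timed` helper are all assumptions made up for this post.

```python
import sqlite3
import time

# Hypothetical toy table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i % 7)) for i in range(2000)],
)


def run_timed(sql, repeats=3):
    """Return (rows, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


# Both queries ask: which customers have at least one order with amount > 5?
q_correlated = """
SELECT DISTINCT customer_id FROM orders AS o
WHERE (SELECT COUNT(*) FROM orders AS i
       WHERE i.customer_id = o.customer_id AND i.amount > 5) > 0
ORDER BY customer_id
"""
q_direct = """
SELECT DISTINCT customer_id FROM orders
WHERE amount > 5
ORDER BY customer_id
"""

rows_correlated, t_correlated = run_timed(q_correlated)
rows_direct, t_direct = run_timed(q_direct)

print("same result:", rows_correlated == rows_direct)
print(f"correlated: {t_correlated:.6f}s, direct: {t_direct:.6f}s")
```

Both queries are equally "correct" under an accuracy-only metric, yet their running times can differ, and the difference depends on the data size and the query planner.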

(Figure from Li et al., NeurIPS 2023)

The following figure shows the overall process of the BIRD annotation workflow.

(Figure from Li et al., NeurIPS 2023)

For detailed experiments and explanations, refer to the paper: Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs (Li et al., NeurIPS 2023).