A RAG-Driven Framework for Natural Language to SQL Translation in Relational Databases
Abstract
This paper presents a retrieval-augmented generation (RAG) system that translates natural language questions into SQL queries, so that non-experts can query relational databases without writing SQL. The approach targets two limitations of conventional query interfaces, schema complexity and ambiguous user intent, with the aim of broadening data access and improving usability. The system adopts a two-stage RAG framework: (1) a retrieval phase that uses similarity search and pre-trained language models (e.g., LLaMA, DeepSeek) to identify the relevant database schemas and tables, achieving 97.20% accuracy in schema identification on the Spider benchmark; and (2) a generation phase that uses instruction-tuned models (e.g., Flan-T5) to synthesize SQL queries from the natural language input and the retrieved schema. Preliminary results show that the retrieval phase resolves schema ambiguity and limits error propagation into the generation phase, outperforming baseline methods in complex join scenarios. Evaluation of end-to-end execution accuracy is ongoing; initial qualitative analysis indicates improved usability for non-expert users. This work advances NLP-driven database interaction by integrating retrieval-augmented models with text-to-SQL tasks. Its open-source implementation lowers the technical barrier to real-world adoption and underscores the potential of RAG architectures to improve accessibility, precision, and efficiency in data-centric applications.
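To make the two-stage design concrete, the following is a minimal sketch of how a retrieval step can select a schema and pass it to a generation step. It is an illustration under stated assumptions, not the paper's implementation: the three schemas are hypothetical toy examples, a bag-of-words cosine similarity stands in for the embedding-based similarity search, and the helper names (`retrieve_schema`, `build_prompt`) are invented for this example. The generation model itself is left out; the sketch stops at the prompt that an instruction-tuned model such as Flan-T5 would receive.

```python
import math
import re
from collections import Counter

# Hypothetical toy schemas (assumption); a real system would index the
# databases of the target benchmark instead.
SCHEMAS = {
    "concert_singer": (
        "singer(singer_id, name, country, age); "
        "concert(concert_id, concert_name, year); "
        "singer_in_concert(concert_id, singer_id)"
    ),
    "pets": (
        "student(student_id, name, age); "
        "pet(pet_id, pet_type, weight); "
        "has_pet(student_id, pet_id)"
    ),
    "flights": (
        "airline(airline_id, name, country); "
        "airport(airport_code, city); "
        "flight(flight_no, airline_id, source_airport, dest_airport)"
    ),
}


def tokenize(text):
    """Lowercase and split on non-letters, so 'singer_id' -> ['singer', 'id']."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_schema(question, schemas):
    """Stage 1 (retrieval): return the name of the most similar schema.

    Bag-of-words similarity is a stand-in (assumption) for the
    embedding-based similarity search described in the abstract.
    """
    query_vec = Counter(tokenize(question))
    scores = {
        name: cosine(query_vec, Counter(tokenize(name + " " + ddl)))
        for name, ddl in schemas.items()
    }
    return max(scores, key=scores.get)


def build_prompt(question, schema_name, schemas):
    """Stage 2 input: the prompt a text-to-SQL model would be given."""
    return (
        "Translate the question into SQL.\n"
        f"Schema ({schema_name}): {schemas[schema_name]}\n"
        f"Question: {question}\n"
        "SQL:"
    )


question = "What is the name and age of each singer in a concert?"
schema_name = retrieve_schema(question, SCHEMAS)
prompt = build_prompt(question, schema_name, SCHEMAS)
print(schema_name)
print(prompt)
```

Restricting the prompt to the retrieved schema keeps the generator from having to choose among unrelated tables, which is one way a retrieval error or success propagates to the SQL that is produced.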