

GradeSQL: Outcome Reward Models for Ranking SQL Queries from Large Language Models




My master’s thesis, now published as GradeSQL: Outcome Reward Models for Ranking SQL Queries from Large Language Models, explores how Outcome Reward Models (ORMs) can significantly improve the Text-to-SQL task.


Text-to-SQL lets users query databases in natural language, but even powerful LLMs struggle with complex, compositional queries. Common test-time strategies for choosing among several generated queries, such as Best-of-N (BoN) or Majority Voting (Maj), rely on surface heuristics and often fail to capture semantic correctness.
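To make the Maj baseline concrete, here is a minimal sketch of execution-based majority voting. It assumes the candidates run against a SQLite database and that two queries "agree" when they return the same rows, ignoring row order. The names `execute` and `majority_vote` are illustrative and not taken from GradeSQL, and the exact voting variant used in the paper may differ.

```python
from __future__ import annotations

import sqlite3
from collections import Counter


def execute(conn: sqlite3.Connection, sql: str) -> tuple | None:
    """Run a query and return its rows as a hashable, order-insensitive key.

    Returns None if the query fails (syntax error, unknown column, ...).
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort rows so that queries differing only in row order share one key.
    return tuple(sorted(rows, key=repr))


def majority_vote(conn: sqlite3.Connection, candidates: list[str]) -> str | None:
    """Return the candidate whose execution result is shared by the most candidates."""
    results = [execute(conn, sql) for sql in candidates]
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None  # no candidate executed successfully
    winner, _ = votes.most_common(1)[0]
    # Among the queries that produced the winning result, return the first one.
    return next(sql for sql, r in zip(candidates, results) if r == winner)


# Toy database and candidates, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES ('Ann', 31), ('Bo', 25), ('Cy', 40);"
)
candidates = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31",
    "SELECT name FROM singer WHERE age < 30",
    "SELECT nme FROM singer",  # fails: no such column
]
print(majority_vote(conn, candidates))  # SELECT name FROM singer WHERE age > 30
```

The vote rewards agreement, not correctness: if most sampled queries share the same wrong result, that result wins. This is the kind of surface heuristic the paragraph above refers to.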


In this work, we introduce and evaluate Outcome Reward Models (ORMs) as a principled, test-time method for ranking generated SQL queries. By scoring candidates on their semantic alignment with the question rather than on syntax alone, ORMs outperform execution-based Best-of-N (ex-BoN) and Maj on the BIRD and SPIDER benchmarks, improving execution accuracy over ex-BoN by +4.33% on BIRD and +2.10% on SPIDER.
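The ranking step itself can be sketched in a few lines, assuming the ORM is exposed as a function that maps a (question, SQL) pair to a score such as the estimated probability that the query is correct. `rank_with_orm`, `execution_match`, and the `RewardFn` interface are hypothetical names; GradeSQL's actual ORM is a fine-tuned LLM, and the scores below are made up to stand in for its output. `execution_match` shows the usual execution-accuracy criterion (the predicted and gold queries return the same rows), simplified here to ignore row order.

```python
from __future__ import annotations

import sqlite3
from typing import Callable

# Hypothetical interface: (question, candidate SQL) -> score, higher is better.
RewardFn = Callable[[str, str], float]


def rank_with_orm(question: str, candidates: list[str], reward: RewardFn) -> list[str]:
    """Return the candidates sorted from highest to lowest reward."""
    return sorted(candidates, key=lambda sql: reward(question, sql), reverse=True)


def execution_match(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if the predicted query returns the same rows as the gold query (order ignored)."""
    try:
        pred_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


# Toy database, question, and candidates, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (name TEXT, age INTEGER);"
    "INSERT INTO singer VALUES ('Ann', 31), ('Bo', 25), ('Cy', 40);"
)
question = "Who is the oldest singer?"
gold = "SELECT name FROM singer ORDER BY age DESC LIMIT 1"
candidates = [
    "SELECT name FROM singer ORDER BY age LIMIT 1",
    "SELECT name FROM singer ORDER BY age ASC LIMIT 1",
    "SELECT name FROM singer WHERE age = (SELECT MAX(age) FROM singer)",
]
# Made-up scores standing in for a trained ORM's outputs.
fake_scores = dict(zip(candidates, [0.21, 0.18, 0.87]))
best = rank_with_orm(question, candidates, lambda q, s: fake_scores[s])[0]
print(best)                               # the MAX(age) subquery
print(execution_match(conn, best, gold))  # True
```

The two incorrect candidates here return the same row, so a vote over execution results would favour them; the ORM ranking instead depends only on each candidate's individual score.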


We also show that fine-tuning ORMs from LLMs already specialized for SQL generation, such as OmniSQL, further boosts results, and that ORMs scale well with larger candidate pools. This points to a cost-effective path: smaller models, enhanced by inference-time scaling, can rival much larger systems.
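One way to check how a reranker scales with the candidate pool is to pick the top-scored query among the first n samples for each question and measure how often that pick is correct as n grows. The sketch below assumes each question's pool is stored as (score, is_correct) pairs in sampling order; `accuracy_at_n` is a hypothetical helper, and the toy numbers are invented and do not reproduce the paper's results.

```python
from __future__ import annotations


def accuracy_at_n(pools: list[list[tuple[float, bool]]], n: int) -> float:
    """Fraction of questions whose top-scored candidate among the first n samples is correct.

    Each pool holds (orm_score, is_correct) pairs in the order the candidates were sampled.
    Ties on score keep the earlier sample, because max() returns the first maximum.
    """
    hits = 0
    for pool in pools:
        _, is_correct = max(pool[:n], key=lambda pair: pair[0])
        hits += is_correct
    return hits / len(pools)


# Toy data, invented for illustration: two questions, four sampled candidates each.
pools = [
    [(0.2, False), (0.9, True), (0.4, False), (0.1, False)],
    [(0.5, False), (0.3, False), (0.1, False), (0.8, True)],
]
for n in (1, 2, 4):
    print(n, accuracy_at_n(pools, n))
# prints: 1 0.0 / 2 0.5 / 4 1.0
```

Adding candidates raises accuracy only when the scorer places a correct query above every wrong one in the larger pool, so whether selection "scales well" depends on the quality of the scorer, not on pool size alone.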


Read the full paper here: https://arxiv.org/abs/2509.01308


Acknowledgments



I would like to thank my supervisor Prof. Fedelucio Narducci and co-supervisors Dr. Dario Di Palma and Dr. Gaetano Rossiello for their continuous support and guidance throughout this research.