AI has created a new opportunity for data democratization. By harnessing the power of language model comprehension to generate insights without requiring SQL, a significantly larger number of users will be able to uncover valuable insights that are currently buried in massive amounts of data.
Last month, Vivek Raghunathan, VP of Engineering at Snowflake, discussed the pioneering Snowflake Copilot at the Fully Connected conference. The video became publicly available on YouTube last week. The Wren AI team has been working on text-to-SQL for several months, and we learned a lot from the thoughts and techniques Vivek shared in this talk on the SQL LLM topic. Today, I would like to organize the thoughts and learnings from this video. I hope this helps other developers like us accelerate text-to-SQL innovation together!
Wren AI's mission is to democratize data by bringing text-to-SQL capability to any data source and industry. We believe in making data accessible to everyone.
If you are interested in the full video, check the link below!
Now, let's start!
Vivek talked about the current state and the future of AI in data analytics.
But the future will evolve into a fully-fledged conversational pilot for business users; this will empower individuals who don't know SQL to simply ask questions in natural language and receive answers.
The goal is to create an interface that supports interactive data exploration and visualization, ultimately becoming an AI that users can rely on to accomplish their tasks efficiently.
This matches what we experienced in developing Wren AI. We believe current technology still falls short of a full-fledged AI pilot for all business users; it's similar to self-driving cars. Over the years, car vendors have been rolling out autopilot features that co-pilot with drivers. Until a high degree of accuracy and safety is proven for copilot use cases, we will still need data analysts working alongside the AI.
Vivek mentioned that text-to-SQL is an iceberg problem: issues appear simple on the surface but are immensely complex underneath.
Quote from the talk:
"I've seen countless Twitter demos about this, I just saw one recently. Why not use standard approaches and buzzwords like schema recognition? Why not fine-tune on a schema? There's a new leader on the Spider leaderboard every so often, so why waste time on this trivial problem?" I'm here to tell you it's not that simple.
A few obvious challenges you will quickly face, which are mentioned in his talk:
Indeed, it's not simple at all!
Wren AI is focused on solving the gap between data and semantics. We believe the key to making text-to-SQL reliable, at its very core, is building a reliable semantic engine that adds a semantic architecture on top of the existing data structure, defining semantic relationships, calculations, and aggregations. For LLMs to handle different contexts in different scenarios, they depend heavily on a robust semantic layer.
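To make this concrete, below is a minimal Python sketch of what a tiny semantic layer can look like. The names and structure are made up for illustration (this is not Wren AI's actual modeling format): tables, relationships, and calculated metrics with descriptions, rendered into plain text that can be injected into an LLM prompt.

```python
from __future__ import annotations
# A hand-rolled toy semantic layer (made-up names, not Wren AI's actual
# modeling format) that records relationships, calculations, and
# descriptions so they can be injected into an LLM prompt as context.
from dataclasses import dataclass, field

@dataclass
class Relationship:
    left: str    # e.g. "orders.customer_id"
    right: str   # e.g. "customers.id"
    kind: str    # e.g. "many-to-one"

@dataclass
class Metric:
    name: str
    expression: str   # SQL expression the LLM should reuse verbatim
    description: str

@dataclass
class SemanticModel:
    tables: dict[str, str] = field(default_factory=dict)  # table -> description
    relationships: list[Relationship] = field(default_factory=list)
    metrics: list[Metric] = field(default_factory=list)

    def to_prompt_context(self) -> str:
        """Render the semantic layer as plain text for the LLM prompt."""
        lines = ["Tables:"]
        lines += [f"- {name}: {desc}" for name, desc in self.tables.items()]
        lines.append("Relationships:")
        lines += [f"- {r.left} -> {r.right} ({r.kind})" for r in self.relationships]
        lines.append("Metrics:")
        lines += [f"- {m.name} = {m.expression} ({m.description})" for m in self.metrics]
        return "\n".join(lines)

model = SemanticModel(
    tables={"orders": "one row per order line", "customers": "one row per customer"},
    relationships=[Relationship("orders.customer_id", "customers.id", "many-to-one")],
    metrics=[Metric("revenue", "SUM(orders.amount)", "gross revenue before refunds")],
)
print(model.to_prompt_context())
```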
Snowflake has been through several iterations, from v0 to v4; thankfully, Vivek generously shared what they tried, learned, and iterated on in each version to improve their text-to-SQL system.
Let's dive in!
In the talk, Vivek mentioned:
To tackle this, we began with a data set called Spider from Yale, optimizing our model for this benchmark. Our initial model performed well, but when tested on an internal data set reflective of real-world use, its accuracy plummeted.
This highlighted the need for a robust semantic catalog retrieval system
In the v0 version, on the Spider dataset, their best model achieved 82%, and zero-shot GPT-4 (with no optimization) reached 74%. But on real-world data, accuracy plummeted to 9% with their best model and 14% with a prompt-optimized GPT-4.
That is when they realized the importance of the "Semantic Catalog": semantics plays a huge part in data retrieval. Since a pre-trained LLM knows nothing about your business context, the only way to supply that context is to provide semantics through RAG.
Semantics is key to solving the text-to-SQL challenge, and it has been the core of our design since we started implementing Wren AI.
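As a rough illustration of the RAG idea, here is a small sketch: score catalog entries against the question with a toy embedding, keep the top N, and pack them into the prompt. The `embed` function is a bag-of-words stand-in for a real embedding model, and the catalog entries are invented.

```python
# A rough sketch of RAG over a semantic catalog. `embed` is a toy hashing
# vectorizer standing in for a real embedding model.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedding; swap in a real model in practice."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

catalog = [
    "orders(order_id, customer_id, amount, created_at): one row per order",
    "customers(id, name, segment, country): one row per customer",
    "metric revenue = SUM(orders.amount), gross revenue before refunds",
]
catalog_vectors = [embed(entry) for entry in catalog]

def build_prompt(question: str, top_k: int = 2) -> str:
    q = embed(question)
    ranked = sorted(zip(catalog, catalog_vectors),
                    key=lambda pair: float(np.dot(q, pair[1])), reverse=True)
    context = "\n".join(entry for entry, _ in ranked[:top_k])
    return ("You are a SQL assistant. Use only this schema context:\n"
            f"{context}\n\nQuestion: {question}\nSQL:")

# The resulting prompt is what you would send to your LLM of choice.
print(build_prompt("total revenue by customer segment"))
```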
In the follow-up v1 version, the Snowflake team started by thinking: if we can bring web-quality search to enterprise metadata search and feed the output into an LLM, performance can potentially get radically better.
In other words, to solve a simpler problem (what should we pack into the context of this LLM?), we're going to solve a harder problem first:
How can we adapt a consumer-grade web search engine to the problem of conversational catalog search?
Below are the results: Snowflake's best model improves from 9% to 24%, and the GPT-4 version also jumps from 14% to 28%. Thus, the intuition that semantic catalog retrieval is essential holds true.
Vivek described how they perform conversational catalog retrieval at Snowflake:
At Snowflake, user queries come in and go through a query understanding layer, which includes multi-turn disambiguation and query rewriting. Every known trick from the IR and NLP literature is employed.
In the middle, we're adapting a consumer-grade search engine, Neeva, to the problem of conversational catalog retrieval. This involves both sparse and dense retrieval and ranking techniques, such as keyword retrieval, bi-encoder LLM for retrieval, IR ranker (which outperforms BM25), and a cross-encoder LLM for ranking.
Of course, all the usual secret sauce is included. We train these models on web and schema data, incorporating factors like popularity, authority, and freshness. After extensive work, we ended up with a system that meets our standards.
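Snowflake's stack is built on Neeva plus their own secret sauce, so the following is only a generic open-source approximation of the same shape: BM25 for sparse keyword retrieval, a bi-encoder for dense retrieval, and a cross-encoder to re-rank the merged candidates. The model names, catalog entries, and naive score fusion are illustrative assumptions, not their system.

```python
# Generic hybrid retrieval + re-ranking sketch (not Neeva's or Snowflake's
# actual stack). Requires: pip install rank-bm25 sentence-transformers
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder, util

catalog = [
    "orders(order_id, customer_id, amount, created_at): one row per order",
    "customers(id, name, segment, country): one row per customer",
    "web_events(event_id, customer_id, url, ts): raw clickstream",
]

bm25 = BM25Okapi([doc.lower().split() for doc in catalog])
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
doc_emb = bi_encoder.encode(catalog, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Sparse (keyword) scores and dense (semantic) scores for every doc.
    sparse = bm25.get_scores(query.lower().split())
    dense = util.cos_sim(bi_encoder.encode(query, convert_to_tensor=True), doc_emb)[0]
    # Naive score fusion just for the sketch; take a candidate pool.
    candidates = sorted(range(len(catalog)),
                        key=lambda i: float(sparse[i]) + float(dense[i]),
                        reverse=True)[: top_k * 2]
    # Cross-encoder re-ranks query/document pairs with full attention.
    scores = cross_encoder.predict([(query, catalog[i]) for i in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [catalog[i] for i, _ in ranked[:top_k]]

print(retrieve("total order amount per customer segment"))
```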
The Snowflake team also faced another challenge: the raters. They re-annotated everything by hand, added more complex examples to evaluate performance, and sliced results based on data semantics, such as single table vs. multi-table and simple join vs. complex join.
Below is what Vivek shared in his talk:
That was more reflective of real-world use, but it had subtle errors. So, what is the crux of the issue here? For those of you who write SQL, it isn't easy. It's not just learning the syntax; it's learning the underlying semantics of the data and thinking like an analyst.
Our annotators were not highly paid analysts, and even an analyst takes an hour to understand a database before writing queries. They have a very structured process for learning how data is organized. There are multiple other complex issues, such as join paths between tables that can cause problems, and phenomena like chasm traps and fan traps.
In other words, we've hit the iceberg, which means we must curate the data correctly. With the problem thus identified, the solution is simple: we manually curated everything. We did multiple rounds of validation and cross-checking. The cross-checking process is actually harder than it seems because it requires truly solving the problem to ensure accuracy.
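For readers who have not run into a fan trap before, here is a tiny self-contained example (made-up schema and data) showing how joining one parent table to two independent one-to-many children inflates an aggregate, and how pre-aggregating avoids it.

```python
# Illustration of a "fan trap": joining one parent to two independent
# one-to-many children fans out rows, so a naive SUM double counts.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (customer_id INTEGER, amount REAL);
CREATE TABLE tickets   (customer_id INTEGER, subject TEXT);

INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders  VALUES (1, 100.0), (1, 50.0);                 -- true total: 150
INSERT INTO tickets VALUES (1, 'bug'), (1, 'question'), (1, 'refund');
""")

# Naive join through the shared parent: each order row is repeated once per
# ticket, so the sum is inflated 3x (450 instead of 150).
naive = con.execute("""
    SELECT SUM(o.amount)
    FROM customers c
    JOIN orders  o ON o.customer_id = c.id
    JOIN tickets t ON t.customer_id = c.id
""").fetchone()[0]

# One correct approach: aggregate the fact table before joining.
correct = con.execute("""
    SELECT o.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id) o ON o.customer_id = c.id
""").fetchone()[0]

print(naive, correct)  # 450.0 150.0
```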
These insights are quite interesting to us, since we haven't done much retrieval optimization while implementing Wren AI. Currently, we only use top-N selection from the vector store; we are looking into more advanced retrieval techniques to improve this area.
Our team is also building internal eval datasets that cover more complex scenarios, including single- and multi-table queries as well as simple and complex joins, to make sure the solution remains accurate in the real world.
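Here is a hedged sketch of what that kind of slicing can look like in practice, with invented test cases and labels: tag each eval case by query shape and report accuracy per slice rather than a single global number.

```python
# Slice eval results by query shape (single vs. multi-table, simple vs.
# complex join). The cases and labels below are illustrative only.
from collections import defaultdict

cases = [
    {"question": "total revenue last month", "tables": 1, "complex_join": False, "correct": True},
    {"question": "revenue by customer segment", "tables": 2, "complex_join": False, "correct": True},
    {"question": "retention by signup cohort and channel", "tables": 3, "complex_join": True, "correct": False},
]

def slice_name(case: dict) -> str:
    shape = "single-table" if case["tables"] == 1 else "multi-table"
    join = "complex-join" if case["complex_join"] else "simple-join"
    return f"{shape}/{join}"

buckets = defaultdict(list)
for case in cases:
    buckets[slice_name(case)].append(case["correct"])

for name, results in sorted(buckets.items()):
    print(f"{name}: {sum(results)}/{len(results)} correct")
```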
You are welcome to open new issues to discuss this topic! Our team is happy to look into further improvements!
Next, Vivek shared that despite these advances, the model still faced challenges in conversational capabilities. Optimizing for SQL tasks had crippled its ability to handle conversations and follow instructions.
Below are some insights they shared:
With this new, clean eval, they saw significant improvements: their base model went from 27% to 39%, and GPT-4 jumped from 40% to 46%.
The next challenge they faced is building a conversational co-pilot rather than a zero-shot text-to-SQL solution. It needs to handle conversations and allow analysts to refine their queries as they go.
Optimizing one part of the system will unintentionally deoptimize the whole thing. Two big issues arose:
The talk mentioned that the text-to-SQL challenge requires an interactive approach instead of a zero-shot approach. We at Wren AI are also pursuing this direction and are still working on several experiences to improve.
In our implementation, we built sophisticated steps into the "Augmentation & Generation" stages of the RAG pipeline, such as validation, correction, and clarification dialogs, to make sure LLMs fully understand the user's intent. But of course, there are still a lot of areas to improve.
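Here is a minimal sketch of the validation-and-correction idea (an assumed design for illustration, not our exact pipeline): dry-run the generated SQL with EXPLAIN, and if the database rejects it, hand the error back to the LLM for another attempt. The `llm` callable is a placeholder for your own model client.

```python
# Validation/correction loop sketch: compile-check generated SQL and feed
# any error back to the model. `llm` is a stand-in for a real client.
import sqlite3
from typing import Callable

def generate_sql(question: str, schema: str, con: sqlite3.Connection,
                 llm: Callable[[str], str], max_retries: int = 2) -> str:
    prompt = f"Schema:\n{schema}\n\nWrite a single SQLite query for: {question}\nSQL:"
    sql = llm(prompt)
    for _ in range(max_retries):
        try:
            con.execute(f"EXPLAIN {sql}")  # compile-only dry run, reads no data
            return sql
        except sqlite3.Error as err:
            # Correction turn: show the model its own SQL and the error message.
            sql = llm(f"{prompt}\n\nPrevious attempt:\n{sql}\nError: {err}\n"
                      "Return only a corrected SQL statement.")
    return sql  # last attempt, returned even if still unvalidated
```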
Vivek shared that they retrained the LLM to follow instructions by giving it a mix of text-to-SQL tasks and more general-purpose instruction-following tasks, and they layered it into a multi-LLM setup with an orchestrator LLM.
The orchestrator model's responsibility is to hold the conversation with the customer; it's like an agentic approach, allowing it to call another, equally large text-to-SQL model whenever it needs to write SQL. This approach of smartly delegating tasks solved many problems.
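Below is a rough sketch of that orchestrator pattern under our own assumptions (it is not Snowflake's implementation): a conversational model routes each turn and delegates to a dedicated text-to-SQL model only when a new query is needed. The routing prompt and both model callables are hypothetical.

```python
# Orchestrator-pattern sketch (not Snowflake's implementation): one model
# handles the conversation and routing, another is the SQL specialist.
from typing import Callable, List

LLM = Callable[[str], str]

def orchestrate(user_message: str, history: List[str],
                chat_llm: LLM, sql_llm: LLM, schema: str) -> str:
    convo = "\n".join(history + [f"User: {user_message}"])
    route = chat_llm(
        "Decide whether the latest user message needs a NEW database query.\n"
        f"{convo}\nAnswer with exactly one word: SQL or CHAT."
    ).strip().upper()

    if route == "SQL":
        # Delegate the hard part to the specialist text-to-SQL model.
        sql = sql_llm(f"Schema:\n{schema}\n\nConversation:\n{convo}\nSQL:")
        return chat_llm(f"{convo}\nExplain the result of this query to the user:\n{sql}")
    # Otherwise the orchestrator keeps the conversation going itself.
    return chat_llm(f"{convo}\nAssistant:")
```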
And the numbers got even better: their best model went from 38% to 41% thanks to the added instruction-following capability, and with GPT-4 plus optimization, the eval reached 46%.
Going from 46.4% to 99%: the goal of text-to-SQL is to build a conversational co-pilot for business users who don't know SQL. This is the opportunity, and it requires 99% accuracy.
At Wren AI, we are also optimistic about the future of text-to-SQL. With LLM innovation moving at breakneck speed, we believe that very soon, pairing LLMs' near-human comprehension with semantic context will get us to an ideal world of data democratization.
It's now a work in progress at Snowflake, using the semantic context provided by the customer to understand metrics, join paths, and more.
Looking forward to more sharing from Snowflake! Very exciting!
Finally, Vivek shared his key takeaways and learnings from going through all the challenges, arranged below.
That's about it! Thank you, Vivek and the Snowflake team, for sharing so many valuable lessons in this talk; we've learned a lot from you!
Onward!
Supercharge Your Data with AI Today?!