Deep dive into how Pinterest built its Text-to-SQL solution.

Discover how the Pinterest Engineering Team’s Text-to-SQL implementation inspired WrenAI’s development in our latest blog post.

Howard Chi
Co-founder of Wren AI
May 10, 2024
5 min read

In early April this year, the Pinterest Engineering Team shared how they implemented an internal Text-to-SQL solution that lets data users retrieve data without writing SQL. They also reported an impressive ROI for their users, quoted below.

In our real world data (which importantly does not control for differences in tasks), we find a 35% improvement in task completion speed for writing SQL queries using AI assistance.

The team at Wren AI has been working on the Text-to-SQL challenge for several months, and we gained many new insights from Pinterest's post. We would like to share what we learned from it, along with our own learnings from developing Wren AI, to help accelerate the work of bringing LLM comprehension capabilities to data.

Before we start, if you haven’t checked out Wren AI, check us out on GitHub! If you love what we are doing, don’t forget to give us a star ⭐️ !

First, let’s investigate the architectural similarity between Pinterest Text-to-SQL and Wren AI Architecture.

Pinterest Text-to-SQL Architecture

Here’s the architecture Pinterest shared in their post.

Pinterest text-to-sql architecture

Let’s break down the architecture phase by phase; when working on Text-to-SQL implementation, there are three critical phases: indexing, retrieval, and augmentation & generation.

Let's map these three phases onto the architecture below and delve deeper into each.

Mapping the most critical 3 phases in RAG

Indexing

In the architecture, they applied an offline job to generate indexes for the vector store based on tables’ summaries and historical queries against them. Currently, they rely on two types of document embeddings in the vector indexing process: Table summarization and Query summarization.

The indexing phase is usually closely related to how Text-to-SQL searches for the most relevant tables in your database; the more context you can provide into the vector store, the better the final result will be.
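The offline indexing job can be sketched as follows. This is a minimal toy illustration of the idea, not Pinterest's implementation: table summaries and query summaries are embedded and stored so they can be searched later. The bag-of-words "embedding" here is a stand-in for a real embedding model, and all names are our own.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts (stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorIndex:
    """Minimal in-memory vector store for the sketch."""

    def __init__(self):
        self.docs = []  # (doc_id, embedding, payload)

    def index(self, doc_id: str, text: str, payload: dict):
        self.docs.append((doc_id, embed(text), payload))

    def search(self, query: str, top_k: int = 3):
        q = embed(query)
        scored = [(cosine(q, emb), doc_id, payload)
                  for doc_id, emb, payload in self.docs]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]


# Offline job: one summary document per table, plus summaries of historical queries.
idx = VectorIndex()
idx.index("orders", "orders table purchase timestamp order balance customer id",
          {"kind": "table_summary"})
idx.index("customers", "customers table city state unique customer id",
          {"kind": "table_summary"})
idx.index("q1", "query that computes total order balance per customer",
          {"kind": "query_summary", "tables": ["orders", "customers"]})
```

A question such as "which city are customers in" would then surface the `customers` summary first, since it shares the most terms with the query.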

Retrieval

During the retrieval phase, if the user does not specify tables, their question is converted into embeddings. Several methods are applied in this phase, including NLP table search and table re-selection. The table re-selection results are returned to the user for validation before the SQL is generated.

We usually use post-retrieval methods to filter out weakly related contexts based on additional information, or to apply different weights to the results.
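The general post-retrieval pattern can be sketched like this. It is our illustration of the idea, not any specific production code: drop low-similarity hits, then boost tables that also appear in matched historical queries. All function names and thresholds are assumptions.

```python
def post_retrieve(hits, min_score=0.3, query_table_boost=0.1, query_tables=()):
    """Filter and reweight vector-search hits.

    hits: list of (score, table_name) pairs from the similarity search.
    query_tables: tables referenced by matched historical-query summaries.
    """
    # 1. Filter: discard weakly related candidates below the threshold.
    filtered = [(s, t) for s, t in hits if s >= min_score]
    # 2. Reweight: boost tables corroborated by historical queries.
    boosted = [(s + (query_table_boost if t in query_tables else 0.0), t)
               for s, t in filtered]
    return sorted(boosted, key=lambda h: h[0], reverse=True)


hits = [(0.82, "orders"), (0.35, "customers"), (0.12, "logs")]
ranked = post_retrieve(hits, query_tables={"customers"})
# "logs" is filtered out; "customers" gets a boost from query history.
```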

Augmentation & Generation

In the post, they didn't share many details about the augmentation and generation phase.

Wren AI Architecture

Here’s what the Wren AI Text-to-SQL architecture looks like.

Wren AI Text-to-SQL Architecture

Wren AI consists of three core services:

  • Wren UI: An intuitive user interface for asking questions, defining data relationships, and integrating data sources within Wren AI’s framework.
  • Wren AI Service: Processes queries using a vector database for context retrieval, guiding LLMs to produce precise SQL outputs.
  • Wren Engine: Serves as the semantic engine, mapping business terms to data sources, defining relationships, and incorporating predefined calculations and aggregations.

The differences in design between Pinterest Text-to-SQL and Wren AI

Three RAG phases in Wren AI

Let’s see how Wren AI designs and implements those three phases in our text-to-SQL architecture so you can compare it with the Pinterest text-to-SQL architecture we just mentioned.

Indexing

In Wren AI, we have implemented a semantic engine architecture to provide the LLM with your business context; you can easily establish a logical presentation layer on top of your data schema that helps the LLM learn more about your business.

Indexing phase in Wren AI

What is included in our indexing:

  1. Table schema, name, description: Your table schema will automatically synchronize into the vector store once you connect to your data source. You can also add additional information, such as name, description, etc.
  2. Column, name, type, description: While synchronizing your table schema, we will store your column types and metadata in the vector store. You can also add additional information, such as name, description, etc.
  3. Relationships: Relationships are a very important feature in Wren AI semantic modeling; you can define many-to-one, one-to-one, and one-to-many relationships between models. This differs from setting constraints such as primary and foreign keys, since relationships carry more business-oriented context. Setting the right relationships greatly improves the LLM's ability to generate JOIN clauses with the right columns and tables.
  4. Calculation: Every business has its own definitions and terms. For example, when a company refers to revenue or profit, you want the LLM to understand how your company defines those terms. This is where calculations come in: you can easily define calculations in the semantic modeling, and the LLM can generate the right SQL based on your definitions without hallucination.
Wren AI semantic modeling UI

In contrast, Pinterest currently relies on only table metadata and query logs, but they have implemented table summarization and query summarization methods to improve context.

  1. Table summarization: The table summarization prompt will include a variety of sample queries and the table schema. It will then be sent to the LLM to generate the summary and save it to the vector store.
  2. Query summarization: Besides their function in table summarization, sample queries linked to each table are individually summarized, encompassing specifics like the query’s objective and the tables used.
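To make the summarization step concrete, here is a rough sketch of how a table-summarization prompt could be assembled from the schema and sample queries before being sent to an LLM. The function name and prompt wording are our illustration, not Pinterest's actual prompt.

```python
def build_table_summary_prompt(table_name, schema_ddl, sample_queries, max_queries=5):
    """Assemble an LLM prompt that asks for a table summary.

    Combines the table schema with a handful of historical queries,
    as described in Pinterest's table-summarization approach.
    """
    queries = "\n\n".join(q.strip() for q in sample_queries[:max_queries])
    return (
        f"Summarize the purpose and contents of table `{table_name}`.\n\n"
        f"Schema:\n{schema_ddl}\n\n"
        f"Sample queries against this table:\n{queries}\n\n"
        "Return a concise summary suitable for semantic search."
    )


prompt = build_table_summary_prompt(
    "customers",
    "CREATE TABLE customers (Id VARCHAR, City VARCHAR, State VARCHAR)",
    ["SELECT City, COUNT(*) FROM customers GROUP BY City"],
)
```

The LLM's reply would then be embedded and written to the vector store as that table's summary document.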

Lesson Learned: Our team plans to implement “Table summarization” and “Query summarization” in the Wren AI project.

Retrieval

In Wren AI, during the retrieval phase, the user question is transformed into embeddings, and a similarity search is conducted against the vector index to infer the top K suitable tables, as below.

Retrieval phase in Wren AI Architecture
from typing import Any

from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever


def init_retriever(document_store: Any, top_k: int = 10):
    """Build a Qdrant embedding retriever that returns the top-k matching documents."""
    return QdrantEmbeddingRetriever(document_store=document_store, top_k=top_k)

In Pinterest's case, their Text-to-SQL implementation applies a table search method like the one above, plus Table Re-selection.

Table Re-selection:

Upon retrieving the top N tables from the vector index, we engage an LLM to choose the most relevant K tables by evaluating the question alongside the table summaries.

Lesson Learned: Table Re-selection after the table selection from the vector store could help improve performance.
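A minimal sketch of the re-selection step might look like the following: after the vector search returns the top-N candidates, an LLM is prompted with the question and the table summaries to pick the K most relevant tables, and the reply is parsed defensively. The helper names and prompt wording are our own assumptions.

```python
import json


def build_reselection_prompt(question, candidates, k=3):
    """candidates: list of (table_name, summary) pairs from the vector search."""
    listing = "\n".join(f"- {name}: {summary}" for name, summary in candidates)
    return (
        f"Question: {question}\n\n"
        f"Candidate tables:\n{listing}\n\n"
        f"Return a JSON list of at most {k} table names most relevant "
        "to answering the question."
    )


def parse_reselection(llm_reply, candidates):
    """Parse the LLM's JSON reply, keeping only names from the candidate set."""
    allowed = {name for name, _ in candidates}
    try:
        picked = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []
    return [t for t in picked if t in allowed]


candidates = [("orders", "purchase records"), ("customers", "customer profiles"),
              ("logs", "web server logs")]
# Simulated LLM reply; note the hallucinated table is filtered out on parse.
tables = parse_reselection('["customers", "orders", "unknown_table"]', candidates)
```

Restricting the parsed output to the original candidate set guards against the LLM hallucinating table names that do not exist.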

Augmentation & Generation

In Wren AI, we have unique designs and techniques to help LLMs understand more of the business context to augment information in prompts. We also provide SQL validation and auto-correction while in the generation phase.

Augmentation & Generation phase in Wren AI

Augmentation

One of Wren AI's key designs is the semantic engine, the Wren Engine. This engine is responsible for the intricate processing that defines and maps metadata, schema, terminology, data relationships, and the logic behind calculations and aggregations through an analytics-as-code design approach.

With Wren Engine, you can define semantics through the “Modeling Definition Language” (MDL). This definition encapsulates calculation formulas, metrics definitions, and semantic relationships, reducing duplicate coding and handling of data joins.

Standard SQL to different SQL dialects

Here’s an example; below is a definition of a “customers” data model.

{
      "name": "customers",
      "refSql": "select * from main.customers",
      "columns": [
        {
          "name": "City",
          "type": "VARCHAR",
          "isCalculated": false,
          "notNull": false,
          "properties": {
            "description": "The Customer City, where the customer company is located. Also called 'customer segment'."
          }
        },
        {
          "name": "Id",
          "type": "VARCHAR",
          "isCalculated": false,
          "notNull": false,
          "properties": {
            "description": "A unique identifier for each customer in the data model."
          }
        },
        {
          "name": "State",
          "type": "VARCHAR",
          "isCalculated": false,
          "notNull": false,
          "properties": {
            "description": "A field indicating the state where the customer is located."
          }
        },
        {
          "name": "orders",
          "type": "orders",
          "relationship": "CustomersOrders",
          "isCalculated": false,
          "notNull": false,
          "properties": {}
        },
        {
          "name": "LatestRecord",
          "type": "DATE",
          "isCalculated": true,
          "expression": "max(orders.PurchaseTimestamp)",
          "notNull": false,
          "properties": {}
        },
        {
            "name": "FirstRecord",
            "type": "DATE",
            "isCalculated": true,
            "expression": "min(orders.PurchaseTimestamp)",
            "notNull": false,
            "properties": {}
        },
        {
            "name": "VIP",
            "type": "BOOLEAN",
            "isCalculated": true,
            "expression": "sum(orders.Size) > 2",
            "notNull": false,
            "properties": {}
        },
        {
            "name": "OrderCount",
            "type": "BIGINT",
            "isCalculated": true,
            "expression": "count(orders.OrderId)",
            "notNull": false,
            "properties": {}
        },
        {
          "name": "Debit",
          "type": "DOUBLE",
          "isCalculated": true,
          "expression": "sum(orders.OrderBalance)",
          "notNull": false,
          "properties": {}
        },
        {
            "name": "ReviewRate",
            "type": "DOUBLE",
            "isCalculated": true,
            "expression": "count(orders.IsReviewed = TRUE) / count(DISTINCT orders.OrderId)",
            "notNull": false,
            "properties": {}
        }
      ],
      "primaryKey": "Id",
      "cached": false,
      "refreshTime": "30.00m",
      "properties": {
        "schema": "main",
        "catalog": "memory",
        "description": "A table of customers who have made purchases, including their city"
      }
    }

While generating the prompt, we will augment the Data Definition Language (DDL) with the semantics defined in the file, as below.

/* {"schema": "main", "catalog": "memory", "description": "A table of customers who have made purchases, including their city"} */
CREATE TABLE customers (
    -- {"description": "The Customer City, where the customer company is located. Also called \'customer segment\'."}
    City VARCHAR,
    -- {"description": "A unique identifier for each customer in the data model."}
    Id VARCHAR PRIMARY KEY,
    -- {"description": "A field indicating the state where the customer is located."}
    State VARCHAR,
    -- This column is a Calculated Field
    -- column expression: max(orders.PurchaseTimestamp)
    LatestRecord DATE,
    -- This column is a Calculated Field
    -- column expression: min(orders.PurchaseTimestamp)
    FirstRecord DATE,
    -- This column is a Calculated Field
    -- column expression: sum(orders.Size) > 2
    VIP BOOLEAN,
    -- This column is a Calculated Field
    -- column expression: count(orders.OrderId)
    OrderCount BIGINT,
    -- This column is a Calculated Field
    -- column expression: sum(orders.OrderBalance)
    Debit DOUBLE,
    -- This column is a Calculated Field
    -- column expression: count(orders.IsReviewed = TRUE) / count(DISTINCT orders.OrderId)
    ReviewRate DOUBLE
)
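The transformation from the MDL definition to the annotated DDL above can be sketched as follows. This is our simplified reconstruction of the idea, not the actual Wren Engine code: descriptions become comment lines, calculated fields get their expressions inlined as comments, and relationship columns are skipped because they are not physical columns.

```python
import json


def mdl_to_ddl(model: dict) -> str:
    """Render a simplified MDL model dict into commented DDL for the LLM prompt."""
    lines = [f"/* {json.dumps(model.get('properties', {}))} */",
             f"CREATE TABLE {model['name']} ("]
    cols = []
    for col in model["columns"]:
        if "relationship" in col:
            continue  # relationship columns are virtual, not physical columns
        prefix = []
        if col.get("isCalculated"):
            prefix.append("    -- This column is a Calculated Field")
            prefix.append(f"    -- column expression: {col['expression']}")
        elif col.get("properties", {}).get("description"):
            desc = {"description": col["properties"]["description"]}
            prefix.append(f"    -- {json.dumps(desc)}")
        pk = " PRIMARY KEY" if col["name"] == model.get("primaryKey") else ""
        cols.append("\n".join(prefix + [f"    {col['name']} {col['type']}{pk}"]))
    lines.append(",\n".join(cols))
    lines.append(")")
    return "\n".join(lines)


# A trimmed-down version of the "customers" model shown above.
model = {
    "name": "customers",
    "primaryKey": "Id",
    "properties": {"schema": "main"},
    "columns": [
        {"name": "Id", "type": "VARCHAR",
         "properties": {"description": "A unique identifier for each customer."}},
        {"name": "OrderCount", "type": "BIGINT", "isCalculated": True,
         "expression": "count(orders.OrderId)"},
    ],
}
ddl = mdl_to_ddl(model)
```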

Generation

Here’s how Wren AI deals with the generation phase, which we separate into two distinct scenarios:

Initial business questions & follow-up questions
  1. Initial business question (the first question in a thread): When you start a new conversation in Wren AI, your first question is transformed into embeddings, and a similarity search retrieves the most relevant tables. With those tables and the original question, we ask the LLM to generate the three most relevant candidate questions and let the user choose one, which is then used to generate the result.
  2. Follow-up questions: When you ask a follow-up question, we add the previous context into the prompt so that SQL generation can build on the earlier conversation.
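The follow-up scenario can be sketched as a prompt builder that folds previous question/SQL turns into the new prompt. The structure and wording here are our assumption of the general pattern, not Wren AI's exact prompt.

```python
def build_generation_prompt(question, ddl, history=()):
    """Assemble an SQL-generation prompt.

    history: list of (previous_question, previous_sql) pairs from the thread,
    included so follow-up questions can build on earlier answers.
    """
    parts = [f"Schema:\n{ddl}"]
    if history:
        turns = "\n\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in history)
        parts.append(f"Previous conversation:\n{turns}")
    parts.append(f"Question: {question}\nReturn only the SQL.")
    return "\n\n".join(parts)


# A follow-up that only makes sense given the previous turn.
prompt = build_generation_prompt(
    "Only for customers in California",
    "CREATE TABLE customers (Id VARCHAR, State VARCHAR)",
    history=[("How many customers are there?", "SELECT COUNT(*) FROM customers")],
)
```

Without the history block, "Only for customers in California" would be unanswerable on its own; with it, the LLM can refine the previous COUNT query.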

In Pinterest's case, the final SQL generation stage is relatively simple; their blog post didn't share many details about it.

The future roadmap in Pinterest's Text-to-SQL post: where does Wren AI stand?

In the post, they mentioned a few areas they are researching and working on. Some of these are already implemented in Wren AI, and some are in progress or planned. Let's go through them:

Metadata Management (Implemented in Wren AI)

For metadata management, we've implemented the Modeling Definition Language (MDL) in Wren AI, which lets LLMs consume semantic relationships, domain context, tags, descriptions, etc.; this significantly improves filtering accuracy during the retrieval phase.

Automatic Index Updates (Implemented in Wren AI)

In Wren AI, the index is updated automatically while you are modeling through our user interface; click on the “Deploy” button, and the system will automatically synchronize between the vector store and the semantic engine.

Similarity Search and Scoring Strategy Revision (Planning)

The current Wren AI scoring strategy is also pretty basic; in the future, we plan to fine-tune it to improve the relevance of retrieved results.

Query validation (Implemented in Wren AI)

Query validation process in Wren AI

The query validation process is implemented in Wren AI: we dry-run the query in the Wren Engine to ensure it is executable, validating the SQL and asking the LLM to correct it if it is invalid.
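The validate-and-correct loop can be sketched like this. Here SQLite's EXPLAIN stands in for the Wren Engine dry run, and `fix_sql` stands in for the LLM correction call; both substitutions are our illustration of the pattern, not the actual implementation.

```python
import sqlite3


def dry_run(conn, sql):
    """Return None if the SQL compiles against the schema, else the error message."""
    try:
        conn.execute(f"EXPLAIN {sql}")  # compiles the statement without running it
        return None
    except sqlite3.Error as e:
        return str(e)


def validate_with_correction(conn, sql, fix_sql, max_attempts=3):
    """Dry-run the SQL; on failure, feed the error back for correction and retry."""
    for _ in range(max_attempts):
        error = dry_run(conn, sql)
        if error is None:
            return sql
        sql = fix_sql(sql, error)  # in Wren AI, this is an LLM call with the error
    raise ValueError(f"could not produce valid SQL: {error}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Id TEXT, State TEXT)")


# Toy stand-in for the LLM: repairs a misspelled table name when told it's missing.
def fix_sql(sql, error):
    return sql.replace("customer", "customers") if "no such table" in error else sql


valid = validate_with_correction(conn, "SELECT Id FROM customer", fix_sql)
```

Feeding the engine's error message back into the correction step gives the LLM concrete evidence of what is wrong, rather than asking it to guess.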

User Feedback (Work in progress)

This is one of the most important development areas we are designing in Wren AI; we will soon release a new version with user feedback! We’ll share more details once we release it, so make sure you subscribe to our Medium to stay posted!

Evaluation (Work in progress)

This is also one of the most important development areas in Wren AI. We will soon release an end-to-end evaluation process in Wren AI! We'll share more details once it's released; make sure you subscribe to our Medium to stay posted!

This is pretty much it! Finally, we thank the Pinterest Engineering team for sharing such an amazing and informative post. We hope more engineering teams can share the knowledge they have implemented in their company! Onwards!

If you haven't checked out Wren AI yet, check us out!

👉 GitHub: https://github.com/Canner/WrenAI

👉 X: https://twitter.com/getwrenai
