Aug 21, 2025
Posted by Matvey Arye
TL;DR: We’re experimenting with turning Postgres into a self-describing database, embedding meaning as part of the schema. By providing natural language explanations of PostgreSQL structures and logic, agents can more accurately query data and answer a broader set of questions. In our early tests, using an LLM-generated semantic catalog improved SQL generation accuracy by up to 27%.
As any developer who has ever had the (mis)fortune of working on a legacy database knows all too well, a database is not self-describing. You can’t look at a database schema and tell what’s going on. That orders table could hold purchase orders, or customer orders, or be an experimental table someone created and forgot to drop five years ago. This is a long-standing problem with databases that people solve by talking to each other, looking at code that interacts with the database, examining git history, and screaming at the wall.
But LLMs must answer questions about the data using only the context the database itself provides. No wonder they get confused.
As our fearless leader Ajay Kulkarni once said: “LLMs crave (accurate) context the same way GPUs crave power.”
For an agent to extract insights from the data in a database, it needs to understand which columns and tables to query. For example, if it doesn’t know which table holds customer orders and which holds purchase orders, it can’t tell when a customer’s order has shipped.
In fact, in internal experiments we conducted at TigerData, we found that 42% of context-less LLM-generated SQL queries missed critical filters or misunderstood relationships, silently returning misleading data.
The larger (and thus probably more important) the database, the worse things get.
It doesn’t take a senior engineer to know that this is unacceptable.
The solution is incredibly simple: allow developers to add context about the database in the form of natural language descriptions of the schemas and business logic within Postgres, injecting much-needed factual meaning into schemas.
We arrived at this by putting ourselves in the place of the LLM — asking, how would we generate the query if we were in its shoes? The answer was clear: without additional context, we wouldn’t know enough to make the right call. Along the way, we tried different prompts, techniques, and even alternative LLMs, but the core issue was always the same: the model simply lacked the context necessary to generate the query. That realization led us to build a way for developers to provide that missing context directly.
That's what we’ve been experimenting with and are excited to share today.
We set out to test what happens when SQL generation is powered by our LLM-generated semantic catalog. The outcome was a 27% boost in accuracy compared to the control.
Let's show you how we created this self-describing Postgres database.
Our thinking for this self-describing Postgres is built around four core ideas:

1. Describe schema elements and business logic in natural language.
2. Have humans review and version those descriptions like application code.
3. Retrieve only the context relevant to each question via semantic search.
4. Validate generated queries deterministically (using EXPLAIN) to catch errors before queries run.

We’re exploring this approach through two building blocks. This post primarily covers the Semantic Catalog. Next week, we’ll publish a post about the evaluation harness.
| Component | What It Does | Repo |
|---|---|---|
| Semantic Catalog | Stores natural-language descriptions for schema elements. Supports vector search to retrieve relevant context dynamically. | Link |
| Evaluation Harness | Measures query accuracy in any agentic text-to-SQL system and transparently reveals errors in retrieval (bad metadata/context) versus reasoning (bad SQL generation). Helps track accuracy over time. | TBD |
Retrieval using schema element names alone does not cut it, simply because the names lack enough meaning. It doesn’t matter whether you use keyword search, semantic search, or this week’s newest shiny retrieval algorithm: the semantic meaning of payment_aborts versus refunds is too close for any retrieval mechanism to make heads or tails of it. Even if you injected the entire schema into the context of your prompt (burning through cash and increasing response latency in the process), it’s doubtful the model could reason about payment_aborts versus refunds either.
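To see why, here is a small illustrative sketch (not part of pgai) that compares embedding similarity for the bare table names against similarity for short natural-language descriptions. The descriptions and the use of OpenAI's embeddings API are our own assumptions for the example:

```python
# Illustrative only: bare names tend to embed close together, while
# descriptions pull them apart. Assumes the OpenAI Python SDK and an API key.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

names = embed(["payment_aborts", "refunds"])
descs = embed([
    "Payments that failed before capture; no money ever moved.",      # hypothetical
    "Money returned to the customer after a completed transaction.",  # hypothetical
])

print("name similarity:", cosine(names[0], names[1]))       # typically high
print("description similarity:", cosine(descs[0], descs[1]))  # typically lower
```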
The semantic catalog solves this by offering a structured, natural language representation of your database's metadata and business logic. In this experimental version, it supports documentation and descriptions for schema objects (such as tables and their columns), example SQL queries, and free-form facts about business rules.
Initial descriptions can be LLM-generated and stored in human-readable YAML files designed for version control, peer review, and governance. Once reviewed, this metadata can be imported into the semantic catalog and indexed for semantic search, simplifying the process of retrieving and reasoning about data structures.
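To give a feel for what “indexed for semantic search” can mean in practice, here is a rough sketch of embedding a question and ranking catalog entries by vector distance with pgvector. The semantic_catalog_items table and its columns are hypothetical stand-ins, not pgai's actual storage layout:

```python
# Hypothetical retrieval over description embeddings using pgvector.
import numpy as np
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(resp.data[0].embedding)

with psycopg.connect("postgresql://.../catalog_db") as con:
    register_vector(con)  # adapt numpy arrays to the pgvector type
    rows = con.execute(
        """
        SELECT schema, name, description
        FROM semantic_catalog_items   -- hypothetical table
        ORDER BY embedding <=> %s     -- pgvector cosine distance
        LIMIT 5
        """,
        (embed("Which customer orders have shipped?"),),
    ).fetchall()
```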
In our early tests, using an LLM-generated semantic catalog improved SQL generation accuracy by up to 27%. Effective SQL generation involves two key steps: first retrieving the right context (the schema elements, examples, and facts relevant to the question), then reasoning over that context to generate the SQL itself.
Our evaluation showed substantial improvements in SQL generation accuracy from using semantic descriptions (from 58% to 86% on certain schemas), and a smaller improvement in retrieval recall (from 52% to 58% on one of our tests).
Results varied significantly across datasets, schemas, and models. Our current evaluations were conducted using a more cost-effective model (OpenAI's gpt-4.1-nano). We're particularly interested in insights from users experimenting with diverse datasets and more advanced models.
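To make those two measurements concrete, here is one way they could be computed; the data shapes below are illustrative, not the evaluation harness's actual format:

```python
# Sketch of the two metrics: retrieval recall (did we fetch the right context?)
# and generation accuracy (did the generated SQL return the right answer?).
from dataclasses import dataclass

@dataclass
class EvalCase:
    retrieved: set[str]   # schema elements surfaced for the prompt
    relevant: set[str]    # schema elements a correct query needs (non-empty)
    correct: bool         # generated SQL produced the expected result

def retrieval_recall(cases: list[EvalCase]) -> float:
    return sum(len(c.retrieved & c.relevant) / len(c.relevant) for c in cases) / len(cases)

def generation_accuracy(cases: list[EvalCase]) -> float:
    return sum(c.correct for c in cases) / len(cases)
```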
With these results, let’s walk through how the agent actually interacts with the database.
Every semantic query follows this four-step flow:

1. Describe your database using an LLM.
2. Have a human review the descriptions and add context.
3. Import the descriptions into the catalog.
4. Generate SQL from natural language.

Walking through it step-by-step:
Step 1: Describe your database using an LLM
Generate initial YAML descriptions with an LLM:
```bash
pgai semantic-catalog describe -f descriptions.yaml
```
Review and refine these descriptions, storing them with the same rigor as your application code.
The resulting structure looks like this:
```yaml
schema: public
name: restaurant
type: table
description: Stores core information about each restaurant, including its cuisine,
  city, and rating.
columns:
- name: id
  description: Primary key identifier for the restaurant.
- name: name
  description: Lowercased name of the restaurant.
- name: food_type
  description: Lowercased name of the type of cuisine or food style offered by the restaurant.
- name: city_name
  description: Lowercased name of the city where the restaurant is located.
- name: rating
  description: Numeric rating score assigned to the restaurant, where higher than 2.5 is considered good.
...
---
type: sql_example
sql: SELECT t2.house_number, t1.name FROM LOCATION AS t2 JOIN restaurant AS t1 ON t1.id = t2.restaurant_id WHERE t1.city_name IN (SELECT city_name FROM geographic WHERE region = 'bay area') AND t1.rating > 2.5
description: give me some good restaurants in the bay area ?
...
---
type: fact
description: When asking for a restaurant, provide its name (restaurant.name) and its house number (location.house_number).
...
```
Step 2: Have a human review the descriptions and add context
A developer would then review the descriptions and add additional context. Crucially, business logic that is central to database operations is often not encoded into the schema, and thus cannot be derived by the LLM. A developer needs to provide that information.
After the descriptions are reviewed, they should be treated as a core part of the application code: stored in version control, reviewed in pull requests, and so on.
Step 3: Import into the catalog
Make descriptions available:
```bash
pgai semantic-catalog import -f descriptions.yaml
```
This supports declarative configuration and continuous deployment practices.
Step 4: Generate SQL
Generate SQL from natural language queries:
CLI:
```bash
pgai semantic-catalog generate-sql -p "Which passengers have experienced the most flight delays in 2024?"
```
Python:
```python
response = await catalog.generate_sql(
    con,  # connection to the database holding the semantic catalog
    con,  # connection to the target database (here, the same one)
    "openai:gpt-4.1",
    "Which passengers have experienced the most flight delays in 2024?",
)
```
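For context, here is a minimal end-to-end sketch of how that call might be wired up. The psycopg connection handling is standard; the from_name constructor and the sql_statement attribute are assumptions based on our reading of the pgai README, so check the repo for the exact API:

```python
import asyncio
import psycopg
from pgai import semantic_catalog  # import path assumed; see the pgai README

async def main():
    # One connection for the semantic catalog and one for the target database;
    # per the deployment notes below, these may be one database or two.
    catalog_con = await psycopg.AsyncConnection.connect("postgresql://.../catalog_db")
    target_con = await psycopg.AsyncConnection.connect("postgresql://.../app_db")

    catalog = await semantic_catalog.from_name(catalog_con, "default")  # assumed helper

    response = await catalog.generate_sql(
        catalog_con,
        target_con,
        "openai:gpt-4.1",
        "Which passengers have experienced the most flight delays in 2024?",
    )
    print(response.sql_statement)  # attribute name assumed

asyncio.run(main())
```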
Models are far better at generating correct SQL when they have access to rich semantic information: not just schema names, but natural-language descriptions. Even a small amount of context (e.g., "the order table tracks transactions for customers buying subscription plans") can dramatically improve reliability.
What we’ve noticed with agentic text-to-SQL systems is that to get good accuracy, you need to tighten the scope of the information the system has access to: the tighter the scope, the higher the accuracy.
Successful agentic systems start with tight scopes. We found it valuable to restrict agents to function-level or view-level access at first, and only expand access once correctness is proven.
Postgres allows you to balance control and flexibility through three main interfaces: tightly scoped functions, views that expose curated slices of data, and direct SQL access to tables.
We suggest beginning with tightly scoped functions, then expanding access as confidence grows.
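As an illustration of a tightly scoped starting point (the role, function, and table names here are hypothetical, not part of pgai), the agent's role gets a single reviewed function instead of direct table access:

```python
# Hypothetical function-level scoping: the agent can call shipped_orders(),
# but cannot read the underlying tables directly.
import psycopg

statements = [
    "CREATE ROLE agent_ro LOGIN PASSWORD 'change-me'",  # role the agent connects as
    """
    CREATE FUNCTION shipped_orders(customer bigint)
    RETURNS TABLE (order_id bigint, shipped_at timestamptz)
    LANGUAGE sql STABLE SECURITY DEFINER AS $$
        SELECT id, shipped_at
        FROM orders
        WHERE customer_id = customer AND shipped_at IS NOT NULL
    $$
    """,
    # Ensure the role can only call the function, not read tables directly.
    "REVOKE ALL ON ALL TABLES IN SCHEMA public FROM agent_ro",
    "GRANT EXECUTE ON FUNCTION shipped_orders(bigint) TO agent_ro",
]

with psycopg.connect("postgresql://.../app_db") as con:
    for stmt in statements:
        con.execute(stmt)
```

Views play the same role one notch looser: they expose a curated projection of the data while still hiding the raw tables.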
To further improve reliability, we employ Postgres's deterministic EXPLAIN command. This preemptively catches query errors such as incorrect column or table names, allowing agents to self-correct and substantially increasing accuracy.
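A minimal sketch of that validation step, assuming psycopg (the retry loop that feeds errors back to the model is elided):

```python
# Ask Postgres to plan, but not execute, a generated query. Planning fails
# deterministically on bad table or column names, so the agent can self-correct.
import psycopg

def explain_check(con: psycopg.Connection, sql: str) -> str | None:
    """Return None if the query plans cleanly, else the planner's error text."""
    try:
        with con.transaction(force_rollback=True):  # leave no trace either way
            con.execute("EXPLAIN " + sql)
        return None
    except psycopg.Error as exc:
        return str(exc)  # hand this back to the model for another attempt
```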
For many deployments, integrating the semantic catalog directly into your existing database provides the simplest path. Recognizing, however, the risks and complexities of altering production environments, we've also enabled the semantic catalog to be hosted independently in a separate database. This approach offers greater deployment flexibility and ensures accessibility even when you only have read-only access to your primary database.
While this semantic foundation is already improving agentic querying, we’re just beginning, and there’s plenty more on our roadmap.
We’ve open-sourced everything we’ve built so far. Check out the README with a Quickstart to dive in, and we warmly invite your contributions, whether it's running evaluations on your schemas, proposing datasets, or challenging our assumptions. The best databases aren't just designed; they evolve through community effort.
About the author
Matvey Arye is a founding engineering leader at TigerData (creators of TimescaleDB), the premier provider of relational database technology for time-series data and AI. Currently, he manages the team at TigerData responsible for building the go-to developer platform for AI applications. Under his leadership, the TigerData engineering team has introduced partitioning, compression, and incremental materialized views for time-series data, plus cutting-edge indexing and performance innovations for AI.
Matvey earned a Bachelor’s degree in Engineering at The Cooper Union and a Doctorate in Computer Science at Princeton University, where his research focused on cross-continental data analysis covering issues such as networking, approximate algorithms, and performant data processing.