Aug 21, 2025
Posted by Matvey Arye
TL;DR: We’re experimenting with turning Postgres into a self-describing database, embedding meaning as part of the schema. By providing natural language explanations of PostgreSQL structures and logic, agents can more accurately query data and answer a broader set of questions. In our early tests, using an LLM-generated semantic catalog improved SQL generation accuracy by up to 27%.
As any developer who has ever had the (mis)fortune of working on a legacy database knows all too well, a database is not self-describing. You can’t look at a database schema and tell what’s going on. That orders table could hold purchase orders, or customer orders, or be an experimental table someone created and forgot to drop five years ago. This is a long-standing problem with databases that people solve by talking to each other, looking at code that interacts with the database, examining git history, and screaming at the wall.
But LLMs must answer questions about the data using only the context the database itself provides. No wonder they get confused.
As our fearless leader Ajay Kulkarni once said: “LLMs crave (accurate) context the same way GPUs crave power.”
For an agent to extract insights from the data in a database, it needs to understand which columns and tables to query. For example, if it doesn’t know which table holds customer orders and which holds purchase orders, it can’t tell when a customer’s order has shipped.
In fact, in internal experiments we conducted at TigerData, we found that 42% of context-less LLM-generated SQL queries missed critical filters or misunderstood relationships, silently returning misleading data.
The larger (and thus probably more important) the database, the worse things get.
It doesn’t take a senior engineer to know that this is unacceptable.
The solution is incredibly simple: allow developers to add context about the database in the form of natural language descriptions of the schemas and business logic within Postgres, injecting much-needed factual meaning into schemas.
We arrived at this by putting ourselves in the place of the LLM — asking, how would we generate the query if we were in its shoes? The answer was clear: without additional context, we wouldn’t know enough to make the right call. Along the way, we tried different prompts, techniques, and even alternative LLMs, but the core issue was always the same: the model simply lacked the context necessary to generate the query. That realization led us to build a way for developers to provide that missing context directly.
That's what we’ve been experimenting with and are excited to share today.
We set out to test what happens when SQL generation is powered by our LLM-generated semantic catalog. The outcome was a 27% boost in accuracy compared to the control.
Let's show you how we created this self-describing Postgres database.
Our thinking for this self-describing Postgres is built around four core ideas:

1. Describe schema elements and business logic in natural language.
2. Have humans review and version those descriptions like application code.
3. Retrieve only the context relevant to each question via semantic search.
4. Validate generated queries deterministically (using EXPLAIN) to catch errors before queries run.

We’re exploring this approach through two building blocks. This post primarily covers the Semantic Catalog. Next week, we’ll publish a post about the evaluation harness.
| Component | What It Does | Repo |
|---|---|---|
| Semantic Catalog | Stores natural-language descriptions for schema elements. Supports vector search to retrieve relevant context dynamically. | Link |
| Evaluation Harness | Measures query accuracy in any agentic text-to-SQL system and transparently reveals errors in retrieval (bad metadata/context) versus reasoning (bad SQL generation). Helps track accuracy over time. | TBD |
Retrieval using schema element names alone does not cut it, simply because the names lack enough meaning. It doesn’t matter whether you use keyword search, semantic search, or this week’s newest shiny retrieval algorithm: the semantic meaning of payment_aborts versus refunds is too close for any retrieval mechanism to make heads or tails of it. Even if you injected the entire schema into the context of your prompt (burning through cash and increasing response latency in the process), it’s doubtful the model could reason about payment_aborts versus refunds either.
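To see why, here is a small illustrative sketch (not part of pgai) that compares embedding similarity for the bare table names against similarity for short natural-language descriptions. The descriptions and the use of OpenAI's embeddings API are our own assumptions for the example:

```python
# Illustrative only: bare names tend to embed close together, while
# descriptions pull them apart. Assumes the OpenAI Python SDK and an API key.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

names = embed(["payment_aborts", "refunds"])
descs = embed([
    "Payments that failed before capture; no money ever moved.",      # hypothetical
    "Money returned to the customer after a completed transaction.",  # hypothetical
])

print("name similarity:", cosine(names[0], names[1]))       # typically high
print("description similarity:", cosine(descs[0], descs[1]))  # typically lower
```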
The semantic catalog solves this by offering a structured, natural language representation of your database's metadata and business logic. In this experimental version, it supports documentation and descriptions for schema objects (such as tables and their columns), example SQL queries, and free-form facts about business rules.
Initial descriptions can be LLM-generated and stored in human-readable YAML files designed for version control, peer review, and governance. Once reviewed, this metadata can be imported into the semantic catalog and indexed for semantic search, simplifying the process of retrieving and reasoning about data structures.
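To give a feel for what “indexed for semantic search” can mean in practice, here is a rough sketch of embedding a question and ranking catalog entries by vector distance with pgvector. The semantic_catalog_items table and its columns are hypothetical stand-ins, not pgai's actual storage layout:

```python
# Hypothetical retrieval over description embeddings using pgvector.
import numpy as np
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return np.array(resp.data[0].embedding)

with psycopg.connect("postgresql://.../catalog_db") as con:
    register_vector(con)  # adapt numpy arrays to the pgvector type
    rows = con.execute(
        """
        SELECT schema, name, description
        FROM semantic_catalog_items   -- hypothetical table
        ORDER BY embedding <=> %s     -- pgvector cosine distance
        LIMIT 5
        """,
        (embed("Which customer orders have shipped?"),),
    ).fetchall()
```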
In our early tests, using an LLM-generated semantic catalog improved SQL generation accuracy by up to 27%. Effective SQL generation involves two key steps: first retrieving the right context (the schema elements, examples, and facts relevant to the question), then reasoning over that context to generate the SQL itself.
Our evaluation showed substantial improvements in SQL generation accuracy from using semantic descriptions (from 58% to 86% on certain schemas), and a smaller improvement in retrieval recall (from 52% to 58% on one of our tests).
Results varied significantly across datasets, schemas, and models. Our current evaluations were conducted using a more cost-effective model (OpenAI's gpt-4.1-nano). We're particularly interested in insights from users experimenting with diverse datasets and more advanced models.
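To make those two measurements concrete, here is one way they could be computed; the data shapes below are illustrative, not the evaluation harness's actual format:

```python
# Sketch of the two metrics: retrieval recall (did we fetch the right context?)
# and generation accuracy (did the generated SQL return the right answer?).
from dataclasses import dataclass

@dataclass
class EvalCase:
    retrieved: set[str]   # schema elements surfaced for the prompt
    relevant: set[str]    # schema elements a correct query needs (non-empty)
    correct: bool         # generated SQL produced the expected result

def retrieval_recall(cases: list[EvalCase]) -> float:
    return sum(len(c.retrieved & c.relevant) / len(c.relevant) for c in cases) / len(cases)

def generation_accuracy(cases: list[EvalCase]) -> float:
    return sum(c.correct for c in cases) / len(cases)
```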
With these results, let’s walk through how the agent actually interacts with the database.
Every semantic query follows this four-step flow:

1. Describe your database using an LLM.
2. Have a human review the descriptions and add context.
3. Import the descriptions into the catalog.
4. Generate SQL from natural language.

Walking through it step-by-step:
Step 1: Describe your database using an LLM
Generate initial YAML descriptions with an LLM:
```bash
pgai semantic-catalog describe -f descriptions.yaml
```
Review and refine these descriptions, storing them with the same rigor as your application code.
The resulting structure looks like this:
```yaml
schema: public
name: restaurant
type: table
description: Stores core information about each restaurant, including its cuisine,
  city, and rating.
columns:
- name: id
  description: Primary key identifier for the restaurant.
- name: name
  description: Lowercased name of the restaurant.
- name: food_type
  description: Lowercased name of the type of cuisine or food style offered by the restaurant.
- name: city_name
  description: Lowercased name of the city where the restaurant is located.
- name: rating
  description: Numeric rating score assigned to the restaurant, where higher than 2.5 is considered good.
...
---
type: sql_example
sql: SELECT t2.house_number, t1.name FROM LOCATION AS t2 JOIN restaurant AS t1 ON t1.id = t2.restaurant_id WHERE t1.city_name IN (SELECT city_name FROM geographic WHERE region = 'bay area') AND t1.rating > 2.5
description: give me some good restaurants in the bay area ?
...
---
type: fact
description: When asking for a restaurant, provide its name (restaurant.name) and its house number (location.house_number).
...
```
Step 2: Have a human review the descriptions and add context
A developer would then review the descriptions and add additional context. Crucially, business logic that is central to database operations is often not encoded into the schema, and thus cannot be derived by the LLM. A developer needs to provide that information.
After the descriptions are reviewed, they should be treated as a core part of the application code: stored in version control, reviewed in pull requests, and so on.
Step 3: Import into the catalog
Make descriptions available:
```bash
pgai semantic-catalog import -f descriptions.yaml
```
This supports declarative configuration and continuous deployment practices.
Step 4: Generate SQL
Generate SQL from natural language queries:
CLI:
```bash
pgai semantic-catalog generate-sql -p "Which passengers have experienced the most flight delays in 2024?"
```
Python:
```python
response = await catalog.generate_sql(
    con,  # connection to the database holding the semantic catalog
    con,  # connection to the target database (here, the same one)
    "openai:gpt-4.1",
    "Which passengers have experienced the most flight delays in 2024?",
)
```
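For context, here is a minimal end-to-end sketch of how that call might be wired up. The psycopg connection handling is standard; the from_name constructor and the sql_statement attribute are assumptions based on our reading of the pgai README, so check the repo for the exact API:

```python
import asyncio
import psycopg
from pgai import semantic_catalog  # import path assumed; see the pgai README

async def main():
    # One connection for the semantic catalog and one for the target database;
    # per the deployment notes below, these may be one database or two.
    catalog_con = await psycopg.AsyncConnection.connect("postgresql://.../catalog_db")
    target_con = await psycopg.AsyncConnection.connect("postgresql://.../app_db")

    catalog = await semantic_catalog.from_name(catalog_con, "default")  # assumed helper

    response = await catalog.generate_sql(
        catalog_con,
        target_con,
        "openai:gpt-4.1",
        "Which passengers have experienced the most flight delays in 2024?",
    )
    print(response.sql_statement)  # attribute name assumed

asyncio.run(main())
```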
Models are far better at generating correct SQL when they have access to rich semantic information: not just schema names, but natural-language descriptions. Even a small amount of context (e.g., "the order table tracks transactions for customers buying subscription plans") can dramatically improve reliability.
What we’ve noticed with agentic text-to-SQL systems is that to get good accuracy, you need to tighten the scope of the information the system has access to: the tighter the scope, the higher the accuracy.
Successful agentic systems start with tight scopes. We found it valuable to restrict agents to function-level or view-level access at first, and only expand access once correctness is proven.
Postgres allows you to balance control and flexibility through three main interfaces: tightly scoped functions, views that expose curated slices of data, and direct SQL access to tables.
We suggest beginning with tightly scoped functions, then expanding access as confidence grows.
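As an illustration of a tightly scoped starting point (the role, function, and table names here are hypothetical, not part of pgai), the agent's role gets a single reviewed function instead of direct table access:

```python
# Hypothetical function-level scoping: the agent can call shipped_orders(),
# but cannot read the underlying tables directly.
import psycopg

statements = [
    "CREATE ROLE agent_ro LOGIN PASSWORD 'change-me'",  # role the agent connects as
    """
    CREATE FUNCTION shipped_orders(customer bigint)
    RETURNS TABLE (order_id bigint, shipped_at timestamptz)
    LANGUAGE sql STABLE SECURITY DEFINER AS $$
        SELECT id, shipped_at
        FROM orders
        WHERE customer_id = customer AND shipped_at IS NOT NULL
    $$
    """,
    # Ensure the role can only call the function, not read tables directly.
    "REVOKE ALL ON ALL TABLES IN SCHEMA public FROM agent_ro",
    "GRANT EXECUTE ON FUNCTION shipped_orders(bigint) TO agent_ro",
]

with psycopg.connect("postgresql://.../app_db") as con:
    for stmt in statements:
        con.execute(stmt)
```

Views play the same role one notch looser: they expose a curated projection of the data while still hiding the raw tables.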
To further improve reliability, we employ Postgres's deterministic EXPLAIN command. This preemptively catches query errors such as incorrect column or table names, allowing agents to self-correct and substantially increasing accuracy.
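A minimal sketch of that validation step, assuming psycopg (the retry loop that feeds errors back to the model is elided):

```python
# Ask Postgres to plan, but not execute, a generated query. Planning fails
# deterministically on bad table or column names, so the agent can self-correct.
import psycopg

def explain_check(con: psycopg.Connection, sql: str) -> str | None:
    """Return None if the query plans cleanly, else the planner's error text."""
    try:
        with con.transaction(force_rollback=True):  # leave no trace either way
            con.execute("EXPLAIN " + sql)
        return None
    except psycopg.Error as exc:
        return str(exc)  # hand this back to the model for another attempt
```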
For many deployments, integrating the semantic catalog directly into your existing database provides the simplest path. Recognizing, however, the risks and complexities of altering production environments, we've also enabled the semantic catalog to be hosted independently in a separate database. This approach offers greater deployment flexibility and ensures accessibility even when you only have read-only access to your primary database.
While this semantic foundation is already improving agentic querying, we’re just beginning, and there’s plenty more on our roadmap.
We’ve open-sourced everything we’ve built so far. Check out the README with a Quickstart to dive in, and we warmly invite your contributions, whether it's running evaluations on your schemas, proposing datasets, or challenging our assumptions. The best databases aren't just designed; they evolve through community effort.
About the author
Matvey Arye is a founding engineering leader at TigerData (creators of TimescaleDB), the premier provider of relational database technology for time-series data and AI. Currently, he manages the team at TigerData responsible for building the go-to developer platform for AI applications. Under his leadership, the TigerData engineering team has introduced partitioning, compression, and incremental materialized views for time-series data, plus cutting-edge indexing and performance innovations for AI.
Matvey earned a Bachelor’s degree in Engineering at The Cooper Union and a Doctorate in Computer Science at Princeton University, where his research focused on cross-continental data analysis covering issues such as networking, approximate algorithms, and performant data processing.