Skip to content

Semantic catalog CLI reference

The pgai semantic catalog feature provides a comprehensive command-line interface for managing semantic catalogs that enable natural language to SQL functionality. This document provides detailed i...

The pgai semantic catalog feature provides a comprehensive command-line interface for managing semantic catalogs that enable natural language to SQL functionality. This document provides detailed information about each CLI command, their usage, and their purpose in the text-to-SQL workflow.

The semantic catalog CLI commands are accessed through the pgai semantic-catalog command group. Each command serves a specific purpose in the workflow of creating, managing, and using semantic catalogs for natural language database interactions.

  1. Describe - Generate natural language descriptions of database objects
  2. Create - Create a new semantic catalog with embedding configuration
  3. Import - Import descriptions into the semantic catalog
  4. Vectorize - Generate embeddings for semantic search capabilities
  5. Search - Perform semantic searches to find relevant database objects
  6. Generate SQL - Generate SQL statements from natural language prompts

A semantic catalog is used to describe and generate SQL for a postgres database. The semantic catalog itself is also stored in a postgres database. It can be colocated with the database being described, or it can be stored in a separate database.

We refer to the database being described as the target database. We use the -d or --db-url argument or the TARGET_DB environment variable to specify this postgres connection string.

The database containing the semantic catalog is the catalog database. We use the -c or --catalog-db-url argument or the CATALOG_DB environment variable to specify this postgres connection string.

Some commands only need a connection to one database or the other. Some commands need a connection to both databases.

If you store the semantic catalog in the target database, you can specify just the target database connection string (-d, --db-url or TARGET_DB) and omit the catalog database connection string.

Purpose: Automatically generate natural language descriptions for database objects using AI.

This command connects to a database, analyzes its schema, and uses a large language model to create human-readable descriptions of tables, views, procedures, and other database objects. These descriptions form the foundation of your semantic catalog.

The command creates or appends to a YAML file. You may review and edit the descriptions. The file can be imported into a semantic catalog with the import command.

Terminal window
pgai semantic-catalog describe [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-d, --db-urlDatabase connection URLRequiredTARGET_DB
-m, --modelLLM model for generating descriptionsopenai:gpt-4.1
-f, --yaml-fileOutput YAML file pathstdout
--include-schemaRegex pattern to include schemas
--exclude-schemaRegex pattern to exclude schemas
--include-tableRegex pattern to include tables
--exclude-tableRegex pattern to exclude tables
--include-viewRegex pattern to include views
--exclude-viewRegex pattern to exclude views
--include-procRegex pattern to include procedures
--exclude-procRegex pattern to exclude procedures
--include-extensionExtension names to include objects from
-a, --appendAppend to output file instead of overwritingfalse
--sample-sizeNumber of sample rows per table/view3
--batch-sizeObjects to process per LLM request5
--request-limitMaximum LLM requests (cost control)
--total-tokens-limitMaximum LLM tokens (cost control)
--dry-runList objects without describing themfalse
-q, --quietSuppress progress messagesfalse
Terminal window
# Generate descriptions for all objects in a database
pgai semantic-catalog describe -f descriptions.yaml
# Only include specific schemas and exclude system objects
pgai semantic-catalog describe \
--include-schema "public|app_.*" \
--exclude-schema "pg_.*|information_schema" \
-f descriptions.yaml
# Use a different model with cost controls
pgai semantic-catalog describe \
--model anthropic:claude-3-sonnet-20240229 \
--request-limit 100 \
--total-tokens-limit 50000 \
-f descriptions.yaml
# Dry run to see what would be processed
pgai semantic-catalog describe --dry-run
  • Starting a new text-to-SQL project with an existing database
  • Adding new database objects that need descriptions
  • Refreshing descriptions after significant schema changes
  • Creating documentation for database objects

Purpose: Create a new semantic catalog with an embedding configuration.

This command initializes a semantic catalog in your database and sets up the necessary embedding configuration for generating vector embeddings. The catalog requires at least one embedding configuration to enable semantic search.

Terminal window
pgai semantic-catalog create [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-c, --catalog-db-urlCatalog database connection URLCATALOG_DB or TARGET_DB
-n, --catalog-nameName for the semantic catalogdefault
-e, --embed-configName for the embedding configuration
-p, --providerEmbedding provideropenai
-m, --modelEmbedding modeltext-embedding-3-small
-v, --vector-dimensionsVector dimensions1536
--base-urlCustom base URL for embedding provider
--api-key-nameEnvironment variable containing API key

OpenAI (requires API key):

  • text-embedding-3-small (1536 dimensions)
  • text-embedding-3-large (3072 dimensions)
  • text-embedding-ada-002 (1536 dimensions)

Ollama (local inference):

  • nomic-embed-text
  • mxbai-embed-large
  • Any embedding model available in your Ollama instance

Sentence Transformers (local inference):

  • all-MiniLM-L6-v2
  • all-mpnet-base-v2
  • Any model from Hugging Face sentence-transformers
Terminal window
# Create a catalog with OpenAI embeddings (default)
pgai semantic-catalog create
# Create a catalog with custom name and embedding configuration
pgai semantic-catalog create \
--catalog-name "production_catalog" \
--embed-config "openai_embeddings"
# Create a catalog with Ollama (local model)
pgai semantic-catalog create \
--provider ollama \
--model nomic-embed-text \
--vector-dimensions 768
# Create a catalog with custom OpenAI base URL
pgai semantic-catalog create \
--provider openai \
--base-url "https://api.openai.com/v1" \
--api-key-name "CUSTOM_OPENAI_KEY"
  • Setting up your first semantic catalog
  • Creating separate catalogs for different environments (dev, staging, prod)
  • Setting up A/B testing with different embedding configurations
  • Migrating to a new embedding provider or model

Purpose: Import database object descriptions from a YAML file into a semantic catalog.

This command reads descriptions (typically generated by the describe command) from a YAML file and loads them into your semantic catalog. After importing, it will automatically generate embeddings for the imported items.

Terminal window
pgai semantic-catalog import [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-d, --db-urlTarget database connection URLRequiredTARGET_DB
-c, --catalog-db-urlCatalog database connection URLCATALOG_DB
-f, --yaml-fileInput YAML file pathstdin
-n, --catalog-nameSemantic catalog namedefault
-e, --embed-configEmbedding configuration to vectorizeAll configs
-b, --batch-sizeEmbeddings per batch
Terminal window
# Import from a YAML file
pgai semantic-catalog import -f descriptions.yaml
# Import to a specific catalog and vectorize only a specific embedding config
pgai semantic-catalog import \
-f descriptions.yaml \
--catalog-name "my_catalog" \
--embed-config "openai_embeddings"
# Import from stdin
cat descriptions.yaml | pgai semantic-catalog import
# Import with custom batch size for vectorization
pgai semantic-catalog import \
-f descriptions.yaml \
--batch-size 16
  • Loading initial descriptions into a new semantic catalog
  • Updating descriptions after running the describe command
  • Migrating descriptions between environments
  • Restoring a semantic catalog from backup

Purpose: Generate vector embeddings for items in the semantic catalog.

This command processes database objects, SQL examples, and facts in your semantic catalog that don’t yet have embeddings and generates vector representations using your configured embedding provider. These embeddings enable semantic search capabilities.

Terminal window
pgai semantic-catalog vectorize [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-c, --catalog-db-urlCatalog database connection URLCATALOG_DB or TARGET_DB
-n, --catalog-nameSemantic catalog namedefault
-e, --embed-configEmbedding configuration to useAll configs
-b, --batch-sizeItems per vectorization batch32
Terminal window
# Vectorize all items using all embedding configurations
pgai semantic-catalog vectorize
# Vectorize using a specific embedding configuration
pgai semantic-catalog vectorize --embed-config "openai_embeddings"
# Vectorize with custom batch size
pgai semantic-catalog vectorize --batch-size 16
# Vectorize a specific catalog
pgai semantic-catalog vectorize --catalog-name "production_catalog"
  • After importing new descriptions into the catalog
  • When adding a new embedding configuration to existing data
  • Regenerating embeddings after model updates
  • Processing items that failed during initial vectorization

Purpose: Search the semantic catalog using natural language queries.

This command performs semantic search across database objects, SQL examples, and facts using natural language. It’s useful for exploring your database schema, finding relevant examples, and understanding what data is available.

Terminal window
pgai semantic-catalog search [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-d, --db-urlTarget database connection URLRequiredTARGET_DB
-c, --catalog-db-urlCatalog database connection URLCATALOG_DB
-n, --catalog-nameSemantic catalog namedefault
-e, --embed-configEmbedding configuration to useFirst available
-p, --promptNatural language search queryRequired
-s, --sample-sizeSample rows per table/view3
--renderShow formatted results for LLM promptsfalse
Terminal window
# Search for user-related objects
pgai semantic-catalog search --prompt "user accounts and profiles"
# Search with specific question
pgai semantic-catalog search --prompt "How are orders related to customers?"
# Search and see how results would be rendered for LLM
pgai semantic-catalog search \
--prompt "product inventory and stock levels" \
--render
# Search with more sample data
pgai semantic-catalog search \
--prompt "sales data" \
--sample-size 5
  • Exploring unfamiliar database schemas
  • Finding relevant tables for a specific business question
  • Discovering existing SQL examples for similar queries
  • Understanding relationships between database objects
  • Testing the quality of your semantic catalog

Purpose: Generate SQL statements from natural language prompts using the semantic catalog.

This is the primary command for text-to-SQL functionality. It uses the semantic catalog to find relevant context and generates SQL statements that answer your natural language questions.

Terminal window
pgai semantic-catalog generate-sql [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-d, --db-urlTarget database connection URLRequiredTARGET_DB
-c, --catalog-db-urlCatalog database connection URLCATALOG_DB
-m, --modelLLM model for SQL generationopenai:gpt-4.1
-n, --catalog-nameSemantic catalog namedefault
-e, --embed-configEmbedding configuration to useFirst available
-p, --promptNatural language queryRequired
--iteration-limitMaximum refinement attempts5
-s, --sample-sizeSample rows per table/view3
--request-limitMaximum LLM requests
--total-tokens-limitMaximum LLM tokens
--print-messagesShow LLM conversationfalse
--print-usageShow token usagefalse
--print-query-planShow query execution planfalse
--save-final-promptSave final LLM prompt to file
Terminal window
# Generate SQL for a simple question
pgai semantic-catalog generate-sql \
--prompt "Find all users who signed up last month"
# Use a specific model with debugging enabled
pgai semantic-catalog generate-sql \
--model "anthropic:claude-3-opus-20240229" \
--prompt "What are the top 5 products by revenue?" \
--print-usage \
--print-messages
# Generate SQL with cost controls
pgai semantic-catalog generate-sql \
--prompt "Show customer order history" \
--request-limit 10 \
--total-tokens-limit 20000
# Save the final prompt for analysis
pgai semantic-catalog generate-sql \
--prompt "Find inactive customers" \
--save-final-prompt debug_prompt.txt
  • Converting business questions to SQL queries
  • Exploring data through natural language
  • Rapid prototyping of data analysis queries
  • Training and education on database querying
  • Building natural language interfaces to your database

Purpose: Export semantic catalog contents to a YAML file.

This command exports all database objects, SQL examples, and facts from a semantic catalog to a YAML file. This is useful for backups, migration between environments, or editing catalog contents (file can be subsequently imported).

Terminal window
pgai semantic-catalog export [OPTIONS]
OptionDescriptionDefaultEnvironment Variable
-c, --catalog-db-urlCatalog database connection URLCATALOG_DB or TARGET_DB
-f, --yaml-fileOutput YAML file pathstdout
-n, --catalog-nameSemantic catalog namedefault
Terminal window
# Export to a YAML file
pgai semantic-catalog export -f catalog_backup.yaml
# Export a specific catalog
pgai semantic-catalog export \
--catalog-name "production_catalog" \
-f production_backup.yaml
# Export to stdout and pipe to another command
pgai semantic-catalog export | gzip > catalog_backup.yaml.gz
  • Creating backups of semantic catalogs
  • Migrating catalogs between environments
  • Sharing catalog contents with team members
  • Version controlling semantic catalog contents

Purpose: Fix database object references in the semantic catalog after database changes.

When database operations like dumps/restores, renames, or schema changes occur, the internal references in your semantic catalog may become outdated. This command updates these references to maintain accuracy.

Terminal window
pgai semantic-catalog fix [OPTIONS]
OptionDescriptionDefault
-d, --db-urlTarget database connection URLRequired
-c, --catalog-db-urlCatalog database connection URL
-n, --catalog-nameSemantic catalog namedefault
-m, --modeFix mode: fix-ids or fix-namesfix-ids
--dry-runShow what would be changedfalse

fix-ids: Updates internal PostgreSQL object IDs

  • Use after database dumps/restores
  • Use after major schema changes
  • Updates classid, objid, objsubid references

fix-names: Updates object name identifiers

  • Use after object renames
  • Use after schema renames
  • Updates objnames arrays
Terminal window
# Fix internal iDs after database restore
pgai semantic-catalog fix --mode fix-ids
# Fix object names after renames
pgai semantic-catalog fix --mode fix-names
# Dry run to see what would be fixed
pgai semantic-catalog fix --mode fix-ids --dry-run
# Fix a specific catalog
pgai semantic-catalog fix \
--catalog-name "production_catalog" \
--mode fix-names
  • After database dumps and restores
  • After renaming database objects or schemas
  • When semantic catalog searches return incorrect results
  • After major database schema changes
  • When object references become stale
Terminal window
# 1. generate descriptions
pgai semantic-catalog describe -f descriptions.yaml
# 2. review and edit descriptions.yaml if needed
# 3. create catalog
pgai semantic-catalog create
# 4. import descriptions
pgai semantic-catalog import -f descriptions.yaml
# 5. test search functionality
pgai semantic-catalog search --prompt "your test query"
# 6. generate SQL
pgai semantic-catalog generate-sql --prompt "your business question"
Terminal window
# Use separate catalog database
export CATALOG_DB="postgres://user:pass@catalog-host:5432/catalog_db"
export TARGET_DB="postgres://user:pass@app-host:5432/app_db"
# Create named catalog with specific embedding config
pgai semantic-catalog create \
--catalog-name "production" \
--embed-config "production_embeddings"
# Import
pgai semantic-catalog import \
-f descriptions.yaml \
--catalog-name "production"
Terminal window
# Regular backup
pgai semantic-catalog export \
--catalog-name "production" \
-f "backup-$(date +%Y%m%d).yaml"
# After database changes
pgai semantic-catalog fix --mode fix-ids
# Update descriptions periodically
pgai semantic-catalog describe -f new_descriptions.yaml
pgai semantic-catalog import -f new_descriptions.yaml

“No embedding configuration found”

  • Run pgai semantic-catalog create to set up embedding configuration
  • Check that the catalog name and embedding config name are correct

“Connection refused”

  • Verify database URLs are correct
  • Ensure databases are running and accessible
  • Check authentication credentials

“API key not found”

  • Set the appropriate environment variable (e.g., OPENAI_API_KEY)
  • Use --api-key-name to specify custom environment variable

“Object not found” after database restore

  • Run pgai semantic-catalog fix --mode fix-ids to update object references

Poor SQL generation quality

  • Review and improve object descriptions in YAML files
  • Add more SQL Examples and/or Facts to the catalog
  • Use higher quality embedding models
  • Use a more powerful LLM model
  • Increase sample size for more context

Use the --help flag with any command for detailed option information:

Terminal window
pgai semantic-catalog --help
pgai semantic-catalog describe --help
pgai semantic-catalog generate-sql --help