
How to Index JSONB Columns in PostgreSQL

Published at Mar 7, 2025

Written by Warda Bibi

JSON (JavaScript Object Notation) is an open standard format that structures data using key/value pairs and arrays, as defined in RFC 7159. It is widely used for data exchange, document storage, and handling unstructured data in web services.

In this article, we will explore various techniques and best practices for efficiently indexing JSON data in PostgreSQL. If you want to learn more, be sure to check out our dedicated articles on JSON and JSONB in PostgreSQL.

Why Use JSON in PostgreSQL?

At first glance, the idea of storing unstructured data like JSON in a relational database like PostgreSQL might seem counterintuitive. After all, relational databases are designed for structured, tabular data. However, there are compelling reasons why JSON support in PostgreSQL is not just useful but often essential for today's data-intensive applications.

Let’s explore some key scenarios where JSON is helpful:

Adapting to changing data structures

One of the biggest challenges in data management is dealing with evolving schemas. Traditional relational databases require a fixed structure, but real-world data often involves dynamic structures. JSON offers flexibility, allowing us to store data without rigidly defining its structure upfront. This is especially valuable for use cases like event logging, analytics, or tagging systems, where the data format can change frequently. Instead of constantly altering the database schema (which can be resource-intensive), JSON lets us adapt seamlessly. 
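As a minimal sketch of this pattern (the events table and its columns here are illustrative, not part of this article's running example), a single JSONB column can absorb new event shapes without any ALTER TABLE:

CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb NOT NULL
);

-- New event shapes are stored without changing the schema
INSERT INTO events (payload) VALUES
    ('{"type": "page_view", "url": "/pricing"}'),
    ('{"type": "purchase", "order_id": 42, "items": ["laptop", "mouse"]}');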

Simplifying nested data

When handling deeply nested or hierarchical data, JSON provides a more natural way to represent it. Flattening such structures into multiple tables or columns can lead to unnecessary complexity. With JSON, we can store and query nested objects directly, making our database design cleaner and more intuitive.
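Continuing the illustrative events table sketched above, nested values can be pulled out directly with the -> and ->> operators, with no extra tables or joins:

-- First element of the "items" array for purchase events (illustrative schema)
SELECT payload->'items'->>0 AS first_item
FROM events
WHERE payload->>'type' = 'purchase';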

Streamlining data integration

In many cases, data comes from external systems in JSON format, such as APIs, third-party services, or modern applications. Instead of immediately transforming this data to fit a relational model, PostgreSQL allows us to store it as-is. This can serve as a temporary staging area before further processing or integration into other parts of our system.

JSON support was first introduced in PostgreSQL 9.2  in 2012. Over the years, PostgreSQL has evolved with powerful JSON features that allow it to natively handle arbitrary data structures. As of now, PostgreSQL provides two data types for working with JSON data: JSON and JSONB.

JSON vs. JSONB

The JSON and JSONB data types in PostgreSQL accept almost identical input. Their main difference lies in how the data is stored and, as a result, how efficiently it can be processed.

  • The JSON data type stores the raw JSON text exactly as it is, including any spaces, line breaks, duplicates, and the order of keys in JSON objects. 

  • On the other hand, JSONB (here, B stands for binary) stores JSON data in a binary format. It converts the input into a binary representation before storing it.

  • This JSONB format eliminates the need to reparse the data each time it is accessed, leading to significantly better performance for querying and processing. However, the trade-off is that writing data to a JSONB column incurs some overhead due to the conversion from text to binary. This makes JSONB slightly slower to insert than JSON, but much faster for querying and processing.

  • JSONB also normalizes data by removing unnecessary whitespace and formatting, and it does not preserve the order of keys in objects. Additionally, it does not allow duplicate keys in JSON objects; only the last occurrence of a key is stored, and earlier duplicates are discarded.

  • One of JSONB's most significant advantages is its advanced support for indexing, which can drastically improve query performance, especially on large datasets.

When it comes to use cases, choose JSON if you don't plan on querying the data frequently or you need to preserve the exact format of the input, such as for logging purposes. If you need to query or index the data, or if performance is a priority, JSONB is the better choice.
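For example, casting the same literal to both types shows the normalization JSONB performs (runnable as-is in psql):

SELECT '{"b": 2,  "a": 1, "a": 3}'::json;
-- json keeps the text verbatim: {"b": 2,  "a": 1, "a": 3}

SELECT '{"b": 2,  "a": 1, "a": 3}'::jsonb;
-- jsonb normalizes the value: whitespace is collapsed, keys are reordered,
-- and only the last duplicate key survives: {"a": 3, "b": 2}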

Why Index JSON/JSONB?

Without indexes, PostgreSQL processes queries by performing sequential full-table scans, examining each row in the table to find matches. As the dataset grows, this approach becomes inefficient, leading to slower query execution due to increased I/O operations and higher CPU usage. By using an index, PostgreSQL can avoid making these repeated, slow comparisons. Instead of comparing entire columns row-by-row, the index can be used to quickly identify rows that match the condition, dramatically improving performance, even for large datasets. 

As JSON and JSONB are designed to store hierarchical structures, this can make querying even more challenging compared to flat, tabular data. Without proper indexing, PostgreSQL has to traverse the entire JSON structure for each row to find matches, which can be inefficient, even for smaller datasets.

PostgreSQL indexes solve this problem by allowing us to quickly pinpoint the relevant parts of the JSONB data, bypassing the need to scan and parse the entire structure repeatedly. We've written extensively about fine-tuning index performance in PostgreSQL, so check it out to learn more.

JSONB Indexes

PostgreSQL provides several options to index JSONB data, such as:

  • GIN

  • BTREE

  • HASH

  • TRIGRAM with GIN

It’s important to note that not all index types support every operator or query pattern. Choosing the right index depends on the types of queries you run most frequently. Let’s dive into each index type to understand its strengths and use cases.

GIN (Generalized Inverted Index) 

GIN is designed to index composite values, such as arrays or JSON fields, where a single column value contains many searchable elements. It is particularly useful for queries that need to efficiently search for elements within these composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words. To understand GIN indexes, let's say you are managing a large e-commerce platform that processes millions of transactions daily. Here's what the data might look like:

 id | user_id | amount | details (JSONB)
----+---------+--------+------------------------------------------------------------------------------------
  1 |     101 |    500 | {"tags": ["electronics", "laptop", "discount"], "gift_points": false}
  2 |     103 |   1500 | {"tags": ["electronics", "phone"], "gift_points": true, "express_shipping": true}
  3 |     104 |    200 | {"tags": ["home_appliances", "discount"], "gift_points": false}
  4 |     101 |    100 | {"tags": ["beauty", "skincare"], "gift_points": true, "first_time_buyer": true}

Now, suppose your marketing team frequently searches for discount-related transactions to analyze the effectiveness of promotions. Without an index, PostgreSQL would have to:

  • Scan each row in the table.

  • Parse the JSONB column.

  • Check if the "discount" tag exists in the data.

This approach becomes extremely slow when dealing with millions of transactions. A GIN index optimizes these queries by mapping each unique tag to the corresponding transaction IDs that contain it.

For example, if the "tags" field contains "discount" in transaction 1, the GIN index creates an entry mapping:

"discount" → {1}

If "discount" also appears in transaction 3, the index updates the mapping to:

"discount" → {1, 3}

This mapping allows the GIN index to quickly reference the transactions (rows) where the discount tag exists, eliminating the need to scan every transaction. 

GIN structure

[Figure: GIN index structure. Source: Oleg Bartunov and Teodor Sigaev, PostgreSQL Summit, Toronto]

Instead of searching every transaction manually for specific tags, the GIN index has a list of all the tags found in the transactions. Each tag in the list points to a smaller list of rows that contain that tag.

Entry Page (Keywords Level)
  • Think of each entry as a unique tag from our JSONB column, such as "discount," "electronics," "laptop," etc.

  • Each keyword (like "discount") acts as an index entry, helping us find transactions containing that tag.

Entry Page (Leaf Nodes)
  • If a tag (like "discount") appears in a few transactions, PostgreSQL stores a sorted list of those transaction IDs inside a list called the posting list.

  • If the tag appears too many times, it doesn’t fit into a simple list, so PostgreSQL uses a posting tree (B-Tree structure) instead. This makes searching even faster!

Posting Pages (Item Pointers)
  • These store references to actual rows in the transactions table.

  • If "discount" appears in transaction IDs 1 and 3, the posting list would contain {1, 3}.

  • If "electronics" appears in many transactions (1, 2, ...1000+), PostgreSQL builds a tree structure for better search performance which further uses these posting pages for exact item pointers.

Operator classes:

An operator class can be thought of as a rulebook that provides GIN with instructions on how to manage specific data types. Each operator class defines the operations that can be performed on the data. GIN supports two operator classes:

  • jsonb_ops (default): ?, ?|, ?&, @>, @@, @?

    • Indexes each key and value in the JSONB document.

  • jsonb_path_ops: @>, @@, @?

    • Uses an optimized GIN index structure that stores a hash of each path (the keys plus the value), e.g., hashfn(field1, value1) and hashfn(field2, value2), instead of indexing every key and value separately. The resulting index is smaller, but it cannot serve the existence operators (?, ?|, ?&).

Syntax:

CREATE INDEX index_name ON table_name USING gin (jsonb_column_name);
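If your workload only uses the containment-style operators, you can name the operator class explicitly to get the smaller jsonb_path_ops variant (shown here as a sketch; the rest of this article sticks with the default jsonb_ops index):

CREATE INDEX details_pathops_idx ON transactions USING GIN (details jsonb_path_ops);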

Existence Operators (?, ?|, ?&)

These operators can be used to check for the existence of top-level keys in the JSONB. Let’s create a GIN index on the details JSONB column. 

CREATE INDEX details_gin_idx ON transactions USING GIN (details);

To find all transactions that have an express_shipping key:

postgres=# EXPLAIN ANALYZE SELECT * FROM transactions WHERE details ? 'express_shipping';
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=2017.04..5660.14 rows=1166 width=56) (actual time=89.180..445.504 rows=1000001 loops=1)
   Recheck Cond: (details ? 'express_shipping'::text)
   Heap Blocks: exact=16656
   ->  Bitmap Index Scan on details_gin_idx  (cost=0.00..2016.74 rows=1166 width=0) (actual time=87.190..87.191 rows=1000001 loops=1)
         Index Cond: (details ? 'express_shipping'::text)
 Planning Time: 0.184 ms
 Execution Time: 463.929 ms
(7 rows)

As you can see from the EXPLAIN output, the GIN index we created is being used for the search. Let's say we want to find users who are first-time buyers and have tags:

postgres=# EXPLAIN ANALYZE SELECT * FROM transactions WHERE details ?& ARRAY['first_time_buyer', 'tags'];
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=444.00..448.01 rows=1 width=102) (actual time=1.408..1.409 rows=1 loops=1)
   Recheck Cond: (details ?& '{first_time_buyer,tags}'::text[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on details_gin_idx  (cost=0.00..444.00 rows=1 width=0) (actual time=1.383..1.384 rows=1 loops=1)
         Index Cond: (details ?& '{first_time_buyer,tags}'::text[])
 Planning Time: 0.173 ms
 Execution Time: 1.475 ms

The GIN index only supports the “existence” operators on “top-level” keys. If the key is not at the top level, then the index will not be used. It will result in a sequential scan:

postgres=# EXPLAIN ANALYZE SELECT * FROM transactions WHERE details->'tags' ? 'discount';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on transactions  (cost=0.00..31656.10 rows=10000 width=102) (actual time=1.036..163.573 rows=299887 loops=1)
   Filter: ((details -> 'tags'::text) ? 'discount'::text)
   Rows Removed by Filter: 700120
 Planning Time: 0.289 ms
 Execution Time: 170.906 ms
(5 rows)

The GIN index was not used because "tags" is a nested key.
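One workaround, not benchmarked in this article, is to create an expression GIN index on the nested object itself, so the existence operator is applied at the level you actually query:

-- Index the nested "tags" value rather than the whole document
CREATE INDEX details_tags_gin_idx ON transactions USING GIN ((details->'tags'));

-- This query can now use details_tags_gin_idx
SELECT * FROM transactions WHERE details->'tags' ? 'discount';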

Path Operators (@>, <@)

The path operators test whether one JSON document contains another, which lets you compare a partial JSON document against a JSONB column. Let's use @> similarly to the ? operator above:

postgres=# EXPLAIN ANALYZE SELECT * FROM transactions WHERE details @> '{"express_shipping": true}';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=2007.19..21184.45 rows=201701 width=102) (actual time=90.695..230.534 rows=200056 loops=1)
   Recheck Cond: (details @> '{"express_shipping": true}'::jsonb)
   Rows Removed by Index Recheck: 399708
   Heap Blocks: exact=16656
   ->  Bitmap Index Scan on details_gin_idx  (cost=0.00..1956.76 rows=201701 width=0) (actual time=88.512..88.512 rows=599764 loops=1)
         Index Cond: (details @> '{"express_shipping": true}'::jsonb)
 Planning Time: 0.425 ms
 Execution Time: 234.453 ms
(8 rows)

The path operators support queries on both nested and top-level objects:

postgres=# explain analyze SELECT * FROM transactions WHERE details @> '{"tags": "discount"}';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=444.00..448.01 rows=1 width=102) (actual time=1.484..1.486 rows=0 loops=1)
   Recheck Cond: (details @> '{"tags": "discount"}'::jsonb)
   ->  Bitmap Index Scan on details_gin_idx  (cost=0.00..444.00 rows=1 width=0) (actual time=1.480..1.480 rows=0 loops=1)
         Index Cond: (details @> '{"tags": "discount"}'::jsonb)
 Planning Time: 0.274 ms
 Execution Time: 1.563 ms
(6 rows)

The index is used here as well. Note, though, that the query returns zero rows: because tags holds an array, matching one of its elements with containment requires array syntax, i.e., details @> '{"tags": ["discount"]}'.

B-Tree indexes

B-tree indexes are the most commonly used index type in relational databases. They are highly efficient for range queries and equality checks, making them ideal for structured data. However, when applied to an entire JSONB column, B-tree indexes only support whole-object comparisons using the operators =, <, <=, >, and >=. This is rarely useful in practice because JSONB objects are typically complex and contain nested values.

A more practical approach is to use B-tree expression indexes, which allow indexing specific keys or nested values inside a JSONB column. These indexes support the common comparison operators (=, <, >, <=, >=), which GIN indexes do not support.
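For instance, a B-tree expression index can serve range predicates that a GIN index cannot; the discount_pct key below is hypothetical and not part of the sample data, purely to illustrate the pattern:

CREATE INDEX idx_btree_discount_pct ON transactions USING BTREE (((details->>'discount_pct')::numeric));

-- Range query that can use the expression index above
SELECT * FROM transactions WHERE (details->>'discount_pct')::numeric >= 20;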

Let's say we want to find all transactions where the first_time_buyer field is true.

postgres=# explain analyze SELECT * FROM transactions WHERE (details->'first_time_buyer')::boolean = true;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on transactions  (cost=0.00..31656.10 rows=333336 width=102) (actual time=0.047..113.492 rows=1 loops=1)
   Filter: ((details -> 'first_time_buyer'::text))::boolean
   Rows Removed by Filter: 1000006
 Planning Time: 0.190 ms
 Execution Time: 113.520 ms
(5 rows)

Even though there is a GIN index on the details column, it does not support the = operator. As a result, PostgreSQL performs a full sequential scan, making the query inefficient. To optimize this query, we can create a B-tree expression index on first_time_buyer inside the details JSONB column:

postgres=# CREATE INDEX idx_btree_first_time_buyer ON transactions USING BTREE (((details->'first_time_buyer')::boolean));

postgres=# explain analyze SELECT * FROM transactions WHERE (details->'first_time_buyer')::boolean = true;

                                  QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=5533.79..29689.85 rows=333336 width=102) (actual time=0.073..0.075 rows=1 loops=1)
   Filter: ((details -> 'first_time_buyer'::text))::boolean
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_btree_first_time_buyer  (cost=0.00..5450.45 rows=500004 width=0) (actual time=0.051..0.052 rows=1 loops=1)
         Index Cond: (((details -> 'first_time_buyer'::text))::boolean = true)
 Planning Time: 0.782 ms
 Execution Time: 0.115 ms
(7 rows)

Here, adding a B-tree index on first_time_buyer improved query performance, reducing execution time from 113 ms to just 0.1 ms.

Hash Indexes 

Hash indexes are a specialized type of index in PostgreSQL that are optimized for equality checks (i.e., the = operator). They are particularly useful when we need to perform exact matches on specific JSONB fields, whether those fields are top-level or deeply nested.

Hash indexes tend to be smaller than B-tree or GIN indexes, which can be beneficial for storage.

CREATE INDEX idx_hash_express ON transactions USING HASH (((details->'express_shipping')::boolean));

This index extracts the express_shipping field from the details JSONB column and creates a hash index on the extracted boolean values.

postgres=# explain analyze SELECT * FROM transactions WHERE (details->'express_shipping')::boolean = true;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=24615.32..60866.85 rows=334201 width=102) (actual time=22.495..362.902 rows=203054 loops=1)
   Filter: ((details -> 'express_shipping'::text))::boolean
   Heap Blocks: exact=16707
   ->  Bitmap Index Scan on idx_hash_express  (cost=0.00..24531.76 rows=501302 width=0) (actual time=18.872..18.872 rows=203054 loops=1)
         Index Cond: (((details -> 'express_shipping'::text))::boolean = true)
 Planning Time: 0.247 ms
 Execution Time: 367.889 ms
(7 rows)

This shows that the query uses the hash index to quickly locate the rows where express_shipping equals true.

Keep in mind that indexing every JSONB field individually can lead to index bloat and increased storage usage.
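If you want to compare the sizes of the indexes created so far on your own data, a quick check using the standard statistics views (a sketch, assuming the table and index names used in this article) looks like this:

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'transactions'
ORDER BY pg_relation_size(indexrelid) DESC;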

GIN Trigram Indexes

PostgreSQL supports a powerful feature called trigram indexes for efficient string matching, especially when dealing with partial or fuzzy searches. When combined with GIN indexes, trigram indexes can be used to index and query JSONB fields for arbitrary substring matches.

Trigram indexes work by breaking up text into trigrams, which are sequences of three consecutive characters. For example, the word "hello" is broken into the trigrams: "hel", "ell", "llo".

These trigrams are then indexed, allowing PostgreSQL to efficiently search for patterns within strings, even if the pattern appears in the middle of the text.
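You can inspect the trigrams PostgreSQL generates for a value with the show_trgm function, which ships with the pg_trgm extension enabled just below:

SELECT show_trgm('hello');
-- Alongside hel, ell, and llo, pg_trgm also emits padded boundary
-- trigrams such as "  h", " he", and "lo ".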

When working with JSONB data, we might need to search for partial matches within string fields, such as finding a tag field that contains a specific substring ("dis").

Before creating a trigram index, we need to enable the pg_trgm extension, which provides the necessary functions and operators for trigram indexing.

CREATE EXTENSION pg_trgm;

Now we can create a GIN trigram index on a specific JSONB field using the gin_trgm_ops operator class.

CREATE INDEX tags_trgms_idx ON transactions USING GIN ((details->>'tags') gin_trgm_ops);

We can now perform efficient substring searches on the tags field.

EXPLAIN ANALYZE SELECT * FROM transactions WHERE (details->>'tags')::text LIKE '%dis%';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on transactions  (cost=414.93..31248.46 rows=40120 width=102) (actual time=42.538..157.725 rows=300796 loops=1)
   Recheck Cond: ((details ->> 'tags'::text) ~~ '%dis%'::text)
   Heap Blocks: exact=16707
   ->  Bitmap Index Scan on tags_trgms_idx  (cost=0.00..404.90 rows=40120 width=0) (actual time=38.784..38.784 rows=300796 loops=1)
         Index Cond: ((details ->> 'tags'::text) ~~ '%dis%'::text)
 Planning Time: 0.336 ms
 Execution Time: 164.067 ms
(7 rows)

This shows that the query uses the GIN trigram index (tags_trgms_idx) to efficiently locate rows where the tags field contains the substring "dis".

Note: Regular B-tree indexes are left-anchored, meaning they only work efficiently for patterns that start at the beginning of the string (LIKE 'dis%'). GIN trigram indexes, however, allow us to search for arbitrary substrings (LIKE '%dis%').

Conclusion

Indexes are powerful tools and are often essential for optimizing performance. However, they must be used thoughtfully, especially on tables with frequent write operations, as improper indexing can lead to overhead and inefficiencies. By carefully selecting the right indexing strategies based on your query patterns, you can use JSONB's advanced indexing capabilities to boost query performance while maintaining a well-balanced, high-performing database.

If you want to get even more performance out of your PostgreSQL database, try TimescaleDB. Available as a standalone on-prem solution or as part of Timescale Cloud's fully managed PostgreSQL platform, which suits the needs of mission-critical applications, TimescaleDB extends PostgreSQL for real-time analytics, events, vector data, and time-series data.

If you think PostgreSQL can't scale, you're probably doing it wrong—Timescale can fix it. Install TimescaleDB on your machine today or skip the extra steps and create a 30-day free Timescale Cloud account.

Learn more

  • How to Query JSON Metadata in PostgreSQL

  • How to Query JSONB in PostgreSQL
