
Published at Mar 28, 2025

Optimizing Array Queries With GIN Indexes in PostgreSQL


Written by Warda Bibi

Efficient querying is crucial when managing data in PostgreSQL, and one way to achieve it is by using the right indexes on your tables. Indexes help queries retrieve results faster by skipping irrelevant parts of the table. Among the various index types PostgreSQL offers, one of the most important is GIN (Generalized Inverted Index).

Although its name might sound complex, the concept is straightforward: GIN indexes excel at searching for elements within composite values such as arrays, text fields, or JSON. In this article, we'll walk you through how to optimize array queries using GIN indexes.

This exploration of GIN indexes highlights why we've invested heavily in optimizing indexing capabilities at Timescale. Our work on PostgreSQL indexes for compressed columnar data represents a significant advancement, allowing developers to enjoy the benefits of columnar compression while maintaining the flexibility of PostgreSQL's rich indexing capabilities. Check it out!

The Importance of Indexing an Array

Without indexes, PostgreSQL processes queries by performing sequential full-table scans, examining each row in the table to find matches. As the dataset grows, this approach becomes inefficient, leading to slower query execution due to increased I/O operations and higher CPU usage. 

Consider a table with a column sensor_readings containing arrays of values captured by IoT devices, where the array positions correspond to {Temperature, Humidity, Motion, BatteryLevel, Alert}:

  • Row 1: {25.5, 60.2, NULL, NULL, 1} (Temperature: 25.5°C, Humidity: 60.2%, Alert: 1)

  • Row 2: {NULL, NULL, 0, 15.3, 1} (Motion: 0, Battery Level: 15.3%, Alert: 1)

  • Row 3: {25.5, NULL, 0, 10.5, NULL} (Temperature: 25.5°C, Motion: 0, Battery Level: 10.5%)

Suppose you want to find all rows where the sensor_readings array contains the values {25.5, 1} (Temperature = 25.5°C and Alert = 1). PostgreSQL will compare each value in the search array {25.5, 1} with the elements in the sensor_readings array for every row. 

For example: 

  • In row 1 ({25.5, 60.2, NULL, NULL, 1}), PostgreSQL checks if both 25.5 and 1 are present. Since they are, this row is a match.

  • In row 2 ({NULL, NULL, 0, 15.3, 1}), only 1 is present, so this row is not a match.

  • In row 3 ({25.5, NULL, 0, 10.5, NULL}), only 25.5 is present, so this row is not a match.

PostgreSQL goes through each row in the table and checks if all the values in the search array match those in the column's array. However, performing this comparison for each row can be very slow for large datasets, potentially resulting in thousands of operations, especially as the size of the arrays and the number of rows increase.

By using an index, PostgreSQL avoids these repeated, slow comparisons. Instead of comparing entire arrays row-by-row, the index can quickly identify rows that match the condition, dramatically improving performance, even for large datasets. 
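For concreteness, here is a minimal sketch of what that table and query could look like (the table name sensor_data and the device_id column are assumptions for illustration; the containment operator @> is covered in detail below):

CREATE TABLE sensor_data (device_id int, sensor_readings numeric[]);

-- find rows whose sensor_readings array contains both 25.5 and 1
SELECT * FROM sensor_data WHERE sensor_readings @> ARRAY[25.5, 1];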

For faster queries, learn how to optimize your PostgreSQL database indexes.

What Are GIN Indexes?  

The Generalized Inverted Index (GIN) is specifically designed to index composite values, such as arrays or JSON fields, in tables. It is particularly useful for queries that need to search for elements within these composite items efficiently. 

 Let's say you have transaction data in a database where: 

  • Each row represents an account. 

  • Each account has done multiple transactions, stored as an array of transaction values.

Acc_id | Transactions
-------|--------------------
1      | {$100, $400, $600}
2      | {$200, $500}

Here:

  • Row 1: Account 1 (Acc1) has completed transactions of $100, $400, and $600. 

  • Row 2: Account 2 (Acc2) has completed transactions of $200 and $500.   

Now, you frequently run queries to find transactions that meet specific criteria, such as transactions exceeding a certain value (e.g., "$300"). 

Without an index, the database must scan each row and check if any transaction value in the array meets the condition (e.g., "greater than $300"). This requires evaluating every element in every array across all rows, which is slow and resource-intensive.  

A GIN index works by creating a mapping between each unique transaction value (the key) and the corresponding row IDs (accounts) that contain that value. This efficient mapping allows PostgreSQL to quickly locate rows containing specific elements without scanning the entire table. 

  • If a transaction value of $300 is found in the array of transactions for Account 1 (Row 1), the index records that $300 is associated with Account 1. 

  • Similarly, if $300 is found in Account 2, the index maps $300 to Account 2 as well. 

  • This mapping is stored in a structure called listOfAccounts, which is a set of row IDs (accounts) linked to a specific transaction value. 

  • For example, if $300 appears in the transaction arrays for Accounts 1 and 2, the listOfAccounts for $300 would be {1, 2}. 

This mapping allows the GIN index to quickly reference the accounts (rows) where $300 exists, eliminating the need to scan every row and every transaction. 

The term "generalized" in GIN refers to its ability to handle different types of data, such as arrays, JSON, or full-text documents. However, GIN itself doesn’t understand these data types. Instead, it relies on operator classes to guide it on how to interact with and process these various data types. 

What Are Operator Classes?

An operator class can be thought of as a rulebook that provides GIN with instructions on how to manage specific data types. Each operator class defines the operations that can be performed on the data. For example:  

  • For arrays, does one array contain another (@>), or do two arrays overlap (&&)?  

  • For JSON, does a JSON object contain a specific key (?), or does one JSON object contain another (@>)? 

These instructions ensure that GIN efficiently handles complex data types like arrays and JSON. Each operation, such as @> or ?, is assigned a strategy number within the operator class. For instance, the @> operator for arrays might be strategy 1, and the && operator could be strategy 2. When you run a query like SELECT * FROM my_table WHERE my_array_column @> '{1, 2}';, the GIN index doesn't directly understand @>.

Instead, it checks the operator class for arrays, finds strategy number 1, and uses the logic associated with it to execute the operation. 

What makes GIN "generalized" is its ability to handle different data types using separate operator classes. GIN doesn’t need to be redesigned every time a new data type is introduced. You simply define a new operator class that tells it how to work with that type. For example: 

  • One operator class teaches GIN how to work with arrays. 

  • Another operator class teaches it how to handle JSON. 

You can even define custom operator classes for your own data types. This flexibility makes GIN so powerful it can index a wide variety of data types without needing a complete overhaul. By separating the GIN index method (which is flexible) from the operator classes (which are data-type-specific), PostgreSQL can easily support multiple data types. This approach also allows new data types or operations to be added without having to modify the GIN index itself. 

The "inverted" aspect of GIN refers to its index structure, which differs fundamentally from the traditional B-tree index. Instead of mapping a single value to a specific row (as in B-trees), GIN creates a reverse mapping where each indexed value points to a list of rows that contain that value. This design allows GIN to store multiple representations of a single row because a single row can contain multiple indexed values.  

This "inverted" mapping structure allows GIN to handle complex queries exceptionally well, such as those involving multiple conditions (e.g., searching for rows where an array contains specific elements or where a document contains specific words). GIN combines these indexed entries to efficiently retrieve the relevant rows, even for queries that would be costly to execute with other types of indexes. 

Another way of explaining GIN indexes comes from a presentation by Oleg Bartunov and Alexander Korotkov at PGConf.EU in Prague. They describe a GIN index as being like the table of contents of a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:

[Image: a GIN index shown as a book's table of contents, with keywords mapped to the page numbers where they appear]

In the table with the index, ‘compensation’ points to pages 30 and 68, indicating that these pages contain the keyword ‘compensation’. Similarly, ‘accelerometers’ points to pages 5, 10, 25, 28, 30, 36, 58, 59, 61, 73, 74, and 68, showing that these pages include the keyword ‘accelerometers’. When searching for ‘compensation accelerometers’, page 30 is common to both keywords. Therefore, page 30 would be returned as the result.

Before we move on, know that GIN indexes only support bitmap index scans (not plain index scans or index-only scans). This happens because a GIN index can list the same row pointer multiple times under different tokens (keys) that the field is broken into. To prevent returning duplicate rows, GIN forces queries through a bitmap index scan, which inherently deduplicates these pointers. Let's see an example.

Suppose we have a table ‘document’ where each document contains an array of keywords, and we have a GIN index on the ‘keywords’ column. 

document_id | keywords
------------|-------------------------
1           | {apple, banana, cherry}
2           | {banana, cherry, date}
3           | {apple, cherry, date}
4           | {banana, apple}

Here, the GIN index stores a list of tokens (keywords) and the document IDs (row pointers) associated with those tokens. Each token (keyword) points to the rows where it appears.  Here is the internal GIN index representation:

  • apple → {1, 3, 4} (appears in documents 1, 3, and 4) 

  • banana → {1, 2, 4} (appears in documents 1, 2, and 4) 

  • cherry → {1, 2, 3} (appears in documents 1, 2, and 3) 

  • date → {2, 3} (appears in documents 2 and 3) 

Now, when a query is run to find documents that contain both ‘apple’ and ‘banana’, PostgreSQL uses the bitmap index scan to process the GIN index and ensure deduplication. Let's say we run the following query: 

SELECT * FROM documents WHERE keywords @> ARRAY['apple', 'banana'];

The operator @> checks if the array of keywords contains the specified values (in this case, 'apple' and 'banana'). PostgreSQL first retrieves the rows for the keywords apple and banana from the GIN index. 

  • For ‘apple’, the GIN index tells us the rows are {1, 3, 4}. 

  • For ‘banana’, the GIN index tells us the rows are {1, 2, 4}. 

PostgreSQL then creates bitmaps for both sets of results. A bitmap is a data structure that represents rows using a series of bits, where each bit corresponds to a unique row and is set to 1 for a match and 0 otherwise. For apple ({1, 3, 4}), the bitmap might look like: 

1 0 1 1 

Where: 

  • 1 indicates the document is included (e.g., documents 1, 3, and 4). 

  • 0 indicates the document is not included (e.g., document 2 is not included for apple). 

For banana ({1, 2, 4}), the bitmap might look like: 

1 1 0 1 

Where: 

  • 1 indicates the document is included (e.g., documents 1, 2, and 4). 

  • 0 indicates the document is not included (e.g., document 3 is not included for banana). 

PostgreSQL now ANDs the two bitmaps together to find the rows that match both conditions (contain both 'apple' and 'banana'). 

Bitmap for apple:    1 0 1 1
Bitmap for banana:   1 1 0 1
----------------------------
Result bitmap:       1 0 0 1

The result bitmap shows 1 for documents 1 and 4, meaning these are the only documents that contain both 'apple' and 'banana'. Documents 2 and 3 are excluded because they do not meet both conditions. 

The deduplication process occurs in the AND operation by ensuring that each document is included only once in the result. The final query result will be: 

document_id | keywords
------------|-------------------------
1           | {apple, banana, cherry}
4           | {banana, apple}

These are the documents where both apple and banana appear in the keywords array.  Don’t be surprised when EXPLAIN always shows bitmap index/heap scans for your GIN indexes. Bitmap scans are ideal for deduplication as they can easily identify and combine duplicate row pointers, resulting in a clean set of relevant rows. 
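As a rough sketch of how you might reproduce this example end to end (the index name is illustrative), you could build the table, index it, and inspect the plan:

CREATE TABLE documents (document_id int PRIMARY KEY, keywords text[]);
CREATE INDEX documents_keywords_gin ON documents USING GIN (keywords);

INSERT INTO documents VALUES
  (1, '{apple, banana, cherry}'),
  (2, '{banana, cherry, date}'),
  (3, '{apple, cherry, date}'),
  (4, '{banana, apple}');

EXPLAIN SELECT * FROM documents WHERE keywords @> ARRAY['apple', 'banana'];

With enough rows (or with SET enable_seqscan = off for testing), the plan shows a Bitmap Index Scan on documents_keywords_gin feeding a Bitmap Heap Scan on documents.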

Supported operators 

The standard distribution of PostgreSQL includes a GIN operator class for arrays, which supports indexed queries using these operators: 

  • <@ : If the left-hand array is contained within (is a subset of) the right-hand array

  • @> : If the left-hand array contains all the elements of the right-hand array

  • = : If two arrays are equal

  • && : If two arrays have any elements in common

 

Implementation 

Let's say we have a table “inventory” with a column “product” of type TEXT[] (string array). We also have a GIN index on the product column. This index will help optimize queries that involve array operators like <@, @>, =, and &&.

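A minimal setup sketch, assuming names like these (the id column and index name are illustrative):

CREATE TABLE inventory (id serial PRIMARY KEY, product TEXT[]);
CREATE INDEX inventory_product_gin ON inventory USING GIN (product);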

To ensure that PostgreSQL uses the GIN index instead of performing a sequential scan, you'll need a substantial number of rows. A sequential scan is more likely to occur if the dataset is small because PostgreSQL might decide that scanning the entire table is more efficient than using the index.  

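One way to generate such data is sketched below; the specific product names are illustrative, and any repetitive set of arrays will do:

INSERT INTO inventory (product)
SELECT CASE i % 3
         WHEN 0 THEN ARRAY['laptop', 'server']
         WHEN 1 THEN ARRAY['laptop', 'monitor', 'keyboard']
         ELSE ARRAY['server', 'router']
       END
FROM generate_series(1, 1000) AS i;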

This will insert 1,000 rows with a repetitive set of string values in the array column.  

<@ (Contained by) 

The <@ operator is used to check if the left-hand array is fully contained within the right-hand array. 

  • Left-hand array: This is the array stored in your database column (e.g., the product column in an inventory table). It acts as the "source" array being checked. 

  • Right-hand array: This is the array you specify in your query for comparison. It contains the elements you want to verify. 

PostgreSQL uses the <@ operator to compare the two arrays and determine whether all the elements in the left-hand array (the stored value) exist within the right-hand array (the one from your query).

This operator is particularly useful for filtering rows based on whether their array values match a desired subset. 

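For example (the comparison array on the right is illustrative):

SELECT * FROM inventory WHERE product <@ ARRAY['laptop', 'server', 'monitor', 'keyboard'];

This returns rows whose product array contains no elements outside that list, and with enough rows EXPLAIN shows a bitmap scan on the GIN index.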

 

@> (Contains) 

The @> operator checks if the right-hand array is contained by the left-hand array, in other words, the right array must be a subset of the left array. 

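For example:

SELECT * FROM inventory WHERE product @> ARRAY['laptop', 'server'];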

The results will include arrays that have [laptop, server] as a subset. 

= (Equality) 

The = operator checks if two arrays are exactly equal. It compares both the size and the content of the arrays, meaning the elements must be in the same order. 

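For example:

SELECT * FROM inventory WHERE product = ARRAY['laptop', 'server'];

Only rows whose product array is exactly {laptop, server}, in that order, are returned.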

 

&& (Overlap) 

The && operator in PostgreSQL checks if two arrays overlap, that is, if they share at least one element. This is different from the equality operator because it doesn't require the arrays to be the same length or in the same order; it only checks for common elements between the two arrays.

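For example:

SELECT * FROM inventory WHERE product && ARRAY['laptop', 'printer'];

Any row whose product array contains 'laptop' or 'printer' (or both) is returned.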

  

Using ANY with GIN index

The ANY operator in PostgreSQL is used to compare a value against any element of an array. For example, value = ANY(array) checks if the value matches at least one element in the array. It's a shorthand for multiple OR conditions, making queries involving arrays more concise. So instead of writing  

value = element1 OR value = element2 OR value = element3   (where element1, element2, etc., are values in the array), you can simply write  

value = ANY(array)

Note: PostgreSQL will not use the GIN index for this form, because ANY is not an operator itself; it is a construct that can be combined with various operators. When ANY is used as

constant = ANY (array_expression)

only indexes that support the = operator on individual array elements would qualify, and GIN indexes do not. To utilize GIN, rewrite the condition with a one-element array, for example ARRAY[constant] <@ array_column (or, equivalently, array_column @> ARRAY[constant]).

Suppose we have a table test with a column of type int[], and we also have a GIN index on this column.

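A minimal sketch (the column and index names are assumptions):

CREATE TABLE test (id serial PRIMARY KEY, vals int[]);
CREATE INDEX test_vals_gin ON test USING GIN (vals);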

If you try to run a search query using ANY, the query will likely result in a sequential scan. On the other hand, using @> will yield the same result but with a bitmap heap scan. 

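For example (searching for the value 5 is arbitrary):

EXPLAIN SELECT * FROM test WHERE 5 = ANY(vals);    -- typically planned as a sequential scan
EXPLAIN SELECT * FROM test WHERE vals @> ARRAY[5]; -- can use the GIN index via a bitmap scan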

Moreover, constant = ANY (array_expression) is not completely equivalent to array_expression @> ARRAY[constant]. The containment operator never treats NULL array elements as matches, whereas the ANY construct follows ordinary comparison semantics with NULLs, so the two forms can return different results when NULL elements are involved.

Intarray Extension Challenges

In PostgreSQL, integer arrays (int[]) are a specialized data type that allows you to store multiple integer values within a single column of a table. Each element in the array is of the integer type, and the array itself can be one-dimensional or multi-dimensional.

For example, a column of type int[] can store data like:

  • {1, 2, 3} (a one-dimensional array) 

  • {{1, 2}, {3, 4}} (a two-dimensional array) 

 

Note: The intarray extension provides specialized functions and operators for integer arrays in PostgreSQL. However, its operators overlap with PostgreSQL's default array operators (such as @> and <@), and for int[] columns the planner resolves to the intarray versions. If you create a GIN index with the default operator class without accounting for this, PostgreSQL will ignore the index for queries using these operators and perform a sequential scan instead, significantly slowing down performance on large datasets.

For example, a query like this one

SELECT * FROM my_table WHERE codes @> array[123];

will skip the GIN index and rely on a sequential scan. 

To fix this, you need to specify the correct operator class for integer arrays when creating the GIN index. The intarray extension provides a dedicated operator class called gin__int_ops, which ensures that the index works with the custom operators introduced by intarray. 

Here’s how to create the index with the correct operator class: 

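A sketch reusing the my_table and codes names from the query above (the index name is an assumption; gin__int_ops is the operator class supplied by intarray):

CREATE INDEX my_table_codes_gin ON my_table USING GIN (codes gin__int_ops);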

 

The Downside of GIN Indexes: Expensive Updates 

Unlike regular indexes that create one entry per row, GIN indexes can create multiple entries for a single row, making them versatile but challenging to update. Modifying a GIN index can be expensive, potentially affecting tens or hundreds of index entries for just one row. This overhead slows down INSERT and UPDATE operations, especially with large datasets.

To address this, PostgreSQL uses fastupdate (enabled by default) for GIN indexes. Rather than immediately updating the index, fastupdate batches multiple updates into a temporary "pending list" that stores new or updated entries. These "deferred updates" improve initial write performance but must eventually be processed, adding overhead later.

 This deferred data is flushed to the main index in one of three scenarios: 

  • When the size of the pending list exceeds the value of gin_pending_list_limit, PostgreSQL triggers a process to flush the accumulated entries from the pending list into the main GIN index. The gin_pending_list_limit is a configuration parameter in PostgreSQL that determines the maximum size of the pending list for a GIN index. By default, this limit is set to 4 MB. 

  • When an explicit call to gin_clean_pending_list is made, it manually flushes the contents of the pending list to the main GIN index. 

  • When autovacuum runs on the table, the pending list is flushed as part of its routine maintenance, alongside cleaning up dead tuples and updating table statistics.

While this deferred flushing helps optimize performance initially, it can cause noticeable slowdowns during write operations if the pending list becomes too large. For example, after every Nth INSERT or UPDATE, when the pending list exceeds its limit, PostgreSQL will flush the data to the main index, resulting in a significant delay for that operation.   
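If the pending list becomes a bottleneck, a few knobs are available; the index name below is illustrative:

-- flush the pending list manually
SELECT gin_clean_pending_list('inventory_product_gin'::regclass);

-- raise the per-index pending-list limit (value in kilobytes)
ALTER INDEX inventory_product_gin SET (gin_pending_list_limit = 8192);

-- or disable the pending list so new entries go straight into the main index
ALTER INDEX inventory_product_gin SET (fastupdate = off);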

Conclusion 

GIN indexes provide powerful capabilities for specialized queries and complex data types in PostgreSQL, but they must be used strategically, especially on write-heavy tables.

This exploration of GIN indexes highlights why we've invested heavily in optimizing indexing capabilities at Timescale. Our work on PostgreSQL indexes for compressed columnar data represents a significant advancement, allowing developers to enjoy the benefits of columnar compression while maintaining the flexibility of PostgreSQL's rich indexing capabilities.

TimescaleDB leverages PostgreSQL's native capabilities, including its support for GIN indexes, while adding several performance enhancements for time-partitioned data. This makes your queries faster so you can power analytics in real time.

Sign up for a free and fully managed PostgreSQL cloud on Timescale Cloud (no credit card required), or install TimescaleDB and see firsthand how we're pushing the boundaries of PostgreSQL.
