
Published at Mar 10, 2025

Handling Large Objects in Postgres

Written by Warda Bibi

Postgres is designed to handle various types of data efficiently, making it a powerful choice for businesses and developers. One common application is storing large files, such as audio, images, videos, and logs, along with related metadata like titles and descriptions. While these files can be stored in a separate system, keeping them within the database ensures data integrity, consistency, and security.

Under the hood, these multimedia files are stored as raw binary data. However, such files can be very large, and databases impose limits on how much data can be stored in a single tuple. This limit varies depending on the database system.

For example, in Oracle, a dedicated large object (LOB) data type is used to store such large files:

  • BLOB (Binary Large Object): stores binary data like images, videos, and PDFs

  • CLOB (Character Large Object): for large text-based content

These data types can hold up to 128 TB and keep their contents in a separate LOB segment outside the table, which optimizes performance for large objects.

So, how does Postgres handle storing large files like images, videos, or logs? In Postgres, storage units are called pages, and pages have a fixed size of 8 kB by default. This page size works great for most data, but larger files need a special mechanism to keep things running smoothly.

In this blog, we will explore these mechanisms and discuss best practices for storing large data efficiently.

How to Handle Large Objects in Postgres

There are mainly two ways to store large objects in Postgres: you can use existing data types, such as BYTEA for binary large objects or TEXT for character-based large objects, or you can use pg_largeobject.

Existing data types

BYTEA

BYTEA stands for byte array. It is a binary data type that stores variable-length binary strings or byte sequences. The maximum size of a BYTEA value is 1 GB, so a single BYTEA field can hold at most 1 GB of binary data.

This data type is suitable for storing images, multimedia files, and raw binary data. Unlike text or character data types, BYTEA stores raw binary data without any encoding.

The following example shows how to define a column with the BYTEA data type:

First, create a table called binary_data to store binary strings:

CREATE TABLE binary_data (
    id INT,
    data BYTEA
);

Second, insert data into the BYTEA column. This requires converting the binary data into a format Postgres can understand: either hex format (\x) or escape format (\). The hex format is preferred for its readability and ease of use.

INSERT INTO binary_data(id,data) VALUES (1,'\x012345');

Third, retrieve data from the BYTEA column:

SELECT * FROM binary_data;

Postgres returns the binary data in the format in which it was stored.

Output:

 id |   data
----+----------
  1 | \x012345
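To inspect what a BYTEA row holds without dumping the raw bytes, Postgres's built-in octet_length() and encode() functions work directly on BYTEA values. A minimal sketch against the binary_data table above:

SELECT id,
       octet_length(data) AS size_bytes,
       encode(data, 'hex') AS hex_value
FROM binary_data;

octet_length() reports the stored size in bytes (3 for \x012345), which is handy for spotting oversized values before they become a problem.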

TEXT

The TEXT data type is used to store large text-based content such as logs, documents, and textual data. It allows the storage of variable-length character data. Similar to BYTEA, the maximum size of a text column is 1 GB.

The following example shows how to define a column with the TEXT data type:

CREATE TABLE text_data (
    id SERIAL PRIMARY KEY,
    content TEXT
);
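Inserting and querying TEXT works like any other column; length() reports the number of characters stored. A quick sketch using the text_data table just defined (the sample log line is made up for illustration):

INSERT INTO text_data (content)
VALUES ('2025-03-10 12:00:01 INFO service started');

SELECT id, length(content) AS characters FROM text_data;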

TOAST (The Oversized-Attribute Storage Technique)

Although TEXT and BYTEA values can reach 1 GB, Postgres has a default page size of 8 kB, and a single tuple cannot span multiple pages; Postgres aims to fit several tuples on each page. If tuples are too large, fewer tuples fit on each page, leading to increased I/O operations and reduced performance.

This means that when the combined size of all fields in a tuple exceeds approximately 2 kB, Postgres automatically offloads the excess data using TOAST (The Oversized-Attribute Storage Technique).

TOAST is the automatic mechanism Postgres uses to efficiently store and manage values that do not fit within a page. TOAST handles this in two primary ways:

  • Compression: Postgres can compress large field values within the tuple using a compression algorithm. By default, if compression is enough to bring the tuple's total size below the threshold, the data remains in the main table, albeit in compressed form.

  • Out-of-line storage: If compression alone isn't effective enough to reduce the size of the large field values, Postgres moves them to a separate TOAST table. This process is known as "out-of-line" storage because the original tuple in the main table doesn’t hold the large field values anymore. Instead, it contains a "pointer" or reference to the location of the large data in the TOAST table.

Almost every table we create has its own associated (unique) TOAST table, which may or may not ever be used, depending on the size of the rows we insert. A table with only fixed-width columns like integers may not have an associated toast table. All of this is transparent to the user and enabled by default.
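You can see this for yourself. The sketch below (assuming the binary_data table created earlier) looks up the table's associated TOAST table in pg_class; if needed, the per-column strategy can also be changed with ALTER TABLE ... SET STORAGE:

-- Find the TOAST table backing binary_data (no row means the table has none)
SELECT t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'binary_data';

-- Optionally force out-of-line storage without compression for the data column
ALTER TABLE binary_data ALTER COLUMN data SET STORAGE EXTERNAL;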

Read more about TOAST and why it isn't enough for data compression.

While TOAST efficiently handles large values, it has a practical limit of 1 GB per value. Furthermore, if you need to update a large object stored in a TOASTed column, Postgres rewrites the entire value, which can be inefficient for large files. This is where Postgres' pg_largeobject comes into play. It is designed specifically for managing very large objects and provides a more flexible and efficient way to store, access, and manipulate massive datasets.

Pg_largeobject

In Postgres, pg_largeobject is a system table used to store and manage large objects of binary data. Each large object is broken into chunks or “pages” small enough to be conveniently stored as rows in pg_largeobject.

Each large object has an entry in the system table pg_largeobject_metadata. Large objects managed through pg_largeobject can be created, modified, and deleted using a read/write API, and each object can be up to 4 TB in size.

Here is the schema of the two system tables designed for pg_largeobject.

    Table "pg_catalog.pg_largeobject"  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description  --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------  loid   | oid     |           | not null |         | plain    |             |              |   pageno | integer |           | not null |         | plain    |             |              |   data   | bytea   |           | not null |         | extended |             |              |  Indexes:     "pg_largeobject_loid_pn_index" PRIMARY KEY, btree (loid, pageno) Access method: heap

Here:

  • loid is the object ID for identifying the large object.

  • pageno is the sequential page number of the chunk.

  • data is the actual binary content of that chunk (stored as BYTEA). Data is stored row by row, with each page linked to its respective object via loid.

                          Table "pg_catalog.pg_largeobject_metadata"   Column  |   Type    | Collation | Nullable | Default | Storage  | Compression | Stats target | Description  ----------+-----------+-----------+----------+---------+----------+-------------+--------------+-------------  oid      | oid       |           | not null |         | plain    |             |              |   lomowner | oid       |           | not null |         | plain    |             |              |   lomacl   | aclitem[] |           |          |         | extended |             |              |  Indexes:     "pg_largeobject_metadata_oid_index" PRIMARY KEY, btree (oid) Access method: heap

Here:

  • oid is the unique identifier for each large object.

  • lomowner is the role (user) that owns the large object, i.e., the user that created it or now owns it.

  • lomacl is the access control list (ACL) that stores the permissions determining who can read, write, or delete the large object.

Pg_largeobject interfaces

Postgres' pg_largeobject is like a file system inside your database, where you can create, open, read, write, and even seek (jump to a specific part of a file) just like you would with files on your computer.

To make this easy, Postgres offers a set of tools (or "interfaces") through its libpq library, which is the standard way for applications to talk to Postgres. These tools let us work with large objects in a way that feels familiar if you have ever worked with files in a programming language or on a Unix-like system. Below are all the interfaces related to pg_largeobject:

postgres=# \dfS lo*
                                   List of functions
   Schema   |     Name      | Result data type |    Argument data types    | Type
------------+---------------+------------------+---------------------------+------
 pg_catalog | lo_close      | integer          | integer                   | func
 pg_catalog | lo_creat      | oid              | integer                   | func
 pg_catalog | lo_create     | oid              | oid                       | func
 pg_catalog | lo_export     | integer          | oid, text                 | func
 pg_catalog | lo_from_bytea | oid              | oid, bytea                | func
 pg_catalog | lo_get        | bytea            | oid                       | func
 pg_catalog | lo_get        | bytea            | oid, bigint, integer      | func
 pg_catalog | lo_import     | oid              | text                      | func
 pg_catalog | lo_import     | oid              | text, oid                 | func
 pg_catalog | lo_lseek      | integer          | integer, integer, integer | func
 pg_catalog | lo_lseek64    | bigint           | integer, bigint, integer  | func
 pg_catalog | lo_open       | integer          | oid, integer              | func
 pg_catalog | lo_put        | void             | oid, bigint, bytea        | func
 pg_catalog | lo_tell       | integer          | integer                   | func
 pg_catalog | lo_tell64     | bigint           | integer                   | func
 pg_catalog | lo_truncate   | integer          | integer, integer          | func
 pg_catalog | lo_truncate64 | integer          | integer, bigint           | func
 pg_catalog | lo_unlink     | integer          | oid                       | func
 pg_catalog | lowrite       | integer          | integer, bytea            | func

Note: Any manipulation of a large object through these interfaces must happen within an SQL transaction block. This is because the system uses file descriptors to manage large objects, and these file descriptors are only valid for the duration of a transaction.

Creating a large object

To store a large object, we first need to create it using lo_create. The function generates a new large object and returns an OID (object identifier), which uniquely identifies the object in the database. For example:

SELECT lo_create(12345);

This creates a large object with an OID of 12345. If you pass 0 instead of a specific OID, Postgres assigns a free, system-generated OID.

 lo_create
-----------
     12345

An entry is made in pg_largeobject_metadata, storing information such as ownership and access privileges for this object. The actual binary data will be stored in pg_largeobject, referenced by the OID. At this stage, the large object contains no data, so the pg_largeobject system table has no entry for it.

postgres=# select * from pg_largeobject_metadata;
  oid  | lomowner | lomacl
-------+----------+--------
 12345 |       10 |

postgres=# select * from pg_largeobject;
 loid | pageno | data
------+--------+------
(0 rows)

Reading/writing data

After creating a large object, we need to write data into it. This is typically done using a combination of lo_open, lowrite, and lo_close inside a transaction block.

Opening a large object is necessary because Postgres treats large object operations like file handling. We open the object to obtain a file descriptor, perform read/write operations using that file descriptor, and close it afterward. Furthermore, Postgres requires large object read/write operations to be performed inside a transaction block. Here are the syntax and descriptions of these functions:

lo_open(OID, MODE)

This opens a large object. The MODE parameter determines whether the object is opened for one of these purposes:

  • Reading (INV_READ = 0x40000 or 262144 in decimal)

  • Writing (INV_WRITE = 0x20000 or 131072 in decimal)

  • Both (INV_READ | INV_WRITE = 0x60000 or 393216 in decimal)

On success, this function returns a non-negative large object descriptor, which is further used with functions like lo_read, lo_write, lo_lseek, lo_tell, lo_truncate, and lo_close. The descriptor remains valid only for the duration of the current transaction. 

lowrite(fd, content)

This writes data to a large object identified by file descriptor fd. It returns the number of bytes actually written. In the event of an error, the return value is -1.

loread(fd, length)

This reads length bytes of data from a large object.

lo_close(fd)

This closes the large object file descriptor (fd). On success, lo_close returns zero. In the case of error, the return value is -1. Any large object descriptors that remain open at the end of a transaction will be closed automatically.

Example:

Let's go step by step to manipulate a large object:

postgres=# BEGIN;

This starts a transaction block, meaning all the subsequent operations will either commit (save) together or roll back if an error occurs.

postgres=*# SELECT lo_open(12345, 131072);
 lo_open
---------
       0

This query returns 0, which is a file descriptor-like number assigned to this large object.

This descriptor (0) will be used in subsequent operations.

postgres=*# SELECT lowrite(0, 'Hello, World!');
 lowrite
---------
      13

This query writes "Hello, World!" into the large object identified by descriptor 0. The return value 13 represents the number of bytes written (since "Hello, World!" has 13 characters).

postgres=*# SELECT loread(0, 12);
           loread
----------------------------
 \x48656c6c6f2c20576f726c64

This query reads 12 bytes from the large object associated with file descriptor 0. The result is in hexadecimal (hex) format. If you want to display the text instead of hex, you can cast it to text:

postgres=*# SELECT convert_from(loread(0, 12), 'UTF8');
 convert_from
--------------
 Hello, World

postgres=*# SELECT lo_close(0);

This releases the descriptor but does not delete the object from the database.

postgres=*# COMMIT;

This permanently saves all operations inside the transaction.

lo_lseek

This function moves the current location pointer for the large object descriptor identified by fd to the new location specified by offset.

Syntax:

lo_lseek(fd, offset, whence);

Here:

  • fd is a file descriptor returned by lo_open.

  • offset represents the number of bytes to move.

  • whence specifies where to seek from:

    • 0 (SEEK_SET) means move from the beginning.

    • 1 (SEEK_CUR) means move relative to the current position.

    • 2 (SEEK_END) means move from the end.

The return value is the new location pointer or -1 on error. 

For example:

SELECT lo_lseek(0, 7, 0);

This moves the cursor seven bytes from the beginning. Subsequent read/write operations now start from this new position.

SELECT loread(0, 5);

If the large object contains "Hello, World!", this will read:

'World'
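If you lose track of where the cursor is, lo_tell(fd) reports the current position of a descriptor (lo_tell64 for positions beyond 2 GB). Continuing the session above, the position should now be 12:

SELECT lo_tell(0);   -- expected result: 12, i.e., 7 (seek) + 5 (bytes read)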

Note: When dealing with large objects that might exceed 2 GB in size, instead use:

lo_lseek64(fd, offset, whence);

This function behaves like lo_lseek, but it can accept an offset larger than 2 GB and/or deliver a result larger than 2 GB. Note that lo_lseek will fail if the new location pointer is greater than 2 GB.

Updating a large object

One major advantage of using pg_largeobject is that it allows us to modify a specific portion of a large object instead of replacing the entire content. This is possible because Postgres provides chunk-based access, enabling efficient updates.

The following example demonstrates how to modify an existing large object by seeking a specific position and writing new data over it. Instead of replacing the entire object, we update only a portion of it.

begin;
select lo_open(12345, x'60000'::int);
select lo_lseek(0, 7, 0);
select lowrite(0, ', + large object.\012');
select lo_close(0);
commit;

Explanation:

  • Start a transaction to ensure atomicity.

  • Open the large object (12345) in read-write mode (x'60000'::int = INV_READ | INV_WRITE).

  • Seek to byte position 7 using lo_lseek(0, 7, 0), where 0 (SEEK_SET) moves the cursor from the beginning. That means any write operation overwrites content starting from this position.

  • Write new data at the current position using lowrite().

  • Commit the transaction, saving the modifications.

By default, the BYTEA output is hexadecimal (\x48656c6c6f). To make it more readable:

SET bytea_output = 'escape';

Now, to see the updated object:

postgres=# select * from pg_largeobject;
 loid  | pageno |             data
-------+--------+-------------------------------
 12345 |      0 | Hello, , + large object.\012

This demonstrates partial modification of a large object without replacing the entire content. This is particularly useful when doing one of the following:

  • Appending additional metadata to a stored binary file (see the sketch after this list)

  • Fixing or updating content without rewriting the entire object

  • Overwriting specific sections of an image, video, or other binary data
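For example, appending is just a seek to the end (whence = 2, SEEK_END) followed by a write. A sketch against the same object 12345, assuming lo_open again returns descriptor 0 as in the sessions above:

BEGIN;
SELECT lo_open(12345, x'60000'::int);  -- read-write mode
SELECT lo_lseek(0, 0, 2);              -- jump to the end of the object (SEEK_END)
SELECT lowrite(0, ' appended footer');
SELECT lo_close(0);
COMMIT;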

Importing a large object

We can also import an external file into pg_largeobject. The syntax for that is:

lo_import(File_Path)

This will create an object, store the file’s content in it, and return the object's OID.

To properly test the use case, let's create a large file using the following command:

dd if=/dev/urandom of=/tmp/large_file.bin bs=1M count=1024

This creates a random binary file (large_file.bin) of 1 GB. Now import this file into a Postgres large object:

SELECT lo_import('/tmp/large_file.bin');
 lo_import
-----------
     24600

This command stores the file in pg_largeobject with OID 24600. Postgres will split the file into 2 kB chunks for efficient retrieval and modification. To confirm this, use:

postgres=# SELECT loid, pageno, length(data) FROM pg_largeobject WHERE loid = 24600 LIMIT 10;
 loid  | pageno | length
-------+--------+--------
 24600 |      0 |   2048
 24600 |      1 |   2048
 24600 |      2 |   2048
 24600 |      3 |   2048
 24600 |      4 |   2048
 24600 |      5 |   2048
 24600 |      6 |   2048
 24600 |      7 |   2048
 24600 |      8 |   2048
 24600 |      9 |   2048

Explanation:

  • The file is chunked into 2048-byte (2 kB) pages. 

  • A 1 GB file has ~524,288 pages (1 GB / 2 kB = 524,288).

So, instead of loading 1 GB into memory, we can read a portion using:

SELECT encode(loread(lo_open(24600, 262144), 50), 'hex');

                                            encode
------------------------------------------------------------------------------------------------------
 173c11a19ded7b79c5a4b01a3ae4683c5a5ba3a44714e8dbaec50db73c147ccb4ee9791c8f9cabcb406bec5a0aaf99b38fda

Here:

  • lo_open opens the file (OID 24600) in read mode (262144 = INV_READ).

  • loread reads 50 bytes, and encode() renders them as hex for readability.

  • This is useful for streaming large data instead of loading it all at once (a shorter alternative is sketched below).
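If you only need a slice of an object and don't want to manage descriptors, the server-side lo_get(oid, offset, length) from the function listing earlier performs the open/seek/read/close sequence in a single call, and lo_put(oid, offset, bytea) is its write-side counterpart. Reading the same 50 bytes:

SELECT encode(lo_get(24600, 0, 50), 'hex');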

To demonstrate efficient modification, let’s overwrite some bytes at page 100.

BEGIN;
SELECT lo_open(24600, 131072 | 262144);   -- Open in read-write mode
SELECT lo_lseek(0, 204800, 0);            -- Move to byte 200 kB (page 100)
SELECT lowrite(0, 'MODIFIED_DATA_HERE');
COMMIT;

Here:

  • Seek 200 kB into the file (100 × 2,048-byte pages).

  • Overwrite only a small part instead of replacing 1 GB.

Exporting the large object

After working with a large object (modifying or appending data), we can export it back to the filesystem using:

SELECT lo_export(24600, '/tmp/lo_test_new.txt');
 lo_export
-----------
         1

This writes the large object with OID 24600 to the file /tmp/lo_test_new.txt. The return value (1) indicates success. Now, let's check the exported file:

ls -l /tmp/lo_test_new.txt
-rw-r--r--  1 user  wheel  1073741824 Feb 21 00:31 /tmp/lo_test_new.txt

The exported file contains the modified data, including any changes made while it was stored in pg_largeobject.
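Keep in mind that lo_export writes to the server's filesystem. If you want the file on the client machine instead, psql offers the \lo_export (and \lo_import) meta-commands, which transfer the data over the client connection; a sketch, with the path referring to the client's filesystem:

postgres=# \lo_export 24600 '/tmp/lo_test_client.bin'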

Deleting a large object

When a large object is no longer needed, it should be deleted to free up storage space. The lo_unlink(OID) function removes the large object from the pg_largeobject system catalog.

For example:

SELECT lo_unlink(24600);
 lo_unlink
-----------
         1

This command deletes the large object with OID 24600. A return value of 1 indicates that the deletion was successful.
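Because large objects are referenced only by OID, rows that store such OIDs can end up pointing at objects that were never unlinked (or vice versa); the vacuumlo utility shipped in Postgres contrib can remove orphaned objects in bulk. To confirm a specific object is really gone, a quick check:

SELECT count(*) FROM pg_largeobject WHERE loid = 24600;           -- 0 after unlink
SELECT count(*) FROM pg_largeobject_metadata WHERE oid = 24600;   -- 0 after unlink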

Conclusion

Postgres offers various tools to handle large files, each tailored to different needs and scenarios. Use BYTEA for smaller binary data, rely on TOAST for moderate-sized objects, and turn to pg_largeobject when working with extremely large files that demand advanced features like streaming or partial updates. Developers who understand these mechanisms will be able to easily design databases that are both performant and scalable.

If you're working with massive but structured datasets, TimescaleDB's hypercore columnar compression is designed for structured, time-series, and analytical workloads rather than raw file storage. By leveraging column-oriented storage and time-based chunking, it achieves superior compression ratios (often 90 % or more) while maintaining compatibility with standard SQL queries. TimescaleDB's compression works transparently, reducing storage costs and improving query performance without additional complexity.

For developers managing structured datasets at scale, TimescaleDB provides a more efficient and scalable alternative to traditional Postgres large object storage mechanisms. To try it yourself, create a free account today.
