
Published at Nov 21, 2024

Understanding PostgreSQL User-Defined Functions

Written by Semab Tariq

PostgreSQL is highly regarded for its flexibility, making it an ideal choice for databases that need custom functionality. A key feature supporting this adaptability is user-defined functions (UDFs). With user-defined functions, developers can create custom routines to implement business logic in PostgreSQL, keeping processing close to the data for better efficiency.

In this blog post, we’ll cover the concepts of UDFs, explore the different types available, and provide clear examples for each. Additionally, we will look at how you can use UDFs effectively on our managed PostgreSQL platform.

What Are User-Defined Functions in PostgreSQL?

User-defined functions (UDFs) are custom functions created by users to perform specific operations in PostgreSQL. Unlike built-in functions, UDFs are developed based on individual requirements and are particularly useful for repetitive tasks or operations unique to a given workload. UDFs can be written in various languages, with PL/pgSQL (PostgreSQL’s procedural language) being one of the most common, alongside PL/Python, PL/Perl, and others.

Types of User-Defined Functions in PostgreSQL

PostgreSQL offers four types of functions:

  1. SQL functions: functions written directly in SQL

  2. Procedural language functions: functions written in languages like PL/pgSQL or PL/Tcl

  3. Internal functions: built-in functions provided by PostgreSQL

  4. C-language functions: functions written in C for advanced performance and customization

Let’s examine them.

SQL functions

In PostgreSQL, SQL functions are user-defined functions written using standard SQL syntax. They perform operations such as calculations, data manipulation, and retrieval, and they are helpful for creating reusable code blocks within the database. SQL functions are typically used for relatively simple operations because they are interpreted directly by the PostgreSQL SQL engine, making them fast and efficient.

Suppose you have a sales table and want to calculate the total sales amount for a specific year and month. To achieve this, you can create an SQL function like the one below:

CREATE FUNCTION monthly_sales(year INT, month INT) RETURNS NUMERIC AS $$
    SELECT SUM(amount)
    FROM sales
    WHERE EXTRACT(YEAR FROM sale_date) = year
      AND EXTRACT(MONTH FROM sale_date) = month;
$$ LANGUAGE SQL;

To call the monthly_sales function with the year 2024 and month 01, you would use the following SQL statement:

SELECT monthly_sales(2024, 01);

Procedural language (PL) functions

PostgreSQL allows you to write user-defined functions in various languages beyond SQL. These are known as procedural languages (PLs).

When you create a function in a procedural language, PostgreSQL doesn't directly understand it. Instead, it uses a special handler to interpret and execute the function's code. This handler acts as a translator, either managing all aspects of the function, such as parsing and execution, or serving as a bridge between PostgreSQL and an external programming language.

For example, if you create a function in PL/Python, PostgreSQL will use the PL/Python handler to interpret and execute the Python code. The handler acts as a bridge, loading the necessary components to run Python code within the database environment.

Currently, PostgreSQL mainly offers four procedural languages as part of its standard distribution:

  • PL/pgSQL

  • PL/Tcl

  • PL/Perl

  • PL/Python

The handlers for these languages enable PostgreSQL to execute code written in these external languages by translating function calls and managing execution through the respective language's runtime environment.

Handlers are implemented as C functions that are compiled and loaded into the PostgreSQL server dynamically whenever a function in a procedural language is called. This setup enables PostgreSQL to expand its capabilities by supporting additional programming languages.
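If you want to check which languages are already registered in a given database, you can query the pg_language system catalog; lanpltrusted indicates whether unprivileged users may create functions in that language:

SELECT lanname, lanpltrusted
FROM pg_language
WHERE lanispl;  -- list only procedural languages, not internal/C/SQL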

There are many other procedural languages available that are not part of PostgreSQL core and can be used on demand. For example:

  • PL/Java

  • PL/R

  • PL/Rust

  • PL/sh

Check out the complete list of available procedural languages.

PL/pgSQL

PL/pgSQL is a powerful procedural language for the PostgreSQL database system that enables the creation of functions, procedures, and triggers with enhanced control structures, allowing for complex computations beyond what standard SQL functions can achieve. Unlike SQL functions, which execute individual SQL statements one at a time, PL/pgSQL enables you to group multiple queries and computations into a single block, significantly reducing interprocess communication and network overhead, thus improving performance. 

Example

Calculate the total sales for a given product in a specified year with PL/pgSQL functions:

CREATE FUNCTION total_sales_for_product(prod_id INT, sales_year INT) RETURNS NUMERIC AS $$
DECLARE
    total_sales NUMERIC := 0;  -- Variable to store the total sales
BEGIN
    -- Calculate total sales for the specified product and year
    SELECT SUM(amount) INTO total_sales
    FROM sales
    WHERE product_id = prod_id
      AND EXTRACT(YEAR FROM sale_date) = sales_year;

    -- Return the total sales amount
    RETURN total_sales;
END;
$$ LANGUAGE plpgsql;

You can call this function to get the total sales for a specific product and year as follows:

SELECT total_sales_for_product(101, 2024);

PL/Tcl

PL/Tcl enables developers to create functions using Tcl, a flexible and powerful scripting language. It merges the strengths of C function writing with Tcl's rich string processing capabilities, allowing for the development of complex database logic within a secure environment. A key benefit of PL/Tcl is its safe execution context; it operates within a controlled Tcl interpreter that restricts available commands. This safety measure ensures that users without special privileges cannot access the database or the operating system inappropriately.

Before using PL/Tcl, we need to ensure that the PL/Tcl language is available in the current database by installing the extension below. Once it is installed, you can create functions written in Tcl.

CREATE EXTENSION pltcl;

Example

Create a PL/Tcl function that processes a list of integers, filters out even numbers, and returns the sum of the remaining odd numbers.

CREATE FUNCTION sum_odd_numbers_tcl(int[]) RETURNS int AS $$
    set total 0
    foreach num [split [string trim $1 "{}"] ","] {
        if {([expr {$num % 2}]) != 0} {
            set total [expr {$total + $num}]
        }
    }
    return $total
$$ LANGUAGE pltcl;

Execute the above function with:

SELECT sum_odd_numbers_tcl(ARRAY[1, 2, 3, 4, 5, 6, 7]) AS sum_of_odds;
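Since 1 + 3 + 5 + 7 = 16, this query should return 16.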

PL/Perl

PL/Perl allows developers to write functions and procedures using the Perl programming language. To use PL/Perl in a specific database, you can simply run the query: 

CREATE EXTENSION plperl;

Additionally, if PL/Perl is installed in the template1 database, it will automatically be available in all newly created databases. However, users who compile PostgreSQL from source must ensure that PL/Perl is enabled during installation, while those using binary packages may find it in a separate subpackage.

Example

Suppose you have a table of customer reviews and want to analyze the sentiment of each review to determine whether it's positive, negative, or neutral. We'll create a simple PL/Perl function that uses a basic keyword-based approach to classify the sentiment of a review.

CREATE FUNCTION analyze_sentiment(review_text TEXT) RETURNS TEXT AS $$
    my $text = lc $_[0];  # Convert text to lowercase for case-insensitive matching
    my @positive_keywords = ('good', 'great', 'excellent', 'happy', 'love');
    my @negative_keywords = ('bad', 'terrible', 'poor', 'sad', 'hate');
    my $positive_count = 0;
    my $negative_count = 0;

    foreach my $word (@positive_keywords) {
        $positive_count++ while $text =~ /\b\Q$word\E\b/g;
    }

    foreach my $word (@negative_keywords) {
        $negative_count++ while $text =~ /\b\Q$word\E\b/g;
    }

    return 'Positive' if $positive_count > $negative_count;
    return 'Negative' if $negative_count > $positive_count;
    return 'Neutral';
$$ LANGUAGE plperl;

You can run the sentiment analysis function to classify each review:

SELECT id, review_text, analyze_sentiment(review_text) AS sentiment FROM customer_reviews;

PL/Python

PL/Python enables developers to write PostgreSQL functions and procedures using the Python programming language. To install PL/Python in a specific database, you can execute the query:

CREATE EXTENSION plpython3u;

It’s important to note that PL/Python is an "untrusted" language, meaning it does not restrict user capabilities. Functions can perform any action that a database administrator could. Consequently, only superusers are permitted to create functions in PL/Python. If PL/Python is installed in the template1 database, it will automatically be available in any new databases created afterward.

Example

We'll create a function that performs data normalization on a numeric column. Data normalization is a common preprocessing step in data analysis and machine learning, where you scale the data to a specific range, typically between zero (0) and one (1).

CREATE FUNCTION normalize_sales() RETURNS TABLE (id INT, normalized_amount NUMERIC) AS $$
    result = plpy.execute("SELECT MIN(sales_amount) AS min_val, MAX(sales_amount) AS max_val FROM sales_data")
    min_val = result[0]['min_val']
    max_val = result[0]['max_val']

    if min_val == max_val:
        raise Exception("Normalization is not possible with constant values")

    query = """
    SELECT id, (sales_amount - %s) / (%s - %s) AS normalized_amount
    FROM sales_data
    """ % (min_val, max_val, min_val)

    return plpy.execute(query)
$$ LANGUAGE plpython3u;

You can now call the function to normalize the sales data:

SELECT * FROM normalize_sales();

 id |   normalized_amount
----+------------------------
  1 | 0.00000000000000000000
  2 | 0.25000000000000000000
  3 | 0.50000000000000000000
  4 | 0.75000000000000000000
  5 | 1.00000000000000000000
(5 rows)

Internal functions

In PostgreSQL, internal functions are built-in functions implemented in C that are part of the PostgreSQL server's core functionality. These functions are compiled into the PostgreSQL binary and available without any additional setup or installation. 

They offer essential operations and tools commonly required for managing and manipulating the database. In other words, these functions provide a set of built-in features that make it easier for users to perform routine tasks within a PostgreSQL database.

Here are some key points about internal functions:

  • Since internal functions are implemented in C and run directly within the PostgreSQL server, they are highly optimized for performance.

  • These functions are always available in a PostgreSQL installation, as they are part of the core system.

  • Internal functions cover a wide range of operations, including mathematical calculations, string manipulations, date and time processing, and more.

Here are some internal functions in PostgreSQL that can simplify our daily tasks and make things easier.

postgres=# SELECT abs(-100) AS absolute_value,                   -- Mathematical function
                  length('Hello, PostgreSQL!') AS string_length, -- String function
                  now() AS current_timestamp,                    -- Date/Time function
                  round(123.456, 2) AS rounded_value,            -- Mathematical function
                  upper('lowercase text') AS upper_case_text,    -- String function
                  date_trunc('day', now()) AS truncated_date,    -- Date/Time function
                  random() AS random_number;                     -- Mathematical function

-[ RECORD 1 ]-----+------------------------------
absolute_value    | 100
string_length     | 18
current_timestamp | 2024-11-11 13:34:37.155338+05
rounded_value     | 123.46
upper_case_text   | LOWERCASE TEXT
truncated_date    | 2024-11-11 00:00:00+05
random_number     | 0.8308769807656364

Typically, internal functions are declared automatically during the database cluster initialization, but users can create additional aliases for these functions using the CREATE FUNCTION command with the LANGUAGE internal specification. For example, a user can create an alias for the square root function as follows:

CREATE FUNCTION square_root(double precision) RETURNS double precision
AS 'dsqrt' LANGUAGE internal STRICT;

Calculating the square root of a number:

SELECT square_root(25);
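This returns 5.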

C-language functions

User-defined functions in PostgreSQL can be written in C or compatible languages like C++. These functions are turned into shared libraries that the PostgreSQL server can load when needed. This feature sets them apart from internal functions, though the coding style is similar. 

When creating a C function, you need to use the CREATE FUNCTION command and specify both the library file's name and the C function name inside it. PostgreSQL follows specific steps to find the library file, allowing for different file paths. Each C function must include a "magic block" to ensure it works with the correct version of PostgreSQL. 

There's also an optional function that can run right after the library loads. However, users must compile the C code into a shared library before using it in PostgreSQL, as the system won't compile it automatically.

The following example shows how to create and use a user-defined function in PostgreSQL using C. Let's say we want to calculate the factorial of a number.

First, you need to create a C file, say factorial.c, with the following:

#include "postgres.h" #include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(factorial);

Datum
factorial(PG_FUNCTION_ARGS)
{
    int32 n = PG_GETARG_INT32(0);
    int32 result = 1;

    for (int32 i = 1; i <= n; i++) {
        result *= i;
    }

    PG_RETURN_INT32(result);
}

Next, you need to compile this code into a shared library. You can do this using the following command in your terminal (make sure PostgreSQL development packages are installed):

gcc -shared -o factorial.so -fPIC -I/usr/include/postgresql/17 -I/usr/include/postgresql/17/server factorial.c -L/usr/lib/postgresql/17/lib $(pg_config --ldflags)

Copy factorial.so to the required directory:

cp factorial.so /usr/lib/postgresql/17/lib/

Now, you can create the function in PostgreSQL using the CREATE FUNCTION command. Connect to your PostgreSQL database and run:

CREATE FUNCTION factorial(INT) RETURNS INT
AS '$libdir/factorial', 'factorial'
LANGUAGE C STRICT;

You can now call the factorial function in your SQL queries. For example:

SELECT factorial(5); -- This will return 120


User-defined functions are highly versatile tools for adding custom logic to PostgreSQL, allowing for a wide range of uses. One powerful way to leverage UDFs is by automating their execution, particularly for tasks that need to run on a schedule. In these cases, a job scheduler becomes an invaluable tool.

Automating UDFs With a Job Scheduler

A job scheduler is a system component that automates the execution of tasks or jobs at specified intervals or in response to specific events, independent of user intervention. In the context of databases, a job scheduler facilitates the execution of database functions, procedures, and maintenance tasks at predetermined times, enhancing efficiency and reliability. 

By allowing these operations to run in the background, a job scheduler reduces the need for manual oversight, ensures timely data processing, and optimizes resource management.

Moreover, since databases are typically I/O bound and the CPU is often underutilized, job schedulers help make better use of available resources by scheduling tasks during periods of low CPU demand, thus improving overall system performance and throughput.

PostgreSQL doesn't have a native job scheduler built into its core. We wrote an article explaining why the PostgreSQL community has been hesitant to incorporate job scheduling directly into it.

TimescaleDB’s Job Scheduler

TimescaleDB is built on top of PostgreSQL and designed to handle large volumes of data. Certain background maintenance tasks, such as data archival, compression, and general storage management, are necessary for this to work efficiently. These tasks need to run automatically in the background, which is why a job scheduler is crucial for TimescaleDB.

Rather than creating a custom scheduler just for internal use, the Timescale team built a general-purpose job scheduler. This scheduler can be used for a variety of tasks, not just Timescale's internal needs, and it comes with a public API (application programming interface), which means other users can also take advantage of it.

The scheduler is part of TimescaleDB and is built to work within the PostgreSQL ecosystem. It allows users to schedule functions and procedures that can be expressed using PostgreSQL.

Here are some benefits of using Timescale's built-in job scheduler:

  • When you use physical streaming replication, the job schedule is also replicated. This means that when you switch over to a replica, your scheduled jobs will already be set up and ready to run without needing additional configuration.

  • You don’t need a separate plan to make your job scheduler highly available (HA). If your PostgreSQL system is running, your scheduled jobs will also continue to run, as they are inherently tied to the system's availability.

  • The scheduler can report the success or failure of jobs by logging the results in internal tables and the PostgreSQL log file. This makes it easy to track and troubleshoot the execution of scheduled tasks.

  • The jobs can perform essential administrative tasks, such as dropping tables or altering table structures, based on the system's needs. This allows for more automated management of your database.

  • When you install TimescaleDB, the job scheduler is already included and ready to use. You don’t need to set up or configure a separate scheduler system.

Integrating the TimescaleDB Job Scheduler With User-Defined Functions

Before scheduling a job, you need to create the UDF. If you're unsure about the type of UDF to create, refer to the previous section on UDF types. Once the UDF is created, you can register it with the job scheduler using the add_job function. 

Provide essential configurations such as the UDF name, how often it should be executed, and any additional parameters needed to run the UDF. 

Here’s the syntax to run a custom UDF every hour, where process_hourly_reports is the name of the user-defined action:

SELECT add_job('process_hourly_reports', '1h');

If your function requires an additional parameter for execution, you can specify it using the following query, where hypertable is the key and metrics is the value:

SELECT add_job('process_hourly_reports', '1h', config => '{"hypertable":"metrics"}');
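TimescaleDB passes the job ID and the config to the action when it runs, so user-defined actions are typically written as procedures with that two-argument signature. Here is a minimal sketch of what process_hourly_reports might look like (the body is a placeholder):

CREATE OR REPLACE PROCEDURE process_hourly_reports(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'Executing job % with config %', job_id, config;
    -- Build or refresh the hourly report here
END;
$$;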

Leveraging User-Defined Functions With the Job Scheduler for Efficient Data Management

Integrating user-defined functions with the job scheduler enables automated and efficient handling of data management tasks. Below are some key data management tasks we can automate.

Data aggregation

You can use a UDF to aggregate daily totals from main_table into daily_summary_table and schedule it to run once per day using TimescaleDB's job scheduler.

CREATE OR REPLACE FUNCTION daily_summary() RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO daily_summary_table (day, total)
    SELECT time_bucket('1 day', time), SUM(value)
    FROM main_table
    WHERE time_bucket('1 day', time) NOT IN (SELECT day FROM daily_summary_table)
    GROUP BY 1;
$$;

SELECT add_job('daily_summary', '1d');

Data archiving

Similar to data aggregation, you can create a UDF to archive and delete records older than one year from main_table into archive_table. You can then schedule the function to run daily using TimescaleDB's job scheduler.

CREATE OR REPLACE FUNCTION archive_old_data() RETURNS VOID LANGUAGE SQL AS $$
    INSERT INTO archive_table
        (SELECT * FROM main_table WHERE time < NOW() - INTERVAL '1 year');
    DELETE FROM main_table WHERE time < NOW() - INTERVAL '1 year';
$$;

SELECT add_job('archive_old_data', '1d');

Important consideration: DELETE is bad for performance

  • Deleting rows in a large table creates table bloat and triggers costly autovacuum operations.

  • This method is not scalable for managing time-series data as your PostgreSQL table grows.

How hypertables solve the problem

  • With hypertables, data is automatically partitioned into chunks based on time.

  • Instead of deleting rows, you can drop entire chunks using drop_chunks, which instantly frees up space (see the sketch after this list).

  • Hypertables avoid autovacuum overhead.
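For example, instead of the DELETE in archive_old_data above, a hypertable-based cleanup can be a single call (a minimal sketch, assuming main_table is a hypertable partitioned on time):

-- Drop every chunk whose data is older than one year
SELECT drop_chunks('main_table', older_than => INTERVAL '1 year');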

Read how you can reduce table bloat in your large PostgreSQL tables.

Retention policies in TimescaleDB

  • You can automate this process by defining a retention policy with TimescaleDB, which periodically removes old data without manual intervention, as sketched below.
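Setting one up is a one-liner (again assuming main_table is a hypertable):

-- Automatically drop chunks older than one year on a schedule
SELECT add_retention_policy('main_table', INTERVAL '1 year');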

Learn more about how data retention policies can help you manage large datasets.

Maintenance tasks

We can automate maintenance tasks, such as reindexing, using UDFs. This query creates the simple_maintenance_task function, which resets PostgreSQL's statistics counters and reindexes a table to improve performance and clear accumulated data. (REINDEX DATABASE cannot run inside a function, so the example reindexes a single, placeholder table named sales.) The function is then scheduled to run once a week.

CREATE OR REPLACE FUNCTION simple_maintenance_task() RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_stat_reset();        -- Resets statistics to clear any accumulated data
    EXECUTE 'REINDEX TABLE sales';  -- Reindexes one table; REINDEX DATABASE cannot run inside a function
END;
$$;

SELECT add_job('simple_maintenance_task', '1w');

Monitoring the jobs

Monitoring your scheduled jobs is essential because it helps you keep track of their performance and ensures they run smoothly. With proper monitoring, you can quickly identify and debug any issues. Close monitoring allows you to fix problems early and improve the efficiency of your jobs, leading to better overall system performance.

Enhancing the logging level for jobs

To see more detail about job execution, lower the minimum log level sent to the client by setting it to DEBUG1:

SET client_min_messages TO DEBUG1;

To check the current level for client_min_messages, you can query: 

SHOW client_min_messages;

The available levels are typically:

  • debug5

  • debug4

  • debug3

  • debug2

  • debug1

  • log

  • notice

  • warning

  • error

Restarting the background workers

In TimescaleDB, background workers are essential for performing maintenance tasks like compression, continuous aggregate refreshes, and data retention. Occasionally, you might need to restart these workers to address specific issues, including the following:

  • Resolving locks on hypertables caused by stalled jobs

  • Refreshing processes that might be in an unexpected state

To restart the background workers, run the following query:

SELECT _timescaledb_functions.start_background_workers();

For TimescaleDB versions earlier than 2.12, use this query:

SELECT _timescaledb_internal.start_background_workers();

When you execute this query, the workers are stopped and restarted gracefully, sending a friendly SIGTERM signal to ensure tasks are paused safely without disrupting normal operations.

And what if PostgreSQL stops? See how TimescaleDB solves the problem of PostgreSQL locks.

Using Timescale's built-in internal view

timescaledb_information.jobs is a system view in TimescaleDB that provides information about scheduled jobs.

View all jobs managed by the job scheduler

You can see a list of all your currently registered jobs by querying the job scheduler like this:

tsdb=> SELECT * FROM timescaledb_information.jobs;
-[ RECORD 1 ]-----+----------------------------------------
job_id            | 3
application_name  | Job History Log Retention Policy [3]
schedule_interval | 1 mon
max_runtime       | 01:00:00
max_retries       | -1
retry_period      | 01:00:00
proc_schema       | _timescaledb_functions
proc_name         | policy_job_stat_history_retention
owner             | postgres
scheduled         | t
fixed_schedule    | t
config            | {"drop_after": "1 month"}
next_start        | 2024-12-01 00:00:00+00
initial_start     | 2000-01-01 00:00:00+00
hypertable_schema |
hypertable_name   |
check_schema      | _timescaledb_functions
check_name        | policy_job_stat_history_retention_check

Turning off a scheduled job

First, get the job ID from the timescaledb_information.jobs view.

To turn off the automatic scheduling of a job where 1000 is your job ID, use this query:

SELECT alter_job(1000, scheduled => false);

And to completely delete the job, you can use the following:

SELECT delete_job(1000);
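To re-enable a job you have paused (rather than deleted), set the flag back to true:

SELECT alter_job(1000, scheduled => true);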

View jobs triggered by user-defined actions

If you want to see what jobs are triggered by user-defined actions, you can use the following query:

SELECT * FROM timescaledb_information.jobs
WHERE application_name LIKE 'User-Define%';

Get job success/failure information for a specific hypertable

Use this query to retrieve job statistics for a specific hypertable named test_table in TimescaleDB. It pulls details from the timescaledb_information.job_stats view, which stores metrics about scheduled jobs managed by TimescaleDB’s job scheduler.

SELECT job_id, total_runs, total_failures, total_successes
FROM timescaledb_information.job_stats
WHERE hypertable_name = 'test_table';

See information about recent job failures

The following query retrieves error details for jobs managed by the TimescaleDB job scheduler from the timescaledb_information.job_errors view. This view surfaces information about errors encountered during job execution, providing valuable insights for debugging and maintaining the job scheduling system.

SELECT job_id, proc_schema, proc_name, pid, sqlerrcode, err_message
FROM timescaledb_information.job_errors;

Conclusion

User-defined functions (UDFs) in PostgreSQL are powerful tools that allow you to create custom operations to enhance data processing capabilities. When combined with Timescale’s job scheduler, you can automate the execution of these functions at specified intervals. Additionally, by monitoring your scheduled jobs, you can track their execution and ensure optimal performance and reliability.
