May 22, 2025
This is an installment of our Community Member Spotlight series, in which we invite our customers to share their work, spotlight their success, and inspire others with new ways to use technology to solve problems.
At Shoplogix IMS, we build Industrial IoT (IIoT) solutions, integrating software and hardware designed around batteryless wireless sensors. These sensors continuously monitor critical manufacturing assets, delivering real-time insights. As our platform evolved and scaled, the need to propagate real-time database changes—like sensor activations, configuration updates, or asset status transitions—across different systems became essential. Such capabilities power interactive dashboards, automated workflows, and analytics pipelines.
To address these requirements, we developed a streamlined Change Data Capture (CDC) pipeline utilizing TimescaleDB, PostgreSQL triggers, and Kafka Connect. This solution allowed us to stream database changes into Kafka seamlessly, without modifying existing table schemas or relying on PostgreSQL's write-ahead logs. In this post, we'll explore the details of our CDC architecture, highlight TimescaleDB's strengths, and demonstrate how we've achieved continuous synchronization across downstream systems with minimal overhead.
Proof of Concept
To better illustrate our approach, we have prepared a public repository that mirrors the implementation described in this article. The repository provides a fully containerized proof of concept (PoC) demonstrating a Change Data Capture (CDC) pipeline using TimescaleDB, Kafka, Kafka Connect, and a JDBC Source Connector.
https://github.com/carlospsikick/timescale-cdc
Change Data Capture (CDC) captures database operations (INSERT, UPDATE, DELETE) at the row level, converting these changes into structured events for immediate downstream consumption. Unlike traditional batch processes, CDC supports real-time data propagation.
While tools like Debezium rely on transaction logs, requiring elevated privileges or logical replication, our approach employs PostgreSQL triggers. This method offers full control, simplicity, and compatibility with managed environments, seamlessly integrating with TimescaleDB.
Our data model comprises sensor-generated time-series data (e.g., temperature, pressure, vibration) and metadata describing assets (such as types, serial numbers, and locations). Combined, these data streams offer a comprehensive view of the operational state of industrial assets, enabling visualization, anomaly detection, and predictive analytics.
As our business expands, we continuously evolve our ecosystem to support new sensor types, analytics, and integrations with external systems. Central to these developments is event streaming, which lets us transform and reroute data without impacting the broader infrastructure. Change Data Capture (CDC) is vital in this context, translating row-level database changes into real-time events that downstream systems can immediately process.
In our CDC pipeline, database changes triggered by APIs or microservices are captured in real time using database triggers and logged into a dedicated CDC time-series schema. A Kafka Connect JDBC Source Connector polls these CDC tables and views and streams the captured changes as events into Kafka topics. These topics can then be consumed by various subscribers, enabling real-time data propagation across microservices, analytics platforms, and external systems in a decoupled and scalable manner.
Let’s take a closer look:
cdc.change_data_capture()
This function serves as the core mechanism for capturing data changes—inserts, updates, and deletes—from any table that invokes it via a trigger.
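The exact implementation lives in the repository linked above; as a rough sketch (the `cdc.event_log` columns referenced here are assumptions, described later in this post), a generic version of the function could look like this:

```sql
-- Sketch of a generic CDC trigger function. It records which table changed,
-- what kind of operation occurred, and the affected row as JSON.
CREATE SCHEMA IF NOT EXISTS cdc;

CREATE OR REPLACE FUNCTION cdc.change_data_capture()
RETURNS trigger AS $$
BEGIN
    IF (TG_OP = 'DELETE') THEN
        -- For deletes, the only row image available is OLD.
        INSERT INTO cdc.event_log (schema_name, table_name, operation, row_data)
        VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        RETURN OLD;
    ELSE
        -- For inserts and updates, capture the new row image.
        INSERT INTO cdc.event_log (schema_name, table_name, operation, row_data)
        VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;
```

Because the function relies only on the standard trigger variables (`TG_OP`, `TG_TABLE_SCHEMA`, `TG_TABLE_NAME`, `OLD`, `NEW`), the same function can be attached to any table without modification.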
Note: To capture events from a TimescaleDB hypertable, the function needs a slight adjustment, but the functionality is the same. See the code repo for more details.
cdc.event_log is the central audit and event-tracking table. It stores detailed, structured records of every change captured by the cdc.change_data_capture() trigger function.
Defining the cdc.event_log table as a Timescale hypertable has significant performance and scalability benefits for CDC workloads. It enables efficient time-based partitioning, making incremental polling and historical queries faster. Hypertables are optimized for high-throughput inserts, ideal for the append-only nature of CDC logs. Timescale also offers native features like automated data retention, compression through columnar storage, and real-time analytics capabilities, allowing you to manage storage effectively and build responsive downstream applications. Importantly, this setup integrates seamlessly with tools like Kafka Connect, without altering your connector configuration.
Column Description:
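The authoritative schema is in the repo; as an illustrative sketch (column names, types, and policy intervals below are assumptions), the event log and its hypertable setup might look like this, with each assumed column annotated inline:

```sql
-- Illustrative definition of the CDC event log.
CREATE TABLE IF NOT EXISTS cdc.event_log (
    event_id     BIGINT GENERATED ALWAYS AS IDENTITY, -- monotonically increasing id, handy for incremental polling
    captured_at  TIMESTAMPTZ NOT NULL DEFAULT now(),  -- when the change was captured
    schema_name  TEXT NOT NULL,                       -- source schema (e.g., dataschema)
    table_name   TEXT NOT NULL,                       -- source table (e.g., assets)
    operation    TEXT NOT NULL,                       -- INSERT, UPDATE, or DELETE
    row_data     JSONB                                -- full row image as JSON
);

-- Convert the log into a hypertable partitioned by capture time.
SELECT create_hypertable('cdc.event_log', 'captured_at', if_not_exists => TRUE);

-- Optional: drop old events automatically and compress older chunks.
SELECT add_retention_policy('cdc.event_log', INTERVAL '90 days', if_not_exists => TRUE);

ALTER TABLE cdc.event_log SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'table_name'
);
SELECT add_compression_policy('cdc.event_log', INTERVAL '7 days', if_not_exists => TRUE);
```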
Creating views like cdc.event_log_assets enables clean separation of events from a shared CDC log into table-specific or domain-specific streams. These views simplify Kafka topic routing. Mapping each view to a unique Kafka topic reduces downstream filtering, improving performance by narrowing the data scope for polling connectors. These views also provide a flexible layer for schema shaping, enrichment, and transformation, making the CDC pipeline more modular, scalable, and easier to maintain.
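A minimal sketch of such a view, assuming the asset events come from a `dataschema.assets` table (the filter predicate is an assumption):

```sql
-- Domain-specific stream: only asset-related events from the shared CDC log.
CREATE OR REPLACE VIEW cdc.event_log_assets AS
SELECT event_id,
       captured_at,
       operation,
       row_data
FROM cdc.event_log
WHERE schema_name = 'dataschema'
  AND table_name  = 'assets';
```

A Kafka Connect JDBC Source Connector can then poll this view, for example in incrementing or timestamp mode using `event_id` or `captured_at`, and publish its rows to a dedicated topic.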
To begin capturing change events for a particular table, all we need to do is attach a trigger that invokes the function to that table. For example, to monitor `dataschema.assets`:
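A trigger along these lines would do it (the trigger name is illustrative):

```sql
-- Route every row-level change on dataschema.assets through the capture function.
CREATE TRIGGER assets_cdc_trigger
AFTER INSERT OR UPDATE OR DELETE ON dataschema.assets
FOR EACH ROW
EXECUTE FUNCTION cdc.change_data_capture();
```

From that point on, every insert, update, or delete on the table lands as a row in cdc.event_log, ready to be picked up by the connector.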