Dec 18, 2025

Posted by Damaso Sanoja
Consider an AI-powered financial analytics platform that processes real-time stock price data while simultaneously searching for similar trading patterns across historical data embeddings. Or an IoT monitoring system tracking sensor metrics over time while running anomaly detection through vector similarity search. These AI applications share a common architecture challenge: they require both efficient time-series storage and vector similarity search to operate at production scale.
Tiger Data's timescaledb-ha Docker image offers an elegant solution by unifying time-series and vector operations within a single PostgreSQL instance, with Kubernetes deployment scripts and Patroni-based clustering for scalability and high availability. But what if your organization requires declarative cluster management through native GitOps workflows rather than script-based orchestration?
This tutorial addresses that requirement by building a custom Docker image from CloudNativePG's PostgreSQL base that integrates TimescaleDB, pgvector, and pgvectorscale, enabling fully declarative, operator-managed clusters on Kubernetes.
You'll discover why TimescaleDB's official Docker images are incompatible with CloudNativePG, build a custom image that resolves these incompatibilities, and deploy a demonstration-scale proof-of-concept cluster to validate the solution.
The timescale/timescaledb-ha image is purpose-built for Patroni-based clustering and delivers high availability through battle-tested PostgreSQL patterns. For environments where Patroni's script-driven orchestration aligns with operational workflows, timescaledb-ha provides a comprehensive solution including TimescaleDB, vector extensions, PostGIS, Kubernetes deployment helpers, and many other tools.
CloudNativePG takes a different approach: operator-driven PostgreSQL management where cluster lifecycle operations execute through Kubernetes Custom Resource Definitions (CRDs) rather than external scripts. The operational differences become evident when managing multiple PostgreSQL clusters through native GitOps workflows:
| Aspect | timescaledb-ha (Patroni) | CloudNativePG |
| --- | --- | --- |
| Cluster Management | Script-driven via Patroni | Kubernetes-native operator with CRDs |
| Configuration | Environment variables + ConfigMaps | Fully declarative YAML manifests |
| Lifecycle Operations | Manual coordination through Patroni | Automated rolling upgrades, failover |
| GitOps Integration | Initial deployment only; runtime changes bypass version control | Complete infrastructure-as-code workflow |
| Extension Customization | Pre-bundled versions only | Control extension versions and the build process |
Given these operational differences, deploying TimescaleDB with CloudNativePG requires understanding why the timescaledb-ha image, despite bundling all necessary extensions, cannot work with CloudNativePG's operator-driven model.
TimescaleDB’s image architecture reflects Patroni's operational requirements: custom entrypoint scripts manage cluster initialization, data directories reside at /home/postgres/pgdata instead of PostgreSQL's standard /var/lib/postgresql/data, and the initialization process expects root-level access before transitioning to the Postgres user.
CloudNativePG's operator expects the conventions of the official PostgreSQL images: standard data directory paths, init container-based bootstrapping, and postgres-user ownership from container startup. These architectural differences make the timescaledb-ha image incompatible with CloudNativePG's reconciliation model: the operator cannot initialize clusters when the base image expects different initialization sequences, data paths, and permission models.
Building a custom image from CloudNativePG's PostgreSQL base that adds TimescaleDB and vector extensions resolves these architectural conflicts while preserving full operator compatibility. This approach requires maintaining a build pipeline for extension updates. Still, it provides complete control over extension versions and ensures the image adheres to PostgreSQL's official conventions, which CloudNativePG's operator expects for initialization, configuration management, and lifecycle operations.
CloudNativePG's PostgreSQL base images don't include TimescaleDB or vector extensions, requiring compilation from source. Our build uses a four-stage Dockerfile targeting specific versions tested for compatibility: PostgreSQL 17 on Debian 12 (Bookworm), pgvector 0.8.1, pgvectorscale 0.9.0, and TimescaleDB 2.x. This version combination works reliably, although diverging to newer versions (PostgreSQL 18, Debian 13) requires thorough testing, as extension compatibility with the base OS and PostgreSQL versions isn't guaranteed.
Start by cloning the tutorial repository to access the complete Dockerfile and Kubernetes manifests:
git clone https://TIMESCALE-REPO.git
cd TIMESCALE-REPO
The Dockerfile implements a four-stage build process, with each stage isolating specific compilation requirements.
The first stage compiles pgvector (the extension providing the vector data type and similarity operators) using CloudNativePG's PostgreSQL 17 Debian 12 base image as the builder. Debian 12 is critical here because Debian 11 is deprecated in CloudNativePG's image catalog, and Debian 13 lacks stable packages for several PostgreSQL extensions.
Using the same base image for both building and runtime ensures library compatibility and aligns with CloudNativePG's initialization patterns. The compilation follows standard PostgreSQL extension build procedures with make install:
FROM ghcr.io/cloudnative-pg/postgresql:17-bookworm AS pgvector-builder
ARG PG_MAJOR=17
ARG PGVECTOR_VERSION=0.8.1
…
RUN make clean && make OPTFLAGS="" && make install
pgvectorscale extends pgvector with DiskANN indexes: disk-based approximate nearest neighbor search that maintains strong query performance while using minimal RAM. This makes vector search viable in resource-constrained Kubernetes environments. The extension compiles from Rust source using cargo-pgrx with a critical configuration requirement:
ENV RUSTFLAGS="-C target-cpu=x86-64-v3 -C target-feature=+avx2,+fma"
The RUSTFLAGS configuration requires AVX2 and FMA CPU instructions for pgvectorscale's DiskANN implementation. While most modern CPUs support these instructions (Intel since 2013, AMD since 2015), virtualized environments often don't expose them to guest VMs by default. If deploying on VMs, verify the hypervisor configuration exposes these instruction sets by running this command inside your Kubernetes nodes:
lscpu | grep -E 'avx2|fma'
TimescaleDB installs from the official apt repository rather than compiling from source:
FROM ghcr.io/cloudnative-pg/postgresql:17-bookworm AS timescaledb-builder
…
RUN apt-get update && apt-get install -y \
    timescaledb-2-postgresql-17
This approach simplifies maintenance since version updates only require changing the Dockerfile's ARG declaration and rebuilding, while apt handles PostgreSQL version compatibility automatically.
The final stage combines all components on CloudNativePG's PostgreSQL 17 base image. CloudNativePG's operator is strict about file paths and ownership; extensions must follow standard PostgreSQL directory conventions for the operator's initialization process to recognize them:
…
COPY --from=pgvector-builder \
/usr/lib/postgresql/17/lib/vector.so \
/usr/lib/postgresql/17/lib/
COPY --from=pgvector-builder \
/usr/share/postgresql/17/extension/vector*.sql \
/usr/share/postgresql/17/extension/
…
The assembly stage copies shared libraries to /usr/lib/postgresql/17/lib/ and SQL/control files to /usr/share/postgresql/17/extension/, then switches to user 26 (the postgres user). CloudNativePG expects this ownership model from container startup; diverging from these paths or permissions breaks operator initialization.
Build the image with standard Docker commands, adjusting the registry, image name, and tag for your environment:
docker build -t YOUR_REGISTRY/timescale-ai:pg17-v1.0.0 .
docker push YOUR_REGISTRY/timescale-ai:pg17-v1.0.0
After building and pushing the image to your registry, CloudNativePG requires an ImageCatalog resource to reference custom PostgreSQL images during cluster initialization. The ImageCatalog configuration and cluster deployment manifests are covered in the next section.
This tutorial assumes the CloudNativePG operator is installed in your cluster. If not, follow the official installation guide.
Verify the operator is running:
kubectl get deployment -n cnpg-system
You should see the cnpg-cloudnative-pg deployment ready and available.
With the custom image built and pushed to your registry, the ImageCatalog resource in kubernetes/imagecatalog-timescaledb.yaml tells CloudNativePG where to find the image during cluster initialization:
apiVersion: postgresql.cnpg.io/v1
kind: ImageCatalog
metadata:
  name: timescale-ai
  namespace: timescaledb
spec:
  images:
    - major: 17
      image: YOUR_REGISTRY/timescale-ai:pg17-v1.0.0
This catalog bridges the custom build process with Kubernetes-native deployment: clusters reference the catalog by name rather than specifying image URLs directly, allowing centralized image version management across multiple PostgreSQL clusters.
The Cluster resource defines PostgreSQL deployment topology, storage configuration, and extension initialization. The tutorial repository includes a script named cluster-setup.sh; run it to create the namespace, deploy the ImageCatalog, and provision the cluster:
./cluster-setup.sh
The script applies the manifest in kubernetes/cluster-timescaledb.yaml, which configures a three-instance deployment (one primary, two replicas) with 10Gi storage per instance:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: timescaledb-cluster
  namespace: timescaledb
spec:
  instances: 3
  imageCatalogRef:
    apiGroup: postgresql.cnpg.io
    kind: ImageCatalog
    name: timescale-ai
    major: 17
  storage:
    size: 10Gi
  postgresql:
    shared_preload_libraries:
      - timescaledb
    parameters:
      work_mem: '128MB'
      maintenance_work_mem: '1GB'
      shared_buffers: '512MB'
      effective_cache_size: '1536MB'
      max_connections: '100'
      timescaledb.max_background_workers: '4'
  bootstrap:
    initdb:
      database: app
      owner: app
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
        - CREATE EXTENSION IF NOT EXISTS vector CASCADE;
        - CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
Two configuration details deserve attention: shared_preload_libraries must include timescaledb because the extension requires preloading at server start, and the memory parameters are sized for a small demonstration node.
Note on resource limits: For demonstration purposes, this manifest doesn't define Kubernetes resource limits or requests. The parameters above assume PostgreSQL has at least 2GB memory available. Production deployments should define explicit resources in the Cluster spec and tune parameters based on actual pod sizing.
CloudNativePG provisions the three-instance cluster within 2-3 minutes. The operator creates persistent volumes, initializes the primary instance, and starts replication to replicas automatically. Monitor cluster initialization:
kubectl get cluster -n timescaledb -w
After the cluster reaches a ready state, verify that all extensions are loaded correctly with a single comprehensive test:
kubectl exec -it timescaledb-cluster-1 -n timescaledb -- \
psql -U postgres -d app -c "
-- Verify all extensions
SELECT extname, extversion FROM pg_extension
WHERE extname IN ('timescaledb', 'vector', 'vectorscale')
ORDER BY extname;
-- Verify DiskANN access method
SELECT amname FROM pg_am WHERE amname = 'diskann';
-- Test TimescaleDB hypertable creation
CREATE TABLE IF NOT EXISTS metrics (
time TIMESTAMPTZ NOT NULL,
value DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time', if_not_exists => TRUE);
-- Confirm operational
SELECT 'All extensions operational!' as status;
"
The expected output is:
extname | extversion
-------------+------------
timescaledb | 2.19.3
vector | 0.8.1
vectorscale | 0.9.0
(3 rows)
amname
---------
diskann
(1 row)
CREATE TABLE
create_hypertable
------------------------------
(1,public,metrics,t)
(1 row)
status
-----------------------------
All extensions operational!
(1 row)
This output verifies extension versions match the build configuration, the DiskANN access method is registered correctly for pgvectorscale indexes, TimescaleDB hypertable creation works, and the complete stack is operational.
The tutorial repository includes a sample application demonstrating time-series data combined with vector embeddings. The application models stock market pattern recognition: 432,000 stock price records stored in TimescaleDB hypertables with 1,800 corresponding pattern embeddings for similarity search.
This demo dataset validates the stack integration at demonstration scale: hypertables partition correctly, DiskANN indexes build successfully, compression activates automatically, and time-series + vector queries execute as expected. The dataset is intentionally sized for quick experimentation: initialization completes in 2-3 minutes, allowing rapid iteration on configuration changes.
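As a rough sketch of the kind of schema the sample application works with (the table and column names below are illustrative rather than the repository's exact definitions), the pattern combines a TimescaleDB hypertable for prices with a pgvector-typed embeddings table indexed by DiskANN:
-- Illustrative only: prices in a hypertable, embeddings alongside with DiskANN indexes
CREATE TABLE stock_prices (
    time   TIMESTAMPTZ NOT NULL,
    symbol TEXT NOT NULL,
    open   DOUBLE PRECISION,
    high   DOUBLE PRECISION,
    low    DOUBLE PRECISION,
    close  DOUBLE PRECISION,
    volume BIGINT
);
SELECT create_hypertable('stock_prices', 'time', chunk_time_interval => INTERVAL '7 days');

CREATE TABLE pattern_embeddings (
    id        BIGSERIAL PRIMARY KEY,
    symbol    TEXT NOT NULL,
    time      TIMESTAMPTZ NOT NULL,
    embedding VECTOR(1536)
);
-- One DiskANN index per distance metric, mirroring what the init script builds
CREATE INDEX idx_pattern_embeddings_diskann_cosine
    ON pattern_embeddings USING diskann (embedding vector_cosine_ops);
CREATE INDEX idx_pattern_embeddings_diskann_l2
    ON pattern_embeddings USING diskann (embedding vector_l2_ops);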
Initialize the sample schema and data:
./initialize-db.sh
The initialization script creates hypertables, seeds data, generates embeddings, and builds DiskANN indexes for both cosine and L2 distance metrics. The next section validates query performance and compression behavior using this dataset, providing baseline metrics you can compare against when experimenting with different parameter configurations or larger datasets.
With the cluster deployed using the custom image, it is time to validate that all components integrate correctly. To that end, the tutorial repository includes a script named run-capability-tests.sh, an interactive test suite that confirms the custom image solution works as intended.
The test suite offers two modes, selectable interactively when the script starts.
Run the test suite:
./run-capability-tests.sh
Select Full Capability Demo if you want to generate the complete test dataset. The demonstration completes in approximately 15 minutes on a single-node K3s cluster (4 vCPU, 8GB RAM), validating that the custom image successfully integrates all components without compatibility issues.
The Full Capability Demo creates a demonstration workload combining time-series and vector data to test integration completeness. The dataset comprises 432,000 stock price records (open, high, low, close, and volume) for 10 stock symbols, spanning 272 days with 7-day chunk intervals. Pattern embeddings consist of 50,000 vectors with 1,536 dimensions distributed across the same time range.
The test verifies that TimescaleDB's automatic chunk exclusion works correctly, executes vector similarity searches to confirm DiskANN index creation and usage, and enables compression policies to validate automatic compression of historical data.
TimescaleDB automatically partitions data into chunks based on time intervals. Testing queries with different time ranges confirms that chunk exclusion works correctly with the custom image:
| Time Range | Execution Time | Chunks Scanned |
| --- | --- | --- |
| 7 days | ~5ms | 1 chunk |
| 30 days | ~8ms | 4 chunks |
| 90 days | ~20ms | 8 chunks |
| 270 days | ~132ms | 40 chunks |
The performance pattern demonstrates that TimescaleDB's query planner successfully excludes chunks outside the specified time range. A 38.5x increase in time range (7 days to 270 days) results in 26x slower execution, confirming sub-linear scaling. Without time filters, queries scan all 40 chunks regardless of the data needed. The difference between 5ms (1 chunk) and 132ms (all chunks) illustrates why time predicates matter in query design.
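You can observe chunk exclusion directly by comparing query plans with and without a time predicate. A minimal sketch against the metrics hypertable created during verification (any hypertable in the demo works the same way):
-- Time-bounded query: the planner only scans chunks overlapping the last 7 days
EXPLAIN (ANALYZE, BUFFERS)
SELECT time_bucket('1 hour', time) AS bucket, avg(value)
FROM metrics
WHERE time > now() - INTERVAL '7 days'
GROUP BY bucket;

-- Unbounded query for comparison: every chunk is scanned
EXPLAIN (ANALYZE, BUFFERS)
SELECT time_bucket('1 hour', time) AS bucket, avg(value)
FROM metrics
GROUP BY bucket;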
These results confirm that the custom image preserves TimescaleDB's automatic partitioning and chunk exclusion capabilities without introducing compatibility issues.
The test suite validates that pgvectorscale's DiskANN indexes work correctly. Three common vector search patterns execute successfully:
| Query Type | Execution Time | Description |
| --- | --- | --- |
| Top-K similarity | ~12ms | Find the 10 most similar patterns from 50,000 vectors |
| Cross-stock correlation | ~4ms | Identify similar patterns across different stocks |
| Historical matching | ~6ms | Search patterns from 6 months ago |
DiskANN index size remains small at 24 KB per index (one for cosine distance, one for L2 distance), confirming the disk-based indexing approach works as designed. Query execution plans captured with EXPLAIN ANALYZE show "Index Scan using idx_pattern_embeddings_diskann_cosine," confirming the indexes were built correctly and are used as expected.
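As an example of the kind of statement behind the Top-K similarity row, a DiskANN-backed search is written as an ordinary pgvector query; ordering by the <=> cosine-distance operator lets the planner choose the DiskANN index. The pattern_embeddings table and the $1 parameter here are illustrative:
-- Ten most similar patterns to a query embedding; $1 is a 1,536-dimension vector
-- supplied by the application, e.g. '[0.01, -0.42, ...]'::vector
SELECT id, symbol, time,
       embedding <=> $1 AS cosine_distance
FROM pattern_embeddings
ORDER BY embedding <=> $1
LIMIT 10;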
These results validate that our Rust compilation of pgvectorscale integrated successfully with CloudNativePG's PostgreSQL base image.
TimescaleDB's automatic compression policy activates correctly on data older than 30 days, and the test results show compression working as expected.
The exceptional compression rate (95.6%) reflects the synthetic nature of randomly generated test vectors. Real-world semantic embeddings from language models exhibit structured patterns that compress less effectively. For reference, production deployments typically achieve 60-80% compression on actual embeddings. Even at the lower end, 60% compression provides 2.5x storage savings for long-term data retention.
Queries against compressed chunks execute transparently without application changes, confirming the compression integration works correctly. The compression policy runs automatically in the background, requiring no manual intervention once configured.
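For reference, configuring that behavior follows the standard TimescaleDB compression pattern. A minimal sketch against the demo's metrics hypertable, assuming the same 30-day threshold the test suite uses:
-- Enable compression on the hypertable, ordering compressed rows by time
ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_orderby = 'time DESC');

-- Compress chunks automatically once they are older than 30 days
SELECT add_compression_policy('metrics', INTERVAL '30 days');

-- Inspect the before/after sizes once the policy has run
SELECT * FROM hypertable_compression_stats('metrics');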
This validation proves the custom image approach resolves the architectural incompatibilities between official TimescaleDB images and CloudNativePG. The stack functions correctly for the demonstration workload, providing a foundation for experimentation with your own data volumes and query patterns. Production deployments with larger datasets will require parameter tuning based on your specific workload characteristics, but the core integration is proven functional.
The validation testing ran on a single-node K3s cluster with 8GB RAM and 4 vCPUs, representing resource-constrained edge deployment scenarios. Monitoring during the Full Capability Demo revealed that CPU usage peaked at 33% with significant headroom remaining, while RAM utilization reached 90%. However, Linux's aggressive disk caching likely accounts for much of this. These results provide practical insights for optimizing similar resource-constrained deployments.
The demonstration cluster used 512MB shared_buffers, 128MB work_mem, and 1GB maintenance_work_mem without defining explicit Kubernetes resource limits. With three PostgreSQL instances running on an 8GB node, memory pressure remained manageable for the demo workload. The key insight: calculate maximum concurrent operations to avoid out-of-memory conditions using (Total RAM - shared_buffers) / work_mem. On our 8GB VM with approximately 6GB available after system overhead, this allowed roughly 45 concurrent operations at 128MB work_mem.
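Before applying that formula to your own cluster, confirm what the running instances actually use; a quick check from psql:
-- Memory-related settings as seen by the server
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'max_connections');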
For production deployments, define explicit resource limits and tune parameters proportionally:
- 1-2GB pods (edge nodes): shared_buffers: 256MB, work_mem: 64MB, maintenance_work_mem: 512MB
- 2-4GB pods (standard nodes): shared_buffers: 512MB-1GB, work_mem: 128-256MB, maintenance_work_mem: 1-2GB
- 4GB+ pods (larger workloads): tune based on actual query patterns and dataset size
Monitor actual memory pressure through swap usage and OOM events rather than total RAM percentage.
DiskANN's disk-based indexing made K3s's local-path provisioner viable for our demo: queries executed in 10-15ms despite using node local storage rather than high-performance network volumes. This validates a two-tier storage approach for production: fast local or SSD storage for recent uncompressed data (frequent access), and slower network-attached storage for compressed historical chunks (infrequent access, decompression overhead dominates).
TimescaleDB's compression moves data from hot to cold storage patterns automatically. Chunks compressed beyond 30 days tolerate higher storage latency because decompression CPU time exceeds storage access time. This architectural advantage reduces infrastructure costs in resource-constrained environments where high-performance storage across all data is prohibitively expensive.
Our deployment used DiskANN indexes exclusively because they enabled vector search on resource-constrained K3s nodes. At 50,000 vectors with 1,536 dimensions, DiskANN indexes consumed 24 KB of disk space, while HNSW would have required 2-3GB of RAM.
Choose DiskANN when:
- The embedding dataset is large relative to available RAM, or nodes are memory-constrained
- Query latencies in the 10-15ms range are acceptable for your application
Choose HNSW when:
- You need the lowest possible query latency and the index fits comfortably in memory
- RAM is plentiful relative to the size of the embedding dataset
Many AI applications tolerate 10-15ms vector search latency, as embedding generation, LLM inference, and application logic typically dominate the total request time. Our testing confirmed this; the demo application ran effectively with DiskANN's ~15ms similarity search latency.
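The two index types are interchangeable at the SQL level, which makes it straightforward to benchmark both against your own data. A sketch using the illustrative pattern_embeddings table from earlier:
-- pgvectorscale's disk-based DiskANN index (the approach used throughout this tutorial)
CREATE INDEX idx_embeddings_diskann
    ON pattern_embeddings USING diskann (embedding vector_cosine_ops);

-- pgvector's in-memory HNSW graph index; m and ef_construction shown are pgvector's defaults
CREATE INDEX idx_embeddings_hnsw
    ON pattern_embeddings USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);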
AI workloads create unpredictable connection patterns: batch embedding generation produces connection bursts, RAG implementations open connections per query, and LLM integrations generate variable query patterns. CloudNativePG's integrated PgBouncer support addresses this through a separate Pooler resource that targets the cluster (the resource name below is illustrative):
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: timescaledb-cluster-pooler-rw
  namespace: timescaledb
spec:
  cluster:
    name: timescaledb-cluster
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
Transaction-mode pooling works with most AI workloads. Use session mode only if your application requires prepared statements or temporary tables.
This tutorial solved a specific integration problem: the custom image built from CloudNativePG's PostgreSQL base successfully integrates TimescaleDB, pgvector, and pgvectorscale, enabling declarative cluster management through Kubernetes CRDs rather than script-based orchestration.
The validation confirmed the stack functions correctly at demonstration scale: sub-15ms vector similarity search across 50,000 embeddings, 5-20ms time-windowed queries through automatic chunk exclusion, and compression reducing storage requirements. DiskANN's 24 KB indexes proved that vector search works effectively in resource-constrained environments—our K3s testing showed 33% CPU utilization and stable operation on 8GB nodes. These baseline metrics provide a reference point for scaling experiments with your data volumes and query patterns.
Production deployment requires operational steps beyond this tutorial: configure backup policies for continuous WAL archiving and point-in-time recovery, implement monitoring through CloudNativePG's Prometheus integration, tune PostgreSQL parameters based on your workload characteristics, and validate failover behavior under load. Moreover, expect to revise memory allocations, storage sizing, and index strategies as dataset sizes grow from thousands to millions of vectors.
The complete implementation is available in the tutorial repository, including Dockerfile, Kubernetes manifests, initialization scripts, and test suite. For production deployment guidance, consult CloudNativePG documentation, TimescaleDB best practices, and pgvectorscale performance tuning.