---
title: "How TimescaleDB Outperforms ClickHouse and MongoDB for LogTide's Observability Platform"
published: 2026-04-15T08:24:18.000-04:00
updated: 2026-04-15T11:33:16.000-04:00
excerpt: "How one developer built an open-source log management platform handling 5M logs/day on minimal hardware—using TimescaleDB continuous aggregates, compression, and hypertables."
tags: Dev Q&A, TimescaleDB
authors: Giuseppe Pollio, Nicole Bahr
---

> **TimescaleDB is now Tiger Data.**

_Giuseppe “Polliog” Pollio started writing code for LogTide in September 2025. By early 2026, the platform was handling five million logs per day for alpha users, compressing 220GB of production data down to 25GB._

## LogTide 

Most log management tools are built for enterprises and priced like it: Datadog and Splunk far exceed small-operation budgets. For developers running a self-hosted stack, there is no clear alternative for affordable log observability.

LogTide addresses this gap as an open-source log management and SIEM platform built specifically for teams who need serious observability without serious hardware. With LogTide, Sigma rule-based detection, structured log search, alerting, and notifications (the same capabilities that make Datadog and Splunk useful) run in two gigabytes of RAM.

"That's because our target is small agencies and home labs," Giuseppe explains. "I wanted to create an ecosystem with low impact on RAM, something you can host on a really old machine."

LogTide launched its cloud alpha in early 2026, with around 100 companies stress-testing the platform for free. One of them sends five million logs per day.

## The Challenge

When Giuseppe set out to build LogTide, he targeted home labs and small businesses that cannot afford enterprise infrastructure, let alone enterprise pricing.

The ELK stack (Elasticsearch, Logstash, Kibana) typically requires multiple nodes and significant RAM. Grafana Loki is lighter but still has indexing and query limitations that make full-text log search painful at scale. ClickHouse is fast and compresses well, but is built for analytics clusters, not Raspberry Pis. Datadog and Splunk simply cost too much.

LogTide needed a reliable database to underpin its open-source log observability, one that could scale to production without a split architecture or excessive budget spend.

## Why TimescaleDB

Giuseppe found TimescaleDB while searching for a Postgres-compatible database that could sustain high ingest rates of event data.

"There are lots of alternatives, but most are too resource-intensive," Giuseppe explains. "TimescaleDB was a perfect choice."

> _There are lots of alternatives, but most are too resource-intensive. TimescaleDB was a perfect choice. - Giuseppe Pollio, Founder, LogTide_

The appeal was both technical and practical. TimescaleDB is Postgres. It uses the same wire protocol, the same SQL syntax, the same tooling, and the same extension ecosystem. For a solo developer building a platform that has to run on minimal hardware, that meant no operational surprises, no vendor-specific APIs, and no migration work if users already had Postgres running. 

“If Postgres can run on your machine, TimescaleDB can run,” notes Giuseppe, “and you can deploy LogTide for inexpensive observability at scale.”

## The LogTide Stack

LogTide's architecture is simple by design. “Simple architecture means it's easier to manage, easier to maintain,” said Giuseppe.

> _Simple architecture means it’s easier to manage, easier to maintain. - Giuseppe Pollio_

Logs enter the system from one of three client sources: OpenTelemetry-instrumented services, Fluent Bit agents, or one of LogTide's native SDKs. All three routes converge on a single ingest endpoint. The endpoint handles format variations, including the OTEL format and a handful of special-case adapters, so the ingestion path stays unified regardless of how the log was generated.

From the ingest endpoint, log payloads enter a job queue backed by Redis. Redis is optional: if it is not available, the ingestion path routes directly to the worker. The worker is where the platform earns its SIEM designation. It evaluates Sigma rules against incoming logs, generates alerts, dispatches notifications, and runs the full analysis pipeline. 

After processing, logs pass through what Giuseppe calls the LogTide Reservoir: a storage abstraction layer that keeps the backend pluggable. In practice, only one backend is truly necessary.

"TimescaleDB is our unique persistent database," Giuseppe explains. "All the aggregation that populates our dashboards is powered by TimescaleDB."

> _TimescaleDB is our unique persistent database. All the aggregation that populates our dashboards is powered by TimescaleDB. - Giuseppe Pollio_

Inside TimescaleDB, LogTide maintains three hypertable families: raw logs, distributed traces (spans), and detection events. Retention policies run automatically with no manual intervention or cron jobs. Continuous aggregates sit on top of the raw log hypertable and are what make the platform fast at scale.
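Declaring a hypertable is a single call on top of a regular Postgres table. Here is a minimal sketch of what the raw-log family could look like (column names inferred from the fields queried elsewhere in this post, the `message` column is illustrative; LogTide's actual schema lives in its migrations):

```SQL
-- A plain Postgres table for raw logs (sketch; message column assumed)
CREATE TABLE logs (
  time       TIMESTAMPTZ NOT NULL,
  project_id UUID        NOT NULL,
  level      TEXT        NOT NULL,
  service    TEXT,
  message    TEXT
);

-- Convert it into a hypertable, partitioned into time-based chunks
SELECT create_hypertable('logs', 'time');
```

Once the table is a hypertable, chunk-level operations such as `drop_chunks`, compression policies, and retention policies all become available on it.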

From `packages/backend/src/modules/retention/service.ts`:

```typescript
/**
 * Execute retention cleanup for all organizations.
 *
 * Strategy (scales with number of distinct retention values, not orgs):
 * 1. drop_chunks for max retention — instant, drops entire files
 * 2. Group orgs by retention_days, collect all project_ids per group
 * 3. For each group with retention < max: batch-delete their logs
 */
async executeRetentionForAllOrganizations(): Promise<RetentionExecutionSummary> {
  const startTime = Date.now();
  const logging = isInternalLoggingEnabled();

  // Get all organizations with their retention + projects
  const organizations = await db
    .selectFrom('organizations')
    .select(['id', 'name', 'retention_days'])
    .execute();

  const orgProjects = await db
    .selectFrom('projects')
    .select(['id', 'organization_id'])
    .execute();

  // Build org -> projectIds map
  const projectsByOrg = new Map<string, string[]>();
  for (const p of orgProjects) {
    const list = projectsByOrg.get(p.organization_id) || [];
    list.push(p.id);
    projectsByOrg.set(p.organization_id, list);
  }

  // Find max retention (used for drop_chunks)
  const maxRetention = Math.max(...organizations.map(o => o.retention_days));
  const maxCutoff = new Date(Date.now() - maxRetention * 24 * 60 * 60 * 1000);

  // Step 1: drop_chunks older than max retention (TimescaleDB only — instant, no decompression)
  // For ClickHouse, TTL policies handle this natively or deleteByTimeRange in step 3
  let chunksDropped = 0;
  if (reservoir.getEngineType() === 'timescale') {
    try {
      const dropResult = await sql`
        SELECT drop_chunks('logs', older_than => ${maxCutoff}::timestamptz)
      `.execute(db);
      chunksDropped = dropResult.rows.length;

      /* v8 ignore next 6 -- telemetry, disabled in tests */
      if (chunksDropped > 0 && logging) {
        hub.captureLog('info', `Dropped ${chunksDropped} chunks older than ${maxRetention} days`, {
          maxRetentionDays: maxRetention,
          cutoffDate: maxCutoff.toISOString(),
          chunksDropped,
        });
      }
    } catch (err) {
      // drop_chunks may fail if no chunks to drop — that's fine
      /* v8 ignore next 4 -- telemetry, disabled in tests */
      if (logging) {
        const msg = err instanceof Error ? err.message : String(err);
        hub.captureLog('debug', `drop_chunks: ${msg}`);
      }
    }
  }

  // Step 2: Group orgs by retention_days (only those with retention < max need per-row deletes)
  const retentionGroups = new Map<number, { orgs: typeof organizations; projectIds: string[] }>();
  for (const org of organizations) {
    if (org.retention_days >= maxRetention) continue; // already handled by drop_chunks

    const group = retentionGroups.get(org.retention_days) || { orgs: [], projectIds: [] };
    group.orgs.push(org);
    const orgProjectIds = projectsByOrg.get(org.id) || [];
    group.projectIds.push(...orgProjectIds);
    retentionGroups.set(org.retention_days, group);
  }

  // Step 3: Batch-delete per retention group
  const results: RetentionExecutionResult[] = [];
  let totalDeleted = 0;
  let failedCount = 0;

  for (const [retentionDays, group] of retentionGroups) {
    if (group.projectIds.length === 0) {
      for (const org of group.orgs) {
        results.push({
          organizationId: org.id,
          organizationName: org.name,
          retentionDays,
          logsDeleted: 0,
          executionTimeMs: 0,
        });
      }
      continue;
    }

    const groupStart = Date.now();
    const cutoffDate = new Date(Date.now() - retentionDays * 24 * 60 * 60 * 1000);

    try {
      const oldestResult = await reservoir.query({
        projectId: group.projectIds,
        from: new Date(0),
        to: cutoffDate,
        limit: 1,
        sortOrder: 'asc',
      });

      if (oldestResult.logs.length === 0) {
        for (const org of group.orgs) {
          results.push({
            organizationId: org.id,
            organizationName: org.name,
            retentionDays,
            logsDeleted: 0,
            executionTimeMs: Date.now() - groupStart,
          });
        }
        continue;
      }

      const deleted = await this.batchDeleteLogs(
        group.projectIds,
        cutoffDate,
        new Date(oldestResult.logs[0].time)
      );
      totalDeleted += deleted;
    } catch (error) {
      failedCount += group.orgs.length;
    }
  }

  // Assemble the summary from the counters accumulated above
  // (exact field names depend on the RetentionExecutionSummary type).
  return {
    chunksDropped,
    totalDeleted,
    failedCount,
    results,
    durationMs: Date.now() - startTime,
  };
}
```

"The aggregates are necessary," said Giuseppe. "If you have five million, ten million logs every day, and you need to see how many logs you received every hour, you can't run that query on 10 million logs. The aggregates give you query results in milliseconds instead of 30 or 40 seconds."

**Continuous aggregate definition**, from `packages/backend/migrations/004_performance_optimization.sql`:

```SQL
CREATE MATERIALIZED VIEW logs_hourly_stats
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) AS bucket,
  project_id,
  level,
  service,
  COUNT(*) AS log_count
FROM logs
GROUP BY bucket, project_id, level, service
WITH NO DATA;

-- Refreshes automatically every hour
SELECT add_continuous_aggregate_policy('logs_hourly_stats',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour',
  if_not_exists => TRUE
);

CREATE INDEX IF NOT EXISTS idx_logs_hourly_stats_project_bucket
  ON logs_hourly_stats (project_id, bucket DESC);
```
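With the refresh policy keeping the rollup current, the "how many logs every hour" question Giuseppe describes reads from the aggregate rather than the raw hypertable. A representative dashboard query might look like this (a sketch; `:project_id` is a placeholder parameter):

```SQL
-- Hourly log volume for one project over the last day,
-- answered entirely from the materialized rollup
SELECT bucket, SUM(log_count) AS logs
FROM logs_hourly_stats
WHERE project_id = :project_id
  AND bucket >= now() - INTERVAL '24 hours'
GROUP BY bucket
ORDER BY bucket DESC;
```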

**Hybrid query at runtime**, from `packages/backend/src/modules/dashboard/service.ts`:

```typescript
const [todayAggregateStats, recentTotal, recentErrors, recentServices, yesterdayAggregateStats, prevHourCount] = await Promise.all([
  // Today's historical stats from aggregate (today start to 1 hour ago)
  db
    .selectFrom('logs_hourly_stats')
    .select([
      sql<string>`COALESCE(SUM(log_count), 0)`.as('total'),
      sql<string>`COALESCE(SUM(log_count) FILTER (WHERE level IN ('error', 'critical')), 0)`.as('errors'),
      sql<string>`COUNT(DISTINCT service)`.as('services'),
    ])
    .where('project_id', 'in', projectIds)
    .where('bucket', '>=', todayStart)
    .where('bucket', '<', lastHourStart)
    .executeTakeFirst(),

  // Recent stats from reservoir (last hour)
  reservoir.count({ projectId: projectIds, from: lastHourStart, to: new Date() }),
  reservoir.count({ projectId: projectIds, from: lastHourStart, to: new Date(), level: ['error', 'critical'] }),
  reservoir.distinct({ field: 'service', projectId: projectIds, from: lastHourStart, to: new Date() }),

  // Yesterday's stats from aggregate
  db
    .selectFrom('logs_hourly_stats')
    .select([
      sql<string>`COALESCE(SUM(log_count), 0)`.as('total'),
      sql<string>`COALESCE(SUM(log_count) FILTER (WHERE level IN ('error', 'critical')), 0)`.as('errors'),
      sql<string>`COUNT(DISTINCT service)`.as('services'),
    ])
    .where('project_id', 'in', projectIds)
    .where('bucket', '>=', yesterdayStart)
    .where('bucket', '<', todayStart)
    .executeTakeFirst(),

  // Previous hour from reservoir (for throughput trend)
  reservoir.count({ projectId: projectIds, from: prevHourStart, to: lastHourStart }),
]);
```

![LogTide's architecture. Logs flow from client SDKs and agents through a single ingest endpoint, into a processing worker, and into TimescaleDB hypertables via the LogTide Reservoir storage abstraction.](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2026/04/LOGTIDE-architecture-diagram-1.png)

__LogTide's architecture. Logs flow from client SDKs and agents through a single ingest endpoint, into a processing worker, and into TimescaleDB hypertables via the LogTide Reservoir storage abstraction.__

## What We've Seen

### 220GB Down to 25GB

In production, LogTide's TimescaleDB deployment compressed 220GB of raw log data (135GB of row data plus 85GB of indexes) down to 25GB. That is an 88.6% reduction, achieved with TimescaleDB's native columnar compression, segmented by `project_id` and ordered by timestamp descending. Chunks older than seven days compress automatically.

From `packages/backend/migrations/001_initial_schema.sql`:

```SQL
-- Enable compression on logs hypertable
ALTER TABLE logs SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'project_id',
  timescaledb.compress_orderby = 'time DESC'
);

-- Add compression policy for logs (compress chunks older than 7 days)
SELECT add_compression_policy('logs', INTERVAL '7 days', if_not_exists => TRUE);

-- Global retention safety net
SELECT add_retention_policy('logs', INTERVAL '90 days', if_not_exists => TRUE);
```
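To verify ratios like these on your own deployment, TimescaleDB 2.x exposes a stats function that reports per-hypertable sizes before and after compression:

```SQL
-- Before/after totals for the logs hypertable
SELECT
  pg_size_pretty(before_compression_total_bytes) AS before,
  pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('logs');
```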

Query performance did not degrade. Time-range filtering got 33% faster after compression, and aggregations got 41% faster. Only full-text search slowed slightly, by about 12%, because columnar storage requires scanning additional columns to reconstruct text fields. For a log management platform where engineers are far more likely to query a time window than to search a raw string, the tradeoff strongly favors compression.

In practice, 30 million logs fit in 15GB on a single 4-vCPU, 8GB RAM node, with a P95 query latency of 50ms. Learn more in Giuseppe’s [dev.to post on TimescaleDB compression](https://dev.to/polliog/timescaledb-compression-from-150gb-to-15gb-90-reduction-real-production-data-bnj).

### TimescaleDB Bested MongoDB and ClickHouse in Head-to-Head Performance Benchmarks

Giuseppe built an open benchmark suite and ran it across 1K to 1M records, as outlined in his [AWS Builder Center article benchmarking ClickHouse and MongoDB vs TimescaleDB](https://builder.aws.com/content/3Aoryr85VEVzFKrFjDmzXpwRLkU/i-benchmarked-timescaledb-vs-clickhouse-vs-mongodb-for-observability-data). The ingestion story is straightforward: at batch sizes typical of real-world observability (100 events per call), TimescaleDB handles 14,200 inserts per second. ClickHouse handles 250 at the same batch size. The gap exists because ClickHouse buffers small writes and flushes on a 400ms timer, the right design for bulk analytics, the wrong design when a dozen microservices are logging in real time.

The query results are the main story. At 100,000 log records, TimescaleDB answers a filtered service query in 0.47ms. MongoDB answers the same query in 304ms, a 650x difference. Under 50 concurrent queries, TimescaleDB holds at 6.2ms whether the dataset is 1,000 or 1,000,000 records. The mechanism is hypertable partitioning: queries filter by time range and service, TimescaleDB routes them to the active chunk instead of scanning the full table, and continuous aggregates make count and dashboard queries nearly free because the work is already done at write time.
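Chunk exclusion is easy to observe directly: for a time-bounded query, the planner only considers chunks that overlap the window (the service name here is illustrative):

```SQL
-- EXPLAIN shows only the recent chunk(s) being scanned;
-- chunks outside the time window are excluded at plan time
EXPLAIN
SELECT COUNT(*)
FROM logs
WHERE service = 'api'
  AND time >= now() - INTERVAL '1 hour';
```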

### A 2GB RAM Requirement Keeps Operations Lean 

The most important number is not the compression ratio or the write throughput. It is the 2GB RAM figure that defines where LogTide can actually run.

"If you have log management that can work with 2GB of RAM, it's really magic," Giuseppe says. "Because you can't do that with Datadog or Splunk or the other self-hosted programs and containers."

> _If you have log management that can work with 2GB of RAM, it's really magic. You can't do that with Datadog or Splunk or the other self-hosted programs and containers. - Giuseppe Pollio_

That 2GB ceiling is what makes LogTide viable for home labs running NAS, small businesses on shared hosting, or a developer who wants to know when their Raspberry Pi's services throw errors. The entire LogTide platform, including API, worker, dashboard, and TimescaleDB storage, runs on the same hardware that already runs Postgres. 

## Looking Ahead

The LogTide Cloud Platform alpha is now open to trial users. Meanwhile, LogTide’s open-source project is growing fast: hundreds of GitHub stars and 1k+ clones per day signal a developer community that has found the project and is actively building with it. The next phase is expanding SDK coverage and continuing to stress-test the storage layer. TimescaleDB runs anywhere Postgres runs. The goal is to make sure LogTide does too.