Published on Jan 14, 2026

Written by Jakkie Koekemoer
Many production systems run both MongoDB and PostgreSQL concurrently. The typical pattern: MongoDB handles flexible schema requirements and rapid development, while PostgreSQL manages relational data, complex queries, and analytics workloads. This dual-database architecture creates operational overhead: two systems to monitor, back up, scale, and maintain.
Tiger Data consolidates these workloads into a unified platform. The system provides JSONB support for schema flexibility (equivalent to MongoDB's document model), Hypertables for time-series data ingestion and queries, and full SQL capabilities for relational operations. This architecture eliminates the need to maintain separate database systems while preserving the functional capabilities of each.
This guide documents the complete migration process from MongoDB to Tiger Data.
Here is a glimpse of the key takeaways:
| Key Takeaway | Details |
| --- | --- |
| Migration Time | 8-16 hours for typical workloads |
| Primary Benefit | 3-10x faster aggregations; unified SQL interface |
| Complexity | Medium (schema design is the main effort) |
| Cost Impact | Eliminates dual-database operational overhead |
| Data Model | JSONB for flexible schemas, Hypertables for time-series, standard tables for relational data |
| Target Use Cases | Event logs, user profiles, sensor data, transactional records |
| Query Language | Standard SQL with JSON operators |
The following sections detail the stepwise process, along with common pitfalls and how to avoid them.
Should you migrate from MongoDB to Tiger Data? Migrate if you're running MongoDB alongside PostgreSQL for analytics, experiencing slow aggregation queries (>2 seconds), managing time-series data with manual partitioning, or need complex JOIN operations across collections. MongoDB's $lookup (join operations) is typically slower than PostgreSQL's native joins, especially for complex multi-table joins used in analytical queries—benchmarks show PostgreSQL delivers 4-15x faster performance on OLTP workloads and complex aggregations when data is properly structured.
Stay with MongoDB if your workload consists primarily of simple document lookups with minimal cross-collection queries, your schema changes multiple times per week, your application benefits from MongoDB's native sharding and replica set tooling, or your team has deep MongoDB expertise and limited need for complex relational queries or SQL-based analytics.
| Feature | MongoDB | Tiger Data |
| --- | --- | --- |
| Data Model | Document collections (BSON) | Relational tables + JSONB support |
| Partitioning | Manual sharding with shard keys | Automatic time-based chunking (Hypertables) |
| Aggregations | Aggregation pipeline | Standard SQL GROUP BY with query planner |
| Joins | $lookup aggregation stage | Native SQL JOINs with foreign keys |
| Time-Series | Time-series collections (block processing) | Hypertables with columnar compression |
| Compression | WiredTiger compression | Native columnar compression (90%+ reduction) |
| Foundation | MongoDB 8.2+ (December 2025) | PostgreSQL 18 + TimescaleDB 2.24.x |
Key architectural advantage: Tiger Data is built 100% on PostgreSQL with the TimescaleDB extension, meaning you get full PostgreSQL features plus specialized time-series optimizations. When you create a Hypertable, Tiger Data automatically partitions data into time-based chunks (default: 7 days) and manages them transparently. Queries use standard SQL, and the query planner automatically scans only relevant chunks.
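As a quick illustration of this chunking behavior, here is a hedged sketch (it reuses the sensor_readings hypertable created later in this guide, and the interval value is just an example) that sets the chunk interval explicitly and lists the resulting chunks:
-- Override the default chunk interval when creating the hypertable (illustrative value)
SELECT create_hypertable('sensor_readings', 'time', chunk_time_interval => INTERVAL '7 days');
-- List the chunks the planner can scan or skip per query
SELECT show_chunks('sensor_readings');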
Before migrating, document your current architecture (a short inventory script follows this checklist):
Identify collections with time-series data (require Hypertables)
Map relationships between collections (candidates for foreign keys)
Analyze aggregation pipeline queries (convert to SQL GROUP BY)
Determine which fields are truly schemaless (use JSONB) vs structured
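The following is a minimal inventory sketch, assuming a local MongoDB instance and a database named production_db (both placeholders), that lists each collection's approximate size and the top-level fields of a sample document. It is a starting point for deciding which collections map to JSONB, Hypertables, or normalized tables:
from pymongo import MongoClient

# Placeholder connection details; adjust to your environment
client = MongoClient('mongodb://localhost:27017/')
db = client['production_db']

# Report approximate document counts and sampled top-level fields per collection
for name in db.list_collection_names():
    coll = db[name]
    count = coll.estimated_document_count()
    sample = coll.find_one() or {}
    print(f"{name}: ~{count} docs, sample fields: {sorted(sample.keys())}")

client.close()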
Prerequisites:
MongoDB 4.0+ (mongodump compatibility)
PostgreSQL 15, 16, or 17
Tiger Data (TimescaleDB 2.20.0+ extension)
Python 3.8+ with pymongo and psycopg2 libraries
-- On PostgreSQL 15, 16, or 17, install the TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Verify installation
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';
-- Expected: 2.24.0 or later (December 2025)
For time-series data (IoT sensors, metrics, logs):
-- Create table, then convert to Hypertable
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL,
location TEXT NOT NULL,
temperature NUMERIC(5,2),
humidity NUMERIC(5,2)
);
-- Convert to Hypertable (automatic partitioning by time)
SELECT create_hypertable('sensor_readings', 'time');
-- Create indexes for query patterns
CREATE INDEX idx_sensor_location ON sensor_readings (sensor_id, location, time DESC);
For flexible document data:
-- JSONB approach for rapid migration
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL UNIQUE,
data JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL
);
-- GIN index for JSONB queries
CREATE INDEX idx_users_data ON users USING GIN (data);
For relational data (orders, customers):
-- Normalized relational schema
CREATE TABLE users (
user_id TEXT PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
order_id TEXT PRIMARY KEY,
user_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
amount NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL,
order_date TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date DESC);
from pymongo import MongoClient
import psycopg2
from psycopg2.extras import execute_batch
import os

# MongoDB connection
mongo_client = MongoClient('mongodb://localhost:27017/')
mongo_db = mongo_client['production_db']

# Tiger Data connection
pg_conn = psycopg2.connect(
    host='your-instance.tigerdata.cloud',
    database='production',
    user='postgres',
    password=os.getenv('TIGERDATA_PASSWORD')
)
pg_cursor = pg_conn.cursor()

# Batch insert for time-series data
batch_size = 1000
batch = []
total_inserted = 0

INSERT_QUERY = """
    INSERT INTO sensor_readings (time, sensor_id, location, temperature, humidity)
    VALUES (%s, %s, %s, %s, %s)
"""

try:
    # Get total count for progress tracking
    total_docs = mongo_db['sensor_readings'].count_documents({})
    print(f"Total documents to migrate: {total_docs}")

    for doc in mongo_db['sensor_readings'].find().batch_size(batch_size):
        # Handle missing fields with defaults (None maps to SQL NULL)
        batch.append((
            doc.get('timestamp'),
            doc.get('sensorId'),
            doc.get('location'),
            doc.get('temperature'),
            doc.get('humidity')
        ))

        if len(batch) >= batch_size:
            execute_batch(pg_cursor, INSERT_QUERY, batch)
            pg_conn.commit()
            total_inserted += len(batch)
            print(f"Inserted {total_inserted}/{total_docs} records ({total_inserted/total_docs*100:.1f}%)")
            batch = []

    # Insert remaining records
    if batch:
        execute_batch(pg_cursor, INSERT_QUERY, batch)
        pg_conn.commit()
        total_inserted += len(batch)

    print(f"Migration complete: {total_inserted} records inserted")

except Exception as e:
    print(f"Error during migration: {e}")
    pg_conn.rollback()
    raise

finally:
    pg_cursor.close()
    pg_conn.close()
    mongo_client.close()
-- Enable compression on Hypertable
ALTER TABLE sensor_readings SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time DESC'
);
-- Automatic compression policy (compress data older than 7 days)
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
-- Data retention policy (drop data older than 1 year)
SELECT add_retention_policy('sensor_readings', INTERVAL '1 year');
-- Verify compression status (run after compression has executed)
SELECT
chunk_schema,
chunk_name,
compression_status,
pg_size_pretty(before_compression_total_bytes) AS before,
pg_size_pretty(after_compression_total_bytes) AS after,
ROUND(100 - (after_compression_total_bytes::numeric /
before_compression_total_bytes * 100), 1) AS compression_ratio
FROM chunk_compression_stats('sensor_readings');
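If you want to see compression stats right after a bulk migration instead of waiting for the policy job to run, you can compress eligible chunks manually. A minimal sketch, assuming the same 7-day threshold as the policy above:
-- One-off manual compression of chunks older than 7 days
SELECT compress_chunk(c)
FROM show_chunks('sensor_readings', older_than => INTERVAL '7 days') AS c;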
-- Pre-compute hourly averages
CREATE MATERIALIZED VIEW sensor_hourly_avg
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
sensor_id,
AVG(temperature) AS avg_temp,
AVG(humidity) AS avg_humidity
FROM sensor_readings
GROUP BY bucket, sensor_id;
-- Add refresh policy (runs every hour, materializing data from 2 hours ago up to 1 hour ago)
SELECT add_continuous_aggregate_policy('sensor_hourly_avg',
start_offset => INTERVAL '2 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
-- Verify real-time aggregates are enabled (default behavior in TimescaleDB 2.0+)
SELECT view_name, materialized_only
FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'sensor_hourly_avg';
-- Expected: materialized_only = false (real-time enabled)
-- Query continuous aggregate
-- Combines pre-computed data with real-time data automatically
SELECT * FROM sensor_hourly_avg
WHERE bucket > NOW() - INTERVAL '24 hours'
AND sensor_id = 'temp_sensor_01'
ORDER BY bucket DESC;
MongoDB aggregation pipeline:
// MongoDB
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$category",
totalSales: { $sum: "$amount" },
avgOrderValue: { $avg: "$amount" }
}
},
{ $sort: { totalSales: -1 } }
])
Tiger Data SQL equivalent:
-- Tiger Data (significantly faster for complex multi-table aggregations)
SELECT
category,
SUM(amount) as total_sales,
AVG(amount) as avg_order_value,
COUNT(*) as order_count
FROM orders
WHERE status = 'completed'
GROUP BY category
ORDER BY total_sales DESC;
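The same SQL extends naturally to multi-table aggregations, which is where MongoDB usually falls back to $lookup stages or application-level joins. A sketch using the users and orders tables defined earlier in this guide:
-- Revenue per user over the last 30 days, using a native JOIN
SELECT
    u.email,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS total_spent
FROM users u
JOIN orders o ON o.user_id = u.user_id
WHERE o.status = 'completed'
  AND o.order_date > NOW() - INTERVAL '30 days'
GROUP BY u.email
ORDER BY total_spent DESC;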
JSONB queries:
-- Query nested JSONB fields
SELECT * FROM users
WHERE data->'profile'->>'name' = 'Alex Chen';
-- Query array elements
SELECT * FROM users
WHERE data->'tags' ? 'premium';
-- Extract specific fields
SELECT
user_id,
data->>'email' as email,
data->'profile'->>'name' as name
FROM users;
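If one JSONB field dominates your lookups, an expression index can be a useful complement to the GIN index; a small sketch, reusing the field names from the examples above:
-- B-tree expression index for exact-match lookups on a single JSONB path
CREATE INDEX idx_users_profile_name ON users ((data->'profile'->>'name'));
-- Equality queries on that path can now use the expression index
SELECT user_id FROM users
WHERE data->'profile'->>'name' = 'Alex Chen';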
-- Compare record counts
SELECT COUNT(*) FROM sensor_readings;
-- Analyze query performance
EXPLAIN ANALYZE
SELECT
time_bucket('1 hour', time) AS hour,
AVG(temperature) as avg_temp
FROM sensor_readings
WHERE time > NOW() - INTERVAL '24 hours'
AND location = 'warehouse_a'
GROUP BY hour
ORDER BY hour DESC;
-- Monitor compression effectiveness
SELECT
total_chunks,
number_compressed_chunks,
pg_size_pretty(before_compression_total_bytes) as uncompressed,
pg_size_pretty(after_compression_total_bytes) as compressed
FROM hypertable_compression_stats('sensor_readings');
| Query Type | MongoDB Performance | Tiger Data Performance | Improvement |
| --- | --- | --- | --- |
| Simple document fetch by ID | ~1ms (optimal) | ~1ms (comparable) | Equal |
| Time-series aggregation (1M rows) | 2-5 seconds | 300-500ms | 4-10x faster |
| Complex JOIN across 3+ tables | Limited ($lookup stages or app-level joins) | 100-300ms (native joins) | N/A |
| Full-text search | Text indexes required | GIN indexes + tsvector | Comparable |
| Geo-spatial queries | Supported via 2dsphere | PostGIS extension required | Depends on workload |
Note: Performance varies by query complexity, data distribution, and indexing strategy. MongoDB 8.0 introduced block processing for time-series collections, improving aggregation performance by 200%, but PostgreSQL’s query optimizer remains superior for analytical workloads requiring multiple joins and complex aggregations.
Tiger Data’s columnar compression typically achieves 80-97% storage reduction for time-series data, with results depending on data characteristics and configuration.
Verified examples:
Industrial IoT: 64 GB daily → ~2 GB (97% reduction)
OpenStreetMap data: 607 MB → 45 MB (93% reduction)
Average savings: 5x compression ratio (80% reduction)
Best compression candidates: Repetitive sensor readings, sequential timestamps, low-cardinality fields (device types, status codes), numeric measurements with predictable ranges.
Lower compression: High-cardinality text (UUIDs, random strings), encrypted data, spatial coordinates, frequently updated records.
Configuration tip: Set compress_segmentby to match WHERE clause filters and compress_orderby to time-based columns. See compression best practices for detailed guidance.
Reality check: The 90%+ rates represent optimal scenarios. Production deployments typically see 60-85% compression for mixed workloads. Always test on representative data samples.
Over-normalizing flexible data: Not all MongoDB documents need relational normalization. Use JSONB for genuinely variable schema fields.
Incorrect compression settings: Match compress_segmentby to your WHERE clause patterns. Poor segmentation forces decompression of unnecessary data. See best practices for using compress_segmentby here.
Missing indexes: Tiger Data relies on indexes for query performance. Create indexes on Hypertables before large data loads (or defer until after bulk insert).
Ignoring continuous aggregate limitations: Continuous aggregates don’t support DISTINCT, ORDER BY within aggregates, or window functions. Use standard materialized views for complex queries (see the sketch below).
Not testing in staging: Schema design decisions are difficult to reverse. Test migration approaches with representative data before production deployment.
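For queries that hit the continuous aggregate limitations above (window functions, for example), a plain PostgreSQL materialized view still works; a minimal sketch, with the refresh scheduled outside TimescaleDB (cron, pg_cron, or similar):
-- Standard materialized view: window functions are allowed here
CREATE MATERIALIZED VIEW sensor_daily_rank AS
SELECT
    time_bucket('1 day', time) AS day,
    sensor_id,
    AVG(temperature) AS avg_temp,
    RANK() OVER (
        PARTITION BY time_bucket('1 day', time)
        ORDER BY AVG(temperature) DESC
    ) AS temp_rank
FROM sensor_readings
GROUP BY day, sensor_id;

-- Refresh on your own schedule; not managed by TimescaleDB policies
REFRESH MATERIALIZED VIEW sensor_daily_rank;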
In some cases, you may not need to eliminate MongoDB entirely. Consider hybrid approaches:
MongoDB: Operational data with frequent schema changes, user profiles, content management
Tiger Data: Analytical queries, time-series data, reporting, dashboards
Sync strategy: Use change data capture (CDC) tools like Debezium to replicate MongoDB changes to Tiger Data for analytics
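For a lighter-weight starting point than a full Debezium deployment, MongoDB change streams (which require a replica set) can feed Tiger Data directly. A hypothetical sketch; the collection, table, and field names are illustrative, and production use needs resume tokens, batching, and error handling:
from pymongo import MongoClient
import psycopg2
import json
import os

mongo_db = MongoClient('mongodb://localhost:27017/')['production_db']
pg_conn = psycopg2.connect(
    host='your-instance.tigerdata.cloud',
    database='production',
    user='postgres',
    password=os.getenv('TIGERDATA_PASSWORD'),
)

# Tail inserts on one collection and mirror them into the JSONB users table
with mongo_db['users'].watch([{'$match': {'operationType': 'insert'}}]) as stream:
    for change in stream:
        doc = change['fullDocument']
        doc.pop('_id', None)  # drop the ObjectId; default=str below handles other BSON types
        with pg_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO users (user_id, data, created_at) VALUES (%s, %s, NOW())",
                (doc.get('userId'), json.dumps(doc, default=str)),
            )
        pg_conn.commit()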
Migrating from MongoDB to Tiger Data consolidates infrastructure while delivering measurable performance improvements for time-series and analytical workloads. The migration process requires careful schema design and query translation, but the operational simplification—one database, one query language, one monitoring system—reduces long-term maintenance burden. Tiger Data’s PostgreSQL foundation provides mature ecosystem integration, while its time-series extensions (Hypertables, compression, continuous aggregates) match or exceed specialized time-series databases.
Ready to consolidate your database infrastructure? Eliminate dual-database complexity and reduce operational costs by 60-80%. Start your free Tiger Data migration—most teams complete their first production cutover within 2 weeks following structured approaches. Access step-by-step migration guides, schema design templates, and dedicated support at no cost during your trial.
Q: How long does MongoDB to Tiger Data migration take?
A: Migration duration varies significantly based on schema complexity, data volume, transformation requirements, and infrastructure. Factors include: document structure complexity, number of collections, required schema normalization, indexing strategy, and network bandwidth. Plan for thorough testing in a staging environment before estimating production timelines.
Q: Can I migrate MongoDB to Tiger Data without downtime?
A: Yes, using a dual-write strategy. Write to both databases simultaneously, backfill historical data, validate consistency, switch read traffic to Tiger Data, then decommission MongoDB. This requires application code changes to handle dual writes and conflict resolution.
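As a rough illustration of that dual-write pattern (the function, collection, and field names below are hypothetical; real implementations add retries, idempotency keys, and reconciliation jobs):
import json

def create_user(mongo_db, pg_conn, user_id, profile):
    """Write to MongoDB first, then mirror into the Tiger Data users table."""
    mongo_db['users'].insert_one({'userId': user_id, **profile})
    try:
        with pg_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO users (user_id, data, created_at) VALUES (%s, %s, NOW()) "
                "ON CONFLICT (user_id) DO UPDATE SET data = EXCLUDED.data",
                (user_id, json.dumps(profile)),
            )
        pg_conn.commit()
    except Exception as exc:
        pg_conn.rollback()
        # In production, push the failed write to a retry queue for reconciliation
        print(f"Dual-write to Tiger Data failed for {user_id}: {exc}")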
Q: How do I convert MongoDB documents to SQL tables?
A: Three approaches: (1) Store entire documents in JSONB columns for rapid migration, (2) Convert to Hypertables for time-series data with automatic partitioning, (3) Normalize into relational tables with foreign keys for analytical workloads. Choose based on query patterns and data relationships.
Q: Is Tiger Data faster than MongoDB for time-series data?
A: For analytical aggregations on time-series data, Tiger Data typically outperforms MongoDB by 4-15x due to columnar compression, automatic chunk exclusion, and SQL query optimization. MongoDB 8.0 improved time-series performance with block processing (200% faster aggregations), but PostgreSQL’s mature query planner excels at complex analytical queries. MongoDB may perform better for simple document retrieval and denormalized data access patterns.
Q: What’s the storage difference between MongoDB and Tiger Data?
A: Tiger Data’s columnar compression typically achieves 90-98% storage reduction compared to uncompressed PostgreSQL tables for time-series data. Real-world example: a 10TB MongoDB time-series collection often compresses to 1-2TB in Tiger Data while maintaining or improving query performance due to reduced I/O.
About the author: Jakkie Koekemoer is the Engineering Manager at Draft.dev. With more than 20 years of experience as a developer, data engineer, and manager, he's committed to solving real problems and bringing the best out of people. He lives in Cape Town, South Africa.