---
title: Migrate from PostgreSQL using dual-write and backfill | Tiger Data Docs
description: Migrate from a PostgreSQL database to Tiger Cloud using the low-downtime dual-write and backfill method
---

This document provides step-by-step instructions to migrate data from a source PostgreSQL database to Tiger Cloud using the [dual-write and backfill](/docs/migrate/dual-write-and-backfill/index.md) migration method.

Note

In the context of migrations, your existing production database is referred to as the SOURCE database, and the Tiger Cloud service that you are migrating your data to is referred to as the TARGET database.

In detail, the migration process consists of the following steps:

1. Set up a target Tiger Cloud service.
2. Modify the application to write to the target database.
3. Migrate schema and relational data from source to target.
4. Start the application in dual-write mode.
5. Determine the completion point `T`.
6. Backfill time-series data from source to target.
7. Validate that all data is present in the target database.
8. Validate that the target database can handle production load.
9. Switch the application to treat the target database as primary (potentially continuing to write into the source database as a backup).

Note

If you get stuck, you can get help by opening a support request, or by taking your issue to the `#migration` channel in the [community slack](https://slack.timescale.com/), where the developers of this migration method are ready to help.

You can open a support request directly from [Tiger Console](https://console.cloud.tigerdata.com/dashboard/support/cases), or by email to <support@tigerdata.com>.

## 1. Set up a target database instance in Tiger Cloud

[Create a Tiger Cloud service](/docs/get-started/quickstart/create-service/index.md).

If you intend to migrate more than 400 GB, open a support request to ensure that enough disk is pre-provisioned on your Tiger Cloud service.

You can open a support request directly from [Tiger Console](https://console.cloud.tigerdata.com/dashboard/support/cases), or by email to <support@tigerdata.com>.

## 2. Modify the application to write to the target database

How exactly to do this depends on the language your application is written in, and on how your ingestion and application function. In the simplest case, you execute two inserts in parallel. In the general case, you must think about how to handle a failure to write to either the source or target database, and what mechanism you can or want to build to recover from such a failure.

Should your time-series data have foreign-key references into a plain table, you must ensure that your application correctly maintains the foreign key relations. If the referenced column is a `*SERIAL` type, the same row inserted into the source and target *may not* obtain the same autogenerated id. If this happens, the data backfilled from the source to the target is internally inconsistent. In the best case this causes a foreign key violation; in the worst case, the foreign key constraint is maintained but the data references the wrong rows. To avoid these issues, the best practice is to follow the [live migration](/docs/migrate/live-migration/index.md) procedure instead.

You may also want to execute the same read queries on the source and target databases to evaluate the correctness and performance of the results that the queries deliver. Bear in mind that the target database spends a certain amount of time without all data present, so you should expect the results to differ for some period (potentially a number of days).
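
To illustrate the idea only, here is a minimal shell sketch of dual-writing a single row, assuming a hypothetical `metrics` table with `time` and `value` columns. A real application would do this in its own language, with retry or queueing logic in place of the `echo` placeholders:

```
# Hypothetical dual-write: send the same row to both databases and
# surface failures so the application can decide how to recover.
INSERT_SQL="INSERT INTO metrics (time, value) VALUES (now(), 42.0);"
psql $SOURCE -c "$INSERT_SQL" || echo "write to source failed: recover/retry here"
psql $TARGET -c "$INSERT_SQL" || echo "write to target failed: recover/retry here"
```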

## 3. Set up schema and migrate relational data to the target database

You probably want to convert some of your large tables containing time-series data into hypertables. This step consists of identifying those tables, excluding their data from the database dump, copying the database schema and relational data, and setting up the time-series tables as hypertables. The data is backfilled into these hypertables in a subsequent step.

Note

For the sake of convenience, connection strings to the source and target databases are referred to as `$SOURCE` and `$TARGET` throughout this guide.

These can be set in your shell, for example:

```
export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
export TARGET="postgres://<user>:<password>@<target host>:<target port>/<db_name>"
```

### 3a. dump the database roles from the source database

```
pg_dumpall -d "$SOURCE" \
  --quote-all-identifiers \
  --roles-only \
  --file=roles.sql
```

Tiger Cloud services do not support roles with superuser access. If your SQL dump includes roles that have such permissions, you’ll need to modify the file to be compliant with Tiger Cloud’s security model.

You can use the following `sed` command to remove unsupported statements and permissions from your `roles.sql` file:

```
sed -i -E \
-e '/CREATE ROLE "postgres";/d' \
-e '/ALTER ROLE "postgres"/d' \
-e '/CREATE ROLE "tsdbadmin";/d' \
-e '/ALTER ROLE "tsdbadmin"/d' \
-e 's/(NO)*SUPERUSER//g' \
-e 's/(NO)*REPLICATION//g' \
-e 's/(NO)*BYPASSRLS//g' \
-e 's/GRANTED BY "[^"]*"//g' \
roles.sql
```

Note

This command works only with the GNU implementation of sed (sometimes referred to as gsed). For the BSD implementation (the default on macOS), you need to pass an empty string argument to the `-i` flag: `-i ''`.

To check the sed version, you can use the command `sed --version`. While the GNU version explicitly identifies itself as GNU, the BSD version of sed generally doesn’t provide a straightforward `--version` flag and simply outputs an “illegal option” error.

A brief explanation of this script is:

- `CREATE ROLE "postgres";` and `ALTER ROLE "postgres"`: These statements are removed because they require superuser access, which is not supported by Tiger Cloud.

- `CREATE ROLE "tsdbadmin";` and `ALTER ROLE "tsdbadmin"`: These statements are removed because the `tsdbadmin` role already exists on your Tiger Cloud service.

- `(NO)SUPERUSER` | `(NO)REPLICATION` | `(NO)BYPASSRLS`: These role attributes can only be set with superuser access, so the statements that set them are removed.

- `GRANTED BY role_specification`: The `GRANTED BY` clause can also carry permissions that require superuser access and should therefore be removed. Note: according to the PostgreSQL documentation, the GRANTOR in the `GRANTED BY` clause must be the current user, and this clause mainly serves the purpose of SQL compatibility. Therefore, it’s safe to remove it.
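
As an illustration, assuming `roles.sql` contained a hypothetical role such as the one below, the `sed` script would strip the superuser-only attributes as shown:

```
-- Before (as dumped from the source):
ALTER ROLE "readonly" WITH NOSUPERUSER NOCREATEDB NOREPLICATION PASSWORD 'md5...';

-- After (NOSUPERUSER and NOREPLICATION removed by the sed script):
ALTER ROLE "readonly" WITH  NOCREATEDB  PASSWORD 'md5...';
```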

### 3b. Determine which tables to convert to hypertables

Ideal candidates for hypertables are large tables containing time-series data. This is usually data with some form of timestamp value (`TIMESTAMPTZ`, `TIMESTAMP`, `BIGINT`, `INT` etc.) as the primary dimension, and some other measurement values.
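
If you are unsure which tables these are, a standard PostgreSQL catalog query along the following lines lists the largest tables in the source database (this is a generic query, not specific to this migration):

```
-- The 20 largest tables by total size; large, append-mostly tables keyed
-- by a timestamp column are typical hypertable candidates.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
```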

### 3c. Dump all tables from the source database, excluding data from hypertable candidates

```
pg_dump -d "$SOURCE" \
  --format=plain \
  --quote-all-identifiers \
  --no-tablespaces \
  --no-owner \
  --no-privileges \
  --exclude-table-data=<table name or pattern> \
  --file=dump.sql
```

- `--exclude-table-data` is used to exclude all data from hypertable candidates. You can either specify a table pattern, or specify `--exclude-table-data` multiple times, once for each table to be converted.

- `--no-tablespaces` is required because Tiger Cloud does not support tablespaces other than the default. This is a known limitation.

- `--no-owner` is required because Tiger Cloud’s `tsdbadmin` user is not a superuser and cannot assign ownership in all cases. This flag means that everything is owned by the user used to connect to the target, regardless of ownership in the source. This is a known limitation.

- `--no-privileges` is required because the `tsdbadmin` user for your Tiger Cloud service is not a superuser and cannot assign privileges in all cases. This flag means that privileges assigned to other users must be reassigned in the target database as a manual clean-up task. This is a known limitation.

### 3d. Load the roles and schema into the target database

```
psql -X -d "$TARGET" \
  -v ON_ERROR_STOP=1 \
  --echo-errors \
  -f roles.sql \
  -f dump.sql
```

### 3e. Convert the plain tables to hypertables and optionally compress data in the columnstore

For each table which should be converted to a hypertable in the target database, execute:

```
SELECT create_hypertable('<table name>', by_range('<time column name>'));
```

Tips

The `by_range` dimension builder was added in TimescaleDB 2.13. For simpler cases, like this one, you can also create the hypertable using the old syntax:

```
SELECT create_hypertable('<table name>', '<time column name>');
```

For more information about the options which you can pass to `create_hypertable`, consult the [create\_hypertable API reference](/docs/reference/timescaledb/hypertables/create_hypertable/index.md). For more information about hypertables in general, consult the [hypertable documentation](/docs/learn/hypertables/understand-hypertables/index.md).

You may also want to take advantage of some of Tiger Cloud’s killer features, such as:

- [Retention policies](/docs/learn/data-lifecycle/data-retention/about-data-retention/index.md) to automatically drop unneeded data
- [Tiered storage](/docs/build/data-management/storage/manage-storage/index.md) to automatically move data to Tiger Cloud’s low-cost bottomless object storage tier
- [Hypercore](/docs/learn/columnar-storage/understand-hypercore/index.md) to reduce the size of your hypertables by compressing data in the columnstore
- [Continuous aggregates](/docs/build/continuous-aggregates/create-a-continuous-aggregate/index.md) to write blisteringly fast aggregate queries on your data
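
For example, a minimal sketch that adds a retention policy to a hypothetical `public.metrics` hypertable, automatically dropping chunks once all of their data is older than one year:

```
SELECT add_retention_policy('public.metrics', INTERVAL '1 year');
```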

## 4. Start the application in dual-write mode

With the target database set up, your application can now be started in dual-write mode.

## 5. Determine the completion point `T`

After dual-writes have been executing for a while, the target hypertable contains data in three time ranges: missing writes, late-arriving data, and the “consistency” range.

![Hypertable dual-write time ranges: missing writes, late-arriving data, and consistency range](/docs/_astro/hypertable_backfill_consistency.DVwajHFy_Z1veL2z.webp)

### Missing writes

If the application is made up of multiple writers, and these writers did not all simultaneously start writing into the target hypertable, there is a period of time in which not all writes have made it into the target hypertable. This period starts when the first writer begins dual-writing, and ends when the last writer begins dual-writing.

### Late-arriving data

Some applications have late-arriving data: measurements which have a timestamp in the past, but which haven’t been written yet (for example, from devices which had intermittent connectivity issues). The window of late-arriving data is between the present moment and the maximum lateness.

### Consistency range

The consistency range is the range in which there are no missing writes and in which all data has arrived, that is, between the end of the missing writes range and the beginning of the late-arriving data range.

The length of these ranges is defined by the properties of the application; there is no one-size-fits-all way to determine them.

### Completion point

The completion point `T` is an arbitrarily chosen time in the consistency range. It is the point in time to which data can safely be backfilled, ensuring that there is no data loss.

The completion point should be expressed as the type of the `time` column of the hypertables to be backfilled. For instance, if you’re using a `TIMESTAMPTZ` `time` column, then the completion point may be `2023-08-10T12:00:00.00Z`. If you’re using a `BIGINT` column it may be `1695036737000`.

If you are using a mix of types for the `time` columns of your hypertables, you must determine the completion point for each type individually, and backfill each set of hypertables with the same type independently from those of other types.

## 6. Backfill data from source to target

Dump the data from your source database on a per-table basis into CSV format, and restore those CSVs into the target database using the `timescaledb-parallel-copy` tool.

### 6a. Determine the time range of data to be copied

Determine the window of data to be copied from the source database to the target. Depending on the volume of data in the source table, it may be sensible to split the source table into multiple chunks of data to move independently. In the following steps, this time range is delimited by `<start>` and `<end>`.

Usually the `time` column is of type `timestamp with time zone`, so the values of `<start>` and `<end>` must be something like `2023-08-01T00:00:00Z`. If the `time` column is not a `timestamp with time zone` then the values of `<start>` and `<end>` must be the correct type for the column.

If you intend to copy all historic data from the source table, then the value of `<start>` can be `'-infinity'`, and the `<end>` value is the value of the completion point `T` that you determined.
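
To find a suitable `<start>`, you can query the source table directly, for example:

```
-- The earliest row in the source table; this (or '-infinity') can serve
-- as <start>, with the completion point T as <end>.
SELECT min(time) FROM <source table>;
```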

### 6b. Remove overlapping data in the target

The dual-write process may have already written data into the target database in the time range that you want to move. In this case, the dual-written data must be removed. This can be achieved with a `DELETE` statement, as follows:

```
psql $TARGET -c "DELETE FROM <hypertable> WHERE time >= <start> AND time < <end>;"
```

Tips

The `BETWEEN` operator is inclusive of both its start and end bounds, so deleting with `BETWEEN` would also remove rows with `time` exactly equal to `<end>`, which the subsequent copy (bounded by `time < <end>`) does not re-insert. Use explicit `>=` and `<` comparisons instead.

### 6c. Copy the data with a streaming copy

Execute the following command, replacing `<source table>` and `<hypertable>` with the fully qualified names of the source table and target hypertable respectively:

```
psql $SOURCE -c "\copy (SELECT * FROM <source table> WHERE time >= <start> AND time < <end>) TO stdout WITH (FORMAT CSV)" \
  | timescaledb-parallel-copy \
    --connection $TARGET \
    --table <hypertable> \
    --log-batches \
    --batch-size=1000 \
    --workers=4
```

The above command is not transactional. If a connection issue, or some other problem, causes it to stop copying, the partially copied rows must be removed from the target (using the instructions in step 6b above) before the copy is restarted.

### 6d. Enable policies that compress data in the target hypertable

In the following command, replace `<hypertable>` with the fully qualified table name of the target hypertable, for example `public.metrics`:

```
psql -d $TARGET -v hypertable=<hypertable> -f - <<'EOF'
SELECT public.alter_job(j.id, scheduled=>true)
FROM _timescaledb_config.bgw_job j
JOIN _timescaledb_catalog.hypertable h ON h.id = j.hypertable_id
WHERE j.proc_schema IN ('_timescaledb_internal', '_timescaledb_functions')
  AND j.proc_name = 'policy_compression'
  AND j.id >= 1000
  AND format('%I.%I', h.schema_name, h.table_name)::text::regclass = :'hypertable'::text::regclass;
EOF
```

## 7. Validate that all data is present in the target database

Now that all data has been backfilled, and the application is writing data to both databases, the contents of both databases should be the same. How best to validate this depends on your application.

If you are reading from both databases in parallel for every production query, you could consider adding an application-level validation that both databases are returning the same data.

Another option is to compare the number of rows in the source and target tables, although this reads all data in the table, which may have an impact on your production workload.
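
For example, running the same bounded count against both databases and comparing the results, with `<end>` set to the completion point `T`:

```
-- Run against both $SOURCE and $TARGET; the counts should match.
SELECT count(*) FROM <hypertable> WHERE time < <end>;
```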

Another option is to run `ANALYZE` on both the source and target tables and then look at the `reltuples` column of the `pg_class` table. The result is not exact, but doesn’t require reading all rows from the table. Note: for hypertables, `reltuples` is tracked per chunk, so you must take the sum of `reltuples` over all chunks belonging to the hypertable. If a chunk is compressed in one database but not the other, this check cannot be used.
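
On the target, a sketch like the following sums `reltuples` over a hypertable’s chunks after `ANALYZE` has been run (on the source, you can read `reltuples` from `pg_class` for the plain table directly). TimescaleDB also ships an `approximate_row_count()` helper that serves the same purpose:

```
-- Approximate row count of a hypertable: the sum of reltuples over its chunks.
SELECT sum(c.reltuples)
FROM show_chunks('<hypertable>') AS chunk
JOIN pg_class c ON c.oid = chunk;
```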

## 8. Validate that the target database can handle production load

Now that dual-writes have been in place for a while, the target database should be holding up to production write traffic. Now would be the right time to determine if the target database can serve all production traffic (both reads *and* writes). How exactly this is done is application-specific and up to you to determine.

## 9. Switch production workload to the target database

Once you’ve validated that all the data is present, and that the target database can handle the production workload, the final step is to switch to the target database as your primary. You may want to continue writing to the source database for a period, until you are certain that the target database is holding up to all production traffic.
