---
title: Integrate data lakes with Tiger Cloud | Tiger Data Docs
description: Unify the Tiger Cloud operational architecture with data lake architectures
---

The Tiger Cloud Iceberg connector lets you build real-time applications and manage data pipelines efficiently within a single system.

![Tiger Lake architecture: Iceberg connector in Tiger Cloud](/docs/_astro/iceberg-connector-tiger-cloud.DVgGpPun.svg)

The Tiger Cloud Iceberg connector is a native integration that synchronizes hypertables and relational tables in your Tiger Cloud services to Iceberg tables in [Amazon S3 Tables](https://aws.amazon.com/s3/features/tables/) in your AWS account.

## Prerequisites

To follow the steps on this page:

- Create a target [Tiger Cloud service](/docs/get-started/quickstart/create-service/index.md) with the Real-time analytics capability.

  You need your [connection details](/docs/integrate/find-connection-details/index.md).
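The Limitations section requires PostgreSQL 17.6 or later. The following sketch checks the version of your service. It assumes `psql` is installed and that you pass your service connection string, and the `check_pg_version` function name is hypothetical. The check reads the standard PostgreSQL `server_version_num` setting, which reports version 17.6 as `170006`.

```shell
# Hypothetical helper: succeeds if the service runs PostgreSQL 17.6 or later.
# Assumes psql is on PATH. $1 = your service connection string.
check_pg_version() {
  local version_num
  # server_version_num encodes major * 10000 + minor, so 17.6 is 170006.
  version_num=$(psql "$1" -At -c 'SHOW server_version_num') || return 1
  if [ "$version_num" -ge 170006 ]; then
    echo "OK: server_version_num=$version_num"
  else
    echo "Unsupported: server_version_num=$version_num, need 170006 or later" >&2
    return 1
  fi
}

# Usage: check_pg_version "$SERVICE_URL"
```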

**Note:** This feature is not currently supported for Tiger Cloud on Microsoft Azure.

## Integrate a data lake with your service

To connect a Tiger Cloud service to your data lake, use one of the following methods:

- AWS Management Console
- AWS CloudFormation CLI
- Manual configuration

### AWS Management Console

1. **Set the AWS region to host your table bucket**

   1. In [AWS CloudFormation](https://console.aws.amazon.com/cloudformation/), select the current AWS region at the top-right of the page.
   2. Set it to the region you want to create your table bucket in.

   **This must match the region your Tiger Cloud service is running in**: if the regions do not match, AWS charges you for cross-region data transfer.

2. **Create your CloudFormation stack**

   1. Click `Create stack`, then select `With new resources (standard)`.

   2. In `Amazon S3 URL`, paste the following URL, then click `Next`.

      ```
      https://tigerlake.s3.us-east-1.amazonaws.com/tigerlake-connect-cloudformation.yaml
      ```

   3. In `Specify stack details`, enter the following details, then click `Next`:

      - `Stack Name`: a name for this CloudFormation stack
      - `BucketName`: a name for this S3 table bucket
      - `ProjectID` and `ServiceID`: the project ID and service ID of the Tiger Cloud service to connect. See [connection details](/docs/integrate/find-connection-details/index.md#find-your-project-and-service-id).

   4. In `Configure stack options`, check `I acknowledge that AWS CloudFormation might create IAM resources`, then click `Next`.

   5. In `Review and create`, click `Submit`, then wait for the deployment to complete. AWS deploys your stack and creates the S3 table bucket and IAM role.

   6. Click `Outputs`, then copy all four outputs.

3. **Connect your service to the data lake**

   1. In [Tiger Console](https://console.cloud.tigerdata.com/dashboard/services), select the service you want to integrate with AWS S3 Tables, then click `Connectors`.

   2. Select the Apache Iceberg connector and supply:

      - The ARN of the S3 table bucket
      - The ARN of an IAM role with permissions to write to the table bucket

   Provisioning takes a couple of minutes.

### AWS CloudFormation CLI

1. **Create your CloudFormation stack**

   Replace the following values in the command, then run it from the terminal:

   - `Region`: the AWS region to create the S3 table bucket in. This must match the region your Tiger Cloud service is running in.
   - `StackName`: a name for this CloudFormation stack
   - `BucketName`: a name for the S3 table bucket to create
   - `ProjectID`: the project ID of your Tiger Cloud service. See [connection details](/docs/integrate/find-connection-details/index.md#find-your-project-and-service-id).
   - `ServiceID`: the service ID of your Tiger Cloud service. See [connection details](/docs/integrate/find-connection-details/index.md#find-your-project-and-service-id).


   ```shell
   aws cloudformation create-stack \
     --capabilities CAPABILITY_IAM \
     --template-url https://tigerlake.s3.us-east-1.amazonaws.com/tigerlake-connect-cloudformation.yaml \
     --region <Region> \
     --stack-name <StackName> \
     --parameters \
       ParameterKey=BucketName,ParameterValue="<BucketName>" \
       ParameterKey=ProjectID,ParameterValue="<ProjectID>" \
       ParameterKey=ServiceID,ParameterValue="<ServiceID>"
   ```

   If you set up the integration from Tiger Console, the console provides this command with the placeholders already populated, ready to copy and paste.

2. **Connect your service to the data lake**

   1. In [Tiger Console](https://console.cloud.tigerdata.com/dashboard/services), select the service you want to integrate with AWS S3 Tables, then click `Connectors`.

   2. Select the Apache Iceberg connector and supply:

      - The ARN of the S3 table bucket
      - The ARN of an IAM role with permissions to write to the table bucket

   Provisioning takes a couple of minutes.
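The table bucket ARN and role ARN you supply when you connect the service come from the outputs of the stack you created with `create-stack`. To read them from the terminal instead of the AWS console, the following sketch waits for the deployment to finish and then prints the stack outputs. It uses the standard `aws cloudformation wait` and `describe-stacks` commands, and the `show_stack_outputs` function name is hypothetical. Pass the same region and stack name you used with `create-stack`.

```shell
# Hypothetical helper: wait for the stack to finish deploying, then print
# its outputs as key/value pairs.
# $1 = Region, $2 = StackName (the values you passed to create-stack).
show_stack_outputs() {
  # Blocks until the stack reaches CREATE_COMPLETE; fails if creation fails.
  aws cloudformation wait stack-create-complete \
    --region "$1" --stack-name "$2" || return 1
  # Print each output key next to its value, one per line.
  aws cloudformation describe-stacks \
    --region "$1" --stack-name "$2" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output text
}

# Usage: show_stack_outputs <Region> <StackName>
```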

### Manual configuration

1. **Create an S3 table bucket**

   1. Set the AWS region to host your table bucket:

      1. In the [Amazon S3 console](https://console.aws.amazon.com/s3/), select the current AWS region at the top-right of the page.
      2. Set it to the region you want to create your table bucket in.

      **This must match the region your Tiger Cloud service is running in**: if the regions do not match, AWS charges you for cross-region data transfer.

   2. In the left navigation pane, click `Table buckets`, then click `Create table bucket`.

   3. Enter `Table bucket name`, then click `Create table bucket`.

   4. Copy the `Amazon Resource Name (ARN)` for your table bucket.

2. **Create an IAM role**

   1. In the [IAM Dashboard](https://console.aws.amazon.com/iamv2/home), click `Roles`, then click `Create role`.

   2. In `Select trusted entity`, click `Custom trust policy`, then replace the **Custom trust policy** code block with the following:

      ```json
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "arn:aws:iam::142548018081:root"
                  },
                  "Action": "sts:AssumeRole",
                  "Condition": {
                      "StringEquals": {
                          "sts:ExternalId": "<ProjectID>/<ServiceID>"
                      }
                  }
              }
          ]
      }
      ```

      `"Principal": { "AWS": "arn:aws:iam::142548018081:root" }` does not mean `root` access. It delegates trust to the AWS account `142548018081` as a whole, not just to its root user. The `sts:ExternalId` condition means that only `sts:AssumeRole` requests that pass the external ID `<ProjectID>/<ServiceID>` can assume the role.

   3. Replace `<ProjectID>` and `<ServiceID>` with the [project ID and service ID](/docs/integrate/find-connection-details/index.md#find-your-project-and-service-id) of the Tiger Cloud service you want to connect, then click `Next`.

   4. In `Permissions policies`, click `Next`.

   5. In `Role details`, enter `Role name`, then click `Create role`.

   6. In `Roles`, select the role you just created, then click `Add Permissions` > `Create inline policy`.

   7. Select `JSON` then replace the `Policy editor` code block with the following:

      ```json
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "BucketOps",
            "Effect": "Allow",
            "Action": [
              "s3tables:*"
            ],
            "Resource": "<S3TABLE_BUCKET_ARN>"
          },
          {
            "Sid": "BucketTableOps",
            "Effect": "Allow",
            "Action": [
              "s3tables:*"
            ],
            "Resource": "<S3TABLE_BUCKET_ARN>/table/*"
          }
        ]
      }
      ```

   8. Replace `<S3TABLE_BUCKET_ARN>` with the `Amazon Resource Name (ARN)` for the table bucket you just created.

   9. Click `Next`, then give the inline policy a name and click `Create policy`.

3. **Connect your service to the data lake**

   1. In [Tiger Console](https://console.cloud.tigerdata.com/dashboard/services), select the service you want to integrate with AWS S3 Tables, then click `Connectors`.

   2. Select the Apache Iceberg connector and supply:

      - The ARN of the S3 table bucket
      - The ARN of an IAM role with permissions to write to the table bucket

   Provisioning takes a couple of minutes.
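You can also script the manual configuration steps with the AWS CLI. The following sketch assumes the AWS CLI is configured with permissions for S3 Tables and IAM. The function names and the `<role name>-s3tables` inline policy name are hypothetical. The JSON it writes matches the trust policy and permissions policy shown in the steps. Afterward, paste the bucket ARN and role ARN it prints into Tiger Console.

```shell
# Hypothetical helpers that script the manual configuration with the AWS CLI.

# Create the S3 table bucket and print its ARN. $1 = region, $2 = bucket name.
create_table_bucket() {
  aws s3tables create-table-bucket --region "$1" --name "$2" \
    --query arn --output text
}

# Write the custom trust policy. $1 = ProjectID, $2 = ServiceID, $3 = output file.
write_trust_policy() {
  cat > "$3" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::142548018081:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "$1/$2" }
      }
    }
  ]
}
EOF
}

# Write the inline permissions policy. $1 = table bucket ARN, $2 = output file.
write_bucket_policy() {
  cat > "$2" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    { "Sid": "BucketOps", "Effect": "Allow", "Action": ["s3tables:*"], "Resource": "$1" },
    { "Sid": "BucketTableOps", "Effect": "Allow", "Action": ["s3tables:*"], "Resource": "$1/table/*" }
  ]
}
EOF
}

# Create the role, attach the inline policy, and print the role ARN.
# $1 = role name, $2 = trust policy file, $3 = permissions policy file.
create_connector_role() {
  aws iam create-role --role-name "$1" \
    --assume-role-policy-document "file://$2" >/dev/null || return 1
  aws iam put-role-policy --role-name "$1" \
    --policy-name "$1-s3tables" --policy-document "file://$3" || return 1
  aws iam get-role --role-name "$1" --query Role.Arn --output text
}

# Usage:
#   bucket_arn=$(create_table_bucket <Region> <BucketName>)
#   write_trust_policy <ProjectID> <ServiceID> trust.json
#   write_bucket_policy "$bucket_arn" policy.json
#   create_connector_role <RoleName> trust.json policy.json
```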

## Stream data from your service to your data lake

Records are imported in time order, from oldest to newest. To sync to Iceberg, your hypertable or relational table must have a primary key, which can be a composite primary key.
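To check the primary key requirement before you start a sync, you can query the PostgreSQL `pg_constraint` catalog, where `contype = 'p'` marks a primary key constraint. The following sketch assumes `psql` is installed and that you pass a connection string, and `has_primary_key` is a hypothetical name. Pass the table name as you would write it in SQL.

```shell
# Hypothetical helper: succeeds if the table has a primary key (single or composite).
# $1 = connection string, $2 = table name as written in SQL,
# for example my_hypertable or '"Mixed_CASE_TableNAME"'.
has_primary_key() {
  local name result
  # Double any single quotes so the name is safe inside a SQL string literal.
  name=$(printf '%s' "$2" | sed "s/'/''/g")
  result=$(psql "$1" -At -v ON_ERROR_STOP=1 -c \
    "SELECT EXISTS (SELECT 1 FROM pg_constraint
       WHERE conrelid = '$name'::regclass AND contype = 'p');") || return 1
  # psql prints booleans as t or f in unaligned, tuples-only mode.
  [ "$result" = "t" ]
}

# Usage: has_primary_key "$SERVICE_URL" my_hypertable && echo "ready to sync"
```

If the check fails, add a primary key first, for example `ALTER TABLE my_table ADD PRIMARY KEY (id);` with a hypothetical `id` column. On a hypertable, TimescaleDB requires unique constraints, including primary keys, to include the time column, so a composite key such as `PRIMARY KEY (id, ts)` is typical.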

When you start syncing, all data in the table is streamed to Iceberg in two phases:

- Table snapshot: data from a snapshot of the source table is streamed to the destination Iceberg table at approximately 300,000 records a second. For larger tables, import speeds are approximately 1 billion records, or 100 GB of data, an hour. These numbers vary with table width and schema complexity.
- Table changes: changes made to the source table after the snapshot is taken are captured through change data capture (CDC) and streamed to a branch of the destination Iceberg table at approximately 30,000 events a second. The connector can absorb ingest bursts above this rate for a limited time and process the extra events gradually afterward. How long it can do so depends on the duration of the burst and the number of extra events.

Once the snapshot is fully imported, the snapshot and CDC branches of the Iceberg table are merged. Merging takes from a few seconds up to ten minutes for large tables of 5 TB or more. During the merge, new events are held in the write-ahead log (WAL). Once the merge completes, the events held in the WAL are streamed to Iceberg through CDC. As a result, the Iceberg table is eventually consistent with the source table after you start the sync.
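You can use the rates above for a rough estimate. The following sketch uses shell arithmetic and assumes the approximate figures on this page: 300,000 snapshot records a second and 30,000 CDC events a second. The function names are hypothetical. The burst model is also an assumption: excess events queue up and later drain at the spare capacity left by your steady ingest rate. It ignores the limit on how long a burst can be absorbed.

```shell
# Rough estimates based on the approximate rates on this page.
# Actual throughput varies with table width and schema complexity.
SNAPSHOT_ROWS_PER_SEC=300000
CDC_EVENTS_PER_SEC=30000

# Estimated snapshot duration in whole minutes, rounded up. $1 = row count.
estimate_snapshot_minutes() {
  local per_min=$(( SNAPSHOT_ROWS_PER_SEC * 60 ))
  echo $(( ($1 + per_min - 1) / per_min ))
}

# Assumed model: events above the CDC rate queue during a burst, then drain
# at the capacity left over by the steady ingest rate.
# $1 = burst rate (events/s), $2 = burst duration (s), $3 = steady rate (events/s).
estimate_backlog_drain_seconds() {
  local burst="$1" seconds="$2" steady="$3" backlog spare
  if [ "$burst" -le "$CDC_EVENTS_PER_SEC" ]; then
    echo 0        # the burst fits within the CDC rate, so nothing queues
    return 0
  fi
  if [ "$steady" -ge "$CDC_EVENTS_PER_SEC" ]; then
    echo never    # no spare capacity, so the backlog never shrinks
    return 0
  fi
  backlog=$(( (burst - CDC_EVENTS_PER_SEC) * seconds ))
  spare=$(( CDC_EVENTS_PER_SEC - steady ))
  echo $(( (backlog + spare - 1) / spare ))
}
```

For example, `estimate_snapshot_minutes 1000000000` prints `56`, in line with roughly 1 billion records an hour. `estimate_backlog_drain_seconds 50000 600 10000` models a 10-minute burst at 50,000 events a second over a steady 10,000 events a second. Under this model, the 12 million excess events drain in about `600` seconds.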

To stream data from a PostgreSQL relational table or a hypertable in your Tiger Cloud service to your data lake, run the following statement:

```sql
ALTER TABLE <table_name> SET (
  tigerlake.iceberg_sync = true | false,
  tigerlake.iceberg_partitionby = '<partition_specification>',
  tigerlake.iceberg_namespace = '<namespace>',
  tigerlake.iceberg_table = '<table>'
);
```

- `tigerlake.iceberg_sync`: `boolean`, set to `true` to start streaming, or `false` to stop the stream. A stream **cannot** resume after being stopped.
- `tigerlake.iceberg_partitionby`: optional property to define a partition specification in Iceberg. By default, the Iceberg table for a hypertable is partitioned as `day(<time column>)`, where `<time column>` is the hypertable's time column. Relational tables are not partitioned by default. For more information, see [Partitioning intervals](#partitioning-intervals).
- `tigerlake.iceberg_namespace`: optional property to set a namespace. The default is `timescaledb`.
- `tigerlake.iceberg_table`: optional property to specify a different table name. If no name is specified the PostgreSQL table name is used.
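If you script the sync, a small wrapper can set only the properties you pass and leave the others at their defaults. The following sketch assumes `psql` is installed and that you pass a connection string. `start_iceberg_sync` and `sql_literal` are hypothetical names. Pass the table name exactly as you would write it in SQL, including double quotes for a mixed-case name.

```shell
# Wrap a value in single quotes for SQL, doubling any embedded single quotes.
sql_literal() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Hypothetical helper: start syncing a table, setting only the options you pass.
# $1 = connection string, $2 = table name as written in SQL,
# $3 = partition spec, $4 = namespace, $5 = Iceberg table name.
# $3 to $5 are optional; pass "" to keep a default.
start_iceberg_sync() {
  local opts="tigerlake.iceberg_sync = true"
  if [ -n "${3:-}" ]; then
    opts="$opts, tigerlake.iceberg_partitionby = $(sql_literal "$3")"
  fi
  if [ -n "${4:-}" ]; then
    opts="$opts, tigerlake.iceberg_namespace = $(sql_literal "$4")"
  fi
  if [ -n "${5:-}" ]; then
    opts="$opts, tigerlake.iceberg_table = $(sql_literal "$5")"
  fi
  psql "$1" -v ON_ERROR_STOP=1 -c "ALTER TABLE $2 SET ($opts);"
}
```

For example, `start_iceberg_sync "$SERVICE_URL" my_hypertable 'hour(ts_column)'` runs the same statement as the hourly partitioning example in Sample code.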

### Partitioning intervals

By default, the Iceberg table for a hypertable is partitioned by day on the hypertable's time column, `day(<time column>)`. Syncing a relational table that is not a hypertable does not enable any partitioning in Iceberg. To set a partitioning scheme, use [`tigerlake.iceberg_partitionby`](#sample-code). The following partition intervals and specifications are supported:

| Interval      | Description                                                                                            | Source types                       |
| ------------- | ------------------------------------------------------------------------------------------------------ | ---------------------------------- |
| `hour`        | Extract a timestamp hour, as hours from epoch. Epoch is 1970-01-01 00:00:00.                           | `timestamp`, `timestamptz`         |
| `day`         | Extract a date or timestamp day, as days from epoch.                                                   | `date`, `timestamp`, `timestamptz` |
| `month`       | Extract a date or timestamp month, as months from epoch.                                               | `date`, `timestamp`, `timestamptz` |
| `year`        | Extract a date or timestamp year, as years from epoch.                                                 | `date`, `timestamp`, `timestamptz` |
| `truncate[W]` | Value truncated to width W, see [options](https://iceberg.apache.org/spec/#truncate-transform-details) |                                    |

These transforms behave as defined in the [Iceberg partition specification](https://iceberg.apache.org/spec/#partition-transforms).
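To see what "from epoch" means in practice, the following sketch computes the integer values that the `hour`, `day`, `month`, and `year` transforms produce for one UTC date and hour, following the definitions in the Iceberg specification. The function names are hypothetical. `days_from_civil` is Howard Hinnant's widely used days-from-civil algorithm, not part of Tiger Cloud or Iceberg.

```shell
# Worked example of the Iceberg hour/day/month/year transforms, which produce
# integer offsets from the epoch, 1970-01-01 00:00:00 UTC.

strip_zeros() {
  # Remove leading zeros so inputs such as 08 are not parsed as octal.
  printf '%s' "$1" | sed 's/^0*\([0-9]\)/\1/'
}

days_from_civil() {
  # Days since 1970-01-01 for a Gregorian date. $1 = year, $2 = month, $3 = day.
  local y m d era yoe doy doe
  y=$(strip_zeros "$1"); m=$(strip_zeros "$2"); d=$(strip_zeros "$3")
  y=$(( y - (m <= 2) ))                        # treat Jan and Feb as months 13 and 14
  era=$(( (y >= 0 ? y : y - 399) / 400 ))      # 400-year Gregorian cycle
  yoe=$(( y - era * 400 ))                     # year of era
  doy=$(( (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1 ))  # day of year from March 1
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))            # day of era
  echo $(( era * 146097 + doe - 719468 ))
}

# $1 = year, $2 = month, $3 = day, $4 = hour (all UTC).
iceberg_time_partitions() {
  local y m h days
  y=$(strip_zeros "$1"); m=$(strip_zeros "$2"); h=$(strip_zeros "$4")
  days=$(days_from_civil "$1" "$2" "$3")
  echo "hour=$(( days * 24 + h ))"
  echo "day=$days"
  echo "month=$(( (y - 1970) * 12 + m - 1 ))"
  echo "year=$(( y - 1970 ))"
}
```

For example, `iceberg_time_partitions 2024 03 15 13` prints `hour=475141`, `day=19797`, `month=650`, and `year=54`. Every row with a timestamp in that hour lands in the same `hour` partition, and every row from March 2024 lands in `month` partition 650.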

### Sample code

The following samples show you how to tune data sync from a hypertable or a PostgreSQL relational table to your data lake:

- **Sync a hypertable with the default one-day partitioning interval on the `ts_column` column**

  To start syncing data from a hypertable to your data lake, with the Iceberg table partitioned by day on the hypertable's time column, run the following statement:

  ```sql
  ALTER TABLE my_hypertable SET (tigerlake.iceberg_sync = true);
  ```

  If `ts_column` is the time column of `my_hypertable`, this is equivalent to setting `tigerlake.iceberg_partitionby = 'day(ts_column)'`.

- **Specify a custom partitioning scheme for a hypertable**

  Use the `tigerlake.iceberg_partitionby` property to specify a different partitioning scheme for the Iceberg table when you start the sync. For example, to partition the Iceberg table by hour on the `ts_column` column of a hypertable, run the following statement:

  ```sql
  ALTER TABLE my_hypertable SET (
    tigerlake.iceberg_sync = true,
    tigerlake.iceberg_partitionby = 'hour(ts_column)'
  );
  ```

- **Set the partitioning scheme when syncing a relational table**

  PostgreSQL relational tables do not forward a partitioning scheme to Iceberg. To partition the Iceberg table, specify the partitioning scheme using `tigerlake.iceberg_partitionby` when you start the sync. For example, to sync a standard PostgreSQL table to an Iceberg table partitioned by day on `timestamp_col`, run the following statement:

  ```sql
  ALTER TABLE my_postgres_table SET (
    tigerlake.iceberg_sync = true,
    tigerlake.iceberg_partitionby = 'day(timestamp_col)'
  );
  ```

- **Stop sync to an Iceberg table for a hypertable or a PostgreSQL relational table**

  ```sql
  ALTER TABLE my_hypertable SET (tigerlake.iceberg_sync = false);
  ```

- **Update or add the partitioning scheme of an Iceberg table**

  To change the partitioning scheme of an Iceberg table, specify the new scheme using the `tigerlake.iceberg_partitionby` property. For example, if the `samples` table has an hourly (`hour(ts)`) partition on the `ts` timestamp column, run the following statement to change to daily partitioning:

  ```sql
  ALTER TABLE samples SET (tigerlake.iceberg_partitionby = 'day(ts)');
  ```

  You can also use this statement to add a partitioning scheme to an Iceberg table that does not have one. You **do not** have to pause the sync to Iceberg when you change the partitioning. Apache Iceberg applies the new scheme through its partition evolution mechanism.

- **Specify a different namespace**

  By default, tables are created in the `timescaledb` namespace. To specify a different namespace when you start the sync, use the `tigerlake.iceberg_namespace` property. For example:

  ```sql
  ALTER TABLE my_hypertable SET (
    tigerlake.iceberg_sync = true,
    tigerlake.iceberg_namespace = 'my_namespace'
  );
  ```

- **Specify a different Iceberg table name**

  By default, the table name in Iceberg is the same as the source table name in Tiger Cloud. Some services do not allow mixed case, or have other constraints on table names. To define a different name for the Iceberg table when you start the sync, use the `tigerlake.iceberg_table` property. Double-quote a mixed-case source table name, because PostgreSQL folds unquoted identifiers to lower case. For example:

  ```sql
  ALTER TABLE "Mixed_CASE_TableNAME" SET (
    tigerlake.iceberg_sync = true,
    tigerlake.iceberg_table = 'my_table_name'
  );
  ```
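After you start a sync, you can confirm from the AWS CLI that the Iceberg table exists under the expected namespace and name. The following sketch uses `aws s3tables list-tables`. The `list_iceberg_tables` function name is hypothetical, and the namespace defaults to `timescaledb`, the connector default described above.

```shell
# Hypothetical helper: list the Iceberg table names in one namespace of a table bucket.
# $1 = table bucket ARN, $2 = namespace (optional, defaults to timescaledb).
list_iceberg_tables() {
  aws s3tables list-tables --table-bucket-arn "$1" \
    --namespace "${2:-timescaledb}" \
    --query 'tables[].name' --output text
}

# Usage: list_iceberg_tables <S3TABLE_BUCKET_ARN> my_namespace
```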

## Limitations

- Your service must run PostgreSQL 17.6 or later.
- Only the [Amazon S3 Tables Iceberg REST](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html) catalog is supported.
- To capture deletes made to data in the columnstore, some columnstore optimizations are disabled for hypertables, including [Direct Compress](/docs/learn/hypertables/optimize-data-in-hypertables/index.md#speed-up-data-ingestion).
- The `TRUNCATE` statement is not supported: it does not truncate data in the corresponding Iceberg table.
- Data in a hypertable that has been moved to the [low-cost object storage tier](/docs/learn/data-lifecycle/storage/about-storage-tiers/index.md) is not synced.
- Writing to the same S3 table bucket from multiple services is not supported. Each table bucket maps to exactly one service.
- Iceberg snapshots are pruned automatically when their number exceeds 2,500.
- On a hypertable, continuous aggregate refresh transactions that run for longer than 30 minutes can hold the replication slot for too long. In these cases, consider refreshing in batches.
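The last limitation recommends batching when continuous aggregate refreshes run for a long time. The following sketch calls TimescaleDB's `refresh_continuous_aggregate` procedure once per time window, so each refresh is a separate, shorter transaction. It assumes `psql` is installed and that you pass a connection string. `refresh_in_batches` and the `daily_summary` aggregate are hypothetical names. Choose window boundaries small enough that each refresh stays well under 30 minutes.

```shell
# Hypothetical helper: refresh a continuous aggregate one window at a time.
# $1 = connection string, $2 = continuous aggregate name,
# $3... = window boundaries in ascending order (at least two), for example:
#   refresh_in_batches "$SERVICE_URL" daily_summary 2024-01-01 2024-02-01 2024-03-01
refresh_in_batches() {
  if [ $# -lt 4 ]; then
    echo "usage: refresh_in_batches URL CAGG START END [END...]" >&2
    return 2
  fi
  local url="$1" cagg="$2" start end
  shift 2
  start=$1
  shift
  for end in "$@"; do
    # Each psql -c call runs one statement in its own transaction,
    # outside an explicit transaction block.
    psql "$url" -v ON_ERROR_STOP=1 -c \
      "CALL refresh_continuous_aggregate('$cagg', '$start', '$end');" || return 1
    start=$end
  done
}
```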
