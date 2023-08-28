Tiger Cloud: Performance, Scale, Enterprise, Free Self-hosted products MST

The tiered storage architecture in Tiger Cloud includes a high-performance storage tier and a low-cost object storage tier. You use the high-performance tier for data that requires quick access, and the object tier for rarely used historical data. Tiering policies move older data asynchronously and periodically from high-performance to low-cost storage, sparing you the need to do it manually. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers.
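As a minimal sketch of such a tiering policy, assuming tiered storage is already enabled for the service and that a hypertable named `device_readings` exists (the name is borrowed from the example query later on this page, and the three-week interval is purely illustrative):

```sql
-- Assumption: device_readings is an existing hypertable and tiered storage
-- has already been enabled for the service.
-- Ask the service to move chunks older than three weeks to the object tier.
-- The move happens asynchronously, not at the moment this call returns.
SELECT add_tiering_policy('device_readings', INTERVAL '3 weeks');

-- To stop tiering further chunks later, the policy can be removed:
-- SELECT remove_tiering_policy('device_readings');
```

The interval is the main design choice here: it should be long enough that data still being inserted, updated, or queried frequently stays in the high-performance tier.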


High-performance storage is where your data is stored by default, until you enable tiered storage and move older data to the low-cost tier. In the high-performance storage, your data is stored in the block format and optimized for frequent querying. The hypercore row-columnar storage engine available in this tier is designed specifically for real-time analytics. It enables you to compress the data in the high-performance storage by up to 90%, while improving performance. Coupled with other optimizations, Tiger Cloud high-performance storage makes sure your data is always accessible and your queries run at lightning speed.

Tiger Cloud high-performance storage comes in the following types:

Standard (default): based on AWS EBS gp3 and designed for general workloads. Provides up to 16 TB of storage and 16,000 IOPS.

Enhanced: based on EBS io2 and designed for high-scale, high-throughput workloads. Provides up to 64 TB of storage and 32,000 IOPS.

See the differences in the underlying AWS storage. You enable enhanced storage as needed in Tiger Console.

Tiger Cloud: Enterprise, Scale Self-hosted products MST

Once you enable tiered storage, you can start moving rarely used data to the object tier. The object tier is based on AWS S3 and stores your data in the Apache Parquet format. Within a Parquet file, a set of rows is grouped together to form a row group. Within a row group, values for a single column across multiple rows are stored together. The original size of the data in your service, compressed or uncompressed, does not correspond directly to its size in S3. A compressed hypertable may even take more space in S3 than it does in Tiger Cloud.

For low-cost storage, Tiger Data charges for data tiered based on its original uncompressed size in the high-performance storage tier. There are no additional expenses, such as data transfer or compute.
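To see which chunks currently live in the object tier, a query along these lines may help. It is a sketch that assumes the `timescaledb_osm.tiered_chunks` view is available on your service, and it makes no claim about how the listed chunks map to billed size:

```sql
-- Assumption: the timescaledb_osm.tiered_chunks view is available on the service.
-- List the chunks that have been moved to the object storage tier.
SELECT *
FROM timescaledb_osm.tiered_chunks;
```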

Note: This feature is currently not supported for Tiger Cloud on Microsoft Azure.

Apache Parquet allows for more efficient scans across longer time periods, and Tiger Cloud uses other metadata and query optimizations to reduce the amount of data that needs to be fetched to satisfy a query, such as:

Chunk skipping: exclude the chunks that fall outside the query time window.

Row group skipping: identify the row groups within the Parquet object that satisfy the query.

Column skipping: fetch only columns that are requested by the query.

The following query is against a tiered dataset and illustrates the optimizations:

EXPLAIN ANALYZE
SELECT count(*) FROM (
    SELECT device_uuid, sensor_id
    FROM public.device_readings
    WHERE observed_at > '2023-08-28 00:00+00' and observed_at < '2023-08-29 00:00+00'
    GROUP BY device_uuid, sensor_id
) q;

QUERY PLAN
Aggregate  (cost=7277226.78..7277226.79 rows=1 width=8) (actual time=234993.749..234993.750 rows=1 loops=1)
  ->  HashAggregate  (cost=4929031.23..7177226.78 rows=8000000 width=68) (actual time=184256.546..234913.067 rows=1651523 loops=1)
        Group Key: osm_chunk_1.device_uuid, osm_chunk_1.sensor_id
        Planned Partitions: 128  Batches: 129  Memory Usage: 20497kB  Disk Usage: 4429832kB
        ->  Foreign Scan on osm_chunk_1  (cost=0.00..0.00 rows=92509677 width=68) (actual time=345.890..128688.459 rows=92505457 loops=1)
              Filter: ((observed_at > '2023-08-28 00:00:00+00'::timestamp with time zone) AND (observed_at < '2023-08-29 00:00:00+00'::timestamp with time zone))
              Rows Removed by Filter: 4220
              Match tiered objects: 3
              Row Groups:
                _timescaledb_internal._hyper_1_42_chunk: 0-74
                _timescaledb_internal._hyper_1_43_chunk: 0-29
                _timescaledb_internal._hyper_1_44_chunk: 0-71
              S3 requests: 177
              S3 data: 224423195 bytes
Planning Time: 6.216 ms
Execution Time: 235372.223 ms
(16 rows)

EXPLAIN illustrates which chunks are being pulled in from the object storage tier:

Fetch data from chunks 42, 43, and 44 from the object storage tier.

Skip row groups and limit the fetch to a subset of the offsets in the Parquet object that potentially match the query filter.

Only fetch the data for device_uuid, sensor_id, and observed_at, as the query needs only these 3 columns.
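One way to observe these optimizations yourself is to narrow the time window and the column list, then compare the `Row Groups`, `S3 requests`, and `S3 data` lines with the plan above. The following is a sketch only: it reuses the `public.device_readings` table from the example, assumes tiered reads are enabled for the session, and the six-hour window is arbitrary. The resulting plan depends on your data, so no output is shown.

```sql
-- Assumption: tiered reads must be on for the session, otherwise tiered
-- chunks are not consulted at all.
SET timescaledb.enable_tiered_reads = true;

-- A narrower window gives chunk skipping and row group skipping more to exclude.
-- Referencing only device_uuid and observed_at lets column skipping leave
-- the remaining columns in S3.
EXPLAIN ANALYZE
SELECT device_uuid, count(*) AS readings
FROM public.device_readings
WHERE observed_at >= '2023-08-28 00:00+00'
  AND observed_at <  '2023-08-28 06:00+00'
GROUP BY device_uuid;
```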

The object storage tier is more than an archiving solution. It is also:

Cost-effective: store high volumes of data at a lower cost. You pay only for what you store, with no extra cost for queries.

Scalable: scale past the restrictions of even the enhanced high-performance storage tier.

Online: your data is always there and can be queried when needed.

By default, tiered data is not included when you query from a Tiger Cloud service. To access tiered data, you enable tiered reads for a query, a session, or even for all sessions. After you enable tiered reads, when you run regular SQL queries, a behind-the-scenes process transparently pulls data from wherever it's located: the standard high-performance storage tier, the object storage tier, or both. You can JOIN against tiered data, build views, and even define continuous aggregates on it. In fact, because the implementation of continuous aggregates also uses hypertables, they can be tiered to low-cost storage as well.
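The three scopes can be sketched as follows. The setting name `timescaledb.enable_tiered_reads` is the one used for tiered reads; the database name `tsdb` and the sample query are assumptions, and using `SET LOCAL` inside a transaction is just one way to limit the setting to a single query:

```sql
-- Session scope: applies until the session ends or the setting is changed back.
SET timescaledb.enable_tiered_reads = true;

-- Query scope: SET LOCAL lasts only until the end of the transaction,
-- so the setting does not leak into later statements.
BEGIN;
SET LOCAL timescaledb.enable_tiered_reads = true;
SELECT count(*)
FROM public.device_readings
WHERE observed_at < '2023-08-29 00:00+00';
COMMIT;

-- All future sessions: store the setting at the database level.
-- Assumption: the database is named tsdb; existing sessions must reconnect.
ALTER DATABASE tsdb SET timescaledb.enable_tiered_reads TO true;
```

Because the object tier has higher latency than local storage, enabling tiered reads per query or per session keeps everyday queries on the high-performance tier unless they explicitly need historical data.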

The low-cost storage tier comes with the following limitations:

Limited schema modifications: some schema modifications are not allowed on hypertables with tiered chunks. Allowed modifications include: renaming the hypertable, adding columns with NULL defaults, adding indexes, changing or renaming the hypertable schema, and adding CHECK constraints. For CHECK constraints, only untiered data is verified. Columns can also be deleted, but you cannot subsequently add a new column to a tiered hypertable with the same name as the now-deleted column. Disallowed modifications include: adding a column with non-NULL defaults, renaming a column, changing the data type of a column, and adding a NOT NULL constraint to the column.

Limited data changes: you cannot insert data into, update, or delete a tiered chunk. These limitations take effect as soon as the chunk is scheduled for tiering.

Inefficient query planner filtering for non-native data types: the query planner speeds up reads from the object storage tier by using metadata to filter out columns and row groups that don't satisfy the query. This works for all native data types, but not for non-native types, such as JSON, JSONB, and GIS.

Latency: S3 has higher access latency than local storage. This can affect the execution time of queries in latency-sensitive environments, especially lighter queries.

Number of dimensions: you cannot use tiered storage with hypertables partitioned on more than one dimension. Make sure your hypertables are partitioned on time only, before you enable tiered storage.
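The schema and dimension limitations above can be sketched in SQL. This is illustrative only: `device_readings` is the table from the earlier example, while `firmware_version`, `status`, the index name, and the constraint name are hypothetical. The disallowed statements are left commented out because they are expected to be rejected on a hypertable with tiered chunks.

```sql
-- Check partitioning before enabling tiered storage:
-- num_dimensions should be 1 (time only).
SELECT hypertable_name, num_dimensions
FROM timescaledb_information.hypertables
WHERE hypertable_name = 'device_readings';

-- Allowed: add a column whose default is NULL (hypothetical column name).
ALTER TABLE device_readings ADD COLUMN firmware_version text;

-- Allowed: add an index (hypothetical index name).
CREATE INDEX device_readings_sensor_idx
    ON device_readings (sensor_id, observed_at DESC);

-- Allowed: add a CHECK constraint. Only untiered data is verified.
ALTER TABLE device_readings
    ADD CONSTRAINT sensor_id_present CHECK (sensor_id IS NOT NULL);

-- Disallowed on a hypertable with tiered chunks:
-- ALTER TABLE device_readings ADD COLUMN status text DEFAULT 'ok';  -- non-NULL default
-- ALTER TABLE device_readings RENAME COLUMN sensor_id TO sensor;    -- rename a column
-- ALTER TABLE device_readings ALTER COLUMN sensor_id TYPE text;     -- change data type
-- ALTER TABLE device_readings ALTER COLUMN sensor_id SET NOT NULL;  -- NOT NULL constraint
```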

The typical workflow to use tiered storage in Tiger Cloud is: