---
title: Improve query and upsert performance | Tiger Data Docs
description: Segment and order data in the columnstore to make lookup queries and upserts faster
---

Wide analytical scans are only half the story; many workloads still need fast point lookups, uniqueness checks, and upserts on compressed data. Segmenting and ordering columnstore data is how you tune those paths without giving up analytics performance.

To improve query performance using indexes, see [About indexes](/docs/learn/data-model/understand-database-schemas/index.md) and [Indexing data](/docs/build/performance-optimization/indexing/index.md).

## Segmenting and ordering data

To optimize query performance, TimescaleDB enables you to explicitly control the way your data is physically organized in the columnstore. By structuring data effectively, queries can minimize disk reads and execute more efficiently, using vectorized execution for parallel batch processing where possible.

![Columnstore data organized with SEGMENTBY for query optimization](/docs/_astro/columnstore-segmentby.CXE-pjSO_ZuA6Es.webp)

- **Group related data together to improve scan efficiency**: organizing rows into logical segments ensures that queries filtering by a specific value only scan relevant data sections. For example, in the figure above, querying for a specific ID is particularly fast.
- **Sort data within segments to accelerate range queries**: defining a consistent order reduces the need for post-query sorting, making time-based queries and range scans more efficient.
- **Reduce disk reads and maximize vectorized execution**: a well-structured storage layout enables efficient batch processing (Single Instruction, Multiple Data, or SIMD vectorization) and parallel execution, optimizing query performance.

By combining segmentation and ordering, TimescaleDB ensures that columnar queries are not only fast but also resource-efficient, enabling high-performance real-time analytics.

### Improve performance in the columnstore by segmenting and ordering data

Ordering data in the columnstore has a large impact on the compression ratio and performance of your queries. Rows that change over a dimension should be close to each other. As hypertables contain time-series data, they are partitioned by time. This makes the time column a perfect candidate for ordering your data since the measurements evolve as time goes on.

If you use `orderby` as your only columnstore setting, you get a compression ratio that is good enough to save a lot of storage, and your queries are faster. However, with `orderby` alone you always have to access your data through the time dimension, then filter the returned rows on other criteria.

Accessing the data effectively depends on your use case and your queries. You segment data in the columnstore to match the way you want to access it. That is, in a way that makes it easier for your queries to fetch the right data at the right time. When you segment your data to access specific columns, your queries are optimized and yield even better performance.

For example, to access information about a single device with a specific `device_id`, you segment on the `device_id` column. This enables you to run analytical queries on compressed data in the columnstore much faster.

To illustrate, run the same query on a hypertable, first without, then with optimizations:

1. **Create a hypertable**

   Create a `metrics` hypertable with the following command:

   ```
   CREATE TABLE metrics (
     time TIMESTAMPTZ,
     user_id INT,
     device_id INT,
     cpu DOUBLE PRECISION,
     disk_io DOUBLE PRECISION
   ) WITH (
     tsdb.hypertable
   );
   ```

   When you create a hypertable using [CREATE TABLE … WITH …](/docs/reference/timescaledb/hypertables/create_table/index.md), the default partitioning column is automatically the first column with a timestamp data type. Also, TimescaleDB creates a [columnstore policy](/docs/reference/timescaledb/hypercore/add_columnstore_policy/index.md) that automatically converts your data to the columnstore, after an interval equal to the value of the [chunk\_interval](/docs/reference/timescaledb/hypertables/set_chunk_time_interval/index.md), defined through `after` in the policy. This columnar format enables fast scanning and aggregation, optimizing performance for analytical workloads while also saving significant storage space. In the columnstore conversion, hypertable chunks are compressed by up to 98%, and organized for efficient, large-scale queries.
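   A quick way to confirm the partitioning column and chunk interval is to query the TimescaleDB information schema. This is a sketch; the view and column names follow the `timescaledb_information` catalog:

   ```
   -- Show the partitioning column and chunk interval of the metrics hypertable
   SELECT column_name, time_interval
   FROM timescaledb_information.dimensions
   WHERE hypertable_name = 'metrics';
   ```
   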

   You can customize this policy later using [alter\_job](/docs/reference/timescaledb/jobs-automation/alter_job/index.md). However, to change `after` or `created_before`, the compression settings, or the hypertable the policy is acting on, you must [remove the columnstore policy](/docs/reference/timescaledb/hypercore/remove_columnstore_policy/index.md) and [add a new one](/docs/reference/timescaledb/hypercore/add_columnstore_policy/index.md).
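   For example, a sketch of replacing the default policy with one that uses a different `after` interval. The one-day interval below is illustrative; choose a value that fits your workload:

   ```
   -- Remove the existing columnstore policy, then add a new one that
   -- converts chunks once their data is older than one day
   CALL remove_columnstore_policy('metrics');
   CALL add_columnstore_policy('metrics', after => INTERVAL '1 day');
   ```
   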

   You can also manually [convert chunks](/docs/reference/timescaledb/hypercore/convert_to_columnstore/index.md) in a hypertable to the columnstore.
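   A minimal sketch of a manual conversion. The chunk name below is a hypothetical example; use `show_chunks` to list the actual chunks of your hypertable:

   ```
   -- List the chunks that belong to the metrics hypertable
   SELECT show_chunks('metrics');

   -- Convert one chunk to the columnstore (chunk name is illustrative)
   CALL convert_to_columnstore('_timescaledb_internal._hyper_1_1_chunk');
   ```
   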

2. **Execute a query on the hypertable without optimizations**

   ```
   SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
   FROM metrics
   WHERE time >= '2024-03-01 00:00:00+01'
     AND time < '2024-03-02 00:00:00+01'
     AND device_id = 5
   GROUP BY device_id;
   ```

   Gives the following result:

   ```
    device_id |      avg_cpu       |     avg_disk_io
   -----------+--------------------+---------------------
            5 | 0.4954351575883885 | 0.49725603413909114
   (1 row)

   Time: 29.216 ms
   ```

3. **Execute a query on the same data segmented and ordered in the columnstore**

   1. Control the way your data is ordered and segmented in the columnstore:
      ```
      ALTER TABLE metrics SET (
        timescaledb.orderby = 'time',
        timescaledb.segmentby = 'device_id'
      );
      ```
   2. Query your data:
      ```
      SELECT device_id, AVG(cpu) AS avg_cpu, AVG(disk_io) AS avg_disk_io
      FROM metrics
      WHERE time >= '2024-03-01 00:00:00+01'
        AND time < '2024-03-02 00:00:00+01'
        AND device_id = 5
      GROUP BY device_id;
      ```

   Gives the following result:

   ```
    device_id |      avg_cpu       |     avg_disk_io
   -----------+--------------------+---------------------
            5 | 0.4954351575883885 | 0.49725603413909114
   (1 row)

   Time: 1.828 ms
   ```

   As you can see, using `orderby` and `segmentby` not only reduces the amount of space taken by your data, but also vastly improves query speed.
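   To verify the settings that are in effect, you can query the TimescaleDB information schema. The view below is a sketch; depending on your TimescaleDB version it may instead be exposed as `timescaledb_information.compression_settings`:

   ```
   -- Show the segmentby and orderby settings applied to the hypertable
   SELECT *
   FROM timescaledb_information.hypertable_columnstore_settings
   WHERE hypertable = 'metrics'::regclass;
   ```
   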

TimescaleDB compresses up to 1,000 rows together in a single batch, like the batches shown above. If a chunk does not contain enough rows to fill these batches, your compression ratio is reduced. Take this into account when you define your columnstore settings.
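To check the effect of your settings on storage, you can query the compression statistics for the hypertable. A minimal sketch using the TimescaleDB API:

```
-- Compare total storage before and after columnstore conversion
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('metrics');
```

Note that chunks that have not yet been converted to the columnstore are not included in the compressed totals.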
