---
title: Analyze transport and geospatial data | Tiger Data Docs
description: Simulate and analyze a transport dataset in Tiger Cloud
---

This walkthrough simulates NYC taxi-style geospatial time series in a Tiger Cloud service, then layers Grafana on top so you can see movement and hotspots as they change, instead of after the fact.

![Grafana heatmap showing NYC taxi trip pickup locations over time](/docs/_astro/use-case-rta-grafana-heatmap.Brk4prT1_ZqnYIW.webp)

This page shows you how to integrate [Grafana](https://grafana.com/docs/) with a Tiger Cloud service and gain insights by visualizing data optimized for size and speed in the columnstore.

## Prerequisites for this tutorial

To follow the procedure on this page you need to:

- Create a [target Tiger Cloud service](/docs/get-started/quickstart/create-service/index.md).

  This procedure also works for [self-hosted TimescaleDB](/docs/get-started/choose-your-path/install-timescaledb/index.md).

- Install and run [self-managed Grafana](https://grafana.com/get/?tab=self-managed), or sign up for [Grafana Cloud](https://grafana.com/get/).

## Optimize time-series data in hypertables

Hypertables are PostgreSQL tables in TimescaleDB that automatically partition your time-series data by time. Time-series data represents the way a system, process, or behavior changes over time. Hypertables enable TimescaleDB to work efficiently with time-series data. Each hypertable is made up of child tables called chunks. Each chunk is assigned a range of time, and only contains data from that range. When you run a query, TimescaleDB identifies the correct chunk and runs the query on it, instead of going through the entire table.
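The chunk-pruning idea can be sketched outside the database. The following toy model is a hypothetical illustration only, not TimescaleDB internals: it shows how fixed-width time partitioning lets a query over a narrow time range skip every chunk whose range cannot match.

```python
from datetime import datetime, timedelta

# Toy sketch of time partitioning. TimescaleDB's planner prunes chunks
# inside PostgreSQL; this model only illustrates the idea of skipping
# chunks whose time range cannot contain matching rows.
CHUNK_INTERVAL = timedelta(days=7)

def chunk_for(ts: datetime, origin: datetime) -> int:
    """Map a timestamp to its fixed-width chunk index."""
    return (ts - origin) // CHUNK_INTERVAL

def chunks_for_range(start: datetime, end: datetime, origin: datetime) -> list:
    """Only these chunk indexes can contain rows with start <= time < end."""
    last = chunk_for(end - timedelta(microseconds=1), origin)
    return list(range(chunk_for(start, origin), last + 1))

origin = datetime(2016, 1, 1)
# A one-day query touches a single 7-day chunk, not the whole table.
print(chunks_for_range(datetime(2016, 1, 7), datetime(2016, 1, 8), origin))  # [0]
```

However many chunks a hypertable accumulates, a time-bounded query only scans the few whose ranges overlap the query window.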

[Hypercore](/docs/learn/columnar-storage/understand-hypercore/index.md) is the hybrid row-columnar storage engine in TimescaleDB used by hypertables. Traditional databases force a trade-off between fast inserts (row-based storage) and efficient analytics (columnar storage). Hypercore eliminates this trade-off, allowing real-time analytics without sacrificing transactional capabilities.

Hypercore dynamically stores data in the most efficient format for its lifecycle:

![Move from rowstore to columnstore in hypercore](/docs/_astro/hypercore_intro.DutS1jP2.svg)

- **Row-based storage for recent data**: the most recent chunk (and possibly more) is always stored in the rowstore, ensuring fast inserts, updates, and low-latency single record queries. Additionally, row-based storage is used as a writethrough for inserts and updates to columnar storage.
- **Columnar storage for analytical performance**: chunks are automatically compressed into the columnstore, optimizing storage efficiency and accelerating analytical queries.

Unlike traditional columnar databases, hypercore allows data to be inserted or modified at any stage, making it a flexible solution for both high-ingest transactional workloads and real-time analytics, within a single database.

Because TimescaleDB is 100% PostgreSQL, you can use all the standard PostgreSQL tables, indexes, stored procedures, and other objects alongside your hypertables. This makes creating and working with hypertables similar to standard PostgreSQL.
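The analytical win from columnar layout can also be sketched as a toy example, outside the database (an illustration of the general technique, not TimescaleDB code): aggregating one field over row-oriented records visits every record with all its fields, while a columnar layout scans a single contiguous, compressible array.

```python
# Toy illustration of row vs columnar layout (not TimescaleDB internals).
rows = [
    {"vendor_id": "1", "trip_distance": 2.5, "fare_amount": 12.0},
    {"vendor_id": "2", "trip_distance": 0.8, "fare_amount": 5.5},
    {"vendor_id": "1", "trip_distance": 6.1, "fare_amount": 21.0},
]

# Row layout: an aggregate must visit every record, all fields along for the ride.
total_row = sum(r["fare_amount"] for r in rows)

# Columnar layout: each field is its own contiguous array, so an
# aggregate scans exactly one array and ignores the rest.
columns = {
    "vendor_id": ["1", "2", "1"],
    "trip_distance": [2.5, 0.8, 6.1],
    "fare_amount": [12.0, 5.5, 21.0],
}
total_col = sum(columns["fare_amount"])

print(total_row == total_col == 38.5)  # True
```

Both layouts return the same answer; the columnar one simply touches far less data per analytical query, which is why the columnstore accelerates the aggregations later in this tutorial.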

1. **Import time-series data into a hypertable**

   1. Unzip [nyc\_data.tar.gz](https://assets.timescale.com/docs/downloads/nyc_data.tar.gz) to a `<local folder>`.

      This test dataset contains historical data from New York’s yellow taxi network.

      To import up to 100GB of data directly from your current PostgreSQL-based database, [migrate with downtime](/docs/migrate/migrate-with-downtime/index.md) using native PostgreSQL tooling. To seamlessly import 100GB-10TB+ of data, use the [live migration](/docs/migrate/live-migration/index.md) tooling supplied by Tiger Data. To add data from non-PostgreSQL data sources, see [Import and ingest data](/docs/migrate/import-terminal/index.md).

   2. In Terminal, navigate to `<local folder>` and update the following string with [your connection details](/docs/integrate/find-connection-details/index.md) to connect to your service.


      ```
      psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>?sslmode=require"
      ```

   3. Create an optimized hypertable for your time-series data:

      1. Create a [hypertable](/docs/learn/hypertables/understand-hypertables/index.md) with [hypercore](/docs/learn/columnar-storage/understand-hypercore/index.md) enabled by default for your time-series data using [CREATE TABLE](/docs/reference/timescaledb/index.md). For [efficient queries](/docs/build/performance-optimization/secondary-indexes/index.md) on data in the columnstore, remember to `segmentby` the column you will use most often to filter your data.

         In your SQL client, run the following command:

         ```
         CREATE TABLE "rides"(
             vendor_id TEXT,
             pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
             dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
             passenger_count NUMERIC,
             trip_distance NUMERIC,
             pickup_longitude  NUMERIC,
             pickup_latitude   NUMERIC,
             rate_code         INTEGER,
             dropoff_longitude NUMERIC,
             dropoff_latitude  NUMERIC,
             payment_type INTEGER,
             fare_amount NUMERIC,
             extra NUMERIC,
             mta_tax NUMERIC,
             tip_amount NUMERIC,
             tolls_amount NUMERIC,
             improvement_surcharge NUMERIC,
             total_amount NUMERIC
         ) WITH (
             tsdb.hypertable,
             tsdb.create_default_indexes=false,
             tsdb.segmentby='vendor_id',
             tsdb.orderby='pickup_datetime DESC'
         );
         ```

         When you create a hypertable using [CREATE TABLE … WITH …](/docs/reference/timescaledb/hypertables/create_table/index.md), the default partitioning column is automatically the first column with a timestamp data type. Also, TimescaleDB creates a [columnstore policy](/docs/reference/timescaledb/hypercore/add_columnstore_policy/index.md) that automatically converts your data to the columnstore, after an interval equal to the value of the [chunk\_interval](/docs/reference/timescaledb/hypertables/set_chunk_time_interval/index.md), defined through `after` in the policy. This columnar format enables fast scanning and aggregation, optimizing performance for analytical workloads while also saving significant storage space. In the columnstore conversion, hypertable chunks are compressed by up to 98%, and organized for efficient, large-scale queries.

         You can customize this policy later using [alter\_job](/docs/reference/timescaledb/jobs-automation/alter_job/index.md). However, to change `after` or `created_before`, the compression settings, or the hypertable the policy is acting on, you must [remove the columnstore policy](/docs/reference/timescaledb/hypercore/remove_columnstore_policy/index.md) and [add a new one](/docs/reference/timescaledb/hypercore/add_columnstore_policy/index.md).

         You can also manually [convert chunks](/docs/reference/timescaledb/hypercore/convert_to_columnstore/index.md) in a hypertable to the columnstore.

      2. Add another dimension to partition your hypertable more efficiently:

         ```
         SELECT add_dimension('rides', by_hash('payment_type', 2));
         ```

      3. Create an index to support efficient queries by vendor, rate code, and passenger count:

         ```
         CREATE INDEX ON rides (vendor_id, pickup_datetime DESC);
         CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
         CREATE INDEX ON rides (passenger_count, pickup_datetime DESC);
         ```

   4. Create PostgreSQL tables for relational data:

      1. Add a table to store the payment types data:

         ```
         CREATE TABLE IF NOT EXISTS "payment_types"(
             payment_type INTEGER,
             description TEXT
         );
         INSERT INTO payment_types(payment_type, description) VALUES
         (1, 'credit card'),
         (2, 'cash'),
         (3, 'no charge'),
         (4, 'dispute'),
         (5, 'unknown'),
         (6, 'voided trip');
         ```

      2. Add a table to store the rates data:

         ```
         CREATE TABLE IF NOT EXISTS "rates"(
             rate_code   INTEGER,
             description TEXT
         );
         INSERT INTO rates(rate_code, description) VALUES
         (1, 'standard rate'),
         (2, 'JFK'),
         (3, 'Newark'),
         (4, 'Nassau or Westchester'),
         (5, 'negotiated fare'),
         (6, 'group ride');
         ```

   5. Upload the dataset to your service:

      ```
      \COPY rides FROM nyc_data_rides.csv CSV;
      ```

2. **Have a quick look at your data**

   You query hypertables in exactly the same way as you would a relational PostgreSQL table. Use one of the following SQL editors to run a query and see the data you uploaded:

   - **Data view**: write queries, visualize data, and share your results in [Tiger Console](https://console.cloud.tigerdata.com/dashboard/services?popsql) for all your Tiger Cloud services.
   - **SQL editor**: write, fix, and organize SQL faster and more accurately in [Tiger Console](https://console.cloud.tigerdata.com/dashboard/services) for a Tiger Cloud service.
   - **psql**: easily run queries on your Tiger Cloud services or self-hosted TimescaleDB deployment from Terminal.

   For example:

   - Display the number of rides for each fare type:

     ```
     SELECT rate_code, COUNT(vendor_id) AS num_trips
     FROM rides
     WHERE pickup_datetime < '2016-01-08'
     GROUP BY rate_code
     ORDER BY rate_code;
     ```

     This simple query runs in 3 seconds. You see something like:

     | rate\_code | num\_trips |
     | ---------- | ---------- |
     | 1          | 2266401    |
     | 2          | 54832      |
     | 3          | 4126       |
     | 4          | 967        |
     | 5          | 7193       |
     | 6          | 17         |
     | 99         | 42         |

   - To select all rides taken in the first week of January 2016, and return the total number of trips taken for each rate code:

     ```
     SELECT rates.description, COUNT(vendor_id) AS num_trips
     FROM rides
     JOIN rates ON rides.rate_code = rates.rate_code
     WHERE pickup_datetime < '2016-01-08'
     GROUP BY rates.description
     ORDER BY LOWER(rates.description);
     ```

     Against this much data in the rowstore, this analytical query takes about 59 seconds. You see something like:

     | description           | num\_trips |
     | --------------------- | ---------- |
     | group ride            | 17         |
     | JFK                   | 54832      |
     | Nassau or Westchester | 967        |
     | negotiated fare       | 7193       |
     | Newark                | 4126       |
     | standard rate         | 2266401    |
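Step 1 above also added a hash dimension on `payment_type` with `by_hash('payment_type', 2)`. Hash partitioning routes each row by a stable hash of its key, so rows with the same key always land in the same partition. A toy Python sketch of 2-way hash routing (illustration only; TimescaleDB uses its own internal hash function):

```python
from zlib import crc32

NUM_PARTITIONS = 2  # matches by_hash('payment_type', 2) in the tutorial

def partition_for(payment_type: int) -> int:
    """Route a row to one of NUM_PARTITIONS by a stable hash of its key."""
    return crc32(str(payment_type).encode()) % NUM_PARTITIONS

# The same key always lands in the same partition, so lookups and
# joins on payment_type only need to touch one partition.
print(partition_for(1) == partition_for(1))  # True
print({partition_for(pt) for pt in range(1, 7)})
```

Because the routing is deterministic, a filter such as `WHERE payment_type = 1` only has to scan the partitions that can contain that key.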

## Connect Grafana to Tiger Cloud

To visualize the results of your queries, enable Grafana to read the data in your service:

1. **Log in to Grafana**

   In your browser, log in to either:

   - Self-hosted Grafana: at `http://localhost:3000/`. The default credentials are `admin`, `admin`.
   - Grafana Cloud: use the URL and credentials you set when you created your account.

2. **Add your service as a data source**

   1. Open `Connections` > `Data sources`, then click `Add new data source`.

   2. Select `PostgreSQL` from the list.

   3. Configure the connection:

      - Configure `Host URL`, `Database name`, `Username`, and `Password` using your [connection details](/docs/integrate/find-connection-details/index.md). `Host URL` is in the format `<host>:<port>`.
      - `TLS/SSL Mode`: select `require`.
      - `PostgreSQL options`: enable `TimescaleDB`.
      - Leave the default setting for all other fields.

   4. Click `Save & test`.

      Grafana checks that your details are set correctly.

## Monitor performance over time

A Grafana dashboard represents a view into the performance of a system, and each dashboard consists of one or more panels, which represent information about a specific metric related to that system.

To visually monitor the volume of taxi rides over time:

1. **Create the dashboard**

   1. On the `Dashboards` page, click `New` and select `New dashboard`.

   2. Click `Add visualization`.

   3. Select the data source that connects to your Tiger Cloud service. The `Time series` visualization is chosen by default.

      ![Configuring a Grafana dashboard with a TimescaleDB data source](/docs/_astro/use-case-rta-grafana-timescale-configure-dashboard.PNpiH0JO_204Gcy.webp)

   4. In the `Queries` section, select `Code`, then select `Time series` in `Format`.

   5. Select the data range for your visualization: the data set is from 2016. Click the date range above the panel and set:

      - From: `2016-01-01 01:00:00`
      - To: `2016-01-30 01:00:00`

2. **Combine TimescaleDB and Grafana functionality to analyze your data**

   Combine the TimescaleDB [time\_bucket](/docs/learn/data-lifecycle/time-buckets/about-time-buckets/index.md) function with the Grafana `$__timeFilter()` macro to set the `pickup_datetime` column as the filtering range for your visualizations.

   ```
   SELECT
     time_bucket('1 day', pickup_datetime) AS "time",
     COUNT(*)
   FROM rides
   WHERE $__timeFilter(pickup_datetime)
   GROUP BY time
   ORDER BY time;
   ```

   This query groups the results by day and orders them by time.

   ![Completed Grafana dashboard showing NYC taxi ride count and geospatial data](/docs/_astro/use-case-rta-grafana-timescale-final-dashboard.CXJQRRVZ_oS2UG.webp)

3. **Click `Save dashboard`**
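Under the hood, `time_bucket` floors each timestamp to the start of its fixed-width bucket, while Grafana expands `$__timeFilter(pickup_datetime)` into a range condition over the dashboard's selected time window. A minimal Python sketch of the bucketing arithmetic (assuming TimescaleDB's default bucket origin of 2000-01-03 for interval buckets):

```python
from datetime import datetime, timedelta

def time_bucket(width: timedelta, ts: datetime,
                origin: datetime = datetime(2000, 1, 3)) -> datetime:
    """Floor ts to the start of its fixed-width bucket (sketch of time_bucket)."""
    return origin + ((ts - origin) // width) * width

# Rides at any time on the same day fall into the same '1 day' bucket,
# which is what lets COUNT(*) GROUP BY time produce one point per day.
day = timedelta(days=1)
print(time_bucket(day, datetime(2016, 1, 5, 14, 30)))  # 2016-01-05 00:00:00
```

Every row whose `pickup_datetime` falls inside the same bucket contributes to the same `COUNT(*)`, producing one data point per day in the time-series panel.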

## Optimize revenue potential

Having all this data is great, but how do you use it? Monitoring tells you what has happened; analysis tells you what to do about it. This section shows you how to create a visualization that highlights where you can maximize potential revenue.

### Set up your data for geospatial queries

To add geospatial analysis to your ride count visualization, you need geospatial data to work out which trips originated where. As TimescaleDB is compatible with all PostgreSQL extensions, use [PostGIS](http://postgis.net/) to slice data by time and location.

1. **Connect to your Tiger Cloud service and add the PostGIS extension:**

   ```
   CREATE EXTENSION postgis;
   ```

2. **Add geometry columns for pick up and drop off locations:**

   ```
   ALTER TABLE rides ADD COLUMN pickup_geom geometry(POINT,2163);
   ALTER TABLE rides ADD COLUMN dropoff_geom geometry(POINT,2163);
   ```

3. **Convert the latitude and longitude points into geometry coordinates that work with PostGIS:**

   ```
   UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163),
      dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);
   ```

   This updates 10,906,860 rows across both columns, so it takes a while. Coffee is your friend.

   You might run into the following error while the update runs:

   `Error: tuple decompression limit exceeded by operation`\
   `Error Code: 53400`\
   `Details: current limit: 100000, tuples decompressed: 10906860`\
   `Hint: Consider increasing timescaledb.max_tuples_decompressed_per_dml_transaction or set to 0 (unlimited).`

   To fix this, run:

   ```
   SET timescaledb.max_tuples_decompressed_per_dml_transaction TO 0;
   ```

### Visualize the area where you can make the most money

In this section you visualize a query that returns rides longer than 5 miles for trips taken within 2 km of Times Square. The data includes the distance travelled, and the results are grouped by `trip_distance` and pickup location so that Grafana can plot the data properly.

This enables you to see where a taxi driver is most likely to pick up a passenger who wants a longer ride, and make more money.

1. **Create a geolocalization dashboard**

   1. In Grafana, create a new dashboard that is connected to your Tiger Cloud service data source with a Geomap visualization.

   2. In the `Queries` section, select `Code`, then select `Time series` in `Format`.

      ![Configuring a Grafana dashboard with a TimescaleDB data source](/docs/_astro/use-case-rta-grafana-timescale-configure-dashboard.PNpiH0JO_204Gcy.webp)

   3. To find rides longer than 5 miles in Manhattan, paste the following query:

      ```
      SELECT time_bucket('5m', rides.pickup_datetime) AS time,
             rides.trip_distance AS value,
             rides.pickup_latitude AS latitude,
             rides.pickup_longitude AS longitude
      FROM rides
      WHERE rides.pickup_datetime BETWEEN '2016-01-01T01:41:55.986Z' AND '2016-01-01T07:41:55.986Z' AND
        ST_Distance(pickup_geom,
                    ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)
        ) < 2000
      GROUP BY time,
               rides.trip_distance,
               rides.pickup_latitude,
               rides.pickup_longitude
      ORDER BY time
      LIMIT 500;
      ```

      You see a world map with a dot over New York.

   4. Zoom into your map to see the visualization clearly.

2. **Customize the visualization**

   In the Geomap options, under `Map Layers`, click `+ Add layer` and select `Heatmap`. You now see the areas where a taxi driver is most likely to pick up a passenger who wants a longer ride, and make more money.

   ![Grafana heatmap showing NYC taxi trip pickup locations](/docs/_astro/use-case-rta-grafana-heatmap.Brk4prT1_ZqnYIW.webp)
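The `ST_Distance` filter in the query above keeps only pickups within 2,000 meters of Times Square (longitude -73.9851, latitude 40.7589). A rough stand-in for that distance check, using the haversine great-circle formula (an approximation for illustration; PostGIS computes planar distance in the EPSG:2163 projection, and the example coordinates for nearby landmarks are approximate):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

TIMES_SQUARE = (-73.9851, 40.7589)

# Penn Station (~ -73.9935, 40.7506) is inside the 2 km radius;
# JFK airport (~ -73.7781, 40.6413) is well outside it.
print(haversine_m(*TIMES_SQUARE, -73.9935, 40.7506) < 2000)  # True
print(haversine_m(*TIMES_SQUARE, -73.7781, 40.6413) < 2000)  # False
```

The same cutoff the SQL query applies in projected coordinates is what restricts the heatmap to pickups around Times Square.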

You have integrated Grafana with a Tiger Cloud service and gained insights by visualizing your data.
