---
title: Simulate an IoT sensor dataset | Tiger Data Docs
description: Simulate an IoT sensor dataset for testing and development with Tiger Cloud
---

The Internet of Things (IoT) describes the trend of embedding computing capabilities into physical objects, ranging from light bulbs to oil wells. Many IoT devices collect sensor data about their environment and generate time-series datasets with relational metadata.

It is often necessary to simulate an IoT dataset, for example, when you are testing a new system. This tutorial shows you how to simulate a basic dataset in your Tiger Cloud service, and then run simple queries on it.

To simulate a more advanced dataset, see [Time-series Benchmarking Suite (TSBS)](https://github.com/timescale/tsbs).

## Prerequisites for this tutorial

To follow the steps on this page:

- Create a target [Tiger Cloud service](/docs/get-started/quickstart/create-service/index.md) with the Real-time analytics capability.

  You need [your connection details](/docs/integrate/find-connection-details/index.md). This procedure also works for [self-hosted TimescaleDB](/docs/get-started/choose-your-path/install-timescaledb/index.md).
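
  After connecting, you can confirm that the TimescaleDB extension is active. This is a minimal sanity check using the standard PostgreSQL catalog, so it works on a Tiger Cloud service or a self-hosted instance:

  ```
  SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';
  ```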

## Simulate a dataset

To simulate a dataset, run the following queries:

1. **Create the sensors table**

   ```
   CREATE TABLE sensors(
     id SERIAL PRIMARY KEY,
     type VARCHAR(50),
     location VARCHAR(50)
   );
   ```

2. **Create the sensor\_data hypertable**

   ```
   CREATE TABLE sensor_data (
     time TIMESTAMPTZ NOT NULL,
     sensor_id INTEGER,
     temperature DOUBLE PRECISION,
     cpu DOUBLE PRECISION,
     FOREIGN KEY (sensor_id) REFERENCES sensors (id)
   ) WITH (
     tsdb.hypertable
   );
   ```

   When you create a hypertable using [CREATE TABLE … WITH …](/docs/reference/timescaledb/hypertables/create_table/index.md), the first column with a timestamp data type is used as the partitioning column by default. TimescaleDB also creates a [columnstore policy](/docs/reference/timescaledb/hypercore/add_columnstore_policy/index.md) that automatically converts your data to the columnstore after an interval equal to the [chunk\_interval](/docs/reference/timescaledb/hypertables/set_chunk_time_interval/index.md), defined through `after` in the policy. This columnar format enables fast scanning and aggregation, optimizing performance for analytical workloads while also saving significant storage space: in the columnstore conversion, hypertable chunks are compressed by up to 98% and organized for efficient, large-scale queries.

   You can customize this policy later using [alter\_job](/docs/reference/timescaledb/jobs-automation/alter_job/index.md). However, to change `after` or `created_before`, the compression settings, or the hypertable the policy is acting on, you must [remove the columnstore policy](/docs/reference/timescaledb/hypercore/remove_columnstore_policy/index.md) and [add a new one](/docs/reference/timescaledb/hypercore/add_columnstore_policy/index.md).

   You can also manually [convert chunks](/docs/reference/timescaledb/hypercore/convert_to_columnstore/index.md) in a hypertable to the columnstore.
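
   For example, the following sketch shows how you might inspect the policy and replace it to change `after`. The one-day interval is only an illustrative value, not a recommendation:

   ```
   -- View the columnstore policy that was created for the hypertable
   SELECT job_id, proc_name, config
   FROM timescaledb_information.jobs
   WHERE hypertable_name = 'sensor_data';

   -- Replace the policy with a custom `after` interval (example value only)
   CALL remove_columnstore_policy('sensor_data');
   CALL add_columnstore_policy('sensor_data', after => INTERVAL '1 day');
   ```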

3. **Populate the sensors table**

   ```
   INSERT INTO sensors (type, location) VALUES
   ('a', 'floor'),
   ('a', 'ceiling'),
   ('b', 'floor'),
   ('b', 'ceiling');
   ```

4. **Verify that the sensors have been added correctly**

   ```
   SELECT * FROM sensors;
   ```

   Sample output:

   ```
    id | type | location
   ----+------+----------
     1 | a    | floor
     2 | a    | ceiling
     3 | b    | floor
     4 | b    | ceiling
   (4 rows)
   ```

5. **Generate and insert a dataset for all sensors**

   ```
   INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
   SELECT
     time,
     sensor_id,
     random() AS cpu,
     random()*100 AS temperature
   FROM
     generate_series(now() - INTERVAL '24 hour', now(), INTERVAL '5 minute') AS g1(time),
     generate_series(1, 4, 1) AS g2(sensor_id);
   ```
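
   The two `generate_series` calls form a cross join, producing one row for every combination of timestamp and sensor. Over 24 hours at 5-minute steps, `generate_series` returns 289 timestamps (both endpoints included), so you can sanity-check the insert with a count:

   ```
   SELECT count(*) FROM sensor_data;
   -- Expected: 289 timestamps * 4 sensors = 1156 rows
   ```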

6. **Verify the simulated dataset**

   ```
   SELECT * FROM sensor_data ORDER BY time;
   ```

   Sample output:

   ```
                time              | sensor_id |    temperature     |         cpu
   -------------------------------+-----------+--------------------+---------------------
    2020-03-31 15:56:25.843575+00 |         1 |   6.86688972637057 |   0.682070567272604
    2020-03-31 15:56:40.244287+00 |         2 |    26.589260622859 |   0.229583469685167
    2020-03-31 15:56:45.653115+00 |         3 |   79.9925176426768 |   0.457779890391976
    2020-03-31 15:56:53.560205+00 |         4 |   24.3201029952615 |   0.641885648947209
    2020-03-31 16:01:25.843575+00 |         1 |   33.3203678019345 |  0.0159163917414844
    2020-03-31 16:01:40.244287+00 |         2 |   31.2673618085682 |   0.701185956597328
    2020-03-31 16:01:45.653115+00 |         3 |   85.2960689924657 |   0.693413889966905
    2020-03-31 16:01:53.560205+00 |         4 |   79.4769988860935 |   0.360561791341752
   ...
   ```

## Run basic queries

After you simulate a dataset, you can run some basic queries on it. For example:

- Average temperature and CPU by 30-minute windows:

  ```
  SELECT
    time_bucket('30 minutes', time) AS period,
    AVG(temperature) AS avg_temp,
    AVG(cpu) AS avg_cpu
  FROM sensor_data
  GROUP BY period;
  ```

  Sample output:

  ```
           period         |     avg_temp     |      avg_cpu
  ------------------------+------------------+-------------------
   2020-03-31 19:00:00+00 | 49.6615830013373 | 0.477344429974134
   2020-03-31 22:00:00+00 | 58.8521540844037 | 0.503637770501276
   2020-03-31 16:00:00+00 | 50.4250325243144 | 0.511075591299838
   2020-03-31 17:30:00+00 | 49.0742547437549 | 0.527267253802468
   2020-04-01 14:30:00+00 | 49.3416377226822 | 0.438027751864865
   ...
  ```
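
  `GROUP BY` alone does not guarantee output order, which is why the buckets above appear unsorted. To return them chronologically, add an `ORDER BY` on the bucket:

  ```
  SELECT
    time_bucket('30 minutes', time) AS period,
    AVG(temperature) AS avg_temp,
    AVG(cpu) AS avg_cpu
  FROM sensor_data
  GROUP BY period
  ORDER BY period;
  ```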

- Average and last temperature, average CPU by 30-minute windows:

  ```
  SELECT
    time_bucket('30 minutes', time) AS period,
    AVG(temperature) AS avg_temp,
    last(temperature, time) AS last_temp,
    AVG(cpu) AS avg_cpu
  FROM sensor_data
  GROUP BY period;
  ```

  Sample output:

  ```
           period         |     avg_temp     |    last_temp     |      avg_cpu
  ------------------------+------------------+------------------+-------------------
   2020-03-31 19:00:00+00 | 49.6615830013373 | 84.3963081017137 | 0.477344429974134
   2020-03-31 22:00:00+00 | 58.8521540844037 | 76.5528806950897 | 0.503637770501276
   2020-03-31 16:00:00+00 | 50.4250325243144 | 43.5192013625056 | 0.511075591299838
   2020-03-31 17:30:00+00 | 49.0742547437549 |  22.740753274411 | 0.527267253802468
   2020-04-01 14:30:00+00 | 49.3416377226822 | 59.1331578791142 | 0.438027751864865
  ...
  ```
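
  `last(temperature, time)` returns the `temperature` value with the latest `time` in each group. TimescaleDB also provides a matching `first()` aggregate; for example, this variation compares the first and last reading in each window:

  ```
  SELECT
    time_bucket('30 minutes', time) AS period,
    first(temperature, time) AS first_temp,
    last(temperature, time) AS last_temp
  FROM sensor_data
  GROUP BY period
  ORDER BY period;
  ```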

- Query the metadata:

  ```
  SELECT
    sensors.location,
    time_bucket('30 minutes', time) AS period,
    AVG(temperature) AS avg_temp,
    last(temperature, time) AS last_temp,
    AVG(cpu) AS avg_cpu
  FROM sensor_data JOIN sensors ON sensor_data.sensor_id = sensors.id
  GROUP BY period, sensors.location;
  ```

  Sample output:

  ```
   location |         period         |     avg_temp     |     last_temp     |      avg_cpu
  ----------+------------------------+------------------+-------------------+-------------------
   ceiling  | 2020-03-31 15:30:00+00 | 25.4546818090603 |  24.3201029952615 | 0.435734559316188
   floor    | 2020-03-31 15:30:00+00 | 43.4297036845237 |  79.9925176426768 |  0.56992522883229
   ceiling  | 2020-03-31 16:00:00+00 | 53.8454438598516 |  43.5192013625056 | 0.490728285357666
   floor    | 2020-03-31 16:00:00+00 | 47.0046211887772 |  23.0230117216706 |  0.53142289724201
   ceiling  | 2020-03-31 16:30:00+00 | 58.7817596504465 |  63.6621567420661 | 0.488188337767497
   floor    | 2020-03-31 16:30:00+00 |  44.611586847653 |  2.21919436007738 | 0.434762630766879
   ceiling  | 2020-03-31 17:00:00+00 | 35.7026890735142 |  42.9420990403742 | 0.550129583687522
   floor    | 2020-03-31 17:00:00+00 | 62.2794370166957 |  52.6636955793947 | 0.454323202022351
  ...
  ```
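
  Because the metadata lives in an ordinary relational table, you can also filter on it. For example, this sketch restricts the aggregation to ceiling sensors:

  ```
  SELECT
    time_bucket('30 minutes', time) AS period,
    AVG(temperature) AS avg_temp
  FROM sensor_data JOIN sensors ON sensor_data.sensor_id = sensors.id
  WHERE sensors.location = 'ceiling'
  GROUP BY period
  ORDER BY period;
  ```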

You have now successfully simulated and run queries on an IoT dataset.
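
To remove the simulated dataset when you are done, drop the tables in dependency order, since `sensor_data` references `sensors` through its foreign key:

```
DROP TABLE sensor_data;
DROP TABLE sensors;
```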
