<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[Tiger Data Blog]]></title>
        <description><![CDATA[Insights, product updates, and tips from TigerData (Creators of TimescaleDB) engineers on Postgres, time series & AI. IoT, crypto, and analytics tutorials & use cases.]]></description>
        <link>https://www.tigerdata.com/blog</link>
        <image>
            <url>https://www.tigerdata.com/icon.ico</url>
            <title>Tiger Data Blog</title>
            <link>https://www.tigerdata.com/blog</link>
        </image>
        <generator>RSS for Node</generator>
        <lastBuildDate>Tue, 07 Apr 2026 09:58:51 GMT</lastBuildDate>
        <atom:link href="https://www.tigerdata.com/blog" rel="self" type="application/rss+xml"/>
        <ttl>60</ttl>
        <item>
            <title><![CDATA[TimescaleDB to the Rescue - Speeding Up Statistics]]></title>
            <description><![CDATA[In this article, software engineer Kamil Ruczyński shows how TimescaleDB outperforms MySQL for time-series data using continuous aggregates and partitioning.]]></description>
            <link>https://www.tigerdata.com/blog/timescaledb-to-the-rescue-speeding-up-statistics</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/timescaledb-to-the-rescue-speeding-up-statistics</guid>
            <category><![CDATA[Dev Q&A]]></category>
            <category><![CDATA[TimescaleDB]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Kamil Ruczyński]]></dc:creator>
            <pubDate>Fri, 20 Jun 2025 12:00:35 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2025/06/timescaledb-to-the-rescue.png">
            </media:content>
            <content:encoded><![CDATA[<p><em>This article, written by </em><a href="https://www.linkedin.com/in/kamilruczynski/" rel="noreferrer"><em>Kamil Ruczyński</em></a><em>, was originally posted&nbsp;on Apr 6, 2025&nbsp;on </em><a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/" rel="noreferrer"><em>his blog</em></a><em>.&nbsp;Kamil is a Staff Software Engineer specializing in backend solutions, with a strong focus on scalability and performance. It is reposted here with permission. </em></p><p>Some time ago, I was working on improving the performance of slow statistics. The problem was that our database contained billions of rows, making data retrieval slow, even for just the last seven days. From a product perspective, we needed to display data for at least 30 days, and in real time. All the data was stored in MySQL without partitioning, so we had to find a better solution. Simply using a cache was not an option, as real-time data was required.</p><p>Let’s analyze it with a contrived example similar to the original one.</p><h2 id="mysql-solution">MySQL Solution<a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/#mysql-solution"></a></h2><p>Let’s say that we have a table with the following structure:</p><pre><code class="language-SQL">CREATE TABLE agent_stats (
    id BIGINT AUTO_INCREMENT NOT NULL,
    agent_id INT NOT NULL,
    event_type VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id)
);
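
-- For comparison only (a sketch, not part of the original setup): the modest
-- MySQL-side improvement mentioned at the end of this article would be to
-- range-partition this table by time. MySQL requires the partitioning column
-- in every unique key, so the primary key would first have to be widened:
--
-- ALTER TABLE agent_stats DROP PRIMARY KEY, ADD PRIMARY KEY (id, created_at);
-- ALTER TABLE agent_stats PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
--     PARTITION p2025_02 VALUES LESS THAN (UNIX_TIMESTAMP('2025-03-01 00:00:00')),
--     PARTITION pmax VALUES LESS THAN (MAXVALUE)
-- );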

CREATE INDEX idx_agent_stats_created_at_agent_id_event_type ON agent_stats (created_at, agent_id, event_type);</code></pre><p>This table collects statistics from our AI agents. We have two types of events:</p><ul><li>triggered</li><li>response_generated</li></ul><p>To test the performance of the database, we need to generate a lot of data. In MySQL there is no fast way to generate random data, so I used a script that generated 24,234,964 records. It was quite slow.</p><p>Now, we want to get the number of events of each type for each agent over the last 30 days:</p><pre><code class="language-SQL">SELECT agent_id, event_type, COUNT(*) as count
FROM agent_stats
WHERE created_at &gt; '2025-02-28 00:00:00'
GROUP BY agent_id, event_type</code></pre><p>It’s very slow, taking 11 seconds.</p><h2 id="what-is-timescaledb">What is TimescaleDB?<a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/#what-is-timescaledb"></a></h2><p>TimescaleDB is an open-source time-series database built on PostgreSQL, designed to handle large volumes of time-series data efficiently. In essence, it is a PostgreSQL extension that adds time-series capabilities to the database. It’s optimized for time-series inserts, and it provides features like automatic partitioning, retention policies, and continuous aggregates.</p><h2 id="timescaledb-solution">TimescaleDB Solution<a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/#timescaledb-solution"></a></h2><p>So let’s try to use TimescaleDB to speed up our statistics, creating a similar table:</p><pre><code class="language-SQL">CREATE TABLE agent_stats(
   created_at TIMESTAMPTZ NOT NULL,
   agent_id BIGINT NOT NULL CHECK (agent_id &gt; 0),
   event_type VARCHAR(255) NOT NULL
);

SELECT create_hypertable('agent_stats', 'created_at');
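-- By default, TimescaleDB splits a hypertable into 7-day chunks. If a
-- different granularity suited the workload better, it could be set
-- explicitly (a sketch using the optional chunk_time_interval parameter):
--
-- SELECT create_hypertable('agent_stats', 'created_at',
--     chunk_time_interval =&gt; INTERVAL '1 day');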

CREATE INDEX ON agent_stats (created_at DESC, agent_id, event_type);</code></pre><p>The&nbsp;<code>create_hypertable</code>&nbsp;function creates a hypertable, a TimescaleDB abstraction over a standard PostgreSQL table that adds automatic partitioning based on time.</p><p>Generating data in TimescaleDB is much more convenient and a lot faster than in MySQL.</p><pre><code class="language-SQL">INSERT INTO agent_stats (created_at, agent_id, event_type)
SELECT
    time,
    random() * 9 + 1, /* 1 - 10 */
    'triggered'
FROM
    generate_series(
        '2024-01-01 00:00:00',
        '2025-03-28 16:00:00',
        INTERVAL '1 second'
    ) AS time;
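
-- The rows above are routed into time-partitioned chunks automatically;
-- the chunks can be inspected, for example, with:
-- SELECT show_chunks('agent_stats');
-- SELECT count(*) FROM timescaledb_information.chunks
--     WHERE hypertable_name = 'agent_stats';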

INSERT INTO agent_stats (created_at, agent_id, event_type)
SELECT
    time,
    random() * 9 + 1, /* 1 - 10 */
    'response_generated'
FROM
    generate_series(
        '2024-01-01 00:00:00',
        '2025-03-28 16:00:00',
        INTERVAL '1 second'
    ) AS time;</code></pre><p>In this way we generate two records per second over a range of around a year and three months. I repeated this a few times, and within a few minutes a total of 184,675,516 records had been generated.</p><p>Now, let’s get the same data as before:</p><pre><code class="language-SQL">SELECT agent_id, event_type, COUNT(*) as count
FROM agent_stats
WHERE created_at &gt; '2025-02-28 00:00:00'
GROUP BY agent_id, event_type</code></pre><p>Keep in mind that we have much more data now, so this query is also slow: it takes 9 seconds. Of course, on the same amount of data as in MySQL it would be much faster because of the partitioning. Now we need to speed it up.</p><h2 id="timescaledb-continuous-aggregates">TimescaleDB Continuous Aggregates<a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/#timescaledb-continuous-aggregates"></a></h2><p>Continuous aggregates are a powerful feature of TimescaleDB that lets you pre-compute and store the results of query aggregations over time. They are based on the concept of materialized views in PostgreSQL, but they can also return data in real time.</p><p>Let’s create a new continuous aggregate for the agent_stats table:</p><pre><code class="language-SQL">CREATE MATERIALIZED VIEW hourly_agent_stats WITH (timescaledb.continuous)
AS
SELECT
    time_bucket('1 hour', created_at) as hour,
    agent_id,
    event_type,
    COUNT(1) AS occurrences
FROM agent_stats
GROUP BY
    hour,
    agent_id,
    event_type
WITH NO DATA
;</code></pre><p>We also need to create a refresh policy to keep the materialized view up to date.</p><pre><code class="language-SQL">SELECT add_continuous_aggregate_policy('hourly_agent_stats',
    start_offset =&gt; INTERVAL '1 day',
    end_offset =&gt; INTERVAL '1 hour',
    schedule_interval =&gt; INTERVAL '1 hour'
);</code></pre><p>Now we need to populate the materialized view with the data we already have in the table.</p><pre><code class="language-SQL">CALL refresh_continuous_aggregate('hourly_agent_stats', '2024-01-01', '2025-03-28');</code></pre><p>It takes some time; in production it would probably be better to do it incrementally, e.g., day by day.</p><p>Here is how it looks in the database:</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2025/06/timescaledb-records-aggregated-by-hour.png" class="kg-image" alt="" loading="lazy" width="713" height="448" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2025/06/timescaledb-records-aggregated-by-hour.png 600w, https://timescale.ghost.io/blog/content/images/2025/06/timescaledb-records-aggregated-by-hour.png 713w"></figure><p>Records in the view are aggregated by hour, so we have one record per hour per agent_id and event_type.</p><p>Now we can modify the query to take advantage of the newly created continuous aggregate:</p><pre><code class="language-SQL">SELECT agent_id, event_type, SUM(occurrences) as occurrences
FROM hourly_agent_stats
WHERE hour &gt; '2025-02-28 00:00:00'
GROUP BY agent_id, event_type</code></pre><p>This query is much faster, taking only 0.02 seconds.</p><p>It’s still possible to get real-time data through the continuous aggregate: historical data is fetched from the view, while the last hour (since we use a 1-hour bucket) is fetched from the source table.</p><h2 id="timescaledb-retention">TimescaleDB Retention<a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/#timescaledb-retention"></a></h2><p>TimescaleDB also provides a retention policy feature that automatically drops old data. Let’s say that we need only 6 months of detailed data and 1 year of aggregated data. We can set up the following retention policies:</p><pre><code class="language-SQL">SELECT add_retention_policy('agent_stats', INTERVAL '6 MONTH');</code></pre><p>This will drop all data older than 6 months from the source table.</p><pre><code class="language-SQL">SELECT add_retention_policy('hourly_agent_stats', INTERVAL '1 YEAR');</code></pre><p>This will drop all data older than 1 year from the continuous aggregate.</p><h2 id="summary">Summary<a href="https://sarvendev.com/posts/timescale-db-to-the-rescue/#summary"></a></h2><p>We could achieve some performance improvement with partitioning in MySQL, but it would be a slight improvement for additional work. Adding TimescaleDB increases the complexity of the whole system, as it’s a new technology that needs to be maintained, but it’s a great choice for time-series data, and a great choice from an application engineer’s point of view, as it provides a lot of useful features. And if you’re already using PostgreSQL, TimescaleDB is easy to adopt: it’s just an extension, so there’s no new technology to learn.</p>]]></content:encoded>
        </item>
    </channel>
</rss>