
Published on May 20, 2024

Time-Series Analysis in R

[Image: series vs. lags plot]

Written by Anber Arif

Time-series data is becoming increasingly prevalent in today's data-driven world. This type of data, which is collected at different points in time, is used in various applications, from financial forecasting to predicting customer behavior. With the growing importance of time-series data, companies are on the lookout for tools and techniques that can help them make sense of the data they have.

One such tool is R, a programming language specifically designed for statistical computing and graphics. R is an open-source language widely used for data analysis and visualization by data scientists, statisticians, and researchers, as well as by developers at companies like Google, Uber, and Facebook. Its powerful capabilities and extensive library of packages make it an ideal choice for time-series analysis.

In this article, you will learn the basics of analyzing time-series data in R. We will cover the key concepts and techniques used in time-series analysis, including data exploration, seasonality and trend detection, and forecasting. By the end of this article, you will have a solid understanding of how to use R to analyze time-series data and extract insights from it.

What Is Time-Series Analysis?

Time-series data is a collection of observations recorded over time, each associated with a specific time point. This type of data has a temporal dimension and is commonly encountered in business contexts where data accumulates over time. It serves as a valuable tool for analyzing trends, identifying patterns, and detecting seasonality within a given variable. Time-series data finds widespread use in financial analysis, economic forecasting, and machine learning.

For instance, stock prices, exchange rates, and sales figures are all examples of time-series data. By analyzing such data, we can gain insights into how these variables evolve and make informed predictions about their future behavior. 

Time-series analysis

Time-series analysis involves the visualization and modeling of temporal data to uncover underlying patterns and trends. Through techniques like charting, you can plot data points over time and identify recurring patterns and fluctuations. These visualizations, such as time-series graphs, offer insights into how variables evolve over time, and can be instrumental in understanding the data dynamics. 

For more information on time-series graphs and their applications, explore these examples.

There are generally two main types of time-series analysis:

  • Exploratory analysis

  • Predictive analysis

Exploratory analysis

Exploratory analysis in the context of time-series data serves the fundamental purpose of understanding and describing the underlying patterns inherent in the dataset. It involves several techniques explicitly tailored for time-series data to uncover key components such as trend, seasonality, cyclicity, and irregularities.

Decomposition

One of the primary techniques in exploratory analysis is decomposition. This method breaks down the time-series data into its constituent components: trend, seasonality, cyclicity, and irregularities.

  • Trend: The trend component represents the long-term movement or directionality of the data, indicating whether it is increasing, decreasing, or stable over time.

  • Seasonality: Seasonality refers to the periodic fluctuations or patterns that occur at regular intervals within the data, often corresponding to seasonal variations such as monthly or yearly cycles.

  • Cyclicity: Cyclicity captures repetitive patterns that occur over longer periods than seasonality but shorter than the overall trend. These cycles may not have fixed intervals and can vary in duration.

  • Irregularities: Irregularities represent random fluctuations or noise in the data that cannot be attributed to trend, seasonality, or cyclicity.

Seasonality correlation analysis

Another important aspect of exploratory analysis for time series is seasonality correlation analysis. This technique aims to identify the likely lengths of seasonal cycles within the data.

  • Analysts examine the correlation between the observed data and lagged versions of itself, considering different time lags corresponding to potential seasonal cycles.

  • By identifying significant correlations at specific lag intervals, analysts can infer the likely lengths of seasonal cycles present in the data.

  • This information is crucial for understanding the periodic patterns and variations inherent in the dataset, enabling analysts to make informed decisions regarding modeling and forecasting.

Predictive analysis

Predictive analysis, as a counterpart to exploratory analysis, focuses on leveraging historical data to develop models that can forecast future behavior. This approach is particularly valuable in scenarios where analysts seek to anticipate trends, patterns, or outcomes based on past observations.

In predictive analysis, analysts employ various statistical and machine learning techniques to build models that capture the relationships and dependencies present in the historical data. These models are trained using past observations and their corresponding outcomes, allowing them to learn patterns and make predictions about future behavior. Several methods are commonly used for predictive analysis of time series data:

Linear regression

Linear regression is a statistical technique that models the relationship between a dependent variable and one or more independent variables. In the context of time-series analysis, linear regression can be applied to predict future values based on historical trends and patterns.
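As a minimal base-R sketch of this idea (using the time index as the single predictor; an illustration, not a full forecasting workflow):

```r
# Fit a straight-line trend to the AirPassengers series
data(AirPassengers)
df <- data.frame(y = as.numeric(AirPassengers),
                 t = as.numeric(time(AirPassengers)))
fit <- lm(y ~ t, data = df)
coef(fit)[["t"]]  # positive slope: passenger numbers trend upward
```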

Decomposition projection

Building on the decomposition technique, decomposition projection involves using the decomposed components (e.g., trend, seasonality) to project future values of the time series. This method accounts for the data's underlying trends and seasonal patterns.

ARIMA Models

ARIMA (AutoRegressive Integrated Moving Average) models are widely used for time-series forecasting. They incorporate autoregressive and moving average components to capture the temporal dependencies and fluctuations in the data. ARIMA models are versatile and can be adapted to various types of time-series data.
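A quick sketch with base R's stats::arima(); the (0,1,1)(0,1,1)[12] orders below are the classic "airline model" specification for this dataset, chosen here purely for illustration:

```r
# Seasonal ARIMA fit and a 12-month-ahead forecast
data(AirPassengers)
fit <- arima(AirPassengers, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fc <- predict(fit, n.ahead = 12)  # point forecasts plus standard errors
fc$pred
```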

Learn more:

  • For in-depth information on general time-series analysis, delve into decomposition techniques to understand time-series patterns.

  • Explore forecasting methods for predictive insights, which can help you anticipate future trends and make informed decisions based on historical data patterns.

The TSstudio Package for R

The TSstudio package for R is a collection of analysis functions and plotting tools relevant to time-series data. This package is available under the MIT license, which is a permissive free software license that allows users to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software without any royalties or additional fees. The only requirement of the MIT license is that the license attribution be included in any copies or substantial uses of the software.

The TSstudio package includes a wide range of functions and tools for time-series analysis, including functions for exploratory analysis, such as decomposition and seasonality correlation analysis, as well as functions for predictive analysis, such as linear regression, decomposition projection, and ARIMA models.

The package also includes several plotting tools for visualizing time-series data, such as time-series graphs and autocorrelation plots. More information on the TSstudio package and its license is available in the package documentation. To explore the detailed functionality and usage guidelines, check the TSstudio package reference manual.

Installation

To install the TSstudio package, you can use the install.packages function in R:

install.packages("TSstudio")

Once the package is installed, you can load it into your R session using the library function:

library(TSstudio)

Data

The AirPassengers dataset in R is a time-series dataset representing the monthly international airline passenger numbers from January 1949 to December 1960. It contains 144 observations, with 12 observations per year. Here’s how to load and explore the AirPassengers dataset:

# Load the dataset
data(AirPassengers)

# Get the dataset info
ts_info(AirPassengers)

[Image: ts_info() output for the AirPassengers dataset]

To plot a time-series graph of the AirPassengers dataset, we can utilize the ts_plot() function from the TSstudio package in R. This package offers a range of tools tailored for time-series analysis and visualization.

# Plot the time-series graph
ts_plot(AirPassengers,
        title = "Air Passengers Over the Years",
        Ytitle = "Number of Air Passengers",
        Xtitle = "Year")

[Image: time-series plot of the AirPassengers dataset]

For further information, dive into practical tutorials and examples for hands-on guidance and insights into effectively using the TSstudio package.

Exploratory Analysis With R

Exploratory analysis aims to uncover key characteristics of a time series through descriptive methods. During this phase, our primary focus is:

  1. Determination of trend type

  2. Identification of seasonal patterns

  3. Detection of noisy data

These insights provide a deeper understanding of historical data and are invaluable for forecasting future trends.

Decomposition of the series

Our first step is to decompose the series into its three components: trend, seasonal, and random. We achieve this using the ts_decompose function, which offers an interactive interface for the decompose function from the stats package.
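Since ts_decompose() is an interactive wrapper around stats::decompose(), the same decomposition can be reproduced with base R alone (the TSstudio call is simply ts_decompose(AirPassengers); the base-R sketch below is an equivalent, non-interactive version):

```r
# Base-R equivalent of ts_decompose(AirPassengers)
data(AirPassengers)
dec <- decompose(AirPassengers)  # additive decomposition by default
plot(dec)                        # panels: observed, trend, seasonal, random
```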

[Image: decomposition plot of the AirPassengers series]

Let’s break down each component of this decomposition plot:

  • Trend: The trend component represents the overall direction or pattern in the data. In this case, the trend component is increasing, indicating a steady growth in the number of air passengers over the years. The trend is not strictly linear, but rather a gentle, upward-sloping curve.

  • Seasonal: The seasonal component captures periodic patterns that occur at fixed intervals. In this dataset, there is a clear seasonal pattern, with peaks and troughs occurring at regular intervals.

[Image: seasonal component of the AirPassengers series]

The seasonal pattern appears to be annual, with a single cycle per year. The peaks occur around the summer months, indicating higher demand for air travel during the summer season; the troughs occur during the winter months, suggesting lower demand in winter.

  • Noise (Random): The noise component represents the irregular or residual variation in the data that cannot be attributed to the trend and seasonal components. In this plot, the noise is relatively small compared to the trend and seasonal components, indicating that the model has captured most of the underlying patterns in the data.

  • Observed: This is the original time-series data, which combines trend, seasonal, and random components.

Seasonality correlation

Seasonality correlation is a statistical technique used to analyze the similarity between data after a given lag or period of time. It is particularly useful for time series data that exhibit seasonal patterns, as it can help identify the meaningful periodicity of the data. The key components of this method are:

  • Lag period: A lag refers to a period or interval between two data points. In the context of seasonality correlation, a lag is used to measure the time difference between two data points in a seasonal cycle. For example, if the data is monthly, a lag of four means a four-month period, while a lag of 12 means a one-year period.

  • Correlation coefficient: Correlation coefficients measure the strength and direction of the relationship between two variables. In seasonality correlation analysis, correlations are calculated between observations separated by different lag periods. A correlation coefficient can range from -1 to 1, where a value of 1 indicates a perfect positive correlation, a value of -1 indicates a perfect negative correlation, and a value of 0 indicates no correlation.

  • Seasonality correlation plot: A seasonality correlation plot is a graphical representation of the correlation coefficients between data points after a given lag. It is typically presented as a line chart, with the lag on the x-axis and the correlation coefficient on the y-axis. The plot can help identify the meaningful periodicity of the data by highlighting the lag correlation values that are significantly different from zero.

High correlation lag values indicate cyclical behavior in the data. This means that the data exhibits a repeating pattern over a certain period of time. For example, if the data is monthly and the lag correlation value is high at a lag of 12, this suggests that the data exhibits a one-year cycle. High correlation lag values can help identify seasonal patterns and make predictions about future data points.
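The lag-correlation idea can be sketched directly in base R; assuming monthly data, a high correlation at lag 12 points to an annual cycle:

```r
# Correlate the series with itself shifted by 12 months
data(AirPassengers)
x <- as.numeric(AirPassengers)
n <- length(x)
lag12_cor <- cor(x[1:(n - 12)], x[13:n])
lag12_cor  # close to 1, consistent with a yearly cycle
```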

Correlation analysis

The ts_cor() function calculates both the autocorrelation function (ACF) and the partial autocorrelation function (PACF) for a given time-series dataset. These functions measure the correlation between the time series and its lagged values, providing insights into temporal dependencies and patterns within the data.

ts_cor(AirPassengers, lag.max = 40)

[Image: ACF and PACF plots for the AirPassengers series]

The ACF plot shows the correlation of a time series with its own lagged values. In this plot, we can observe:

  • At lag 0, the correlation is 1, which is expected, as it's the original series.

  • At lag 1, the correlation is 0.75, indicating a strong positive correlation with the previous period.

  • At lag 12, there's a significant spike, suggesting a strong seasonal component with a period of 12.

  • After lag 12, the correlation gradually decreases but remains positive for some lags, indicating a lingering effect of past values.

The PACF plot shows the correlation between a time series and its lagged values after removing the effect of intermediate lags. In this plot, we can observe:

  • At lag 0, the correlation is 1, which is expected, as it's the original series.

  • At lag 1, the correlation is 0.75, similar to the ACF plot.

  • At lag 12, there's a significant spike, similar to the ACF plot, indicating a robust seasonal component with a period of 12.

  • After lag 12, the correlation quickly drops to near zero, suggesting there's little correlation left after accounting for the lag 12 effect.
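ts_cor() conceptually mirrors base R's acf() and pacf(); the same quantities can be computed without plotting (note that base pacf() starts at lag 1, while acf() starts at lag 0):

```r
# Autocorrelation and partial autocorrelation without plots
data(AirPassengers)
a <- acf(AirPassengers, lag.max = 40, plot = FALSE)
p <- pacf(AirPassengers, lag.max = 40, plot = FALSE)
a$acf[1]  # lag-0 autocorrelation is always exactly 1
```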

We can also use the ts_lags() function to identify the relationship between the series and its lags.

ts_lags(AirPassengers)

[Image: lag plots of the AirPassengers series]

Predictive Analysis With R

Predictive analysis is a branch of advanced analytics that uses various statistical and machine learning techniques to identify patterns and trends in data and make predictions about future outcomes. It involves building statistical models that analyze historical data to identify relationships and patterns between variables. These models are then used to predict future events based on new data. While predictive analysis can be a powerful tool for forecasting and decision-making, it is important to recognize that it is not without complexities. Here are some key points to consider when conducting predictive analysis with R:

  • Models come with mathematical assumptions: Predictive models in R, whether simple linear regression or complex machine learning algorithms, are built upon mathematical assumptions. These assumptions may include linearity, normality, independence of errors, and homoscedasticity, among others. It is crucial to understand these assumptions and assess whether they hold true for your data before applying a model. Violations of these assumptions can lead to biased estimates and inaccurate predictions.

  • Accuracy metrics aren’t the full story: While accuracy metrics such as mean squared error (MSE), R-squared, and accuracy score provide valuable insights into the performance of predictive models, they do not tell the whole story. It is essential to consider other factors such as interpretability, computational efficiency, and practical relevance when evaluating model performance. A highly accurate model on training data may not necessarily generalize well to unseen data or real-world scenarios.

  • Understanding the math behind models: To effectively apply predictive models in R, it is beneficial to have a solid understanding of the underlying mathematical principles behind these models. This includes understanding the algorithms, optimization techniques, and mathematical frameworks used to train and evaluate models. 

  • Tuning models and developing hypotheses: Tuning models involves adjusting hyperparameters, selecting features, and optimizing model performance. Understanding the math behind models empowers practitioners to make informed decisions during the tuning process, such as selecting appropriate regularization parameters or feature engineering techniques. Additionally, understanding the math behind models helps develop hypotheses to explain unexpected behavior or model failures, leading to iterative improvements in predictive performance.

Piecewise linear models with TSLM

Piecewise linear models are a useful technique for capturing non-linear trends in time-series data by partitioning the data into segments and fitting a separate linear regression model to each segment. This approach allows for capturing broad average trends in the data while maintaining simplicity and interpretability. The key features of piecewise linear models are as follows:

  • Partitioning data into "nearly linear" sections: Piecewise linear models divide the time-series data into segments or intervals where the relationship between the predictor variable (time) and the response variable (data values) is approximately linear within each segment.

  • Fitting a line to each section: Within each segment, a linear regression model is fitted to the data points. This involves estimating the slope (gradient) and intercept of the line that best fits the data within that segment.

  • Capturing broad average trends: Piecewise linear models provide a simplified representation of the underlying trends in the data by capturing the broad average trends within each segment. While these models may be less accurate than more complex non-linear models, they offer ease of interpretation and understanding.
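A genuinely piecewise trend can be sketched in base R with a hinge regressor; the 1955 breakpoint below is arbitrary and purely illustrative:

```r
# Two-segment linear trend via a hinge (pmax) term
data(AirPassengers)
t <- as.numeric(time(AirPassengers))
knot <- 1955  # hypothetical breakpoint, chosen for illustration
df <- data.frame(y = as.numeric(AirPassengers),
                 t = t,
                 t_knot = pmax(t - knot, 0))
fit <- lm(y ~ t + t_knot, data = df)
coef(fit)  # slope before the knot, plus the change in slope after it
```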

# Load necessary libraries
library(forecast)

# Load AirPassengers dataset
data("AirPassengers")

# Split the dataset into training and testing sets (80-20 split)
train_data <- window(AirPassengers, end = c(1958, 12))
test_data <- window(AirPassengers, start = c(1959, 1))

# Fit the TSLM model on the training set
tslm_model <- tslm(train_data ~ trend + season)

# Generate forecasts using the fitted TSLM model on the testing set
tslm_forecast <- forecast(tslm_model, h = length(test_data))

# Evaluate the performance of the TSLM model on the test data
tslm_evaluation <- accuracy(tslm_forecast, test_data)

# Print evaluation metrics
print(tslm_evaluation)

# Combine the training and test data for plotting
combined_data <- ts(c(train_data, test_data),
                    start = start(train_data),
                    frequency = frequency(train_data))

# Plot the original data with fitted values and forecast
plot(combined_data, main = "TSLM Model on AirPassengers Dataset", xlim = c(1949, 1961))
lines(fitted(tslm_model), col = "red")
lines(tslm_forecast$mean, col = "blue")
legend("topright", legend = c("Original Data", "Fitted Values", "Forecast"),
       col = c("black", "red", "blue"), lty = 1)

[Plot: TSLM Model on AirPassengers Dataset — original data, fitted values, and forecast]

[Output: TSLM evaluation metrics from accuracy()]

The plot shows the original data (black), fitted values (red), and forecasted values (blue). The model captures the overall upward trend and seasonal pattern, but because the trend and seasonal terms are additive, it underestimates the growing amplitude of the seasonal swings in later years.
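The bullets above describe fitting a separate line to each segment, whereas the TSLM example uses a single global trend. A minimal sketch of an explicitly piecewise trend with one breakpoint is shown below; the breakpoint year (1955) is an arbitrary assumption for illustration, and packages such as `segmented` can estimate breakpoints from the data instead.

```r
library(forecast)
data("AirPassengers")

# Convert the ts object to plain numeric vectors for lm()
t <- as.numeric(time(AirPassengers))
y <- as.numeric(AirPassengers)

knot <- 1955  # hypothetical breakpoint; choose or estimate for your data

# The hinge term pmax(t - knot, 0) is zero before the knot and grows
# linearly after it, so the fitted trend is continuous with two slopes:
# slope = b1 before the knot, b1 + b2 after it
fit <- lm(y ~ t + pmax(t - knot, 0))
coef(fit)
```

The hinge-term formulation keeps the two segments joined at the breakpoint, which is usually preferable to fitting two fully independent regressions.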

Exponential smoothing models

Exponential smoothing models, implemented through the ETS (Error, Trend, Seasonality) framework, are commonly used in time-series analysis for forecasting. These models are particularly useful for reducing the impact of outliers and noisy data while still capturing underlying trends and patterns in the dataset. The key features of exponential smoothing models are as follows:

  • Reduces the impact of outliers and noisy data: Exponential smoothing models apply exponentially decaying weights to the observed data, assigning less weight to older observations and more weight to recent ones. This damps the influence of outliers and noisy data points, allowing the model to focus on capturing the overall trend and seasonality in the dataset.

  • Provides a more stable model: By smoothing out fluctuations in the data, exponential smoothing models produce a more stable forecast that follows the general trajectory of the dataset. This stability makes the model less sensitive to short-term fluctuations and noise, resulting in more reliable forecasts.

# Load necessary libraries
library(forecast)

# Load AirPassengers dataset
data("AirPassengers")

# Split the dataset into training and testing sets (80-20 split)
train_data <- window(AirPassengers, end = c(1958, 12))
test_data <- window(AirPassengers, start = c(1959, 1))

# Fit the ETS model on the training set
ets_model <- ets(train_data)

# Generate forecasts using the ETS model
ets_forecast <- forecast(ets_model, h = length(test_data))

# Plot the original data with forecasted values
plot(ets_forecast, main = "ETS Model on AirPassengers Dataset")
lines(test_data, col = "red")
legend("topright", legend = c("Forecasted Values", "Actual Values"),
       col = c("blue", "red"), lty = 1)

# Print evaluation metrics
ets_evaluation <- accuracy(ets_forecast, test_data)
print(ets_evaluation)

Here’s the generated ETS plot for the AirPassengers dataset, along with the evaluation metrics:

[Plot: ETS Model on AirPassengers Dataset — forecast against actual values]

[Output: ETS evaluation metrics from accuracy()]
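By default, `ets()` selects the error, trend, and seasonal components automatically by minimizing the corrected AIC, but a specific form can also be forced. A small sketch, assuming you want to compare the automatic choice against a hand-picked ETS(M,A,M) specification:

```r
library(forecast)
data("AirPassengers")
train_data <- window(AirPassengers, end = c(1958, 12))

# Let ets() pick the error/trend/seasonal form automatically (by AICc)
auto_fit <- ets(train_data)
auto_fit$method  # a label such as "ETS(M,Ad,M)"; the exact form depends on the data

# Force multiplicative error, additive trend, multiplicative seasonality
manual_fit <- ets(train_data, model = "MAM")

# Compare the two specifications by corrected AIC (lower is better)
c(auto = auto_fit$aicc, manual = manual_fit$aicc)
```

Multiplicative seasonality is often a sensible manual choice for AirPassengers, whose seasonal swings grow with the level of the series.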

ARIMA model

The ARIMA (AutoRegressive Integrated Moving Average) model is a widely used time-series forecasting method that combines autoregressive (AR), differencing (I), and moving average (MA) components. It is particularly effective for modeling time series data with linear dependencies and temporal patterns. The key components of the ARIMA model are as follows:

  • AutoRegressive (AR) component: The autoregressive component models the relationship between an observation and a number of lagged observations (autoregressive terms). It captures the linear dependency of the current value on its previous values.

  • Integrated (I) component: The integrated component represents the differencing operation applied to the time series data to make it stationary. Stationarity is essential for ARIMA models, as it ensures that the statistical properties of the data remain constant over time.

  • Moving Average (MA) component: The moving average component models the relationship between an observation and the residual errors from past forecasts at lagged observations. It captures the influence of past shocks (white noise) on the current value.

# Load necessary libraries
library(forecast)

# Load AirPassengers dataset
data("AirPassengers")

# Split the dataset into training and testing sets (80-20 split)
train_data <- window(AirPassengers, end = c(1958, 12))
test_data <- window(AirPassengers, start = c(1959, 1))

# Fit the ARIMA model on the training set
arima_model <- auto.arima(train_data)

# Generate forecasts using the ARIMA model
arima_forecast <- forecast(arima_model, h = length(test_data))

# Plot the original data with forecasted values
plot(arima_forecast, main = "ARIMA Model on AirPassengers Dataset")
lines(test_data, col = "red")
legend("topright", legend = c("Forecasted Values", "Actual Values"),
       col = c("blue", "red"), lty = 1)

# Print evaluation metrics
arima_evaluation <- accuracy(arima_forecast, test_data)
print(arima_evaluation)

[Plot: ARIMA Model on AirPassengers Dataset — forecast against actual values]

[Output: ARIMA evaluation metrics from accuracy()]

The plot shows the actual values of the dataset alongside the forecasted values generated by the ARIMA model. Overall, the ARIMA model captures the general trend of the dataset, but there are some discrepancies between the actual and forecasted values.
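The three components above map directly onto the (p, d, q) orders that `auto.arima()` searches over. A short sketch of inspecting the selected orders and specifying a model explicitly with `Arima()`; the orders shown are illustrative assumptions, not necessarily what `auto.arima()` would pick:

```r
library(forecast)
data("AirPassengers")
train_data <- window(AirPassengers, end = c(1958, 12))

# ndiffs()/nsdiffs() estimate the differencing (the "I" component)
# needed to make the series stationary
ndiffs(train_data)
nsdiffs(train_data)

# auto.arima() searches over (p,d,q)(P,D,Q) orders by AICc
auto_fit <- auto.arima(train_data)
arimaorder(auto_fit)  # the selected non-seasonal and seasonal orders

# The same kind of model can be specified explicitly with Arima();
# the orders below are illustrative, chosen by hand
manual_fit <- Arima(train_data, order = c(1, 1, 0),
                    seasonal = c(0, 1, 0))
manual_fit$aicc
```

Fitting `Arima()` by hand is useful when you want to hold the model structure fixed across retraining runs instead of re-running the automatic search each time.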

Check out this comprehensive guide on forecasting methods and tools, including ARIMA, ETS, and TSLM.

Wrapping Up: Time-Series Analysis With Timescale

Time-series analysis is a powerful tool for understanding trends, patterns, and seasonality in data that varies over time. R packages like TSstudio provide sophisticated methods for time-series analysis, but the quality of the analysis ultimately depends on the quality and quantity of the data.

Ready to take your time-series analysis to the next level? Get started with Timescale, the powerful PostgreSQL-based platform for time-series data that scales, allowing for efficient and effective analysis of large datasets. With Timescale, you can easily manage and analyze time-series data alongside your business data, and its compatibility with R and other programming languages makes it a versatile tool for data analysis. Plus, you’ll be able to perform much more analysis with fewer lines of code by using hyperfunctions optimized for querying, aggregating, and analyzing time series.  

Sign up for a free trial today and discover how Timescale can help you unlock insights from your time-series data.
