How to Use PostgreSQL for Data Normalization

Written by Dylan Paulus

Published at Dec 23, 2024

Managing a database is about more than just storing information: it's about storing data efficiently and effectively while balancing trade-offs. We can employ techniques like data normalization to optimize our data for maintainability and readability, or data denormalization to optimize for raw query speed. Data normalization is the process of breaking data down into structured tables, reducing duplication and enforcing standardization so the data is easier to store and query.

In this article, we'll take a look at what data normalization is, how to apply normal forms to achieve data normalization, challenges and tools for data normalization in PostgreSQL, tips on when to denormalize, and how TimescaleDB makes normalization easier.

Why Data Normalization?

The primary benefit of normalizing data is to reduce duplication, but this is just the tip of the iceberg. By reducing data duplication, we can achieve the following benefits:

  • Reduce cost: Storing less data means we don't pay to keep extra copies of the same information.

  • Simplify maintenance: Non-duplicated data only needs to be updated in a single place, helping maintain data consistency and data integrity.

  • Enhance security: By segmenting data into smaller, related tables, you can apply more granular access controls.

  • Improve scalability: Keeping data organized and structured into smaller tables makes sharding and data access easier as your data scales.

These benefits compound over time: by going through the process of applying the normal forms, your data becomes easier to query, maintain, and scale.

Anomalies

It is worth taking a minute to talk about data anomalies. When we say "normalization makes databases easier to maintain," what we mean is that it prevents data anomalies. What is a data anomaly? Anomalies occur when the same data value is stored in multiple places and a mutation (insert, update, delete) modifies the data in one place but not everywhere, leading to confusion about what the value should be and possibly causing data loss.

For example, say we had a table to track customer orders which looks like this:

customer_name | address | order_amount | order_items
------------- | ------- | ------------ | --------------
John          | USA     | 10.50        | hair gel, comb
Gary          | UK      | 25.66        | backpack
John          | USA     | 2.30         | pencil

If we wanted to delete all orders by John, we would lose all the previous order information that, as a company, we might want to keep. This is a data loss, or delete, anomaly: deleting information about John also deletes his order history.

customer_name | address | order_amount | order_items
------------- | ------- | ------------ | --------------
Gary          | UK      | 25.66        | backpack

John then moved from the USA to the UK before his orders were shipped. To reflect John's move, we need to update every row where customer_name = John. Since the update spans multiple rows, there is a chance we don't update all of them, especially if updates happen through application code. When there are multiple sources of truth and an update only changes some of the rows, we end up with an update anomaly.

customer_name | address | order_amount | order_items
------------- | ------- | ------------ | --------------
John          | UK      | 10.50        | hair gel, comb
Gary          | UK      | 25.66        | backpack
John          | USA     | 2.30         | pencil

Normalizing your databases helps avoid anomalies before they can ever occur.
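As a sketch of where normalization takes this table, the customer details could move into their own table (names here are illustrative; the multi-valued order_items column would get a similar treatment, as the first normal form section below shows):

CREATE TABLE customers (
    customer_id   SERIAL PRIMARY KEY,
    customer_name TEXT NOT NULL,
    address       TEXT NOT NULL
);

CREATE TABLE orders (
    order_id     SERIAL PRIMARY KEY,
    customer_id  INTEGER REFERENCES customers (customer_id),
    order_amount DECIMAL(10,2)
);

John's address now lives in exactly one row, so an update touches one place, and deleting his orders no longer erases who he is.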

The Normal Forms

Data normalization follows a set of rules called normal forms. The normal forms start with the first normal form, then the second normal form, and so on. Each normal form builds off the previous normal form. The first three normal forms are the most important, and the ones most databases follow, but many more exist. In this section, we'll look at the first three normal forms, plus the Boyce-Codd normal form—which expands on the third normal form.

First normal form (1NF)

First normal form is the first step in reducing data duplication. A database is in first normal form if the following criteria apply:

  • A column only contains one value (or, in other words, the column is atomic)

  • No row repeats in the table

Example

To walk through the various normal forms, we'll start with a completely denormalized table and move through the different normal forms until we have a clean, normalized database.

Let's take the following denormalized logs table for a web application monitoring service:

[Image: denormalized logs table whose tags column holds multiple values, with two identical CRASH rows]

This table violates first normal form. First, the tags column contains multiple values (e.g., frontend, backend), so its columns are not atomic. Second, the first two rows (CRASH | frontend, backend) contain exactly identical data; nothing distinguishes them. We can solve the second problem by introducing a primary key to make each row unique.

[Image: logs table with an id primary-key column added, making every row unique]

Each row is now distinct, satisfying "no repeating rows." Next, we can remove the multiple values from the tags column by creating two new tables: first, a table to hold all available tags; second, a join table to facilitate a many-to-many relationship between the logs and the tags.

[Image: three tables: logs, tags, and a log_tags join table]

With these three tables, our data is now in first normal form: no two rows repeat and each column contains exactly one value.
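As a rough sketch of that schema (the walkthrough's screenshots may use different column names), the three tables could be declared like this:

CREATE TABLE logs (
    id    SERIAL PRIMARY KEY,
    level TEXT NOT NULL           -- e.g., CRASH
);

CREATE TABLE tags (
    id   SERIAL PRIMARY KEY,
    name TEXT UNIQUE NOT NULL     -- e.g., frontend, backend
);

-- join table for the many-to-many relationship between logs and tags
CREATE TABLE log_tags (
    log_id INTEGER REFERENCES logs (id),
    tag_id INTEGER REFERENCES tags (id),
    PRIMARY KEY (log_id, tag_id)
);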

Second normal form (2NF)

To be in second normal form, a table must follow these criteria:

  • The table must be in first normal form.

  • All columns are fully dependent on the candidate key.

The wording you'll find around second normal form can be confusing. Put simply, second normal form is generally a rule for tables with composite keys (when two or more columns combine to form the primary key). Every non-key column must depend on all of the columns in the composite key, not just a subset of them.

You might be thinking, what does it mean for a column to depend on another column? Also described as a functional dependency, it means that knowing the value of one column determines the value of another. For example, knowing the course_id in a learning platform database gives you the course's name and description: name is dependent on course_id, and description is also dependent on course_id.
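As a quick illustration (table and column names are hypothetical), every non-key column below is functionally dependent on course_id:

CREATE TABLE courses (
    course_id   SERIAL PRIMARY KEY,
    name        TEXT NOT NULL,  -- knowing course_id determines name
    description TEXT            -- and description
);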

Example

Another team has implemented an archiving feature to mark logs as archived in our monitoring web app service. The new tables look like this:

[Image: log_tags table with a newly added is_archived column]

We get tasked with checking whether the changes to the database are normalized. The tables are still in first normal form, but we notice that the change violates second normal form. Why? In the log_tags table, the newly added is_archived column marks a log as either archived or not. This flag has nothing to do with tags (tags don't get archived). Since is_archived depends only on the log_id portion of the composite key and not on tag_id, this table is not in second normal form. Additionally, we can see that is_archived's value is repeated for log_id = 1, log_id = 2, and log_id = 3.

Luckily, this is a simple fix: move the column to the table it depends on. In this case, moving is_archived to the logs table puts the schema back in second normal form.

[Image: is_archived column moved from log_tags to logs]

With is_archived moved to the logs table, our database is not only in first normal form but also in second normal form.
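Sketched as a migration against the schema assumed earlier (and assuming the archived status agrees across a log's tag rows), the fix looks like this:

-- add the flag where it actually belongs
ALTER TABLE logs ADD COLUMN is_archived BOOLEAN NOT NULL DEFAULT false;

-- backfill it from the old location
UPDATE logs
SET is_archived = lt.is_archived
FROM (SELECT DISTINCT log_id, is_archived FROM log_tags) AS lt
WHERE logs.id = lt.log_id;

-- remove the misplaced column
ALTER TABLE log_tags DROP COLUMN is_archived;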

Third normal form (3NF)

To be in third normal form, the following criteria must apply:

  • The data must be in first and second normal form.

  • There are no transitive dependencies in the table.

A transitive dependency is when a column depends on another column that is not the primary key. For example, a package delivery service may create a deliveries table containing an id, person, and address. Here, the address depends on the person, which in turn depends on the id. This is a transitive dependency: address is transitively dependent on the primary key id through person (address -> person -> id).
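Sketched in SQL (names illustrative), the before and after look like this:

-- Before: address -> person -> id, a transitive dependency
CREATE TABLE deliveries_denormalized (
    id      SERIAL PRIMARY KEY,
    person  TEXT,
    address TEXT
);

-- After: the person and their address get their own table
CREATE TABLE persons (
    person_id SERIAL PRIMARY KEY,
    name      TEXT,
    address   TEXT
);

CREATE TABLE deliveries (
    id        SERIAL PRIMARY KEY,
    person_id INTEGER REFERENCES persons (person_id)
);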

Example

Logs in our metrics system don't give enough information about what is happening, so we get assigned to add a description to each log. Easy enough: we'll add a new column to the logs table to store descriptions:

[Image: logs table with new slug and description columns]

The update follows the first normal form: there are no repeating rows, and all columns contain a single value. Additionally, all tables follow second normal form, as all columns in their respective tables are dependent on the primary key. But we fail to adhere to the third normal form because description is dependent on slug instead of the primary key id. We can fix this by moving slug and description into their own table.

[Image: new levels table holding slug and description, referenced from the logs table]

Creating a new levels table puts our database into 3NF. Each column in each table depends solely on the primary key, with no transitive dependencies.
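A sketch of the fix, assuming logs already has slug and description columns as in the screenshot:

CREATE TABLE levels (
    slug        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);

-- populate levels from the existing data, then drop the redundant column
INSERT INTO levels (slug, description)
SELECT DISTINCT slug, description FROM logs;

ALTER TABLE logs DROP COLUMN description;
ALTER TABLE logs ADD FOREIGN KEY (slug) REFERENCES levels (slug);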

Boyce-Codd normal form (3.5NF)

Most database systems can apply the first three normal forms and end up with an extremely clean, normalized data model. However, in rare cases a table can satisfy third normal form and still not be fully normalized. That loophole is closed by the Boyce-Codd normal form. Since the Boyce-Codd normal form is an extension of, or a stricter version of, the third normal form, you will commonly see it referred to as the 3.5 normal form. To be in Boyce-Codd normal form, your table must comply with the following:

  • It must be in first, second, and third normal form.

  • Every determinant (the left-hand side of any functional dependency) must be a candidate key.

Example

Back in our metrics system, we decide we want to associate each level with an owner and add a severity. Each owner is determined by the severity of the level.

[Image: levels table extended with severity and owner columns]

This current iteration satisfies all the normal forms we've implemented up to this point.

  • 1NF: All values are atomic.

  • 2NF: There are no partial dependencies (all attributes depend on the entire primary key).

  • 3NF: There are no transitive dependencies through non-prime attributes.

However, since owner is dependent on severity and severity is not a candidate key, this table design violates Boyce-Codd normal form. To fix this, we can split severity and owner into a separate table.

[Image: severity and owner split into a separate category_reviewers table keyed on severity]

With a new category_reviewers table created, and using severity as the primary key, the database is fully in first normal form, second normal form, third normal form, and Boyce-Codd normal form.
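A sketch of that split, using the table name from the walkthrough (and assuming levels gained severity and owner columns as in the screenshot):

CREATE TABLE category_reviewers (
    severity TEXT PRIMARY KEY,  -- the determinant becomes a key
    owner    TEXT NOT NULL
);

-- move the dependency out of levels
INSERT INTO category_reviewers (severity, owner)
SELECT DISTINCT severity, owner FROM levels;

ALTER TABLE levels DROP COLUMN owner;
ALTER TABLE levels ADD FOREIGN KEY (severity) REFERENCES category_reviewers (severity);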

Benefits of Data Normalization in PostgreSQL

Database integrity and reduced redundancy

Normalization reduces data duplication by organizing data into separate tables with proper relationships. For example, instead of repeatedly storing a customer's address in every order record, you store it once in a customer table and reference it through foreign keys. This not only saves storage space but also prevents update anomalies, where the same data might be updated in some places but not others.

Simplified data maintenance

When data is normalized, updates only need to be made in one place. If a customer changes their address, you only update it in the customers table rather than hunting down every order record where that address appears.

Better query performance

While some believe normalization always hurts performance, it can actually improve it in write-heavy applications. Smaller, focused tables with proper indexing often perform better than large denormalized tables, especially for updates and inserts. Join operations between properly normalized and indexed tables can be very efficient.

Easier data modifications

Adding new types of data or changing existing structures is simpler in a normalized database. For instance, if you need to add support for multiple shipping addresses per customer, this is much easier when addresses are already in their own table rather than embedded in order records.

Improved data consistency

Normalization enforces referential integrity through foreign key constraints. This prevents orphaned records and ensures data relationships remain valid. For example, you can't delete a customer record if there are still orders referencing it, preventing inconsistent data states.
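As a sketch (the constraint name and ON DELETE behavior are illustrative), such a rule is declared like this, after which PostgreSQL rejects any delete that would orphan orders:

ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id)
    REFERENCES customers (customer_id)
    ON DELETE RESTRICT;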

Challenges and Solutions for Data Normalization in PostgreSQL

When it comes to data normalization in PostgreSQL, even experienced database administrators face several common challenges. Let's dive into these challenges and explore practical solutions that can help you build more robust and efficient databases.

Handling complex data relationships

One of the most significant challenges in data normalization is managing complex relationships between different entities. Imagine your use case is an e-commerce platform where products can belong to multiple categories, have various attributes, and maintain price histories. This complexity can quickly become overwhelming.

A practical solution is to implement bridge tables effectively. Instead of creating a tangled web of direct relationships, use intermediate tables to maintain clean many-to-many relationships. For example:

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    base_price DECIMAL(10,2)
);

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE product_categories (
    product_id INTEGER REFERENCES products(product_id),
    category_id INTEGER REFERENCES categories(category_id),
    PRIMARY KEY (product_id, category_id)
);

CREATE TABLE price_history (
    product_id INTEGER REFERENCES products(product_id),
    price DECIMAL(10,2),
    effective_date TIMESTAMPTZ,
    PRIMARY KEY (product_id, effective_date)
);

This structure maintains data integrity while keeping relationships clean and manageable. The bridge table product_categories allows products to belong to multiple categories without violating normalization principles.

The performance trade-off

A common misconception is that higher normalization levels always lead to slower query performance. While it's true that joins can impact performance, denormalization isn't always the answer. The key is finding the right balance for your specific use case.

Here are some strategies you can use for maintaining performance with normalized data:

Use appropriate indexing strategies. Create indexes on frequently joined columns and columns used in WHERE clauses:

CREATE INDEX idx_product_categories_product_id ON product_categories(product_id);
CREATE INDEX idx_product_categories_category_id ON product_categories(category_id);

Implement materialized views for complex queries that are run frequently but don't need real-time data:

CREATE MATERIALIZED VIEW product_category_summary AS
SELECT p.name AS product_name,
       string_agg(c.name, ', ') AS categories,
       p.base_price
FROM products p
JOIN product_categories pc ON p.product_id = pc.product_id
JOIN categories c ON pc.category_id = c.category_id
GROUP BY p.product_id, p.name, p.base_price;
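Materialized views cache the query's results, so refresh them on whatever cadence your data allows:

-- re-run the underlying query and replace the stored results
REFRESH MATERIALIZED VIEW product_category_summary;
-- or REFRESH MATERIALIZED VIEW CONCURRENTLY product_category_summary;
-- (CONCURRENTLY requires a unique index on the view)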

Effective data modeling strategies

Successful data modeling in PostgreSQL requires a thoughtful approach that considers both present needs and future scalability. Start with a basic model and progressively refine it based on actual usage patterns. For instance, if you're tracking user interactions with products, you might start with a simple events table:

CREATE TABLE user_events (
    event_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    product_id INTEGER REFERENCES products(product_id),
    event_type VARCHAR(50),
    event_timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

As your needs evolve, you can split this into more specialized tables without breaking existing applications:

CREATE TABLE product_views (
    view_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    product_id INTEGER REFERENCES products(product_id),
    view_timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    session_duration INTEGER
);

CREATE TABLE product_purchases (
    purchase_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    product_id INTEGER REFERENCES products(product_id),
    purchase_timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    quantity INTEGER,
    purchase_price DECIMAL(10,2)
);

Constraint management

Use PostgreSQL's robust constraint system to maintain data integrity. This includes not just primary and foreign keys, but also CHECK constraints and unique constraints:

ALTER TABLE products
    ADD CONSTRAINT price_check
    CHECK (base_price >= 0);

ALTER TABLE product_purchases
    ADD CONSTRAINT quantity_check
    CHECK (quantity > 0);

Tools for Data Normalization in PostgreSQL

Having your data normalized is the goal for most PostgreSQL databases for the reasons we previously looked at, but getting there has its challenges. 

First, it is difficult and time-consuming to fully flesh out and model complex data relationships. Practicing applying normal forms to denormalized or partially normalized tables can help reduce the complexity of data normalization. Additionally, drawing entity-relationship diagrams (ERDs) to get a visual understanding of your data can greatly help. Most diagramming tools can create ERD visualizations; a popular free one is diagrams.net (draw.io). An example ERD for the logs tables from the normal forms section above would look like this:

[Image: ERD of the logs, tags, and log_tags tables]

A second challenge of normalization is balancing performance with maintainability. We cover this more in the section on denormalization below, but normalizing data leaves us with smaller, self-contained tables. To get meaning out of the data, we have to write queries with multiple joins to piece it back together, and each additional join adds cost to a query.
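For instance, reassembling logs with their tags from the normalized schema sketched earlier takes two joins:

SELECT l.id, l.level, string_agg(t.name, ', ') AS tags
FROM logs l
JOIN log_tags lt ON lt.log_id = l.id
JOIN tags t ON t.id = lt.tag_id
GROUP BY l.id, l.level;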

Denormalization: When and why 

Throughout this article, we've talked about the benefits of data normalization and how to achieve it, but there are scenarios where we would not want to normalize our database. Data normalization breaks data down into smaller, purpose-built tables to reduce duplication, which means our queries need joins to stitch the data back together. Joins aren't free and come at a performance cost. PostgreSQL has done a lot to optimize join operations, but at the end of the day, the only thing faster than a join is no join at all.

In cases where we need the utmost speed, we can sacrifice maintainability for query speed. This is done by denormalizing data. Just like how we apply the normal forms to normalize data, undoing the normal forms (or doing the opposite) will denormalize data. This looks like the following:

  • Storing multiple data values in a single column

  • Duplicating data between rows

  • Storing unrelated data in a single row

TimescaleDB uses some of these data denormalization techniques to speed up query performance and reduce disk size through compression.

TimescaleDB over PostgreSQL for data normalization

When it comes to picking the best tool for the job, for time-series data (or other challenging workloads, like real-time analytics, events, or even vector data) there is no better database than TimescaleDB. Using TimescaleDB simplifies the trade-offs between normalizing and denormalizing data (though you should still evaluate your data models!) through automated compression, continuous aggregates, and partitioning.

In TimescaleDB, compression combines multiple rows in a chunk into a single row, denormalizing the table by reversing first normal form. Compressing many rows into one reduces disk usage, and it also speeds up queries because no joins are needed to get all the information in a time range.
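As a sketch (the metrics table here is hypothetical), enabling this in TimescaleDB looks roughly like the following:

-- a normalized, time-partitioned table (hypertable)
CREATE TABLE metrics (
    time      TIMESTAMPTZ NOT NULL,
    device_id INTEGER,
    value     DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time');

-- enable columnar compression, grouping compressed batches by device
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

-- automatically compress chunks older than seven days
SELECT add_compression_policy('metrics', INTERVAL '7 days');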

Features like compression allow time-series workloads to scale while maintaining the simplicity of normalized data. In sum, you should choose TimescaleDB over PostgreSQL to normalize your data if your use case ticks any of these boxes:

  • You are working with time-series data or any schema where time is a significant dimension.

  • You require long-term storage and efficient compression for normalized time-series datasets.

  • You frequently perform time-based aggregations or joins across normalized tables.

  • You want built-in automation for partitioning, retention policies, and continuous aggregates.

  • You need to scale time-series workloads while maintaining the logical simplicity of normalized schemas.

Conclusion

Data normalization is a necessary step in making databases maintainable, cost-effective, and performant. In this article, we explored the process of normalizing a database through the normal forms, the pros and cons of normalization, and when you might want to denormalize a table. 

Knowing when to normalize and when to denormalize is an important skill when designing databases, balancing performance with maintainability. TimescaleDB strikes this balance by providing tools that maintain the simplicity and maintainability of normalization while delivering the performance benefits of denormalization.

You can self-host TimescaleDB or leave the worries of managing data infrastructure behind with Timescale Cloud. Get it for free (no credit card required) for 30 days on AWS, Azure, or GCP.

Read more:

  • Data Normalization Tips: How to Weave Together Public Datasets

  • Counter Analytics in PostgreSQL: Beyond Simple Data Denormalization

  • Guide to PostgreSQL Database Design 

  • Best Practices for Time-Series Data Modeling: Single or Multiple Partitioned Table(s) a.k.a. Hypertables
