---
title: "AI's Physical Constraints: How AI Rewired the Data Center"
published: 2026-07-02T15:34:34.000-04:00
updated: 2026-07-02T15:34:34.000-04:00
excerpt: "Why AI capacity stopped behaving like elastic compute and started depending on physical infrastructure, power, and place."
tags: Thought Leadership, AI
authors: Hien Phan, Noah Hein, Matty Stratton
---

> **TimescaleDB is now Tiger Data.**

For most of the cloud era, a server rack was a five-to-twenty-kilowatt object. You could fill a room with them, move air across the front, and the building stayed an ordinary building. A single current AI rack, NVIDIA's GB300 NVL72, draws about 132 to 140 kilowatts, with the GPUs alone accounting for more than a hundred. That is close to an order of magnitude more power in the same floor space as those old racks, and it lands as heat in the same small volume. Past roughly a hundred kilowatts per rack, air can no longer carry the heat out, and the rack has to be plumbed for liquid. The compute got denser and the building changed with it.

This pattern repeats all the way out to the grid. For about fifteen years, getting more computing power felt like turning a dial. You needed more, you asked for more, and a few seconds later it was there. Spin up a hundred servers for a traffic spike, spin them back down when it passes. Capacity behaved like something continuous, instant, and reversible, a knob you turned rather than a thing you built. A generation of software was designed on that assumption, and it held, because for ordinary workloads the power and hardware involved were small against what the world could supply.

That has changed. Across AI infrastructure projects, the same moment now repeats. A team asks for more, and the answer comes back no. Not "no, that costs more," which everyone understands, but a harder no. "No, those GPUs are not available this quarter." "No, that region has no more power, and will not for years." You can order a GPU in a day; the date that a few hundred megawatts arrives at a site can be four or five years out, and no amount of money moves it sooner. The request that used to be a billing question has become a physical one.

Anyone who has designed, built, or run a data center knows the physical layer was always there. Those who built these facilities sized the transformers, ordered the switchgear, planned the cooling, and waited on the utility. What is new is the scale at which AI hits the physical layer. The data centers going up for AI are a different class of build: denser, hotter, hungrier, and more tightly coupled to the grid than the ones that came before.

Data centers always needed chips, memory, cooling, power, and water. Most cloud workloads before the AI surge kept those requirements in a range the existing build could absorb. AI pushes them past the thresholds where the old assumptions hold. It does not create a new kind of physics. It removes the buffer that made the physics easy to ignore.

Each of these limits has been written about on its own. What gets missed is how they _connect_, and why AI makes them arrive together. AI scaling moves through a physical dependency chain: more accelerators require scarce chip packaging and memory; more memory and compute concentrate heat; concentrated heat changes the rack and the building; the building then needs power the grid may take years to deliver; at that scale, the power itself may have to be buffered inside the facility; and the cooling choices made along the way determine where water becomes a problem. The limits arrive in a predictable order, and the order is the story:

1.  **GPUs.** The first visible shortage is accelerators, the GPUs that do the AI computation, but the real bottleneck sits _around_ the chip, not _in_ it.
2.  **Memory.** The accelerator depends on high-bandwidth memory, which pulls on the same finite wafer base as ordinary memory.
3.  **Cooling.** More compute and memory in the same space means more heat in the same rack, past what air can carry.
4.  **Power and time.** Liquid cooling moves heat, but every watt still has to come from the grid. First you wait for power to arrive; then, at AI scale, you may need to buffer the workload's own power swings.
5.  **Water.** Not the national catastrophe the headlines suggest, but a local siting constraint shaped by cooling design.

Let's start with the one everyone already knows.

## GPUs: The Bottleneck Is Not the Chip

The first wall everyone notices is GPUs. You cannot get them, you cannot get enough, or the price to rent them has climbed since last year. The figures are not subtle. H100 rental prices rose roughly 40 percent off their late-2025 lows in a matter of months, [from about $1.70 to $2.35 per GPU-hour on one-year contracts](https://newsletter.semianalysis.com/p/the-great-gpu-shortage-rental-capacity) between October 2025 and March 2026, and on-demand capacity is effectively sold out across GPU types. The pressure reaches the workstation end too. In June 2026 NVIDIA listed its [RTX Pro 6000 Blackwell at $13,250](https://www.tomshardware.com/pc-components/gpus/nvidia-raises-rtx-pro-6000-blackwell-gpu-pricing-to-usd13-250-55-percent-increase-over-msrp-in-a-years-time), a 55 percent jump over the $8,565 launch price a year earlier, and the reason it gave was the 96 gigabytes of memory on the card in a market where memory is scarce.
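
The percentages in that paragraph follow from the quoted prices. Here is a quick check that uses only the figures cited above:

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100

h100 = pct_change(1.70, 2.35)    # H100 one-year-contract rental, USD per GPU-hour
rtx = pct_change(8_565, 13_250)  # RTX Pro 6000 Blackwell, launch vs. June 2026 list price, USD

print(f"H100 rental:  +{h100:.1f}%")
print(f"RTX Pro 6000: +{rtx:.1f}%")
```

That works out to about 38 percent for the H100 rental move and about 55 percent for the RTX Pro 6000. Both match the rounded figures in the text.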

The obvious reading is that NVIDIA cannot make enough chips, but that is not where the bottleneck lives. A modern AI accelerator is not one chip but a package: the processor die, stacks of high-bandwidth memory (HBM), and an interposer that wires them together at enormous bandwidth. A faster processor does not help if it cannot be packaged with memory, and advanced packaging capacity, specifically TSMC's chip-on-wafer-on-substrate (CoWoS) lines, is finite. The same [analysis that tracked the rental spike](https://newsletter.semianalysis.com/p/the-great-gpu-shortage-rental-capacity) named CoWoS packaging and HBM, not the processor, as the choke points. Those are the steps that gate the lead times stretching GPU orders toward a year. The chip is not the scarce thing; what surrounds it is.

So the GPU shortage is really a packaging and memory shortage wearing a GPU label. Packaging capacity can expand, but it expands on a manufacturing clock. Memory is the harder half, and the reasons it stays scarce are the next wall.

## Memory: Why the Price Will Not Come Down

When a component spikes in price, the reflex is to wait. Shortages end, factories ramp, the price comes back down. That reflex is wrong here, and the reason is structural, not cyclical.

The memory AI systems need comes in two kinds. HBM is the fast, expensive memory stacked right next to the accelerator, where the model's working data lives during computation. Conventional dynamic random-access memory (DRAM), such as DDR5, is the ordinary system memory around it. HBM is itself built from stacked DRAM dies, so the two pull on the same finite wafer base. The binding shortage is HBM, and the pressure spills into conventional DRAM. SK Hynix, the leading maker of HBM, [locked up its 2026 HBM output](https://www.notebookcheck.net/SK-hynix-sells-out-its-DRAM-NAND-and-HBM-chip-supply-to-Nvidia-through-2026-as-AI-demand-outpaces-Samsung-and-Micron-s-capacity.1151402.0.html) well ahead of the year, and Micron has likewise reported its 2026 HBM sold out. Memory has gone from a rounding error in a machine's bill of materials to the single largest driver of the price on a high-end card.

So why not just make more? The vast majority of the world's DRAM comes from [three companies](https://www.techtimes.com/articles/318052/20260609/samsung-leads-dram-market-share-386-sk-hynix-trails-revenue-tops-profit-margins.htm): Samsung and SK Hynix in South Korea, and Micron in the United States. When all three make the same allocation call at once, that is effectively the global supply. There is no fourth maker at comparable scale waiting to undercut them.

Adding supply means building a fabrication plant, and a fab is not a factory you stand up in a quarter. It takes years of cleanroom construction, tool installation, and qualification before a single sellable chip comes out. HBM makes the squeeze worse, not better. Each gigabyte of it uses about three times the wafer capacity of DDR5, today's standard volume DRAM, and HBM already consumes [roughly a quarter of all DRAM wafer output](https://tech-insider.org/memory-chip-shortage-2026-ai-consumer-electronics/). Every wafer redirected to the scarce thing makes the common thing scarcer still.
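
A back-of-the-envelope model makes that trade-off concrete. This sketch assumes a fixed pool of wafers and counts one DDR5 wafer as one unit of capacity. It takes the two ratios cited above as exact: about three times the wafer use per gigabyte, and about a quarter of wafers going to HBM. Real yields and die densities vary, so treat the output as illustrative.

```python
def dram_output(hbm_wafer_share: float, hbm_wafer_penalty: float = 3.0):
    """Split a fixed wafer pool into HBM and DDR5 output.

    Units are DDR5-wafer equivalents: one wafer of DDR5 yields 1.0 unit of
    capacity, and HBM needs `hbm_wafer_penalty` times the wafer area per GB.
    Returns (hbm_bits, ddr5_bits, total_bits) for a pool of 1.0 wafers.
    """
    hbm_bits = hbm_wafer_share / hbm_wafer_penalty
    ddr5_bits = 1.0 - hbm_wafer_share
    return hbm_bits, ddr5_bits, hbm_bits + ddr5_bits

hbm, ddr5, total = dram_output(0.25)  # about a quarter of wafers to HBM, as cited above
print(f"HBM: 25% of wafers, {hbm / total:.0%} of bits")
print(f"Total output vs. an all-DDR5 wafer pool: {total:.0%}")
```

Under those assumptions, a quarter of the wafers yields only about a tenth of the bits. Total output falls to roughly 83 percent of what the same wafers would produce as DDR5. Each further percentage point of wafers moved to HBM removes about two-thirds of a point of total output.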

There is a second reason, and it is a choice rather than a constraint. The makers are steering wafers toward high-margin AI and enterprise memory and away from everything else, because that is where the money is. A new fab does not automatically reverse that, because the same margin logic governs what its owner chooses to build. IDC, the market-research firm, [projects 2026 DRAM supply growth of only about 16 percent year over year](https://tech-insider.org/memory-chip-shortage-2026-ai-consumer-electronics/), well below the 20 to 30 percent that was historically normal, even as demand for the AI variety grows far faster. Industry leaders are saying so directly. Intel's chief executive, Lip-Bu Tan, relayed in February 2026 what two of the key memory makers had told him: there is [no relief until 2028](https://www.bloomberg.com/news/articles/2026-02-03/intel-ceo-says-there-s-no-relief-on-memory-shortage-until-2028).

The makers could produce more, but new capacity takes years, only three companies produce at global scale, and AI memory eats about three times the wafers per gigabyte. On top of that, the makers earn far more selling to AI than to you. Part of the shortage is physics: the supply is simply years out. Part of it is choice: the capacity that does exist is being pointed at AI, not at you. You have not been priced out for a quarter. **You have been outbid.**

## Cooling: Why the Rack Changed Shape

GPUs and memory are still things you buy, even when you have to wait. The next wall is different. The same density that makes AI systems powerful, more transistors on the die and more memory stacked beside it, becomes heat the moment the hardware enters a building.

Every watt of power a machine draws comes back out as heat. Whatever goes in has to come out, or the machine cooks itself. It sounds too simple to matter, and it is the whole reason the rack changed shape.

For most of the history of computing, taking the heat out meant moving air. Servers in a rack, cool air through the room, and that was enough. At the rack densities of the CPU era, even at the top of the range, the airflow was manageable enough that the building stayed recognizably the same kind of building: rows of racks, cold aisles, hot aisles, chillers, and fans. Anyone who built those rooms knows the envelope.

AI did not make heat new. It made heat _dense_.

All the capability crammed into the die and the memory turns into heat in the same small volume. The [rack densities from the opening](https://www.nvidia.com/en-us/data-center/gb300-nvl72/), roughly an order of magnitude above CPU-era racks, are really heat-removal figures: every one of those kilowatts has to be carried back out. The next generation on the roadmap, the Vera Rubin systems, is projected to push per-rack density several times higher again, and [cooling vendors are already designing for the increase](https://www.tomshardware.com/pc-components/cooling/cooling-system-for-a-single-nvidia-blackwell-ultra-nvl72-rack-costs-a-staggering-usd50-000-set-to-increase-to-usd56-000-with-next-generation-nvl144-racks). At these densities, air cooling stops being practical. Bigger fans do not solve the volume problem.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2026/07/diagram-2.png)

__Figure 1: Rack power density by generation. Past roughly 100 kilowatts, air cooling stops working and liquid becomes mandatory. The Rubin Ultra figure is a roadmap projection.__

So the machine changed shape. A given volume of air can carry only so much heat away before it has to move faster than is practical, while water carries roughly three to four thousand times as much heat per unit volume as air. Past roughly a hundred kilowatts per rack, that gap stops being an efficiency question and becomes a hard limit, and the model flips: from moving air through a room to carrying heat away in liquid piped directly to the chip. The newest systems do not offer an air-cooled option at all. The GB300 NVL72 is [fully liquid-cooled](https://www.nvidia.com/en-us/data-center/gb300-nvl72/). The rack is no longer just electrical equipment. It now has plumbing.
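
The heat-capacity gap can be put into numbers with the standard sensible-heat relation: flow = P / (ρ · c_p · ΔT). The sketch below uses textbook room-temperature properties for air and water and assumes a 10 K coolant temperature rise across a 132-kilowatt rack. The property values and the ΔT are illustrative assumptions, not NVIDIA specifications.

```python
# Approximate room-temperature properties (illustrative textbook values)
AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # J/(kg*K)
WATER_DENSITY = 997.0    # kg/m^3
WATER_CP = 4186.0        # J/(kg*K)

def volumetric_flow(power_w: float, density: float, cp: float, delta_t_k: float) -> float:
    """Volumetric flow (m^3/s) needed to carry power_w away with a delta_t_k temperature rise."""
    return power_w / (density * cp * delta_t_k)

rack_power = 132_000.0   # W, low end of the GB300 NVL72 figure cited above
delta_t = 10.0           # K, assumed coolant temperature rise

air = volumetric_flow(rack_power, AIR_DENSITY, AIR_CP, delta_t)
water = volumetric_flow(rack_power, WATER_DENSITY, WATER_CP, delta_t)

print(f"Air:   {air:.1f} m^3/s")
print(f"Water: {water * 1000:.2f} L/s")
print(f"Ratio: {air / water:,.0f}x")
```

At those assumed values, the rack needs about 11 cubic meters of air per second, or roughly 23,000 cubic feet per minute. The same heat load needs only about 3 liters of water per second. The ratio, roughly 3,500 to 1, is the "three to four thousand times" figure in the text.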

Capital helps with procurement and retrofits. It does not change the thermal limits of air. This wall is geometry and thermodynamics. And it has a consequence even experienced operators feel: the hardware no longer fits in most existing buildings. A data center built for air, even one finished a couple of years ago, often cannot host these racks without substantial rebuilding: liquid distribution and its plumbing, plus floors rated for heavier loads. You cannot simply drop the latest GPUs into the footprint you already have. For an operator who has spent a career optimizing airflow, that is the moment it becomes clear this is a different kind of building.

Liquid cooling changes how heat leaves the chip. It does not change where the energy comes from. Every watt still starts at the grid, and that is where the slowest constraint appears.

## Power and Time: The Wall Underneath the Walls

What limits AI in the end is not chips, and it is not cooling. It is electricity, and specifically it is time. You can buy a GPU in a day. You cannot buy the specific date on which a few hundred megawatts will be delivered to a site. That is set by a physical system that moves on a timescale of _years_. Money can fund equipment and alternatives, but it does not make shared grid capacity appear on software time.

The building can be designed and built on one clock; the grid upgrades that let it draw full power often run on a longer one. Before a site can draw full power, the utility has to study and approve the load, the transmission system has to support it, and the substations and lines that feed it have to exist. The industry calls this **time-to-power**: the interval between choosing a site and being able to draw the load you planned around. For large AI sites, that interval can define the project. The upstream grid work resists money in a way the other constraints do not. Transmission lines and substations are shared infrastructure that serves everyone on the grid, so a new load cannot pay its way past wires that have not been built. You can finish the building and then wait to turn it all the way on.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2026/07/diagram-3.png)

__Figure 2: The time-to-power gap. You can buy a GPU in a day, but energizing a site takes years, and the build can finish while the power wait continues.__

There is another clock running alongside the approval queue: the equipment itself. Even once a project is cleared to connect, the high-voltage transformers and switchgear that tie it to the grid can be in short supply. Lead times for large power transformers have stretched from roughly two years before 2020 to as long as five years now, and industry estimates suggest a [meaningful share of planned 2026 data-center capacity could slip](https://finance.yahoo.com/sectors/technology/articles/half-planned-us-data-center-150928890.html) for want of power equipment and grid connections. Electrical gear is far from the biggest line item in a data center, yet it can still decide when the building turns on. Any operator who has waited on a transformer order will recognize that.
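
Running these clocks in parallel makes time-to-power a critical-path calculation: a site reaches full power only when the slowest prerequisite finishes. The sketch below uses hypothetical durations chosen to fall inside the ranges cited above. It assumes a two-year build, a four-and-a-half-year grid track, and a four-year transformer lead time. None of these numbers describe a real project.

```python
from datetime import date

def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, pinning the day to the 1st for simplicity."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

# Hypothetical project: every track starts when the site is chosen.
site_chosen = date(2026, 1, 1)
tracks_months = {
    "building construction": 24,              # assumed
    "utility study and grid upgrades": 54,    # assumed, within the 4-5 year range
    "large power transformer delivery": 48,   # assumed, below the 5-year high end
}

finish = {name: add_months(site_chosen, m) for name, m in tracks_months.items()}
energized = max(finish.values())         # full power needs every track complete
critical = max(finish, key=finish.get)   # the track that sets the date

for name, d in finish.items():
    print(f"{name:34s} {d.isoformat()}")
print(f"Full power available: {energized.isoformat()} (gated by {critical})")
idle_months = tracks_months[critical] - tracks_months["building construction"]
print(f"Finished building waits ~{idle_months} months for power")
```

Under these assumptions, the building is finished in early 2028 but cannot draw full power until mid-2030, two and a half years later. Shortening construction changes nothing here, because only the slowest track sets the date.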

The grid backlog around these projects is large. At the end of 2025, more than 2,000 gigawatts of generation and storage capacity were waiting in line to connect to the US grid, roughly twice the entire installed US power fleet. More waiting to connect than currently exists. That queue is not the same thing as a data-center load request, but it shows the condition of the shared infrastructure every large new load depends on: the wires, substations, studies, and upgrades are all moving on a multi-year clock. [Lawrence Berkeley National Laboratory](https://emp.lbl.gov/publications/queued-2025-edition-characteristics), which tracks those queues, finds the median time from request to commercial operation has roughly doubled, from under two years for projects built in the early 2000s to four to five years now.

This is not abstract, and it is not only an American problem. Ireland is the cleanest example. Dublin had become one of Europe's great data center hubs until the grid could not keep up, and in 2021 the grid operator EirGrid and the Commission for Regulation of Utilities imposed what amounted to a [moratorium on new connections in the Dublin area](https://www.iiea.com/blog/data-centres-in-ireland-the-state-of-play). One Amazon project and two Microsoft projects were among those [turned away and relocated to London, Frankfurt, and Madrid](https://www.datacenterdynamics.com/en/news/microsoft-aws-equinix-join-list-of-companies-pausing-data-center-projects-in-dublin/). By 2024, data centers were drawing [around 21 percent of all the electricity in the country](https://www.iiea.com/blog/data-centres-in-ireland-the-state-of-play). The moratorium [eased only in December 2025](https://www.bloomberg.com/news/articles/2025-12-12/ireland-set-to-end-moratorium-on-new-power-links-to-data-centers), and the new terms show where things are headed: a new facility now has to bring its own power generation or storage rather than simply draw from the grid.

The power industry itself is reorganizing around this demand. In May 2026, NextEra Energy announced a roughly [$67 billion all-stock plan to acquire Dominion Energy](https://www.cnbc.com/2026/05/18/nextera-nee-dominion-energy-d-data-center-ai.html), the utility behind northern Virginia's data center corridor.

That is the first half of the power story: getting electricity to the site. The second half starts once it arrives, and it is where AI looks least like the loads the grid grew up serving. A large training run is synchronized. Tens of thousands of accelerators compute, pause together to exchange results, and compute again. Power draw follows that loop. A single H100-class GPU draws far less at idle than under compute, so when tens of thousands switch states together, the facility's load can swing by tens of megawatts in seconds or less. Meta reported [swings around 30 megawatts on a 24,000-GPU cluster](https://newsletter.semianalysis.com/p/ai-training-load-fluctuations-at-gigawatt-scale-risk-of-power-grid-blackout) training Llama 3.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2026/07/diagram.png)

__Figure 3: A synchronized training cluster swings between compute and pause many times a second. The grid is built to follow the smooth aggregate of many independent users, not one correlated load moving in lockstep.__

The grid was built around load diversity, where thousands of independent homes and businesses average into something smooth and predictable. A synchronized training cluster is neither diverse nor smooth, and that is the part that is new even to people who have planned power for a living. The stability problem is broader than training-loop swings: large data-center loads can also behave unexpectedly during grid disturbances. In July 2024, a transmission fault in Northern Virginia caused [roughly 1,500 megawatts of data-center load to disconnect itself within 82 seconds](https://www.nerc.com/globalassets/our-work/reports/event-reports/incident_review_large_load_loss.pdf). The North American Electric Reliability Corporation (NERC) sets and enforces reliability standards for the North American bulk power system. It said the system had never before seen a load loss of that magnitude.
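
The load-diversity argument is really a statement about correlated versus independent sums, and a toy simulation shows the difference. The sketch gives every GPU the same square-wave compute/pause cycle and compares two cases. In the first, all GPUs run in phase, as in a synchronized training job. In the second, each GPU starts at a random phase, standing in for many independent users. The per-GPU swing is the 30 MW Meta figure divided across 24,000 GPUs, so it represents each GPU's share of the facility swing rather than the chip alone. The cycle length and step count are arbitrary assumptions.

```python
import random
from collections import Counter

N_GPUS = 24_000            # cluster size in the Meta example above
SWING_W = 30e6 / N_GPUS    # 1,250 W: the reported 30 MW swing divided evenly per GPU
PERIOD = 10                # time steps per compute/pause cycle (assumed)
BUSY = 5                   # steps of each cycle spent computing (assumed)
STEPS = 200                # 20 full cycles

def aggregate_mw(phases):
    """Swing-related load (MW) at each step, given each GPU's phase offset in its cycle."""
    counts = Counter(phases)  # group GPUs that share a phase offset
    series = []
    for t in range(STEPS):
        busy = sum(n for p, n in counts.items() if (t + p) % PERIOD < BUSY)
        series.append(busy * SWING_W / 1e6)
    return series

def peak_to_trough(series):
    """Range between the highest and lowest value in a series."""
    return max(series) - min(series)

rng = random.Random(42)
synced = aggregate_mw([0] * N_GPUS)                                     # one job in lockstep
diverse = aggregate_mw([rng.randrange(PERIOD) for _ in range(N_GPUS)])  # independent phases

print(f"Synchronized: {peak_to_trough(synced):.1f} MW peak to trough")
print(f"Independent:  {peak_to_trough(diverse):.2f} MW peak to trough")
```

The synchronized case swings the full 30 MW every cycle. The same hardware at independent phases averages to a nearly flat 15 MW, with a ripple that is a small fraction of the synchronized swing. Correlated loads add their swings together, while independent ones mostly cancel. That cancellation is the smoothing the grid was planned around.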

The fix moves on-site. xAI's Colossus cluster in Memphis installed [about 150 megawatts of grid-scale battery storage](https://www.datacenterdynamics.com/en/news/xai-deploys-168-tesla-megapacks-to-power-its-colossus-supercomputer-in-memphis/) alongside its power infrastructure. The point of that storage is not how much energy it holds but how fast it can absorb and deliver power. A small, fast store placed in front of a slower supply is a cache. Here the slow backing store is the grid, and the fast store is local batteries and power electronics. Batteries are no longer only backup equipment. In these designs, they can become part of workload control. At AI scale, power stops being merely an input to the computer. It becomes part of the computer's design.
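
The cache analogy can be sketched as a simple control split: the grid supplies a slowly moving average of the load, and the battery covers the difference. The sketch assumes a square-wave training load of 60 MW computing and 30 MW paused on a 10-second cycle. It uses an exponential moving average as the smoothing rule. These numbers and that policy are illustrative stand-ins, not how xAI or any vendor actually controls its batteries.

```python
COMPUTE_MW, PAUSE_MW = 60.0, 30.0  # assumed facility load in each phase
PERIOD_S, BUSY_S = 10, 5           # assumed cycle: 5 s computing, 5 s paused
ALPHA = 0.01                       # grid share follows load with a ~100 s time constant (assumed)
STEPS = 600                        # ten minutes at 1 s resolution
WARMUP = 300                       # skip the first five minutes when measuring ripple

def simulate():
    """Split a square-wave load between a slow grid supply and a fast battery."""
    grid_mw = (COMPUTE_MW + PAUSE_MW) / 2  # start the slow supply at the average load
    energy_mj = 0.0                        # battery energy relative to its starting charge
    load, grid, battery, energy = [], [], [], []
    for t in range(STEPS):
        load_mw = COMPUTE_MW if t % PERIOD_S < BUSY_S else PAUSE_MW
        grid_mw += ALPHA * (load_mw - grid_mw)  # slow backing store: a lagging average
        battery_mw = load_mw - grid_mw          # fast store covers the gap (+ = discharging)
        energy_mj -= battery_mw * 1.0           # 1 MW for 1 s is 1 MJ
        load.append(load_mw)
        grid.append(grid_mw)
        battery.append(battery_mw)
        energy.append(energy_mj)
    return load, grid, battery, energy

def swing(series):
    """Peak-to-trough range of a series."""
    return max(series) - min(series)

load, grid, battery, energy = simulate()
print(f"Load swing:     {swing(load[WARMUP:]):.1f} MW")
print(f"Grid swing:     {swing(grid[WARMUP:]):.2f} MW")
print(f"Battery power:  {min(battery):.1f} to {max(battery):.1f} MW")
print(f"Battery energy: {swing(energy) / 3600:.3f} MWh peak to trough")
```

With these assumed numbers, the grid sees a ripple of under a megawatt, while the battery swings about 15 MW in each direction. Yet its stored energy moves by only a few hundredths of a megawatt-hour. That is the paragraph's point in miniature: the buffer is sized for how fast it responds, not for how much it holds.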

And a design has to be operated. Once batteries, power electronics, cooling loops, and GPUs act as a single system, someone has to watch them as one: how power draw tracks compute, how the batteries answer a training swing, how heat follows the load. Those measurements arrive every second, from equipment that used to belong to three different teams. Read after the fact, they tell you what broke. Read live, they keep the loop stable.

Power is where the earlier constraints converge. The GPU you could not get, the region that was full, the building that needed rebuilding, the batteries now sitting between the workload and the grid: each one traces back to the same place, a power system that has to be built and buffered, on the grid's schedule, not the software team's.

## Water: In Proportion

Power is the hardest wall because it sets the clock and, through on-site batteries and power electronics, becomes part of the machine's own design. Water is different. It sits downstream of cooling design and geography, which makes it more local, more variable, and more solvable than the public debate suggests. AI makes the siting choice more visible because the facilities are larger and denser, but the water problem still depends on design. That matters because water draws the most public attention of any of these constraints, and some of the least accurate reporting.

One distinction matters before any number makes sense: water withdrawn is not water consumed. Withdrawal is what a facility takes in; consumption is what it uses up, mostly through evaporation. A facility can withdraw a large volume and return most of it, or consume nearly all of what it takes, depending entirely on the cooling design.

Nationally, the figure is modest. As of 2021, all US data centers combined accounted for [roughly 449 million gallons of water a day](https://ketos.co/data-centers-water-usage-myths), about three to four tenths of one percent of total US water withdrawals, far below agriculture or power generation. The headline framing of data centers draining the country's water is not supported by the national figures.

The real issue is local. [Roughly 40 percent of US data centers sit in areas of high or extreme water stress](https://ketos.co/data-centers-water-usage-myths), so even a small national share can land hard on a particular community. Stated that way, it is a siting problem, real and solvable, rather than an indictment of the technology.

The cooling design is what sets the consumption. On-site consumption ranges from nearly nothing, for an air-cooled or closed-loop facility, to as much as [70 to 80 percent of what was withdrawn](https://ketos.co/data-centers-water-usage-myths), for an open evaporative one. A single large evaporative facility can use something like [five million gallons a day, comparable to a town of fifty thousand people](https://www.brookings.edu/articles/ai-data-centers-and-water/). The same facility, built closed-loop, can use almost none. The high number and the low number describe the same building with two different cooling choices.
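
Once a consumption fraction is chosen, the withdrawal-versus-consumption split is simple arithmetic. The sketch below applies the 70 to 80 percent range cited above to the five-million-gallon figure. Treating that figure as the daily withdrawal is an assumption, since the comparison does not say which quantity it measures.

```python
DAILY_WITHDRAWAL_GAL = 5_000_000  # large evaporative facility, treated as withdrawal (assumption)
TOWN_POPULATION = 50_000          # the comparison town cited above

def water_balance(withdrawn_gal: float, consumed_fraction: float) -> tuple[float, float]:
    """Split a withdrawal into water consumed (mostly evaporated) and water returned."""
    consumed = withdrawn_gal * consumed_fraction
    return consumed, withdrawn_gal - consumed

# Implied per-person use behind the town comparison
per_resident = DAILY_WITHDRAWAL_GAL / TOWN_POPULATION

# The 70-80 percent consumption range for open evaporative cooling
for fraction in (0.70, 0.80):
    consumed, returned = water_balance(DAILY_WITHDRAWAL_GAL, fraction)
    print(f"{fraction:.0%} consumed: {consumed:,.0f} gal lost, {returned:,.0f} gal returned")
print(f"Town comparison implies about {per_resident:.0f} gal per resident per day")
```

On that reading, 3.5 to 4 million gallons a day would evaporate and 1 to 1.5 million would be returned. The town comparison implies roughly 100 gallons per resident per day.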

This is the constraint the industry is most actively engineering away. Closed-loop systems fill once and recirculate rather than evaporate. The same shift to liquid and direct-to-chip cooling described earlier can cut water needs dramatically, [by up to 95 percent in some designs](https://www.eesi.org/articles/view/data-centers-and-water-consumption), and immersion cooling can eliminate evaporative water use altogether. Reclaimed wastewater is increasingly used in place of drinking water. Of the five walls in this piece, water is the one where the engineering response is furthest along, which is exactly why it deserves to be described accurately rather than dramatically.

## The Reserve Ran Out

The five walls are not five problems. They are one fact seen from five _angles_. None of this is new physics: the power was always physical, the heat was always real, capacity always took years to build. What changed is that the cloud era ran on a deep reserve of capacity built ahead of demand, and as long as that reserve lasted, the limits underneath stayed out of view. You turned a dial and the reserve answered. AI has drawn that reserve down, and at a scale the old infrastructure was never built to carry, so the limits are back in view all at once.

That is why the data centers rising for AI are a different class of build, and why the people who built the last generation look at the numbers and recognize that the rules they worked under have moved. The accelerator depends on packaging and memory, the rack depends on liquid cooling, and the building depends on power. At this scale, the power itself needs a buffer. The site depends on grid capacity, water choices, and time. The next time a capacity question lands on your desk, ask where it will physically live and how long the power takes, before you ask what it costs. The cloud used to be an abstraction. It has an address now.

The five walls are physical. Operating inside them is not. Once the facility and the computer are one coupled system, running it means reading it as one: GPU power draw, cooling response, battery state, and grid posture, measured together and fast enough to act while the numbers are still true. That is not a facilities dashboard on a five-minute refresh. It is a [live, correlated, high-frequency record](https://www.tigerdata.com/blog/tiger-lake-a-new-architecture-for-real-time-analytical-systems-and-agents) of a machine that now runs from the silicon to the substation. Capturing that record, and [querying it before it goes stale](https://www.tigerdata.com/blog/real-time-analytics-for-time-series-continuous-aggregates), is its own problem.

## Get Started

Operational telemetry only helps if you can query it while it is still true, at the rate it arrives. That is the workload Tiger Data is built for: [time-series and event data on Postgres](https://www.tigerdata.com/learn/guide-to-postgresql-scaling), fresh and correct, [without splitting into a second system](https://www.tigerdata.com/blog/postgres-optimization-treadmill). [Start a free Tiger Cloud trial](https://console.cloud.timescale.com/signup). Running on-premises, at the edge, or air-gapped? [TimescaleDB Enterprise](https://www.tigerdata.com/newsroom/tiger-data-launches-timescaledb-enterprise-a-self-managed-time-series-database-built-for-on-premises-and-edge-deployment) is built for those deployments and is taking design partners.