How we calculate AI energy & water use
This calculator estimates the inference footprint of AI usage — the energy of running your queries. It does not include the one-time cost of training a model or manufacturing hardware, because those are amortized over billions of queries with estimates too divergent to present honestly.
Key figures at a glance
All figures are estimates; energy per query carries a wide (~10×) range. Full sourcing is below.
- Typical text query: ~0.3 Wh (Google measured 0.24 Wh; Epoch AI ~0.3 Wh, 2025)
- Reasoning / long-context query: ~3.9 Wh and up
- Small / efficient model: ~0.05 Wh per query
- One generated image: ~3 Wh (frontier), less for smaller models — priced per image, not per token
- Water per text prompt: ~1–3 mL (datacenter cooling + electricity generation)
- Carbon: ~440 g CO₂e/kWh global average; ~348 g US grid preset (EPA eGRID2023)
- Scope: inference only — not model training or hardware manufacturing
Energy
We model text generation with an affine formula: a fixed per-query overhead plus a marginal cost per token. Output (generated) tokens are far more expensive than input (prompt) tokens, because autoregressive decoding produces tokens one at a time while the prompt is read in a single parallel pass. This is why summarizing a long document (large input, short output) costs less than writing a long essay (small input, long output).
Each model class is anchored to a published per-query energy estimate, then scaled by token count. Image generation is priced per image, not per token — a diffusion model has no tokens to scale by, and its cost is dominated by a roughly fixed number of denoising steps per image, so we use a flat per-image energy figure (which varies with resolution and step count, hence the wide range). We then multiply by datacenter overhead (PUE) — but only for figures that don't already include it, to avoid double-counting.
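The formula described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the `ModelClass` type, the function names, and the three per-token coefficients in `FRONTIER` are hypothetical, chosen only so that a query with about 500 output tokens lands near 0.3 Wh of compute. The PUE of 1.15 and the ~3 Wh per-image figure come from this page. Whether the per-image figure already includes datacenter overhead is left as a parameter because it depends on the source.

```python
from dataclasses import dataclass

PUE = 1.15  # hyperscale datacenter overhead used in this methodology


@dataclass(frozen=True)
class ModelClass:
    """Per-class coefficients for the affine text-energy model (all in Wh)."""
    overhead_wh: float          # fixed cost per query
    wh_per_input_token: float   # marginal cost of reading one prompt token
    wh_per_output_token: float  # marginal cost of generating one token
    includes_pue: bool          # True if the anchor figure already counts overhead


# Hypothetical coefficients, picked only so that ~500 output tokens lands
# near 0.3 Wh of compute. They are not the calculator's real values.
FRONTIER = ModelClass(0.05, 0.00001, 0.0005, includes_pue=False)


def text_query_wh(model: ModelClass, input_tokens: int, output_tokens: int,
                  pue: float = PUE) -> float:
    """Affine estimate: overhead + input cost + output cost, then PUE if needed."""
    compute = (model.overhead_wh
               + model.wh_per_input_token * input_tokens
               + model.wh_per_output_token * output_tokens)
    # Apply PUE only when the anchor figure excludes it, to avoid double-counting.
    return compute if model.includes_pue else compute * pue


def image_wh(images: int, wh_per_image: float = 3.0,
             includes_pue: bool = False, pue: float = PUE) -> float:
    """Flat per-image estimate; images are not scaled by token count."""
    total = images * wh_per_image
    return total if includes_pue else total * pue


# Long input with short output versus short input with long output.
summary = text_query_wh(FRONTIER, input_tokens=5000, output_tokens=200)
essay = text_query_wh(FRONTIER, input_tokens=50, output_tokens=2000)
print(f"summary: {summary:.3f} Wh, essay: {essay:.3f} Wh")
```

With any coefficients where output tokens cost much more than input tokens, the summary comes out cheaper than the essay, which is the asymmetry the text describes.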
For a frontier model, a typical query of roughly 500 output tokens lands near ~0.3 Wh of compute. That figure is grounded in 2025 production data: Google measured the median text prompt to its Gemini app at 0.24 Wh (August 2025), Epoch AI estimates a typical ChatGPT/GPT-4o query at ~0.3 Wh, and Oviedo et al. 2026 (Joule, arXiv:2509.20241) report a frontier median of 0.31 Wh (IQR 0.16–0.60). Reasoning and long-context queries run much higher (~3.9 Wh and up). An earlier and widely repeated ~3 Wh estimate is now considered roughly 10× too high for a typical short query.
Small/efficient models (7–13B class) are far cheaper, at roughly 0.05 Wh per inference. This figure is anchored to Luccioni et al., Power Hungry Processing (arXiv:2311.16863), which measured small models, and to ML.ENERGY's longitudinal per-token data.
- Frontier-model per-query figure: Google's measured inference disclosure (2025) and Epoch AI inference estimates.
- Small-model and image-generation figures: Luccioni et al., Power Hungry Processing (arXiv:2311.16863), and ML.ENERGY.
- Datacenter PUE ≈ 1.15 (AI/hyperscale): Google reports 1.09 (Q4 2025, Google efficiency) and AWS ~1.15. The broader industry average is higher (~1.54, Uptime Institute 2025); we use the hyperscale figure because AI inference runs in those efficient facilities.
Water
AI uses water two ways, following the Scope 1 / Scope 2 split first set out in Li et al., Making AI Less Thirsty (arXiv:2304.03271) — still the only study that decomposes on-site and off-site water on a per-energy basis:
- Scope 1 — on-site cooling (water usage effectiveness, WUE): water evaporated cooling the datacenter, about 1.15 mL/Wh — Google's measured 2024 fleet average (arXiv:2508.15734), which grounds this term in real operational data rather than a 2023 estimate.
- Scope 2 — off-site electricity (energy water intensity factor, EWIF): water consumed generating the electricity the datacenter draws, about 3.1 mL/Wh on the US-average grid (Li et al.).
We combine both into a single intensity of roughly 4 mL per Wh (the two terms sum to 4.25), measured as water consumption (water evaporated or otherwise lost) rather than withdrawal (water taken and largely returned, which is ~10× larger for grid power and not the right metric here). The range is wide and regional: as low as ~0.5 mL/Wh for a best-case hyperscaler (Microsoft reports WUE ~0.30, with zero-water cooling designs) on a low-water grid, and up to ~9 mL/Wh in water-stressed regions such as Arizona in summer, where on-site cooling alone can reach that level.
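As a rough sketch of the two-term water model above, the snippet below multiplies energy by the on-site (WUE) and off-site (EWIF) intensities separately and then sums them. The function and constant names are hypothetical. The two intensities are the ones quoted on this page; note that they sum to 4.25 mL/Wh, slightly above the rounded headline intensity. The sketch also assumes the same energy figure is used for both terms, which may differ from how the calculator treats PUE.

```python
WUE_ML_PER_WH = 1.15   # Scope 1: on-site cooling water (Google 2024 fleet average)
EWIF_ML_PER_WH = 3.1   # Scope 2: off-site electricity water, US-average grid (Li et al.)


def water_ml(energy_wh: float,
             wue: float = WUE_ML_PER_WH,
             ewif: float = EWIF_ML_PER_WH) -> dict[str, float]:
    """Split water consumption into on-site and off-site parts, in millilitres."""
    on_site = energy_wh * wue
    off_site = energy_wh * ewif
    return {"on_site_ml": on_site, "off_site_ml": off_site,
            "total_ml": on_site + off_site}


# A typical ~0.3 Wh text query on the default intensities.
typical = water_ml(0.3)
print({name: round(value, 3) for name, value in typical.items()})

# Best-case and worst-case regional intensities can be passed explicitly.
worst_case_on_site = water_ml(0.3, wue=9.0, ewif=0.0)
```

Keeping the two terms separate makes it easy to swap in a regional WUE or a different grid's EWIF without touching the other.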
Two measured reality checks bound this from either side: Google reports a median 0.26 mL per text prompt (on-site cooling only; it excludes electricity-generation water), while Mistral's audited life-cycle analysis puts a single response at ~45 mL (full lifecycle). Applied to a typical ~0.3 Wh query, our per-Wh figure gives roughly 1–3 mL per prompt, which sits between these because it counts on-site and off-site water but not the broader lifecycle.
One caveat on direction: Li et al. project AI's total water volume rising sharply by 2027 (4.2–6.6 billion m³ withdrawal) — but that is driven by demand growth, not by the per-query intensity rising. The figure above is per-Wh intensity, which is not projected to increase.
Carbon
Carbon = energy × grid carbon intensity. The calculator has a Global / US grid-region toggle. It defaults to the global average (~440 g CO₂e/kWh, which falls between IEA Electricity 2026 at 435 g and Ember 2026 at 458 g, both for full-year 2025) because a global-audience tool shouldn't be locked to one region, and the US grid is cleaner than the world average. Switching to the US preset uses ~348 g CO₂e/kWh (EPA eGRID2023, the latest release). Your actual grid may be far cleaner or dirtier, so pick the region that fits.
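The carbon step is a single multiplication, but the unit change from Wh to kWh is easy to get wrong, so a short sketch may help. The function name and the region keys are hypothetical; the two intensities are the presets quoted above.

```python
# Grid carbon intensity presets in g CO2e per kWh, as quoted in this section.
GRID_G_PER_KWH = {"global": 440.0, "us": 348.0}


def carbon_g(energy_wh: float, region: str = "global") -> float:
    """Return grams of CO2e for a given energy in Wh on the chosen grid preset."""
    if region not in GRID_G_PER_KWH:
        raise ValueError(f"unknown grid region: {region!r}")
    # Intensities are per kWh, so convert Wh to kWh first.
    return energy_wh / 1000.0 * GRID_G_PER_KWH[region]


# A typical ~0.3 Wh query under each preset.
print(f"global: {carbon_g(0.3):.4f} g, US: {carbon_g(0.3, 'us'):.4f} g")
```

A custom regional intensity could be supported by adding an entry to the dictionary rather than changing the function.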
Uncertainty
Published energy-per-query figures vary by roughly 10×, so we present a range and headline the median. Water, PUE, and grid intensity are comparatively well-constrained, so the range is driven primarily by the energy figure.
Equivalents
Comparisons are derived from public reference figures: a smartphone charge ≈ 19 Wh and a US household ≈ 29.6 kWh/day (EPA, EIA); a US cup = 240 mL and a WaterSense 2.0-gpm shower ≈ 126 mL/s (FDA, EPA); driving ≈ 0.25 g CO₂ per metre (~400 g/mile, EPA average passenger vehicle). Note: the "Google search" comparison rests on a 2009 Google figure (~0.3 Wh) that is widely cited but dated — a modern search is roughly 7× lower. We include it because it's the most-requested comparison, with this caveat.
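The comparisons above are plain divisions by a reference figure. The sketch below shows one way to compute them; the function name and the dictionary keys are hypothetical, and the constants are the reference figures listed in this section. The shower flow rate is derived from 2.0 US gallons per minute at 3,785.41 mL per gallon, which gives about 126 mL/s.

```python
PHONE_CHARGE_WH = 19.0            # one smartphone charge
HOUSEHOLD_WH_PER_DAY = 29_600.0   # US household, 29.6 kWh/day
CUP_ML = 240.0                    # one US cup
SHOWER_ML_PER_S = 126.0           # 2.0 gpm * 3785.41 mL/gal / 60 s, rounded
DRIVING_G_PER_M = 0.25            # ~400 g CO2 per mile, average passenger vehicle
SECONDS_PER_DAY = 86_400


def equivalents(energy_wh: float, water_ml: float, carbon_g: float) -> dict[str, float]:
    """Express energy, water, and carbon totals as everyday comparisons."""
    return {
        "phone_charges": energy_wh / PHONE_CHARGE_WH,
        "household_seconds": energy_wh / HOUSEHOLD_WH_PER_DAY * SECONDS_PER_DAY,
        "cups_of_water": water_ml / CUP_ML,
        "shower_seconds": water_ml / SHOWER_ML_PER_S,
        "metres_driven": carbon_g / DRIVING_G_PER_M,
    }


# Example: 100 typical queries at ~0.3 Wh, ~1.3 mL, and ~0.13 g CO2e each.
for name, value in equivalents(30.0, 130.0, 13.0).items():
    print(f"{name}: {value:.3g}")
```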