Abstract digital landscape representing AI energy data flows
Live Research Data

The Environmental Cost of Artificial Intelligence

A comprehensive analysis of energy consumption, carbon emissions, and water usage across 30 AI models — from lightweight text classifiers to frontier reasoning systems.

30
Models Analyzed
944 Wh
Max Energy / Prompt
380 g
Max Carbon / Prompt
1,000 mL
Max Water / Prompt

Impact Overview

Energy consumption varies by orders of magnitude across AI task categories. Video generation dominates resource usage, while text classification remains remarkably efficient.

Top Energy Consumers

Max Wh per prompt (log scale)

[Bar chart, log scale (0.01-1,000 Wh): CogVideoX (5s Clip), GPT-o3, o3-mini / DeepSeek-R1, Llama 3.1 70B / DeepSeek R1, GPT-4.5, Qwen2-72B-Instruct, GPT-4.1 Nano / 4.1 Long, Agentic AI (GPT-4 class)]

Model Distribution

Text (17)
Image (6)
Video (2)
Audio (2)
Other (3)
Energy consumption visualization

Energy flows across AI model categories

Data Explorer

Sort, filter, and search across all 30 AI models. Energy values are per-prompt. Click column headers to sort.

| # | Category | AI Task | Model | Params | Energy (Wh/prompt) | Carbon (gCO₂e) | Water (mL) | Utility / Score | Src |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Video | Video Generation - Comprehensive | CogVideoX (5s Clip) | N/A | 944.44 | ~380.0 | ~1,000.0 | SOTA photorealistic video / baseline FVD | 1, 2, 3, 4 |
| 2 | Text | Text: Complex Reasoning | o3-mini / DeepSeek-R1 | 671B (R1) | 33.60 | 0.76 - 2.80 | 7.0 - 40.0 | SOTA math reasoning / AIME / Math Olympiad | 1, 5, 6, 7, 8, 9 |
| 3 | Text | Text Generation | Gemini (Median) | N/A | 0.240 | 0.03 | 0.26 | 1360 Arena Score / General SOTA | 2, 10, 11, 12 |
| 4 | Text | Text Generation (ChatGPT) | GPT-4o | 100B-200B | 2.90 | 0.02 | 0.3 | 1400 Arena Score / SOTA reasoning | 10, 13 |
| 5 | Text | Text (Frontier) | Llama 3.1 405B | 405B | 1.86 | N/A | N/A | MMLU: 88.6 / Frontier open model | 1, 2, 3, 7, 9 |
| 6 | Text | Text Generation (Inference) | GPT-o3 | N/A | 39.20 | N/A | N/A | Advanced reasoning capabilities | 6 |
| 7 | Text | Text Generation (Inference) | GPT-4.5 | N/A | 30.50 | N/A | N/A | SOTA multi-modal performance | 6 |
| 8 | Image | Image: Standard Diffusion | Stable Diffusion 3 / Medium | 2B-8B | 1.22 | 0.24 - 0.48 | ~2.0 | High aesthetic / Text-to-image fidelity | 1, 2, 5, 14 |
| 9 | Text | Text Generation (Inference) | Qwen2-72B-Instruct | 73B | 9.87 | 3.93 | N/A | 0.6 Avg Score (HuggingFace) | 15, 16, 17 |
| 10 | Text | Text Generation (Inference) | Llama 3.1 70B / DeepSeek R1 | 70B | 33.60 | 2.042 | N/A | 1350 Arena Score / Efficient mid-size | 5, 7 |
| 11 | Text | Text Generation (Inference) | BLOOM | 176B | 4.00 | 1.5 | N/A | 1100 Arena Score baseline | 10 |
| 12 | Text | Agentic Workflows | Agentic AI (GPT-4 class) | Large | 4.32 | N/A | N/A | Complex multi-step reasoning | 18 |
| 13 | Text | Text (Realistic Production) | Frontier Models (GPT-4 class) | Large | 4.32 | N/A | N/A | SOTA reasoning / High utility | 7, 18, 19, 20 |
| 14 | Text | Text (Conversational) | LLaMA | 65B | 0.300 | N/A | N/A | 1150 Arena Score | 10 |
| 15 | Text | Text Generation (Inference) | InternLM2.5-7B-Chat | 8B | 2.23 | 0.89 | N/A | 0.6 Avg Score (HuggingFace) | 15, 16, 17 |
| 16 | Text | Text (Basic/Simple) | Llama 3.1 8B | 8B | 0.240 | 0.01 - 0.10 | 0.26 | MMLU: 68.4 / Efficiency trade-off | 1, 2, 3, 4 |
| 17 | Text | Text Generation (Inference) | Mixtral 8x22B | 141B (MoE) | 0.070 | N/A | N/A | Efficient MoE architecture | 7 |
| 18 | Text | Text Generation | Mistral Large 2 | N/A | N/A | 1.14 | 45 | 1250 Arena Score | 10 |
| 19 | Text | Text Generation (Inference) | GPT-4.1 Nano / 4.1 Long | N/A | 4.83 | N/A | N/A | Small is Sufficient / SOTA efficiency | 6, 8 |
| 20 | Image | Image-Text to Text | InternVL2-8B | 8B | 0.020 | 0.01 | N/A | 51.2 MMMU Score | 15, 16 |
| 21 | Image | Image Classification | EVA02 Large | 305M | 1.53 | 0.61 | N/A | 90% ImageNet Accuracy | 15, 16 |
| 22 | Image | Image Classification | TinyViT-21M | 21M | 0.530 | 0.21 | N/A | 90% ImageNet Accuracy | 15, 16, 17 |
| 23 | Image | Object Detection | DETA-Swin-Large | 219M | 2.40 | 0.96 | N/A | 0.6 Average Precision | 15, 16 |
| 24 | Image | Object Detection | DETA-ResNet-50 | 49M | 1.20 | 0.48 | N/A | 0.5 Average Precision | 15, 16 |
| 25 | Audio | Speech Recognition | NVIDIA Canary-1B | 1B | 1.04 | 0.41 | N/A | 33.3 WER | 15, 16 |
| 26 | Audio | Speech Recognition | Whisper Base (EN) | 73M | 0.200 | 0.08 | N/A | 29.5 WER | 15, 16 |
| 27 | Other | Translation | Google T5-Large | 738M | 0.030 | 0.01 | N/A | 32.0 BLEU Score | 15, 16 |
| 28 | Other | Time Series Forecasting | Granite-TimeSeries-PatchTST | 616K | 0.110 | 0.04 | N/A | 0.6 MSE/Utility | 15, 16 |
| 29 | Other | Text Classification | Fine-tuned (Task-Specific) | Small | 0.100 | N/A | N/A | High accuracy (narrow task) | 2 |
| 30 | Video | Video Generation (Real-time) | LTX-Video | N/A | 0.0005 | N/A | N/A | High efficiency / 30fps HD | 4 |
Data compiled from multiple research sources.

Model Comparison

Select up to 4 models to compare side-by-side


Key Findings

Critical insights from analyzing energy, carbon, and water data across 30 AI models and 10+ task categories.

1,888,880x Energy Gap

The most energy-intensive AI task (CogVideoX video generation at 944 Wh) consumes nearly 1.9 million times more energy per prompt than the most efficient (LTX-Video at 0.0005 Wh).

Text Models: Wide Spectrum

Text generation spans from 0.016 Wh (Llama 3.1 8B) to 39.2 Wh (GPT-o3) — a 2,450x range within the same task category, driven by model size and reasoning depth.

MoE Architecture Wins

Mixtral 8x22B (141B params, MoE) uses only 0.07 Wh — dramatically less than dense models of similar capability, demonstrating mixture-of-experts as a key efficiency strategy.

Small Models, Big Impact

Task-specific fine-tuned models (0.06 Wh) and Llama 3.1 8B (0.016 Wh) prove that smaller, specialized models can deliver high accuracy at a fraction of the environmental cost.

Hidden Water Cost

Video generation consumes up to 1,000 mL of water per prompt for cooling — equivalent to a full water bottle. Even text models like Mistral Large 2 use 45 mL per prompt.

Reasoning Tax

Chain-of-thought and agentic reasoning (GPT-o3 at 39.2 Wh, Agentic AI at 4.32 Wh) impose a significant energy premium — the cost of "thinking harder" is measurable.

Carbon emissions visualization

Carbon Emissions

Video generation produces up to 380 gCO2e per prompt — equivalent to driving 1.5 km in a car.

Water consumption visualization

Water Consumption

Data center cooling for a single video generation prompt can consume up to 1 liter of water.

Abstract visualization of scientific data measurement and methodology

Measurement & Verification

Building the foundation for trusted AI environmental data

Transparency & Methodology

Data Sourcing & Model Assumptions

We believe in radical transparency. Every number on this dashboard has a source, every calculation has an assumption, and every assumption has a known limitation. This section documents our methodology so you can evaluate, challenge, and improve it.

Read the Full Mālama AICo2 Methodology (PDF)

The Measurement Problem

There is no industry-standard methodology for measuring AI's environmental footprint. Companies, in the words of the Federation of American Scientists, "report whatever they choose, however they choose." The true carbon footprint of major AI providers may be up to 662% higher than publicly reported figures.

Current metrics like Power Usage Effectiveness (PUE) — a 20-year-old standard — measure only facility efficiency, not how efficiently IT equipment actually uses delivered power. As the FAS describes it: "Like a car that reports how much fuel reaches the engine but not the miles per gallon of that engine."

This dashboard aggregates the best available data from multiple independent sources, but we acknowledge that all estimates carry significant uncertainty. Our goal is not to present definitive numbers, but to make the scale of AI's environmental impact visible and to push for better measurement infrastructure.

Our full estimation framework — including the dual-phase architecture, five-stage pipeline, task-class parameterization, and cryptographic telemetry roadmap — is documented in the Mālama AICo2 Methodology.

Data Sources

| Ref | Source | Description |
|---|---|---|
| 1-4 | MIT Technology Review / ML.Energy | Direct GPU power measurement on Nvidia H100 hardware using ML.Energy benchmarks. 500-prompt test batches for text models; prescribed denoising steps for image/video. GPU energy doubled to estimate total system energy. |
| 2, 10-12 | Google Cloud Infrastructure Blog | Google's comprehensive methodology measuring full-system dynamic power including idle machines, CPU/RAM overhead, and data center PUE of 1.09. Median Gemini text prompt: 0.24 Wh, 0.03 gCO2e, 0.26 mL water. |
| 5-9 | AI Energy Score (Hugging Face) | Sasha Luccioni's CodeCarbon tool measuring GPU energy during inference. Standardized benchmarking across 166+ AI models with energy efficiency ratings. |
| 6, 8 | Artificial Analysis / Model Benchmarks | Inference cost and energy estimates for frontier models (GPT-o3, GPT-4.5, GPT-4.1 series) derived from API pricing, token throughput, and hardware utilization estimates. |
| 15-17 | Hugging Face Open LLM Leaderboard | Energy consumption metrics from the Hugging Face model hub, including per-inference energy measurements for classification, detection, and generation tasks. |
| 10, 13 | IEA / EPA / Academic Literature | International Energy Agency data center consumption estimates (415 TWh in 2024), EPA grid carbon intensity factors, and peer-reviewed lifecycle analysis papers. |
| 14 | Stability AI / Diffusion Benchmarks | Energy measurements for Stable Diffusion 3 variants at different step counts (25 standard, 50 high-quality) and model sizes (2B-8B parameters). |
| 18-20 | Ampere Computing / Agentic AI Research | Analysis of agentic AI operational costs, persistent inference demand patterns, and non-linear infrastructure scaling for autonomous AI workflows. |
| - | FAS Policy Memo (Jhaveri & Palat) | Federation of American Scientists recommendations for standardized AI energy metrics, mandatory reporting frameworks, and interagency coordination (DOE, NIST, EPA). |
| - | Carnegie Mellon University | CMU policy recommendations for establishing standardized metrics for AI energy and environmental impacts, including lifecycle measurement frameworks. |

Model Assumptions

Click each assumption to see the evidence basis and known limitations. Confidence levels indicate our assessment of reliability.

How We Calculate

Energy (Wh/prompt)

Step 1: Measure GPU power draw during inference using hardware power sensors (NVML/RAPL) or software tools (CodeCarbon, ML.Energy).

Step 2: Double the GPU measurement to account for CPU, RAM, networking, storage, and cooling overhead (~50% rule).

Step 3: Apply PUE multiplier (1.1-1.2) for data center infrastructure overhead.

E_total = E_gpu × 2.0 × PUE

Carbon (gCO2e/prompt)

Step 1: Convert energy to kWh.

Step 2: Multiply by location-based grid carbon intensity (gCO2e/kWh) from EPA eGRID or Electricity Maps.

Caveat: Market-based accounting with RECs can show near-zero emissions even when actual grid mix is carbon-heavy.

CO2 = E_total × GridIntensity

Water (mL/prompt)

Step 1: Estimate cooling energy from total energy and PUE breakdown.

Step 2: Apply Water Usage Effectiveness (WUE) ratio — liters of water per kWh of IT energy.

Caveat: Air-cooled facilities use minimal water; evaporative cooling facilities use significantly more. Few providers disclose WUE.

H2O = E_cooling × WUE
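The three conversions above can be sketched as a single function. This is a minimal illustration of the methodology, not a reference implementation: the default PUE, grid intensity, WUE, and cooling-energy fraction are assumed placeholder values, not figures from any specific facility.

```python
# Sketch of the dashboard's three per-prompt conversions.
# Default parameter values are illustrative assumptions only.

def footprint_per_prompt(e_gpu_wh, pue=1.12, grid_gco2_per_kwh=400.0,
                         wue_l_per_kwh=1.8, cooling_fraction=0.10):
    """Estimate total energy (Wh), carbon (gCO2e), and water (mL) per prompt."""
    # Energy: double the GPU draw for system overhead (~50% rule), apply PUE.
    e_total_wh = e_gpu_wh * 2.0 * pue
    # Carbon: convert Wh to kWh, multiply by grid carbon intensity.
    co2_g = (e_total_wh / 1000.0) * grid_gco2_per_kwh
    # Water: assumed cooling share of total energy, times WUE (L/kWh -> mL).
    e_cooling_kwh = (e_total_wh / 1000.0) * cooling_fraction
    h2o_ml = e_cooling_kwh * wue_l_per_kwh * 1000.0
    return e_total_wh, co2_g, h2o_ml

# Example: a prompt measured at 1.0 Wh of GPU energy.
energy_wh, carbon_g, water_ml = footprint_per_prompt(1.0)
```

Swapping in a facility's actual PUE, live grid intensity, and disclosed WUE is exactly the move from estimation to measurement that the sections below argue for.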

We Welcome Better Methods

This dashboard is a living document. We recognize that our methodology has significant gaps — particularly around water consumption, agentic AI workloads, and the variance between estimated and actual data center operations.

If you have access to better data, more rigorous measurement approaches, or can identify errors in our assumptions, we want to hear from you. The goal is not to be right — it is to be progressively less wrong and to build toward a future where AI's environmental impact is measured with trusted, objective, real-time data rather than estimates and extrapolations.

This is exactly why we advocate for hardware-level sensor integration inside data centers — to replace estimation with measurement. See the Agents & Sensors section below for our vision.

Submit a Contribution
Agentic AI & Sensors

The Compounding Cost of Autonomous AI

As AI evolves from single-query chatbots to autonomous agents that plan, execute, and iterate, the energy footprint doesn't just grow — it compounds. Traditional "energy per query" metrics fail to capture the true cost of agentic workflows.

From Episodic to Continuous

A single agentic workflow can involve multiple model calls, data retrieval, validation loops, and downstream integrations. Unlike a simple chatbot query, agents create persistent, background compute demand.

As Ampere Computing's research (April 2025) warns: infrastructure demand from agentic AI grows in non-linear ways. Automated decisions generate follow-up processes, and workflows branch into additional tasks. This multiplicative effect is easily underestimated.

The FAS policy memo acknowledges this shift directly: "When we move from chatbots to agentic AI systems that plan, act, remember, and iterate autonomously, traditional 'energy per query' metrics no longer capture the full picture."

Energy Escalation Ladder

| Tier | Example Model | Energy | Model Calls |
|---|---|---|---|
| Single Text Query | Llama 3.1 8B | 0.016 Wh | 1 call |
| Standard Chat Session | GPT-4o | 0.3-2.9 Wh | 1-5 calls |
| Complex Reasoning | o3-mini / DeepSeek-R1 | 1.9-33.6 Wh | 1-3 calls |
| Agentic Workflow | GPT-4 class Agent | 4.32+ Wh | 3-10+ calls |
| Autonomous Agent Loop | Multi-model Pipeline | 10-100+ Wh | 10-50+ calls |

Autonomous agent loops can invoke 10-50+ model calls per task. At enterprise scale, with thousands of concurrent agents, energy costs compound non-linearly. Current measurement infrastructure cannot track this.
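The multiplicative effect of agentic workloads can be made concrete with a back-of-the-envelope model. The call counts, per-call energies, and agent counts below are illustrative assumptions taken from the ladder above, not measurements:

```python
# Back-of-the-envelope model of agentic energy compounding:
# total energy = calls per task x energy per call x concurrent agents.

def workflow_energy_wh(calls_per_task, wh_per_call, concurrent_agents=1):
    """Estimated energy (Wh) for one task tier across a fleet of agents."""
    return calls_per_task * wh_per_call * concurrent_agents

single = workflow_energy_wh(1, 0.016)        # one lightweight text query
agent_loop = workflow_energy_wh(30, 2.9)     # ~87 Wh for one agent task
fleet = workflow_energy_wh(30, 2.9, 1000)    # ~87 kWh across 1,000 agents
```

The point of the sketch: per-query metrics see only `wh_per_call`, while the real footprint is the product of all three factors.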

10-50x
More model calls per task vs. single query
Non-Linear
Infrastructure demand growth as agents scale
Always-On
Persistent compute demand, not episodic usage
The Solution: Trusted Measurement

From Estimation to Observation

Every number on this dashboard is an estimate. The real goal is to replace estimation with measurement — deploying hardware-level sensors inside data centers to create an environment of trusted, objective, verifiable data that can dynamically and drastically reduce AI's environmental impact.

Data center with IoT sensors

Hardware-Level Sensor Integration

IoT sensors deployed at rack, server, and cooling system level — measuring actual environmental impact, not estimates.

Today: Estimation & Opacity

  • Software-based GPU power measurement (CodeCarbon, NVML)
  • Rule-of-thumb extrapolation: GPU = ~50% of total energy
  • Average PUE values applied uniformly across facilities
  • Regional grid carbon intensity averages (not real-time)
  • Water consumption rarely disclosed, often unknown
  • Self-reported data with no independent verification
  • True emissions may be 662% higher than reported (FAS)

Future: Measurement & Trust

  • Hardware-level sensors at rack, server, and GPU level
  • Actual energy measurement per workload, per inference
  • Real-time PUE calculated per facility, per hour
  • Live grid carbon intensity from Electricity Maps / sensors
  • Flow sensors on cooling systems measuring actual water use
  • Blockchain-verified, tamper-proof measurement records
  • AI-powered dynamic optimization reducing impact in real-time

Sensor Capabilities for Data Centers

Real-Time Energy Metering

Hardware-level power sensors at the rack, server, and GPU level — measuring actual energy consumption per workload, not estimates derived from GPU draw alone.

kWh per inference

Carbon Emissions Tracking

Live grid carbon intensity integration combined with actual energy measurements to calculate real-time, location-based carbon emissions — not market-based credits.

gCO2e per prompt

Water Consumption Monitoring

Flow sensors on cooling systems measuring actual water consumption per rack, correlated with compute workloads to attribute water usage to specific AI tasks.

mL per inference

Thermal & Airflow Analysis

Temperature, humidity, and airflow sensors creating a real-time thermal map of the data center — identifying cooling inefficiencies and hot spots for optimization.

°C / CFM per zone

The Measurement-to-Optimization Pipeline

01

Deploy Sensors

IoT sensors installed at rack, server, and cooling system level across the data center.

02

Measure Everything

Real-time energy, water, thermal, and airflow data streamed continuously — not sampled or estimated.

03

Verify On-Chain

Measurement data anchored to blockchain (Cardano) for tamper-proof, immutable audit trails — creating trusted, objective records.

04

AI-Powered Analysis

Machine learning models analyze sensor data to identify inefficiencies, predict failures, and recommend optimizations.

05

Dynamic Optimization

Automated workload scheduling, cooling adjustments, and resource allocation based on real-time environmental data.

06

Verified Reporting

Standardized, auditable environmental reports generated from actual measurements — replacing self-reported estimates.

Mālama dMRV: Trusted Environmental Data

Digital Measurement, Reporting & Verification

The Mālama dMRV (digital Measurement, Reporting, and Verification) infrastructure — already proven in carbon credit verification for agriculture and land management — provides the technological foundation for trusted data center monitoring.

By combining IoT sensors for real-time environmental data collection, AI-powered analysis for pattern recognition and optimization, and blockchain verification (Cardano) for tamper-proof audit trails, the platform creates an environment where environmental claims are backed by verifiable, objective measurements — not self-reported estimates.

Extending this infrastructure into data centers means every energy reading, every water flow measurement, and every carbon calculation is recorded immutably on-chain. Operators, regulators, and the public can independently verify environmental impact claims, creating the accountability that the FAS, CMU, and industry researchers have called for.

Read the Full Mālama AICo2 Methodology
Real-Time
Sensor Data
On-Chain
Verification
Immutable
Audit Trail
AI-Driven
Optimization

The Impact: What Trusted Data Enables

Dynamic Workload Scheduling

Route AI inference to facilities with the lowest real-time carbon intensity, shifting compute to when and where clean energy is available.

Cooling Optimization

Real-time thermal data enables predictive cooling adjustments — reducing water consumption by matching cooling output to actual heat load, not worst-case estimates.

Hardware Utilization

Identify idle capacity and underutilized servers that consume energy without productive output. Google found idle machines are a significant hidden cost.

Regulatory Compliance

Automated, verifiable reporting that meets emerging EU AI Act requirements and anticipated US mandatory disclosure frameworks (FAS/CMU recommendations).

Carbon-Aware Computing

Schedule training runs and batch inference during periods of high renewable energy availability, verified by real-time grid data rather than annual averages.

Accountability & Trust

Replace the current 662% reporting gap with blockchain-verified measurements that investors, regulators, and the public can independently audit.
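Dynamic workload scheduling and carbon-aware computing both reduce to the same primitive: route a job to the time or place with the lowest live grid carbon intensity. A minimal sketch, using made-up intensity values rather than real Electricity Maps data:

```python
# Carbon-aware routing sketch: pick the facility with the lowest
# current grid carbon intensity (gCO2e/kWh). Values are illustrative.

def pick_greenest(intensities):
    """Return the facility name with the lowest real-time carbon intensity."""
    return min(intensities, key=intensities.get)

live_intensity = {"us-east": 410.0, "eu-north": 45.0, "us-west": 230.0}
target = pick_greenest(live_intensity)  # routes the job to "eu-north"
```

A production scheduler would also weigh latency, data residency, and capacity, but the environmental term of that optimization is exactly this lookup, fed by verified sensor and grid data instead of annual averages.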

Open Research

Contribute to Better Data

This dashboard is a living document. If you have access to better data, more rigorous measurement approaches, or can identify errors in our assumptions — we want to hear from you. The goal is to be progressively less wrong.

20+
Sources Cited
6
Model Assumptions
Open
Methodology
v2.0
Current Version

What are you contributing?

Submission Guidelines

  • Include DOI or URL for any referenced papers
  • Specify hardware and inference conditions if submitting energy data
  • Note confidence level and known limitations
  • Peer-reviewed sources are prioritized but not required
Review our full AICo2 Methodology (PDF)
New Data SourceAll fields with * are required

Submissions are reviewed by our research team. Accepted contributions will be credited in the data sources table. We prioritize peer-reviewed research but welcome all evidence-based input.

What Happens Next

Step 01

Submit

Send your data, correction, or methodology proposal via the form above.

Step 02

Review

Our research team evaluates submissions against existing sources and methodology.

Step 03

Integrate

Accepted contributions are incorporated into the dashboard with full attribution.

Step 04

Publish

Updated data and methodology changes are reflected in the next version release.