Abstract digital landscape representing AI energy data flows
Live Research Data

The Environmental Cost of Artificial Intelligence

A comprehensive analysis of energy consumption, carbon emissions, and water usage across 30 AI models — from lightweight text classifiers to frontier reasoning systems.

30
Models Analyzed
944 Wh
Max Energy / Prompt
380 g
Max Carbon / Prompt
1,000 mL
Max Water / Prompt

Impact Overview

Energy consumption varies by orders of magnitude across AI task categories. Video generation dominates resource usage, while text classification remains remarkably efficient.

Top Energy Consumers

Max Wh per prompt (log scale)

[Bar chart, log scale (0.01-1,000 Wh): CogVideoX (5s Clip), GPT-o3, o3-mini / DeepSeek-R1, Llama 3.1 70B / DeepSeek R1, GPT-4.5, Qwen2-72B-Instruct, GPT-4.1 Nano / 4.1 Long, Agentic AI (GPT-4 class)]

Model Distribution

Text (17)
Image (6)
Video (2)
Audio (2)
Other (3)
Energy consumption visualization

Energy flows across AI model categories

Data Explorer

Sort, filter, and search across all 30 AI models. Energy values are per-prompt. Click column headers to sort.

| # | Category | AI Task | Model | Params | Energy (Wh/prompt) | Carbon (gCO₂e) | Water (mL) | Utility / Score | Src |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Video | Video Generation - Comprehensive | CogVideoX (5s Clip) | N/A | 944.44 | ~380.0 | ~1,000.0 | SOTA photorealistic video / baseline FVD | 1, 2, 3, 4 |
| 2 | Text | Text: Complex Reasoning | o3-mini / DeepSeek-R1 | 671B (R1) | 33.60 | 0.76 - 2.80 | 7.0 - 40.0 | SOTA math reasoning / AIME / Math Olympiad | 1, 5, 6, 7, 8, 9 |
| 3 | Text | Text Generation | Gemini (Median) | N/A | 0.240 | 0.03 | 0.26 | 1360 Arena Score / General SOTA | 2, 10, 11, 12 |
| 4 | Text | Text Generation (ChatGPT) | GPT-4o | 100B-200B | 2.90 | 0.02 | 0.3 | 1400 Arena Score / SOTA reasoning | 10, 13 |
| 5 | Text | Text (Frontier) | Llama 3.1 405B | 405B | 1.86 | N/A | N/A | MMLU: 88.6 / Frontier open model | 1, 2, 3, 7, 9 |
| 6 | Text | Text Generation (Inference) | GPT-o3 | N/A | 39.20 | N/A | N/A | Advanced reasoning capabilities | 6 |
| 7 | Text | Text Generation (Inference) | GPT-4.5 | N/A | 30.50 | N/A | N/A | SOTA multi-modal performance | 6 |
| 8 | Image | Image: Standard Diffusion | Stable Diffusion 3 / Medium | 2B-8B | 1.22 | 0.24 - 0.48 | ~2.0 | High aesthetic / Text-to-image fidelity | 1, 2, 5, 14 |
| 9 | Text | Text Generation (Inference) | Qwen2-72B-Instruct | 73B | 9.87 | 3.93 | N/A | 0.6 Avg Score (HuggingFace) | 15, 16, 17 |
| 10 | Text | Text Generation (Inference) | Llama 3.1 70B / DeepSeek R1 | 70B | 33.60 | 2.042 | N/A | 1350 Arena Score / Efficient mid-size | 5, 7 |
| 11 | Text | Text Generation (Inference) | BLOOM | 176B | 4.00 | 1.5 | N/A | 1100 Arena Score baseline | 10 |
| 12 | Text | Agentic Workflows | Agentic AI (GPT-4 class) | Large | 4.32 | N/A | N/A | Complex multi-step reasoning | 18 |
| 13 | Text | Text (Realistic Production) | Frontier Models (GPT-4 class) | Large | 4.32 | N/A | N/A | SOTA reasoning / High utility | 7, 18, 19, 20 |
| 14 | Text | Text (Conversational) | LLaMA | 65B | 0.300 | N/A | N/A | 1150 Arena Score | 10 |
| 15 | Text | Text Generation (Inference) | InternLM2.5-7B-Chat | 8B | 2.23 | 0.89 | N/A | 0.6 Avg Score (HuggingFace) | 15, 16, 17 |
| 16 | Text | Text (Basic/Simple) | Llama 3.1 8B | 8B | 0.240 | 0.01 - 0.10 | 0.26 | MMLU: 68.4 / Efficiency trade-off | 1, 2, 3, 4 |
| 17 | Text | Text Generation (Inference) | Mixtral 8x22B | 141B (MoE) | 0.070 | N/A | N/A | Efficient MoE architecture | 7 |
| 18 | Text | Text Generation | Mistral Large 2 | N/A | N/A | 1.14 | 45 | 1250 Arena Score | 10 |
| 19 | Text | Text Generation (Inference) | GPT-4.1 Nano / 4.1 Long | N/A | 4.83 | N/A | N/A | Small is Sufficient / SOTA efficiency | 6, 8 |
| 20 | Image | Image-Text to Text | InternVL2-8B | 8B | 0.020 | 0.01 | N/A | 51.2 MMMU Score | 15, 16 |
| 21 | Image | Image Classification | EVA02 Large | 305M | 1.53 | 0.61 | N/A | 90% ImageNet Accuracy | 15, 16 |
| 22 | Image | Image Classification | TinyViT-21M | 21M | 0.530 | 0.21 | N/A | 90% ImageNet Accuracy | 15, 16, 17 |
| 23 | Image | Object Detection | DETA-Swin-Large | 219M | 2.40 | 0.96 | N/A | 0.6 Average Precision | 15, 16 |
| 24 | Image | Object Detection | DETA-ResNet-50 | 49M | 1.20 | 0.48 | N/A | 0.5 Average Precision | 15, 16 |
| 25 | Audio | Speech Recognition | NVIDIA Canary-1B | 1B | 1.04 | 0.41 | N/A | 33.3 WER | 15, 16 |
| 26 | Audio | Speech Recognition | Whisper Base (EN) | 73M | 0.200 | 0.08 | N/A | 29.5 WER | 15, 16 |
| 27 | Other | Translation | Google T5-Large | 738M | 0.030 | 0.01 | N/A | 32.0 BLEU Score | 15, 16 |
| 28 | Other | Time Series Forecasting | Granite-TimeSeries-PatchTST | 616K | 0.110 | 0.04 | N/A | 0.6 MSE/Utility | 15, 16 |
| 29 | Other | Text Classification | Fine-tuned (Task-Specific) | Small | 0.100 | N/A | N/A | High accuracy (narrow task) | 2 |
| 30 | Video | Video Generation (Real-time) | LTX-Video | N/A | 0.0005 | N/A | N/A | High efficiency / 30fps HD | 4 |
Data compiled from multiple research sources.

Model Comparison

Select up to 4 models to compare side-by-side


Key Findings

Critical insights from analyzing energy, carbon, and water data across 30 AI models and 10+ task categories.

1,888,880x Energy Gap

The most energy-intensive AI task (CogVideoX video generation at 944 Wh) consumes nearly 1.9 million times more energy per prompt than the most efficient (LTX-Video at 0.0005 Wh).

Text Models: Wide Spectrum

Text generation spans from 0.016 Wh (Llama 3.1 8B) to 39.2 Wh (GPT-o3) — a 2,450x range within the same task category, driven by model size and reasoning depth.

MoE Architecture Wins

Mixtral 8x22B (141B params, MoE) uses only 0.07 Wh — dramatically less than dense models of similar capability, demonstrating mixture-of-experts as a key efficiency strategy.

Small Models, Big Impact

Task-specific fine-tuned models (0.06 Wh) and Llama 3.1 8B (0.016 Wh) prove that smaller, specialized models can deliver high accuracy at a fraction of the environmental cost.

Hidden Water Cost

Video generation consumes up to 1,000 mL of water per prompt for cooling — equivalent to a full water bottle. Even text models like Mistral Large 2 use 45 mL per prompt.

Reasoning Tax

Chain-of-thought and agentic reasoning (GPT-o3 at 39.2 Wh, Agentic AI at 4.32 Wh) impose a significant energy premium — the cost of "thinking harder" is measurable.

Carbon emissions visualization

Carbon Emissions

Video generation produces up to 380 gCO2e per prompt — equivalent to driving 1.5 km in a car.

Water consumption visualization

Water Consumption

Data center cooling for a single video generation prompt can consume up to 1 liter of water.

Abstract visualization of scientific data measurement and methodology

Measurement & Verification

Building the foundation for trusted AI environmental data

Transparency & Methodology

Data Sourcing & Model Assumptions

We believe in radical transparency. Every number on this dashboard has a source, every calculation has an assumption, and every assumption has a known limitation. This section documents our methodology so you can evaluate, challenge, and improve it.

Read the Full Mālama AICo2 Methodology (PDF)

The Measurement Problem

There is no industry-standard methodology for measuring AI's environmental footprint. Companies, in the words of the Federation of American Scientists, "report whatever they choose, however they choose." The true carbon footprint of major AI providers may be up to 662% higher than publicly reported figures.

Current metrics like Power Usage Effectiveness (PUE) — a 20-year-old standard — measure only facility efficiency, not how efficiently IT equipment actually uses delivered power. As the FAS describes it: "Like a car that reports how much fuel reaches the engine but not the miles per gallon of that engine."

This dashboard aggregates the best available data from multiple independent sources, but we acknowledge that all estimates carry significant uncertainty. Our goal is not to present definitive numbers, but to make the scale of AI's environmental impact visible and to push for better measurement infrastructure.

Our full estimation framework — including the dual-phase architecture, five-stage pipeline, task-class parameterization, and cryptographic telemetry roadmap — is documented in the Mālama AICo2 Methodology.

Data Sources

| Ref | Source | Description |
|---|---|---|
| 1-4 | MIT Technology Review / ML.Energy | Direct GPU power measurement on Nvidia H100 hardware using ML.Energy benchmarks. 500-prompt test batches for text models; prescribed denoising steps for image/video. GPU energy doubled to estimate total system energy. |
| 2, 10-12 | Google Cloud Infrastructure Blog | Google's comprehensive methodology measuring full-system dynamic power including idle machines, CPU/RAM overhead, and data center PUE of 1.09. Median Gemini text prompt: 0.24 Wh, 0.03 gCO2e, 0.26 mL water. |
| 5-9 | AI Energy Score (Hugging Face) | Sasha Luccioni's CodeCarbon tool measuring GPU energy during inference. Standardized benchmarking across 166+ AI models with energy efficiency ratings. |
| 6, 8 | Artificial Analysis / Model Benchmarks | Inference cost and energy estimates for frontier models (GPT-o3, GPT-4.5, GPT-4.1 series) derived from API pricing, token throughput, and hardware utilization estimates. |
| 15-17 | Hugging Face Open LLM Leaderboard | Energy consumption metrics from the Hugging Face model hub, including per-inference energy measurements for classification, detection, and generation tasks. |
| 10, 13 | IEA / EPA / Academic Literature | International Energy Agency data center consumption estimates (415 TWh in 2024), EPA grid carbon intensity factors, and peer-reviewed lifecycle analysis papers. |
| 14 | Stability AI / Diffusion Benchmarks | Energy measurements for Stable Diffusion 3 variants at different step counts (25 standard, 50 high-quality) and model sizes (2B-8B parameters). |
| 18-20 | Ampere Computing / Agentic AI Research | Analysis of agentic AI operational costs, persistent inference demand patterns, and non-linear infrastructure scaling for autonomous AI workflows. |
| - | FAS Policy Memo (Jhaveri & Palat) | Federation of American Scientists recommendations for standardized AI energy metrics, mandatory reporting frameworks, and interagency coordination (DOE, NIST, EPA). |
| - | Carnegie Mellon University | CMU policy recommendations for establishing standardized metrics for AI energy and environmental impacts, including lifecycle measurement frameworks. |

Model Assumptions

Click each assumption to see the evidence basis and known limitations. Confidence levels indicate our assessment of reliability.

How We Calculate

Energy (Wh/prompt)

Step 1: Measure GPU power draw during inference using hardware power sensors (NVML/RAPL) or software tools (CodeCarbon, ML.Energy).

Step 2: Double the GPU measurement to account for CPU, RAM, networking, storage, and cooling overhead (~50% rule).

Step 3: Apply PUE multiplier (1.1-1.2) for data center infrastructure overhead.

E_total = E_gpu × 2.0 × PUE

Carbon (gCO2e/prompt)

Step 1: Convert energy to kWh.

Step 2: Multiply by location-based grid carbon intensity (gCO2e/kWh) from EPA eGRID or Electricity Maps.

Caveat: Market-based accounting with RECs can show near-zero emissions even when actual grid mix is carbon-heavy.

CO2 = E_total × GridIntensity

Water (mL/prompt)

Step 1: Estimate cooling energy from total energy and PUE breakdown.

Step 2: Apply Water Usage Effectiveness (WUE) ratio — liters of water per kWh of IT energy.

Caveat: Air-cooled facilities use minimal water; evaporative cooling facilities use significantly more. Few providers disclose WUE.

H2O = E_cooling × WUE
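The three conversions above can be sketched as a single function. This is a minimal illustration of the methodology, not a reference implementation: the default PUE, grid intensity, WUE, and cooling-energy fraction are assumed placeholder values, not figures from any specific facility.

```python
# Sketch of the dashboard's three per-prompt conversions.
# Default parameter values are illustrative assumptions only.

def footprint_per_prompt(e_gpu_wh, pue=1.12, grid_gco2_per_kwh=400.0,
                         wue_l_per_kwh=1.8, cooling_fraction=0.10):
    """Estimate total energy (Wh), carbon (gCO2e), and water (mL) per prompt."""
    # Energy: double the GPU draw for system overhead (~50% rule), apply PUE.
    e_total_wh = e_gpu_wh * 2.0 * pue
    # Carbon: convert Wh to kWh, multiply by grid carbon intensity.
    co2_g = (e_total_wh / 1000.0) * grid_gco2_per_kwh
    # Water: assumed cooling share of total energy, times WUE (L/kWh -> mL).
    e_cooling_kwh = (e_total_wh / 1000.0) * cooling_fraction
    h2o_ml = e_cooling_kwh * wue_l_per_kwh * 1000.0
    return e_total_wh, co2_g, h2o_ml

# Example: a prompt measured at 1.0 Wh of GPU energy.
energy_wh, carbon_g, water_ml = footprint_per_prompt(1.0)
```

Swapping in a facility's actual PUE, live grid intensity, and disclosed WUE is exactly the move from estimation to measurement that the sections below argue for.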

We Welcome Better Methods

This dashboard is a living document. We recognize that our methodology has significant gaps — particularly around water consumption, agentic AI workloads, and the variance between estimated and actual data center operations.

If you have access to better data, more rigorous measurement approaches, or can identify errors in our assumptions, we want to hear from you. The goal is not to be right — it is to be progressively less wrong and to build toward a future where AI's environmental impact is measured with trusted, objective, real-time data rather than estimates and extrapolations.

This is exactly why we advocate for hardware-level sensor integration inside data centers — to replace estimation with measurement. See the Agents & Sensors section below for our vision.

Submit a Contribution
Agentic AI & Sensors

The Compounding Cost of Autonomous AI

As AI evolves from single-query chatbots to autonomous agents that plan, execute, and iterate, the energy footprint doesn't just grow — it compounds. Traditional "energy per query" metrics fail to capture the true cost of agentic workflows.

From Episodic to Continuous

A single agentic workflow can involve multiple model calls, data retrieval, validation loops, and downstream integrations. Unlike a simple chatbot query, agents create persistent, background compute demand.

As Ampere Computing's research (April 2025) warns: infrastructure demand from agentic AI grows in non-linear ways. Automated decisions generate follow-up processes, and workflows branch into additional tasks. This multiplicative effect is easily underestimated.

The FAS policy memo acknowledges this shift directly: "When we move from chatbots to agentic AI systems that plan, act, remember, and iterate autonomously, traditional 'energy per query' metrics no longer capture the full picture."

Energy Escalation Ladder

| Tier | Example Model | Energy | Model Calls |
|---|---|---|---|
| Single Text Query | Llama 3.1 8B | 0.016 Wh | 1 call |
| Standard Chat Session | GPT-4o | 0.3-2.9 Wh | 1-5 calls |
| Complex Reasoning | o3-mini / DeepSeek-R1 | 1.9-33.6 Wh | 1-3 calls |
| Agentic Workflow | GPT-4 class Agent | 4.32+ Wh | 3-10+ calls |
| Autonomous Agent Loop | Multi-model Pipeline | 10-100+ Wh | 10-50+ calls |

Autonomous agent loops can invoke 10-50+ model calls per task. At enterprise scale, with thousands of concurrent agents, energy costs compound non-linearly. Current measurement infrastructure cannot track this.
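The multiplicative effect of agentic workloads can be made concrete with a back-of-the-envelope model. The call counts, per-call energies, and agent counts below are illustrative assumptions taken from the ladder above, not measurements:

```python
# Back-of-the-envelope model of agentic energy compounding:
# total energy = calls per task x energy per call x concurrent agents.

def workflow_energy_wh(calls_per_task, wh_per_call, concurrent_agents=1):
    """Estimated energy (Wh) for one task tier across a fleet of agents."""
    return calls_per_task * wh_per_call * concurrent_agents

single = workflow_energy_wh(1, 0.016)        # one lightweight text query
agent_loop = workflow_energy_wh(30, 2.9)     # ~87 Wh for one agent task
fleet = workflow_energy_wh(30, 2.9, 1000)    # ~87 kWh across 1,000 agents
```

The point of the sketch: per-query metrics see only `wh_per_call`, while the real footprint is the product of all three factors.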

10-50x
More model calls per task vs. single query
Non-Linear
Infrastructure demand growth as agents scale
Always-On
Persistent compute demand, not episodic usage
The Solution: Trusted Measurement

From Estimation to Observation

Every number on this dashboard is an estimate. The real goal is to replace estimation with measurement — deploying hardware-level sensors inside data centers to create an environment of trusted, objective, verifiable data that can dynamically and drastically reduce AI's environmental impact.

Data center with IoT sensors

Hardware-Level Sensor Integration

IoT sensors deployed at rack, server, and cooling system level — measuring actual environmental impact, not estimates.

Today: Estimation & Opacity

  • Software-based GPU power measurement (CodeCarbon, NVML)
  • Rule-of-thumb extrapolation: GPU = ~50% of total energy
  • Average PUE values applied uniformly across facilities
  • Regional grid carbon intensity averages (not real-time)
  • Water consumption rarely disclosed, often unknown
  • Self-reported data with no independent verification
  • True emissions may be 662% higher than reported (FAS)

Future: Measurement & Trust

  • Hardware-level sensors at rack, server, and GPU level
  • Actual energy measurement per workload, per inference
  • Real-time PUE calculated per facility, per hour
  • Live grid carbon intensity from Electricity Maps / sensors
  • Flow sensors on cooling systems measuring actual water use
  • Blockchain-verified, tamper-proof measurement records
  • AI-powered dynamic optimization reducing impact in real-time

Sensor Capabilities for Data Centers

Real-Time Energy Metering

Hardware-level power sensors at the rack, server, and GPU level — measuring actual energy consumption per workload, not estimates derived from GPU draw alone.

kWh per inference

Carbon Emissions Tracking

Live grid carbon intensity integration combined with actual energy measurements to calculate real-time, location-based carbon emissions — not market-based credits.

gCO2e per prompt

Water Consumption Monitoring

Flow sensors on cooling systems measuring actual water consumption per rack, correlated with compute workloads to attribute water usage to specific AI tasks.

mL per inference

Thermal & Airflow Analysis

Temperature, humidity, and airflow sensors creating a real-time thermal map of the data center — identifying cooling inefficiencies and hot spots for optimization.

°C / CFM per zone

The Measurement-to-Optimization Pipeline

01

Deploy Sensors

IoT sensors installed at rack, server, and cooling system level across the data center.

02

Measure Everything

Real-time energy, water, thermal, and airflow data streamed continuously — not sampled or estimated.

03

Verify On-Chain

Measurement data anchored to blockchain (Cardano) for tamper-proof, immutable audit trails — creating trusted, objective records.

04

AI-Powered Analysis

Machine learning models analyze sensor data to identify inefficiencies, predict failures, and recommend optimizations.

05

Dynamic Optimization

Automated workload scheduling, cooling adjustments, and resource allocation based on real-time environmental data.

06

Verified Reporting

Standardized, auditable environmental reports generated from actual measurements — replacing self-reported estimates.

Mālama dMRV: Trusted Environmental Data

Digital Measurement, Reporting & Verification

The Mālama dMRV (digital Measurement, Reporting, and Verification) infrastructure — already proven in carbon credit verification for agriculture and land management — provides the technological foundation for trusted data center monitoring.

By combining IoT sensors for real-time environmental data collection, AI-powered analysis for pattern recognition and optimization, and blockchain verification (Cardano) for tamper-proof audit trails, the platform creates an environment where environmental claims are backed by verifiable, objective measurements — not self-reported estimates.

Extending this infrastructure into data centers means every energy reading, every water flow measurement, and every carbon calculation is recorded immutably on-chain. Operators, regulators, and the public can independently verify environmental impact claims, creating the accountability that the FAS, CMU, and industry researchers have called for.

Read the Full Mālama AICo2 Methodology
Real-Time
Sensor Data
On-Chain
Verification
Immutable
Audit Trail
AI-Driven
Optimization

The Impact: What Trusted Data Enables

Dynamic Workload Scheduling

Route AI inference to facilities with the lowest real-time carbon intensity, shifting compute to when and where clean energy is available.

Cooling Optimization

Real-time thermal data enables predictive cooling adjustments — reducing water consumption by matching cooling output to actual heat load, not worst-case estimates.

Hardware Utilization

Identify idle capacity and underutilized servers that consume energy without productive output. Google found idle machines are a significant hidden cost.

Regulatory Compliance

Automated, verifiable reporting that meets emerging EU AI Act requirements and anticipated US mandatory disclosure frameworks (FAS/CMU recommendations).

Carbon-Aware Computing

Schedule training runs and batch inference during periods of high renewable energy availability, verified by real-time grid data rather than annual averages.

Accountability & Trust

Replace the current 662% reporting gap with blockchain-verified measurements that investors, regulators, and the public can independently audit.
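Dynamic workload scheduling and carbon-aware computing both reduce to the same primitive: route a job to the time or place with the lowest live grid carbon intensity. A minimal sketch, using made-up intensity values rather than real Electricity Maps data:

```python
# Carbon-aware routing sketch: pick the facility with the lowest
# current grid carbon intensity (gCO2e/kWh). Values are illustrative.

def pick_greenest(intensities):
    """Return the facility name with the lowest real-time carbon intensity."""
    return min(intensities, key=intensities.get)

live_intensity = {"us-east": 410.0, "eu-north": 45.0, "us-west": 230.0}
target = pick_greenest(live_intensity)  # routes the job to "eu-north"
```

A production scheduler would also weigh latency, data residency, and capacity, but the environmental term of that optimization is exactly this lookup, fed by verified sensor and grid data instead of annual averages.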

Open Research

Contribute to Better Data

This dashboard is a living document. If you have access to better data, more rigorous measurement approaches, or can identify errors in our assumptions — we want to hear from you. The goal is to be progressively less wrong.

20+
Sources Cited
6
Model Assumptions
Open
Methodology
v2.0
Current Version

What are you contributing?

Submission Guidelines

  • Include DOI or URL for any referenced papers
  • Specify hardware and inference conditions if submitting energy data
  • Note confidence level and known limitations
  • Peer-reviewed sources are prioritized but not required
Review our full AICo2 Methodology (PDF)
New Data SourceAll fields with * are required

Submissions are reviewed by our research team. Accepted contributions will be credited in the data sources table. We prioritize peer-reviewed research but welcome all evidence-based input.

What Happens Next

Step 01

Submit

Send your data, correction, or methodology proposal via the form above.

Step 02

Review

Our research team evaluates submissions against existing sources and methodology.

Step 03

Integrate

Accepted contributions are incorporated into the dashboard with full attribution.

Step 04

Publish

Updated data and methodology changes are reflected in the next version release.