
Cloud vs Local: Complete Cost & Provider Comparison

The landscape has shifted. It's no longer "cloud vs local" -- it's "cloud AND local, each for what they do best."


Cloud GPU Providers

Budget Tier (Cheapest)

| Provider | GPU | $/hour | Notes |
| --- | --- | --- | --- |
| Vast.ai | RTX 3090 | $0.11 | Peer-to-peer marketplace. Cheapest, but variable quality. |
| Vast.ai | A100 80GB | ~$0.50 | Can be unreliable. |
| Vast.ai | H100 80GB | $1.49 | Lowest H100 price anywhere. |
| TensorDock | RTX 4090 | $0.37 | Good for inference on smaller models. |
| TensorDock | A100 80GB | $1.42 | Solid budget option. |
| Salad | RTX 3090 | $0.11 | Distributed consumer GPUs. |
| Fluence | A100 80GB | $0.80 | Decentralized. Zero egress fees. |

Mid-Range AI Providers

| Provider | GPU | $/hour | Notes |
| --- | --- | --- | --- |
| RunPod | A100 80GB | $1.74 | Docker-first. Great templates. |
| RunPod | H100 80GB | $3.19 | Secure cloud option. |
| Lambda Labs | A100 80GB | $1.10 | No egress fees. Often sold out. |
| Thunder Compute | A100 | $0.78 | Virtualized GPUs. |

Enterprise Cloud

| Provider | Instance | $/hour | Notes |
| --- | --- | --- | --- |
| AWS | EC2 P4d (8x A100) | $21.95-40.97 | Recent 33% price cut. |
| AWS | EC2 P5 (8x H100) | ~$55.04 | 45% price reduction in June 2025. |
| Azure | H100 | $6.98 | Most expensive H100. |
| GCP | Various | Premium | Good ML ecosystem. |

Dedicated Monthly Servers

| Provider | GPU | $/month | Notes |
| --- | --- | --- | --- |
| Hetzner GEX44 | RTX 4000 Ada | ~$184 | Cheapest dedicated GPU server. EU-based. |
| Hetzner GEX131 | RTX PRO 6000 | ~$940 | Training-grade. Flexible hourly billing. |
| Contabo | L40S | $790 | 90% cheaper than the AWS equivalent. |

GPU Pricing Summary

| GPU | VRAM | Cheapest | Mid-Range | Enterprise |
| --- | --- | --- | --- | --- |
| RTX 3090 | 24GB | $0.11/hr | $0.20/hr | N/A |
| RTX 4090 | 24GB | $0.18/hr | $0.37-0.44/hr | N/A |
| A100 80GB | 80GB | $0.44/hr | $1.10-1.74/hr | $2.74-5.04/hr |
| H100 80GB | 80GB | $1.49/hr | $2.25-3.19/hr | $5.95-6.98/hr |

Inference API Providers

No GPU management required. Call open-source models via API.

| Provider | Speed | Cost/1M tokens | Free Tier | Best For |
| --- | --- | --- | --- | --- |
| Groq | 300+ tok/s | $0.59 | Yes (generous) | Real-time, coding |
| Cerebras | 969 tok/s (record) | $0.10 (8B) | Yes | Bulk processing |
| DeepInfra | Fast | $0.08 | Yes | Budget inference |
| Together.ai | 86 tok/s | $0.88 (70B) | $25 credit | Fine-tuning + inference |
| Fireworks AI | 68 tok/s | $0.18-7.00 | Yes | Multimodal |
| SambaNova | Fast | Competitive | Yes | Enterprise |
| Replicate | Varies | Pay-per-run | Yes | Serverless |
| Novita AI | Good | $0.10 | Yes | 200+ models |
| OpenRouter | Varies | Pass-through | No | Unified access to all |

Context: GPT-4o-equivalent performance now costs ~$0.40 per million tokens, down from ~$20 in late 2022 -- a 50x price drop in three years.

When to Use APIs vs Self-Hosted

  • < 8,000 conversations/day: API is almost always cheaper
  • > 30M tokens/day: Self-hosted reaches cost parity within 1-4 months
  • For coding agents: API calls to frontier models (Claude, GPT-4o) are generally better quality than any self-hosted model
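A rough sketch of where that crossover comes from: a rented GPU is a flat monthly bill, while an API bills per token. The rates below come from the tables above (Groq's $0.59/1M, RunPod's $1.74/hr A100); the 730-hour month and 24/7 uptime are my assumptions.

```python
# Illustrative break-even: flat GPU rental vs pay-per-token API billing.
# Rates taken from the tables above; everything else is an assumption.
API_PRICE_PER_M = 0.59    # $/1M tokens (budget inference API rate)
GPU_HOURLY = 1.74         # $/hr (cloud A100 80GB rate)
HOURS_PER_MONTH = 730     # assumed 24/7 uptime

def monthly_api_cost(tokens_per_day: float) -> float:
    """Cost of serving tokens_per_day for 30 days via a per-token API."""
    return tokens_per_day * 30 / 1_000_000 * API_PRICE_PER_M

monthly_gpu_cost = GPU_HOURLY * HOURS_PER_MONTH          # flat, ~$1,270/mo
cost_per_daily_token = 30 / 1_000_000 * API_PRICE_PER_M  # $/month per (token/day)
break_even_tokens_per_day = monthly_gpu_cost / cost_per_daily_token

print(f"{break_even_tokens_per_day / 1e6:.0f}M tokens/day")  # ~72M at these rates
```

At these assumed rates a single rented A100 only wins above roughly 72M tokens/day; cheaper hourly rates, or owned hardware amortized over its lifetime, pull that threshold down toward the 30M tokens/day figure above.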

Mac-as-a-Service

You can rent Mac hardware in the cloud.

| Provider | Hardware | Price | Best For |
| --- | --- | --- | --- |
| MacStadium | Mac Mini M4 | $109-199/mo | CI/CD, enterprise, long-term |
| MacStadium | Mac Studio (Ultra) | ~$599/mo | Heavy workloads |
| MacinCloud | Mac servers | $1/hr or $25/mo | Occasional use |
| JUUZ | Mac Mini M2, Studio | $49/mo (EU) | Budget Mac cloud |
| AWS EC2 Mac | Mac instances | ~$25/day | AWS ecosystem |
| Rent-a-Mac.io | Mac Minis | Varies | Dedicated |

Buy vs Rent Math

Mac Studio Ultra at $599/mo via MacStadium = $7,188/year. Buying one costs $5,600 one-time. Buying is cheaper after ~9 months.
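The arithmetic behind that break-even, as a tiny sketch (numbers from the table above):

```python
# Buy-vs-rent break-even for a Mac Studio Ultra.
rent_per_month = 599     # MacStadium monthly rate from the table above
purchase_price = 5600    # one-time purchase price

months_to_break_even = purchase_price / rent_per_month
print(f"Buying wins after {months_to_break_even:.1f} months")  # ~9.3 months
```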


Cost Comparisons

Scenario A: Running a 70B Model Daily (8 hrs/day)

| Option | Monthly Cost | Annual Cost |
| --- | --- | --- |
| Mac Studio M4 Max 128GB | $0 (after $3,500 purchase) | ~$292/yr amortized |
| RunPod A100 80GB | $417 | $5,004 |
| Vast.ai A100 | $120 | $1,440 |
| TensorDock A100 | $340 | $4,080 |
| Lambda Labs A100 | $264 | $3,168 |
| Hetzner GEX44 dedicated | $184 | $2,208 |
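The cloud rows in these scenarios all follow one formula: hourly rate x hours/day x 30 days. A minimal sketch, using rates from the provider tables above:

```python
def monthly_cost(rate_per_hour: float, hours_per_day: float, days: int = 30) -> float:
    """Monthly cloud GPU bill for a fixed daily usage pattern."""
    return rate_per_hour * hours_per_day * days

# Scenario A (8 hrs/day):
print(monthly_cost(1.74, 8))  # RunPod A100:  ~417.6
print(monthly_cost(0.50, 8))  # Vast.ai A100: ~120.0
print(monthly_cost(1.10, 8))  # Lambda A100:  ~264.0
```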

Scenario B: Light Use (2 hrs/day)

| Option | Monthly Cost | Annual Cost |
| --- | --- | --- |
| Mac Studio M4 Max 128GB | $0 (after $3,500 purchase) | ~$292/yr amortized |
| RunPod A100 | $104 | $1,248 |
| Vast.ai A100 | $30 | $360 |

Break-Even Points

  • Mac Mini M4 Pro 64GB ($2,000): Breaks even in 6-12 months vs $100+/mo cloud APIs
  • Mac Studio M4 Max 128GB ($3,500): Breaks even in 8-13 months vs cloud A100 at 8 hrs/day
  • Mac Studio M3 Ultra 512GB ($9,500): Breaks even in ~12-18 months vs cloud GPU at $8/hr
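Those windows come from dividing the purchase price by the monthly cloud bill it replaces. A sketch; the specific monthly bills ($200, $417, $600) are my assumptions, picked from within the ranges quoted above:

```python
def break_even_months(purchase_price: float, cloud_bill_per_month: float) -> float:
    """Months until a one-time purchase beats a recurring cloud bill."""
    return purchase_price / cloud_bill_per_month

print(break_even_months(2000, 200))  # Mac Mini M4 Pro vs $200/mo APIs     -> 10.0
print(break_even_months(3500, 417))  # Mac Studio vs A100 at 8 hrs/day     -> ~8.4
print(break_even_months(9500, 600))  # Studio Ultra vs $8/hr, ~2.5 hrs/day -> ~15.8
```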

Enterprise Scale (Lenovo H100 Study)

| Metric | On-Premises | AWS On-Demand | AWS 3-Year Reserved |
| --- | --- | --- | --- |
| 5-Year TCO (24/7) | $871,912 | $4,306,416 | $2,362,812 |
| On-prem savings | -- | $3.4M | $1.5M |

On-prem breaks even at just 5 hours/day usage vs on-demand cloud.
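The savings figures are just the TCO deltas; reproducing them from the study numbers above:

```python
# 5-year TCO figures from the Lenovo H100 study table above.
on_prem = 871_912
aws_on_demand = 4_306_416
aws_reserved = 2_362_812

print(f"vs on-demand: ${(aws_on_demand - on_prem) / 1e6:.1f}M saved")  # $3.4M
print(f"vs reserved:  ${(aws_reserved - on_prem) / 1e6:.1f}M saved")   # $1.5M
```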


When to Use Cloud vs Local

Cloud Makes Sense When

  • You need frontier models (Claude Opus, GPT-4o, Gemini Ultra) -- they're cloud-only
  • Training large models (NVIDIA H100/B200 clusters)
  • Burst workloads -- need 100 GPUs for 3 days, then nothing
  • You lack IT staff to manage hardware
  • Rapid prototyping -- try before committing
  • Global serving -- need low latency worldwide

Local Makes Sense When

  • Running inference on known models 24/7
  • Data privacy is critical (healthcare, finance, legal, government)
  • Predictable, steady workloads (math favors local)
  • Low latency required (no network round-trip)
  • Want to avoid vendor lock-in and API deprecations
  • Total cost of ownership over 2+ years matters

The Pragmatic Hybrid (What Most Teams Do)

  1. Local model (Ollama + Qwen3-Coder or GLM-4.7) for autocomplete, quick questions, private data, iteration
  2. Cloud model (Claude Opus/Sonnet, GPT-4o) for complex multi-file coding, architecture, heavy reasoning
  3. Inference API (Groq free tier, DeepInfra) for cheap bulk queries

The Repatriation Trend

Major shift happening in 2025-2026:

  • 86% of CIOs planned to repatriate some cloud workloads (Barclays CIO Survey)
  • 83% of enterprise CIOs plan to bring at least some workloads on-premise
  • Only 8-9% plan full repatriation -- most go hybrid

Top reasons:

  1. Cost -- cloud GPU costs are staggering at scale
  2. Security/Privacy -- sensitive data leaving the organization
  3. Latency -- network round-trips add 50-200ms
  4. Control -- vendor lock-in, unpredictable pricing
  5. Regulatory compliance -- data residency requirements

The Mac Mini buying frenzy (triggered partly by OpenClaw/Clawdbot going viral) is the consumer/SMB version of this enterprise trend.


Recommendations by Use Case

| Need | Best Path | Monthly Cost |
| --- | --- | --- |
| Best coding quality | Claude Code (Anthropic API) | $20-200 |
| Cheapest queries | Inference APIs (Groq free tier) | $0-20 |
| Run your own models | Mac Mini M4 Pro 64GB + Ollama | $0 after $2K purchase |
| Cloud GPU flexibility | RunPod or TensorDock A100 | $85-417 |
| Privacy/compliance | Self-hosted on Hetzner | $184 |
| Enterprise at scale | On-prem H100 cluster | High upfront; saves millions over 5 yr |