Cloud is cheapest
At 100,000 tasks/month for "Support chatbot". Runner-up: subscription.
Break-even analysis
Monthly cost vs. task volume
Assumptions
- Cloud cost model: variable per token + optional overhead
- Cloud per task: (800 × $0.15 + 400 × $0.60) ÷ 1M
- Local cost model: fixed monthly (GPU + electricity + maintenance)
- Local monthly: $1,200.00 + $250.00 + $500.00
- Subscription cost model: per-seat flat fee + overage above included quota
- Sub monthly: $25.00 × 20 seats + overage × $0.0100
- Utilization: 6.7%
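The arithmetic behind these assumptions can be sketched in a few lines of Python. This is only an illustration, not the calculator's own code: the function names are hypothetical, the included quota (3,000 tasks per seat) and max throughput (1,500,000 tasks/mo) come from the export summary, and utilization is assumed to be monthly tasks divided by max throughput, which matches the 6.7% shown.

```python
def cloud_per_task(in_tok, out_tok, in_price, out_price, overhead=0.0):
    """Dollar cost of one task; prices are dollars per 1M tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000 + overhead


def local_monthly(gpu, electricity, maintenance):
    """Fixed monthly cost of running the local server."""
    return gpu + electricity + maintenance


def sub_monthly(fee, seats, tasks, cap, overage):
    """Flat per-seat fee plus a per-task charge above the pooled quota."""
    overage_tasks = max(0, tasks - cap * seats)
    return fee * seats + overage_tasks * overage


tasks = 100_000
per_task = cloud_per_task(800, 400, 0.15, 0.60)
cloud_total = per_task * tasks
local_total = local_monthly(1200.00, 250.00, 500.00)
sub_total = sub_monthly(25.00, 20, tasks, 3000, 0.01)

# Assumption: utilization = monthly tasks / max throughput of the local server.
utilization = tasks / 1_500_000

print(f"cloud: ${cloud_total:,.2f}  local: ${local_total:,.2f}  sub: ${sub_total:,.2f}")
print(f"utilization: {utilization:.1%}")
```

With 20 seats the pooled quota is 60,000 tasks, so 40,000 of the 100,000 monthly tasks are billed as overage in the subscription figure.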
Formulas
cloud_per_task = (in_tok × in_price + out_tok × out_price) ÷ 1,000,000 + overhead
cloud_monthly = cloud_per_task × monthly_tasks
local_monthly = gpu + electricity + maintenance
sub_monthly = fee × seats + max(0, tasks − cap × seats) × overage
break_even_cloud↔sub = (fee × seats) ÷ cloud_per_task
break_even_cloud↔local = local_monthly ÷ cloud_per_task
effective_per_1M = monthly_cost ÷ (tokens_per_month ÷ 1,000,000)
Above the break-even volume, cloud's per-token charges add up to more than the flat local cost, so local wins. Below it, cloud's pay-per-token model is cheaper than carrying fixed infrastructure.
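A minimal sketch of the break-even and effective-rate formulas, using the inputs from this page. The names are hypothetical, and `tokens_per_month` is assumed to be monthly tasks × (input + output tokens), which reproduces the effective rates in the export summary. The sketch also compares the cloud↔local break-even against the stated max throughput of 1,500,000 tasks/mo.

```python
CLOUD_PER_TASK = (800 * 0.15 + 400 * 0.60) / 1_000_000  # dollars per task
LOCAL_MONTHLY = 1200.00 + 250.00 + 500.00               # dollars per month
SUB_BASE = 25.00 * 20                                   # fee × seats
TOKENS_PER_TASK = 800 + 400                             # assumed: input + output
MAX_THROUGHPUT = 1_500_000                              # tasks per month

# Volume at which cloud's monthly bill equals each fixed cost.
break_even_local = LOCAL_MONTHLY / CLOUD_PER_TASK
break_even_sub = SUB_BASE / CLOUD_PER_TASK


def effective_per_1m(monthly_cost, tasks):
    """Monthly cost expressed as dollars per 1M tokens processed."""
    tokens_per_month = tasks * TOKENS_PER_TASK
    return monthly_cost / (tokens_per_month / 1_000_000)


def cheaper_of_cloud_and_local(tasks):
    """Return which of the two options has the lower monthly cost."""
    return "cloud" if CLOUD_PER_TASK * tasks < LOCAL_MONTHLY else "local"


print(f"break-even cloud/local: {break_even_local:,.0f} tasks/mo")
print(f"break-even cloud/sub:   {break_even_sub:,.0f} tasks/mo")
for volume in (100_000, 6_000_000):
    print(f"{volume:>9,} tasks/mo -> {cheaper_of_cloud_and_local(volume)}")
print(f"within one server's throughput: {break_even_local <= MAX_THROUGHPUT}")
```

Under these inputs the cloud↔local break-even lies above the single-server max throughput, so the sketch only flags that case. It does not model adding servers.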
Export summary
Plain-text snapshot of inputs and results.
INFERENCE COST ANALYSIS
=======================
Task: Support chatbot
Monthly tasks: 100,000
Avg input tokens: 800
Avg output tokens: 400

Cloud model: GPT-4o-mini
Input $/1M: $0.15
Output $/1M: $0.60
Request overhead: $0.0000

Local model: Llama 3 8B · A100
GPU/server $/mo: $1,200.00
Electricity $/mo: $250.00
Maintenance $/mo: $500.00
Max throughput: 1,500,000 tasks/mo

Subscription: ChatGPT Team
Fee per seat $/mo: $25.00
Seats: 20
Included tasks: 3,000 / seat / mo
Overage $/task: $0.0100

RESULTS
-------
Cloud cost / task: $0.0004
Local cost / task: $0.0195
Sub cost / task: $0.0090
Cloud monthly total: $36.00
Local monthly total: $1,950.00
Sub monthly total: $900.00
Effective $/1M (cloud): $0.30
Effective $/1M (local): $16.25 (now) · $1.08 (at max)
Effective $/1M (sub): $7.50
Break-even volume: 5,416,667 tasks/mo
Break-even cloud↔sub: 1,388,889 tasks/mo
Utilization: 6.7%

RECOMMENDATION: Cloud is cheaper
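The recommendation and runner-up follow from ranking the three monthly totals. A small sketch, assuming the ranking is simply by lowest monthly cost (the variable names are hypothetical):

```python
# Monthly totals taken from the RESULTS section of the export.
monthly_totals = {"cloud": 36.00, "local": 1950.00, "sub": 900.00}

# Sort option names by their monthly cost, cheapest first.
ranked = sorted(monthly_totals, key=monthly_totals.get)
best, runner_up = ranked[0], ranked[1]

print(f"{best.capitalize()} is cheapest. Runner-up: {runner_up}.")
```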