Inference Cost Calculator — Cloud vs Local LLM

Quick start · scenario presets

Click a preset to load a realistic task + model combo, then tweak below.

Task profile

Task name

Monthly tasks (volume)

How many requests/calls per month.

Avg input tokens

~4 chars = 1 token

Avg output tokens

Reply length

≈ 120M tokens / month

Cloud model (pay per token)

API · variable

Pick a model

Model name

Input $/1M tokens

Output $/1M tokens

Request overhead (optional)

Fixed surcharge per request, if any.

Local / open model

Self-hosted · fixed

Pick a model + hardware

Loads indicative GPU/power/maintenance costs. Adjust to match your real bill.

Model + hardware label

GPU / server cost ($/mo)

Electricity ($/mo)

Maintenance / engineering ($/mo)

Max monthly throughput (tasks)

Practical ceiling at planned duty cycle.

Subscription plan

Compare

Plan name

Fee per seat ($/mo)

Seats

Included tasks / seat / mo

0 = unlimited (no overage).

Overage $/task above cap

Recommendation

Cloud is cheapest

At 100,000 tasks/month for "Support chatbot". Runner-up: subscription.

Monthly savings vs runner-up

$864.00

Cloud / month

$36.00

$0.0004 / task

Local / month

$1,950.00

$0.0195 / task at current volume

Subscription / month

$900.00

incl. 40K overage tasks

Break-even (cloud↔local)

5.4M tasks

local cheaper above this

Effective $/1M · cloud

$0.30

at current volume

Effective $/1M · local

$16.25

$1.08 at max throughput

Effective $/1M · sub

$7.50

at current volume

Utilization

6.7%

current ÷ max throughput

Break-even analysis

Monthly cost vs. task volume

Cloud (variable) · Local (fixed) · Subscription

Assumptions

Cloud cost model: Variable per token + optional overhead
Cloud per task: (800 × $0.15 + 400 × $0.60) ÷ 1M = $0.00036
Local cost model: Fixed monthly: GPU + electricity + maintenance
Local monthly: $1,200.00 + $250.00 + $500.00
Subscription cost model: Per-seat flat fee + overage above included quota
Sub monthly: $25.00 × 20 seats + overage tasks × $0.0100
Utilization: 6.7%

Formulas

cloud_per_task = (in_tok × in_price + out_tok × out_price) ÷ 1,000,000 + overhead

cloud_monthly = cloud_per_task × monthly_tasks
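The two cloud formulas above can be sketched in Python as follows. This is a minimal illustration rather than the calculator's actual code; the function and parameter names are assumptions, and the inputs are the "Support chatbot" values shown on this page.

```python
def cloud_per_task(in_tok: float, out_tok: float,
                   in_price: float, out_price: float,
                   overhead: float = 0.0) -> float:
    """Dollar cost of one request; prices are quoted per 1M tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000 + overhead


def cloud_monthly(per_task: float, monthly_tasks: int) -> float:
    """Variable monthly bill: per-task cost times request volume."""
    return per_task * monthly_tasks


# Inputs from the example: 800 in / 400 out tokens, $0.15 / $0.60 per 1M.
per_task = cloud_per_task(800, 400, 0.15, 0.60)
monthly = cloud_monthly(per_task, 100_000)
print(f"${per_task:.5f} per task, ${monthly:.2f} per month")
```

The per-task figure is $0.00036 before rounding, which the result cards display as $0.0004.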

local_monthly = gpu + electricity + maintenance

sub_monthly = fee × seats + max(0, tasks − cap × seats) × overage
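A sketch of the subscription formula, assuming (as the form hint states) that an included-task value of 0 means unlimited usage with no overage. Read literally, the formula would charge overage on every task when the cap is 0, so the sketch handles that case explicitly. The names are illustrative, not taken from the tool.

```python
def sub_monthly(fee: float, seats: int, cap_per_seat: int,
                overage: float, tasks: int) -> float:
    """Per-seat flat fee plus overage on tasks above the pooled quota."""
    base = fee * seats
    if cap_per_seat == 0:
        # Assumption from the form hint: 0 = unlimited, so no overage applies.
        return base
    extra_tasks = max(0, tasks - cap_per_seat * seats)
    return base + extra_tasks * overage


# Example plan: $25 x 20 seats, 3,000 tasks per seat, $0.01 per extra task.
total = sub_monthly(25.00, 20, 3_000, 0.0100, 100_000)
print(f"${total:.2f} per month")
```

With 100,000 tasks against a pooled quota of 60,000, the 40,000 overage tasks add $400 to the $500 base fee.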

break_even_cloud↔sub = (fee × seats) ÷ cloud_per_task   (flat fee only; ignores overage)

break_even_cloud↔local = local_monthly ÷ cloud_per_task
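Both break-even formulas divide a fixed monthly cost by the cloud cost per task. The sketch below computes them for the example inputs and also checks whether the cloud↔local break-even is reachable within the local setup's max throughput. That reachability check is an addition for illustration, not something the formulas above include, and the names are assumptions.

```python
import math


def break_even_tasks(fixed_monthly: float, per_task: float) -> float:
    """Volume at which a fixed monthly cost equals the per-task cloud bill."""
    if per_task <= 0:
        return math.inf  # a free cloud model is never overtaken
    return fixed_monthly / per_task


cloud_per_task = (800 * 0.15 + 400 * 0.60) / 1_000_000
local_monthly = 1200.00 + 250.00 + 500.00
max_throughput = 1_500_000  # tasks/mo, from the local model settings

be_local = break_even_tasks(local_monthly, cloud_per_task)
be_sub = break_even_tasks(25.00 * 20, cloud_per_task)
reachable = be_local <= max_throughput

print(f"cloud/local break-even: {be_local:,.0f} tasks/mo (reachable: {reachable})")
print(f"cloud/sub break-even:   {be_sub:,.0f} tasks/mo")
```

For the example inputs the cloud↔local break-even lands well above the 1.5M-task ceiling, so this hardware cannot reach the volume at which it would beat the cloud price.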

effective_per_1M = monthly_cost ÷ (tokens_per_month ÷ 1,000,000)
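The effective price and the utilization figure can be sketched as below. This assumes tokens_per_month means tasks × (input + output tokens), which matches the ≈ 120M tokens shown for the example; the helper names are hypothetical.

```python
def effective_per_1m(monthly_cost: float, tasks: int,
                     in_tok: int, out_tok: int) -> float:
    """Monthly cost divided by millions of tokens processed that month."""
    tokens_per_month = tasks * (in_tok + out_tok)
    if tokens_per_month == 0:
        raise ValueError("no tokens processed; effective price is undefined")
    return monthly_cost / (tokens_per_month / 1_000_000)


def utilization(tasks: int, max_tasks: int) -> float:
    """Share of the local setup's max monthly throughput in use."""
    return tasks / max_tasks


local_now = effective_per_1m(1950.00, 100_000, 800, 400)
local_at_max = effective_per_1m(1950.00, 1_500_000, 800, 400)
print(f"local: ${local_now:.2f} now, ${local_at_max:.2f} at max throughput")
print(f"utilization: {utilization(100_000, 1_500_000):.1%}")
```

Because the local cost is fixed, its effective price falls as volume rises, which is why the page reports one figure for the current volume and another at max throughput.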

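Above the break-even volume, cloud's per-token charges add up to more than the flat local cost, so local wins. Below it, paying per token is cheaper than carrying fixed infrastructure. The comparison only holds while the volume stays within the local setup's max monthly throughput; with the example inputs, the 5.4M-task break-even exceeds the 1.5M-task ceiling.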
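To see how the recommendation shifts with volume, the three cost models can be ranked at several task counts. This is a rough sketch with the example inputs hard-coded. Treating local as unavailable above its max throughput is an assumption of this sketch, and the function names are illustrative.

```python
def monthly_costs(tasks: int, max_local_tasks: int = 1_500_000) -> dict:
    """Monthly cost of each option for the example inputs on this page."""
    cloud = tasks * (800 * 0.15 + 400 * 0.60) / 1_000_000
    # Assumption: local cannot serve volumes above its throughput ceiling.
    local = 1950.00 if tasks <= max_local_tasks else float("inf")
    sub = 25.00 * 20 + max(0, tasks - 3_000 * 20) * 0.0100
    return {"cloud": cloud, "local": local, "subscription": sub}


def recommend(tasks: int, max_local_tasks: int = 1_500_000):
    """Return (cheapest option, runner-up, savings versus the runner-up)."""
    costs = monthly_costs(tasks, max_local_tasks)
    ranked = sorted(costs, key=costs.get)
    best, runner_up = ranked[0], ranked[1]
    return best, runner_up, costs[runner_up] - costs[best]


for volume in (100_000, 1_000_000, 6_000_000):
    best, runner_up, savings = recommend(volume)
    print(f"{volume:>9,} tasks: {best} (runner-up {runner_up}, saves ${savings:,.2f})")
```

Raising the hypothetical ceiling, for example recommend(6_000_000, max_local_tasks=10_000_000), lets local win once the volume passes the break-even.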

Export summary

Plain-text snapshot of inputs and results.

INFERENCE COST ANALYSIS
=======================

Task:                Support chatbot
Monthly tasks:       100,000
Avg input tokens:    800
Avg output tokens:   400

Cloud model:         GPT-4o-mini
  Input  $/1M:       $0.15
  Output $/1M:       $0.60
  Request overhead:  $0.0000

Local model:         Llama 3 8B · A100
  GPU/server $/mo:   $1,200.00
  Electricity $/mo:  $250.00
  Maintenance $/mo:  $500.00
  Max throughput:    1,500,000 tasks/mo

Subscription:        ChatGPT Team
  Fee per seat $/mo: $25.00
  Seats:             20
  Included tasks:    3,000 / seat / mo
  Overage $/task:    $0.0100

RESULTS
-------
Cloud cost / task:        $0.0004
Local cost / task:        $0.0195
Sub cost / task:          $0.0090
Cloud monthly total:      $36.00
Local monthly total:      $1,950.00
Sub monthly total:        $900.00
Effective $/1M (cloud):   $0.30
Effective $/1M (local):   $16.25 (now) · $1.08 (at max)
Effective $/1M (sub):     $7.50
Break-even cloud↔local:   5,416,667 tasks/mo
Break-even cloud↔sub:     1,388,889 tasks/mo
Utilization:              6.7%


RECOMMENDATION: Cloud is cheapest