Inference Cost Calculator v1.0

Recommendation

Cloud is cheapest

At 100,000 tasks/month for the "Support chatbot" task. Runner-up: subscription.

Monthly savings vs runner-up:  $864.00
Cloud / month:                 $36.00 ($0.0004 / task)
Local / month:                 $1,950.00 ($0.0195 / task at current volume)
Subscription / month:          $900.00 (incl. 40K overage tasks)
Break-even (cloud↔local):      5.4M tasks (local cheaper above this)
Effective $/1M · cloud:        $0.30 (at current volume)
Effective $/1M · local:        $16.25 ($1.08 at max throughput)
Effective $/1M · sub:          $7.50 (at current volume)
Utilization:                   6.7% (current ÷ max throughput)
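
The effective $/1M and utilization figures above can be reproduced with a short Python sketch. It assumes that "tokens per month" means tasks × (average input + average output tokens), which is an inference from the displayed numbers rather than something the calculator states. The function and constant names are illustrative.

```python
def effective_per_million(monthly_cost: float, tasks: int, tokens_per_task: int) -> float:
    """Monthly cost divided by the millions of tokens processed that month."""
    tokens_per_month = tasks * tokens_per_task
    return monthly_cost / (tokens_per_month / 1_000_000)

# Inputs taken from the export summary; names are illustrative.
TOKENS_PER_TASK = 800 + 400   # assumption: avg input + avg output tokens
TASKS = 100_000               # current monthly volume
MAX_TASKS = 1_500_000         # stated max throughput of the local server

cloud_eff = effective_per_million(36.00, TASKS, TOKENS_PER_TASK)
local_eff_now = effective_per_million(1950.00, TASKS, TOKENS_PER_TASK)
local_eff_max = effective_per_million(1950.00, MAX_TASKS, TOKENS_PER_TASK)
sub_eff = effective_per_million(900.00, TASKS, TOKENS_PER_TASK)
utilization = TASKS / MAX_TASKS

print(f"cloud ${cloud_eff:.2f} | local ${local_eff_now:.2f} now, "
      f"${local_eff_max:.2f} at max | sub ${sub_eff:.2f} | "
      f"utilization {utilization:.1%}")
```

Because the local cost is fixed, its effective rate falls as volume rises, which is why the figure at max throughput is so much lower than the figure at current volume.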

Break-even analysis

[Chart: Monthly cost vs. task volume. Series: Cloud (variable), Local (fixed), Subscription]
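
The data behind a cost-versus-volume chart like this one can be tabulated with a minimal Python sketch. The default values come from the export summary, the function names are hypothetical, and the local cost is treated as flat at every volume. That means the sketch ignores the 1,500,000 tasks/month throughput limit, which is an assumption about how the chart is drawn.

```python
def cloud_monthly(tasks, in_tok=800, out_tok=400,
                  in_price=0.15, out_price=0.60, overhead=0.0):
    """Variable cost: per-token prices are quoted per 1M tokens."""
    per_task = (in_tok * in_price + out_tok * out_price) / 1_000_000 + overhead
    return per_task * tasks

def local_monthly(gpu=1200.0, electricity=250.0, maintenance=500.0):
    """Fixed cost, independent of volume (throughput limit ignored here)."""
    return gpu + electricity + maintenance

def sub_monthly(tasks, fee=25.0, seats=20, cap=3000, overage=0.01):
    """Flat per-seat fee plus overage on tasks beyond the included quota."""
    return fee * seats + max(0, tasks - cap * seats) * overage

def cost_table(volumes):
    """One (volume, cloud, local, subscription) row per task volume."""
    return [(v, cloud_monthly(v), local_monthly(), sub_monthly(v)) for v in volumes]

for v, c, l, s in cost_table([10_000, 100_000, 1_000_000, 10_000_000]):
    print(f"{v:>12,} tasks  cloud ${c:>10,.2f}  local ${l:>10,.2f}  sub ${s:>12,.2f}")
```

The subscription line is flat up to the included quota (3,000 × 20 = 60,000 tasks) and then rises at the overage rate.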

Assumptions

Cloud cost model:         Variable per token + optional overhead
Cloud per task:           (800 × $0.15 + 400 × $0.60) ÷ 1M = $0.00036 (shown rounded as $0.0004)
Local cost model:         Fixed monthly: GPU + electricity + maintenance
Local monthly:            $1,200.00 + $250.00 + $500.00 = $1,950.00
Subscription cost model:  Per-seat flat fee + overage above included quota
Sub monthly:              $25.00 × 20 seats + overage tasks × $0.0100
Utilization:              6.7%

Formulas

cloud_per_task = (in_tok × in_price + out_tok × out_price) ÷ 1,000,000 + overhead

cloud_monthly = cloud_per_task × monthly_tasks

local_monthly = gpu + electricity + maintenance

sub_monthly = fee × seats + max(0, tasks − cap × seats) × overage

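break_even_cloud↔sub = (fee × seats) ÷ cloud_per_task
  (flat fee only; holds when the result is ≤ cap × seats, because overage applies beyond the included quota)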

break_even_cloud↔local = local_monthly ÷ cloud_per_task

effective_per_1M = monthly_cost ÷ (tokens_per_month ÷ 1,000,000)

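Above the break-even volume, cloud's per-token charges add up to more than the flat local cost, so local wins. Below it, cloud's pay-per-token model is cheaper than carrying fixed infrastructure. With the inputs shown here, the cloud↔local break-even (about 5.4M tasks/month) exceeds the local server's stated max throughput of 1.5M tasks/month, so a single server at this fixed cost cannot reach it.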
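
The per-task and break-even formulas above can be sketched in Python as follows. The cloud↔local function follows the listed formula directly. The cloud↔sub function is an extension, not the calculator's own logic: it returns the flat-fee result only when that volume stays within the included quota, otherwise it solves for a crossover with overage included, and it returns `None` when no crossover exists. All function names are hypothetical.

```python
from typing import Optional

def cloud_per_task(in_tok, in_price, out_tok, out_price, overhead=0.0):
    """Per-task cloud cost; prices are quoted per 1M tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000 + overhead

def break_even_cloud_local(local_monthly, per_task):
    """Volume at which cloud spend equals the fixed local cost."""
    return local_monthly / per_task

def break_even_cloud_sub(fee, seats, cap, overage, per_task) -> Optional[float]:
    """Volume at which cloud spend equals subscription spend, or None.

    Assumption beyond the listed formula: overage is accounted for when
    the flat-fee result falls outside the included quota.
    """
    flat, quota = fee * seats, cap * seats
    volume = flat / per_task          # the listed formula (flat fee only)
    if volume <= quota:
        return volume
    if per_task > overage:            # cloud grows faster, so the lines cross
        return (flat - quota * overage) / (per_task - overage)
    return None                       # cloud stays cheaper at every volume

p = cloud_per_task(800, 0.15, 400, 0.60)
local_be = break_even_cloud_local(1200.0 + 250.0 + 500.0, p)
sub_be = break_even_cloud_sub(25.0, 20, 3000, 0.01, p)
flat_only = 25.0 * 20 / p             # value the listed formula yields

print(f"cloud per task: ${p:.5f}")
print(f"cloud↔local break-even: {local_be:,.0f} tasks/mo")
print(f"cloud↔sub flat-fee formula: {flat_only:,.0f} tasks/mo; with overage: {sub_be}")
```

With these inputs the flat-fee formula gives about 1.39M tasks/month, which is far above the 60,000 included tasks. Because the $0.0100 overage rate exceeds the cloud per-task cost, the sketch reports no cloud↔sub crossover.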

Export summary

Plain-text snapshot of inputs and results.

INFERENCE COST ANALYSIS
=======================

Task:                Support chatbot
Monthly tasks:       100,000
Avg input tokens:    800
Avg output tokens:   400

Cloud model:         GPT-4o-mini
  Input  $/1M:       $0.15
  Output $/1M:       $0.60
  Request overhead:  $0.0000

Local model:         Llama 3 8B · A100
  GPU/server $/mo:   $1,200.00
  Electricity $/mo:  $250.00
  Maintenance $/mo:  $500.00
  Max throughput:    1,500,000 tasks/mo

Subscription:        ChatGPT Team
  Fee per seat $/mo: $25.00
  Seats:             20
  Included tasks:    3,000 / seat / mo
  Overage $/task:    $0.0100

RESULTS
-------
Cloud cost / task:        $0.0004
Local cost / task:        $0.0195
Sub cost / task:          $0.0090
Cloud monthly total:      $36.00
Local monthly total:      $1,950.00
Sub monthly total:        $900.00
Effective $/1M (cloud):   $0.30
Effective $/1M (local):   $16.25 (now) · $1.08 (at max)
Effective $/1M (sub):     $7.50
Break-even cloud↔local:   5,416,667 tasks/mo
Break-even cloud↔sub:     1,388,889 tasks/mo (flat fee only; overage applies above 60,000 tasks/mo)
Utilization:              6.7%


RECOMMENDATION: Cloud is cheapest
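
The recommendation, runner-up, and monthly savings can be derived by ranking the three monthly totals. The following is a minimal Python sketch with a hypothetical `recommend` helper. It assumes the recommendation is based on monthly cost alone, which matches the figures shown but is not stated by the calculator.

```python
def recommend(costs):
    """Return (cheapest option, runner-up, monthly savings vs runner-up)."""
    ranked = sorted(costs.items(), key=lambda kv: kv[1])
    (best, best_cost), (runner_up, runner_cost) = ranked[0], ranked[1]
    return best, runner_up, runner_cost - best_cost

# Monthly totals from the RESULTS block.
best, runner_up, savings = recommend({"cloud": 36.00, "local": 1950.00, "sub": 900.00})
print(f"{best.capitalize()} is cheapest; runner-up: {runner_up}; "
      f"saves ${savings:,.2f}/month")
# prints "Cloud is cheapest; runner-up: sub; saves $864.00/month"
```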