IDENTITY: DETERMINISTIC TOOL
AI INFRASTRUCTURE — WORKBENCH
BENCHMARK: LLAMA 3 70B BF16

AI GRAVITY & PLACEMENT ENGINE

CALCULATE TOKEN TCO ACROSS CLOUD AND ON-PREM INFRASTRUCTURE. DATA GRAVITY SCORING. PLACEMENT VERDICT. ARCHITECT’S TIP. LLAMA 3 70B BF16 — 145GB VRAM LOCKED.

🔒 No cookies. No tracking. Runs entirely in your local browser session. No data leaves your machine.

Where your data lives determines what your AI costs. Most infrastructure teams discover this after the invoice arrives — not before the architecture is committed.

The AI Gravity & Placement Engine calculates the Token TCO for running Llama 3 70B at BF16 precision across six infrastructure tiers: AWS, GCP, CoreWeave, Lambda, Nutanix AHV, and Cisco UCS. It doesn’t stop at hourly GPU rates. It calculates the Data Gravity Score — the friction cost of moving your dataset to each provider — and uses that score to generate a placement verdict: Stay Put, Hybrid Burst, or Full Repatriation.

The benchmark is locked at Llama 3 70B in BF16 precision. BF16 requires approximately 145GB of VRAM just for model weights — which forces a multi-GPU configuration on every provider and reveals which platforms have the high-speed interconnects (InfiniBand or NVLink equivalent) needed to bridge those GPUs without introducing latency penalties. INT4 quantization fits on a single 48GB GPU. BF16 tells you what the architecture actually costs at production fidelity.
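The 145GB figure above follows from simple arithmetic. A minimal sketch of that sizing math, assuming a flat fractional overhead for runtime context (the overhead value is an illustration, not part of the engine's published methodology):

```python
def model_vram_gb(params_b: float, bytes_per_param: float,
                  overhead: float = 0.0) -> float:
    """Approximate VRAM needed for model weights.

    params_b is the parameter count in billions; overhead is a
    fractional allowance for CUDA context and runtime buffers
    (an assumed knob, not a published constant).
    """
    return params_b * bytes_per_param * (1 + overhead)

# Llama 3 70B at BF16 (2 bytes/param): 140 GB of raw weights,
# landing near the 145 GB figure once runtime overhead is added.
bf16 = model_vram_gb(70, 2.0)   # 140.0 GB -> multi-GPU on every provider
int4 = model_vram_gb(70, 0.5)   # 35.0 GB -> fits a single 48 GB GPU
```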

The Gravity Score is the differentiator. Most AI infrastructure calculators compare compute rates and stop there. The Gravity Score measures egress cost as a fraction of compute cost — when that ratio exceeds 0.5, the data is too heavy to move economically and the placement decision is already made. When it falls below 0.1, the data is weightless and the cheapest compute wins. Everything between those thresholds is the architectural decision space this engine is designed to map.

The output is not a table. It is a verdict — provider, strategy, reasoning, and an Architect Tip that surfaces the Day 2 operational consideration that the cost comparison alone doesn’t show.

AI gravity & placement engine — Token TCO and data gravity scoring for Llama 3 70B BF16 across cloud and on-prem infrastructure
Where your data lives determines what your AI costs. The Gravity Score makes that friction visible.

Key Features

>_ Token TCO Benchmark
Llama 3 70B — BF16 Locked
All cost outputs are normalized to cost-per-1M-tokens using Llama 3 70B at BF16 precision — the most widely deployed open-weight model at production scale. BF16 is locked as the benchmark because it forces a multi-GPU configuration on every provider and reveals which platforms have the high-speed interconnects needed at scale. Methodology informed by NVIDIA inference optimization guidance and MLCommons inference benchmarks.
>_ Data Gravity Scoring
Egress Friction as a Cost Ratio
The engine calculates a Gravity Score (G) for each provider — egress cost divided by monthly compute cost. G > 0.5 triggers Stay Put or Full Repatriation. G < 0.1 indicates weightless data and defaults to the lowest-cost compute recommendation. The score makes data movement friction visible as a ratio, not an assumption.
>_ Six-Provider Normalized Comparison
AWS · GCP · CoreWeave · Lambda · Nutanix · Cisco
All six providers normalized to cost-per-GPU-hour at the 8-GPU BF16 configuration. On-prem providers use 36-month CapEx amortization plus the configurable OpEx Adder for a fully-loaded comparable rate. Cloud providers use April 2026 on-demand pricing. One unit of measure across all six tiers.
>_ OpEx Adder — Configurable
20% Default — Adjustable to 35%
On-prem TCO includes a configurable OpEx Adder covering power, cooling, rack space, and maintenance. Default 20% reflects a modern efficient data center. Slide to 25–35% for older Tier II facilities or environments with full staff allocation. Applies only to Nutanix and Cisco — cloud providers bake equivalent costs into their margins.
>_ Placement Verdict + Architect Tip
Stay Put · Hybrid Burst · Full Repatriation
The engine outputs a Strategic Path — provider recommendation, strategy label, reasoning against your specific inputs, and an Architect Tip that surfaces the Day 2 operational consideration the cost comparison alone doesn’t show. Not a table. A verdict.
>_ Sovereign Mode
Air-Gap & Regulated Environment Support
Toggle Sovereign Mode to exclude all public cloud providers from the recommendation set. When enabled, only Nutanix AHV and Cisco UCS are eligible for the verdict — reflecting environments where data sovereignty or regulatory frameworks make hyperscaler placement non-viable regardless of cost.
>_ Duty Cycle Sensitivity
Burst Training vs Steady-State Inference
Adjust the Duty Cycle slider to model burst training scenarios (20–40%) versus steady-state inference (100%). Below 70% utilization the fixed CapEx of on-prem begins to lose its cost advantage versus elastic cloud pricing — the engine identifies the crossover point dynamically as you slide.
>_ AI Infrastructure — Related Tool
Cloud Egress Cost Calculator
Model the exact egress fees your dataset incurs moving between cloud providers or to on-prem — the raw input that drives the Gravity Score in this engine.
[→] Egress Calculator

AI INFRASTRUCTURE PLACEMENT AUDIT

The engine surfaces the number. The triage session maps it against your dataset location, workload profile, compliance requirements, and 36-month CapEx window — and tells you whether to stay, burst, or repatriate. One conversation. Deterministic output.

REQUEST INFRASTRUCTURE TRIAGE

Frequently Asked Questions

Q: What benchmark does the AI Gravity & Placement Engine use?

A: The engine uses Llama 3 70B at BF16 precision as the standard benchmark unit. BF16 requires approximately 145GB of VRAM for model weights alone, forcing a multi-GPU configuration on every provider. This precision level was chosen because it reflects production-fidelity inference requirements — INT4 quantization fits on a single GPU and masks the interconnect and fabric costs that matter at scale. All Token TCO outputs are expressed as cost-per-1M-tokens at this benchmark configuration.

Q: What is the Data Gravity Score and how is it calculated?

A: The Gravity Score (G) measures the friction cost of moving your dataset to a given provider as a fraction of your monthly compute cost: G = (Dataset Size in GB × Egress Rate) ÷ Monthly Compute Cost. A score above 0.5 means egress costs exceed half your compute spend — at that point, moving the data is economically irrational and the engine defaults to Stay Put or Full Repatriation. Below 0.1, the data is effectively weightless and the cheapest compute wins. The score between those thresholds is the architectural decision space.
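The formula and thresholds above can be sketched directly. The decision bands come from the engine's stated rules; the dataset size, egress rate, and compute spend in the example are illustrative only:

```python
def gravity_score(dataset_gb: float, egress_rate_per_gb: float,
                  monthly_compute: float) -> float:
    """G = (dataset size in GB x egress rate) / monthly compute cost."""
    return (dataset_gb * egress_rate_per_gb) / monthly_compute

def placement_band(g: float) -> str:
    # Thresholds per the engine's decision rules: G > 0.5 and G < 0.1.
    if g > 0.5:
        return "Stay Put / Full Repatriation"
    if g < 0.1:
        return "weightless -- cheapest compute wins"
    return "architectural decision space"

# Hypothetical: 500 TB at $0.09/GB egress against $120k/month compute.
g = gravity_score(500_000, 0.09, 120_000)  # 0.375 -> decision space
```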

Q: How are on-prem providers (Nutanix, Cisco) priced in the comparison?

A: On-prem providers use a 36-month CapEx amortization model divided by 730 hours/month, with a configurable OpEx Adder (default 20%) applied on top. The OpEx Adder covers power, cooling, rack space, and maintenance overhead — costs that cloud providers bake into their hourly margins. Adjusting the OpEx Adder to 25–35% models older Tier II facilities or environments with significant staff allocation. This normalization produces a fully-loaded GPU-hour rate that is directly comparable to cloud on-demand pricing.
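That normalization is a two-step calculation: amortize, then load. A sketch of it, assuming a hypothetical node price (the $400k figure is for illustration, not reference node pricing):

```python
AMORT_MONTHS = 36
HOURS_PER_MONTH = 730

def onprem_gpu_hour(node_capex: float, gpus: int,
                    opex_adder: float = 0.20) -> float:
    """Fully-loaded on-prem GPU-hour rate.

    CapEx amortized over 36 months at 730 hours/month, divided across
    the GPUs in the node, with the configurable OpEx Adder applied on
    top -- directly comparable to cloud on-demand per-GPU-hour rates.
    """
    hourly_per_gpu = node_capex / (AMORT_MONTHS * HOURS_PER_MONTH * gpus)
    return hourly_per_gpu * (1 + opex_adder)

# Hypothetical $400k 8x H100 node at the default 20% adder:
rate = onprem_gpu_hour(400_000, 8)  # ~$2.28 per GPU-hour, fully loaded
```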

Q: What does Sovereign Mode do?

A: When Sovereign / Regulated Mode is enabled, AWS, GCP, CoreWeave, and Lambda are excluded from the placement recommendation entirely — regardless of their cost position. Only Nutanix AHV and Cisco UCS are eligible for the verdict. This reflects environments where data sovereignty requirements, air-gap mandates, or regulatory frameworks make public cloud placement non-viable. The cost comparison table still displays all six providers for reference, but the Architect’s Verdict is constrained to private infrastructure.

Q: What is the OpEx Adder and should I change the default?

A: The OpEx Adder accounts for the ongoing operational costs of on-prem infrastructure that don’t appear in the CapEx amortization — power and cooling (typically 10–12% of hardware CapEx annually), rack space and cabling (2–3%), and maintenance contracts (8–10%). The default 20% is a conservative baseline appropriate for modern data centers with efficient cooling. Increase it to 25–30% for older Tier II facilities, or 30–35% if including staff allocation for a dedicated GPU cluster administrator. The adder directly impacts the Nutanix and Cisco monthly TCO and can shift the placement verdict at higher settings.
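The component arithmetic above can be checked in a few lines. The defaults below take midpoints of the quoted ranges and are assumptions for illustration:

```python
def opex_adder(power_cooling: float = 0.11, rack_cabling: float = 0.025,
               maintenance: float = 0.09, staff: float = 0.0) -> float:
    """Sum annual OpEx components as a fraction of hardware CapEx.

    Defaults are midpoints of the ranges quoted in the FAQ; staff
    allocation defaults to zero. All values are illustrative.
    """
    return power_cooling + rack_cabling + maintenance + staff

base = opex_adder()              # 0.225 -- close to the 20% default
loaded = opex_adder(staff=0.08)  # ~0.305 -- dedicated GPU admin included
```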

Q: Why is CoreWeave more expensive per GPU-hour than AWS in the comparison?

A: CoreWeave’s rate reflects bare-metal HGX configuration with dedicated InfiniBand fabric and no multi-tenant GPU contention. AWS p5.48xlarge pricing includes shared infrastructure overhead and managed service components. At the benchmark configuration (8x H100, BF16 precision), CoreWeave’s dedicated fabric eliminates the inter-GPU latency penalty that multi-tenant environments can introduce — the premium reflects that architectural guarantee, not raw hardware cost. For workloads where GPU contention is a performance risk, the CoreWeave rate is the more accurate model of actual inference cost.

>_ Methodology & Data Notice

Provider rates reflect April 2026 market observations. AWS and GCP on-demand pricing sourced from published regional rate cards (US-East-1 / us-central1). CoreWeave and Lambda rates reflect published reserved cluster pricing. Nutanix and Cisco rates use 36-month CapEx amortization plus 20% OpEx Adder on reference H100 node configurations. All rates are subject to change. Verify current pricing directly with each provider before making infrastructure commitments. Lambda's egress rate reflects the absence of a published fee; bandwidth limits apply at production scale. Token TCO calculated against Llama 3 70B BF16 at 730 hours/month steady-state utilization unless Duty Cycle is adjusted.