Hardware Tiers

From data center racks to pocket-sized edge chips

Three tiers of AI hardware, covering every deployment scenario.

01 — Data Center

Data center inference

Run frontier models on your own infrastructure. Multi-GPU and multi-chip systems for large-scale inference, from single accelerator cards to liquid-cooled multi-chip platforms. The hardware that powers production AI.

Up to 405B-parameter models
Multi-chip scaling and interconnect
PFLOPS-class performance
Available from: NVIDIA, AMD, Tenstorrent, Q.ANT
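Why a 405B-parameter model implies multi-chip systems can be shown with back-of-envelope arithmetic. This is a sketch under stated assumptions: weight memory is parameters × bytes per parameter, and the `overhead` factor is an assumed 20% allowance for KV cache and runtime buffers, not a measured value. Device capacities are the published memory sizes of the 80 GB H100, the H200 (141 GB) and the MI300X (192 GB).

```python
import math

# Bytes per parameter for common inference precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def min_devices(params_billion: float, precision: str,
                device_mem_gb: float, overhead: float = 1.2) -> int:
    """Fewest identical accelerators whose combined memory holds the weights.

    `overhead` is an ASSUMED 20% allowance for KV cache, activations and
    runtime buffers; real requirements depend on batch size and context length.
    """
    # 1 billion parameters at 1 byte each is roughly 1 GB.
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    return math.ceil(weights_gb * overhead / device_mem_gb)

for name, mem_gb in [("H100 (80 GB)", 80), ("H200 (141 GB)", 141), ("MI300X (192 GB)", 192)]:
    print(name,
          "fp16:", min_devices(405, "fp16", mem_gb),
          "fp8:", min_devices(405, "fp8", mem_gb))
```

Under these assumptions, 405B at FP16 needs 13 H100s, 7 H200s or 6 MI300X cards, and at FP8 it needs 7, 4 or 3. Tensor-parallel deployments commonly use 2, 4 or 8 devices, so real configurations often round up further.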

NVIDIA: The industry standard

CUDA-based GPUs with the broadest software ecosystem. The default choice for most AI workloads — and the benchmark everything else is measured against.

RTX 6000 Blackwell DC, DGX Spark, H100 / H200

AMD: The GPU alternative

Instinct accelerators with growing ROCm software support. A competitive option for inference workloads, often at lower cost than NVIDIA equivalents.

Instinct MI300X, Instinct MI325X

Tenstorrent: The open-source RISC-V path

RISC-V-based AI accelerators with a fully open-source software stack. Tenstorrent cites up to 5x lower TCO than NVIDIA for inference workloads. Configurations range from single PCIe cards to liquid-cooled multi-chip systems.

Blackhole p150a (PCIe card), TT-QuietBox (4-chip, liquid-cooled), Wormhole n300

Q.ANT: Photonic computing from Germany

Next-generation photonic AI processors that compute with light instead of electrons. Q.ANT claims up to 30x better energy efficiency than conventional silicon. Made in Germany, funded by the German Federal Ministry of Education and Research (BMBF).

NPS Server (Photonic NPU)
02 — Workstation

Desktop AI & local inference

AI on your desk. Single-card accelerators, desktop AI supercomputers, and unified-memory workstations that let you run large models without a server room. The answer to "can I run this model locally?"

Up to 192 GB of unified or GPU memory
70B+ models on a single system
Desktop form factor
Available from: NVIDIA, Tenstorrent, Apple

NVIDIA: Desktop AI supercomputer

The DGX Spark brings Grace Blackwell to a desktop form factor with 128 GB of unified memory, enough for many production-scale models (NVIDIA cites up to 200B parameters) without a server room.

DGX Spark, RTX 6000 workstation cards

Tenstorrent: Affordable single-card AI

A single Blackhole PCIe card delivers 664 TFLOPS (FP8) for about €1,300. Plug it into any workstation and start running models. The QuietBox packs four chips into a quiet, liquid-cooled desktop.

Blackhole p150a (PCIe card, ~€1,300), TT-QuietBox (4-chip desktop)

Apple: Unified memory for large models

Mac Studio with an M-series Ultra chip offers up to 192 GB of unified memory, enough to run 70B models at FP16 or 120B+ models at INT4. Among Apple machines, it offers the best price per GB of memory for local inference.

Mac Studio (M-series Ultra)
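The 70B and 120B figures follow from weight-size arithmetic. A minimal sketch, counting dense-model weights only and holding back an assumed 32 GB for the OS, KV cache and runtime (`reserve_gb` is an illustrative value, not an Apple specification):

```python
# Approximate weight memory for a dense model: parameters x bytes per parameter.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weights_gb(params_billion: float, precision: str) -> float:
    """1 billion parameters at 1 byte each is roughly 1 GB."""
    return params_billion * BYTES_PER_PARAM[precision]

def fits(params_billion: float, precision: str, memory_gb: float,
         reserve_gb: float = 32.0) -> bool:
    """True if the weights fit while leaving `reserve_gb` free.

    The 32 GB reserve is an ASSUMPTION covering OS, KV cache and runtime.
    """
    return weights_gb(params_billion, precision) + reserve_gb <= memory_gb

print(weights_gb(70, "fp16"), fits(70, "fp16", 192))    # 140.0 True
print(weights_gb(120, "int4"), fits(120, "int4", 192))  # 60.0 True
print(weights_gb(120, "fp16"), fits(120, "fp16", 192))  # 240.0 False
```

By this estimate, INT4 weights for models well beyond 120B would also fit in 192 GB, which is why the figure reads 120B+. Long contexts and large batches grow the KV cache and shrink that headroom.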
03 — Edge AI

On-device inference without the cloud

AI at the edge — offline, sovereign, power-efficient. Run vision models and small LLMs on devices that draw less power than a phone charger. Perfect for manufacturing floors, retail, vehicles, and anywhere cloud connectivity isn't guaranteed.

As low as 1-5 W power consumption
Up to 214 TOPS of edge performance
LLM and vision inference on-device
Available from: Hailo, Axelera, DEEPX, NVIDIA

Hailo: The edge AI leader

Dataflow architecture designed for maximum power efficiency. The Hailo-10 brings LLM inference to edge devices at just 2.5W. Deep Raspberry Pi integration makes prototyping fast.

Hailo-10 (40 TOPS, LLM-capable), Hailo-8 (26 TOPS, vision)

Axelera: European RISC-V edge AI

Dutch-designed edge processors using Digital In-Memory Computing on RISC-V. The Metis delivers 214 TOPS at extreme power efficiency. EU-funded, European supply chain.

Metis AIPU (214 TOPS), Europa AIPU (629 TOPS, coming soon)

DEEPX: Ultra-low-power embedded AI

South Korean edge AI chips optimized for the absolute lowest power envelope. The DX-M1 delivers 25 TOPS at just 1-5W in a tiny M.2 form factor.

DX-M1 (25 TOPS, 1-5W, M.2)

NVIDIA: CUDA at the edge

Jetson brings the CUDA ecosystem to edge devices. Familiar tools and frameworks for developers already in the NVIDIA ecosystem, with GPU-accelerated inference on-device.

Jetson Orin Nano (40 TOPS, 8 GB)
What You Get

Real numbers, not marketing claims

Every benchmark report includes the metrics that matter for production deployment.

Throughput

Tokens/second for LLMs, frames/second for vision. Under real-world concurrent load, not synthetic peaks.
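Throughput under concurrent load is an aggregate figure, distinct from the speed any single user sees. A minimal sketch, assuming a hypothetical per-request log (`RequestRecord` is illustrative, not part of any benchmark tool) that stores start time, end time and output token count:

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    # HYPOTHETICAL load-test log entry; times are seconds since test start.
    start_s: float
    end_s: float
    output_tokens: int

def aggregate_throughput(records: list[RequestRecord]) -> float:
    """System tokens/s: all output tokens divided by the wall-clock span
    from the first request's start to the last request's finish."""
    total_tokens = sum(r.output_tokens for r in records)
    wall_clock_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    return total_tokens / wall_clock_s

def per_request_rate(r: RequestRecord) -> float:
    """Tokens/s seen by one client over its own request duration."""
    return r.output_tokens / (r.end_s - r.start_s)

# Three overlapping requests (illustrative values, not measurements).
records = [RequestRecord(0.0, 4.0, 200),
           RequestRecord(0.5, 5.0, 180),
           RequestRecord(1.0, 5.0, 220)]
print(aggregate_throughput(records))              # 120.0
print([per_request_rate(r) for r in records])     # [50.0, 40.0, 55.0]
```

Each stream here runs at 40-55 tokens/s, yet the system delivers 120 tokens/s in aggregate. A useful report gives both numbers, because raising concurrency usually lifts the first and lowers the second.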

Latency

Time to first token, P50/P95/P99 latency. The numbers that determine whether your users wait or don't.
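Percentile latencies come from the full distribution of per-request samples rather than an average. A sketch using the nearest-rank definition, which is one of several percentile conventions (NumPy's `percentile` defaults to linear interpolation, so results can differ slightly). The sample latencies are made up for illustration:

```python
import math
import statistics

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def time_to_first_token_ms(request_sent_s: float, first_token_s: float) -> float:
    """TTFT: delay between sending a request and receiving its first token."""
    return (first_token_s - request_sent_s) * 1000.0

# Illustrative end-to-end latencies in milliseconds (not measured data).
latencies_ms = [120, 95, 110, 400, 105, 130, 98, 102, 115, 900]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
print(f"mean: {statistics.mean(latencies_ms)} ms")
```

Here the median is 110 ms while the mean is 217.5 ms and P95 and P99 are both 900 ms: two slow requests out of ten dominate the tail, which a mean hides. With only ten samples P95 and P99 coincide; a meaningful P99 needs at least a hundred requests.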

Power

Measured power draw per device, in watts. Critical for edge deployment and data center TCO calculations.
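Combined with throughput, power draw gives energy per token, the figure behind edge power budgets and data center TCO. A minimal sketch; the 300 W and 1,500 tokens/s inputs are hypothetical, and the average power should be measured over the same window as the token count:

```python
def joules_per_token(avg_power_w: float, tokens_per_s: float) -> float:
    """A watt is one joule per second, so W / (tokens/s) = J/token."""
    return avg_power_w / tokens_per_s

def kwh_per_million_tokens(avg_power_w: float, tokens_per_s: float) -> float:
    """Energy for one million tokens; 1 kWh = 3.6 million joules."""
    return joules_per_token(avg_power_w, tokens_per_s) * 1_000_000 / 3_600_000

# HYPOTHETICAL device: 300 W average draw while sustaining 1,500 tokens/s.
print(joules_per_token(300, 1500))  # 0.2
print(kwh_per_million_tokens(300, 1500))
```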

Cost-per-inference

Hardware purchase price amortized over the device's service life and expressed as cost per million tokens. The number your CFO cares about.
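Cost per million tokens comes from dividing what the hardware costs by the number of tokens it produces over its life. A minimal sketch, assuming a fixed service life and average utilization (both are inputs you choose, not benchmark outputs), with an optional energy term since power draw also enters TCO:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def cost_per_million_tokens(hardware_price: float,
                            lifetime_years: float,
                            utilization: float,
                            tokens_per_s: float,
                            avg_power_w: float = 0.0,
                            price_per_kwh: float = 0.0) -> float:
    """Amortize purchase price (plus optional energy cost) over every token
    produced during the ASSUMED service life at the ASSUMED utilization."""
    active_s = lifetime_years * SECONDS_PER_YEAR * utilization
    total_tokens = tokens_per_s * active_s
    # Energy is counted only while active; idle power is ignored here.
    energy_kwh = avg_power_w * active_s / 3_600_000
    total_cost = hardware_price + energy_kwh * price_per_kwh
    return total_cost / total_tokens * 1_000_000

# HYPOTHETICAL inputs: price 10,000 (any currency), 3-year life,
# busy 50% of the time, 500 tokens/s while busy, energy ignored.
print(round(cost_per_million_tokens(10_000, 3, 0.5, 500), 4))
```

Utilization dominates the result: halving it doubles the cost per token, so the assumed duty cycle belongs next to the reported number.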

Compatibility

Did the model need conversion? Quantization? What broke? Honest notes on real-world readiness.

Methodology

MLPerf-aligned, fully documented, open-source benchmark scripts. Reproducible by anyone.

Want to test your model on our hardware?

Book a consultation and we'll design a benchmark plan for your specific workload.
