How It Works

Three steps to running on real hardware

1. Pick your hardware

Choose from our catalog: NVIDIA Blackwell GPUs, Tenstorrent Blackhole accelerators, Hailo edge chips, and more. Each device comes with SDKs and frameworks pre-installed.

2. Connect from your browser

Get browser-based SSH access — no client install, no VPN. Book time slots by the hour. Your environment is isolated and reset between sessions.

3. Run your workload

Deploy your models, run your benchmarks, test your code. Get real performance numbers — throughput, latency, power consumption — on real hardware.

What You Get

Not just compute — answers

Pre-installed SDKs

Every device comes with its matching toolchain ready to use: CUDA, TT-Forge, Hailo SDK, JetPack, or MLX, depending on the hardware. No setup time wasted, so you can start coding immediately.

Popular models pre-loaded

Llama 3.1, Mistral, Qwen, DeepSeek, YOLOv8 — already compiled and optimized for each hardware target. Or bring your own.

Benchmark toolkit

Our open-source benchmark suite is pre-installed. Run standardized tests or build your own — either way, you get comparable numbers across hardware.

Power measurement

Per-device wattmeters give you measured power consumption data, which is critical for edge deployment and TCO calculations.
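
To show how a power reading can feed an efficiency or TCO estimate, here is a minimal Python sketch. It is not part of the TensorPi toolkit: the function names are hypothetical, the 75 W and 85 tok/s inputs are borrowed from the illustrative session on this page, and the electricity price of 0.30 per kWh is an assumed placeholder.

```python
def joules_per_token(avg_power_w: float, tokens_per_s: float) -> float:
    """Energy per generated token. One watt is one joule per second."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return avg_power_w / tokens_per_s


def yearly_energy_cost(avg_power_w: float, price_per_kwh: float,
                       hours_per_year: float = 24 * 365) -> float:
    """Electricity cost of running one device at this average draw for a year."""
    kwh = avg_power_w / 1000 * hours_per_year
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Illustrative inputs only: 75 W average draw, 85 tok/s throughput.
    print(f"{joules_per_token(75, 85):.2f} J/token")
    # Assumed price of 0.30 per kWh; substitute your own tariff.
    print(f"{yearly_energy_cost(75, 0.30):.2f} per device per year")
```

Energy per token is often a more useful comparison across devices than raw wattage, because it accounts for how much work each device does with the power it draws.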

Use Cases

What developers use TensorPi for

01. Model-hardware compatibility testing

"Does my fine-tuned model actually run on Tenstorrent?" Find out in an hour, not after a €10K purchase.

02. Inference performance benchmarking

Compare tok/s, latency, and memory usage across NVIDIA, Tenstorrent, and edge devices. Same model, same prompts, different hardware.
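
As a rough illustration of how such numbers can be derived from per-request timings, the following Python sketch computes throughput and latency percentiles. It is an assumption-laden example, not the TensorPi benchmark suite: `RequestTiming`, `percentile`, and `summarize` are hypothetical names, requests are assumed to run one after another, and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    ttft_s: float        # time to first token, in seconds
    total_s: float       # full request latency, in seconds
    output_tokens: int   # tokens generated for this request


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(timings):
    """Aggregate per-request timings into comparable headline numbers."""
    total_tokens = sum(t.output_tokens for t in timings)
    # Assumes sequential requests; with concurrency, use wall-clock time instead.
    total_time = sum(t.total_s for t in timings)
    return {
        "tok_per_s": total_tokens / total_time,
        "ttft_p50_s": percentile([t.ttft_s for t in timings], 50),
        "latency_p95_s": percentile([t.total_s for t in timings], 95),
    }
```

Feeding the same prompts through `summarize` on each device keeps the comparison fair, since every device is scored on identical requests with identical aggregation.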

03. Edge deployment prototyping

Test your vision or LLM pipeline on Hailo, Axelera, or Jetson before committing to an edge architecture.

04. Framework and SDK evaluation

Try TT-Forge, Hailo TAPPAS, or Axelera Voyager SDK without setting up the full toolchain yourself.

$ tensorpi connect --device blackhole-01
Connected to Tenstorrent Blackhole p150a
32 GB GDDR6 • TT-Forge 0.9 • vLLM ready
$ python benchmark.py --model llama-3.1-70b --quantization fp8
Loading model... done (38.2 GB)
Running inference benchmark (1000 requests)...
Throughput: 85 tok/s
TTFT: 140ms • P95: 180ms
Power: 75W avg
$ tensorpi compare --add rtx6000-01 --model llama-3.1-70b
Comparison report saved to ./report-2026-07-15.pdf

Illustrative session — actual interface may vary.

Ready to get your hands on real hardware?

No commitment, no long-term contracts. Book hardware by the hour and find out what works for your project.

Request access • Browse hardware