NVIDIA 4090 / 5090
24+ GB VRAM consumer cards from gamer providers. Great for 7B–34B models, batch inference, LoRA fine-tunes.
Batch inference and fine-tuning on idle 4090s, 5090s, and M3 Max Apple Silicon. $0.20 / GPU-hour — ~10x cheaper per token than H100 spot at typical 7B–34B workloads. Hugging Face TGI / vLLM / MLX templates included. Bring your model, we route the work.
M3 Max and M4 Macs running MLX. Especially good for Mistral, Llama, and Whisper transcription.
The first minute is billed in full; after that, billing is per second. A good fit for short inference bursts.
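To make the billing rule concrete, here is a rough sketch of how a job's cost could be estimated. It assumes that "rounded" means any job is billed for at least 60 seconds, that later seconds are rounded up to the next whole second, and that the $0.20 / GPU-hour rate applies uniformly to every GPU in the job. The function name `job_cost` and its parameters are illustrative only and are not part of any published API.

```python
import math

RATE_PER_GPU_HOUR = 0.20  # USD per GPU-hour, the rate quoted on this page
MIN_BILLED_SECONDS = 60   # assumption: the first minute is always billed in full


def job_cost(runtime_seconds: float, gpus: int = 1) -> float:
    """Estimate the cost in USD of one job under the assumed billing rule."""
    if runtime_seconds <= 0:
        return 0.0
    # Round partial seconds up, then apply the one-minute floor.
    billed_seconds = max(MIN_BILLED_SECONDS, math.ceil(runtime_seconds))
    return billed_seconds * gpus * RATE_PER_GPU_HOUR / 3600


print(f"20 s on 1 GPU:  ${job_cost(20):.4f}")           # billed as 60 seconds
print(f"90 s on 1 GPU:  ${job_cost(90):.4f}")           # billed as 90 seconds
print(f"1 h on 4 GPUs:  ${job_cost(3600, gpus=4):.2f}")
```

Under these assumptions a 20-second burst costs the same as a full minute, and anything longer scales linearly with runtime and GPU count.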
We benchmark each provider’s GPU before dispatch and only charge once we’ve confirmed advertised performance.
vLLM, Hugging Face TGI, Ollama, ComfyUI, AUTOMATIC1111 — start a job with a template name, no Dockerfile needed.
Mount weights from S3 or HF Hub. Provider hardware can’t exfiltrate model parameters — VRAM is wiped at job exit.
First $5 is on us. Run a vLLM job against Llama 3 70B and tell us how the latency compares.
See pricing