Back to products
IonRouter

IonRouter

Serve Any AI Model, Faster & Cheaper

Overview

What it is

Teams use IonRouter as a drop‑in OpenAI-compatible API to hit the best open models for LLMs, vision, video, and TTS at HALF market rate. You can run agents and multi‑modal apps, and deploy your finetunes on our fleet while we handle optimization and scaling in the background. Under the hood, IonRouter runs a custom inference engine (IonAttention) built for NVIDIA Grace Hopper, cutting price and latency for your workloads.

Intent

I need it when

Deploy high-throughput LLM inference without managing GPU infrastructure

IonRouter provides OpenAI-compatible API access to language models (Qwen, GLM, Kimi, DeepSeek) with per-second billing and 0ms cold starts on dedicated GPU streams. Users point existing OpenAI clients to IonRouter's endpoint with one line of code, eliminating infrastructure management while achieving 7,167 tok/s throughput on single GH200 GPUs via IonAttention engine.

Run real-time vision-language model inference for robotics and surveillance applications

IonRouter supports concurrent VLM perception with <1s cold starts and multiplexes five vision models on a single GPU. Enables real-time robotics perception, multi-stream video analysis, and surveillance workflows with dedicated GPU streams and per-second billing, eliminating idle costs.

Deploy custom fine-tuned models and LoRAs without cold start penalties

IonRouter provides dedicated GPU streams for custom models, finetunes, and LoRAs with 0ms cold starts and per-second billing. Users bring any open-source model and receive isolated compute capacity, enabling production deployment of proprietary models without shared resource contention.

Integrate text-to-speech into applications with minimal latency and cost

IonRouter offers TTS models (Orpheus, Dia, F5-TTS) billed per minute of generated audio (~$0.003–$0.006/min) with real-time generation speed. F5-TTS supports voice cloning via reference audio, enabling personalized speech synthesis for applications without maintaining TTS infrastructure.

Generate images and videos on-demand without provisioning dedicated compute

IonRouter offers Flux image generation (~3s per image at ~$0.005) and video generation models (Wan, HunyuanVideo, LTX) billed per GPU·second. Supports text-to-video, image-to-video, and text-to-image workflows with sub-10-second generation times, ideal for game asset generation and AI video pipelines.

Drop

Not a fit when

  • User requires on-premise or self-hosted deployment; IonRouter is a cloud-based API service only
  • User needs models not listed in IonRouter's catalog (limited to Qwen, GLM, Kimi, MiniMax, DeepSeek, Flux, Wan, and specific vision/audio models)
  • User requires guaranteed SLA or enterprise support contracts; product targets developers and teams with API-first workflows
  • User needs real-time inference with sub-millisecond latency for time-critical applications beyond the stated 0ms cold start for dedicated streams
  • User operates in regions where IonRouter infrastructure is unavailable or restricted
Commercials

Pricing

Pay-per-token pricing. Input tokens and output tokens charged separately per million tokens. Video and audio models billed per GPU·second or per minute of generated audio. View pricing