Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.
Intent
I need it when
Reduce LLM token costs for coding agents without code changes
Edgee compresses input and output tokens by up to 50% through surgical removal of redundancy (tool-result trimming, tool-surface reduction, output brevity). It works as a transparent proxy: install the CLI, connect the coding agent, and save tokens from the first request. No application code changes required.
Maintain coding productivity when LLM providers fail or rate-limit
Edgee provides automatic per-request retry and fallback across 200+ models from 8+ providers (OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, zAI, AWS Bedrock). Transient errors are retried once; persistent failures route to a configured backup. Zero downtime from the agent's perspective.
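The retry-then-fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not Edgee's implementation: `ProviderError`, `TransientError`, `call_with_fallback`, and the caller-supplied `send` function are all hypothetical names introduced here.

```python
class ProviderError(Exception):
    """Hypothetical base class for any provider failure."""


class TransientError(ProviderError):
    """Hypothetical marker for short-lived failures (timeouts, rate limits)."""


def call_with_fallback(send, primary, backup, request):
    """Try the primary model, retry once on a transient error, else use the backup.

    `send(model, request)` is a stand-in supplied by the caller; it is an
    assumption of this sketch, not part of any Edgee API.
    """
    try:
        return send(primary, request)
    except TransientError:
        try:
            return send(primary, request)  # the single retry
        except ProviderError:
            pass  # retry failed too: treat as persistent
    except ProviderError:
        pass  # non-transient failure: skip the retry
    return send(backup, request)
```

The caller only ever sees a successful response or the backup's own error, which is what makes the failure invisible from the agent's side.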
Use own LLM provider keys while keeping routing and observability benefits
Bring Your Own Keys (BYOK) feature lets users register API keys for Anthropic, OpenAI, Google Vertex AI, Mistral, DeepSeek, xAI, zAI, and AWS Bedrock. Requests route through user's key and billing account while Edgee's compression, fallback, and observability continue to work.
Extend coding session duration on Claude Pro/Max plans when quota is hit
Edgee implements plan-cap continuity: when Claude Pro/Max quota is exhausted, it automatically falls back to API-key-based providers, keeping the session active. Compression also extends sessions by 26.5% on the same plan.
Track and optimize LLM spending per developer, repository, and team
Edgee dashboard aggregates costs, token usage, compression savings, and errors across all requests. Team plan includes per-repo/per-PR attribution via GitHub integration, spending caps, alerts, and usage exports. Every response includes real-time compression metrics (saved tokens, cost savings, reduction percentage).
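The three per-response metrics named above (saved tokens, cost savings, reduction percentage) follow from simple arithmetic. The sketch below shows how they relate; the field names, the function name, and the per-million-token price input are assumptions for illustration and do not reflect Edgee's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class CompressionMetrics:
    # Hypothetical field names, chosen for this example only.
    saved_tokens: int
    cost_savings: float
    reduction_pct: float


def compression_metrics(original_tokens, compressed_tokens, price_per_million):
    """Derive savings from token counts before and after compression."""
    saved = original_tokens - compressed_tokens
    # Guard against an empty request to avoid dividing by zero.
    reduction = 100.0 * saved / original_tokens if original_tokens else 0.0
    savings = saved * price_per_million / 1_000_000
    return CompressionMetrics(saved, savings, reduction)
```

For example, a 12,000-token request compressed to 7,200 tokens saves 4,800 tokens, a 40% reduction; the cost saving scales with whatever price the provider charges per million tokens.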
Drop
Not a fit when
User requires on-premises deployment without a custom enterprise agreement; the Free and Team plans are SaaS-only
User needs real-time token compression for non-coding LLM workloads; product is optimized for coding agents (Claude Code, Codex, Cursor, OpenCode)
User operates with strict data residency requirements outside EU/US without enterprise custom deployment option
User needs recovery for streaming responses mid-stream; retries and fallback work only before the first chunk is delivered
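The streaming limitation in the last item comes from a general constraint: once part of a response has reached the caller, switching providers would splice two different outputs together. A minimal sketch of that rule, assuming a hypothetical caller-supplied `open_stream(model, request)` generator and using `ConnectionError` as a stand-in for a provider failure (retry is omitted for brevity):

```python
def stream_with_fallback(open_stream, primary, backup, request):
    """Yield chunks, switching to the backup only if nothing was delivered yet.

    `open_stream` is a hypothetical function returning an iterator of chunks;
    it is not part of any Edgee API.
    """
    delivered = False
    try:
        for chunk in open_stream(primary, request):
            delivered = True
            yield chunk
        return  # primary finished cleanly
    except ConnectionError:
        if delivered:
            # Output already reached the caller, so a switch is unsafe.
            raise
    # Failure happened before the first chunk: the backup can take over.
    yield from open_stream(backup, request)
```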
Commercials
Pricing
Freemium with team and enterprise tiers. Free plan includes token compression for solo developers. Team plan at €29/developer/month (billed annually) adds fallback, rerouting, and team management. Enterprise plan offers custom pricing with SSO, audit logs, and self-hosted options.