Edgee Codex Compressor

Use Codex at 35.6% lower costs

Website edgee.ai

What it is

Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.

Intent

I need it when

Reduce LLM token costs for coding agents without code changes

Edgee compresses input and output tokens by up to 50% through surgical removal of redundancy (tool-result trimming, tool-surface reduction, output brevity). Works as a transparent proxy—install CLI, connect coding agent, save tokens on first request. No application code changes required.

Maintain coding productivity when LLM providers fail or rate-limit

Edgee provides automatic per-request retry and fallback across 200+ models from 8+ providers (OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, zAI, AWS Bedrock). Transient errors retry once; persistent failures route to configured backup. Zero downtime from agent perspective.

Use own LLM provider keys while keeping routing and observability benefits

Bring Your Own Keys (BYOK) feature lets users register API keys for Anthropic, OpenAI, Google Vertex AI, Mistral, DeepSeek, xAI, zAI, and AWS Bedrock. Requests route through user's key and billing account while Edgee's compression, fallback, and observability continue to work.

Extend coding session duration on Claude Pro/Max plans when quota is hit

Edgee implements plan-cap continuity: when Claude Pro/Max quota is exhausted, it automatically falls back to API-key-based providers, keeping the session active. Compression also extends sessions by 26.5% on the same plan.

Track and optimize LLM spending per developer, repository, and team

Edgee dashboard aggregates costs, token usage, compression savings, and errors across all requests. Team plan includes per-repo/per-PR attribution via GitHub integration, spending caps, alerts, and usage exports. Every response includes real-time compression metrics (saved tokens, cost savings, reduction percentage).

Drop

Not a fit when

User requires on-premises deployment without custom enterprise agreement; only Team and Free plans are SaaS-only
User needs real-time token compression for non-coding LLM workloads; product is optimized for coding agents (Claude Code, Codex, Cursor, OpenCode)
User requires guaranteed sub-100ms latency; gateway adds measurable overhead despite edge processing optimization
User operates with strict data residency requirements outside EU/US without enterprise custom deployment option
User needs compression for streaming responses mid-stream; retries and fallback only work before first chunk delivery

Commercials

Pricing

Freemium with team and enterprise tiers. Free plan includes token compression for solo developers. Team plan at €29/developer/month (billed annually) adds fallback, rerouting, and team management. Enterprise plan offers custom pricing with SSO, audit logs, and self-hosted options. View pricing