Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.
Intent
I need it when
Reduce LLM token costs and extend coding session duration without code changes
Edgee compresses input and output tokens by up to 50% through surgical removal of redundancy (tool-result trimming, output brevity). Users install the CLI, connect their coding agent (Claude Code, Codex, Cursor), and see cost savings on the first request. No application code changes required; works as a transparent proxy.
Monitor LLM request latency, errors, and model performance across the team
Edgee's observability dashboard aggregates costs, token usage, compression savings, performance metrics, and errors across all requests. Logs page shows every request with model, provider, tokens, cost, compression delta, and latency. Users can filter by tags (environment, feature, user, team) and inspect full request/response payloads in debug mode.
Use own LLM provider keys while keeping compression and observability benefits
Bring Your Own Keys (BYOK) feature lets users register their own API keys (Anthropic, OpenAI, Google Vertex AI, Mistral, DeepSeek, xAI, zAI, AWS Bedrock) with Edgee. Requests are billed to the user's provider account; Edgee's routing, compression, and observability continue to work normally. Keys are stored securely and masked after creation.
Ensure coding agents keep working when an LLM provider fails or hits rate limits
Edgee automatically retries transient errors and falls back to alternative LLM providers (200+ models supported) without code changes. Team plan includes fallback and rerouting; free plan includes auto-retry on transient errors. Requests are routed based on provider success rates and availability.
Track and control LLM spending per developer, repository, and team
Edgee provides session-level metering (local SQLite logs) and team-level dashboards showing cost, token usage, compression savings, and errors per request. Team plan includes GitHub integration for per-repo and per-PR attribution, spending caps, and alerts. Every response includes compression metrics (saved_tokens, cost_savings, reduction).
Drop
Not a fit when
User needs a traditional LLM API without token compression or routing—Edgee is a gateway layer that adds overhead for users who don't need cost optimization
Organization requires on-premises deployment with no SaaS option—Edgee's free and team plans are cloud-hosted; enterprise self-hosted is custom only
User works exclusively with non-supported LLM providers—Edgee supports major providers (OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, zAI, AWS Bedrock) but not all niche models
Team has no coding agents or LLM-based applications—Edgee is purpose-built for Claude Code, Codex, Cursor, and similar agents; not a general-purpose LLM client
User requires real-time streaming with guaranteed fallback mid-stream—Edgee cannot retry or reroute once streaming begins; fallback only works before first chunk
Organization needs sub-monthly billing or pay-as-you-go pricing—Edgee's Team plan requires annual commitment; no hourly or daily billing options
Commercials
Pricing
Freemium with team and enterprise tiers. Free plan includes token compression and basic features. Team plan at €29/developer/month (billed annually) includes fallback, rerouting, and team management. Enterprise plan has custom volume-based pricing.View pricing