Edgee

The AI Gateway that TL;DR tokens

Overview

What it is

Edgee compresses prompts before they reach LLM providers and reduces token costs by up to 50%. Same code, fewer tokens, lower bills.

Intent

I need it when

Reduce LLM token costs and extend coding session duration without code changes

Edgee compresses input and output tokens by up to 50% by removing redundancy (trimming tool results, keeping output brief). Users install the CLI, connect their coding agent (Claude Code, Codex, Cursor), and see cost savings on the first request. No application code changes are required; Edgee works as a transparent proxy.

Monitor LLM request latency, errors, and model performance across the team

Edgee's observability dashboard aggregates costs, token usage, compression savings, performance metrics, and errors across all requests. The Logs page shows every request with its model, provider, tokens, cost, compression delta, and latency. Users can filter by tags (environment, feature, user, team) and inspect full request/response payloads in debug mode.

Use own LLM provider keys while keeping compression and observability benefits

The Bring Your Own Keys (BYOK) feature lets users register their own API keys (Anthropic, OpenAI, Google Vertex AI, Mistral, DeepSeek, xAI, zAI, AWS Bedrock) with Edgee. Requests are billed to the user's provider account; Edgee's routing, compression, and observability continue to work normally. Keys are stored securely and masked after creation.

Ensure coding agents keep working when an LLM provider fails or hits rate limits

Edgee automatically retries transient errors and falls back to alternative LLM providers (200+ models supported) without code changes. The Team plan includes fallback and rerouting; the Free plan includes auto-retry on transient errors. Requests are routed based on provider success rates and availability.
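
The retry-then-fallback behavior described above can be illustrated with a minimal Python sketch. It is not Edgee's implementation: TransientError, call_with_fallback, and the provider callables are hypothetical names, and the provider list is assumed to be already ordered by preference (for example, by success rate).

```python
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


# A provider is modeled as any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


def call_with_fallback(
    providers: Sequence[Provider],
    prompt: str,
    max_retries: int = 2,
    backoff_s: float = 0.0,
) -> str:
    """Retry each provider on transient errors, then move on to the next one."""
    last_error: Optional[Exception] = None
    for send in providers:
        for attempt in range(max_retries + 1):
            try:
                return send(prompt)
            except TransientError as exc:
                last_error = exc
                if attempt < max_retries:
                    # Exponential backoff between retries on the same provider.
                    time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Usage: the primary provider is always rate limited, so the backup answers.
attempts = {"primary": 0}


def primary(prompt: str) -> str:
    attempts["primary"] += 1
    raise TransientError("rate limited")


def backup(prompt: str) -> str:
    return f"backup: {prompt}"


result = call_with_fallback([primary, backup], "hello")
```

In this sketch only errors marked as transient trigger a retry or fallback; any other exception propagates to the caller immediately.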

Track and control LLM spending per developer, repository, and team

Edgee provides session-level metering (local SQLite logs) and team-level dashboards showing cost, token usage, compression savings, and errors per request. The Team plan includes GitHub integration for per-repo and per-PR attribution, spending caps, and alerts. Every response includes compression metrics (saved_tokens, cost_savings, reduction).
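
As a rough illustration of how the three compression metrics relate to each other, here is a small Python sketch. Only the field names saved_tokens, cost_savings, and reduction come from the listing; the CompressionMetrics type, the compression_metrics function, the percentage unit for reduction, and the per-million-token price are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CompressionMetrics:
    saved_tokens: int    # tokens removed by compression
    cost_savings: float  # money saved, in the same currency as the price
    reduction: float     # assumed here to be a percentage of the original size


def compression_metrics(
    original_tokens: int,
    compressed_tokens: int,
    price_per_million_tokens: float,
) -> CompressionMetrics:
    """Derive the three metrics from token counts and an assumed token price."""
    saved = original_tokens - compressed_tokens
    reduction = 100.0 * saved / original_tokens if original_tokens else 0.0
    cost_savings = saved * price_per_million_tokens / 1_000_000
    return CompressionMetrics(saved, cost_savings, reduction)


# Usage: a 10,000-token prompt compressed to 6,000 tokens at 3.00 per million.
metrics = compression_metrics(10_000, 6_000, 3.0)
```

With these inputs, 4,000 tokens are saved, which corresponds to a 40% reduction.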

Drop

Not a fit when

  • User needs a traditional LLM API without token compression or routing: Edgee is a gateway layer that adds overhead for users who don't need cost optimization
  • Organization requires on-premises deployment with no SaaS option: Edgee's Free and Team plans are cloud-hosted; self-hosting is available only as a custom Enterprise arrangement
  • User works exclusively with unsupported LLM providers: Edgee supports major providers (OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, zAI, AWS Bedrock) but not all niche models
  • Team has no coding agents or LLM-based applications: Edgee is purpose-built for Claude Code, Codex, Cursor, and similar agents; it is not a general-purpose LLM client
  • User requires real-time streaming with guaranteed fallback mid-stream: Edgee cannot retry or reroute once streaming begins; fallback only works before the first chunk
  • Organization needs sub-monthly billing or pay-as-you-go pricing: Edgee's Team plan requires an annual commitment; there are no hourly or daily billing options

Commercials

Pricing

Freemium with Team and Enterprise tiers. The Free plan includes token compression and basic features. The Team plan costs €29 per developer per month (billed annually) and adds fallback, rerouting, and team management. The Enterprise plan has custom volume-based pricing.
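
For budgeting, the Team plan figure can be turned into an annual estimate. The Python sketch below uses only the listed €29 per developer per month; the function name is hypothetical, and it assumes twelve billed months with no taxes, discounts, or usage-based charges.

```python
TEAM_PRICE_EUR_PER_DEV_MONTH = 29  # listed Team plan price
MONTHS_PER_YEAR = 12


def annual_team_cost_eur(developers: int) -> int:
    """Estimated yearly Team plan cost in euros for a given number of developers."""
    if developers < 0:
        raise ValueError("developers must be non-negative")
    return TEAM_PRICE_EUR_PER_DEV_MONTH * MONTHS_PER_YEAR * developers


# Usage: a five-developer team.
five_dev_cost = annual_team_cost_eur(5)  # 29 * 12 * 5 = 1740
```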