Back to products
PandaProbe

PandaProbe

open source agent engineering platform

Overview

What it is

PandaProbe is an open-source agent engineering platform that gives you deep observability into AI agent applications. Use it to trace, evaluate, monitor and debug your AI agents in development and production.

Intent

I need it when

Evaluate agent performance with research-grounded metrics tailored to long-running agents

PandaProbe provides SOTA metrics purpose-built for agent behavior, LLM-as-judge scoring with structured feedback, and session-level evaluation (not just isolated traces). This allows teams to score and pinpoint exactly where agents drift across entire lifecycles.

Debug and improve AI agent behavior in production before users encounter failures

PandaProbe captures full agent trajectories (tool calls, LLM hops, decision branches) and applies state-of-the-art metrics to detect agent uncertainty over long trajectories. This enables developers to spot issues and behavioral drift before users experience problems.

Instrument and trace AI agents with minimal code changes across multiple frameworks and LLM providers

PandaProbe offers one-line instrumentation for major agent frameworks (LangGraph, LangChain, CrewAI, Google ADK, Claude SDK, OpenAI SDK) and works with any LLM provider out of the box. Spans and metadata are captured automatically, reducing integration overhead.

Deploy agent observability infrastructure without vendor lock-in or managed service costs

PandaProbe is open source (Apache 2.0) and self-hostable with all core features available free. Teams can deploy on their own infrastructure for unlimited scale, or use the managed cloud option and upgrade only when team size or trace volume requires it.

Monitor agent regressions and performance changes across production versions automatically

PandaProbe schedules eval runs against production traffic on custom cadences (daily, hourly, or cron), alerts on metric regressions across agent versions, and catches behavioral drift the moment it appears. This enables proactive regression detection without manual intervention.

Drop

Not a fit when

  • User needs traditional software monitoring without AI agent-specific metrics and evaluation capabilities
  • Organization requires on-premises deployment with vendor support and SLAs but lacks engineering resources for self-hosting
  • Team uses agent frameworks not listed in integrations (LangGraph, LangChain, DeepAgents, CrewAI, Google ADK, Claude Agent SDK, OpenAI Agents SDK)
  • Project requires real-time tracing with sub-millisecond latency requirements across distributed systems
  • User needs human-in-the-loop annotation at scale beyond the 1-seat limit in Hobby plan without upgrading
Commercials

Pricing

Freemium cloud with open-source self-hosted option. Cloud tiers: Hobby (free), Pro ($29/month), Startup ($299/month), Enterprise (custom). Open source deployment free with Apache 2.0 license. View pricing