Back to products
OpenAI WebSocket Mode for Responses API

OpenAI WebSocket Mode for Responses API

Persistent AI agents. Up to 40% faster.

Website developers.openai.com
Overview

What it is

Every agent turn, you're resending the full context. Again. That overhead compounds fast. WebSocket Mode for the Responses API keeps a persistent connection, sends only incremental inputs, and cuts end-to-end latency by up to 40% on heavy tool-call workflows.

Intent

I need it when

Maintain conversation state across multiple model-tool interactions

WebSocket mode uses previous_response_id chaining to continue sessions, allowing developers to send only new input items while the API maintains full context. This enables stateful multi-turn interactions without resending prior conversation history.

Reduce latency in agentic workflows with many tool-call round trips

WebSocket mode maintains a persistent connection to the Responses API and sends only incremental inputs (new tool outputs and messages) on each turn, eliminating per-turn connection overhead. For workflows with 20+ tool calls, this delivers approximately 40% faster end-to-end execution compared to standard HTTP requests.

Optimize API costs while maintaining low latency in agent deployments

WebSocket mode is compatible with both Zero Data Retention (ZDR) and store=false options, allowing developers to reduce latency without incurring storage costs. The incremental input pattern also reduces token overhead per turn.

Build low-latency orchestration loops with repeated tool calls

WebSocket mode is optimized for long-running, tool-call-heavy workflows such as agentic coding or orchestration loops. The persistent connection and incremental input pattern reduce continuation overhead, making it ideal for multi-step agent reasoning and tool execution chains.

Drop

Not a fit when

  • Workflows with few or no tool calls—WebSocket mode overhead is not justified for simple single-turn requests
  • Applications requiring HTTP-only infrastructure that cannot maintain persistent connections
  • Use cases where response latency is not a primary concern and cost optimization is the only goal
  • Systems that need to store full conversation history server-side, as WebSocket mode is compatible with Zero Data Retention
  • Real-time applications requiring sub-millisecond latency where network round-trip time dominates
Commercials

Pricing

Pricing not specified