EU GPT

Public preview — This API is in public preview. Endpoints, schemas, and limits may change before general availability.

Streaming

How EU GPT streams responses as Server-Sent Events, and when streaming is the right call.

EU GPT streams responses as Server-Sent Events (SSE) by default. The SDKs hide the wire format, but it helps to know what the stream looks like underneath.

When to stream

You want                                                            | Choose
To render output as the model produces it (chat UIs, dashboards).   | stream: true
The shortest time to first token.                                   | stream: true
A single JSON blob you can JSON.parse or .json().                   | stream: false
Back-end batch jobs that do not need partial output.                | stream: false

Streaming substantially reduces perceived latency: the first text delta typically arrives 200 to 400 ms after the request, while the full response can take several seconds. For anything user-facing, stream.

The wire format

The HTTP response has Content-Type: text/event-stream. The body is a sequence of events separated by blank lines:

event: message
data: {"type":"response.created","sequence_number":0, ...}

event: message
data: {"type":"response.output_text.delta","sequence_number":1,"delta":"Hello"}

event: message
data: {"type":"response.output_text.delta","sequence_number":2,"delta":", world"}

event: message
data: {"type":"response.completed","sequence_number":3,"final_text":"Hello, world"}

Every event's JSON payload has a type and a sequence_number field. The full taxonomy of event types is in Streaming events.
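To make the framing concrete, here is a minimal sketch of reading this format by hand in Python. The helper names iter_sse_events and collect_text are hypothetical, the sample omits the extra fields that the real response.created event carries, and yielding a final event that lacks a trailing blank line is a leniency chosen for this sketch rather than documented behavior.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event in `lines`."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line marks the end of the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, retry: and comment lines are ignored in this sketch.
    if data_lines:
        # Leniency for this sketch: accept a final event with no blank line.
        yield json.loads("\n".join(data_lines))


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate the text deltas in arrival order."""
    return "".join(
        event["delta"]
        for event in events
        if event["type"] == "response.output_text.delta"
    )


# Trimmed copy of the example stream above (extra fields omitted).
SAMPLE = """event: message
data: {"type":"response.created","sequence_number":0}

event: message
data: {"type":"response.output_text.delta","sequence_number":1,"delta":"Hello"}

event: message
data: {"type":"response.output_text.delta","sequence_number":2,"delta":", world"}

event: message
data: {"type":"response.completed","sequence_number":3,"final_text":"Hello, world"}
"""

if __name__ == "__main__":
    print(collect_text(iter_sse_events(SAMPLE.splitlines())))  # prints: Hello, world
```

In a real client the lines would come from the HTTP response body as it arrives, not from a string held in memory.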

Ordering and gaps

  • sequence_number increases by exactly one with each event within a single response. Gaps never occur; a missing number indicates a bug, not an expected condition.
  • response.created is always the first event.
  • A terminal event (response.completed or error) always ends the stream.
  • The server closes the connection after the terminal event.
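The ordering rules above can be turned into a client-side check. The following is a sketch only: check_ordering and StreamOrderError are hypothetical names, and it assumes events have already been decoded into dictionaries with the type and sequence_number fields described earlier.

```python
from typing import Iterable

# Event types that end a stream, per the rules above.
TERMINAL_TYPES = {"response.completed", "error"}


class StreamOrderError(Exception):
    """Raised when a stream violates the ordering rules."""


def check_ordering(events: Iterable[dict]) -> list[dict]:
    """Return the events as a list, or raise if an ordering rule is broken."""
    seen: list[dict] = []
    previous = None
    for event in events:
        if seen and seen[-1]["type"] in TERMINAL_TYPES:
            raise StreamOrderError("event received after the terminal event")
        if not seen and event["type"] != "response.created":
            raise StreamOrderError("first event was not response.created")
        number = event["sequence_number"]
        if previous is not None and number != previous + 1:
            raise StreamOrderError(f"gap: expected {previous + 1}, got {number}")
        previous = number
        seen.append(event)
    if not seen or seen[-1]["type"] not in TERMINAL_TYPES:
        raise StreamOrderError("stream ended without a terminal event")
    return seen
```

Because a gap signals a bug and not a recoverable condition, raising immediately is usually more useful than trying to continue with partial text.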

Buffering and proxies

Some HTTP proxies and load balancers buffer responses, defeating the point of streaming. If your application sits behind a custom proxy:

  • Disable response buffering on routes that forward /v1/responses.
  • For Nginx, set proxy_buffering off; and proxy_cache off; on that location.
  • For Cloudflare Workers, set cache: "no-store" and avoid wrapping the Response in a way that re-reads the body.

The EU GPT edge does not buffer streams.

Server-side timeouts

There is no hard server-side timeout for a single response, but in practice responses complete within a few minutes. If you receive no events for 60 seconds, the upstream model has likely stalled: disconnect and retry with the same prompt.

For long-running generations, prefer narrower prompts or break the work into smaller calls rather than counting on one giant stream.
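The 60-second idle rule can be enforced on the client with a small wrapper around any asynchronous event iterator. This is a sketch under assumptions: with_idle_timeout and StreamStalled are hypothetical names, and the event source is whatever async iterator your SSE client or SDK provides.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class StreamStalled(Exception):
    """Raised when no event arrives within the idle window."""


async def with_idle_timeout(
    events: AsyncIterator[T], idle_seconds: float = 60.0
) -> AsyncIterator[T]:
    """Re-yield events, raising StreamStalled if the gap between two events
    (or before the first one) exceeds idle_seconds."""
    iterator = events.__aiter__()
    while True:
        try:
            # The timer restarts for every event, so it measures idle time,
            # not the total duration of the response.
            event = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
        except StopAsyncIteration:
            return  # The underlying stream ended normally.
        except asyncio.TimeoutError:
            raise StreamStalled(f"no events for {idle_seconds} seconds") from None
        yield event
```

A caller would catch StreamStalled, close the connection, and resend the same prompt, as described above.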

Parsing the stream

You almost never need to parse SSE by hand:

  • JavaScript: use the openai SDK’s AsyncIterable interface.
  • Python: use the openai SDK’s stream=True mode, which yields parsed events.
  • Other languages: any standard SSE client will work. The data: payload is a JSON object.

For hand-rolled parsers, see Handling streams.