Public preview — This API is in public preview. Endpoints, schemas, and limits may change before general availability.
Streaming
How EU GPT streams responses as Server-Sent Events, and when streaming is the right call.
EU GPT streams responses as Server-Sent Events (SSE) by default. The SDKs hide the wire format, but it helps to know what is on the wire.
When to stream
| You want | Choose |
|---|---|
| To render output as the model produces it (chat UIs, dashboards). | stream: true |
| The shortest time to the first token. | stream: true |
| A single JSON blob you can JSON.parse or .json(). | stream: false |
| Back-end batch jobs that do not need partial output. | stream: false |
Streaming reduces perceived latency dramatically — the first text delta typically arrives 200-400 ms after the request, while the full response can take seconds. For anything user-facing, stream.
The wire format
The HTTP response has Content-Type: text/event-stream. The body is a sequence of events separated by blank lines:
event: message
data: {"type":"response.created","sequence_number":0, ...}

event: message
data: {"type":"response.output_text.delta","sequence_number":1,"delta":"Hello"}

event: message
data: {"type":"response.output_text.delta","sequence_number":2,"delta":", world"}

event: message
data: {"type":"response.completed","sequence_number":3,"final_text":"Hello, world"}
Every event's JSON payload has a type and a sequence_number. The full taxonomy of event types is in Streaming events.
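To make the framing concrete, here is a minimal sketch of splitting a complete response body into events and decoding each data: payload. The function name `iter_sse_events` is hypothetical, the sample leaves out the elided fields of response.created, and the sketch ignores event:, id:, and comment lines. It assumes the whole body is already in memory, which a real streaming client would not do.

```python
import json
from typing import Iterator


def iter_sse_events(body: str) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event in a complete body."""
    # SSE allows CRLF, LF, or CR line endings; normalize to LF first.
    normalized = body.replace("\r\n", "\n").replace("\r", "\n")
    # Events are separated by blank lines.
    for block in normalized.split("\n\n"):
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # One optional space after the colon is not part of the value.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            # Several data: lines in one event are joined with newlines.
            yield json.loads("\n".join(data_lines))


sample = (
    'event: message\n'
    'data: {"type":"response.created","sequence_number":0}\n'
    '\n'
    'event: message\n'
    'data: {"type":"response.output_text.delta","sequence_number":1,"delta":"Hello"}\n'
    '\n'
    'event: message\n'
    'data: {"type":"response.output_text.delta","sequence_number":2,"delta":", world"}\n'
    '\n'
    'event: message\n'
    'data: {"type":"response.completed","sequence_number":3,"final_text":"Hello, world"}\n'
    '\n'
)

events = list(iter_sse_events(sample))
text = "".join(e["delta"] for e in events if e["type"] == "response.output_text.delta")
print(text)
```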
Ordering and gaps
- sequence_number is monotonically increasing within a single response. Skips never happen; a missing number is a bug, not an expected condition.
- response.created is always the first event.
- A terminal event, response.completed or error, always closes the stream.
- The connection closes (server-side) after the terminal event.
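These guarantees can be checked on the client side. The sketch below is one possible validator, not part of any SDK. The name `check_event_order` is hypothetical, and it assumes that "no skips" means each sequence_number is exactly one greater than the previous one, as in the wire example.

```python
from typing import Iterable, List

TERMINAL_TYPES = {"response.completed", "error"}


def check_event_order(events: Iterable[dict]) -> List[str]:
    """Return a list of violations; an empty list means the stream is well-formed."""
    problems: List[str] = []
    expected = None   # next sequence_number we expect (assumed: previous + 1)
    last_type = None  # type of the most recent event
    for index, event in enumerate(events):
        seq = event["sequence_number"]
        if index == 0 and event["type"] != "response.created":
            problems.append(f"first event is {event['type']}, not response.created")
        if last_type in TERMINAL_TYPES:
            problems.append(f"event {seq} arrived after the terminal event")
        if expected is not None and seq != expected:
            problems.append(f"expected sequence_number {expected}, got {seq}")
        expected = seq + 1
        last_type = event["type"]
    if last_type not in TERMINAL_TYPES:
        problems.append("stream ended without a terminal event")
    return problems


good = [
    {"type": "response.created", "sequence_number": 0},
    {"type": "response.output_text.delta", "sequence_number": 1, "delta": "Hello"},
    {"type": "response.output_text.delta", "sequence_number": 2, "delta": ", world"},
    {"type": "response.completed", "sequence_number": 3},
]
with_gap = [good[0], good[1], good[3]]  # sequence_number 2 is missing
print(check_event_order(good))
print(check_event_order(with_gap))
```

Returning a list of problems rather than raising on the first one makes the check usable for logging in production without interrupting the stream.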
Buffering and proxies
Some HTTP proxies and load balancers buffer responses, defeating the point of streaming. If your application sits behind a custom proxy:
- Disable response buffering on routes that forward /v1/responses.
- For Nginx, set proxy_buffering off; and proxy_cache off; on that location.
- For Cloudflare Workers, set cache: "no-store" and avoid Response wrapping that re-reads the body.
The EU GPT edge does not buffer streams.
Server-side timeouts
There is no hard server-side timeout for a single response, but in practice responses complete within a few minutes. If you receive no events for 60 seconds, the upstream model has likely stalled: disconnect and retry with the same prompt.
For long-running generations, prefer narrower prompts or break the work into smaller calls rather than counting on one giant stream.
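The 60-second idle rule can be enforced with a small wrapper around any async event iterator. This is a sketch under assumptions: `with_idle_timeout` is a hypothetical helper, and it presumes your client exposes the stream as an async iterator. It measures the gap between consecutive events, not the total duration of the response.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def with_idle_timeout(
    events: AsyncIterator[T], idle_seconds: float = 60.0
) -> AsyncIterator[T]:
    """Re-yield events; raise asyncio.TimeoutError if none arrives in idle_seconds."""
    iterator = events.__aiter__()
    while True:
        try:
            # The timer restarts for every event, so only silence is penalized.
            event = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
        except StopAsyncIteration:
            return  # the underlying stream finished normally
        yield event
```

A caller would catch `asyncio.TimeoutError` around the `async for` loop, close the connection, and issue the same request again.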
Parsing the stream
You almost never need to parse SSE by hand:
- JavaScript: use the openai SDK’s AsyncIterable interface.
- Python: use the openai SDK’s stream=True mode, which yields parsed events.
- Other languages: any standard SSE client will work. The data: payload is a JSON object.
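As an illustration of the Python route, the sketch below accumulates text from parsed events. The helper `collect_output_text` is hypothetical, and the stand-in events only mimic the attribute shape (`type`, `delta`) so the example runs offline. The commented SDK call uses a placeholder base URL and model ID, both of which are assumptions to replace with your own values.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_output_text(events: Iterable) -> str:
    """Concatenate text deltas until the stream reaches a terminal event."""
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
        elif event.type == "error":
            raise RuntimeError(f"stream ended with an error event: {event!r}")
        elif event.type == "response.completed":
            break
    return "".join(parts)


# Stand-in events with the same attribute shape as parsed SDK events.
fake_stream = [
    SimpleNamespace(type="response.created", sequence_number=0),
    SimpleNamespace(type="response.output_text.delta", sequence_number=1, delta="Hello"),
    SimpleNamespace(type="response.output_text.delta", sequence_number=2, delta=", world"),
    SimpleNamespace(type="response.completed", sequence_number=3),
]
print(collect_output_text(fake_stream))  # prints: Hello, world

# With the openai SDK (placeholders are assumptions; not executed here):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://<your-eu-gpt-host>/v1", api_key="<key>")
#   stream = client.responses.create(model="<model-id>", input="Say hello", stream=True)
#   print(collect_output_text(stream))
```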
For hand-rolled parsers, see Handling streams.