Streaming Messages let clients receive content in real time while the model is still generating its response, instead of waiting for the complete response to finish. This can significantly improve user experience, especially for long outputs, because users see text begin to appear immediately.
Streaming messages use an incremental transmission mechanism: content is sent in chunks as it is generated, rather than being returned all at once after generation completes. This mechanism allows developers to achieve the following (see the sketch after this list):
Real-time Response: No need to wait for the complete response; content displays progressively
Improved Experience: Reduce user waiting time and provide instant feedback
Reduced Latency: Content is transmitted as it's generated, lowering perceived latency
Flexible Processing: Process and display content in real time as it is received
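As a concrete illustration, below is a minimal sketch of consuming a streamed response with the Anthropic Python SDK (`pip install anthropic`). The model name and prompt are placeholder assumptions and may need adjusting; the same chunk-by-chunk pattern applies to any SDK that exposes streamed text deltas.

```python
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment.
client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-3-5-sonnet-20241022",  # placeholder model name; substitute your own
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
) as stream:
    # Text deltas arrive as they are generated; print them immediately
    # so the user sees output without waiting for the full response.
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # After the stream is exhausted, the fully accumulated message is available.
    final_message = stream.get_final_message()
```

Printing each chunk with `flush=True` pushes it to the terminal as soon as it arrives, which is what produces the progressive-display effect described in the list above.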