Deep Thinking

Deep Thinking is an advanced reasoning feature that enables a Chain of Thought mechanism, letting the model analyze and reason through a problem before it answers. This significantly improves the model's accuracy and interpretability on complex tasks, and it is particularly suited to scenarios that require multi-step reasoning, logical analysis, and problem-solving.

Features

The Deep Thinking feature currently supports the latest models in the GLM-4.5 and GLM-4.6 series. By enabling deep thinking, the model can:

Multi-step Reasoning: Break down complex problems into multiple steps for gradual analysis and resolution
Logical Analysis: Provide clear reasoning processes and logical chains
Improved Accuracy: Reduce errors and improve answer quality through deep thinking
Enhanced Interpretability: Display the thinking process to help users understand the model’s reasoning logic
Intelligent Judgment: The model automatically determines whether deep thinking is needed to optimize response efficiency

Core Parameters

thinking.type: Controls the deep thinking mode
- enabled (default): Enables dynamic thinking; the model automatically determines whether deep thinking is needed
- disabled: Disables deep thinking; the model answers directly
model: A model that supports deep thinking, such as glm-4.6, glm-4.5, or glm-4.5v
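
As a minimal sketch of how these parameters fit into a request body, the helper below builds the JSON payload for the chat completions endpoint. The function name `build_payload` and its validation step are illustrative assumptions and are not part of the SDK; only the `model`, `messages`, and `thinking.type` fields come from this page.

```python
import json

# Values for thinking.type documented on this page.
VALID_THINKING_TYPES = {"enabled", "disabled"}


def build_payload(model: str, prompt: str, thinking_type: str = "enabled") -> dict:
    """Build a chat completions request body (hypothetical helper, not an SDK function)."""
    if thinking_type not in VALID_THINKING_TYPES:
        raise ValueError(
            f"thinking.type must be one of {sorted(VALID_THINKING_TYPES)}, got {thinking_type!r}"
        )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": thinking_type},
    }


if __name__ == "__main__":
    # Print the body that would be sent with deep thinking turned off.
    print(json.dumps(build_payload("glm-4.6", "Hello", "disabled"), indent=2))
```

Validating the value locally is a design choice in this sketch: it surfaces a typo such as "enable" before a request is sent.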

Code Examples

cURL
Python SDK

Basic Call (Enable Deep Thinking)

curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-4.7",
    "messages": [
        {
            "role": "user",
            "content": "Explain in detail the basic principles of quantum computing and analyze its potential impact in the field of cryptography"
        }
    ],
    "thinking": {
        "type": "enabled"
    },
    "max_tokens": 4096,
    "temperature": 1.0
}'

Streaming Call (Deep Thinking + Streaming Output)

curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-4.7",
    "messages": [
        {
            "role": "user",
            "content": "Design a recommendation system architecture for an e-commerce website, considering user behavior, product features, and real-time requirements"
        }
    ],
    "thinking": {
        "type": "enabled"
    },
    "stream": true,
    "max_tokens": 4096,
    "temperature": 1.0
}'

Disable Deep Thinking

curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-4.7",
    "messages": [
        {
            "role": "user",
            "content": "How is the weather today?"
        }
    ],
    "thinking": {
        "type": "disabled"
    }
}'

Install SDK

# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.1.0

Verify Installation

import zai
print(zai.__version__)

Basic Call (Enable Deep Thinking)

from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='your_api_key')

# Create deep thinking request
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Explain in detail the basic principles of quantum computing and analyze its potential impact in the field of cryptography"}
    ],
    thinking={
        "type": "enabled"  # Enable deep thinking mode
    },
    max_tokens=4096,
    temperature=1.0
)

print("Model response:")
print(response.choices[0].message.content)
print("\n---")
print("Thinking process:")
print(response.choices[0].message.reasoning_content)

Streaming Call (Deep Thinking + Streaming Output)

from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='your_api_key')

# Create streaming deep thinking request
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "Design a recommendation system architecture for an e-commerce website, considering user behavior, product features, and real-time requirements"}
    ],
    thinking={
        "type": "enabled"  # Enable deep thinking mode
    },
    stream=True,  # Enable streaming output
    max_tokens=4096,
    temperature=1.0
)

# Process streaming response
reasoning_content = ""
thinking_started = False  # True once the "Thinking" banner has been printed
answer_started = False    # True once the "Answer" banner has been printed

for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Process thinking process (if any)
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        reasoning_content += delta.reasoning_content
        if not thinking_started:
            print("🧠 Thinking...", end="", flush=True)
            thinking_started = True
        print(delta.reasoning_content, end="", flush=True)

    # Process answer content
    if hasattr(delta, 'content') and delta.content:
        if not answer_started:
            # Printed once, whether or not a thinking phase came first
            print("\n\n💡 Answer:")
            answer_started = True
        print(delta.content, end="", flush=True)
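
The loop above can also be factored into a helper that returns the reasoning text and the answer text separately, which makes the logic testable without a network call. This is a sketch under assumptions: `split_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the `choices[0].delta` shape used in the streaming example rather than real SDK types.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Accumulate reasoning and answer text from streamed chunks.

    Hypothetical helper: assumes each chunk exposes `choices[0].delta` with
    optional `reasoning_content` and `content` attributes.
    """
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(**delta_fields):
    """Build a stand-in chunk for local testing (not an SDK type)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [
        fake_chunk(reasoning_content="First, "),
        fake_chunk(reasoning_content="compare the options."),
        SimpleNamespace(choices=[]),
        fake_chunk(content="Use option A."),
    ]
    reasoning, answer = split_stream(demo)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

With a real request, the `response` iterator from the streaming call would be passed in place of `demo`.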

Disable Deep Thinking

from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='your_api_key')

# Disable deep thinking for quick response
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "user", "content": "How is the weather today?"}
    ],
    thinking={
        "type": "disabled"  # Disable deep thinking mode
    }
)

print(response.choices[0].message.content)

Response Example

Response format with deep thinking enabled:

{
  "created": 1677652288,
  "model": "glm-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Artificial intelligence has tremendous application prospects in medical diagnosis...",
        "reasoning_content": "Let me analyze this question from multiple angles. First, I need to consider the technical advantages of AI in medical diagnosis..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "completion_tokens": 239,
    "prompt_tokens": 8,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "total_tokens": 247
  }
}
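
For callers who work with the raw JSON rather than the SDK objects, the sketch below pulls the answer and the thinking process out of a response shaped like the example above. The helper name `extract_answer` is hypothetical, and treating `reasoning_content` as possibly absent (for instance when deep thinking is disabled) is an assumption this page does not confirm.

```python
import json

# Trimmed copy of the response example above.
RAW_RESPONSE = """
{
  "model": "glm-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Artificial intelligence has tremendous application prospects in medical diagnosis...",
        "reasoning_content": "Let me analyze this question from multiple angles..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"completion_tokens": 239, "prompt_tokens": 8, "total_tokens": 247}
}
"""


def extract_answer(response: dict):
    """Return (content, reasoning_content) from the first choice.

    Hypothetical helper; reasoning_content falls back to None when the key
    is missing (an assumption about responses without a thinking process).
    """
    message = response["choices"][0]["message"]
    return message.get("content", ""), message.get("reasoning_content")


if __name__ == "__main__":
    content, reasoning = extract_answer(json.loads(RAW_RESPONSE))
    print("Answer:", content)
    print("Reasoning:", reasoning)
```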

Best Practices

Scenarios where enabling deep thinking is recommended:

Complex problem analysis and solving
Multi-step reasoning tasks
Technical solution design
Strategic planning and decision-making
Academic research and analysis
Creative writing and content creation

Scenarios where deep thinking can be disabled:

Simple fact queries
Basic translation tasks
Simple classification tasks
Questions that need a quick answer
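
One way to apply these recommendations in code is a small lookup that maps a task category to a `thinking` setting. This is only a sketch: the category names and the `thinking_for` helper are hypothetical, and the fallback to "enabled" simply mirrors the documented default.

```python
# Hypothetical task categories, loosely following the two lists above.
THINKING_BY_TASK = {
    "complex_analysis": "enabled",
    "multi_step_reasoning": "enabled",
    "solution_design": "enabled",
    "fact_query": "disabled",
    "translation": "disabled",
    "classification": "disabled",
}


def thinking_for(task: str) -> dict:
    """Return the `thinking` parameter for a task category.

    Unknown categories fall back to "enabled", the documented default.
    """
    return {"type": THINKING_BY_TASK.get(task, "enabled")}


if __name__ == "__main__":
    for task in ("solution_design", "translation", "something_else"):
        print(task, "->", thinking_for(task))
```

The returned dict can be passed directly as the `thinking` argument of `client.chat.completions.create`.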

Application Scenarios

Academic Research

Research method design
Data analysis and interpretation
Theoretical derivation and proof

Technology Consulting

System architecture design
Technical solution evaluation
Problem diagnosis and resolution

Business Analysis

Market trend analysis
Business model design
Investment decision support

Education and Training

Explanation of complex concepts
Learning path planning
Knowledge system building

Notes

Response time: Enabling deep thinking increases response time, particularly for complex tasks
Token consumption: The thinking process consumes additional tokens, so monitor your token usage
Model support: Make sure the model you are using supports deep thinking
Task matching: Decide whether to enable deep thinking based on the complexity of the task
Streaming output: Combine deep thinking with streaming output to show the thinking process as it is generated, which improves the user experience
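
To keep an eye on token consumption, the `usage` block returned with each response can be accumulated against a budget. The sketch below is an assumption-laden illustration: `TokenBudget` is a hypothetical class, and this page does not state how thinking tokens are split between the `usage` fields, so only `total_tokens` is tracked.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Track cumulative token usage against a fixed limit (hypothetical helper)."""

    limit: int
    used: int = 0

    def record(self, usage: dict) -> int:
        """Add one response's total_tokens and return the tokens remaining."""
        self.used += usage.get("total_tokens", 0)
        return self.limit - self.used

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit


if __name__ == "__main__":
    budget = TokenBudget(limit=1000)
    # Usage values copied from the response example on this page.
    remaining = budget.record(
        {"completion_tokens": 239, "prompt_tokens": 8, "total_tokens": 247}
    )
    print("Remaining tokens:", remaining)
    print("Exhausted:", budget.exhausted)
```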
