This guide explains how to migrate your calls from GLM-4.5 or other earlier models to Z.AI GLM-4.6, our most powerful coding model to date, covering sampling parameter differences, streaming tool calls, and other key points.
GLM-4.6 Features
- Support for larger context and output: maximum context of 200K tokens and maximum output of 128K tokens (see the sketch after this list).
- New support for streaming output during tool calls (tool_stream=true), enabling real-time retrieval of tool call arguments.
- Like the GLM-4.5 series, supports deep thinking (thinking={"type": "enabled"}).
- Superior coding performance and advanced reasoning capabilities.
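As a quick illustration of the larger output window, you can raise the output cap on an ordinary call. A minimal sketch, assuming an initialized client as shown in the migration steps below; max_tokens is the OpenAI-compatible output limit, and the exact ceiling accepted may depend on your account:

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Write a detailed project plan"}],
    max_tokens=100000,  # GLM-4.6 supports up to 128K output tokens
)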
Migration Checklist
Start Migration
1. Update Model Identifier
from zai import ZaiClient  # assuming the zai-sdk package; reuse your existing client if you already have one

client = ZaiClient(api_key="your-api-key")

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Briefly describe the advantages of GLM-4.6"}]
)
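If you call the service through the OpenAI SDK instead of the Z.AI SDK, only the base URL and model identifier need to change. A sketch, assuming Z.AI's OpenAI-compatible endpoint; verify the base URL for your account and region:

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.z.ai/api/paas/v4/",  # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)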
2. Update Sampling Parameters
- temperature: controls randomness; higher values produce more divergent output, lower values more stable output.
- top_p: controls nucleus sampling; higher values widen the candidate set, lower values narrow it.
- temperature defaults to 1.0 and top_p defaults to 0.95; adjusting both at the same time is not recommended.
# Plan A: Use temperature (recommended)
resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Write a more creative brand introduction"}],
    temperature=1.0
)

# Plan B: Use top_p
resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Generate more stable technical documentation"}],
    top_p=0.8
)
3. Deep Thinking (Optional)
- GLM-4.6 continues to support deep thinking, which is enabled by default.
- Enabling it explicitly is recommended for complex reasoning and coding tasks:
resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Design a three-tier microservice architecture for me"}],
    thinking={"type": "enabled"}
)
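Conversely, for simple, latency-sensitive queries you can turn thinking off. A sketch, assuming the thinking parameter accepts "disabled" as in the GLM-4.5 series:

resp = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "What time zone is Beijing in?"}],
    thinking={"type": "disabled"}  # skip the reasoning phase for simple lookups
)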
4. Streaming Tool Calls (Optional)
- GLM-4.6 is currently the only model to support real-time streaming construction and output during tool calls. The feature is disabled by default (tool_stream=False) and requires enabling both of the following:
- stream=True: enables streaming output for responses
- tool_stream=True: enables streaming output for tool call arguments
response = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "How's the weather in Beijing?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City, e.g. Beijing, Shanghai"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    stream=True,
    tool_stream=True,
)
# Initialize streaming collection variables
reasoning_content = ""
content = ""
final_tool_calls = {}
reasoning_started = False
content_started = False

# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Stream the reasoning (thinking) output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\nThinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Stream the answer content
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\nAnswer Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Stream tool call information (concatenate argument fragments)
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            idx = tool_call.index
            if idx not in final_tool_calls:
                # First fragment: keep the whole tool call object
                final_tool_calls[idx] = tool_call
            else:
                # Later fragments only carry argument text; append it
                final_tool_calls[idx].function.arguments += tool_call.function.arguments

# Output final tool call information
if final_tool_calls:
    print("\nFunction Calls Triggered:")
    for idx, tool_call in final_tool_calls.items():
        print(f"  {idx}: Function Name: {tool_call.function.name}, Arguments: {tool_call.function.arguments}")
See: Tool Streaming Output Documentation
5. Testing and Regression
First verify in a development environment that post-migration calls are stable, focusing on:
- Whether responses meet expectations, and whether output is overly random or overly conservative
- Whether streaming tool call construction and output work normally
- Latency and cost in long-context and deep-thinking scenarios
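A quick way to cover the first two checks is a small smoke test run after switching the model identifier. A sketch; the prompt, latency budget, and assertions below are illustrative choices, not part of the API:

import time

def smoke_test(client, model="glm-4.6"):
    # Basic post-migration check: non-empty answer within a latency budget
    start = time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply with the single word: ok"}],
        temperature=0.1,  # low temperature to reduce randomness in the check
    )
    elapsed = time.time() - start
    text = resp.choices[0].message.content
    assert text and text.strip(), "empty response after migration"
    print(f"latency: {elapsed:.2f}s, response: {text.strip()[:80]}")

smoke_test(client)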
More Resources