GLM-5 Features
- Support for larger context and output: Maximum context 200K, maximum output 128K.
- New support for streaming output during tool calls (`tool_stream=true`): tool-call parameters can be retrieved in real time.
- Same as the GLM-4.5 series, supports deep thinking (`thinking={ type: "enabled" }`); when enabled, the model always thinks before responding.
- Superior code performance and advanced reasoning capabilities.
Migration Checklist
- Update the model identifier to `glm-5`.
- Sampling parameters: `temperature` defaults to 1.0 and `top_p` defaults to 0.95; tune only one of the two.
- Deep thinking: enable or disable `thinking={ type: "enabled" }` as needed for complex reasoning/coding.
- Streaming responses: enable `stream=true` and properly handle `delta.reasoning_content` and `delta.content`.
- Streaming tool calls: enable both `stream=true` and `tool_stream=true`, and concatenate the streamed `delta.tool_calls[*].function.arguments` fragments.
- Maximum output and context: set `max_tokens` appropriately (GLM-5 supports a maximum output of 128K and a context of 200K).
- Prompt optimization: pair deep thinking with clearer instructions and constraints.
- Development environment verification: run use-case tests and regressions, focusing on randomness, latency, and parameter completeness in tool streams.
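Taken together, the checklist items map onto a single request payload. A minimal sketch of that payload (shape only, no network call; the overall dict layout is an assumption based on OpenAI-compatible chat APIs, while the field names follow the parameters listed above):

```python
# Sketch of a GLM-5 request payload covering the checklist items.
# The dict shape is an assumption; field names come from the checklist above.
payload = {
    "model": "glm-5",                      # updated model identifier
    "messages": [{"role": "user", "content": "Refactor this function."}],
    "temperature": 1.0,                    # default; tune this OR top_p, not both
    "thinking": {"type": "enabled"},       # deep thinking for complex tasks
    "stream": True,                        # stream the response
    "tool_stream": True,                   # stream tool-call parameters too
    "max_tokens": 4096,                    # well within GLM-5's 128K output cap
}
```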
Start Migration
1. Update Model Identifier
- Update `model` to `glm-5`.
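Only the `model` field changes in an existing request. A minimal sketch (the surrounding payload shape is assumed from typical chat-completion requests):

```python
# Before migration this would have named a GLM-4.x model; only "model" changes.
payload = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Hello"}],
}
```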
2. Update Sampling Parameters
- `temperature`: controls randomness; higher values are more divergent, lower values are more stable.
- `top_p`: controls nucleus sampling; higher values expand the candidate set, lower values narrow it.
- `temperature` defaults to 1.0 and `top_p` defaults to 0.95; adjusting both simultaneously is not recommended.
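A sketch of tuning only one of the two parameters (the values are illustrative, and the payload shape is an assumption):

```python
# Tune temperature OR top_p, never both; the untouched one keeps its default.
payload = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Summarize this report."}],
    "temperature": 0.6,  # lowered for more stable output; top_p stays at its 0.95 default
}
```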
3. Deep Thinking (Optional)
- GLM-5 continues to support deep thinking capability, enabled by default.
- Recommended to enable for complex reasoning and coding tasks:
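A sketch of enabling it (the payload shape is an assumption; the `thinking` parameter and its value come from the feature list above):

```python
# Deep thinking enabled for a complex reasoning task.
payload = {
    "model": "glm-5",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "thinking": {"type": "enabled"},  # model always thinks before responding
}
```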
4. Streaming Output and Tool Calls (Optional)
- GLM-5 supports real-time streaming construction and output of tool calls. This is disabled by default (`false`) and requires enabling both:
  - `stream=true`: enables streaming output for responses
  - `tool_stream=true`: enables streaming output for tool-call parameters
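Because tool-call arguments arrive as fragments, the client must concatenate `delta.tool_calls[*].function.arguments` across chunks, keyed by each call's index. A minimal sketch with simulated chunks (the exact delta dict shape is an assumption based on the field paths named above):

```python
import json

def accumulate_tool_calls(deltas):
    """Concatenate streamed tool-call argument fragments, keyed by call index."""
    calls = {}  # index -> {"name": str, "arguments": str}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                entry["name"] = fn["name"]
            entry["arguments"] += fn.get("arguments", "")
    return calls

# Simulated stream: one call's JSON arguments split across three chunks.
chunks = [
    {"tool_calls": [{"index": 0, "function": {"name": "get_weather", "arguments": '{"ci'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'ty": "Bei'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'jing"}'}}]},
]
calls = accumulate_tool_calls(chunks)
args = json.loads(calls[0]["arguments"])  # complete JSON only after all chunks arrive
```

Parsing the arguments before the stream ends will fail on partial JSON, so decode only once the final chunk has been appended.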
5. Testing and Regression
First verify in a development environment that post-migration calls are stable, focusing on:
- Whether responses meet expectations, and whether output is excessively random or excessively conservative
- Whether tool streaming construction and output work normally
- Latency and cost in long context and deep thinking scenarios
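For the parameter-completeness check in tool streams, one simple regression assertion is that the concatenated argument string parses as valid JSON. A hypothetical helper (not part of any SDK):

```python
import json

def tool_args_complete(arguments: str) -> bool:
    """True if a concatenated tool-call argument string is complete, parseable JSON."""
    try:
        json.loads(arguments)
        return True
    except json.JSONDecodeError:
        return False
```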
More Resources
Concept Parameters
Common model parameter concepts and sampling recommendations
Tool Streaming Output
View tool streaming output usage details
API Reference
View complete API documentation
Technical Support
Get technical support and help