POST /paas/v4/audio/transcriptions
Speech to Text
curl --request POST \
  --url https://api.z.ai/api/paas/v4/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form model=glm-asr-2512 \
  --form stream=false \
  --form file='@example-file'
{
  "id": "<string>",
  "created": 123,
  "request_id": "<string>",
  "model": "<string>",
  "text": "<string>"
}

Documentation Index

Fetch the complete documentation index at: https://docs.z.ai/llms.txt

Use this file to discover all available pages before exploring further.

Authorizations

Authorization
string
header
required

Use the following format for authentication: Bearer <token>

Body

multipart/form-data
file
file
required

The audio file to transcribe. Supported formats: .wav and .mp3. Limits: file size ≤ 25 MB, audio duration ≤ 30 seconds.
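The limits above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the helper name `check_audio_file` is hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which the description does not specify. Duration is only checked for .wav files, because reading .mp3 duration needs a third-party decoder.

```python
import os
import wave

MAX_BYTES = 25 * 1024 * 1024   # ASSUMPTION: 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 30.0             # duration limit from the parameter description
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    # The standard library can only read the duration of .wav files,
    # so .mp3 duration is not checked here.
    if ext == ".wav":
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"audio is {seconds:.1f} s, limit is {MAX_SECONDS:.0f} s")
    return problems
```

Calling `check_audio_file("clip.wav")` before the request lets a client reject oversized or overlong audio without spending a round trip.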

model
enum<string>
default:glm-asr-2512
required

The model ID to invoke.

Available options:
glm-asr-2512
file_base64
string

Base64-encoded audio file. Provide either file or file_base64; if both are provided, file takes precedence.
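As an illustration of preparing a file_base64 value, here is a small sketch. The helper name `encode_audio` is hypothetical, and it assumes the service expects plain standard Base64 with no `data:` URI prefix, which the description does not confirm.

```python
import base64


def encode_audio(audio: bytes) -> str:
    """Encode raw audio bytes as a standard Base64 ASCII string."""
    # ASSUMPTION: plain Base64, no "data:audio/...;base64," prefix.
    return base64.b64encode(audio).decode("ascii")


def form_fields_for_base64(audio: bytes, model: str = "glm-asr-2512") -> dict:
    """Build the text form fields for a request that uses file_base64 instead of file."""
    return {"model": model, "file_base64": encode_audio(audio)}
```

Base64 output is about one third larger than the input, so the encoded string for a large recording is noticeably bigger than the file itself.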

prompt
string

For long transcriptions, you can pass the previous transcription result as context. Keeping the prompt under 8000 characters is recommended.
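One way to use prompt when a long recording is split into segments is to carry the tail of the transcript so far into each call. The sketch below assumes that approach; `build_prompt` and `transcribe_in_order` are hypothetical helpers, and `transcribe` stands for any callable that wraps the API request and returns the text of one segment.

```python
PROMPT_LIMIT = 8000  # recommended upper bound from the parameter description


def build_prompt(previous_text: str, limit: int = PROMPT_LIMIT) -> str:
    """Keep the most recent part of the transcript, strictly under `limit` characters."""
    text = previous_text.strip()
    if len(text) < limit:
        return text
    return text[-(limit - 1):]


def transcribe_in_order(segments, transcribe):
    """Transcribe segments one by one, passing earlier output as context.

    `transcribe(segment, prompt)` is a hypothetical callable that performs
    the API call and returns the transcribed text of that segment.
    """
    transcript = ""
    for segment in segments:
        transcript += transcribe(segment, build_prompt(transcript))
    return transcript
```

Keeping the end of the transcript rather than the beginning is a design choice: the most recent words are the ones most likely to help with the next segment.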

hotwords
string[]

A list of hotwords that improves recognition accuracy for domain-specific vocabulary. Format example: ["person_name","place_name"]. Keep the list to 100 items or fewer.

Maximum array length: 100
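A client can clean up a hotword list and enforce the 100-item cap before sending it. The sketch below uses hypothetical helper names. How the array should be encoded inside a multipart form is not stated here; serializing it as a JSON string, matching the format example, is an assumption.

```python
import json

MAX_HOTWORDS = 100  # maximum array length from the parameter description


def prepare_hotwords(words):
    """Drop blanks and duplicates (keeping first-seen order) and enforce the cap."""
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    if len(cleaned) > MAX_HOTWORDS:
        raise ValueError(f"{len(cleaned)} hotwords given, at most {MAX_HOTWORDS} allowed")
    return cleaned


def hotwords_form_value(words) -> str:
    """Serialize the list for a form field.

    ASSUMPTION: the array is sent as one JSON-encoded string; the
    documentation does not specify the multipart encoding.
    """
    return json.dumps(prepare_hotwords(words), ensure_ascii=False)
```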
stream
boolean
default:false

Set this to false, or omit it, for synchronous calls: the model returns the full content in a single response once generation is complete. If set to true, the model returns the generated content in chunks via a standard Event Stream, and a final data: [DONE] message marks the end of the stream. Default: false.
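When stream is true, the client has to read `data:` lines until the `[DONE]` marker. The following sketch shows one way to do that over any iterable of lines, such as an HTTP response body. The helper name `iter_stream_events` is hypothetical, and the sketch assumes each chunk's payload is a JSON object; the chunk schema is not described on this page, so the events are returned as plain dictionaries.

```python
import json


def iter_stream_events(lines):
    """Yield the decoded payload of each `data:` line until `data: [DONE]`."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other event-stream fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker
        # ASSUMPTION: every other payload is a JSON object.
        yield json.loads(payload)
```

Because the function accepts any iterable of `str` or `bytes` lines, it can be fed an open `urllib` response object directly or a list of captured lines when testing.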

request_id
string

A client-supplied identifier used to distinguish each request. It must be unique and 6–64 characters long. If omitted, the platform generates one.

Required string length: 6 - 64
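A simple way to produce a unique request_id within the length bounds is a random UUID. This is a sketch with hypothetical helper names; it only checks the documented length, since no character-set restriction is stated.

```python
import uuid

REQUEST_ID_MIN = 6
REQUEST_ID_MAX = 64


def new_request_id() -> str:
    """Return a random 32-character hexadecimal identifier."""
    return uuid.uuid4().hex


def is_valid_request_id(value: str) -> bool:
    """Check only the documented length bounds (6 to 64 characters)."""
    return REQUEST_ID_MIN <= len(value) <= REQUEST_ID_MAX
```

Logging the generated value alongside the request makes it possible to match it against the request_id returned in the response.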
user_id
string

Unique ID for the end user, 6–128 characters. Avoid using sensitive information.

Required string length: 6 - 128
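Putting the body parameters together, the request can be assembled without third-party packages. The sketch below is one possible client using only the Python standard library; the function names are hypothetical, only model, stream and file are sent, and the file part's `application/octet-stream` content type is an assumption rather than a documented requirement.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.z.ai/api/paas/v4/audio/transcriptions"


def encode_multipart(fields, filename, file_bytes, file_type="application/octet-stream"):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(token, filename, file_bytes, model="glm-asr-2512"):
    """Build the POST request without sending it."""
    fields = {"model": model, "stream": "false"}
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )


def transcribe(token, path):
    """Send the audio file at `path` and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_request(token, os.path.basename(path), f.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `transcribe(api_key, "clip.wav")["text"]` would then return the transcription, assuming the request succeeds.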

Response

Request processed successfully

id
string

Task ID

created
integer<int64>

Request creation time, as a Unix timestamp in seconds.

request_id
string

Unique identifier for the request. It is supplied by the client; if the client does not supply one, the platform generates one.

model
string

Model name

text
string

The complete transcribed content of the audio.
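The response fields above map naturally onto a small typed record. The sketch below parses a response body into a dataclass and converts created into a timezone-aware datetime; the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Transcription:
    id: str          # task ID
    created: int     # Unix timestamp in seconds
    request_id: str
    model: str
    text: str        # complete transcribed content

    @property
    def created_at(self) -> datetime:
        """Creation time as a UTC datetime."""
        return datetime.fromtimestamp(self.created, tz=timezone.utc)


def parse_transcription(raw) -> Transcription:
    """Parse a JSON response body (str or bytes) into a Transcription."""
    data = json.loads(raw)
    return Transcription(
        id=data["id"],
        created=int(data["created"]),
        request_id=data["request_id"],
        model=data["model"],
        text=data["text"],
    )
```

Accessing the fields by key means a response that lacks any of the five documented fields raises `KeyError` instead of passing silently.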