1. The Request
Method: POST
Endpoint: https://api.openai.com/v1/audio/speech
Content-Type: application/json
Request Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model ID. Options: gpt-4o-mini-tts (steerable), tts-1 (low latency), or tts-1-hd (high quality). |
| input | string | Yes | The text to be turned into audio. (Max 4,096 characters). |
| voice | string | Yes | The voice ID to use. Options include: alloy, echo, fable, onyx, nova, shimmer, coral, ash, sage, marin, cedar. |
| response_format | string | No | Output format. Options: mp3 (default), opus, aac, flac, wav, or pcm. |
| speed | number | No | The speed of the generated audio from 0.25 to 4.0. (Default is 1.0). |
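As an illustration of how the parameters in the table fit together, here is a minimal sketch that builds the request with Python's standard library without sending it. The helper name `build_speech_request` and its default values are hypothetical. The `Authorization: Bearer` header is an assumption, since this section does not cover authentication.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"
MAX_INPUT_CHARS = 4096  # input limit from the parameter table


def build_speech_request(api_key, text, model="gpt-4o-mini-tts", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) a POST request for the speech endpoint."""
    # Check the documented limits locally before any network call.
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 4,096 characters")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; not described in this section.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Keeping construction separate from sending makes it possible to inspect the payload before spending any API quota.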
2. The Response
For POST /v1/audio/speech, the response is binary audio bytes, not JSON.
Success response
- HTTP status: 200
- Body: non-empty binary data
- Content-Type: not application/json (typically audio/wav when response_format=wav)
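The success criteria above can be checked in code before the bytes are written to disk. The following is a minimal sketch; the function name `check_speech_response` is hypothetical, and it assumes the caller has already extracted the status code, the Content-Type header, and the raw body from whatever HTTP client is in use.

```python
def check_speech_response(status, content_type, body):
    """Return the audio bytes, or raise ValueError if the response fails a check."""
    if status != 200:
        raise ValueError(f"unexpected HTTP status: {status}")

    # Drop any parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/json":
        raise ValueError("received JSON instead of audio")

    if not body:
        raise ValueError("response body is empty")
    return body
```

For example, `check_speech_response(200, "audio/wav", data)` returns `data` unchanged, after which it can be written to a file opened in binary mode.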