The /v1/audio/speech endpoint converts written text into lifelike spoken audio.

1. The Request

Method: POST
Endpoint: https://api.openai.com/v1/audio/speech
Content-Type: application/json

Request Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model ID. Options: gpt-4o-mini-tts (steerable), tts-1 (low latency), or tts-1-hd (high quality). |
| input | string | Yes | The text to convert to audio (max 4,096 characters). |
| voice | string | Yes | The voice ID to use. Options include: alloy, echo, fable, onyx, nova, shimmer, coral, ash, sage, marin, cedar. |
| response_format | string | No | Output format. Options: mp3 (default), opus, aac, flac, wav, or pcm. |
| speed | number | No | Playback speed of the generated audio, from 0.25 to 4.0 (default 1.0). |

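The limits in the parameter table can be checked on the client before a request is sent. The following is a minimal Python sketch of that idea; the function name build_speech_payload and the constants are hypothetical helpers introduced here for illustration, and the checks only mirror the limits stated in the table (they do not validate model or voice names).

```python
# Limits copied from the parameter table above.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(model, text, voice, response_format=None, speed=None):
    """Return a dict that can be JSON-encoded as the request body.

    Hypothetical helper: raises ValueError when a documented limit is violated.
    Optional fields are omitted so the server-side defaults apply.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")

    payload = {"model": model, "input": text, "voice": voice}

    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        payload["response_format"] = response_format

    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        payload["speed"] = speed

    return payload
```

Leaving out response_format and speed when they are not set keeps the body minimal and lets the documented defaults (mp3 and 1.0) take effect.
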
Example Request (cURL):
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "I have a very important secret to tell you... but you must promise not to tell anyone.",
    "voice": "shimmer",
    "instructions": "Whisper in a mysterious and slightly urgent tone.",
    "speed": 0.9
  }' \
  --output secret_message.mp3
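
The same request can be issued from Python using only the standard library. This is a sketch rather than an official client: make_speech_request and synthesize_to_file are hypothetical names, and the API key is assumed to be available to the caller (for example from the OPENAI_API_KEY environment variable used in the cURL example).

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/audio/speech"


def make_speech_request(api_key, payload, url=API_URL):
    """Build (but do not send) the POST request described above."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the binary audio body to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


# Example usage (performs a network call, so it is left commented out):
# import os
# req = make_speech_request(
#     os.environ["OPENAI_API_KEY"],
#     {"model": "gpt-4o-mini-tts", "input": "Hello there.", "voice": "shimmer"},
# )
# synthesize_to_file(req, "hello.mp3")
```

Building the request separately from sending it makes the headers and body easy to inspect before any network traffic occurs.
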

2. The Response

For POST /v1/audio/speech, the response is binary audio bytes, not JSON.

Success response

  • HTTP status: 200
  • Body: non-empty binary data
  • Content-Type: an audio type, not application/json (typically audio/wav when response_format=wav)
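
The three success criteria above can be expressed as a small check. The Python sketch below is illustrative only; looks_like_audio_response is a hypothetical helper that takes the status code, the Content-Type header value, and the raw body as plain values, so it works with any HTTP client.

```python
def looks_like_audio_response(status, content_type, body):
    """Return True when a response meets the success criteria listed above.

    Hypothetical helper: checks for HTTP 200, a non-empty body, and a
    Content-Type other than application/json.
    """
    # Drop any parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return status == 200 and len(body) > 0 and media_type != "application/json"
```

A JSON content type on this endpoint usually signals an error payload, so a caller might decode and log the body in that case instead of writing it to an audio file.
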

Example (HTTP-style)

HTTP/1.1 200 OK
Content-Type: audio/wav
Content-Length: 124830
[binary audio bytes...]

Example cURL to save the response

curl -X POST "https://api.sciforium.com/v1/audio/speech" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-api-key: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "input": "The quick brown fox jumps over the lazy dog.",
    "voice": "Vivian",
    "response_format": "wav"
  }' \
  --output speech.wav
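
After saving a file such as speech.wav, a quick sanity check can confirm that the bytes are playable audio. The sketch below uses Python's standard wave module and assumes the service returns an ordinary RIFF/WAV container when response_format is wav; wav_summary is a hypothetical helper, not part of any API.

```python
import io
import wave


def wav_summary(data: bytes) -> dict:
    """Parse WAV bytes and report channel count, sample rate, and duration.

    Raises wave.Error if the bytes are not a valid WAV container.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Example usage with the file written by the cURL command above:
# with open("speech.wav", "rb") as f:
#     print(wav_summary(f.read()))
```
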