The /v1/audio/transcriptions API turns uploaded audio into text. For typical files it behaves as a single request/response (upload the file, get the transcript back).

1. The Request

Method: POST
Endpoint: https://api.sciforium.com/v1/audio/transcriptions
Content-Type: multipart/form-data

Request body parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
file | file | Yes | Audio to transcribe. Common formats: mp3, mp4, m4a, wav, webm. Max 25 MB (enforced server-side).
model | string | Yes | Model ID your deployment supports (see your model list / console).
language | string | No | Language hint as an ISO-639-1 code, e.g. en, fr, es.
prompt | string | No | Optional hint for style, vocabulary, or domain terms.
response_format | string | No | Output shape, e.g. json, text, verbose_json, srt, vtt (depends on model).
timestamp_granularities[] | array | No | Timestamp granularity to include, e.g. word or segment. Supported only by some models and formats (e.g. verbose output).
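
The constraints in the table can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the helper names `check_audio_file` and `build_form_fields` are hypothetical, the 25 MB limit is assumed to mean 25 * 1024 * 1024 bytes (the server's exact cutoff may differ), and an extension outside the "common formats" list is reported as a warning only, since the table does not say the list is exhaustive.

```python
import os

# Assumption: "25 MB" is interpreted as 25 MiB; the server is the final authority.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
# The "common formats" named in the parameter table (not necessarily exhaustive).
COMMON_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm"}


def check_audio_file(path):
    """Return a list of warnings for the file; an empty list means no issue was found."""
    warnings = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in COMMON_EXTENSIONS:
        warnings.append(f"extension {ext or '(none)'} is not in the common-format list")
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        warnings.append(f"file is {size} bytes, above the assumed limit of {MAX_UPLOAD_BYTES}")
    return warnings


def build_form_fields(model, language=None, prompt=None,
                      response_format=None, timestamp_granularities=None):
    """Collect the non-file form fields as (name, value) pairs, skipping unset optional ones."""
    fields = [("model", model)]
    if language is not None:
        fields.append(("language", language))
    if prompt is not None:
        fields.append(("prompt", prompt))
    if response_format is not None:
        fields.append(("response_format", response_format))
    # An array parameter is sent as one repeated form field per value.
    for granularity in timestamp_granularities or []:
        fields.append(("timestamp_granularities[]", granularity))
    return fields
```

Returning a list of pairs rather than a dict keeps the repeated `timestamp_granularities[]` entries intact, which is how array values are usually carried in multipart/form-data.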

Example request (curl)

curl -X POST "https://api.sciforium.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-api-key: $TOKEN" \
  -F "file=@interview_audio.mp3" \
  -F "model=YOUR_MODEL_ID" \
  -F "response_format=json"
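
For callers who are not using curl, the same request can be assembled with the Python standard library. This is a sketch under stated assumptions rather than an official client: `build_transcription_request` is a hypothetical helper, it sends the same two auth headers as the curl example, and it only builds the request so that the body can be inspected before anything is sent.

```python
import mimetypes
import os
import urllib.request
import uuid

ENDPOINT = "https://api.sciforium.com/v1/audio/transcriptions"


def build_transcription_request(audio_path, model, token, response_format="json"):
    """Build (but do not send) a multipart/form-data POST mirroring the curl example."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(audio_path)
    # Guess the part's media type from the file name; fall back to a generic type.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    with open(audio_path, "rb") as f:
        audio_bytes = f.read()

    parts = []
    # File part first, matching the order of the -F flags in the curl example.
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n\r\n"
    )
    parts.append(header.encode("utf-8") + audio_bytes + b"\r\n")
    # Plain text fields.
    for name, value in (("model", model), ("response_format", response_format)):
        field = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(field.encode("utf-8"))
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "x-api-key": token,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen(req)` and read the body from the response. The whole file is read into memory here, which is acceptable under the 25 MB limit.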

2. The Response

Response shape depends on response_format and model support.
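
Because the body can be either JSON or plain text depending on the requested format, client code usually has to branch when decoding it. A minimal sketch, assuming the server sets the Content-Type response header to match the body (this page does not confirm that) and that the body is UTF-8; `parse_transcription_body` is a hypothetical helper name.

```python
import json


def parse_transcription_body(content_type, body):
    """Decode a response body: parse it as JSON when declared, otherwise return text."""
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    text = body.decode("utf-8")
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(text)
    # Anything else is returned as-is for the caller to handle.
    return text
```

With `urllib`, the two arguments would come from `resp.headers.get("Content-Type")` and `resp.read()`.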

Common response formats

response_format | Typical response
--- | ---
wav | Standard uncompressed WAV audio; best compatibility with tools/players.
pcm | Raw 16-bit PCM audio bytes (typically 24 kHz mono); best for low-latency streaming pipelines.
opus | Compressed Opus audio; much smaller files with good speech quality.

Example response

response_format=wav

{
  "format": "wav",
  "content_type": "audio/wav",
  "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
}

response_format=pcm

{
  "format": "pcm",
  "content_type": "audio/octet-stream",
  "audio_base64": "kP8A/wD+AP0A/AD7APoA+QD4..."
}

response_format=opus

{
  "format": "opus",
  "content_type": "audio/ogg",
  "audio_base64": "T2dnUwACAAAAAAAAAADY8kQeAAAAA..."
}