The /v1/audio/transcriptions API turns uploaded audio into text. For typical files it behaves as a single request/response: upload the file, get the transcript back.
1. The Request
Method: POST
Endpoint: https://api.sciforium.com/v1/audio/transcriptions
Content-Type: multipart/form-data
Request body parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio to transcribe. Common formats: mp3, mp4, m4a, wav, webm. Max 25 MB (enforced server-side). |
| model | string | Yes | Model ID your deployment supports (see your model list / console). |
| language | string | No | Language hint (ISO-639-1), e.g. en, fr, es. |
| prompt | string | No | Optional hint for style, vocabulary, or domain terms. |
| response_format | string | No | Output shape, e.g. json, text, verbose_json, srt, vtt (depends on model). |
| timestamp_granularities[] | array | No | Timestamp granularity to include, e.g. word or segment. Applies only to some models and formats (e.g. verbose output). |
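As a rough illustration of how the parameters above fit into a multipart/form-data body, here is a minimal Python sketch that builds the request without sending it. It uses only the standard library. The helper name `build_transcription_request`, the `Authorization: Bearer` header, the `your-model-id` placeholder, and the reading of "25 MB" as 25 × 1024 × 1024 bytes are all assumptions for illustration; check your deployment's authentication scheme and limits before relying on them.

```python
import urllib.request
import uuid

ENDPOINT = "https://api.sciforium.com/v1/audio/transcriptions"
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


def build_transcription_request(audio, filename, model, optional=None, api_key=None):
    """Build (without sending) a multipart/form-data POST request."""
    # Client-side guard only; the documented limit is enforced server-side.
    if len(audio) > MAX_BYTES:
        raise ValueError("audio exceeds the 25 MB upload limit")

    boundary = uuid.uuid4().hex
    chunks = []

    def add_field(name, value):
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append((header + f"{value}\r\n").encode("utf-8"))

    add_field("model", model)
    for name, value in (optional or {}).items():
        # Array parameters such as timestamp_granularities[] are sent
        # as one repeated form field per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            add_field(name, item)

    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + audio + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        # Assumption: bearer-token auth; the source does not specify a scheme.
        headers["Authorization"] = f"Bearer {api_key}"

    return urllib.request.Request(
        ENDPOINT, data=b"".join(chunks), headers=headers, method="POST"
    )


req = build_transcription_request(
    b"placeholder audio bytes",
    "meeting.wav",
    "your-model-id",  # hypothetical model ID
    optional={"language": "en", "response_format": "json"},
    api_key="YOUR_API_KEY",
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     raw_body = resp.read()
```

Building the body by hand keeps the sketch dependency-free; an HTTP client library with built-in multipart support would shorten it considerably.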
Example request (CURL)
2. The Response
Response shape depends on response_format and model support.
Common response formats
| response_format | Typical response |
|---|---|
| json | JSON object containing the transcript text. |
| text | Plain-text transcript. |
| verbose_json | JSON object with the transcript plus additional metadata, such as segment- or word-level timestamps where the model supports them. |
| srt | Transcript as SubRip (.srt) subtitle text with timed cues. |
| vtt | Transcript as WebVTT (.vtt) subtitle text with timed cues. |
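Because the body's shape follows the response_format value sent in the request (json, text, verbose_json, srt, vtt), client code usually branches on that value when reading the response. The sketch below shows one way to do this in Python. The helper name `extract_transcript` is hypothetical, and the assumption that JSON responses carry the transcript in a top-level `text` field is not confirmed by this page; verify it against an actual response from your deployment.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(body: bytes, response_format: str) -> str:
    """Return the transcript (or subtitle text) from a raw response body."""
    if response_format in JSON_FORMATS:
        payload = json.loads(body.decode("utf-8"))
        # Assumption: the transcript lives under a top-level "text" key.
        return payload["text"]
    if response_format in TEXT_FORMATS:
        # Plain text, SRT, and WebVTT are returned as-is.
        return body.decode("utf-8")
    raise ValueError(f"unsupported response_format: {response_format!r}")


print(extract_transcript(b'{"text": "hello world"}', "json"))  # prints: hello world
```

For srt and vtt the function returns the full subtitle document, cue timings included, rather than the bare transcript.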