POST Endpoint:
https://api.sciforium.com/v1/chat/completions
Overview
The /v1/chat/completions endpoint generates conversational responses from a sequence of messages and generation parameters.
Request Body Parameters
The API expects a JSON request body. The model and messages fields are required; all other parameters are optional.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model ID to use (e.g., deepseek-r1-distill-llama-8b, qwen-2.5-7b, gpt-oss-120b). |
| messages | array | Yes | Array of message objects with role and content. |
| temperature | number | No | Sampling temperature (0 to 2). |
| top_p | number | No | Nucleus sampling probability (0 to 1). |
| max_tokens | number | No | Maximum number of output tokens. |
| max_completion_tokens | number | No | Alternative maximum-output-token field (for compatibility). |
| presence_penalty | number | No | Presence penalty (-2 to 2). |
| frequency_penalty | number | No | Frequency penalty (-2 to 2). |
| seed | integer | No | Seed for deterministic sampling. |
| stop | string or string[] | No | Stop sequence(s). |
| n | integer | No | Number of completions to generate. |
| stream | boolean | No | When true, stream tokens as server-sent events (SSE). |
| stream_options | object | No | Streaming options, e.g. { "include_usage": true }. |
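To make the parameters concrete, here is a minimal sketch of assembling a request body in Python. The parameter names come from the table above; the Bearer-token auth scheme and the helper function name are assumptions for illustration, not part of this document.

```python
import json

# Endpoint from the documentation above.
API_URL = "https://api.sciforium.com/v1/chat/completions"

def build_request(model, messages, **params):
    """Assemble the JSON body: model and messages are required,
    everything else (temperature, max_tokens, stream, ...) is optional."""
    body = {"model": model, "messages": messages}
    body.update(params)  # optional generation parameters
    return body

payload = build_request(
    "qwen-2.5-7b",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=256,
)
# The body would be POSTed as JSON, e.g. with an assumed
# "Authorization: Bearer <key>" header:
print(json.dumps(payload))
```

Any HTTP client can send the resulting JSON; only model and messages are mandatory, so the extra keyword arguments can be dropped freely.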
Example response
The API returns a JSON object containing the completion and usage metadata:
- id: Unique identifier for the completion.
- object: Object type (chat.completion).
- model: Model used for generation.
- choices: Array of completion objects, each with:
  - index: Choice index.
  - message: Assistant message with role and content.
  - finish_reason: Reason the completion ended (stop, length, etc.).
- usage: Token usage statistics including prompt_tokens, completion_tokens, total_tokens, and gpu_seconds.
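The fields above can be read off the parsed JSON directly. The snippet below walks a hand-written example body shaped like the documented schema; the specific values (IDs, token counts, message text) are invented for illustration and are not real API output.

```python
import json

# Fabricated example response matching the documented fields.
raw = """{
  "id": "cmpl-123",
  "object": "chat.completion",
  "model": "qwen-2.5-7b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello there!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 3,
            "total_tokens": 8, "gpu_seconds": 0.12}
}"""

resp = json.loads(raw)
# The assistant's reply lives under choices[i].message.content;
# finish_reason tells you whether generation stopped naturally ("stop")
# or was truncated ("length").
answer = resp["choices"][0]["message"]["content"]
reason = resp["choices"][0]["finish_reason"]
total = resp["usage"]["total_tokens"]
print(answer, reason, total)
```

When n > 1 is requested, choices contains one entry per completion, so iterate over the array rather than indexing choice 0.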