Method: POST
Endpoint: https://api.sciforium.com/v1/chat/completions

Overview

The /v1/chat/completions endpoint generates a conversational response from a sequence of messages, subject to optional generation parameters.

Request Body Parameters

The API expects a JSON body. model and messages are required.
Parameter              Type                Required  Description
model                  string              Yes       The model ID to use (e.g., deepseek-r1-distill-llama-8b, qwen-2.5-7b, gpt-oss-120b).
messages               array               Yes       Array of message objects with role and content.
temperature            number              No        Sampling temperature (0 to 2).
top_p                  number              No        Nucleus sampling probability (0 to 1).
max_tokens             number              No        Maximum output tokens.
max_completion_tokens  number              No        Alternative max-output-token field (for compatibility).
presence_penalty       number              No        Presence penalty (-2 to 2).
frequency_penalty      number              No        Frequency penalty (-2 to 2).
seed                   integer             No        Deterministic sampling seed.
stop                   string or string[]  No        Stop sequence(s).
n                      integer             No        Number of completions to generate.
stream                 boolean             No        Stream tokens as SSE when true.
stream_options         object              No        Streaming options, e.g. { "include_usage": true }.
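As a rough sketch, a non-streaming request can be assembled with only the standard library. The Bearer authorization scheme and the api_key variable are assumptions for illustration; only the endpoint URL and the body fields come from the reference above.

```python
import json
import urllib.request

API_URL = "https://api.sciforium.com/v1/chat/completions"

def build_request(api_key: str) -> urllib.request.Request:
    """Build a POST request for a simple, non-streaming completion."""
    body = {
        "model": "deepseek-r1-distill-llama-8b",  # required
        "messages": [                             # required
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "temperature": 0.7,                       # optional sampling controls
        "max_tokens": 64,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# To send: urllib.request.urlopen(build_request(api_key)), then
# json-decode the response body.
```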

Example response

The API returns a JSON object containing the completion and usage metadata.
  • id: Unique identifier for the completion.
  • object: Object type (chat.completion).
  • model: Model used for generation.
  • choices: Array of completion objects, each with:
    • index: Choice index.
    • message: Assistant message with role and content.
    • finish_reason: Reason for completion (stop, length, etc.).
  • usage: Token usage statistics including prompt_tokens, completion_tokens, total_tokens, and gpu_seconds.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "deepseek-r1-distill-llama-8b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The capital of France is Paris." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33,
    "gpu_seconds": 0.041
  }
}
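Pulling the useful fields out of a non-streaming response is straightforward; this sketch parses the example body above. The parse_completion helper is hypothetical, not part of any SDK.

```python
import json

def parse_completion(body: str) -> dict:
    """Extract the assistant's answer and usage from a response body."""
    resp = json.loads(body)
    choice = resp["choices"][0]  # first (usually only) completion
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
    }

# The example response from above:
example = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "deepseek-r1-distill-llama-8b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "The capital of France is Paris."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 9,
            "total_tokens": 33, "gpu_seconds": 0.041}
}"""
```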
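When stream is true, tokens arrive as SSE lines. The chunk schema is not documented above, so this sketch assumes the common chat-completion streaming shape: incremental text under choices[0].delta.content and a final "data: [DONE]" sentinel.

```python
import json

def collect_stream(lines) -> str:
    """Assemble assistant text from SSE lines of the form 'data: {json}'.

    Assumption: chunks carry incremental text in choices[0].delta.content
    and the stream ends with a 'data: [DONE]' sentinel.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break     # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)
```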