Overview

The parse endpoint extracts text from files: PDF, DOCX, legacy DOC, JSON, plain text, and images (via OCR). Each file is sent as a base64-encoded data URI inside a JSON body, and a single request can carry a batch of files.

Supported file types

| MIME type | Description |
|---|---|
| application/pdf | PDF documents |
| application/vnd.openxmlformats-officedocument.wordprocessingml.document | DOCX files |
| application/msword | Legacy DOC files |
| application/json | JSON files |
| text/* | Any plain text file |
| image/* | Images (OCR extraction) |

Limits

  • Max file size: 20 MB per file
  • Max files per request: 20
  • Max pages (PDF): 50 (configurable via options.max_pages)
  • Request timeout: 110 seconds
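
The limits above can be checked on the client before anything is uploaded. The following is a minimal sketch, not part of the API: the function name `check_limits` is hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes and that the 20 MB limit applies to the raw file bytes rather than to the base64-encoded string (the documentation does not say which).

```python
from typing import List, Tuple

# Limits taken from the list above. The byte interpretation of "20 MB"
# is an assumption; the docs do not define it.
MAX_FILE_BYTES = 20 * 1024 * 1024
MAX_FILES_PER_REQUEST = 20
MAX_PAGES_LIMIT = 50


def check_limits(files: List[Tuple[str, bytes]], max_pages: int = 50) -> List[str]:
    """Return a list of problems found; an empty list means the batch looks OK."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES_PER_REQUEST:
        problems.append(
            f"expected 1..{MAX_FILES_PER_REQUEST} files, got {len(files)}"
        )
    for name, data in files:
        if len(data) > MAX_FILE_BYTES:
            problems.append(f"{name}: {len(data)} bytes exceeds the 20 MB limit")
    if not 1 <= max_pages <= MAX_PAGES_LIMIT:
        problems.append(f"max_pages must be 1..{MAX_PAGES_LIMIT}, got {max_pages}")
    return problems
```

A caller would typically skip the request when the returned list is non-empty and report the messages to the user instead.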

1. The Request

  • Method: POST
  • Endpoint: https://api.sciforium.com/api/attachments/parse
  • Content-Type: application/json

Request fields

| Field | Type | Required | Description |
|---|---|---|---|
| files | array | Yes | 1..20 file objects |
| files[].url | string | Yes | Data URI: data:<mime>;base64,<bytes> |
| files[].filename | string | Yes | File name (max 255 chars) |
| files[].media_type | string | No | MIME type hint |
| options.max_pages | integer | No | 1..50, default 50 |
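
To illustrate how the fields fit together, here is a minimal sketch that turns raw file bytes into the request body described above. The helper names `to_file_object` and `build_payload` are hypothetical, and falling back to `mimetypes.guess_type` (and then to `application/octet-stream`) when no MIME type is given is an assumption of this sketch, not documented behaviour.

```python
import base64
import mimetypes
from typing import Iterable, Tuple


def to_file_object(filename: str, data: bytes, media_type: str = "") -> dict:
    """Build one entry of the `files` array from raw bytes."""
    # Assumption: guess the MIME type from the file name when none is given.
    mime = media_type or mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "url": f"data:{mime};base64,{encoded}",  # data:<mime>;base64,<bytes>
        "filename": filename,
        "media_type": mime,  # optional hint, included here for clarity
    }


def build_payload(files: Iterable[Tuple[str, bytes, str]], max_pages: int = 50) -> dict:
    """Assemble the JSON body: a `files` array plus `options.max_pages`."""
    return {
        "files": [to_file_object(name, data, mime) for name, data, mime in files],
        "options": {"max_pages": max_pages},
    }
```

For example, `build_payload([("invoice.pdf", pdf_bytes, "application/pdf")], max_pages=10)` yields a dictionary that can be serialized with `json.dumps` and sent as the request body.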

Example cURL request (PDF)

curl -X POST "https://api.sciforium.com/api/attachments/parse" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-api-key: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9...",
        "filename": "invoice.pdf",
        "media_type": "application/pdf"
      }
    ],
    "options": {
      "max_pages": 10
    }
  }'
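
The same call can be sketched in Python using only the standard library. This mirrors the headers of the curl example above (both `Authorization` and `x-api-key` carry the same token there). The function names `make_request` and `send` are hypothetical, and the 120-second client timeout is an assumption chosen to sit slightly above the documented 110-second request timeout.

```python
import json
import urllib.request

PARSE_URL = "https://api.sciforium.com/api/attachments/parse"


def make_request(payload: dict, token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        PARSE_URL,
        data=body,
        method="POST",
        headers={
            # Same headers as the curl example.
            "Authorization": f"Bearer {token}",
            "x-api-key": token,
            "Content-Type": "application/json",
        },
    )


def send(payload: dict, token: str, timeout_s: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    # Assumption: wait a little longer than the server's 110 s limit.
    with urllib.request.urlopen(make_request(payload, token), timeout=timeout_s) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect or log the request before any network traffic happens.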

Example response — POST /api/attachments/parse

200 OK
Content-Type: application/json

Success (one file completed)

{
  "id": "parse_7f3c2a1b-9d8e-4f6c-a5b4-3210fedcba98",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "invoice.pdf",
      "status": "completed",
      "content": {
        "text": "Invoice #10248\nDate: 2026-04-01\nTotal: $128.50"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 342
  }
}

Partial success (e.g. page limit reached)

{
  "id": "parse_aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "long-report.pdf",
      "status": "partial",
      "content": {
        "text": "…extracted text for the first N pages…"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 8900
  }
}

Per-file Error

{
  "id": "parse_bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "corrupt.pdf",
      "status": "error",
      "error": {
        "code": "PROCESSING_FAILED",
        "message": "Could not read PDF structure"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 0,
    "failed": 1,
    "total_processing_time_ms": 120
  }
}

Mixed batch (one OK, one failed)

{
  "id": "parse_cccccccc-dddd-eeee-ffff-000000000000",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "notes.txt",
      "status": "completed",
      "content": {
        "text": "Meeting notes\n- Action items…"
      }
    },
    {
      "filename": "unknown.xyz",
      "status": "error",
      "error": {
        "code": "UNSUPPORTED_FORMAT",
        "message": "Unsupported file format"
      }
    }
  ],
  "metadata": {
    "total_files": 2,
    "completed": 1,
    "failed": 1,
    "total_processing_time_ms": 210
  }
}
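
Because a batch can mix successes and failures, a client generally has to inspect each entry of `results` rather than rely on the HTTP status alone. The sketch below sorts a response into extracted text, truncated files, and errors, based only on the `status` values shown in the examples above (`completed`, `partial`, `error`). The function name `split_results` is hypothetical, and keying by `filename` assumes file names are unique within a batch.

```python
from typing import Dict, List, Tuple


def split_results(batch: dict) -> Tuple[Dict[str, str], List[str], Dict[str, str]]:
    """Split a parse.batch_result into (texts, truncated, errors)."""
    texts: Dict[str, str] = {}
    truncated: List[str] = []
    errors: Dict[str, str] = {}
    for item in batch.get("results", []):
        name = item["filename"]
        status = item.get("status")
        if status in ("completed", "partial"):
            texts[name] = item["content"]["text"]
            if status == "partial":
                # Text is present but incomplete, e.g. page limit reached.
                truncated.append(name)
        else:
            # Treat "error" and any unrecognized status as a failure.
            err = item.get("error", {})
            errors[name] = f"{err.get('code')}: {err.get('message')}"
    return texts, truncated, errors
```

Applied to the mixed-batch example, this returns the text of `notes.txt`, an empty truncated list, and an `UNSUPPORTED_FORMAT` entry for `unknown.xyz`.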