
Overview

The parse endpoint extracts text content from uploaded files (PDF, DOCX/DOC, JSON, plain text, and images via OCR). Files are sent inline as base64-encoded data URIs, and several files can be processed together in a single batch request.

Supported file types

MIME type                                                                Description
application/pdf                                                          PDF documents
application/vnd.openxmlformats-officedocument.wordprocessingml.document  DOCX files
application/msword                                                       Legacy DOC files
application/json                                                         JSON files
text/*                                                                   Any plain-text file
image/*                                                                  Images (OCR extraction)
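A client can reject obviously unsupported files before uploading them. The sketch below mirrors the table above; treating `text/*` and `image/*` as prefix matches and ignoring MIME parameters such as `charset` are assumptions about how the server matches types, so the server's own response remains authoritative.

```python
# Client-side sketch of the supported-types table.
# Assumption: "text/*" and "image/*" match any subtype, and parameters
# such as "; charset=utf-8" are ignored when matching.

SUPPORTED_EXACT = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/json",
}
SUPPORTED_PREFIXES = ("text/", "image/")  # from the text/* and image/* rows


def is_supported_mime(mime: str) -> bool:
    """Return True if `mime` matches a row of the supported-types table."""
    # Strip any parameters and normalise case before comparing.
    base = mime.split(";", 1)[0].strip().lower()
    return base in SUPPORTED_EXACT or base.startswith(SUPPORTED_PREFIXES)


print(is_supported_mime("application/pdf"))           # True
print(is_supported_mime("text/markdown"))             # True
print(is_supported_mime("application/octet-stream"))  # False
```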

Limits

  • Max file size: 20 MB per file
  • Max files per request: 20
  • Max pages (PDF): 50 (can be lowered per request via options.max_pages)
  • Request timeout: 110 seconds
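Checking these limits locally lets an oversized request fail fast instead of at the server. This is a minimal sketch under two labelled assumptions: "20 MB" is read conservatively as 20,000,000 bytes, and the limit is applied to the decoded file bytes rather than to the base64 string (base64 inflates data by roughly 4/3, so the encoded payload is larger). The 255-character filename limit comes from the request fields table.

```python
# Pre-flight checks against the documented limits.
# Assumption: "20 MB" = 20,000,000 bytes of decoded file content.
MAX_FILE_BYTES = 20_000_000
MAX_FILES = 20
MAX_PAGES = 50
MAX_FILENAME_CHARS = 255


def check_limits(files: list[tuple[str, bytes]], max_pages: int = MAX_PAGES) -> list[str]:
    """Return human-readable limit violations for (filename, raw_bytes) pairs."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES:
        problems.append(f"expected 1..{MAX_FILES} files, got {len(files)}")
    if not 1 <= max_pages <= MAX_PAGES:
        problems.append(f"options.max_pages must be 1..{MAX_PAGES}, got {max_pages}")
    for name, data in files:
        if len(data) > MAX_FILE_BYTES:
            problems.append(f"{name}: {len(data)} bytes exceeds {MAX_FILE_BYTES}")
        if len(name) > MAX_FILENAME_CHARS:
            problems.append(f"{name[:20]}...: filename longer than {MAX_FILENAME_CHARS} chars")
    return problems


print(check_limits([("invoice.pdf", b"%PDF-1.4 ...")], max_pages=10))  # []
print(check_limits([], max_pages=60))
# ['expected 1..20 files, got 0', 'options.max_pages must be 1..50, got 60']
```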

1. The Request

Method: POST
Endpoint: https://api.sciforium.com/api/attachments/parse
Content-Type: application/json

Request fields

Field               Type      Required  Description
files               array     Yes       1 to 20 file objects
files[].url         string    Yes       Data URI: data:<mime>;base64,<base64-encoded bytes>
files[].filename    string    Yes       File name (max 255 characters)
files[].media_type  string    No        MIME type hint
options.max_pages   integer   No        Maximum PDF pages to process: 1 to 50 (default 50)
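Building the body by hand is mostly a matter of base64-encoding each file into the `data:<mime>;base64,...` form that `files[].url` expects. A minimal sketch using only the standard library; the helper names `to_data_uri` and `build_parse_body` are hypothetical, and always sending `media_type` is a choice of this sketch, since the field is optional.

```python
import base64
import json
from typing import Optional


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw file bytes in the data:<mime>;base64,... form files[].url expects."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_parse_body(
    files: list[tuple[str, bytes, str]], max_pages: Optional[int] = None
) -> dict:
    """Build the JSON body for POST /api/attachments/parse.

    `files` holds (filename, raw_bytes, mime_type) tuples.
    """
    body = {
        "files": [
            {"url": to_data_uri(data, mime), "filename": name, "media_type": mime}
            for name, data, mime in files
        ]
    }
    # Leave "options" out entirely to fall back to the default of 50 pages.
    if max_pages is not None:
        body["options"] = {"max_pages": max_pages}
    return body


body = build_parse_body([("notes.txt", b"Meeting notes", "text/plain")], max_pages=10)
print(body["files"][0]["url"])  # data:text/plain;base64,TWVldGluZyBub3Rlcw==
payload = json.dumps(body)      # string to send as the request body
```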

Example cURL request (PDF)

curl -X POST "https://api.sciforium.com/api/attachments/parse" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-api-key: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9...",
        "filename": "invoice.pdf",
        "media_type": "application/pdf"
      }
    ],
    "options": {
      "max_pages": 10
    }
  }'
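A rough Python equivalent of the curl call, using only the standard library's `urllib.request`. It copies the example's headers as-is, including sending the token both as `Authorization: Bearer` and as `x-api-key`; these docs don't say whether both are required. The 120-second client timeout is an assumed margin over the documented 110-second request timeout, and the function names are hypothetical.

```python
import json
import urllib.request

PARSE_URL = "https://api.sciforium.com/api/attachments/parse"


def make_parse_request(body: dict, token: str) -> urllib.request.Request:
    """Build (without sending) a POST that mirrors the curl example."""
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Same two auth headers as the curl example; whether both are
            # required is not stated in these docs.
            "Authorization": f"Bearer {token}",
            "x-api-key": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_parse_request(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON batch result.

    Error HTTP statuses raise urllib.error.HTTPError.
    """
    # Assumed client timeout: a small margin over the documented 110 s limit.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

Usage: `send_parse_request(make_parse_request(body, os.environ["TOKEN"]))`, where `body` is a dict shaped like the JSON passed to `-d` above.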

Example response — POST /api/attachments/parse

200 OK
Content-Type: application/json

Success (one file completed)

{
  "id": "parse_7f3c2a1b-9d8e-4f6c-a5b4-3210fedcba98",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "invoice.pdf",
      "status": "completed",
      "content": {
        "text": "Invoice #10248\nDate: 2026-04-01\nTotal: $128.50"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 342
  }
}
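Each entry in `results` carries the original `filename`, a `status`, and, when text was extracted, a `content.text` string. A small helper to pull the text out might look like this; it returns a list of pairs rather than a dict because these docs don't say filenames within a batch must be unique (an assumption worth checking). `extracted_texts` is a hypothetical name.

```python
def extracted_texts(response: dict) -> list[tuple[str, str]]:
    """Return (filename, text) pairs for every result that has content."""
    return [
        (r["filename"], r["content"]["text"])
        for r in response.get("results", [])
        if "content" in r  # failed files carry an "error" object instead
    ]


success = {
    "results": [
        {
            "filename": "invoice.pdf",
            "status": "completed",
            "content": {"text": "Invoice #10248\nDate: 2026-04-01\nTotal: $128.50"},
        }
    ]
}
for name, text in extracted_texts(success):
    print(name, text.splitlines()[0])  # invoice.pdf Invoice #10248
```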

Partial success (e.g. page limit reached)

{
  "id": "parse_aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "long-report.pdf",
      "status": "partial",
      "content": {
        "text": "…extracted text for the first N pages…"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 8900
  }
}
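In this example the truncated file has `status` "partial" yet is still counted under `metadata.completed`, and `failed` stays 0, so the metadata alone does not reveal that some pages were skipped. A sketch that flags such files by inspecting each result instead (`truncated_files` is a hypothetical helper):

```python
def truncated_files(response: dict) -> list[str]:
    """Filenames whose results are only partial (e.g. cut off at max_pages)."""
    return [
        r["filename"]
        for r in response.get("results", [])
        if r.get("status") == "partial"
    ]


partial_example = {
    "results": [
        {"filename": "long-report.pdf", "status": "partial", "content": {"text": "..."}}
    ],
    "metadata": {"total_files": 1, "completed": 1, "failed": 0},
}
print(partial_example["metadata"]["failed"])  # 0: nothing reported as failed
print(truncated_files(partial_example))       # ['long-report.pdf']
```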

Per-file error

{
  "id": "parse_bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "corrupt.pdf",
      "status": "error",
      "error": {
        "code": "PROCESSING_FAILED",
        "message": "Could not read PDF structure"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 0,
    "failed": 1,
    "total_processing_time_ms": 120
  }
}
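In these examples a file that cannot be processed does not fail the whole request: it appears in `results` with `status` "error" and an `error` object holding `code` and `message`. The codes shown on this page are `PROCESSING_FAILED` and `UNSUPPORTED_FORMAT`; since the full set isn't listed here, the sketch treats codes as opaque strings. The `FileError` type and the `"UNKNOWN"` fallback are hypothetical client-side choices, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class FileError:
    """Hypothetical client-side record of one failed file."""
    filename: str
    code: str
    message: str


def file_errors(response: dict) -> list[FileError]:
    """Collect every result whose status is "error"."""
    errors = []
    for r in response.get("results", []):
        if r.get("status") != "error":
            continue
        err = r.get("error") or {}
        # "UNKNOWN" is a local placeholder, not a documented API code.
        errors.append(
            FileError(r.get("filename", ""), err.get("code", "UNKNOWN"), err.get("message", ""))
        )
    return errors


corrupt = {
    "results": [
        {
            "filename": "corrupt.pdf",
            "status": "error",
            "error": {"code": "PROCESSING_FAILED", "message": "Could not read PDF structure"},
        }
    ]
}
for e in file_errors(corrupt):
    print(f"{e.filename}: {e.code} ({e.message})")
# corrupt.pdf: PROCESSING_FAILED (Could not read PDF structure)
```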

Mixed batch (one completed, one failed)

{
  "id": "parse_cccccccc-dddd-eeee-ffff-000000000000",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "notes.txt",
      "status": "completed",
      "content": {
        "text": "Meeting notes\n- Action items…"
      }
    },
    {
      "filename": "unknown.xyz",
      "status": "error",
      "error": {
        "code": "UNSUPPORTED_FORMAT",
        "message": "Unsupported file format"
      }
    }
  ],
  "metadata": {
    "total_files": 2,
    "completed": 1,
    "failed": 1,
    "total_processing_time_ms": 210
  }
}
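Because a batch can mix outcomes, each file has to be handled on its own. The helper below tallies result statuses and cross-checks them against `metadata`. Counting `partial` results toward `metadata.completed` follows the partial-success example on this page; treat the consistency check as a client-side sanity check under that assumption, not a documented guarantee. `summarize` is a hypothetical name.

```python
from collections import Counter


def summarize(response: dict) -> dict:
    """Tally result statuses and compare them with the reported metadata."""
    results = response.get("results", [])
    counts = Counter(r.get("status") for r in results)
    meta = response.get("metadata", {})
    # Assumption from the examples: "partial" is counted under metadata.completed.
    ok = counts["completed"] + counts["partial"]
    consistent = (
        meta.get("total_files") == len(results)
        and meta.get("completed") == ok
        and meta.get("failed") == counts["error"]
    )
    return {"counts": dict(counts), "consistent": consistent}


mixed = {
    "results": [
        {"filename": "notes.txt", "status": "completed", "content": {"text": "Meeting notes"}},
        {
            "filename": "unknown.xyz",
            "status": "error",
            "error": {"code": "UNSUPPORTED_FORMAT", "message": "Unsupported file format"},
        },
    ],
    "metadata": {"total_files": 2, "completed": 1, "failed": 1, "total_processing_time_ms": 210},
}
print(summarize(mixed))  # {'counts': {'completed': 1, 'error': 1}, 'consistent': True}
```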