Attachment Parser API

Overview

The parse endpoint extracts text content from files: PDF, DOCX, legacy DOC, JSON, plain text, and images (via OCR). Files are sent as base64-encoded data URIs in a JSON body, and each request is processed as a batch of up to 20 files.

Supported file types

MIME type	Description
`application/pdf`	PDF documents
`application/vnd.openxmlformats-officedocument.wordprocessingml.document`	DOCX files
`application/msword`	Legacy DOC files
`application/json`	JSON files
`text/*`	Any plain text file
`image/*`	Images (OCR extraction)
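
A client can screen files against this table before uploading. The following is a minimal Python sketch of that check. `SUPPORTED_PATTERNS` and `is_supported` are illustrative names, not part of the API, and treating `text/*` and `image/*` as shell-style wildcards is an assumption about how the table is meant to be read.

```python
from fnmatch import fnmatchcase

# Patterns copied from the table above; the list name is hypothetical.
SUPPORTED_PATTERNS = [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/json",
    "text/*",
    "image/*",
]


def is_supported(mime_type):
    """Return True if mime_type matches one of the documented patterns."""
    mime = mime_type.strip().lower()
    return any(fnmatchcase(mime, pattern) for pattern in SUPPORTED_PATTERNS)
```

The server remains the authority on what it accepts; this only avoids uploading files that are clearly outside the documented list.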

Limits

Max file size: 20 MB per file
Max files per request: 20
Max pages (PDF): 50 (can be lowered via `options.max_pages`)
Request timeout: 110 seconds
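
These limits can be checked on the client before a request is sent. The sketch below is one possible pre-flight check in Python. The constant names and `check_limits` are hypothetical, and two points are assumptions because this page does not state them: that "20 MB" means 20 * 1024 * 1024 bytes, and that the limit applies to the raw file size rather than the base64-encoded size.

```python
MAX_FILES = 20                     # "Max files per request"
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_PAGES = 50                     # upper bound for options.max_pages


def check_limits(files, max_pages=None):
    """Return a list of human-readable problems; an empty list means OK.

    files is a list of (filename, raw_bytes) tuples.
    """
    problems = []
    if not 1 <= len(files) <= MAX_FILES:
        problems.append(f"expected 1 to {MAX_FILES} files, got {len(files)}")
    for name, data in files:
        # Assumption: the size limit is measured on the raw (unencoded) bytes.
        if len(data) > MAX_FILE_BYTES:
            problems.append(f"{name}: {len(data)} bytes exceeds the 20 MB limit")
    if max_pages is not None and not 1 <= max_pages <= MAX_PAGES:
        problems.append(f"max_pages must be between 1 and {MAX_PAGES}")
    return problems
```

Base64 encoding grows the payload by roughly one third, so a file close to the limit may still be rejected if the server measures the encoded size.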

1. The Request

Method: POST
Endpoint: https://api.sciforium.com/api/attachments/parse
Content-Type: application/json

Request fields

Field	Type	Required	Description
`files`	array	Yes	Array of 1 to 20 file objects
`files[].url`	string	Yes	Data URI: `data:<mime>;base64,<data>`, where `<data>` is the base64-encoded file content
`files[].filename`	string	Yes	File name (max 255 chars)
`files[].media_type`	string	No	MIME type hint
`options.max_pages`	integer	No	`1..50`, default `50`
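
The fields above can be assembled from raw file bytes as in the following Python sketch. `file_object` and `build_payload` are hypothetical helper names, and guessing the MIME type from the file extension is a convenience assumed here, not something the API requires.

```python
import base64
import mimetypes


def file_object(filename, data, media_type=None):
    """Build one entry of the `files` array from raw bytes."""
    mime = media_type or mimetypes.guess_type(filename)[0]
    if mime is None:
        raise ValueError(f"cannot determine a MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "url": f"data:{mime};base64,{encoded}",
        "filename": filename,
        "media_type": mime,  # optional hint, per the table above
    }


def build_payload(files, max_pages=None):
    """files is a list of (filename, raw_bytes) tuples."""
    payload = {"files": [file_object(name, data) for name, data in files]}
    if max_pages is not None:
        payload["options"] = {"max_pages": max_pages}
    return payload
```

Leaving `max_pages` as `None` omits the `options` object, so the documented default of `50` applies.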

Example curl request (PDF)

curl -X POST "https://api.sciforium.com/api/attachments/parse" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-api-key: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9...",
        "filename": "invoice.pdf",
        "media_type": "application/pdf"
      }
    ],
    "options": {
      "max_pages": 10
    }
  }'
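
For callers who prefer Python to curl, a roughly equivalent request can be built with the standard library, as sketched below. The headers mirror the curl example; `make_request` and `send` are hypothetical names, and the 120-second client timeout is an assumption chosen to sit just above the documented 110-second request timeout.

```python
import json
import urllib.request

ENDPOINT = "https://api.sciforium.com/api/attachments/parse"


def make_request(payload, token):
    """Build (but do not send) the POST request for a parse payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            # Both headers carry the same token, as in the curl example.
            "Authorization": f"Bearer {token}",
            "x-api-key": token,
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=120):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `send(make_request(payload, token))`, where `payload` follows the request fields table. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, which the caller may want to catch.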

Example response — `POST /api/attachments/parse`

200 OK — Content-Type: application/json

Success (one file completed)

{
  "id": "parse_7f3c2a1b-9d8e-4f6c-a5b4-3210fedcba98",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "invoice.pdf",
      "status": "completed",
      "content": {
        "text": "Invoice #10248\nDate: 2026-04-01\nTotal: $128.50"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 342
  }
}

Partial success (e.g. page limit)

{
  "id": "parse_aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "long-report.pdf",
      "status": "partial",
      "content": {
        "text": "…extracted text for the first N pages…"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 8900
  }
}

Per-file error

{
  "id": "parse_bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "corrupt.pdf",
      "status": "error",
      "error": {
        "code": "PROCESSING_FAILED",
        "message": "Could not read PDF structure"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 0,
    "failed": 1,
    "total_processing_time_ms": 120
  }
}

Mixed batch (one OK, one failed)

{
  "id": "parse_cccccccc-dddd-eeee-ffff-000000000000",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "notes.txt",
      "status": "completed",
      "content": {
        "text": "Meeting notes\n- Action items…"
      }
    },
    {
      "filename": "unknown.xyz",
      "status": "error",
      "error": {
        "code": "UNSUPPORTED_FORMAT",
        "message": "Unsupported file format"
      }
    }
  ],
  "metadata": {
    "total_files": 2,
    "completed": 1,
    "failed": 1,
    "total_processing_time_ms": 210
  }
}
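
Because a 200 response can mix successful, partial, and failed files, clients generally need to inspect each entry in `results`. The Python sketch below groups entries by the three `status` values shown in the examples above. `group_results` is a hypothetical helper, and any status value other than those three is ignored here, since this page does not list others.

```python
def group_results(response):
    """Group per-file results by status.

    Returns a dict with keys "completed", "partial", and "error".
    Text results are (filename, text) tuples; errors are
    (filename, code, message) tuples.
    """
    grouped = {"completed": [], "partial": [], "error": []}
    for item in response.get("results", []):
        status = item.get("status")
        name = item.get("filename")
        if status in ("completed", "partial"):
            text = item.get("content", {}).get("text", "")
            grouped[status].append((name, text))
        elif status == "error":
            err = item.get("error", {})
            grouped["error"].append((name, err.get("code"), err.get("message")))
    return grouped
```

Applied to the mixed batch example, this yields one `completed` entry for `notes.txt` and one `error` entry for `unknown.xyz` with code `UNSUPPORTED_FORMAT`. Lists are used rather than dicts keyed by filename so that two files with the same name in one batch do not overwrite each other.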
