Overview
The parse endpoint extracts text content from files (PDFs, images, DOCX, text files). Files are sent as base64-encoded payloads and processed in batch.
Supported file types
| MIME type | Description |
|---|---|
| application/pdf | PDF documents |
| application/vnd.openxmlformats-officedocument.wordprocessingml.document | DOCX files |
| application/msword | Legacy DOC files |
| application/json | JSON files |
| text/* | Any plain text file |
| image/* | Images (OCR extraction) |
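A client can check a file's MIME type against the table above before uploading. The following is a minimal sketch, not part of the API: the names `SUPPORTED_MIME_PATTERNS` and `is_supported_mime` are hypothetical, and it assumes that the wildcard rows (`text/*`, `image/*`) match any subtype.

```python
from fnmatch import fnmatchcase

# Patterns copied from the supported file types table.
# Assumption: "text/*" and "image/*" match any subtype.
SUPPORTED_MIME_PATTERNS = [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/json",
    "text/*",
    "image/*",
]


def is_supported_mime(mime_type: str) -> bool:
    """Return True if mime_type matches one of the documented patterns."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    return any(fnmatchcase(normalized, p) for p in SUPPORTED_MIME_PATTERNS)
```

The server remains the authority on what it accepts; a local check like this only avoids sending requests that are likely to come back with an unsupported-format error.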
Limits
- Max file size: 20 MB per file
- Max files per request: 20
- Max pages (PDF): 50 (default; lower it via options.max_pages, range 1..50)
- Request timeout: 110 seconds
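The limits above can be checked on the client before a request is built. This is a rough sketch under stated assumptions: `check_limits` is a hypothetical helper, 1 MB is taken to be 1024 * 1024 bytes, and the size limit is assumed to apply to the raw file bytes rather than to the base64-encoded payload. The documentation does not specify either point.

```python
# Values from the Limits list.
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILES_PER_REQUEST = 20
MAX_PAGES_LIMIT = 50


def check_limits(file_sizes: list[int], max_pages: int = 50) -> list[str]:
    """Return human-readable problems; an empty list means no limit is hit."""
    problems = []
    if not 1 <= len(file_sizes) <= MAX_FILES_PER_REQUEST:
        problems.append(
            f"expected 1..{MAX_FILES_PER_REQUEST} files, got {len(file_sizes)}"
        )
    for index, size in enumerate(file_sizes):
        # Assumption: the limit is measured on raw bytes, before base64.
        if size > MAX_FILE_BYTES:
            problems.append(f"file {index} is {size} bytes, over the 20 MB limit")
    if not 1 <= max_pages <= MAX_PAGES_LIMIT:
        problems.append(f"max_pages must be 1..{MAX_PAGES_LIMIT}, got {max_pages}")
    return problems
```

Base64 encoding grows the payload by roughly one third, so a batch of large files can produce a request body much bigger than the sum of the file sizes.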
1. The Request
Method: POST
Endpoint: https://api.sciforium.com/api/attachments/parse
Content-Type: application/json
Request fields
| Field | Type | Required | Description |
|---|---|---|---|
| files | array | Yes | 1..20 file objects |
| files[].url | string | Yes | Data URI: data:<mime>;base64,<bytes> |
| files[].filename | string | Yes | File name (max 255 chars) |
| files[].media_type | string | No | MIME type hint |
| options.max_pages | integer | No | 1..50, default 50 |
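The request body can be assembled from raw bytes with the standard library. The sketch below is illustrative only: `to_data_uri` and `build_parse_payload` are hypothetical helper names, and the field names are taken from the request fields table.

```python
import base64


def to_data_uri(data: bytes, media_type: str) -> str:
    """Encode raw bytes as data:<mime>;base64,<bytes>."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def build_parse_payload(files, max_pages=None) -> dict:
    """Build the JSON body from (filename, media_type, raw_bytes) tuples."""
    payload = {
        "files": [
            {
                "url": to_data_uri(data, media_type),
                "filename": filename,
                "media_type": media_type,  # optional hint per the table
            }
            for filename, media_type, data in files
        ]
    }
    # options.max_pages is optional; omit it to use the default of 50.
    if max_pages is not None:
        payload["options"] = {"max_pages": max_pages}
    return payload
```

For example, `build_parse_payload([("notes.txt", "text/plain", b"hello")])` yields a single file object whose `url` is `data:text/plain;base64,aGVsbG8=`.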
Example cURL request (PDF)
curl -X POST "https://api.sciforium.com/api/attachments/parse" \
  -H "Authorization: Bearer $TOKEN" \
  -H "x-api-key: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "url": "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9...",
        "filename": "invoice.pdf",
        "media_type": "application/pdf"
      }
    ],
    "options": {
      "max_pages": 10
    }
  }'
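The same request can be sketched in Python with only the standard library. The helper names `make_parse_request` and `send_parse_request` are hypothetical. The headers mirror the cURL example, including sending the same token in both `Authorization` and `x-api-key`. The 120-second client timeout is an assumption chosen to sit just above the documented 110-second request timeout, and `urlopen` applies it to blocking socket operations rather than to the whole exchange.

```python
import json
import urllib.request

PARSE_URL = "https://api.sciforium.com/api/attachments/parse"


def make_parse_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Both headers carry the same token, as in the cURL example.
            "Authorization": f"Bearer {token}",
            "x-api-key": token,
            "Content-Type": "application/json",
        },
    )


def send_parse_request(payload: dict, token: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = make_parse_request(payload, token)
    # Assumption: a client timeout slightly above the server's 110 s limit.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Non-2xx responses raise `urllib.error.HTTPError` from `urlopen`, so callers that need the error body should catch that exception and read it from there.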
Example response — POST /api/attachments/parse
200 OK — Content-Type: application/json
Success (one file completed)
{
  "id": "parse_7f3c2a1b-9d8e-4f6c-a5b4-3210fedcba98",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "invoice.pdf",
      "status": "completed",
      "content": {
        "text": "Invoice #10248\nDate: 2026-04-01\nTotal: $128.50"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 342
  }
}
Partial success (e.g. page limit reached)
{
  "id": "parse_aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "long-report.pdf",
      "status": "partial",
      "content": {
        "text": "…extracted text for the first N pages…"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 1,
    "failed": 0,
    "total_processing_time_ms": 8900
  }
}
Per-file error
{
  "id": "parse_bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "corrupt.pdf",
      "status": "error",
      "error": {
        "code": "PROCESSING_FAILED",
        "message": "Could not read PDF structure"
      }
    }
  ],
  "metadata": {
    "total_files": 1,
    "completed": 0,
    "failed": 1,
    "total_processing_time_ms": 120
  }
}
Mixed batch (one OK, one failed)
{
  "id": "parse_cccccccc-dddd-eeee-ffff-000000000000",
  "object": "parse.batch_result",
  "results": [
    {
      "filename": "notes.txt",
      "status": "completed",
      "content": {
        "text": "Meeting notes\n- Action items…"
      }
    },
    {
      "filename": "unknown.xyz",
      "status": "error",
      "error": {
        "code": "UNSUPPORTED_FORMAT",
        "message": "Unsupported file format"
      }
    }
  ],
  "metadata": {
    "total_files": 2,
    "completed": 1,
    "failed": 1,
    "total_processing_time_ms": 210
  }
}
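Because a 200 OK response can still contain per-file failures, callers need to inspect each entry in `results`. The sketch below shows one way to separate extracted text from errors. `split_results` is a hypothetical helper, and it assumes that only the three status values shown in the examples (`completed`, `partial`, `error`) occur and that entries with `partial` status carry usable text.

```python
def split_results(batch: dict):
    """Split a parse.batch_result into extracted texts and per-file errors.

    Returns (texts, errors):
      texts  - list of (filename, text, is_partial) tuples
      errors - list of (filename, code, message) tuples
    """
    texts = []
    errors = []
    for item in batch.get("results", []):
        status = item.get("status")
        if status in ("completed", "partial"):
            # Assumption: partial results still include content.text.
            texts.append(
                (item["filename"], item["content"]["text"], status == "partial")
            )
        else:
            error = item.get("error", {})
            errors.append(
                (item["filename"], error.get("code"), error.get("message"))
            )
    return texts, errors
```

Applied to the mixed batch above, this returns one text entry for notes.txt and one error entry for unknown.xyz with code `UNSUPPORTED_FORMAT`.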