What is Docling Agent for Agentic Document Operations?

Apr 16, 2026

The Problem

I needed to build a document processing pipeline for invoice extraction. The traditional approach looked like this:

PDF Input → Parse with fixed rules → Regex for fields → Template matching
           → Manual edge case handling → Separate tool for each task
           → Fragile, breaks when format changes

Every time a vendor changed their invoice format, my extraction rules broke. Adding new document types meant writing new parsers from scratch. Editing documents required separate tools. Generating reports needed template systems.

The Reddit community confirmed this frustration:

- Hardcoded extraction rules that break on format changes
- Template-based generation that lacks flexibility
- Manual editing workflows with no automation
- Separate tools for each task (extract, edit, generate)
- No unified document representation across operations

Then I found Docling Agent. It promised something different: natural language-driven document operations.

What is Docling Agent?

Docling Agent is an AI-powered framework that enables agentic document operations. Instead of fixed rules, you use natural language prompts to tell an AI agent what to do with documents.

TRADITIONAL:
  "Extract invoice_number using regex pattern [A-Z]{2}-[0-9]{6}"
  → Breaks when vendor changes format to INV-123456

AGENTIC:
  "Extract the invoice number from this document"
  → Agent finds it regardless of format, position, or style
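
The brittleness of the traditional rule is easy to reproduce with the standard library alone. The sketch below uses the pattern quoted above; the function names and sample strings are made up for illustration. A strict match simply fails on the new format, while a looser search is arguably worse, because it returns a truncated value without complaining.

```python
import re
from typing import Optional

# The pattern quoted above: two capital letters, a dash, six digits.
OLD_PATTERN = re.compile(r"[A-Z]{2}-[0-9]{6}")


def strict_invoice_number(field: str) -> Optional[str]:
    """Return the field only if the whole value matches the old format."""
    value = field.strip()
    return value if OLD_PATTERN.fullmatch(value) else None


def loose_invoice_number(text: str) -> Optional[str]:
    """Return the first substring that matches the old format, if any."""
    match = OLD_PATTERN.search(text)
    return match.group() if match else None


print(strict_invoice_number("AB-123456"))          # AB-123456
print(strict_invoice_number("INV-123456"))         # None
print(loose_invoice_number("Invoice INV-123456"))  # NV-123456 (silently wrong)
```

Both failure modes need a code change every time a vendor tweaks the format, which is the maintenance cost the agentic approach tries to avoid.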

The key innovation: DoclingDocument, a unified format that all operations share.

The DoclingDocument Format

Before understanding the agents, I needed to understand DoclingDocument. It’s the foundation.

┌─────────────────────────────────────────────────────────────┐
│                    DoclingDocument                          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │   Headers   │  │    Text     │  │   Tables    │        │
│  │  (h1-h6)    │  │ (paragraphs)│  │ (with cells)│        │
│  └─────────────┘  └─────────────┘  └─────────────┘        │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │
│  │   Pictures  │  │    Lists    │  │   Footnotes │        │
│  │  (images)   │  │ (ordered/   │  │             │        │
│  │             │  │  unordered) │  │             │        │
│  └─────────────┘  └─────────────┘  └─────────────┘        │
│                                                             │
│  Serialization: JSON | Export: Markdown, HTML, JSON        │
└─────────────────────────────────────────────────────────────┘

This unified format means:

+ Preserves hierarchy and structure
+ All operations work on the same representation
+ Can save to multiple formats from one source
+ Agent operations don't lose document context

The basic usage:

from docling_core.types.doc import DoclingDocument

# Load existing document
doc = DoclingDocument.load_from_json("document.json")

# Iterate through elements
for element, level in doc.iterate_items():
    # element can be: text, table, picture, header, list
    print(f"Level {level}: {element}")

# Export to multiple formats
doc.save_as_html("output.html")
doc.save_as_markdown("output.md")
doc.save_as_json("output.json")
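
To make the "one representation, many exports" idea concrete, here is a toy model written with the standard library only. It is not the real DoclingDocument: `Item`, `MiniDoc`, and their methods are hypothetical names invented for this sketch, and the real format is far richer. The point is only that once content and hierarchy live in one structure, every export is a different view of the same data.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Item:
    """One document element (hypothetical, much simpler than Docling's)."""
    kind: str       # "header" or "text" in this toy model
    text: str
    level: int = 0  # nesting depth, similar in spirit to iterate_items()


@dataclass
class MiniDoc:
    """A toy unified document: a flat list of items with levels."""
    items: List[Item] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = []
        for item in self.items:
            if item.kind == "header":
                # Deeper headers get more '#' characters.
                lines.append("#" * (item.level + 1) + " " + item.text)
            else:
                lines.append(item.text)
        return "\n\n".join(lines)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


doc = MiniDoc([Item("header", "Invoice", 0), Item("text", "Total: 42", 1)])
print(doc.to_markdown())
print(doc.to_json())
```

Because both exporters read the same `items` list, adding a third format would not require touching how the document is built.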

Four Agent Types

Docling Agent provides four specialized agents. Each handles a different document operation.

1. DoclingWritingAgent

Creates new documents from natural language prompts.

Natural Language Prompt → Agent interprets intent → Generates DoclingDocument
                         → Structured content with hierarchy → Export to any format

Example use:

from docling_agent import DoclingWritingAgent

agent = DoclingWritingAgent(model_id="granite-7b")

result = agent.run(
    prompt="Create a quarterly sales report with sections for revenue, costs, and projections"
)

# Export the generated document
result.document.save_as_markdown("report.md")

When to use: Generate reports from scratch, create documentation templates, produce structured content from outlines.

2. DoclingEditingAgent

Applies targeted modifications to existing documents.

Existing DoclingDocument + Natural Language Task → Agent identifies targets
                                                  → Applies modifications
                                                  → Returns modified document

Example:

from docling_agent import DoclingEditingAgent
from docling_core.types.doc import DoclingDocument

doc = DoclingDocument.load_from_json("report.json")

agent = DoclingEditingAgent()
result = agent.run(
    document=doc,
    task="Add a summary section at the beginning and improve table formatting"
)

result.document.save_as_markdown("report_improved.md")

When to use: Refine table structures, add missing sections, fix formatting inconsistencies.

3. DoclingExtractingAgent

Extracts structured data using schema definitions.

PDF/Image → Convert to DoclingDocument → Define schema with field types
          → Agent extracts matching fields → Returns typed data

Example:

from docling_agent import DoclingExtractingAgent
from docling.document_converter import DocumentConverter

# Convert PDF first
converter = DocumentConverter()
doc = converter.convert("invoice.pdf").document

# Define what to extract
schema = {
    "invoice_number": "string",
    "vendor": "string",
    "total": "number",
    "date": "date",
    "line_items": "array"
}

# Extract
agent = DoclingExtractingAgent()
extracted = agent.run(document=doc, schema=schema)

print(f"Invoice #{extracted.invoice_number}")
print(f"Total: {extracted.total}")

When to use: Invoice data extraction, resume parsing, form field extraction, research paper metadata.
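
Since the values come from a model rather than a fixed rule, it seems prudent to check them against the schema before using them downstream. The sketch below is a plain-Python validator for the type names used in the schema above. It is not part of Docling Agent: `validate` and `CHECKS` are hypothetical helpers, and it assumes the extracted result can be turned into a dictionary and that dates arrive as ISO strings such as "2026-04-16".

```python
from datetime import date
from typing import Any, Callable, Dict, List


def _is_iso_date(value: Any) -> bool:
    """True if value is a string like '2026-04-16'."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# One check per type name used in the article's schema.
CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "date": _is_iso_date,
    "array": lambda v: isinstance(v, list),
}


def validate(data: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the data fits."""
    problems = []
    for name, type_name in schema.items():
        if name not in data:
            problems.append(f"{name}: missing")
        elif not CHECKS[type_name](data[name]):
            problems.append(f"{name}: expected {type_name}")
    return problems


schema = {"invoice_number": "string", "total": "number", "date": "date"}
print(validate({"invoice_number": "INV-123456", "total": "1,200.00"}, schema))
```

A non-empty result is a signal to retry the extraction or route the document to manual review instead of writing bad data to a database.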

4. DoclingEnrichingAgent

Adds metadata and annotations to documents.

DoclingDocument → Agent analyzes content → Adds summaries, keywords, entities
                → Returns enriched document → Search-ready, classified

Example:

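from docling_agent import DoclingEnrichingAgent
from docling_core.types.doc import DoclingDocument

# Load a previously converted paper ("paper.json" is a placeholder path)
research_paper_doc = DoclingDocument.load_from_json("paper.json")

agent = DoclingEnrichingAgent()
result = agent.run(
    document=research_paper_doc,
    tasks=["summarize", "extract_keywords", "identify_entities", "classify_sections"]
)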

# Now the document has:
# - Summary in metadata
# - Search keywords attached
# - Key entities identified
# - Sections classified by type

When to use: Add document summaries, generate search keywords, identify key entities, classify content.

Why Local Execution Matters

One of the key advantages: it can run completely locally.

CLOUD PROCESSING:
  Document → Upload to API → Processed on remote servers → Download result
           → Data leaves your infrastructure → Privacy concerns → API costs

LOCAL PROCESSING:
  Document → Process on your machine → Result stays local
           → Data never leaves → Privacy preserved → No API costs

This matters for:

+ Privacy-sensitive documents (contracts, medical records)
+ Air-gapped environments (military, healthcare)
+ Compliance with data regulations (GDPR, HIPAA)
+ No per-document API costs
+ Faster batch processing (no network latency)

Local setup:

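from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Point to locally downloaded models
pipeline_options = PdfPipelineOptions(
    artifacts_path="/local/path/to/models"
)

# Hand the options to the converter; models are then read from disk,
# so PDF conversion needs no network calls
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)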

Model support: OpenAI GPT variants, IBM Granite, and other models through the model-agnostic Mellea integration.

Combining Agents in a Pipeline

The real power comes from chaining agents.

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  PDF Invoice │────▶│   Convert    │────▶│ DoclingDoc   │
└──────────────┘     └──────────────┘     └──────────────┘
                                                │
                                                ▼
                         ┌──────────────┐     ┌──────────────┐
                         │   Extract    │────▶│ Structured   │
                         │    Agent     │     │    Data      │
                         └──────────────┘     └──────────────┘
                                                │
                                                ▼
                         ┌──────────────┐     ┌──────────────┐
                         │   Enrich     │────▶│ Searchable   │
                         │    Agent     │     │   Invoice    │
                         └──────────────┘     └──────────────┘

Complete workflow:

from docling.document_converter import DocumentConverter
from docling_agent import DoclingExtractingAgent, DoclingEnrichingAgent

# Step 1: Convert PDF
converter = DocumentConverter()
doc = converter.convert("invoice.pdf").document

# Step 2: Extract data
schema = {"invoice_number": "string", "total": "number", "vendor": "string"}
extract_agent = DoclingExtractingAgent()
data = extract_agent.run(document=doc, schema=schema)

# Step 3: Enrich for search
enrich_agent = DoclingEnrichingAgent()
enriched_doc = enrich_agent.run(
    document=doc,
    tasks=["summarize", "extract_keywords"]
)

# Save the enriched document
enriched_doc.save_as_json("invoice_enriched.json")
# data contains extracted fields
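
The chaining pattern itself can be sketched without Docling installed. In the sketch below each stage is a plain function that receives a shared context dictionary and returns an extended copy. The three stage functions are placeholders standing in for convert, extract, and enrich; they return dummy values and do not call any Docling API, and `run_pipeline` is a hypothetical helper, not something the library provides.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def run_pipeline(context: Context, stages: List[Stage]) -> Context:
    """Pass a shared context through each stage in order."""
    for stage in stages:
        context = stage(context)
    return context


# Placeholder stages: stand-ins for the real convert / extract / enrich steps.
def convert(ctx: Context) -> Context:
    return {**ctx, "doc": f"<document from {ctx['path']}>"}


def extract(ctx: Context) -> Context:
    return {**ctx, "data": {"invoice_number": "placeholder"}}


def enrich(ctx: Context) -> Context:
    return {**ctx, "keywords": ["placeholder"]}


result = run_pipeline({"path": "invoice.pdf"}, [convert, extract, enrich])
print(sorted(result))  # ['data', 'doc', 'keywords', 'path']
```

Keeping every stage behind the same signature makes it straightforward to reorder steps, drop one, or swap a placeholder for a real agent call later.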

Current Status and Limitations

Important caveat: the project is still early-stage.

Status: "still immature and work-in-progress"
Availability: Public repository on GitHub
Direction: Docling moving beyond conversion to full document operations

What this means:

+ Core functionality works (convert, basic extraction)
- API examples may change
- Documentation still evolving
- Chunkless RAG mentioned but not documented yet
- Some examples in this article are conceptual, based on documented patterns
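
Given that the API may change, a small defensive check before relying on the class names used in this article can save some confusion. The sketch below uses only the standard library; `available_names` is a hypothetical helper, and the agent names passed to it are simply the ones mentioned in this article, which the installed version may or may not expose under the top-level package.

```python
import importlib
import importlib.util
from typing import List


def available_names(module_name: str, wanted: List[str]) -> List[str]:
    """Return the subset of `wanted` that the module actually exposes."""
    if importlib.util.find_spec(module_name) is None:
        return []  # package is not installed at all
    module = importlib.import_module(module_name)
    return [name for name in wanted if hasattr(module, name)]


# Names taken from this article; treat them as assumptions to verify.
agents = [
    "DoclingWritingAgent",
    "DoclingEditingAgent",
    "DoclingExtractingAgent",
    "DoclingEnrichingAgent",
]
print(available_names("docling_agent", agents))
```

An empty list means either the package is missing or the names have moved, which is a cue to check the repository's current README before copying any snippet.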

The Reddit discussion captured this:

“Still early stage but the direction is clear, Docling is moving beyond conversion.”

Summary

Docling Agent represents a shift from rigid document processing to natural language-driven operations:

1. DoclingDocument: Unified format for all operations
2. Four agents: Write, Edit, Extract, Enrich
3. Natural language: Define tasks in prompts, not code
4. Local execution: Privacy-preserving, no API costs
5. Composable: Chain agents for complex pipelines

The traditional approach required separate tools for each task, hardcoded rules that broke on format changes, and no unified document representation. Docling Agent solves this with natural language prompts and a single document format that flows through all operations.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
