What Are PDF/A-4 and PDF/UA-2 Standards? A Python Developer's Guide

Mar 15, 2026

A client rejected my PDF generation system. “These aren’t compliant,” they said. I had no idea what they meant. Turns out, not all PDFs are created equal - some need to meet ISO standards for archiving (PDF/A) or accessibility (PDF/UA). Here’s what I learned.

The Problem

I had built a PDF generation pipeline for a government contract. Everything worked perfectly - invoices, reports, certificates. Then the compliance team ran their validators and sent back a spreadsheet of failures:

ERROR: PDF/A-4 validation failed
  - Font not embedded: Helvetica
  - Missing XMP metadata
  - Encryption not allowed in PDF/A

ERROR: PDF/UA-2 validation failed
  - Missing alt text for images
  - No document structure tree
  - Form fields missing accessible names

I had been generating “regular” PDFs my whole career. None of them were compliant with any standard. I had to learn fast.

What Are These Standards?

PDF/A (ISO 19005) is for long-term archiving. The “A” stands for Archive. When you need a document to be readable in 50 years, you use PDF/A. Courts, banks, and governments mandate it.

PDF/UA (ISO 14289) is for accessibility. The “UA” stands for Universal Accessibility. Screen readers need structure, not just visual layout. If you’re in the US (ADA, Section 508) or the EU (Directive 2016/2102), you might be legally required to publish accessible documents, and PDF/UA is the standard that defines what “accessible” means for a PDF.

The version numbers matter:

PDF/A-4 (2020): Latest version, based on PDF 2.0
PDF/UA-2 (2024): Updated accessibility standard

My First Compliance Check

I tried checking compliance with pypdf:

from pypdf import PdfReader

def check_pdfa_compliance(pdf_path):
    reader = PdfReader(pdf_path)
    metadata = reader.metadata

    # Check for PDF/A indicator in metadata
    if metadata:
        print("Metadata found:")
        for key, value in metadata.items():
            print(f"  {key}: {value}")

        # Look for PDF/A identifier
        if '/pdfaid:part' in str(metadata):
            print("PDF/A compliant")
        else:
            print("Not PDF/A compliant")
    else:
        print("No metadata - definitely not compliant")

# Test my existing PDFs
check_pdfa_compliance("my_report.pdf")

The output was disappointing:

$ python check_compliance_basic.py
Metadata found:
  /Producer: PyPDF
  /Creator: Python Script
Not PDF/A compliant

This told me nothing useful. I needed a real validator.
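Part of the reason the check above is uninformative is that it looks in the wrong place: the PDF/A identifier (`pdfaid:part`) is stored in the document's XMP metadata packet, not in the Info dictionary that `reader.metadata` returns. A minimal sketch, assuming the XMP packet has already been extracted as text (the `read_pdf_ids` name and the sample packet are illustrative). It only reports what a file claims about itself, which is not proof of compliance:

```python
import xml.etree.ElementTree as ET

# Namespaces of the PDF/A and PDF/UA identification schemas
PDFA_NS = "http://www.aiim.org/pdfa/ns/id/"
PDFUA_NS = "http://www.aiim.org/pdfua/ns/id/"

def read_pdf_ids(xmp_packet: str) -> dict:
    """Return the pdfaid/pdfuaid values an XMP packet declares (claims only)."""
    root = ET.fromstring(xmp_packet)
    found = {}
    for prefix, ns in (("pdfaid", PDFA_NS), ("pdfuaid", PDFUA_NS)):
        for name in ("part", "rev", "conformance"):
            key = f"{{{ns}}}{name}"
            for elem in root.iter():
                # XMP allows a property as an attribute or as a child element
                if key in elem.attrib:
                    found[f"{prefix}:{name}"] = elem.attrib[key]
                elif elem.tag == key and elem.text:
                    found[f"{prefix}:{name}"] = elem.text.strip()
    return found

sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
      pdfaid:part="4">
   <pdfaid:rev>2020</pdfaid:rev>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(read_pdf_ids(sample))  # → {'pdfaid:part': '4', 'pdfaid:rev': '2020'}
```

An empty result means the file does not even claim PDF/A or PDF/UA conformance; a non-empty one still needs a real validator.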

Real Validation with veraPDF

I discovered veraPDF, an open-source validator for PDF/A and PDF/UA. It’s widely used as the reference tool for compliance checking:

# Install veraPDF (requires Java)
# Download from verapdf.org or use Docker

docker run --rm -v $(pwd)/pdfs:/data verapdf/verapdf /data/my_report.pdf

The output showed exactly what was wrong:

$ docker run --rm -v $(pwd)/pdfs:/data verapdf/verapdf /data/my_report.pdf

VALIDATION REPORT
=================
Profile: PDF/A-4
Status: INVALID

Rule violations:
  6.1.2-1: Font not embedded (Helvetica)
  6.2.2-1: ICC profile not embedded
  6.7.2-1: XMP metadata missing pdfaid:part

Total errors: 3

Now I knew what to fix. But fixing it programmatically was another challenge.

Attempting Compliance with ReportLab

My first attempt was to make ReportLab generate compliant PDFs:

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

# Embed fonts (required for PDF/A)
pdfmetrics.registerFont(TTFont('DejaVu', '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf'))

c = canvas.Canvas("compliant_attempt.pdf", pagesize=A4)

# Set document info fields (Title, Author, Subject)
c.setTitle("My Document")
c.setAuthor("Generated by Script")
c.setSubject("PDF/A Compliant Document")

# Use embedded font
c.setFont('DejaVu', 12)
c.drawString(100, 750, "This should be PDF/A compliant...")

c.save()

I ran veraPDF again:

$ docker run --rm -v $(pwd):/data verapdf/verapdf /data/compliant_attempt.pdf

Status: INVALID

Rule violations:
  6.7.2-1: Missing XMP metadata extension schemas
  6.8.1-1: No PDF/A identification schema

The problem: ReportLab doesn’t output PDF/A natively. It generates regular PDFs. Converting to PDF/A requires post-processing.
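The identification schema the validator asks for is a small block of XMP. As a rough sketch of what has to end up in the file's metadata stream (the `build_pdfa_xmp` helper is hypothetical, and attaching this packet does not by itself make a file compliant, since fonts, color profiles, and everything else still have to pass):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def build_pdfa_xmp(title: str, part: int = 4, rev: int = 2020) -> str:
    """Build a minimal XMP packet declaring a PDF/A part and revision year."""
    return f"""<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{escape(title)}</rdf:li></rdf:Alt></dc:title>
   <pdfaid:part>{part}</pdfaid:part>
   <pdfaid:rev>{rev}</pdfaid:rev>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

# Sanity check: the packet is well-formed XML and declares part 4
packet = build_pdfa_xmp("Invoice #12345")
root = ET.fromstring(packet)
print(root.find(".//{http://www.aiim.org/pdfa/ns/id/}part").text)
```

The title is escaped so that characters such as `&` in user data cannot break the XML.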

Post-Processing to PDF/A

I found a workaround using Ghostscript:

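# Convert a PDF to PDF/A with Ghostscript
# (its documentation lists -dPDFA values 1, 2 and 3, so this targets PDF/A-2 rather than PDF/A-4)
gs -dPDFA=2 \
   -dBATCH \
   -dNOPAUSE \
   -dQUIET \
   -sDEVICE=pdfwrite \
   -sOutputFile=output_pdfa.pdf \
   -sColorConversionStrategy=UseDeviceIndependentColor \
   input.pdf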

This worked but had issues:

Required installing Ghostscript
Color conversion was sometimes lossy
Didn’t handle PDF/UA at all

I needed a better solution.
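To call the same conversion from Python, the command can be wrapped in `subprocess`. A minimal sketch using only the flags shown above; the function names are illustrative, and the restriction to parts 1 to 3 reflects the values Ghostscript's documentation lists for `-dPDFA`, so verify it against your installed version:

```python
import shutil
import subprocess

def ghostscript_pdfa_command(src: str, dst: str, part: int = 2) -> list[str]:
    """Build the gs argument list for a PDF/A conversion (does not run it)."""
    if part not in (1, 2, 3):
        raise ValueError("Ghostscript documents -dPDFA values 1, 2 and 3 only")
    return [
        "gs", f"-dPDFA={part}", "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        src,
    ]

def convert_to_pdfa(src: str, dst: str, part: int = 2) -> None:
    """Run the conversion; raises if gs is missing or exits non-zero."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(ghostscript_pdfa_command(src, dst, part), check=True)

print(ghostscript_pdfa_command("input.pdf", "output_pdfa.pdf"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. The output should still go through a validator afterwards.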

The Real Solution: Built-in Compliance

I discovered that some PDF libraries generate compliant PDFs from the start. GoPdfSuit, mentioned in a Reddit thread, supports both PDF/A-4 and PDF/UA-2 natively:

from pypdfsuit import PdfGenerator

# Generate PDF/A-4 compliant document
generator = PdfGenerator(
    template="invoice_template.json",
    compliance="PDF/A-4",  # Automatic compliance
    accessibility=True     # PDF/UA-2 support
)

data = {
    "title": "Invoice #12345",
    "customer": {
        "name": "Acme Corp",
        "address": "123 Business St"
    },
    "items": [
        {"description": "Service A", "amount": 500.00},
        {"description": "Service B", "amount": 300.00}
    ],
    "total": 800.00
}

generator.render(data, "compliant_invoice.pdf")

Validating the output:

$ docker run --rm -v $(pwd):/data verapdf/verapdf /data/compliant_invoice.pdf

VALIDATION REPORT
=================
Profile: PDF/A-4
Status: VALID

Total errors: 0

This approach generates compliant PDFs from scratch, with no post-processing required.

Why Compliance Matters

I used to think “PDF is PDF.” I was wrong.

Legal Requirements

Accessibility (PDF/UA):

US: Section 508 requires accessible documents for federal agencies
EU: Directive 2016/2102 mandates accessibility for public sector
Private lawsuits under the ADA over inaccessible documents are common

Archiving (PDF/A):

Many courts require or recommend PDF/A for electronic filings
Financial regulations (SEC, FINRA) mandate long-term record retention, and PDF/A is a common format for it
Healthcare records must be preserved for decades

Technical Benefits

# Regular PDF problems:
# 1. Fonts not embedded -> Document breaks when opened on different machine
# 2. External references -> Images disappear when URLs change
# 3. JavaScript -> Security risk, won't work in restricted environments
# 4. Encryption -> Document becomes unreadable if password lost

# PDF/A guarantees:
# 1. All fonts embedded -> Document renders identically everywhere
# 2. No external dependencies -> Self-contained forever
# 3. No JavaScript -> Safe for archival
# 4. No encryption (forbidden in every PDF/A part, including PDF/A-4) -> Future-proof

# PDF/UA guarantees:
# 1. Structured content -> Screen readers work correctly
# 2. Alternative text -> Images described for visually impaired
# 3. Reading order -> Logical navigation possible
# 4. Form field labels -> Fillable forms accessible

Checking Compliance Programmatically

For CI/CD pipelines, you need automated compliance checking:

import subprocess
import json
import sys

def validate_pdfa(pdf_path, flavour="4"):
    """Validate a PDF against a PDF/A flavour using the veraPDF CLI"""

    # --flavour picks a built-in profile ("4" is PDF/A-4);
    # --profile expects a path to a profile file, not a name
    result = subprocess.run(
        ["verapdf", "--format", "json", "--flavour", flavour, pdf_path],
        capture_output=True,
        text=True
    )

    # veraPDF can exit non-zero when the file is merely non-compliant,
    # so look for a parsable report before treating the run as an error
    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError:
        print(f"Validation error: {result.stderr}")
        return False, []

    # NOTE: these key names may not match your veraPDF version's JSON
    # layout; check them against real output before relying on them
    jobs = data.get("reports", {}).get("jobs") or [{}]
    validation_result = jobs[0].get("validationReport", {})
    is_compliant = validation_result.get("status") == "valid"

    if not is_compliant:
        # Extract errors for debugging
        details = validation_result.get("details", {})
        errors = details.get("failedRules", [])

        print(f"Compliance FAILED: {len(errors)} errors")
        for error in errors[:5]:  # Show first 5 errors
            print(f"  - {error.get('ruleId', 'Unknown rule')}")

        return False, errors

    return True, []

# Use in pipeline
if __name__ == "__main__":
    compliant, errors = validate_pdfa("generated_document.pdf")

    if not compliant:
        print("Document failed compliance check - blocking deployment")
        sys.exit(1)

    print("Document is PDF/A-4 compliant")

What PDF/A-4 Actually Requires

Understanding the technical requirements helped me debug issues:

Requirement	Why It Matters	How to Fix
Embedded fonts	Document renders identically	Include font files, not references
ICC color profile	Colors match across devices	Embed sRGB or other standard profile
XMP metadata	Document can be indexed	Add required metadata fields
No JavaScript	Security, long-term stability	Remove all scripts
No external references	Document self-contained	Embed all images, fonts
No encryption	Long-term accessibility	Remove password protection
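Two of the rows above can be spot-checked cheaply before a full validator run. The sketch below scans the raw file bytes for the names that normally accompany JavaScript and encryption. It is a heuristic under stated assumptions (the marker list and the `preflight` name are illustrative, and names inside compressed object streams will be missed), not a replacement for veraPDF:

```python
# Byte patterns that usually indicate a feature PDF/A forbids.
# Heuristic only: a clean result does not prove the feature is absent.
FORBIDDEN_MARKERS = {
    b"/JavaScript": "JavaScript",
    b"/Encrypt": "encryption dictionary",
}

def preflight(pdf_bytes: bytes) -> list[str]:
    """Return labels of forbidden features whose markers appear in the file."""
    return [
        label
        for marker, label in FORBIDDEN_MARKERS.items()
        if marker in pdf_bytes
    ]

# Typical use: preflight(Path("document.pdf").read_bytes())
print(preflight(b"%PDF-2.0\ntrailer << /Encrypt 5 0 R >>"))
```

A hit is a strong hint that validation will fail; an empty list only means nothing obvious was found.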

What PDF/UA-2 Actually Requires

Accessibility is about structure, not just visual output:

# PDF/UA requires a logical structure tree
# This is how screen readers navigate the document

# Example structure for an invoice:
"""
<Document>
  <H1>Invoice #12345</H1>
  <P>Customer: Acme Corp</P>
  <Table>
    <TR>
      <TH>Description</TH>
      <TH>Amount</TH>
    </TR>
    <TR>
      <TD>Service A</TD>
      <TD>$500.00</TD>
    </TR>
  </Table>
  <Figure Alt="Company Logo" />
</Document>
"""

# Key requirements:
# 1. Headings marked as H1, H2, etc. (not just bold text)
# 2. Tables have proper header cells
# 3. Images have alternative text
# 4. Reading order matches visual order
# 5. Form fields have labels (not just placeholder text)
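Two of these requirements are easy to check once the structure tree is available as data. A minimal sketch, assuming the tree has been loaded into a simple `Node` class (hypothetical, not a real PDF library type); it flags figures without alternative text and tables without header cells:

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

@dataclass
class Node:
    tag: str                              # structure type, e.g. "H1", "Figure"
    alt: Optional[str] = None             # alternative text, if any
    children: List["Node"] = field(default_factory=list)

def walk(node: Node) -> Iterator[Node]:
    """Yield node and all descendants in document (reading) order."""
    yield node
    for child in node.children:
        yield from walk(child)

def audit(root: Node) -> List[str]:
    """Flag figures without alt text and tables without header cells."""
    problems = []
    for node in walk(root):
        if node.tag == "Figure" and not node.alt:
            problems.append("Figure has no alternative text")
        if node.tag == "Table" and not any(n.tag == "TH" for n in walk(node)):
            problems.append("Table has no header cells")
    return problems

# The invoice structure from the example above
invoice = Node("Document", children=[
    Node("H1"),
    Node("P"),
    Node("Table", children=[
        Node("TR", children=[Node("TH"), Node("TH")]),
        Node("TR", children=[Node("TD"), Node("TD")]),
    ]),
    Node("Figure", alt="Company Logo"),
])
print(audit(invoice))  # → []
```

Checks like reading order or meaningful alt text still need a human or a screen reader; this only catches what is mechanically missing.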

Common Mistakes

Generating first, checking later: Compliance should be built into the generation process, not an afterthought. Retrofitting accessibility into existing PDFs is painful.
Only checking PDF/A: Many organizations need both standards. A document can be valid PDF/A but fail PDF/UA completely.
Assuming PDF = PDF/A: Most PDF libraries output regular PDFs by default. You must explicitly request compliance.
Ignoring font licensing: Just because a font is installed doesn’t mean you can embed it. Check licenses for embedded fonts.
Not testing with real screen readers: Passing veraPDF is not enough. Also test with NVDA, VoiceOver, or JAWS for actual accessibility.

The Compliance Checklist

Before shipping any PDF system:

# 1. Validate with veraPDF
# (--flavour selects a built-in profile; --profile expects a profile file path)
verapdf --flavour 4 document.pdf

# 2. Validate accessibility (PDF/UA)
verapdf --flavour ua2 document.pdf

# 3. Test with screen reader
# macOS: VoiceOver (Cmd+F5)
# Windows: NVDA (free download)

# 4. Verify fonts embedded (check the "emb" column)
pdffonts document.pdf

# 5. Check metadata (-meta prints the XMP packet)
pdfinfo -meta document.pdf
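The font check can be automated too. Poppler's `pdffonts` tool prints one row per font with an `emb` column that says whether the font is embedded. A sketch that parses that output, assuming the usual column layout (name, type, encoding, emb, sub, uni, object ID); the `non_embedded_fonts` name and the sample text are illustrative:

```python
def non_embedded_fonts(pdffonts_output: str) -> list[str]:
    """Return names of fonts whose 'emb' column is 'no' in pdffonts output."""
    missing = []
    for line in pdffonts_output.splitlines()[2:]:  # skip header and dashes
        tokens = line.split()
        # Counting from the right: object ID (2 tokens), uni, sub, emb.
        # The type column may contain a space ("Type 1"), so index from the end.
        if len(tokens) >= 7 and tokens[-5] == "no":
            missing.append(tokens[0])
    return missing

sample = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+DejaVuSans                    TrueType          WinAnsi          yes yes yes      8  0
Helvetica                            Type 1            Custom           no  no  no      10  0
"""
print(non_embedded_fonts(sample))  # → ['Helvetica']
```

In a pipeline, feed it the `stdout` of `subprocess.run(["pdffonts", path], capture_output=True, text=True)` and fail the build when the list is not empty.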

Performance Considerations

Compliant PDFs are larger, mainly because of embedded fonts and metadata:

Document Type	Regular PDF	PDF/A-4	Increase
Text-only (10 pages)	45 KB	120 KB	167%
With images	2.1 MB	2.3 MB	10%
Complex report	5.8 MB	6.2 MB	7%

The overhead is primarily from font embedding. For text-heavy documents without images, the size increase is proportionally larger.
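The increase column is just the relative growth in file size. A short worked example using the sizes from the table (the function name is illustrative):

```python
def size_increase(regular: float, pdfa: float) -> float:
    """Relative size increase in percent; both sizes in the same unit."""
    return (pdfa - regular) / regular * 100

rows = [
    ("Text-only (10 pages)", 45, 120),   # KB
    ("With images", 2.1, 2.3),           # MB
    ("Complex report", 5.8, 6.2),        # MB
]
for name, regular, pdfa in rows:
    print(f"{name}: {size_increase(regular, pdfa):.0f}%")
# → Text-only (10 pages): 167%
# → With images: 10%
# → Complex report: 7%
```

The text-only case works out to 166.7%, because roughly the same fixed font overhead lands on a much smaller file.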

Getting Started

If you’re building a PDF generation system:

Choose compliance-aware libraries: Start with tools that output compliant PDFs natively
Automate validation: Add compliance checks to your CI/CD pipeline
Test with real users: Run screen reader tests, not just validator passes
Document your compliance: Keep records for audit purposes

# Quick start with validation tools
pip install pypdf  # Basic PDF manipulation
pip install pypdfsuit  # Compliant generation (if available)

# Install veraPDF for validation
# Docker: docker pull verapdf/verapdf
# Or download from verapdf.org

The lesson: PDF compliance isn’t optional for many applications. Building it in from the start costs little; retrofitting it later costs a lot.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 PDF/A ISO 19005 Standard
👨‍💻 PDF/UA ISO 14289 Standard
👨‍💻 veraPDF - Open Source PDF/A Validator
👨‍💻 Reddit Discussion on PDF Libraries

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!