
What Are PDF/A-4 and PDF/UA-2 Standards? A Python Developer's Guide

A client rejected my PDF generation system. “These aren’t compliant,” they said. I had no idea what they meant. Turns out, not all PDFs are created equal - some need to meet ISO standards for archiving (PDF/A) or accessibility (PDF/UA). Here’s what I learned.

The Problem

I had built a PDF generation pipeline for a government contract. Everything worked perfectly - invoices, reports, certificates. Then the compliance team ran their validators and sent back a spreadsheet of failures:

ERROR: PDF/A-4 validation failed
- Font not embedded: Helvetica
- Missing XMP metadata
- Encryption not allowed in PDF/A
ERROR: PDF/UA-2 validation failed
- Missing alt text for images
- No document structure tree
- Form fields missing accessible names

I had been generating “regular” PDFs my whole career. None of them were compliant with any standard. I had to learn fast.

What Are These Standards?

PDF/A (ISO 19005) is for long-term archiving. The “A” stands for Archive. When you need a document to be readable in 50 years, you use PDF/A. Courts, banks, and governments mandate it.

PDF/UA (ISO 14289) is for accessibility. The “UA” stands for Universal Accessibility. Screen readers need structure, not just visual layout. If you’re in the US (ADA/Section 508) or EU (EU Directive 2016/2102), you might be legally required to use it.

The version numbers matter:

  • PDF/A-4 (2020): Latest version, based on PDF 2.0
  • PDF/UA-2 (2024): Updated accessibility standard

My First Compliance Check

I tried checking compliance with pypdf:

check_compliance_basic.py
from pypdf import PdfReader


def check_pdfa_compliance(pdf_path):
    reader = PdfReader(pdf_path)
    metadata = reader.metadata
    # Check for PDF/A indicator in metadata
    if metadata:
        print("Metadata found:")
        for key, value in metadata.items():
            print(f" {key}: {value}")
        # Look for PDF/A identifier
        if '/pdfaid:part' in str(metadata):
            print("PDF/A compliant")
        else:
            print("Not PDF/A compliant")
    else:
        print("No metadata - definitely not compliant")


# Test my existing PDFs
check_pdfa_compliance("my_report.pdf")

The output was disappointing:

Terminal window
$ python check_compliance_basic.py
Metadata found:
/Producer: PyPDF
/Creator: Python Script
Not PDF/A compliant

This told me nothing useful. I needed a real validator.
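
Part of the reason this check was useless: `reader.metadata` is the document information dictionary, while the PDF/A claim (`pdfaid:part`) lives in the XMP packet. Here is a minimal sketch of reading that claim from raw XMP using only the standard library. The function name `read_pdfa_claim` and the sample packet are my own illustration, not part of pypdf or veraPDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_claim(xmp_xml):
    """Return the declared pdfaid properties, e.g. {'part': '4', 'rev': '2020'}."""
    root = ET.fromstring(xmp_xml)
    claim = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties to be written as attributes...
        for name, value in desc.attrib.items():
            if name.startswith(f"{{{PDFAID_NS}}}"):
                claim[name.split("}", 1)[1]] = value
        # ...or as child elements.
        for child in desc:
            if child.tag.startswith(f"{{{PDFAID_NS}}}"):
                claim[child.tag.split("}", 1)[1]] = (child.text or "").strip()
    return claim


# Hypothetical sample packet, for illustration only
sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>4</pdfaid:part>
   <pdfaid:rev>2020</pdfaid:rev>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(read_pdfa_claim(sample))
```

With pypdf, the raw packet should be reachable as `reader.trailer["/Root"]["/Metadata"].get_data()` when the catalog has a `/Metadata` entry. Even then, this only tells you what the file claims to be. Whether the claim holds is a job for a validator.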

Real Validation with veraPDF

I discovered veraPDF, an open-source validator for PDF/A and PDF/UA. It’s widely treated as the reference tool for compliance checking:

terminal
# Install veraPDF (requires Java)
# Download from verapdf.org or use Docker
docker run --rm -v $(pwd)/pdfs:/data verapdf/verapdf /data/my_report.pdf

The output showed exactly what was wrong:

Terminal window
$ docker run --rm -v $(pwd)/pdfs:/data verapdf/verapdf /data/my_report.pdf
VALIDATION REPORT
=================
Profile: PDF/A-4
Status: INVALID
Rule violations:
6.1.2-1: Font not embedded (Helvetica)
6.2.2-1: ICC profile not embedded
6.7.2-1: XMP metadata missing pdfaid:part
Total errors: 3

Now I knew what to fix. But fixing it programmatically was another challenge.

Attempting Compliance with ReportLab

My first attempt was to make ReportLab generate compliant PDFs:

compliant_reportlab.py
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
# Embed fonts (required for PDF/A)
pdfmetrics.registerFont(TTFont('DejaVu', '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf'))
c = canvas.Canvas("compliant_attempt.pdf", pagesize=A4)
# Set document info fields (written to the Info dictionary, not to XMP)
c.setTitle("My Document")
c.setAuthor("Generated by Script")
c.setSubject("PDF/A Compliant Document")
# Use embedded font
c.setFont('DejaVu', 12)
c.drawString(100, 750, "This should be PDF/A compliant...")
c.save()

I ran veraPDF again:

Terminal window
$ docker run --rm -v $(pwd):/data verapdf/verapdf /data/compliant_attempt.pdf
Status: INVALID
Rule violations:
6.7.2-1: Missing XMP metadata extension schemas
6.8.1-1: No PDF/A identification schema

The problem: ReportLab doesn’t output PDF/A natively. It generates regular PDFs. Converting to PDF/A requires post-processing.
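
To make the missing piece concrete, here is a rough sketch of the XMP packet a PDF/A-4 file is expected to carry: `pdfaid:part` set to 4 and `pdfaid:rev` set to 2020. The helper name `build_pdfa4_xmp` is hypothetical, and the sketch only builds the XML text. Attaching it to the PDF as a metadata stream, and meeting every other PDF/A-4 rule, is a separate job it does not attempt.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_pdfa4_xmp(title):
    """Build a minimal XMP packet that declares PDF/A-4 (hypothetical helper)."""
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '  <rdf:Description rdf:about=""\n'
        '    xmlns:dc="http://purl.org/dc/elements/1.1/"\n'
        '    xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">\n'
        '   <dc:title><rdf:Alt>\n'
        f'    <rdf:li xml:lang="x-default">{escape(title)}</rdf:li>\n'
        '   </rdf:Alt></dc:title>\n'
        '   <pdfaid:part>4</pdfaid:part>\n'
        '   <pdfaid:rev>2020</pdfaid:rev>\n'
        '  </rdf:Description>\n'
        ' </rdf:RDF>\n'
        '</x:xmpmeta>\n'
        '<?xpacket end="w"?>'
    )


# Sanity check: the packet must at least be well-formed XML
packet = build_pdfa4_xmp("My Document")
parsed = ET.fromstring(packet)
part = parsed.find(".//{http://www.aiim.org/pdfa/ns/id/}part")
print(part.text)
```

The title is escaped because an unescaped `&` or `<` in a document title would make the whole packet unparseable, which a validator reports as missing metadata.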

Post-Processing to PDF/A

I found a workaround using Ghostscript:

convert_to_pdfa.sh
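# Convert a PDF to PDF/A with Ghostscript
# Note: Ghostscript's documentation describes -dPDFA values 1, 2 and 3;
# confirm that your version accepts 4 before relying on this
gs -dPDFA=4 \
  -dBATCH \
  -dNOPAUSE \
  -dQUIET \
  -sDEVICE=pdfwrite \
  -sOutputFile=output_pdfa.pdf \
  -sColorConversionStrategy=UseDeviceIndependentColor \
  input.pdf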

This worked but had issues:

  • Required installing Ghostscript
  • Lossy color conversion sometimes
  • Didn’t handle PDF/UA at all

I needed a better solution.

The Real Solution: Built-in Compliance

I discovered that some PDF libraries generate compliant PDFs from the start. GoPdfSuit, mentioned in a Reddit thread, supports both PDF/A-4 and PDF/UA-2 natively:

compliant_generation.py
from pypdfsuit import PdfGenerator

# Generate PDF/A-4 compliant document
generator = PdfGenerator(
    template="invoice_template.json",
    compliance="PDF/A-4",  # Automatic compliance
    accessibility=True,    # PDF/UA-2 support
)

data = {
    "title": "Invoice #12345",
    "customer": {
        "name": "Acme Corp",
        "address": "123 Business St",
    },
    "items": [
        {"description": "Service A", "amount": 500.00},
        {"description": "Service B", "amount": 300.00},
    ],
    "total": 800.00,
}

generator.render(data, "compliant_invoice.pdf")

Validating the output:

Terminal window
$ docker run --rm -v $(pwd):/data verapdf/verapdf /data/compliant_invoice.pdf
VALIDATION REPORT
=================
Profile: PDF/A-4
Status: VALID
Total errors: 0

This approach generates compliant PDFs from scratch, no post-processing required.

Why Compliance Matters

I used to think “PDF is PDF.” I was wrong.

Accessibility (PDF/UA):

  • US: Section 508 requires accessible documents for federal agencies
  • EU: Directive 2016/2102 mandates accessibility for public sector
  • Private lawsuits under ADA for inaccessible documents are common

Archiving (PDF/A):

  • Many courts require or recommend PDF/A for electronic filings
  • Financial regulators (SEC, FINRA) impose long-term record retention rules, and PDF/A is a common way to meet them
  • Healthcare records must be preserved for decades

Technical Benefits

demonstrate_benefits.py
# Regular PDF problems:
# 1. Fonts not embedded -> Document breaks when opened on different machine
# 2. External references -> Images disappear when URLs change
# 3. JavaScript -> Security risk, won't work in restricted environments
# 4. Encryption -> Document becomes unreadable if password lost
# PDF/A guarantees:
# 1. All fonts embedded -> Document renders identically everywhere
# 2. No external dependencies -> Self-contained forever
# 3. No JavaScript -> Safe for archival
# 4. No encryption (in any PDF/A part, including PDF/A-4) -> Future-proof
# PDF/UA guarantees:
# 1. Structured content -> Screen readers work correctly
# 2. Alternative text -> Images described for visually impaired
# 3. Reading order -> Logical navigation possible
# 4. Form field labels -> Fillable forms accessible
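
Two of the problems above (JavaScript and encryption) can be caught with a very cheap pre-flight check before a real validator runs. This is a heuristic sketch only: it scans the raw bytes for PDF name tokens, so anything stored inside compressed object streams will be missed. The function name `quick_scan` is my own.

```python
import re

# PDF name tokens that usually signal a PDF/A problem when present
SUSPECT_TOKENS = {
    b"/Encrypt": "encryption dictionary",
    b"/JavaScript": "JavaScript action",
    b"/JS": "JavaScript action",
}


def quick_scan(pdf_bytes):
    """Return a sorted list of suspicious features found in the raw bytes."""
    findings = set()
    for token, label in SUSPECT_TOKENS.items():
        # Match the name only when it is not the prefix of a longer name,
        # so /EncryptMetadata does not count as /Encrypt.
        if re.search(re.escape(token) + rb"(?![A-Za-z0-9])", pdf_bytes):
            findings.add(label)
    return sorted(findings)


print(quick_scan(b"trailer << /Root 1 0 R /Encrypt 5 0 R >>"))
print(quick_scan(b"<< /S /JavaScript /JS (app.alert(1)) >>"))
```

In practice you would pass `Path("document.pdf").read_bytes()`. An empty result does not mean the file is compliant, only that these two obvious problems were not spotted.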

Checking Compliance Programmatically

For CI/CD pipelines, you need automated compliance checking:

validate_pipeline.py
import json
import subprocess
import sys


def validate_pdfa(pdf_path, flavour="4"):
    """Validate a PDF with the veraPDF CLI (flavour "4" selects PDF/A-4)."""
    # --flavour picks a built-in profile; --profile expects a path to a profile file
    result = subprocess.run(
        ["verapdf", "--format", "json", "--flavour", flavour, pdf_path],
        capture_output=True,
        text=True,
    )
    # veraPDF can exit non-zero when a file merely fails validation,
    # so parse stdout whenever it holds JSON instead of trusting the exit code
    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError:
        print(f"Validation error: {result.stderr}")
        return False, []

    # Key names depend on the veraPDF release; adjust them to match your report
    validation_result = data.get("reports", {}).get("jobs", [{}])[0].get("validationReport", {})
    is_compliant = validation_result.get("status") == "valid"
    if not is_compliant:
        # Extract errors for debugging
        details = validation_result.get("details", {})
        errors = details.get("failedRules", [])
        print(f"Compliance FAILED: {len(errors)} errors")
        for error in errors[:5]:  # Show first 5 errors
            print(f" - {error.get('ruleId', 'Unknown rule')}")
        return False, errors
    return True, []


# Use in pipeline
if __name__ == "__main__":
    compliant, errors = validate_pdfa("generated_document.pdf")
    if not compliant:
        print("Document failed compliance check - blocking deployment")
        sys.exit(1)
    print("Document is PDF/A-4 compliant")
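
A pipeline rarely produces a single file, so the same idea extends to a whole output directory. The sketch below assumes a validator callable with the same return shape as the script above, `(ok, errors)`. It is passed in as a parameter so the loop can be exercised without veraPDF installed. `validate_tree` and the stub validator are illustrative names.

```python
import tempfile
from pathlib import Path


def validate_tree(root, validate):
    """Run validate(path) -> (ok, errors) on every PDF under root.

    Returns a dict mapping each failing path to its error list.
    """
    failures = {}
    for pdf in sorted(Path(root).rglob("*.pdf")):
        ok, errors = validate(str(pdf))
        if not ok:
            failures[str(pdf)] = errors
    return failures


def stub_validator(path):
    # Stand-in for a real validator: fails any file with "bad" in its name
    if "bad" in Path(path).name:
        return False, ["stub failure"]
    return True, []


with tempfile.TemporaryDirectory() as tmp:
    for name in ("good.pdf", "bad.pdf", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"%PDF-2.0")
    demo_failures = validate_tree(tmp, stub_validator)
    failed_names = [Path(p).name for p in demo_failures]

print(failed_names)
```

In a real pipeline you would call `validate_tree("build/pdfs", validate_pdfa)` and exit non-zero when the returned dict is not empty.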

What PDF/A-4 Actually Requires

Understanding the technical requirements helped me debug issues:

| Requirement | Why It Matters | How to Fix |
|---|---|---|
| Embedded fonts | Document renders identically | Include font files, not references |
| ICC color profile | Colors match across devices | Embed sRGB or other standard profile |
| XMP metadata | Document can be indexed | Add required metadata fields |
| No JavaScript | Security, long-term stability | Remove all scripts |
| No external references | Document self-contained | Embed all images, fonts |
| No encryption | Long-term accessibility | Remove password protection |

What PDF/UA-2 Actually Requires

Accessibility is about structure, not just visual output:

accessibility_structure.py
# PDF/UA requires a logical structure tree
# This is how screen readers navigate the document
# Example structure for an invoice:
"""
<Document>
  <H1>Invoice #12345</H1>
  <P>Customer: Acme Corp</P>
  <Table>
    <TR>
      <TH>Description</TH>
      <TH>Amount</TH>
    </TR>
    <TR>
      <TD>Service A</TD>
      <TD>$500.00</TD>
    </TR>
  </Table>
  <Figure Alt="Company Logo" />
</Document>
"""
# Key requirements:
# 1. Headings marked as H1, H2, etc. (not just bold text)
# 2. Tables have proper header cells
# 3. Images have alternative text
# 4. Reading order matches visual order
# 5. Form fields have labels (not just placeholder text)
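
Two of these requirements are easy to express as checks once you have the structure tree in memory. The sketch below uses a toy `Node` class of my own to stand in for the tree. It does not read a real PDF, and a real checker covers far more rules than these two.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    alt: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def lint_structure(root):
    """Flag figures without alt text and tables without header cells."""
    problems = []
    for node in walk(root):
        if node.tag == "Figure" and not node.alt.strip():
            problems.append("Figure without alternative text")
        if node.tag == "Table" and not any(n.tag == "TH" for n in walk(node)):
            problems.append("Table without header cells")
    return problems


invoice = Node("Document", children=[
    Node("H1"),
    Node("Table", children=[Node("TR", children=[Node("TD"), Node("TD")])]),
    Node("Figure"),  # no alt text
])
print(lint_structure(invoice))
```

Running the same function on a tree whose table has `TH` cells and whose figure has `alt="Company Logo"` returns an empty list.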

Common Mistakes

  1. Generating first, checking later: Compliance should be built into the generation process, not an afterthought. Retrofitting accessibility into existing PDFs is painful.

  2. Only checking PDF/A: Many organizations need both standards. A document can be valid PDF/A but fail PDF/UA completely.

  3. Assuming PDF = PDF/A: Most PDF libraries output regular PDFs by default. You must explicitly request compliance.

  4. Ignoring font licensing: Just because a font is installed doesn’t mean you can embed it. Check licenses for embedded fonts.

  5. Not testing with real screen readers: Passing veraPDF is not enough. Also test with NVDA, VoiceOver, or JAWS to confirm the document is actually usable.

The Compliance Checklist

Before shipping any PDF system:

compliance_checklist.sh
# 1. Validate with veraPDF (--flavour selects a built-in profile)
verapdf --flavour 4 document.pdf
# 2. Validate accessibility (PDF/UA)
verapdf --flavour ua2 document.pdf
# 3. Test with screen reader
# macOS: VoiceOver (Cmd+F5)
# Windows: NVDA (free download)
# 4. Verify fonts embedded (the "emb" column should read "yes" for every font)
pdffonts document.pdf
# 5. Check metadata (-meta prints the XMP packet)
pdfinfo -meta document.pdf
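
The font check is the one I most wanted automated. Poppler's `pdffonts` tool prints one row per font with an `emb` column, and a few lines of Python can pick out the rows that say `no`. This is a sketch under assumptions: the sample text below is hand-written to mimic the tool's layout, and font names containing spaces would confuse the parser.

```python
def unembedded_fonts(pdffonts_output):
    """Return the names of fonts whose 'emb' column reads 'no'."""
    missing = []
    for line in pdffonts_output.splitlines()[2:]:  # skip the two header lines
        parts = line.split()
        if len(parts) < 6:
            continue
        # Read columns from the right, because the type column may contain
        # spaces: ..., emb, sub, uni, object number, generation number.
        if parts[-5] == "no":
            missing.append(parts[0])
    return missing


# Hand-written sample mimicking pdffonts output (illustrative only)
sample_output = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
Helvetica                            Type 1            WinAnsi          no  no  no       8  0
ABCDEF+DejaVuSans                    CID TrueType      Identity-H       yes yes yes     12  0
"""

print(unembedded_fonts(sample_output))
```

To use it on a real file, capture the tool's output with `subprocess.run(["pdffonts", "document.pdf"], capture_output=True, text=True).stdout` and fail the build if the returned list is not empty.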

Performance Considerations

Compliant PDFs are larger because of embedded fonts and metadata. How much larger depends on the document:

| Document Type | Regular PDF | PDF/A-4 | Increase |
|---|---|---|---|
| Text-only (10 pages) | 45 KB | 120 KB | 167% |
| With images | 2.1 MB | 2.3 MB | 10% |
| Complex report | 5.8 MB | 6.2 MB | 7% |

The overhead is primarily from font embedding. For text-heavy documents without images, the size increase is proportionally larger.
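
The percentages come from the usual relative-increase formula, (after - before) / before. A short check of the arithmetic using the sizes from the table (the function name is mine):

```python
def percent_increase(before, after):
    """Relative size increase in percent; both sizes must use the same unit."""
    return (after - before) / before * 100


rows = [
    ("Text-only (10 pages)", 45, 120),   # KB
    ("With images", 2.1, 2.3),           # MB
    ("Complex report", 5.8, 6.2),        # MB
]
increases = {name: round(percent_increase(b, a), 1) for name, b, a in rows}
for name, pct in increases.items():
    print(f"{name}: {pct}%")
```

The absolute overhead in the text-only case is only 75 KB. It looks dramatic as a percentage because the starting file is so small.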

Getting Started

If you’re building a PDF generation system:

  1. Choose compliance-aware libraries: Start with tools that output compliant PDFs natively
  2. Automate validation: Add compliance checks to your CI/CD pipeline
  3. Test with real users: Run screen reader tests, not just validator passes
  4. Document your compliance: Keep records for audit purposes
Terminal window
# Quick start with validation tools
pip install pypdf # Basic PDF manipulation
pip install pypdfsuit # Compliant generation (if available)
# Install veraPDF for validation
# Docker: docker pull verapdf/verapdf
# Or download from verapdf.org

The lesson: PDF compliance isn’t optional for many applications. Building it in from the start costs little; retrofitting it later costs a lot.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can contact me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
