How Do I Wrap CLI Tools in Python for Batch File Processing

Mar 14, 2026

The Problem

I needed to compress 50 PDF files. Ghostscript is the tool for this, but the command syntax is absurdly complex:

gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET -sOutputFile=output.pdf input.pdf

I stared at this command for 30 seconds trying to remember what each flag does. Then I had to do it again for the next file. And the next.

The problem isn’t just the length - it’s that I use this maybe once a month. Every time, I have to look up the syntax again. And for batch processing? I’d need to write a shell script with loops, error handling, progress reporting…

There had to be a better way.

What I Wanted

A simple command I could actually remember:

pdf-compress *.pdf --quality ebook

That’s it. One command. Handles all the files. Shows progress. Handles errors. No googling required.

First Attempt: A Shell Script

I started with bash:

#!/bin/bash
for file in *.pdf; do
    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
       -dNOPAUSE -dBATCH -dQUIET \
       -sOutputFile="${file%.pdf}_compressed.pdf" "$file"
done

This worked, but I quickly ran into problems:

No error handling - When Ghostscript failed, the script kept going
No progress feedback - Processing 50 files in silence is nerve-wracking
No quality options - I wanted --quality screen|ebook|printer|prepress
Glob pattern limitations - Handling *.pdf vs docs/*.pdf vs absolute paths
No parallel processing - One file at a time on an 8-core CPU

I realized I was reinventing what Python already does well.

Second Attempt: Basic Python Wrapper

I switched to Python for better control:

#!/usr/bin/env python3
import subprocess
import sys
from pathlib import Path

def compress_pdf(input_path, output_path, quality="ebook"):
    cmd = [
        "gs", "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path)
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for pdf in Path(".").glob("*.pdf"):
        output = f"{pdf.stem}_compressed.pdf"
        print(f"Compressing {pdf.name}...")
        compress_pdf(pdf, output)

Better! But I still had issues:

$ python compress_v1.py
Compressing report.pdf...
Compressing data.pdf...
Error: Ghostscript returned non-zero exit status 1

Which file failed? What was the error? The script just crashed with no context.

Problem 1: Missing Error Context

I added proper error handling:

def compress_pdf(input_path, output_path, quality="ebook"):
    """Compress PDF with detailed error reporting."""
    cmd = [
        "gs", "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path)
    ]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        check=False  # Don't raise on non-zero exit
    )

    if result.returncode != 0:
        error_msg = result.stderr.strip() or result.stdout.strip()
        raise RuntimeError(f"Ghostscript error: {error_msg}")

    return True

Now when something fails, I get useful information:

Error compressing corrupted.pdf: Ghostscript error: Can't find font 'ArialMT'
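
The function above raises RuntimeError without knowing how the caller wants to report it, so the "Error compressing corrupted.pdf" prefix has to be added by the calling loop. Here is a minimal sketch of such a loop, assuming a compress callable with the same signature as compress_pdf. The names compress_all and fake_compress are hypothetical; the fake only exists so the sketch runs without Ghostscript installed.

```python
from pathlib import Path


def compress_all(files, compress, quality="ebook"):
    """Run compress() on each file; return a list of (path, message) for failures."""
    failures = []
    for pdf in files:
        output = pdf.with_name(f"{pdf.stem}_compressed.pdf")
        try:
            compress(pdf, output, quality)
        except RuntimeError as exc:
            # Attach the file name here, where it is known
            print(f"Error compressing {pdf.name}: {exc}")
            failures.append((pdf, str(exc)))
    return failures


def fake_compress(input_path, output_path, quality="ebook"):
    """Stand-in for compress_pdf (hypothetical): fails for one specific name."""
    if input_path.name == "corrupted.pdf":
        raise RuntimeError("Ghostscript error: Invalid PDF")
    return True


failed = compress_all([Path("report.pdf"), Path("corrupted.pdf")], fake_compress)
```

Because the loop catches the exception per file, one bad PDF no longer stops the rest of the batch.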

Problem 2: Bytes vs Strings

I hit another wall when Ghostscript returned binary garbage mixed with text:

Error: b'\xff\xfe\x00\x00...output data...'

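The text=True flag wasn’t enough. It decodes with the locale’s default encoding and strict error handling, so invalid bytes raise UnicodeDecodeError. I needed to capture raw bytes and handle the encoding explicitly: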

def run_command(cmd, cwd=None):
    """Execute command and return (success, output)."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        check=False,
    )

    # Decode output, handling encoding issues
    stdout = result.stdout.decode('utf-8', errors='replace')
    stderr = result.stderr.decode('utf-8', errors='replace')
    output = (stdout + stderr).strip()

    return result.returncode == 0, output
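
To see what errors='replace' does in practice, here is a small self-contained sketch. It uses the Python interpreter itself as the child process (an assumption made so the example runs without Ghostscript), and the helper name run_and_decode is hypothetical.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Capture raw bytes, then decode leniently so bad bytes cannot raise."""
    result = subprocess.run(cmd, capture_output=True, check=False)
    text = result.stdout.decode("utf-8", errors="replace")
    return result.returncode, text


# Child process writes "ok" followed by one byte (0xFF) that is never valid UTF-8
child = "import sys; sys.stdout.buffer.write(b'ok\\xff')"
code, text = run_and_decode([sys.executable, "-c", child])
print(repr(text))  # the bad byte becomes U+FFFD, the replacement character
```

With strict decoding, the same output would raise UnicodeDecodeError instead of producing a usable string.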

Problem 3: No CLI Interface

Typing python compress_v1.py and editing the script for different quality settings was tedious. I added argparse:

import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Batch compress PDF files using Ghostscript"
    )
    parser.add_argument(
        "files",
        nargs="+",
        help="PDF files to compress (supports wildcards)"
    )
    parser.add_argument(
        "-q", "--quality",
        choices=["screen", "ebook", "printer", "prepress"],
        default="ebook",
        help="Compression quality (default: ebook)"
    )
    parser.add_argument(
        "-o", "--output-dir",
        help="Output directory (default: same as input)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be done without running Ghostscript"
    )
    args = parser.parse_args()

Now I could use it properly:

$ pdf-compress *.pdf
$ pdf-compress *.pdf -q screen -o compressed/
$ pdf-compress *.pdf --dry-run
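
To check how those command lines end up in args, the parser can be exercised directly with a list of strings. This is a sketch that rebuilds the same arguments in a hypothetical build_parser helper; the file names are made up.

```python
import argparse


def build_parser():
    """Same arguments as above, wrapped in a function so they can be tested."""
    parser = argparse.ArgumentParser(prog="pdf-compress")
    parser.add_argument("files", nargs="+")
    parser.add_argument("-q", "--quality",
                        choices=["screen", "ebook", "printer", "prepress"],
                        default="ebook")
    parser.add_argument("-o", "--output-dir")
    parser.add_argument("--dry-run", action="store_true")
    return parser


parser = build_parser()
args = parser.parse_args(["a.pdf", "b.pdf", "-q", "screen", "--dry-run"])
# --output-dir becomes args.output_dir, --dry-run becomes args.dry_run
print(args.files, args.quality, args.output_dir, args.dry_run)

# An invalid choice makes argparse print a usage message and exit with status 2
try:
    parser.parse_args(["a.pdf", "-q", "ultra"])
except SystemExit as exc:
    exit_code = exc.code
```

The choices list means a typo in the quality level is rejected before Ghostscript is ever started.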

Problem 4: Glob Patterns Don’t Work

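I discovered that *.pdf isn’t expanded by Python - the shell does it. If the pattern reaches the script unexpanded (because it was quoted, passed in programmatically, or typed into Windows cmd.exe, which does not expand wildcards), argparse sees it literally: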

# This fails - argparse sees the literal string "*.pdf"
args = parser.parse_args(["*.pdf"])
print(args.files)  # ['*.pdf'] - not the actual files!

I needed to handle glob patterns myself:

from pathlib import Path

def resolve_files(patterns):
    """Expand glob patterns to actual file paths."""
    files = []
    for pattern in patterns:
        path = Path(pattern)
        if path.exists():
            # Direct file reference
            files.append(path)
        else:
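            # Try as glob pattern, relative to the pattern's own directory.
            # Falling back to "." for a missing directory would silently
            # match files in the wrong place.
            parent = path.parent
            matches = list(parent.glob(path.name)) if parent.is_dir() else []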
            if not matches:
                print(f"Warning: No files match '{pattern}'")
            files.extend(matches)
    return files
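
An alternative worth considering is the standard library's glob module, which also understands recursive patterns such as docs/**/*.pdf. The following is a sketch under that assumption, not the approach used in the script; expand_patterns is a hypothetical name and the demo files are created in a throwaway directory.

```python
import glob
import tempfile
from pathlib import Path


def expand_patterns(patterns):
    """Expand each pattern with glob.glob; literal paths that exist pass through."""
    files = []
    for pattern in patterns:
        # recursive=True lets "**" match nested directories
        matches = glob.glob(pattern, recursive=True)
        if not matches:
            print(f"Warning: No files match '{pattern}'")
        files.extend(Path(m) for m in matches)
    # Deduplicate and sort so the processing order is stable
    return sorted(set(files))


# Demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "old").mkdir(parents=True)
    for name in ["a.pdf", "docs/b.pdf", "docs/old/c.pdf", "notes.txt"]:
        (root / name).touch()
    found = expand_patterns([str(root / "**" / "*.pdf"), str(root / "a.pdf")])
    names = [p.name for p in found]
```

The second pattern names a.pdf directly, and the set() call keeps it from being listed twice.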

Problem 5: No Progress Feedback

Processing 50 files in silence felt broken. I added progress output:

def process_batch(files, quality, output_dir):
    """Process files with progress feedback."""
    total = len(files)
    success = 0

    for i, pdf_file in enumerate(files, 1):
        output_path = output_dir / f"{pdf_file.stem}_compressed.pdf"

        print(f"[{i}/{total}] {pdf_file.name}...", end=" ", flush=True)

        try:
            compress_pdf(pdf_file, output_path, quality)
            print("OK")
            success += 1
        except RuntimeError as e:
            print(f"FAILED: {e}")

    print(f"\nCompleted: {success}/{total} files")
    return success == total

[1/50] report.pdf... OK
[2/50] data.pdf... OK
[3/50] corrupted.pdf... FAILED: Ghostscript error: Invalid PDF
[4/50] memo.pdf... OK
...

Completed: 49/50 files

The Complete Solution

After several iterations, here’s the full script:

#!/usr/bin/env python3
"""
pdf-compress: Batch compress PDF files using Ghostscript.

Usage:
    pdf-compress *.pdf
    pdf-compress *.pdf -q screen -o compressed/
    pdf-compress *.pdf --dry-run
"""

import argparse
import subprocess
import sys
from pathlib import Path


def run_command(cmd, cwd=None):
    """Execute command and return (success, output)."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        check=False,
    )
    stdout = result.stdout.decode('utf-8', errors='replace')
    stderr = result.stderr.decode('utf-8', errors='replace')
    output = (stdout + stderr).strip()
    return result.returncode == 0, output


def compress_pdf(input_path, output_path, quality="ebook"):
    """Compress a single PDF using Ghostscript."""
    cmd = [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]
    success, output = run_command(cmd)
    if not success:
        raise RuntimeError(output or "Unknown Ghostscript error")
    return True


def resolve_files(patterns):
    """Expand glob patterns to actual file paths."""
    files = []
    for pattern in patterns:
        path = Path(pattern)
        if path.exists():
            files.append(path)
        else:
            parent = path.parent  # already Path(".") for a bare pattern like "*.pdf"
            matches = list(parent.glob(path.name))
            if not matches:
                print(f"Warning: No files match '{pattern}'")
            files.extend(matches)

    # Filter to PDFs only
    files = [f for f in files if f.suffix.lower() == '.pdf']
    return sorted(set(files))


def main():
    parser = argparse.ArgumentParser(
        description="Batch compress PDF files using Ghostscript",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Quality levels:
  screen    - 72 dpi, lowest quality, smallest size
  ebook     - 150 dpi, good for screen reading (default)
  printer   - 300 dpi, high quality
  prepress  - 300 dpi, highest quality

Examples:
  %(prog)s *.pdf
  %(prog)s *.pdf -q screen
  %(prog)s docs/*.pdf -o compressed/
  %(prog)s *.pdf --dry-run
        """
    )
    parser.add_argument(
        "files",
        nargs="+",
        help="PDF files to compress (supports glob patterns)"
    )
    parser.add_argument(
        "-q", "--quality",
        choices=["screen", "ebook", "printer", "prepress"],
        default="ebook",
        help="Compression quality (default: ebook)"
    )
    parser.add_argument(
        "-o", "--output-dir",
        type=Path,
        help="Output directory (default: next to each input, with _compressed suffix)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show what would be done without running Ghostscript"
    )
    args = parser.parse_args()

    # Resolve file patterns
    files = resolve_files(args.files)
    if not files:
        print("No PDF files found")
        sys.exit(1)

    # Setup output directory
    output_dir = args.output_dir or Path.cwd()
    if args.output_dir:
        output_dir.mkdir(parents=True, exist_ok=True)

    # Process files
    success_count = 0
    total = len(files)

    print(f"Compressing {total} PDF file(s) with quality '{args.quality}'...")
    print()

    for i, pdf_file in enumerate(files, 1):
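        if args.output_dir:
            output_path = output_dir / pdf_file.name
            if output_path.resolve() == pdf_file.resolve():
                # Never let Ghostscript write over the file it is reading
                output_path = output_dir / f"{pdf_file.stem}_compressed.pdf"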
        else:
            output_path = pdf_file.parent / f"{pdf_file.stem}_compressed.pdf"

        print(f"[{i}/{total}] {pdf_file.name}...", end=" ", flush=True)

        if args.dry_run:
            print(f"DRY RUN (would write {output_path})")
            continue

        try:
            compress_pdf(pdf_file, output_path, args.quality)
            print("OK")
            success_count += 1
        except RuntimeError as e:
            print(f"FAILED: {e}")

    if args.dry_run:
        # Nothing was run, so a dry run should not report failure
        sys.exit(0)

    print(f"\nCompressed {success_count}/{total} files")
    sys.exit(0 if success_count == total else 1)


if __name__ == "__main__":
    main()
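
The script above still processes one file at a time, which was one of the original complaints about the shell version. A minimal sketch of a parallel loop with concurrent.futures follows. Threads are enough here because each worker mostly waits on an external process. The names compress_parallel and fake_compress are hypothetical, and the fake stands in for a real compress_pdf call so the sketch runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def compress_parallel(files, compress, max_workers=4):
    """Run compress(path) in a thread pool; return {path: error message or None}."""
    results = {}
    total = len(files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(compress, f): f for f in files}
        # as_completed yields in finish order, so the counter shows real progress
        for done, future in enumerate(as_completed(futures), 1):
            path = futures[future]
            try:
                future.result()
                results[path] = None
                print(f"[{done}/{total}] {path.name}... OK")
            except RuntimeError as exc:
                results[path] = str(exc)
                print(f"[{done}/{total}] {path.name}... FAILED: {exc}")
    return results


def fake_compress(path):
    """Stand-in for a real compress_pdf call (hypothetical)."""
    if path.name == "corrupted.pdf":
        raise RuntimeError("Invalid PDF")


files = [Path("report.pdf"), Path("corrupted.pdf"), Path("memo.pdf")]
results = compress_parallel(files, fake_compress)
failed = [p.name for p, err in results.items() if err]
```

With the real function, the output path and quality would need to be bound first, for example with functools.partial or a small wrapper function.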

Making It Executable

I installed it as a system command:

# Make executable
chmod +x pdf-compress

# Move to PATH
sudo mv pdf-compress /usr/local/bin/

# Now I can run it anywhere
pdf-compress *.pdf -q screen
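
Once the command runs from anywhere, possibly on other machines, it helps to check that Ghostscript is actually installed before processing any files. A sketch of such a preflight check using shutil.which is shown below. The helper name find_tool is hypothetical, and the list of candidate names assumes the usual gs binary on Linux/macOS and the gswin64c / gswin32c console builds on Windows.

```python
import shutil


def find_tool(candidates):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


# Assumed executable names: "gs" on Linux/macOS, gswin64c/gswin32c on Windows
gs_path = find_tool(["gs", "gswin64c", "gswin32c"])
if gs_path is None:
    print("Ghostscript not found on PATH; install it before running pdf-compress")
else:
    print(f"Using Ghostscript at {gs_path}")
```

Failing early with a clear message is friendlier than a FileNotFoundError from subprocess on the first file.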

A General Pattern for CLI Wrappers

The same pattern carries over to most CLI tools. Here’s a reusable base class:

"""Base class for CLI tool wrappers."""
import subprocess
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed


class CLIToolWrapper:
    """Base class for wrapping command-line tools."""

    def __init__(self, tool_name, timeout=300):
        self.tool_name = tool_name
        self.timeout = timeout

    def run(self, args, cwd=None, input_data=None):
        """Run the CLI tool with given arguments."""
        result = subprocess.run(
            [self.tool_name] + args,
            cwd=cwd,
            input=input_data,
            capture_output=True,
            text=True,
            timeout=self.timeout,
            check=False,
        )
        return result

    def process_batch(self, items, processor_func, max_workers=4):
        """Process multiple items in parallel."""
        results = {}
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {
                executor.submit(processor_func, item): item
                for item in items
            }
            for future in as_completed(futures):
                item = futures[future]
                try:
                    results[str(item)] = future.result()
                except Exception as e:
                    print(f"Error processing {item}: {e}")
                    results[str(item)] = False
        return results

Using this for FFmpeg video conversion:

#!/usr/bin/env python3
"""Batch convert videos using FFmpeg."""
import argparse
from pathlib import Path
from cli_wrapper_base import CLIToolWrapper


class VideoConverter(CLIToolWrapper):
    def __init__(self):
        super().__init__("ffmpeg")

    def convert_to_webm(self, input_path, output_path, crf=28):
        """Convert video to WebM format."""
        args = [
            "-y",  # Overwrite output (global option, so it goes first)
            "-i", str(input_path),
            "-c:v", "libvpx-vp9",
            "-crf", str(crf),
            "-b:v", "0",
            "-c:a", "libopus",
            "-b:a", "128k",
            str(output_path),
        ]
        result = self.run(args)
        return result.returncode == 0


def main():
    parser = argparse.ArgumentParser(description="Batch convert videos to WebM")
    parser.add_argument("files", nargs="+", help="Video files to convert")
    parser.add_argument("-c", "--crf", type=int, default=28,
                        help="Quality (lower=better, default: 28)")
    parser.add_argument("-p", "--parallel", type=int, default=4,
                        help="Parallel conversions (default: 4)")
    args = parser.parse_args()

    converter = VideoConverter()
    files = [Path(f) for f in args.files if Path(f).exists()]

    def process(video):
        output = video.with_suffix(".webm")
        return converter.convert_to_webm(video, output, args.crf)

    results = converter.process_batch(files, process, args.parallel)
    success = sum(1 for v in results.values() if v)
    print(f"Converted {success}/{len(files)} videos")


if __name__ == "__main__":
    main()

Common Mistakes to Avoid

Mistake 1: Using `shell=True`

# WRONG - Security vulnerability with user input
subprocess.run(f"gs -sOutputFile={output} {input}", shell=True)

# RIGHT - Use list form, no shell (Ghostscript needs -sOutputFile=value as one argument)
subprocess.run(["gs", f"-sOutputFile={output}", input])
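
The difference is easy to demonstrate without Ghostscript. In this sketch the Python interpreter plays the role of the wrapped tool (an assumption for the sake of a runnable example) and simply echoes the arguments it received. The file name is made up to look like something a shell would misinterpret.

```python
import subprocess
import sys

# A file name that would be dangerous if a shell parsed it
nasty_name = "report; echo INJECTED.pdf"

# List form: the child receives the name as exactly one argument, verbatim
echo_argv = "import sys; print(sys.argv[1:])"
result = subprocess.run(
    [sys.executable, "-c", echo_argv, nasty_name],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)
```

With shell=True and string formatting, the semicolon in that name would end the first command and start a second one.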

Mistake 2: Not Handling Non-Zero Exit Codes

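# WRONG - Return code is never checked
result = subprocess.run(cmd)
# Continues even if command failed!

# RIGHT - Capture output and check the return code
result = subprocess.run(cmd, capture_output=True, text=True, check=False)
if result.returncode != 0:
    print(f"Error: {result.stderr}")
    sys.exit(1)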
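
Another option is check=True, which turns a non-zero exit into a subprocess.CalledProcessError. A small sketch, again using the Python interpreter as a stand-in for a failing tool (an assumption so that it runs anywhere):

```python
import subprocess
import sys

# Child process that writes to stderr and exits with status 3
failing_cmd = [sys.executable, "-c",
               "import sys; sys.stderr.write('boom'); sys.exit(3)"]

try:
    subprocess.run(failing_cmd, capture_output=True, text=True, check=True)
    outcome = "ok"
except subprocess.CalledProcessError as exc:
    # The exception carries the exit status and the captured streams
    outcome = f"exit {exc.returncode}: {exc.stderr.strip()}"

print(outcome)
```

Which style to use is mostly a matter of taste: check=False keeps the control flow explicit, while check=True makes it impossible to forget the check.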

Mistake 3: Forgetting to Decode Output

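# WRONG - Output is raw bytes
result = subprocess.run(cmd, capture_output=True)
print(result.stdout)  # Prints b'...bytes...'

# RIGHT - text=True decodes to str; errors="replace" tolerates invalid bytes
result = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
print(result.stdout)  # Proper string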

Mistake 4: No Timeout for Long Operations

# WRONG - No timeout
result = subprocess.run(cmd)  # Hangs forever if command hangs

# RIGHT - Give up after a fixed time
try:
    result = subprocess.run(cmd, timeout=300)
except subprocess.TimeoutExpired:
    print("Command timed out after 5 minutes")
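
The timeout behavior can be tried out with a child process that deliberately sleeps too long. This sketch uses the Python interpreter as the slow command and a 1-second timeout, both chosen only to keep the demo short.

```python
import subprocess
import sys
import time

# Child process that sleeps far longer than the timeout allows
slow_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]

start = time.monotonic()
try:
    subprocess.run(slow_cmd, timeout=1)
    timed_out = False
except subprocess.TimeoutExpired as exc:
    # run() kills the child and waits for it before re-raising
    timed_out = True
    print(f"Gave up after {exc.timeout} seconds")
elapsed = time.monotonic() - start
```

Since subprocess.run() cleans up the child itself, no extra kill logic is needed in the except branch.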

Summary

The pattern for wrapping CLI tools in Python is:

Use subprocess.run() with list arguments - Never shell=True with user input
Use argparse for friendly CLI - Add help text and examples
Handle glob patterns explicitly - Python doesn’t expand wildcards like bash
Add progress feedback - Users want to know something is happening
Handle errors gracefully - Show what failed and why

The result: A memorable command like pdf-compress *.pdf replaces a roughly 100-character command I’d have to look up every time.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.
