How Do I Wrap CLI Tools in Python for Batch File Processing

The Problem

I needed to compress 50 PDF files. Ghostscript is the tool for this, but the command syntax is absurdly complex:

A simple PDF compression command
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET -sOutputFile=output.pdf input.pdf

I stared at this command for 30 seconds trying to remember what each flag does. Then I had to do it again for the next file. And the next.

The problem isn’t just the length - it’s that I use this maybe once a month. Every time, I have to look up the syntax again. And for batch processing? I’d need to write a shell script with loops, error handling, progress reporting…

There had to be a better way.

What I Wanted

A simple command I could actually remember:

What I wanted to type
pdf-compress *.pdf --quality ebook

That’s it. One command. Handles all the files. Shows progress. Handles errors. No googling required.

First Attempt: A Shell Script

I started with bash:

compress.sh
#!/bin/bash
for file in *.pdf; do
    gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
        -dNOPAUSE -dBATCH -dQUIET \
        -sOutputFile="${file%.pdf}_compressed.pdf" "$file"
done

This worked, but I quickly ran into problems:

  1. No error handling - When Ghostscript failed, the script kept going
  2. No progress feedback - Processing 50 files in silence is nerve-wracking
  3. No quality options - I wanted --quality screen|ebook|printer|prepress
  4. Glob pattern limitations - Handling *.pdf vs docs/*.pdf vs absolute paths
  5. No parallel processing - One file at a time on an 8-core CPU

I realized I was reinventing what Python already does well.

Second Attempt: Basic Python Wrapper

I switched to Python for better control:

compress_v1.py
#!/usr/bin/env python3
import subprocess
import sys
from pathlib import Path

def compress_pdf(input_path, output_path, quality="ebook"):
    cmd = [
        "gs", "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path)
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for pdf in Path(".").glob("*.pdf"):
        output = f"{pdf.stem}_compressed.pdf"
        print(f"Compressing {pdf.name}...")
        compress_pdf(pdf, output)

Better! But I still had issues:

Terminal output
$ python compress_v1.py
Compressing report.pdf...
Compressing data.pdf...
Error: Ghostscript returned non-zero exit status 1

Which file failed? What was the error? The script just crashed with no context.

Problem 1: Missing Error Context

I added proper error handling:

error-handling.py
def compress_pdf(input_path, output_path, quality="ebook"):
    """Compress PDF with detailed error reporting."""
    cmd = [
        "gs", "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path)
    ]
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        check=False  # Don't raise on non-zero exit
    )
    if result.returncode != 0:
        error_msg = result.stderr.strip() or result.stdout.strip()
        raise RuntimeError(f"Ghostscript error: {error_msg}")
    return True

Now when something fails, I get useful information:

Terminal output
Error compressing corrupted.pdf: Ghostscript error: Can't find font 'ArialMT'

Problem 2: Bytes vs Strings

I hit another wall when Ghostscript returned binary garbage mixed with text:

Terminal output
Error: b'\xff\xfe\x00\x00...output data...'

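The text=True flag wasn’t enough: it decodes with the locale’s default encoding and fails on bytes that are not valid in that encoding. I needed to handle encoding explicitly: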

encoding-fix.py
def run_command(cmd, cwd=None):
    """Execute command and return (success, output)."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        check=False,
    )
    # Decode output, handling encoding issues
    stdout = result.stdout.decode('utf-8', errors='replace')
    stderr = result.stderr.decode('utf-8', errors='replace')
    output = (stdout + stderr).strip()
    return result.returncode == 0, output
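
For comparison, subprocess.run can also do the decoding itself when it is given encoding and errors arguments. The sketch below assumes UTF-8 is the expected output encoding and uses the current Python interpreter as a stand-in for Ghostscript so that it runs anywhere. The names run_command_text and BAD_BYTES_CHILD are hypothetical and not part of the script in this article.

```python
import subprocess
import sys


def run_command_text(cmd, cwd=None):
    """Run cmd and return (success, output) with undecodable bytes replaced."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        encoding="utf-8",   # decode as UTF-8 instead of the locale default
        errors="replace",   # invalid bytes become U+FFFD instead of raising
        check=False,
    )
    return result.returncode == 0, (result.stdout + result.stderr).strip()


# Stand-in child process (an assumption for the demo) that writes bytes
# which are not valid UTF-8.
BAD_BYTES_CHILD = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write(b'ok \\xff\\xfe done')",
]

if __name__ == "__main__":
    success, output = run_command_text(BAD_BYTES_CHILD)
    print(success, repr(output))
```

Both approaches end up with a plain string. The manual decode keeps the raw bytes available if they are ever needed, while the keyword arguments keep the call site shorter.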

Problem 3: No CLI Interface

Typing python compress_v1.py and editing the script for different quality settings was tedious. I added argparse:

cli.py
import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Batch compress PDF files using Ghostscript"
    )
    parser.add_argument(
        "files",
        nargs="+",
        help="PDF files to compress (supports wildcards)"
    )
    parser.add_argument(
        "-q", "--quality",
        choices=["screen", "ebook", "printer", "prepress"],
        default="ebook",
        help="Compression quality (default: ebook)"
    )
    parser.add_argument(
        "-o", "--output-dir",
        help="Output directory (default: same as input)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show commands without running"
    )
    args = parser.parse_args()

Now I could use it properly:

Usage examples
$ pdf-compress *.pdf
$ pdf-compress *.pdf -q screen -o compressed/
$ pdf-compress *.pdf --dry-run
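
The --dry-run flag is described as showing commands without running them. A minimal sketch of one way to do that, assuming Python 3.8+ for shlex.join and POSIX-style shell quoting; build_gs_command and run_or_print are hypothetical helper names, not functions from the script in this article:

```python
import shlex
import subprocess
from pathlib import Path


def build_gs_command(input_path, output_path, quality="ebook"):
    """Build the Ghostscript argument list used throughout this article."""
    return [
        "gs", "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]


def run_or_print(cmd, dry_run=False):
    """Print the command in dry-run mode; otherwise execute it."""
    if dry_run:
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
        return True
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    cmd = build_gs_command(Path("my report.pdf"), Path("out.pdf"), "screen")
    run_or_print(cmd, dry_run=True)
```

Keeping command construction separate from execution means the dry run prints exactly the arguments that a real run would pass.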

Problem 4: Glob Patterns Don’t Work

I discovered that *.pdf isn’t expanded by Python - the shell does it. When the pattern is quoted, or the arguments come from Python code instead of a shell, nothing expands it:

broken-glob.py
# This fails - argparse sees the literal string "*.pdf"
args = parser.parse_args(["*.pdf"])
print(args.files) # ['*.pdf'] - not the actual files!

I needed to handle glob patterns myself:

glob-handler.py
from pathlib import Path

def resolve_files(patterns):
    """Expand glob patterns to actual file paths."""
    files = []
    for pattern in patterns:
        path = Path(pattern)
        if path.exists():
            # Direct file reference
            files.append(path)
        else:
            # Try as glob pattern
            parent = path.parent if path.parent.exists() else Path(".")
            matches = list(parent.glob(path.name))
            if not matches:
                print(f"Warning: No files match '{pattern}'")
            files.extend(matches)
    return files
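
The handler above only expands wildcards in the last path component. As a hedged alternative, the standard glob module also expands wildcards in directory names and, with recursive=True, the ** pattern. The sketch below is one possible variant; expand_patterns is a hypothetical name, not part of the script in this article:

```python
import glob
from pathlib import Path


def expand_patterns(patterns):
    """Expand each pattern with glob.glob; keep literal paths that exist."""
    files = []
    for pattern in patterns:
        if Path(pattern).exists():
            # Literal path: this also covers filenames containing [ or ].
            files.append(Path(pattern))
            continue
        matches = [Path(m) for m in glob.glob(pattern, recursive=True)]
        if not matches:
            print(f"Warning: No files match '{pattern}'")
        files.extend(matches)
    # De-duplicate and return in a stable, sorted order.
    return sorted(set(files))
```

With this variant, a quoted pattern such as 'docs/**/*.pdf' would also pick up PDFs in nested folders.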

Problem 5: No Progress Feedback

Processing 50 files in silence felt broken. I added progress output:

progress.py
def process_batch(files, quality, output_dir):
    """Process files with progress feedback."""
    total = len(files)
    success = 0
    for i, pdf_file in enumerate(files, 1):
        output_path = output_dir / f"{pdf_file.stem}_compressed.pdf"
        print(f"[{i}/{total}] {pdf_file.name}...", end=" ", flush=True)
        try:
            compress_pdf(pdf_file, output_path, quality)
            print("OK")
            success += 1
        except RuntimeError as e:
            print(f"FAILED: {e}")
    print(f"\nCompleted: {success}/{total} files")
    return success == total

Terminal output
[1/50] report.pdf... OK
[2/50] data.pdf... OK
[3/50] corrupted.pdf... FAILED: Ghostscript error: Invalid PDF
[4/50] memo.pdf... OK
...
Completed: 49/50 files
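
The loop above still handles one file at a time, which was the fifth complaint about the shell script. A minimal sketch of a parallel variant follows, assuming threads are acceptable because the heavy work happens in the Ghostscript child processes. The name process_batch_parallel and its worker parameter are hypothetical; worker stands for any callable that processes one file and raises RuntimeError on failure, such as a small wrapper around compress_pdf.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch_parallel(files, worker, max_workers=4):
    """Run worker(file) for every file in parallel, reporting progress as each finishes."""
    total = len(files)
    success = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the file it is processing.
        futures = {executor.submit(worker, f): f for f in files}
        for done, future in enumerate(as_completed(futures), 1):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                print(f"[{done}/{total}] {name}... OK")
                success += 1
            except RuntimeError as e:
                print(f"[{done}/{total}] {name}... FAILED: {e}")
    print(f"\nCompleted: {success}/{total} files")
    return success == total
```

Because results are printed in completion order, the counter shows how many files are finished, not the position of a file in the input list.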

The Complete Solution

After several iterations, here’s the full script:

pdf-compress
#!/usr/bin/env python3
"""
pdf-compress: Batch compress PDF files using Ghostscript.

Usage:
    pdf-compress *.pdf
    pdf-compress *.pdf -q screen -o compressed/
    pdf-compress *.pdf --dry-run
"""
import argparse
import subprocess
import sys
from pathlib import Path

def run_command(cmd, cwd=None):
    """Execute command and return (success, output)."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        check=False,
    )
    stdout = result.stdout.decode('utf-8', errors='replace')
    stderr = result.stderr.decode('utf-8', errors='replace')
    output = (stdout + stderr).strip()
    return result.returncode == 0, output

def compress_pdf(input_path, output_path, quality="ebook"):
    """Compress a single PDF using Ghostscript."""
    cmd = [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]
    success, output = run_command(cmd)
    if not success:
        raise RuntimeError(output or "Unknown Ghostscript error")
    return True

def resolve_files(patterns):
    """Expand glob patterns to actual file paths."""
    files = []
    for pattern in patterns:
        path = Path(pattern)
        if path.exists():
            files.append(path)
        else:
            # Path("*.pdf").parent is already Path("."), so no special case is needed
            parent = path.parent
            matches = list(parent.glob(path.name))
            if not matches:
                print(f"Warning: No files match '{pattern}'")
            files.extend(matches)
    # Filter to PDFs only
    files = [f for f in files if f.suffix.lower() == '.pdf']
    return sorted(set(files))

def main():
    parser = argparse.ArgumentParser(
        description="Batch compress PDF files using Ghostscript",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Quality levels:
  screen   - 72 dpi, lowest quality, smallest size
  ebook    - 150 dpi, good for screen reading (default)
  printer  - 300 dpi, high quality
  prepress - 300 dpi, highest quality

Examples:
  %(prog)s *.pdf
  %(prog)s *.pdf -q screen
  %(prog)s docs/*.pdf -o compressed/
  %(prog)s *.pdf --dry-run
"""
    )
    parser.add_argument(
        "files",
        nargs="+",
        help="PDF files to compress (supports glob patterns)"
    )
    parser.add_argument(
        "-q", "--quality",
        choices=["screen", "ebook", "printer", "prepress"],
        default="ebook",
        help="Compression quality (default: ebook)"
    )
    parser.add_argument(
        "-o", "--output-dir",
        type=Path,
        help="Output directory (default: next to each input, with a _compressed suffix)"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Show commands without running"
    )
    args = parser.parse_args()

    # Resolve file patterns
    files = resolve_files(args.files)
    if not files:
        print("No PDF files found")
        sys.exit(1)

    # Setup output directory
    output_dir = args.output_dir or Path.cwd()
    if args.output_dir:
        output_dir.mkdir(parents=True, exist_ok=True)

    # Process files
    success_count = 0
    total = len(files)
    print(f"Compressing {total} PDF file(s) with quality '{args.quality}'...")
    print()
    for i, pdf_file in enumerate(files, 1):
        if args.output_dir:
            output_path = output_dir / pdf_file.name
        else:
            output_path = pdf_file.parent / f"{pdf_file.stem}_compressed.pdf"
        print(f"[{i}/{total}] {pdf_file.name}...", end=" ", flush=True)
        if args.dry_run:
            print("DRY RUN")
            continue
        try:
            compress_pdf(pdf_file, output_path, args.quality)
            print("OK")
            success_count += 1
        except RuntimeError as e:
            print(f"FAILED: {e}")

    if not args.dry_run:
        print(f"\nCompressed {success_count}/{total} files")
        sys.exit(0 if success_count == total else 1)

if __name__ == "__main__":
    main()
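
One gap worth noting: if the gs executable is not installed, subprocess.run raises FileNotFoundError, which the except RuntimeError clause in the script does not catch. A minimal sketch of a preflight check using shutil.which follows; require_tool is a hypothetical helper and the exit code 2 is an arbitrary choice for illustration.

```python
import shutil
import sys


def require_tool(name):
    """Return the full path to an executable, or exit with a clear message."""
    path = shutil.which(name)
    if path is None:
        print(f"Error: '{name}' not found on PATH. Is it installed?", file=sys.stderr)
        sys.exit(2)
    return path
```

Calling require_tool("gs") once at the start of main() would fail fast, before any progress output is printed.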

Making It Executable

I installed it as a system command:

Installation
# Make executable
chmod +x pdf-compress
# Move to PATH
sudo mv pdf-compress /usr/local/bin/
# Now I can run it anywhere
pdf-compress *.pdf -q screen

A General Pattern for CLI Wrappers

This pattern works for any CLI tool. Here’s a reusable base class:

cli_wrapper_base.py
"""Base class for CLI tool wrappers."""
import subprocess
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed

class CLIToolWrapper:
    """Base class for wrapping command-line tools."""

    def __init__(self, tool_name, timeout=300):
        self.tool_name = tool_name
        self.timeout = timeout

    def run(self, args, cwd=None, input_data=None):
        """Run the CLI tool with given arguments."""
        result = subprocess.run(
            [self.tool_name] + args,
            cwd=cwd,
            input=input_data,
            capture_output=True,
            text=True,
            timeout=self.timeout,
            check=False,
        )
        return result

    def process_batch(self, items, processor_func, max_workers=4):
        """Process multiple items in parallel."""
        results = {}
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {
                executor.submit(processor_func, item): item
                for item in items
            }
            for future in as_completed(futures):
                item = futures[future]
                try:
                    results[str(item)] = future.result()
                except Exception as e:
                    print(f"Error processing {item}: {e}")
                    results[str(item)] = False
        return results

Using this for FFmpeg video conversion:

video-convert
#!/usr/bin/env python3
"""Batch convert videos using FFmpeg."""
import argparse
from pathlib import Path
from cli_wrapper_base import CLIToolWrapper

class VideoConverter(CLIToolWrapper):
    def __init__(self):
        super().__init__("ffmpeg")

    def convert_to_webm(self, input_path, output_path, crf=28):
        """Convert video to WebM format."""
        args = [
            "-y",  # Overwrite (global option, so it goes before the input)
            "-i", str(input_path),
            "-c:v", "libvpx-vp9",
            "-crf", str(crf),
            "-b:v", "0",
            "-c:a", "libopus",
            "-b:a", "128k",
            str(output_path),
        ]
        result = self.run(args)
        return result.returncode == 0

def main():
    parser = argparse.ArgumentParser(description="Batch convert videos to WebM")
    parser.add_argument("files", nargs="+", help="Video files to convert")
    parser.add_argument("-c", "--crf", type=int, default=28,
                        help="Quality (lower=better, default: 28)")
    parser.add_argument("-p", "--parallel", type=int, default=4,
                        help="Parallel conversions (default: 4)")
    args = parser.parse_args()

    converter = VideoConverter()
    files = [Path(f) for f in args.files if Path(f).exists()]

    def process(video):
        output = video.with_suffix(".webm")
        return converter.convert_to_webm(video, output, args.crf)

    results = converter.process_batch(files, process, args.parallel)
    success = sum(1 for v in results.values() if v)
    print(f"Converted {success}/{len(files)} videos")

if __name__ == "__main__":
    main()

Common Mistakes to Avoid

Mistake 1: Using shell=True

Never do this
# WRONG - Security vulnerability with user input
subprocess.run(f"gs -sOutputFile={output} {input}", shell=True)

Correct approach
# RIGHT - Use list form, no shell
# Ghostscript expects -sOutputFile=<path> as a single argument
subprocess.run(["gs", f"-sOutputFile={output}", input])
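
To see why the list form is safer, the sketch below passes an awkward filename to a child process and prints what the child actually received. It assumes the current Python interpreter as a harmless stand-in for the real tool; ECHO_ARGS and show_args are hypothetical names used only for this demonstration.

```python
import subprocess
import sys

# Stand-in for a real tool: a child Python process that prints its arguments.
ECHO_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]


def show_args(*args):
    """Run the stand-in tool with args in list form and return what it received."""
    result = subprocess.run(
        ECHO_ARGS + list(args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # A filename with a space and a semicolon arrives as ONE argument,
    # because no shell ever parses it.
    print(show_args("-sOutputFile=out.pdf", "my report; echo pwned.pdf"))
```

With shell=True the same filename would be split at the space, and the text after the semicolon would run as a separate command.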

Mistake 2: Not Handling Non-Zero Exit Codes

Wrong approach
result = subprocess.run(cmd)
# Continues even if command failed!

Correct approach
# capture_output=True is required; without it result.stderr is None
result = subprocess.run(cmd, capture_output=True, text=True, check=False)
if result.returncode != 0:
    print(f"Error: {result.stderr}")
    sys.exit(1)
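
An equivalent option is to let subprocess raise and read the details from the exception. A minimal sketch, assuming check=True together with captured output; run_checked is a hypothetical helper name:

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd; return stdout on success, raise RuntimeError with stderr on failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as e:
        # e.stderr and e.stdout are populated because capture_output=True was passed.
        detail = (e.stderr or e.stdout or "").strip()
        raise RuntimeError(f"{cmd[0]} exited with status {e.returncode}: {detail}") from e
    return result.stdout


if __name__ == "__main__":
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    try:
        run_checked(failing)
    except RuntimeError as err:
        print(err)
```

Which style to use is mostly a matter of taste. The exception form makes it harder to forget the check, while the explicit returncode test keeps the control flow linear.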

Mistake 3: Forgetting to Decode Output

Wrong approach
result = subprocess.run(cmd, capture_output=True)
print(result.stdout) # Prints b'...bytes...'

Correct approach
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout) # Proper string

Mistake 4: No Timeout for Long Operations

Wrong approach
result = subprocess.run(cmd) # Hangs forever if command hangs

Correct approach
try:
    result = subprocess.run(cmd, timeout=300)
except subprocess.TimeoutExpired:
    print("Command timed out after 5 minutes")
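
The timeout behaviour is easy to try without a real long-running tool. The sketch below assumes a child Python process that sleeps as a stand-in for a hung command; run_with_timeout is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Return (finished, returncode); finished is False if the timeout expired."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return False, None
    return True, result.returncode


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(slow, timeout=0.5))
```

Returning a flag instead of printing lets a batch loop count timeouts as failures and continue with the next file.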

Summary

The pattern for wrapping CLI tools in Python is:

  1. Use subprocess.run() with list arguments - Never shell=True with user input
  2. Use argparse for friendly CLI - Add help text and examples
  3. Handle glob patterns explicitly - Python doesn’t expand wildcards like bash
  4. Add progress feedback - Users want to know something is happening
  5. Handle errors gracefully - Show what failed and why

The result: A memorable command like pdf-compress *.pdf replaces a roughly 100-character command I’d have to look up every time.
