How Do I Wrap CLI Tools in Python for Batch File Processing
The Problem
I needed to compress 50 PDF files. Ghostscript is the tool for this, but the command syntax is absurdly complex:
```bash
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET -sOutputFile=output.pdf input.pdf
```

I stared at this command for 30 seconds trying to remember what each flag does. Then I had to do it again for the next file. And the next.
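The command is easier to decode when it is written out one flag per line with a comment on each. Below is a small sketch that builds the same command as a Python argument list. The helper name `gs_compress_args` is hypothetical and is not part of any library.

```python
import shlex


def gs_compress_args(input_pdf, output_pdf, quality="ebook"):
    """Return the Ghostscript compression command as an argument list.

    Hypothetical helper for illustration; the comments explain each flag.
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",            # output device: write a new PDF file
        f"-dPDFSETTINGS=/{quality}",    # preset controlling image resolution/quality
        "-dNOPAUSE",                    # don't stop and prompt after each page
        "-dBATCH",                      # exit when done instead of going interactive
        "-dQUIET",                      # suppress routine startup messages
        f"-sOutputFile={output_pdf}",   # where to write the result
        str(input_pdf),                 # the file to read
    ]


# shlex.join renders the list back into the equivalent shell command line
print(shlex.join(gs_compress_args("input.pdf", "output.pdf")))
```

A list also sidesteps shell quoting. A file name with spaces stays a single argument.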
The problem isn’t just the length - it’s that I use this maybe once a month. Every time, I have to look up the syntax again. And for batch processing? I’d need to write a shell script with loops, error handling, progress reporting…
There had to be a better way.
What I Wanted
A simple command I could actually remember:
```bash
pdf-compress *.pdf --quality ebook
```

That’s it. One command. Handles all the files. Shows progress. Handles errors. No googling required.
First Attempt: A Shell Script
I started with bash:
```bash
#!/bin/bash
for file in *.pdf; do
  gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
     -dNOPAUSE -dBATCH -dQUIET \
     -sOutputFile="${file%.pdf}_compressed.pdf" "$file"
done
```

This worked, but I quickly ran into problems:
- No error handling - When Ghostscript failed, the script silently kept going
- No progress feedback - Processing 50 files in silence is nerve-wracking
- No quality options - I wanted --quality screen|ebook|printer|prepress
- Glob pattern limitations - Handling *.pdf vs docs/*.pdf vs absolute paths
- No parallel processing - One file at a time on an 8-core CPU
I realized I was reinventing what Python already does well.
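Here is a minimal sketch of how the standard library covers two of those complaints together: error checking and parallelism. `subprocess.run` reports each exit code, and `concurrent.futures.ThreadPoolExecutor` runs several commands at once. Threads are enough here because the real work happens in the child processes. The `run_parallel` name is hypothetical, and the Python one-liners stand in for per-file Ghostscript calls. Neither is part of the final tool.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run each command (a list of args) concurrently; return {index: returncode}.

    Hypothetical helper: each worker thread just waits on a child process,
    so the GIL is not a bottleneck.
    """
    def run_one(cmd):
        # capture_output keeps the children's output from interleaving on the terminal
        return subprocess.run(cmd, capture_output=True, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input commands
        codes = list(pool.map(run_one, commands))
    return dict(enumerate(codes))


if __name__ == "__main__":
    # Stand-ins for per-file Ghostscript calls; the second one "fails"
    cmds = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "raise SystemExit(2)"],
        [sys.executable, "-c", "pass"],
    ]
    results = run_parallel(cmds)
    failed = [i for i, code in results.items() if code != 0]
    print(f"{len(cmds) - len(failed)}/{len(cmds)} succeeded, failed: {failed}")
```

Unlike the bash loop, a failed command here is a non-zero value you can report, not something that scrolls past.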
Second Attempt: Basic Python Wrapper
I switched to Python for better control:
```python
#!/usr/bin/env python3
import subprocess
import sys
from pathlib import Path


def compress_pdf(input_path, output_path, quality="ebook"):
    cmd = [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for pdf in Path(".").glob("*.pdf"):
        output = f"{pdf.stem}_compressed.pdf"
        print(f"Compressing {pdf.name}...")
        compress_pdf(pdf, output)
```

Better! But I still had issues:
```
$ python compress_v1.py
Compressing report.pdf...
Compressing data.pdf...
Error: Ghostscript returned non-zero exit status 1
```

Which file failed? What was the error? The script just crashed with no context.
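The exception that `check=True` raises, `subprocess.CalledProcessError`, already carries part of this context: the command, the exit code, and, when output is captured, stderr. It just has to be caught instead of being left to crash the loop. Below is a minimal sketch. `run_checked` is a hypothetical helper, and a deliberately failing Python one-liner stands in for Ghostscript.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd; return None on success, or a one-line description of the failure.

    Hypothetical helper for illustration.
    """
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # Raised when the executable itself (e.g. "gs") is not installed or not on PATH
        return f"command not found: {cmd[0]}"
    except subprocess.CalledProcessError as e:
        # With capture_output=True, e.stderr holds the tool's error message
        detail = (e.stderr or "").strip() or "no error output"
        return f"exit status {e.returncode}: {detail}"
    return None


# Stand-in for a failing Ghostscript call: writes to stderr, exits with status 1
failing = [sys.executable, "-c", "import sys; sys.stderr.write('Invalid PDF'); sys.exit(1)"]
print(run_checked(failing))
```

Printing the file name next to the returned message in the batch loop answers both "which file?" and "what error?".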
Problem 1: Missing Error Context
I added proper error handling:
```python
def compress_pdf(input_path, output_path, quality="ebook"):
    """Compress PDF with detailed error reporting."""
    cmd = [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        check=False,  # Don't raise on non-zero exit
    )

    if result.returncode != 0:
        error_msg = result.stderr.strip() or result.stdout.strip()
        raise RuntimeError(f"Ghostscript error: {error_msg}")

    return True
```

Now when something fails, I get useful information:
```
Error compressing corrupted.pdf: Ghostscript error: Can't find font 'ArialMT'
```

Problem 2: Bytes vs Strings
I hit another wall when Ghostscript returned binary garbage mixed with text:
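```
Error: b'\xff\xfe\x00\x00...output data...'
```

The text=True flag wasn’t enough. It decodes with the locale’s default encoding in strict mode, so any byte sequence that isn’t valid in that encoding raises UnicodeDecodeError. I needed to handle decoding explicitly: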
```python
def run_command(cmd, cwd=None):
    """Execute command and return (success, output)."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,  # no text=True: keep raw bytes and decode below
        check=False,
    )

    # Decode output; undecodable bytes become U+FFFD instead of raising
    stdout = result.stdout.decode('utf-8', errors='replace')
    stderr = result.stderr.decode('utf-8', errors='replace')
    output = (stdout + stderr).strip()

    return result.returncode == 0, output
```

Problem 3: No CLI Interface
Typing python compress_v1.py and editing the script for different quality settings was tedious. I added argparse:
```python
import argparse


def main():
    parser = argparse.ArgumentParser(
        description="Batch compress PDF files using Ghostscript"
    )
    parser.add_argument(
        "files", nargs="+", help="PDF files to compress (supports wildcards)"
    )
    parser.add_argument(
        "-q", "--quality",
        choices=["screen", "ebook", "printer", "prepress"],
        default="ebook",
        help="Compression quality (default: ebook)",
    )
    parser.add_argument(
        "-o", "--output-dir", help="Output directory (default: same as input)"
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Show commands without running"
    )
    args = parser.parse_args()
```

Now I could use it properly:
```bash
$ pdf-compress *.pdf
$ pdf-compress *.pdf -q screen -o compressed/
$ pdf-compress *.pdf --dry-run
```

Problem 4: Glob Patterns Don’t Work
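I discovered that *.pdf isn’t expanded by Python; the shell does it. When the arguments don’t pass through a shell, the pattern arrives unexpanded. That happens when another Python program calls parse_args() directly, and on Windows, where cmd.exe leaves wildcards alone: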
```python
# No shell expansion here - argparse sees the literal string "*.pdf"
args = parser.parse_args(["*.pdf"])
print(args.files)  # ['*.pdf'] - not the actual files!
```

I needed to handle glob patterns myself:
```python
from pathlib import Path


def resolve_files(patterns):
    """Expand glob patterns to actual file paths."""
    files = []
    for pattern in patterns:
        path = Path(pattern)
        if path.exists():
            # Direct file reference
            files.append(path)
        else:
            # Try as a glob pattern relative to its own directory,
            # e.g. "docs/*.pdf" -> Path("docs").glob("*.pdf").
            # A missing directory yields no matches instead of silently
            # falling back to the current directory.
            matches = sorted(path.parent.glob(path.name))
            if not matches:
                print(f"Warning: No files match '{pattern}'")
            files.extend(matches)
    return files
```

Problem 5: No Progress Feedback
Processing 50 files in silence felt broken. I added progress output:
```python
def process_batch(files, quality, output_dir):
    """Process files with progress feedback."""
    total = len(files)
    success = 0

    for i, pdf_file in enumerate(files, 1):
        output_path = output_dir / f"{pdf_file.stem}_compressed.pdf"

        # end=" " + flush=True: show the file name before the slow gs call finishes
        print(f"[{i}/{total}] {pdf_file.name}...", end=" ", flush=True)

        try:
            compress_pdf(pdf_file, output_path, quality)
            print("OK")
            success += 1
        except RuntimeError as e:
            print(f"FAILED: {e}")

    print(f"\nCompleted: {success}/{total} files")
    return success == total
```

```
[1/50] report.pdf... OK
[2/50] data.pdf... OK
[3/50] corrupted.pdf... FAILED: Ghostscript error: Invalid PDF
[4/50] memo.pdf... OK
...

Completed: 49/50 files
```

The Complete Solution
After several iterations, here’s the full script:
```python
#!/usr/bin/env python3
"""pdf-compress: Batch compress PDF files using Ghostscript.

Usage:
    pdf-compress *.pdf
    pdf-compress *.pdf -q screen -o compressed/
    pdf-compress *.pdf --dry-run
"""

import argparse
import shlex
import subprocess
import sys
from pathlib import Path


def run_command(cmd, cwd=None):
    """Execute command and return (success, output)."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, check=False)
    # Decode leniently: undecodable bytes become U+FFFD instead of raising
    stdout = result.stdout.decode('utf-8', errors='replace')
    stderr = result.stderr.decode('utf-8', errors='replace')
    output = (stdout + stderr).strip()
    return result.returncode == 0, output


def build_gs_command(input_path, output_path, quality="ebook"):
    """Build the Ghostscript argument list for one file."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]


def compress_pdf(input_path, output_path, quality="ebook"):
    """Compress a single PDF using Ghostscript."""
    success, output = run_command(build_gs_command(input_path, output_path, quality))
    if not success:
        raise RuntimeError(output or "Unknown Ghostscript error")
    return True


def resolve_files(patterns):
    """Expand glob patterns to actual file paths."""
    files = []
    for pattern in patterns:
        path = Path(pattern)
        if path.exists():
            files.append(path)
        else:
            # Glob relative to the pattern's own directory ("." for bare patterns)
            matches = list(path.parent.glob(path.name))
            if not matches:
                print(f"Warning: No files match '{pattern}'", file=sys.stderr)
            files.extend(matches)

    # Filter to PDFs only, drop duplicates, keep a stable order
    files = [f for f in files if f.suffix.lower() == '.pdf']
    return sorted(set(files))


def main():
    parser = argparse.ArgumentParser(
        description="Batch compress PDF files using Ghostscript",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""Quality levels:
  screen   - 72 dpi, lowest quality, smallest size
  ebook    - 150 dpi, good for screen reading (default)
  printer  - 300 dpi, high quality
  prepress - 300 dpi, highest quality

Examples:
  %(prog)s *.pdf
  %(prog)s *.pdf -q screen
  %(prog)s docs/*.pdf -o compressed/
  %(prog)s *.pdf --dry-run
""",
    )
    parser.add_argument(
        "files", nargs="+", help="PDF files to compress (supports glob patterns)"
    )
    parser.add_argument(
        "-q", "--quality",
        choices=["screen", "ebook", "printer", "prepress"],
        default="ebook",
        help="Compression quality (default: ebook)",
    )
    parser.add_argument(
        "-o", "--output-dir", type=Path,
        help="Output directory (default: write NAME_compressed.pdf next to each input)",
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Show commands without running"
    )
    args = parser.parse_args()

    # Resolve file patterns
    files = resolve_files(args.files)
    if not files:
        print("No PDF files found")
        sys.exit(1)

    # Set up output directory (not needed for a dry run)
    if args.output_dir and not args.dry_run:
        args.output_dir.mkdir(parents=True, exist_ok=True)

    # Process files
    success_count = 0
    total = len(files)
    print(f"Compressing {total} PDF file(s) with quality '{args.quality}'...")
    print()

    for i, pdf_file in enumerate(files, 1):
        if args.output_dir:
            output_path = args.output_dir / pdf_file.name
        else:
            output_path = pdf_file.parent / f"{pdf_file.stem}_compressed.pdf"

        print(f"[{i}/{total}] {pdf_file.name}...", end=" ", flush=True)

        if output_path.resolve() == pdf_file.resolve():
            # -o pointing at the input folder would make gs overwrite the file it reads
            print("SKIPPED: output would overwrite the input")
            continue

        if args.dry_run:
            # Show the exact command instead of running it
            print(shlex.join(build_gs_command(pdf_file, output_path, args.quality)))
            continue

        try:
            compress_pdf(pdf_file, output_path, args.quality)
            print("OK")
            success_count += 1
        except RuntimeError as e:
            print(f"FAILED: {e}")

    if args.dry_run:
        sys.exit(0)  # nothing ran, so nothing failed

    print(f"\nCompressed {success_count}/{total} files")
    sys.exit(0 if success_count == total else 1)


if __name__ == "__main__":
    main()
```

Making It Executable
I installed it as a system command:
```bash
# Make executable
chmod +x pdf-compress

# Move to PATH
sudo mv pdf-compress /usr/local/bin/

# Now I can run it anywhere
pdf-compress *.pdf -q screen
```

A General Pattern for CLI Wrappers
This pattern works for any CLI tool. Here’s a reusable base class:

```python
"""Base class for CLI tool wrappers."""
import subprocess
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed


class CLIToolWrapper:
    """Base class for wrapping command-line tools."""

    def __init__(self, tool_name, timeout=300):
        self.tool_name = tool_name
        self.timeout = timeout

    def run(self, args, cwd=None, input_data=None):
        """Run the CLI tool with given arguments.

        Raises subprocess.TimeoutExpired if the tool runs longer than self.timeout.
        """
        return subprocess.run(
            [self.tool_name] + list(args),
            cwd=cwd,
            input=input_data,
            capture_output=True,
            text=True,
            errors="replace",  # don't crash on undecodable output (see Problem 2)
            timeout=self.timeout,
            check=False,
        )

    def process_batch(self, items, processor_func, max_workers=4):
        """Process multiple items in parallel (threads suffice: the work runs in subprocesses)."""
        results = {}
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {
                executor.submit(processor_func, item): item
                for item in items
            }
            for future in as_completed(futures):
                item = futures[future]
                try:
                    results[str(item)] = future.result()
                except Exception as e:
                    print(f"Error processing {item}: {e}")
                    results[str(item)] = False
        return results
```

Using this for FFmpeg video conversion:
```python
#!/usr/bin/env python3
"""Batch convert videos using FFmpeg."""
import argparse
from pathlib import Path

from cli_wrapper_base import CLIToolWrapper  # the base class above, saved as cli_wrapper_base.py


class VideoConverter(CLIToolWrapper):
    def __init__(self):
        super().__init__("ffmpeg")

    def convert_to_webm(self, input_path, output_path, crf=28):
        """Convert video to WebM (VP9 video + Opus audio)."""
        args = [
            "-y",  # Overwrite output without prompting
            "-i", str(input_path),
            "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",  # constant-quality VP9
            "-c:a", "libopus", "-b:a", "128k",
            str(output_path),
        ]
        result = self.run(args)
        return result.returncode == 0


def main():
    parser = argparse.ArgumentParser(description="Batch convert videos to WebM")
    parser.add_argument("files", nargs="+", help="Video files to convert")
    parser.add_argument("-c", "--crf", type=int, default=28,
                        help="Quality (lower=better, default: 28)")
    parser.add_argument("-p", "--parallel", type=int, default=4,
                        help="Parallel conversions (default: 4)")
    args = parser.parse_args()

    converter = VideoConverter()
    files = [Path(f) for f in args.files if Path(f).exists()]

    def process(video):
        output = video.with_suffix(".webm")
        return converter.convert_to_webm(video, output, args.crf)

    results = converter.process_batch(files, process, args.parallel)
    success = sum(1 for v in results.values() if v)
    print(f"Converted {success}/{len(files)} videos")


if __name__ == "__main__":
    main()
```

Common Mistakes to Avoid
Mistake 1: Using shell=True
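```python
# WRONG - Security vulnerability with user input:
# a file named "a.pdf; rm -rf ~" would make the shell run a second command
subprocess.run(f"gs -sOutputFile={output_path} {input_path}", shell=True)

# RIGHT - Use list form, no shell; each list item is passed as one argument
subprocess.run(["gs", f"-sOutputFile={output_path}", str(input_path)])
```

Mistake 2: Not Handling Non-Zero Exit Codes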
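```python
# WRONG - continues even if the command failed
result = subprocess.run(cmd)

# RIGHT - capture stderr (otherwise result.stderr is None) and check the exit code
result = subprocess.run(cmd, capture_output=True, text=True, check=False)
if result.returncode != 0:
    print(f"Error: {result.stderr}")
    sys.exit(1)
```

Mistake 3: Forgetting to Decode Output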
```python
# WRONG
result = subprocess.run(cmd, capture_output=True)
print(result.stdout)  # Prints b'...bytes...'

# RIGHT
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)  # Proper string
```

Mistake 4: No Timeout for Long Operations
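```python
# WRONG - hangs forever if the command hangs
result = subprocess.run(cmd)

# RIGHT - subprocess.run kills the child process when the timeout expires
try:
    result = subprocess.run(cmd, timeout=300)
except subprocess.TimeoutExpired:
    print("Command timed out after 5 minutes")
```

Summary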
The pattern for wrapping CLI tools in Python is:
- Use subprocess.run() with list arguments - Never shell=True with user input
- Use argparse for a friendly CLI - Add help text and examples
- Handle glob patterns explicitly - Python doesn’t expand wildcards like bash
- Add progress feedback - Users want to know something is happening
- Handle errors gracefully - Show what failed and why
The result: a memorable command like pdf-compress *.pdf replaces a 100-character command I’d have to look up every time.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨💻 Python subprocess module
- 👨💻 Python argparse tutorial
- 👨💻 Ghostscript documentation
- 👨💻 FFmpeg command line options
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!