How to Read and Modify Existing PDFs in TypeScript
Purpose
This post shows how to read and modify existing PDFs in TypeScript.
Environment
- Node.js 20
- TypeScript 5.3
- pdf-lib 1.17.1
- pdf-parse 1.1.1
The Challenge
I needed to read existing PDF documents, extract text, add annotations, merge files, and fill forms. Most tutorials show only reading OR writing, but I needed both operations on the same file.
I found that different libraries handle different tasks:
- pdf-parse: Extract text from existing PDFs
- pdf-lib: Modify, merge, split, and fill forms
- pdf2json: Convert PDF to JSON format (complex setup)
I’ll use pdf-parse for reading and pdf-lib for modification.
Reading PDFs - Text Extraction
First, install the dependencies:
    npm install pdf-parse pdf-lib
    npm install --save-dev @types/node @types/pdf-parse

Now extract text from a PDF:

    import fs from 'fs/promises'
    import pdf from 'pdf-parse'

    async function extractTextFromPDF(filePath: string): Promise<string> {
      const buffer = await fs.readFile(filePath)
      const data = await pdf(buffer)

      return data.text
    }

    // Usage
    const text = await extractTextFromPDF('existing.pdf')
    console.log(text)

The pdf-parse library returns an object with:
- text: All text content
- numpages: Page count (the property name is all lowercase)
- info: PDF metadata (title, author, etc.)
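To make the shape of that result concrete, here is a minimal sketch that turns it into a one-line summary. The `ParsedPdfLike` interface and the `summarizeParsedPdf` helper are hypothetical names of mine, not part of pdf-parse, and the interface covers only the three fields listed above. I am also assuming that the title, when present, sits under the `Title` key of `info`, and that the page count property is spelled `numpages`.

```typescript
// Hypothetical shape covering only the fields discussed above;
// the real pdf-parse result carries more properties.
interface ParsedPdfLike {
  text: string
  numpages: number
  info: Record<string, unknown>
}

// Build a short human-readable summary of a parsed PDF.
function summarizeParsedPdf(data: ParsedPdfLike): string {
  // Assumption: the document title, if any, is stored under `Title`.
  const rawTitle = data.info['Title']
  const title =
    typeof rawTitle === 'string' && rawTitle.length > 0 ? rawTitle : '(untitled)'

  // Count whitespace-separated tokens; empty strings are dropped.
  const wordCount = data.text.split(/\s+/).filter(Boolean).length

  return `${title}: ${data.numpages} page(s), ${wordCount} word(s)`
}
```

In practice the object returned by `pdf(buffer)` could be passed straight in, since it should satisfy this narrower interface.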
I can also get page-specific text:
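    async function extractTextByPage(filePath: string): Promise<string[]> {
      const buffer = await fs.readFile(filePath)
      const data = await pdf(buffer)

      // Split on form feed (\f) as the page separator.
      // Caveat: as far as I can tell, pdf-parse 1.1.1's default renderer joins
      // pages with blank lines rather than \f, so this may return the whole
      // document as a single element unless a custom `pagerender` option
      // inserts the separator.
      return data.text.split('\f')
    }

Modifying PDFs - Adding Content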
Now I’ll use pdf-lib to modify existing PDFs:
    import { PDFDocument, rgb, StandardFonts } from 'pdf-lib'
    import fs from 'fs/promises'

    async function modifyPDF(inputPath: string, outputPath: string): Promise<void> {
      const pdfBytes = await fs.readFile(inputPath)
      const pdfDoc = await PDFDocument.load(pdfBytes)

      // Get the first page
      const pages = pdfDoc.getPages()
      const firstPage = pages[0]

      // Embed a font
      const font = await pdfDoc.embedFont(StandardFonts.Helvetica)

      // Add text to existing page
      firstPage.drawText('This text was added with TypeScript!', {
        x: 50,
        y: 500,
        size: 24,
        font: font,
        color: rgb(0.95, 0.1, 0.1),
      })

      // Add a rectangle annotation
      firstPage.drawRectangle({
        x: 200,
        y: 300,
        width: 100,
        height: 50,
        borderColor: rgb(0, 0, 0),
        borderWidth: 2,
        color: rgb(0.75, 0.75, 0.75),
      })

      const modifiedPdf = await pdfDoc.save()
      await fs.writeFile(outputPath, modifiedPdf)
    }

    // Usage
    await modifyPDF('input.pdf', 'output.pdf')

I think the key here is the coordinate system. pdf-lib uses the bottom-left corner as (0, 0), so y increases upward. A typical US Letter page is 612 x 792 points.
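Because most layout tools measure from the top of the page, a small conversion helper can save some head-scratching. The sketch below is only an illustration: `topToPdfY` and `LETTER_HEIGHT_PT` are names I made up, and the arithmetic assumes an unrotated page whose origin is the bottom-left corner, as described above.

```typescript
// US Letter height in points, as mentioned above.
const LETTER_HEIGHT_PT = 792

// Convert a distance measured down from the top edge of the page into the
// bottom-left-origin y value. `elementHeight` is the height of the thing
// being placed (for example a rectangle), so that its top edge ends up
// `top` points below the top of the page.
function topToPdfY(pageHeight: number, top: number, elementHeight = 0): number {
  return pageHeight - top - elementHeight
}

// A 50pt-tall rectangle whose top edge sits 100pt below the top of the page
const rectY = topToPdfY(LETTER_HEIGHT_PT, 100, 50) // 642
```

For pages that are not US Letter, the height can be read from `page.getSize().height` instead of the constant. For text, passing the font size as `elementHeight` is only an approximation, since text is positioned by its baseline.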
Merging Multiple PDFs
I needed to combine several PDFs into one document:
    async function mergePDFs(filePaths: string[], outputPath: string): Promise<void> {
      const mergedPdf = await PDFDocument.create()

      for (const filePath of filePaths) {
        const pdfBytes = await fs.readFile(filePath)
        // Named sourcePdf so it does not shadow the pdf-parse import `pdf`
        const sourcePdf = await PDFDocument.load(pdfBytes)
        const copiedPages = await mergedPdf.copyPages(sourcePdf, sourcePdf.getPageIndices())
        copiedPages.forEach((page) => mergedPdf.addPage(page))
      }

      const mergedPdfBytes = await mergedPdf.save()
      await fs.writeFile(outputPath, mergedPdfBytes)
    }

    // Usage
    await mergePDFs(
      ['document1.pdf', 'document2.pdf', 'document3.pdf'],
      'merged.pdf'
    )

This works well for combining reports, invoices, or multi-page documents.
Splitting PDFs
Sometimes I need to split a large PDF into separate files:
    async function splitPDF(inputPath: string, outputDir: string): Promise<void> {
      const pdfBytes = await fs.readFile(inputPath)
      const pdfDoc = await PDFDocument.load(pdfBytes)
      const totalPages = pdfDoc.getPageCount()

      // Make sure the output directory exists before writing into it
      await fs.mkdir(outputDir, { recursive: true })

      for (let i = 0; i < totalPages; i++) {
        const newPdf = await PDFDocument.create()
        const [page] = await newPdf.copyPages(pdfDoc, [i])
        newPdf.addPage(page)

        // Named pageBytes so it does not shadow the outer pdfBytes
        const pageBytes = await newPdf.save()
        await fs.writeFile(`${outputDir}/page-${i + 1}.pdf`, pageBytes)
      }
    }

    // Usage
    await splitPDF('large-document.pdf', './split-pages')

Filling PDF Forms
I think this is the most useful feature - programmatically filling form templates:
    async function fillPDFForm(
      templatePath: string,
      formData: Record<string, string>,
      outputPath: string
    ): Promise<void> {
      const pdfBytes = await fs.readFile(templatePath)
      const pdfDoc = await PDFDocument.load(pdfBytes)
      const form = pdfDoc.getForm()

      // Fill text fields. Note that getTextField throws if the template has
      // no field with that name, or if the field is not a text field.
      Object.entries(formData).forEach(([fieldName, value]) => {
        const field = form.getTextField(fieldName)
        field.setText(value)
      })

      const filledPdf = await pdfDoc.save()
      await fs.writeFile(outputPath, filledPdf)
    }
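Since `getTextField` fails on unknown names, it can help to compare the data against the template before filling anything. The helper below is a sketch under that assumption: `findMissingFields` is a hypothetical name, and it works on plain strings so it does not depend on pdf-lib itself.

```typescript
// Return the keys of `formData` that have no matching field in the template.
// `availableFieldNames` is expected to hold the template's field names.
function findMissingFields(
  formData: Record<string, string>,
  availableFieldNames: string[]
): string[] {
  return Object.keys(formData).filter(
    (name) => availableFieldNames.indexOf(name) === -1
  )
}
```

The list of available names could be built with `form.getFields().map((field) => field.getName())`, and an empty result would mean every key in the data has a field to go into.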
    // Usage
    await fillPDFForm(
      'form-template.pdf',
      {
        name: 'John Doe',
        address: '123 Main St',
        city: 'San Francisco',
        zip: '94102'
      },
      'filled-form.pdf'
    )

I can also extract form data:
    import { PDFCheckBox, PDFTextField } from 'pdf-lib'

    async function extractFormData(filePath: string): Promise<Record<string, string>> {
      const pdfBytes = await fs.readFile(filePath)
      const pdfDoc = await PDFDocument.load(pdfBytes)
      const form = pdfDoc.getForm()
      const fields = form.getFields()
      const data: Record<string, string> = {}

      fields.forEach((field) => {
        const fieldName = field.getName()

        // instanceof narrows the type for TypeScript and, unlike comparing
        // constructor.name, keeps working after minification
        if (field instanceof PDFTextField) {
          // getText() returns undefined for an empty field
          data[fieldName] = field.getText() ?? ''
        } else if (field instanceof PDFCheckBox) {
          data[fieldName] = field.isChecked() ? 'checked' : 'unchecked'
        }
      })

      return data
    }

Page Manipulation
I can also remove or reorder pages:
    async function removePages(
      inputPath: string,
      pagesToRemove: number[],
      outputPath: string
    ): Promise<void> {
      const pdfBytes = await fs.readFile(inputPath)
      const pdfDoc = await PDFDocument.load(pdfBytes)

      // Work on a de-duplicated copy (so the caller's array is not mutated),
      // sorted in descending order to avoid index shifting
      const indices = [...new Set(pagesToRemove)].sort((a, b) => b - a)

      for (const pageIndex of indices) {
        // Skip indices that fall outside the document
        if (pageIndex >= 0 && pageIndex < pdfDoc.getPageCount()) {
          pdfDoc.removePage(pageIndex)
        }
      }

      const modifiedPdf = await pdfDoc.save()
      await fs.writeFile(outputPath, modifiedPdf)
    }
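The function above takes zero-based indices, while people usually talk about pages in one-based numbers. A small conversion step keeps the two from getting mixed up. This is a sketch only: `toRemovalIndices` is a hypothetical helper, not something pdf-lib provides.

```typescript
// Turn human-friendly 1-based page numbers into zero-based indices that are
// valid for a document of `pageCount` pages, without duplicates, and sorted
// in descending order so that removals do not shift the remaining indices.
function toRemovalIndices(pageNumbers: number[], pageCount: number): number[] {
  return pageNumbers
    .map((pageNumber) => pageNumber - 1)
    // Keep whole numbers that fall inside the document
    .filter((index) => Math.floor(index) === index && index >= 0 && index < pageCount)
    // Drop duplicates by keeping only the first occurrence of each index
    .filter((index, position, all) => all.indexOf(index) === position)
    .sort((a, b) => b - a)
}

// Pages 2 and 5 of a six-page document; the duplicate and the
// out-of-range entries are ignored
const removalIndices = toRemovalIndices([2, 5, 2, 9, 0], 6) // [4, 1]
```

The result can be handed to `removePages` as its `pagesToRemove` argument.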
    // Usage - remove pages 2 and 5 (zero-based indices 1 and 4)
    await removePages('document.pdf', [1, 4], 'modified.pdf')

The Reason
I think PDF manipulation in TypeScript works well because:
- Type safety: pdf-lib provides full TypeScript definitions, so I catch errors at compile time
- No external dependencies: These libraries don’t require system-level PDF tools like Ghostscript
- Cross-platform: Works the same on macOS, Linux, and Windows
- Browser support: pdf-lib also works in browser environments, not just Node.js
The main limitation is text extraction - pdf-lib can’t extract text content, which is why I use pdf-parse for reading and pdf-lib for writing.
Summary
In this post, I showed how to read and modify existing PDFs in TypeScript using pdf-parse and pdf-lib. The key point is using the right library for each task: pdf-parse for text extraction, pdf-lib for modifications like merging, splitting, and form filling.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.