
How to Read and Modify Existing PDFs in TypeScript

Purpose

This post shows how to extract text from existing PDFs and how to modify, merge, split, and fill them in TypeScript.

Environment

  • Node.js 20
  • TypeScript 5.3
  • pdf-lib 1.17.1
  • pdf-parse 1.1.1

The Challenge

I needed to read existing PDF documents, extract text, add annotations, merge files, and fill forms. Most tutorials show only reading OR writing, but I needed both operations on the same file.

I found that different libraries handle different tasks:

  • pdf-parse: Extract text from existing PDFs
  • pdf-lib: Modify, merge, split, and fill forms
  • pdf2json: Convert PDF to JSON format (complex setup)

I’ll use pdf-parse for reading and pdf-lib for modification.

Reading PDFs - Text Extraction

First, install the dependencies:

Terminal window
npm install pdf-parse pdf-lib
npm install --save-dev typescript @types/node @types/pdf-parse

Now extract text from a PDF:

extract-text.ts
import fs from 'fs/promises'
import pdf from 'pdf-parse'

async function extractTextFromPDF(filePath: string): Promise<string> {
  const buffer = await fs.readFile(filePath)
  const data = await pdf(buffer)
  return data.text
}

// Usage
const text = await extractTextFromPDF('existing.pdf')
console.log(text)

The pdf-parse library returns an object with:

  • text: All text content
  • numpages: Page count (the property name is all lowercase)
  • info: PDF metadata (title, author, etc.)
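
To make those fields concrete, here is a minimal sketch that builds a one-line summary from a parsed result. The `ParsedPdfLike` type and the `summarizeParsedPdf` function are hypothetical names invented for this example. The type only mirrors the three fields listed above (pdf-parse 1.1.1 spells the page count `numpages`), and I'm assuming `info.Title` is present only when the PDF actually sets a title.

```typescript
// Hypothetical shape that mirrors the pdf-parse result fields (an assumption for this sketch)
interface ParsedPdfLike {
  text: string
  numpages: number
  info?: { Title?: string; Author?: string }
}

// Build a short human-readable summary from a parsed result
function summarizeParsedPdf(data: ParsedPdfLike): string {
  // Fall back to a placeholder when the PDF has no title metadata
  const title = data.info?.Title?.trim() || 'untitled'
  // Count whitespace-separated tokens, ignoring empty strings
  const wordCount = data.text.split(/\s+/).filter(Boolean).length
  return `${title}: ${data.numpages} page(s), ${wordCount} word(s)`
}
```

The object returned by `pdf(buffer)` could be passed to this function directly, since it carries the same three fields.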

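I can also get page-specific text. pdf-parse separates pages with blank lines rather than form feeds, so splitting the combined text isn't reliable. Instead, I capture each page in a custom pagerender callback, which pdf-parse calls once per page, in order:

extract-text-pages.ts
import fs from 'fs/promises'
import pdf from 'pdf-parse'

async function extractTextByPage(filePath: string): Promise<string[]> {
  const buffer = await fs.readFile(filePath)
  const pages: string[] = []
  const pagerender = async (pageData: any): Promise<string> => {
    const content = await pageData.getTextContent({ normalizeWhitespace: false, disableCombineTextItems: false })
    pages.push(content.items.map((item: any) => item.str).join(' '))
    return pages[pages.length - 1]
  }
  await pdf(buffer, { pagerender: pagerender as any }) // cast in case the typings expect a sync callback
  return pages
}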

Modifying PDFs - Adding Content

Now I’ll use pdf-lib to modify existing PDFs:

modify-pdf.ts
import { PDFDocument, rgb, StandardFonts } from 'pdf-lib'
import fs from 'fs/promises'

async function modifyPDF(inputPath: string, outputPath: string): Promise<void> {
  const pdfBytes = await fs.readFile(inputPath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  // Get the first page
  const pages = pdfDoc.getPages()
  const firstPage = pages[0]
  // Embed a font
  const font = await pdfDoc.embedFont(StandardFonts.Helvetica)
  // Add text to the existing page
  firstPage.drawText('This text was added with TypeScript!', {
    x: 50,
    y: 500,
    size: 24,
    font,
    color: rgb(0.95, 0.1, 0.1),
  })
  // Draw a filled rectangle with a border (page content, not a PDF annotation object)
  firstPage.drawRectangle({
    x: 200,
    y: 300,
    width: 100,
    height: 50,
    borderColor: rgb(0, 0, 0),
    borderWidth: 2,
    color: rgb(0.75, 0.75, 0.75),
  })
  const modifiedPdf = await pdfDoc.save()
  await fs.writeFile(outputPath, modifiedPdf)
}

// Usage
await modifyPDF('input.pdf', 'output.pdf')

The key here is the coordinate system. pdf-lib uses the bottom-left corner as (0, 0), so y increases upward. A US Letter page is 612 x 792 points, so the y: 500 used above lands in the upper half of the page.
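
Because most layout tools measure from the top of the page, a small conversion helper can save some mental arithmetic. This is only a sketch: `yFromTop` and `US_LETTER` are hypothetical names I'm introducing here, and for text the result is approximate, since the y passed to drawText is the text baseline rather than the top of the glyphs.

```typescript
// US Letter size in points, as mentioned above
const US_LETTER = { width: 612, height: 792 }

// Convert a distance measured from the top edge of the page into a
// bottom-left-origin y coordinate. elementHeight is the height of the
// thing being placed (for example a rectangle height or a font size).
function yFromTop(pageHeight: number, distanceFromTop: number, elementHeight = 0): number {
  return pageHeight - distanceFromTop - elementHeight
}

// A 50-point-tall rectangle whose top edge sits 1 inch (72 points) below the page top
const rectY = yFromTop(US_LETTER.height, 72, 50)
console.log(rectY) // 670
```

For pages that aren't US Letter, the real height can be read from the page itself with `page.getSize()`, which returns the width and height in points.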

Merging Multiple PDFs

I needed to combine several PDFs into one document:

merge-pdfs.ts
import { PDFDocument } from 'pdf-lib'
import fs from 'fs/promises'

async function mergePDFs(filePaths: string[], outputPath: string): Promise<void> {
  const mergedPdf = await PDFDocument.create()
  for (const filePath of filePaths) {
    const pdfBytes = await fs.readFile(filePath)
    const pdf = await PDFDocument.load(pdfBytes)
    // Pages must be copied into the target document before they can be added
    const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices())
    copiedPages.forEach((page) => mergedPdf.addPage(page))
  }
  const mergedPdfBytes = await mergedPdf.save()
  await fs.writeFile(outputPath, mergedPdfBytes)
}

// Usage
await mergePDFs(
  ['document1.pdf', 'document2.pdf', 'document3.pdf'],
  'merged.pdf'
)

This works well for combining reports, invoices, or multi-page documents.

Splitting PDFs

Sometimes I need to split a large PDF into separate files:

split-pdf.ts
import { PDFDocument } from 'pdf-lib'
import fs from 'fs/promises'
import path from 'path'

async function splitPDF(inputPath: string, outputDir: string): Promise<void> {
  const pdfBytes = await fs.readFile(inputPath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  const totalPages = pdfDoc.getPageCount()
  // Create the output directory if it doesn't exist yet
  await fs.mkdir(outputDir, { recursive: true })
  for (let i = 0; i < totalPages; i++) {
    // Each page becomes its own single-page document
    const newPdf = await PDFDocument.create()
    const [page] = await newPdf.copyPages(pdfDoc, [i])
    newPdf.addPage(page)
    const pageBytes = await newPdf.save()
    await fs.writeFile(path.join(outputDir, `page-${i + 1}.pdf`), pageBytes)
  }
}

// Usage
await splitPDF('large-document.pdf', './split-pages')
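
One file per page isn't always what I want. As a sketch of a variation, the helper below groups page indices into chunks of a fixed size. `chunkPageIndices` is a hypothetical name for this example, and the chunk size is whatever suits the documents at hand.

```typescript
// Group zero-based page indices into consecutive chunks of pagesPerFile pages.
// The last chunk may be shorter when the page count doesn't divide evenly.
function chunkPageIndices(totalPages: number, pagesPerFile: number): number[][] {
  if (!Number.isInteger(pagesPerFile) || pagesPerFile < 1) {
    throw new RangeError('pagesPerFile must be a positive integer')
  }
  const chunks: number[][] = []
  for (let start = 0; start < totalPages; start += pagesPerFile) {
    const end = Math.min(start + pagesPerFile, totalPages)
    // Build [start, start + 1, ..., end - 1]
    chunks.push(Array.from({ length: end - start }, (_, i) => start + i))
  }
  return chunks
}

console.log(chunkPageIndices(5, 2)) // [ [ 0, 1 ], [ 2, 3 ], [ 4 ] ]
```

Each chunk can be passed as the second argument to `copyPages`, so a loop over the chunks would write one multi-page file per chunk instead of one file per page.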

Filling PDF Forms

I think this is the most useful feature - programmatically filling form templates:

fill-form.ts
import { PDFDocument } from 'pdf-lib'
import fs from 'fs/promises'

async function fillPDFForm(
  templatePath: string,
  formData: Record<string, string>,
  outputPath: string
): Promise<void> {
  const pdfBytes = await fs.readFile(templatePath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  const form = pdfDoc.getForm()
  // Fill text fields; getTextField throws if the name doesn't match a text field
  Object.entries(formData).forEach(([fieldName, value]) => {
    const field = form.getTextField(fieldName)
    field.setText(value)
  })
  const filledPdf = await pdfDoc.save()
  await fs.writeFile(outputPath, filledPdf)
}

// Usage
await fillPDFForm(
  'form-template.pdf',
  {
    name: 'John Doe',
    address: '123 Main St',
    city: 'San Francisco',
    zip: '94102'
  },
  'filled-form.pdf'
)
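
Since `getTextField` throws when a field name doesn't exist in the template, it can help to check the keys before filling anything. The helper below is a minimal sketch with a hypothetical name, `findUnknownFields`. It works on plain string arrays so it doesn't depend on any particular template.

```typescript
// Return the keys in formData that don't match any field name in the template
function findUnknownFields(
  availableFieldNames: string[],
  formData: Record<string, string>
): string[] {
  const available = new Set(availableFieldNames)
  return Object.keys(formData).filter((name) => !available.has(name))
}

// Example with made-up field names
const unknown = findUnknownFields(
  ['name', 'address', 'city'],
  { name: 'John Doe', zip: '94102' }
)
console.log(unknown) // [ 'zip' ]
```

The list of available names can come from `form.getFields().map((field) => field.getName())`, which also makes it easy to print the names when a template's fields aren't documented.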

I can also extract form data:

extract-form.ts
import { PDFDocument, PDFTextField, PDFCheckBox } from 'pdf-lib'
import fs from 'fs/promises'

async function extractFormData(filePath: string): Promise<Record<string, string>> {
  const pdfBytes = await fs.readFile(filePath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  const form = pdfDoc.getForm()
  const data: Record<string, string> = {}
  for (const field of form.getFields()) {
    const fieldName = field.getName()
    // instanceof narrows the type, so getText() and isChecked() type-check,
    // and it keeps working when class names are minified
    if (field instanceof PDFTextField) {
      data[fieldName] = field.getText() ?? '' // getText() returns undefined for empty fields
    } else if (field instanceof PDFCheckBox) {
      data[fieldName] = field.isChecked() ? 'checked' : 'unchecked'
    }
  }
  return data
}

Page Manipulation

I can also remove pages by their zero-based index:

manipulate-pages.ts
import { PDFDocument } from 'pdf-lib'
import fs from 'fs/promises'

async function removePages(
  inputPath: string,
  pagesToRemove: number[],
  outputPath: string
): Promise<void> {
  const pdfBytes = await fs.readFile(inputPath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  const pageCount = pdfDoc.getPageCount()
  // Work on a deduplicated copy, sorted in descending order so that
  // removing one page doesn't shift the indices still to be removed
  const indices = Array.from(new Set(pagesToRemove)).sort((a, b) => b - a)
  for (const pageIndex of indices) {
    // Skip indices that fall outside the document
    if (Number.isInteger(pageIndex) && pageIndex >= 0 && pageIndex < pageCount) {
      pdfDoc.removePage(pageIndex)
    }
  }
  const modifiedPdf = await pdfDoc.save()
  await fs.writeFile(outputPath, modifiedPdf)
}

// Usage - remove pages 2 and 5 (zero-based indices 1 and 4)
await removePages('document.pdf', [1, 4], 'modified.pdf')
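
The usage above passes zero-based indices for "pages 2 and 5", which is easy to get wrong by one. A small conversion step, sketched below, lets callers think in page numbers instead. `toRemovalIndices` is a hypothetical helper name, and dropping out-of-range numbers silently is a design choice for this example; throwing would be just as reasonable.

```typescript
// Convert 1-based page numbers into zero-based indices that are
// deduplicated, limited to the document's range, and sorted descending
function toRemovalIndices(pageNumbers: number[], pageCount: number): number[] {
  const indices = pageNumbers
    .map((pageNumber) => pageNumber - 1)
    .filter((index) => Number.isInteger(index) && index >= 0 && index < pageCount)
  return Array.from(new Set(indices)).sort((a, b) => b - a)
}

// Pages 2 and 5 of a 6-page document; the duplicate and the out-of-range 9 are dropped
console.log(toRemovalIndices([2, 5, 2, 9], 6)) // [ 4, 1 ]
```

The result is already in the descending order that page removal needs, so each index can be handed to `pdfDoc.removePage` one after another.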

The Reason

I think PDF manipulation in TypeScript works well because:

  1. Type safety: pdf-lib provides full TypeScript definitions, so I catch errors at compile time
  2. No system-level dependencies: Both libraries are plain npm packages and don’t require external PDF tools like Ghostscript
  3. Cross-platform: Works the same on macOS, Linux, and Windows
  4. Browser support: pdf-lib also works in browser environments, not just Node.js

The main limitation is text extraction - pdf-lib can’t extract text content, which is why I use pdf-parse for reading and pdf-lib for writing.

Summary

In this post, I showed how to read and modify existing PDFs in TypeScript using pdf-parse and pdf-lib. The key point is using the right library for each task: pdf-parse for text extraction, pdf-lib for modifications like merging, splitting, and form filling.
