How to Read and Modify Existing PDFs in TypeScript

Feb 21, 2026

Cowrie

Dev @ Bswen

Purpose

This post shows how to read and modify existing PDFs in TypeScript.

Environment

Node.js 20
TypeScript 5.3
pdf-lib 1.17.1
pdf-parse 1.1.1

The Challenge

I needed to read existing PDF documents, extract text, add annotations, merge files, and fill forms. Most tutorials show only reading OR writing, but I needed both operations on the same file.

I found that different libraries handle different tasks:

pdf-parse: Extract text from existing PDFs
pdf-lib: Modify, merge, split, and fill forms
pdf2json: Convert PDF to JSON format (complex setup)

I’ll use pdf-parse for reading and pdf-lib for modification.

Reading PDFs - Text Extraction

First, install the dependencies:

npm install pdf-parse pdf-lib
npm install --save-dev @types/node @types/pdf-parse

Now extract text from a PDF:

import fs from 'fs/promises'
import pdf from 'pdf-parse'

async function extractTextFromPDF(filePath: string): Promise<string> {
  const buffer = await fs.readFile(filePath)
  const data = await pdf(buffer)

  return data.text
}

// Usage
const text = await extractTextFromPDF('existing.pdf')
console.log(text)

The pdf-parse library returns an object with:

text: All text content
numpages: Page count (the property name is all lowercase)
info: PDF metadata (title, author, etc.)
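To make those fields concrete, here is a minimal sketch that turns a parse result into a one-line summary. The `ParsedPdf` interface and the `describePdf` name are my own and not part of pdf-parse. The interface only mirrors the fields used here, and it assumes pdf-parse 1.1.1, where the page count is spelled `numpages` and the title, when the PDF sets one, appears as `info.Title`.

```typescript
// Hypothetical shape: only the fields of the pdf-parse result used here
interface ParsedPdf {
  text: string
  numpages: number
  info?: { Title?: string; Author?: string }
}

// Build a short human-readable summary of a parsed PDF
function describePdf(data: ParsedPdf): string {
  const title = data.info?.Title?.trim() || 'Untitled'
  const words = data.text.split(/\s+/).filter(Boolean).length
  return `${title}: ${data.numpages} page(s), ${words} word(s)`
}
```

Passing the result of `await pdf(buffer)` to `describePdf` should work as long as the result matches that shape.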

I can also get page-specific text:

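async function extractTextByPage(filePath: string): Promise<string[]> {
  const buffer = await fs.readFile(filePath)
  const pages: string[] = []
  // pdf-parse joins pages with blank lines (no form feeds), so collect per page
  const pagerender = async (pageData: any): Promise<string> => {
    const content = await pageData.getTextContent()
    const pageText = content.items.map((item: any) => item.str).join(' ')
    pages.push(pageText)
    return pageText
  }
  // Cast: the typings may declare pagerender as synchronous
  await pdf(buffer, { pagerender } as any)
  return pages
}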
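Once the text is available per page, a common next step is finding which pages mention a term. The helper below is a small sketch of that idea. The name `findPagesContaining` is hypothetical, and it assumes the input is one string per page in document order. Matches that span a page break are not detected.

```typescript
// Return the 1-based page numbers whose text contains the search term
function findPagesContaining(pages: string[], term: string): number[] {
  const needle = term.toLowerCase()
  if (needle === '') return []

  const hits: number[] = []
  pages.forEach((pageText, index) => {
    // Case-insensitive substring match on each page
    if (pageText.toLowerCase().includes(needle)) hits.push(index + 1)
  })
  return hits
}
```

The result uses page numbers as a reader would see them, so subtract 1 before passing them to any pdf-lib call that expects zero-based indices.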

Modifying PDFs - Adding Content

Now I’ll use pdf-lib to modify existing PDFs:

import { PDFDocument, rgb, StandardFonts } from 'pdf-lib'
import fs from 'fs/promises'

async function modifyPDF(inputPath: string, outputPath: string): Promise<void> {
  const pdfBytes = await fs.readFile(inputPath)
  const pdfDoc = await PDFDocument.load(pdfBytes)

  // Get the first page
  const pages = pdfDoc.getPages()
  const firstPage = pages[0]

  // Embed a font
  const font = await pdfDoc.embedFont(StandardFonts.Helvetica)

  // Add text to existing page
  firstPage.drawText('This text was added with TypeScript!', {
    x: 50,
    y: 500,
    size: 24,
    font: font,
    color: rgb(0.95, 0.1, 0.1),
  })

  // Draw a rectangle on the page (page content, not a PDF annotation object)
  firstPage.drawRectangle({
    x: 200,
    y: 300,
    width: 100,
    height: 50,
    borderColor: rgb(0, 0, 0),
    borderWidth: 2,
    color: rgb(0.75, 0.75, 0.75),
  })

  const modifiedPdf = await pdfDoc.save()
  await fs.writeFile(outputPath, modifiedPdf)
}

// Usage
await modifyPDF('input.pdf', 'output.pdf')

The key here is the coordinate system. pdf-lib places the origin (0, 0) at the bottom-left corner of the page, so y increases upward. A US Letter page is 612 x 792 points; for other page sizes, page.getSize() returns the actual width and height.
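Because most layouts are measured from the top of the page, I find a small conversion helper useful. This is only a sketch: `SizedPage` and `yFromTop` are names I made up, and the interface is a minimal stand-in that pdf-lib's PDFPage should satisfy through its `getSize()` method.

```typescript
// Hypothetical minimal shape; pdf-lib's PDFPage has a compatible getSize()
interface SizedPage {
  getSize(): { width: number; height: number }
}

// Convert a distance from the top edge into a bottom-up y coordinate.
// boxHeight is the height of the shape being drawn (leave 0 for a text baseline).
function yFromTop(page: SizedPage, distanceFromTop: number, boxHeight = 0): number {
  const { height } = page.getSize()
  return height - distanceFromTop - boxHeight
}
```

For example, `firstPage.drawRectangle({ x: 200, y: yFromTop(firstPage, 50, 50), width: 100, height: 50 })` would place a 50-point-tall rectangle 50 points below the top edge.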

Merging Multiple PDFs

I needed to combine several PDFs into one document:

async function mergePDFs(filePaths: string[], outputPath: string): Promise<void> {
  const mergedPdf = await PDFDocument.create()

  for (const filePath of filePaths) {
    const pdfBytes = await fs.readFile(filePath)
    const pdf = await PDFDocument.load(pdfBytes)
    const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices())
    copiedPages.forEach((page) => mergedPdf.addPage(page))
  }

  const mergedPdfBytes = await mergedPdf.save()
  await fs.writeFile(outputPath, mergedPdfBytes)
}

// Usage
await mergePDFs(
  ['document1.pdf', 'document2.pdf', 'document3.pdf'],
  'merged.pdf'
)

This works well for combining reports, invoices, or multi-page documents.
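The merged document follows the order of the filePaths array, so the order of that array matters. If the paths come from a directory listing (an assumption on my part, since the example above hard-codes them), a plain string sort would put page-10.pdf before page-2.pdf. One possible fix is a numeric-aware sort; `sortPathsNaturally` is a hypothetical helper name.

```typescript
// Return a sorted copy so 'page-2.pdf' comes before 'page-10.pdf'
function sortPathsNaturally(filePaths: string[]): string[] {
  // numeric: true compares digit runs as numbers instead of character by character
  return [...filePaths].sort((a, b) =>
    a.localeCompare(b, 'en', { numeric: true, sensitivity: 'base' })
  )
}
```

The sorted array can then be passed straight to mergePDFs.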

Splitting PDFs

Sometimes I need to split a large PDF into separate files:

async function splitPDF(inputPath: string, outputDir: string): Promise<void> {
  const pdfBytes = await fs.readFile(inputPath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
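  const totalPages = pdfDoc.getPageCount()

  // Create the output directory if it does not exist yet
  await fs.mkdir(outputDir, { recursive: true })

  for (let i = 0; i < totalPages; i++) {
    const newPdf = await PDFDocument.create()
    const [page] = await newPdf.copyPages(pdfDoc, [i])
    newPdf.addPage(page)

    // Distinct name so it does not shadow the input bytes above
    const pageBytes = await newPdf.save()
    await fs.writeFile(`${outputDir}/page-${i + 1}.pdf`, pageBytes)
  }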
  }
}

// Usage
await splitPDF('large-document.pdf', './split-pages')
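Splitting into single pages is not always what I want; sometimes fixed-size chunks are more practical. The sketch below only computes the zero-based page indices for each chunk, which keeps it independent of any library. `chunkPageIndices` is a hypothetical name.

```typescript
// Group page indices 0..pageCount-1 into consecutive chunks of chunkSize
function chunkPageIndices(pageCount: number, chunkSize: number): number[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer')
  }

  const chunks: number[][] = []
  for (let start = 0; start < pageCount; start += chunkSize) {
    // The last chunk may be shorter than chunkSize
    const end = Math.min(start + chunkSize, pageCount)
    chunks.push(Array.from({ length: end - start }, (_, i) => start + i))
  }
  return chunks
}
```

Each chunk can be passed as the second argument to `newPdf.copyPages(pdfDoc, chunk)`, followed by one `addPage` call per copied page.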

Filling PDF Forms

I think this is the most useful feature - programmatically filling form templates:

async function fillPDFForm(
  templatePath: string,
  formData: Record<string, string>,
  outputPath: string
): Promise<void> {
  const pdfBytes = await fs.readFile(templatePath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  const form = pdfDoc.getForm()

  // Fill text fields
  Object.entries(formData).forEach(([fieldName, value]) => {
    const field = form.getTextField(fieldName)
    field.setText(value)
  })

  const filledPdf = await pdfDoc.save()
  await fs.writeFile(outputPath, filledPdf)
}

// Usage
await fillPDFForm(
  'form-template.pdf',
  {
    name: 'John Doe',
    email: 'john.doe@example.com',
    address: '123 Main St',
    city: 'San Francisco',
    zip: '94102'
  },
  'filled-form.pdf'
)
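One thing to watch for: form.getTextField throws when the template has no field with the given name, so a single typo in formData aborts the whole fill. A more forgiving variant, sketched below, fills the fields that exist and reports the rest. `FormLike`, `TextFieldLike`, and `fillKnownTextFields` are hypothetical names; the interfaces are minimal stand-ins that pdf-lib's PDFForm should satisfy structurally. A field that exists but is not a text field would still throw.

```typescript
// Hypothetical minimal shapes; pdf-lib's PDFForm should be structurally compatible
interface TextFieldLike {
  setText(text: string): void
}
interface FormLike {
  getFields(): { getName(): string }[]
  getTextField(name: string): TextFieldLike
}

// Fill the fields that exist and return the names that were not found
function fillKnownTextFields(form: FormLike, formData: Record<string, string>): string[] {
  const known = new Set(form.getFields().map((field) => field.getName()))
  const missing: string[] = []

  for (const [fieldName, value] of Object.entries(formData)) {
    if (known.has(fieldName)) {
      form.getTextField(fieldName).setText(value)
    } else {
      missing.push(fieldName)
    }
  }
  return missing
}
```

Calling `fillKnownTextFields(pdfDoc.getForm(), formData)` and logging the returned names is a quick way to spot mismatches between the data and the template.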

I can also extract form data:

// Additional pdf-lib imports for the type checks below
import { PDFCheckBox, PDFTextField } from 'pdf-lib'

async function extractFormData(filePath: string): Promise<Record<string, string>> {
  const pdfBytes = await fs.readFile(filePath)
  const pdfDoc = await PDFDocument.load(pdfBytes)
  const form = pdfDoc.getForm()
  const fields = form.getFields()
  const data: Record<string, string> = {}

  fields.forEach(field => {
    const fieldName = field.getName()

    // instanceof narrows the type, so getText() and isChecked() type-check
    if (field instanceof PDFTextField) {
      // getText() returns undefined for an empty field
      data[fieldName] = field.getText() ?? ''
    } else if (field instanceof PDFCheckBox) {
      data[fieldName] = field.isChecked() ? 'checked' : 'unchecked'
    }
  })

  return data
}

Page Manipulation

I can also remove or reorder pages:

async function removePages(
  inputPath: string,
  pagesToRemove: number[],
  outputPath: string
): Promise<void> {
  const pdfBytes = await fs.readFile(inputPath)
  const pdfDoc = await PDFDocument.load(pdfBytes)

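  // De-duplicate and sort a copy in descending order to avoid index shifting
  // (copying also leaves the caller's array untouched)
  const indices = Array.from(new Set(pagesToRemove)).sort((a, b) => b - a)
  const pageCount = pdfDoc.getPageCount()

  indices.forEach(pageIndex => {
    if (pageIndex >= 0 && pageIndex < pageCount) {
      pdfDoc.removePage(pageIndex)
    }
  })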

  const modifiedPdf = await pdfDoc.save()
  await fs.writeFile(outputPath, modifiedPdf)
}

// Usage - remove pages 2 and 5 (indices are zero-based)
await removePages('document.pdf', [1, 4], 'modified.pdf')
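For reordering, I am not aware of a single pdf-lib call that moves a page, so one approach is to compute the new page order as an array of zero-based indices and copy the pages into a fresh document in that order. The sketch below covers only the index computation; `movePage` is a hypothetical helper name.

```typescript
// Return the page order (zero-based indices) after moving one page
function movePage(pageCount: number, from: number, to: number): number[] {
  for (const index of [from, to]) {
    if (!Number.isInteger(index) || index < 0 || index >= pageCount) {
      throw new RangeError(`page index ${index} is out of range`)
    }
  }

  // Start with every page except the one being moved...
  const order = Array.from({ length: pageCount }, (_, i) => i).filter((i) => i !== from)
  // ...then insert it at its new position
  order.splice(to, 0, from)
  return order
}
```

The resulting array can be passed to `newPdf.copyPages(pdfDoc, order)`, with each copied page added through `addPage`, the same pattern used in the merge function.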

The Reason

I think PDF manipulation in TypeScript works well because:

Type safety: pdf-lib ships its own TypeScript definitions, so I catch many errors at compile time
No system dependencies: These libraries don’t require system-level PDF tools like Ghostscript
Cross-platform: Works the same on macOS, Linux, and Windows
Browser support: pdf-lib also works in browser environments, not just Node.js

The main limitation is text extraction - pdf-lib can’t extract text content, which is why I use pdf-parse for reading and pdf-lib for writing.

Summary

In this post, I showed how to read and modify existing PDFs in TypeScript using pdf-parse and pdf-lib. The key point is using the right library for each task: pdf-parse for text extraction, pdf-lib for modifications like merging, splitting, and form filling.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 pdf-lib Documentation
👨‍💻 pdf-parse GitHub

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!