How to Handle Images When Converting Word Documents to HTML with Apache POI

Mar 26, 2026

I was converting Word documents to HTML using Apache POI when I noticed something troubling. The generated HTML showed broken image placeholders everywhere. The text converted perfectly, but all the embedded images were missing.

After hours of debugging, I realized the conversion wasn’t automatically extracting images from the Word documents. I needed to handle image extraction explicitly. Here’s what I learned.

The Problem with Default Conversion

When I first tried converting a docx file to HTML, I used a simple approach:

XWPFDocument document = new XWPFDocument(new FileInputStream("document.docx"));
XHTMLOptions options = XHTMLOptions.create();
XHTMLConverter.getInstance()
    .convert(document, new FileOutputStream("output.html"), options);

The HTML was generated correctly, but every <img> tag had an empty or broken src attribute. The images embedded in the Word document weren’t being extracted to the filesystem.

This makes sense in hindsight. The converter can’t assume where you want images stored or what URL structure you want to use. You need to tell it explicitly.

Solution for DOCX Files Using XDocReport

For modern .docx files, XDocReport provides an ImageManager class that handles image extraction automatically. Here’s the proper approach:

import fr.opensagres.poi.xwpf.converter.core.ImageManager;
import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.*;
import java.nio.file.*;

// First, create the images directory
Path imagesDir = Paths.get("output/images");
Files.createDirectories(imagesDir);

// Load the document and open the output file; try-with-resources closes everything
try (InputStream in = new FileInputStream("document.docx");
     XWPFDocument document = new XWPFDocument(in);
     OutputStream out = new FileOutputStream("output/index.html")) {

    // Create options with ImageManager
    XHTMLOptions options = XHTMLOptions.create();
    options.setImageManager(new ImageManager(new File("output"), "images"));

    // Convert
    XHTMLConverter.getInstance().convert(document, out, options);
}

The ImageManager constructor takes two parameters:

The base output directory where your HTML file will be saved
The subdirectory name for images (relative to the base directory)

After conversion, your output structure looks like this:

output/
├── index.html
└── images/
    ├── image1.png
    ├── image2.jpg
    └── image3.png

The generated HTML contains correct relative paths:

<img src="images/image1.png" alt="embedded image">

Understanding the ImageManager Parameters

I initially confused the parameters and got broken paths. Let me clarify:

// CORRECT: Base dir is where HTML is saved, images folder is relative
XHTMLOptions options = XHTMLOptions.create();
options.setImageManager(new ImageManager(
    new File("output"),     // Base directory (where index.html is saved)
    "images"                // Subfolder name for images
));

// WRONG: The second argument must be a subfolder name, not an absolute path
options.setImageManager(new ImageManager(
    new File("/absolute/path/to/output"),
    "/absolute/path/to/images"  // This creates broken relative paths!
));

The key insight is that ImageManager generates relative paths. The src attribute in HTML will be images/filename.ext, which is relative to where your HTML file is saved.

Alternative: Base64 Embedded Images

If you prefer self-contained HTML without external image files, use Base64EmbedImgManager:

XHTMLOptions options = XHTMLOptions.create();
options.setImageManager(new Base64EmbedImgManager());

XHTMLConverter.getInstance()
    .convert(document, new FileOutputStream("output.html"), options);

This embeds images directly in HTML as base64 data URLs. The HTML file becomes larger but has no external dependencies.
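To make the idea of a base64 data URL concrete, here is a minimal standard-library sketch of how such a URL is assembled. It is not XDocReport's code; the class name DataUrlSketch and the method toDataUrl are hypothetical names I chose for illustration. Base64 encodes every 3 bytes as 4 characters, so expect embedded images to take roughly a third more space than the original files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only (not part of XDocReport or POI).
final class DataUrlSketch {

    /** Builds a data URL of the form data:<mime>;base64,<encoded bytes>. */
    static String toDataUrl(String mimeType, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would pass the actual image content.
        byte[] fakeImage = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println("<img src=\"" + toDataUrl("image/png", fakeImage) + "\">");
    }
}
```

The resulting string goes straight into the src attribute, which is why the HTML needs no image files next to it.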

Solution for Legacy DOC Files

For older .doc files (not .docx), you need a different approach using HWPF and WordToHtmlConverter:

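import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
import java.nio.file.*;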

// Create images directory first
Path imagesDir = Paths.get("output/images");
Files.createDirectories(imagesDir);

// Load the document
HWPFDocument document = new HWPFDocument(new FileInputStream("document.doc"));

// Create the converter
WordToHtmlConverter converter = new WordToHtmlConverter(
    DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .newDocument()
);

// Set custom PicturesManager to handle image extraction
converter.setPicturesManager((content, pictureType, suggestedName, width, height) -> {
    // Store the image under images/ using the name POI suggests
    String imageName = "images/" + suggestedName;

    // Write image to disk
    try {
        Path imagePath = Paths.get("output", imageName);
        Files.write(imagePath, content);
    } catch (IOException e) {
        throw new RuntimeException("Failed to write image: " + imageName, e);
    }

    // Return the relative path for the HTML src attribute
    return imageName;
});

// Convert
converter.processDocument(document);

// Save HTML output
Document htmlDocument = converter.getDocument();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// try-with-resources closes the output file even if the transform fails
try (OutputStream out = new FileOutputStream("output/index.html")) {
    transformer.transform(new DOMSource(htmlDocument), new StreamResult(out));
}

Understanding the PicturesManager Interface

The PicturesManager is a functional interface with one method:

public interface PicturesManager {
    String savePicture(
        byte[] content,            // Raw image bytes
        PictureType pictureType,   // Enum value: PNG, JPEG, etc.
        String suggestedName,      // Suggested file name
        float widthInches,         // Image width in inches
        float heightInches         // Image height in inches
    );
}

Your implementation must:

Save the image bytes to disk
Return the path (relative to HTML file) to use in the src attribute

Here’s a more robust implementation:

converter.setPicturesManager((content, pictureType, suggestedName, width, height) -> {
    // Determine file extension based on picture type.
    // PictureType is an enum, so the case labels use unqualified constant names.
    String extension = switch (pictureType) {
        case PNG -> ".png";
        case JPEG -> ".jpg";
        case BMP -> ".bmp";
        case WMF -> ".wmf";
        default -> ".bin";
    };

    // Sanitize the suggested name; fall back to a random one if it is missing
    String baseName = suggestedName != null ?
        suggestedName.replaceAll("[^a-zA-Z0-9.-]", "_") :
        "image_" + java.util.UUID.randomUUID();
    // The suggested name usually carries an extension already; add one only if it has none
    String fileName = baseName.contains(".") ? baseName : baseName + extension;
    String relativePath = "images/" + fileName;

    try {
        // Ensure directory exists
        Path imagePath = Paths.get("output/images", fileName);
        Files.createDirectories(imagePath.getParent());

        // Write image
        Files.write(imagePath, content);
    } catch (IOException e) {
        // savePicture declares no checked exceptions, so wrap the failure
        throw new UncheckedIOException("Failed to write image: " + fileName, e);
    }

    return relativePath;
});
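One thing the sanitizing step can introduce is name collisions: two different suggested names may map to the same sanitized name. The following is a small standard-library sketch of one way to guard against that, assuming you keep a single instance around for the whole conversion. UniqueImageNames and next are hypothetical names of mine, not part of POI.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only (not part of Apache POI).
final class UniqueImageNames {
    private final Set<String> used = new HashSet<>();

    /** Sanitizes the suggested name and appends a counter if that name was already handed out. */
    String next(String suggestedName) {
        String safe = (suggestedName == null || suggestedName.isEmpty())
            ? "image"
            : suggestedName.replaceAll("[^a-zA-Z0-9._-]", "_");

        // Split "name.ext" so the counter goes before the extension.
        int dot = safe.lastIndexOf('.');
        String stem = dot > 0 ? safe.substring(0, dot) : safe;
        String ext = dot > 0 ? safe.substring(dot) : "";

        String candidate = safe;
        // Set.add returns false when the name is taken, so keep trying with a counter.
        for (int i = 1; !used.add(candidate); i++) {
            candidate = stem + "_" + i + ext;
        }
        return candidate;
    }

    public static void main(String[] args) {
        UniqueImageNames names = new UniqueImageNames();
        System.out.println(names.next("my chart.png")); // my_chart.png
        System.out.println(names.next("my chart.png")); // my_chart_1.png
    }
}
```

Inside a PicturesManager lambda you would call next(suggestedName) instead of sanitizing inline, then write the bytes under the returned name.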

Common Mistakes I Made

Mistake 1: Not Creating the Images Directory

// WRONG: Directory doesn't exist yet
options.setImageManager(new ImageManager(new File("output"), "images"));
// Conversion can fail with a "directory not found" error

// CORRECT: Create directory first
Files.createDirectories(Paths.get("output/images"));
options.setImageManager(new ImageManager(new File("output"), "images"));

Don’t rely on the ImageManager to create directories for you. Creating the target directory before conversion is cheap and removes one source of failure.

Mistake 2: Using Absolute Paths

// WRONG: Returns absolute path, breaks when HTML is moved
return "/Users/me/project/output/images/image1.png";

// CORRECT: Returns relative path, works anywhere
return "images/image1.png";

The returned path from PicturesManager goes directly into the HTML src attribute. Use relative paths for portability.
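If your code only has absolute paths at hand, you can derive the relative src value instead of hard-coding it. This is a minimal sketch using java.nio.file; RelativeSrc and srcFor are hypothetical names for illustration, and it assumes the image lives on the same filesystem root as the HTML file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class RelativeSrc {

    /** Expresses imageFile relative to the folder that contains htmlFile, using '/' separators. */
    static String srcFor(Path htmlFile, Path imageFile) {
        Path htmlDir = htmlFile.toAbsolutePath().getParent();
        Path relative = htmlDir.relativize(imageFile.toAbsolutePath());
        // URLs always use forward slashes, even when the OS path separator is '\'.
        return relative.toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        Path html = Paths.get("output", "index.html");
        Path image = Paths.get("output", "images", "image1.png");
        System.out.println(srcFor(html, image)); // images/image1.png
    }
}
```

Returning this value from your PicturesManager keeps the HTML working when the whole output folder is moved.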

Mistake 3: Not Closing Streams

// WRONG: Stream never closed
FileOutputStream fos = new FileOutputStream(imagePath);
fos.write(content);
// fos.close() never called!

// CORRECT: Use try-with-resources
try (FileOutputStream fos = new FileOutputStream(imagePath)) {
    fos.write(content);
}

Or better, use Files.write(), which opens the file, writes the bytes, and closes it in a single call:

Files.write(imagePath, content);

Complete Working Example for DOCX

Here’s a complete, ready-to-use solution for docx files:

import fr.opensagres.poi.xwpf.converter.core.ImageManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.*;
import java.nio.file.*;

// Named DocxToHtmlConverter to avoid confusion with POI's own WordToHtmlConverter class
public class DocxToHtmlConverter {

    public static void convert(Path docxPath, Path outputDir) throws IOException {
        // Create output directories
        Files.createDirectories(outputDir);
        Files.createDirectories(outputDir.resolve("images"));

        // Load document
        try (InputStream is = Files.newInputStream(docxPath);
             XWPFDocument document = new XWPFDocument(is)) {

            // Configure HTML options
            XHTMLOptions options = XHTMLOptions.create();
            options.setImageManager(new ImageManager(
                outputDir.toFile(),
                "images"
            ));

            // Convert
            Path htmlPath = outputDir.resolve("index.html");
            try (OutputStream os = Files.newOutputStream(htmlPath)) {
                XHTMLConverter.getInstance().convert(document, os, options);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        convert(
            Paths.get("input.docx"),
            Paths.get("output")
        );
    }
}
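Since the original symptom was broken image placeholders, it can help to check the generated HTML after conversion. Below is a rough standard-library sketch that lists every img src that doesn't point to an existing file next to the HTML. BrokenImageCheck and its methods are hypothetical names, and the regular expression is deliberately simple (it only handles double-quoted src attributes), so treat it as a smoke test rather than a real HTML parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class BrokenImageCheck {
    // Matches <img ... src="..."> and captures the src value.
    private static final Pattern IMG_SRC =
        Pattern.compile("<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns every img src value found in the HTML text, in document order. */
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    /** Returns the src values that are empty or do not resolve to a file relative to htmlFile. */
    static List<String> missingImages(Path htmlFile) {
        try {
            String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
            Path htmlDir = htmlFile.toAbsolutePath().getParent();
            List<String> missing = new ArrayList<>();
            for (String src : imageSources(html)) {
                // Embedded data URLs and remote URLs have nothing to check on disk.
                if (src.startsWith("data:") || src.contains("://")) {
                    continue;
                }
                if (src.isEmpty() || !Files.isRegularFile(htmlDir.resolve(src))) {
                    missing.add(src);
                }
            }
            return missing;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling BrokenImageCheck.missingImages(Paths.get("output/index.html")) right after the conversion should return an empty list when every image was extracted.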

Dependencies Required

For docx conversion with XDocReport:

<dependencies>
    <!-- Apache POI for Word documents -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.5</version>
    </dependency>

    <!-- XDocReport for docx to HTML conversion -->
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
        <version>2.0.4</version>
    </dependency>
</dependencies>

For legacy doc conversion:

<dependencies>
    <!-- Apache POI HWPF for .doc files -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>5.2.5</version>
    </dependency>

    <!-- Only needed if you also convert .docx files; WordToHtmlConverter itself ships in poi-scratchpad -->
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>5.2.5</version>
    </dependency>
</dependencies>

When to Use Each Approach

Use XDocReport with ImageManager for:

Modern .docx files
Automatic image extraction
Clean, maintainable code

Use PicturesManager with WordToHtmlConverter for:

Legacy .doc files
Full control over image processing
Custom naming or transformation logic
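If one application has to handle both formats, the choice above can be made per file. Here is a minimal sketch that picks the route from the file extension. WordFormat, Kind, and detect are hypothetical names for illustration, and the extension is only a hint: a renamed file will still fail when the wrong library tries to open it.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class WordFormat {
    enum Kind { DOCX, DOC, UNSUPPORTED }

    /** Picks a conversion route from the file extension (case-insensitive). */
    static Kind detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) return Kind.DOCX; // XDocReport with ImageManager
        if (lower.endsWith(".doc")) return Kind.DOC;   // WordToHtmlConverter with PicturesManager
        return Kind.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println(detect("report.DOCX")); // DOCX
        System.out.println(detect("legacy.doc"));  // DOC
        System.out.println(detect("notes.txt"));   // UNSUPPORTED
    }
}
```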

Final Words + More Resources

My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to get in touch, you can contact me by email.

Here are the most important links from this article, along with some further resources on this topic:

👨‍💻 Apache POI Documentation
👨‍💻 XDocReport Documentation


The key takeaway: Apache POI doesn’t automatically extract images when converting to HTML. You must use ImageManager for docx files or implement PicturesManager for doc files. Always create the images directory before conversion, and use relative paths for portability.