How to Handle Large Word Documents When Converting to HTML in Java

Mar 26, 2026

The Problem

I was building a service that converts Word documents (.docx) to HTML. Everything worked fine with small files, but when I tried to convert a 50MB Word document, I got this:

java.lang.OutOfMemoryError: Java heap space
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.emit(Saver.java:2045)
    at org.apache.xmlbeans.impl.store.Saver$TextSaver.preContent(Saver.java:1038)
    at org.apache.xmlbeans.impl.store.Saver.process(Saver.java:456)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.write(XWPFDocument.java:582)

Even worse, when I tried to process multiple large documents in a web application, the server became unresponsive:

INFO  [http-nio-8080-exec-1] c.e.d.DocumentController : Starting conversion for document.docx
INFO  [http-nio-8080-exec-2] c.e.d.DocumentController : Starting conversion for report.docx
... (30 seconds of silence)
ERROR [http-nio-8080-exec-1] o.a.c.c.C.[.[localhost].[/].[dispatcherServlet] : Servlet.service() threw exception
java.util.concurrent.TimeoutException: Request timed out after 30000ms

The HTTP request thread was blocked, waiting for the conversion to complete, and eventually timed out.

My Environment

Java 17
Apache POI 5.2.3
Spring Boot 3.1.0
Maven 3.9.0

The Initial Code

Here was my original document conversion code:

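import java.io.FileInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.file.Path;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.stereotype.Service;
// XHTMLConverter and friends come from the XDocReport converter, not from POI itself
// (package names assume XDocReport 2.x)
import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

@Service
public class DocumentConverter {

    public String convertToHtml(Path docxPath) throws IOException {
        FileInputStream fis = new FileInputStream(docxPath.toFile());
        XWPFDocument document = new XWPFDocument(fis);

        XHTMLOptions options = XHTMLOptions.create();
        options.setImageManager(new Base64EmbedImgManager());

        StringWriter writer = new StringWriter();
        XHTMLConverter.getInstance()
            .convert(document, writer, options);

        fis.close();  // I thought this was enough
        return writer.toString();
    }
}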

I thought I was doing everything right - opening the file, converting, and closing it. But there were several problems:

Resource leak: If an exception occurred, fis.close() was never called, and the XWPFDocument itself was never closed at all
Blocking thread: The conversion ran on the HTTP request thread
No resource limits: A document of any size could be loaded into memory

First Attempt: try-with-resources

I knew I should use try-with-resources to ensure proper cleanup:

@Service
public class DocumentConverter {

    public String convertToHtml(Path docxPath) throws IOException {
        try (FileInputStream fis = new FileInputStream(docxPath.toFile());
             XWPFDocument document = new XWPFDocument(fis)) {

            XHTMLOptions options = XHTMLOptions.create();
            options.setImageManager(new Base64EmbedImgManager());

            StringWriter writer = new StringWriter();
            XHTMLConverter.getInstance()
                .convert(document, writer, options);

            return writer.toString();
        }
    }
}

This fixed the leak: the stream and the document are now closed as soon as the conversion finishes, even if an exception occurs. It does not lower peak memory, though, because the whole document is still loaded at once.
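To make the cleanup guarantee concrete, here is a small sketch that uses no POI at all. TrackedResource is a hypothetical stand-in for the stream and the document; it only records when it is opened and closed. The point it illustrates is that try-with-resources closes resources in reverse declaration order, and does so before the catch block runs.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {

    /** Records lifecycle events so the close order can be inspected afterwards. */
    static final List<String> events = new ArrayList<>();

    /** Hypothetical stand-in for a stream or document; it only logs its lifecycle. */
    static final class TrackedResource implements AutoCloseable {
        private final String name;

        TrackedResource(String name) {
            this.name = name;
            events.add("open " + name);
        }

        @Override
        public void close() {
            events.add("close " + name);
        }
    }

    /** Opens two resources, then fails midway, like a conversion that throws. */
    static List<String> run() {
        events.clear();
        try (TrackedResource stream = new TrackedResource("stream");
             TrackedResource document = new TrackedResource("document")) {
            throw new IllegalStateException("conversion failed");
        } catch (IllegalStateException e) {
            // Both resources are already closed by the time we get here
            events.add("caught " + e.getMessage());
        }
        return List.copyOf(events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running it lists the document being closed first, then the stream, and only then the caught exception.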

But I still had the timeout problem for large files. The HTTP request thread was still blocked.

Second Attempt: Async Processing

I needed to run conversions asynchronously. Here’s my updated service:

@Service
public class DocumentConverter {

    private final ExecutorService executorService;

    public DocumentConverter() {
        this.executorService = Executors.newFixedThreadPool(4);
    }

    public CompletableFuture<String> convertToHtmlAsync(Path docxPath) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return convertToHtml(docxPath);
            } catch (IOException e) {
                throw new CompletionException(e);
            }
        }, executorService);
    }

    private String convertToHtml(Path docxPath) throws IOException {
        try (FileInputStream fis = new FileInputStream(docxPath.toFile());
             XWPFDocument document = new XWPFDocument(fis)) {

            XHTMLOptions options = XHTMLOptions.create();
            options.setImageManager(new Base64EmbedImgManager());

            StringWriter writer = new StringWriter();
            XHTMLConverter.getInstance()
                .convert(document, writer, options);

            return writer.toString();
        }
    }
}

Now the controller could return immediately:

@RestController
@RequestMapping("/api/documents")
public class DocumentController {

    private final DocumentConverter converter;
    private final ConversionRepository repository;

    // Constructor injection; without it the final fields are never assigned
    public DocumentController(DocumentConverter converter,
                              ConversionRepository repository) {
        this.converter = converter;
        this.repository = repository;
    }

    @PostMapping("/convert")
    public ResponseEntity<ConversionResponse> startConversion(
            @RequestParam("file") MultipartFile file) {

        String jobId = UUID.randomUUID().toString();

        // saveTempFile(...) is a helper that is not shown in this article
        Path tempPath = saveTempFile(file);

        converter.convertToHtmlAsync(tempPath)
            .thenAccept(html -> {
                repository.saveResult(jobId, html);
            })
            .exceptionally(ex -> {
                repository.saveError(jobId, ex.getMessage());
                return null;
            });

        return ResponseEntity.accepted()
            .body(new ConversionResponse(jobId, "PROCESSING"));
    }
}

The API now returns immediately with a job ID, and clients can poll for results.
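The ConversionRepository used by the controller is not shown in this article. As a rough idea of what polling needs, here is a minimal in-memory sketch. InMemoryJobStore, its Status enum and the Job record are hypothetical names for illustration, and a real service would probably persist jobs somewhere more durable.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class InMemoryJobStore {

    enum Status { PROCESSING, DONE, FAILED }

    /** Immutable snapshot of one job; payload holds the HTML or the error message. */
    record Job(Status status, String payload) {}

    // ConcurrentHashMap because worker threads write while request threads read
    private final Map<String, Job> jobs = new ConcurrentHashMap<>();

    void markProcessing(String jobId) {
        jobs.put(jobId, new Job(Status.PROCESSING, ""));
    }

    void saveResult(String jobId, String html) {
        jobs.put(jobId, new Job(Status.DONE, html));
    }

    void saveError(String jobId, String message) {
        // Exception messages can be null, so fall back to a fixed text
        jobs.put(jobId, new Job(Status.FAILED, message == null ? "unknown error" : message));
    }

    /** Empty if the job id is unknown. */
    Optional<Job> find(String jobId) {
        return Optional.ofNullable(jobs.get(jobId));
    }
}
```

One design point: calling markProcessing before the conversion is submitted means a client that polls right away finds a PROCESSING entry instead of an unknown job id.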

Third Attempt: Add Timeout and Validation

For production use, I needed timeouts and input validation:

@Service
public class DocumentConverter {

    private static final long MAX_FILE_SIZE = 100 * 1024 * 1024; // 100MB
    private static final int CONVERSION_TIMEOUT_MINUTES = 10;

    private final ExecutorService executorService;

    public DocumentConverter() {
        this.executorService = Executors.newFixedThreadPool(4);
    }

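    public CompletableFuture<String> convertToHtmlAsync(Path docxPath) {
        // Validate file size first (throws on the caller's thread, before anything is queued)
        validateDocument(docxPath);

        return CompletableFuture.supplyAsync(() -> {
            try {
                return convertToHtml(docxPath);
            } catch (IOException e) {
                throw new CompletionException(e);
            }
        }, executorService)
        // Note: orTimeout fails the returned future after the deadline,
        // but it does not interrupt the worker thread that is still converting
        .orTimeout(CONVERSION_TIMEOUT_MINUTES, TimeUnit.MINUTES);
    }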

    private void validateDocument(Path path) {
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("File not found: " + path);
        }

        try {
            long size = Files.size(path);
            if (size > MAX_FILE_SIZE) {
                throw new IllegalArgumentException(
                    "File too large: " + size + " bytes. Maximum: " + MAX_FILE_SIZE
                );
            }
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot read file size", e);
        }
    }

    private String convertToHtml(Path docxPath) throws IOException {
        try (FileInputStream fis = new FileInputStream(docxPath.toFile());
             XWPFDocument document = new XWPFDocument(fis)) {

            XHTMLOptions options = XHTMLOptions.create();
            options.setImageManager(new Base64EmbedImgManager());

            StringWriter writer = new StringWriter();
            XHTMLConverter.getInstance()
                .convert(document, writer, options);

            return writer.toString();
        }
    }

    @PreDestroy
    public void shutdown() {
        executorService.shutdown();
    }
}
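One caveat about the timeout: orTimeout only completes the returned future with a TimeoutException. The task that was handed to the executor keeps running and keeps occupying one of the four worker threads. The sketch below shows one way to also interrupt the worker when the deadline passes. InterruptingTimeout and its submit method are hypothetical names, and this only helps if the running code reacts to interruption; I have not verified how POI's parsing behaves when interrupted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class InterruptingTimeout {

    static <T> CompletableFuture<T> submit(ExecutorService workers,
                                           ScheduledExecutorService scheduler,
                                           Callable<T> task,
                                           long timeout, TimeUnit unit) {
        CompletableFuture<T> result = new CompletableFuture<>();

        // Run the task on a worker and keep the Future so it can be cancelled later
        Future<?> running = workers.submit(() -> {
            try {
                result.complete(task.call());
            } catch (Throwable t) {
                result.completeExceptionally(t);
            }
        });

        // Watchdog: if the deadline passes first, fail the result and interrupt the worker
        ScheduledFuture<?> watchdog = scheduler.schedule(() -> {
            TimeoutException timedOut =
                    new TimeoutException("Timed out after " + timeout + " " + unit);
            if (result.completeExceptionally(timedOut)) {
                running.cancel(true);
            }
        }, timeout, unit);

        // Once the result is settled either way, the watchdog is no longer needed
        result.whenComplete((value, error) -> watchdog.cancel(false));
        return result;
    }
}
```

In the converter, the task would be () -> convertToHtml(docxPath), with the existing fixed pool as workers and a small scheduler created next to it (for example with Executors.newSingleThreadScheduledExecutor()).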

Why This Matters

Apache POI's XWPF model loads the entire document into memory. A 50MB Word document can easily consume 200-300MB of heap space (a rough figure; the actual ratio depends on the content) due to:

XML parsing overhead (the .docx format is a ZIP of XML files)
Object model creation (every paragraph, run, table becomes an object)
Image handling and base64 encoding
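Because a .docx is a ZIP archive, the size on disk understates how much data has to be parsed. A possible extra check, sketched here with only java.util.zip, is to add up the uncompressed sizes the archive declares for its entries. DocxSizeEstimator is a hypothetical helper name, and the declared sizes come from the file itself, so treat the result as a hint and not as a guarantee.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class DocxSizeEstimator {

    /**
     * Sums the declared uncompressed size of every entry in the archive.
     * Entries that do not declare a size (getSize() returns -1) are skipped,
     * so the result is a lower bound.
     */
    static long declaredUncompressedBytes(Path docxPath) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(docxPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                long size = entries.nextElement().getSize();
                if (size > 0) {
                    total += size;
                }
            }
        }
        return total;
    }
}
```

A validation step could reject a document when this total exceeds its own limit, in addition to the check on the file size.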

The key insights:

Resource lifecycle matters: Unclosed streams and documents leak file handles and memory, and the leaks accumulate over time
Thread blocking matters: Large document processing on HTTP threads causes timeouts
Input validation matters: Processing a 500MB file without limits can exhaust the heap and take the whole application down
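Limits apply to the number of waiting jobs as well. Executors.newFixedThreadPool(4) caps the worker threads, but its queue is unbounded, so uploads can pile up behind the four workers. A minimal sketch of a pool with a bounded queue follows; BoundedConversionPool is a hypothetical name and the capacities are placeholders to tune.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedConversionPool {

    /**
     * Fixed number of workers plus a bounded waiting queue. Once both are
     * full, execute/submit throws RejectedExecutionException (AbortPolicy).
     */
    static ThreadPoolExecutor create(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```

With something like BoundedConversionPool.create(4, 20) in place of the fixed pool, the controller could catch the rejection and answer with a "try again later" status instead of accepting more work than the heap can hold.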

Common Mistakes to Avoid

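Closing the document manually is fragile:

XWPFDocument document = new XWPFDocument(new FileInputStream(path));
// process document
document.close();  // What if an exception happens before this?

Always use try-with-resources, and declare the stream as its own resource so it is closed even if the XWPFDocument constructor throws:

try (FileInputStream in = new FileInputStream(path);
     XWPFDocument document = new XWPFDocument(in)) {
    // process document
}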

Summary

When converting large Word documents to HTML in Java:

Always use try-with-resources to ensure streams and document objects are released immediately
Process asynchronously to prevent blocking HTTP request threads
Validate input size before loading documents into memory
Set reasonable timeouts for conversion operations
Monitor heap usage for very large files
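For the last point, a very rough sketch of a heap headroom check using only java.lang.Runtime. HeapHeadroom is a hypothetical helper, and the expansion factor (document bytes to heap bytes) is an assumption to calibrate yourself; the 200-300MB figure for a 50MB file mentioned earlier would correspond to a factor of roughly 4 to 6.

```java
final class HeapHeadroom {

    /** Approximate number of bytes the JVM could still allocate before reaching its maximum heap. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /**
     * True if the estimated in-memory footprint fits into the current headroom.
     * The expansion factor is an assumption, not a measured value.
     */
    static boolean canAccept(long documentBytes, double expansionFactor) {
        double estimated = documentBytes * expansionFactor;
        return estimated <= availableBytes();
    }

    public static void main(String[] args) {
        long fiftyMb = 50L * 1024 * 1024;
        System.out.println("Headroom: " + availableBytes() + " bytes");
        System.out.println("Accept 50MB at factor 6? " + canAccept(fiftyMb, 6.0));
    }
}
```

The numbers are a snapshot and other threads allocate concurrently, so this is a coarse admission check and not a guarantee against OutOfMemoryError.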

The combination of proper resource management and async processing makes document conversion reliable and scalable.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Apache POI - the Java API for Microsoft Documents
👨‍💻 Apache POI XWPF Usermodel Documentation
👨‍💻 Java CompletableFuture Documentation

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!