
MultipartFile.getBytes() vs getInputStream() in Spring Boot: Memory Management for File Uploads

Our production server crashed. Again.

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.String.<init>(String.java:411)
at java.lang.String.<init>(String.java:621)
at com.example.service.FileService.uploadFile(FileService.java:42)

I stared at the logs. The file upload endpoint had been working perfectly in development. But in production, a 200MB file upload killed the JVM.

The Problem: getBytes() Loads Everything Into Memory

Here’s what I had written:

FileService.java
@Service
public class FileService {
    public void uploadFile(MultipartFile file) throws IOException {
        // This looks innocent enough
        byte[] bytes = file.getBytes();
        // Process the bytes...
        String content = new String(bytes, StandardCharsets.UTF_8);
        // Save to storage
        saveToDatabase(content);
    }
}

In my tests, I was uploading small files—5KB, 10KB, maybe 100KB. Everything worked fine.

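But getBytes() does exactly what it says: it loads the entire file into heap memory as a byte array. A 200MB file requires 200MB of heap for the array alone, and decoding it with new String(...) allocates a second copy of comparable or larger size, which is where the stack trace above points. And if multiple users upload large files simultaneously? The JVM spirals into GC thrashing before finally throwing OutOfMemoryError.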

Understanding the Two Approaches

What getBytes() Actually Does

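Conceptual behavior of getBytes()
// Conceptually, a MultipartFile implementation's getBytes() behaves like this
// (the exact code varies by implementation):
public byte[] getBytes() throws IOException {
    byte[] bytes = new byte[(int) getSize()]; // Allocates the full file size on the heap!
    try (DataInputStream in = new DataInputStream(getInputStream())) {
        in.readFully(bytes); // Reads the entire stream into the array
    }
    return bytes;
}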

The entire file content sits in your JVM heap. For a 500MB upload:

Heap Memory with getBytes():
┌────────────────────────────────────────────────┐
│ ████████████████████████████████████████████ │ 500MB+ occupied
└────────────────────────────────────────────────┘

What getInputStream() Does

Streaming approach
// getInputStream() returns a stream that reads on-demand
InputStream is = file.getInputStream();
// You control how much memory is used
byte[] buffer = new byte[8192]; // Only 8KB in memory
int bytesRead = is.read(buffer);

With streaming, memory usage stays constant:

Heap Memory with getInputStream():
┌────────────────────────────────────────────────┐
│ ████ │ ~8KB buffer
└────────────────────────────────────────────────┘

The Fix: Streaming with getInputStream()

I rewrote the upload method:

FileService.java
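@Service
public class FileService {
    public void uploadFile(MultipartFile file) throws IOException {
        // The original filename is client-supplied, so keep only its last path
        // segment and make sure the result stays inside /uploads
        String originalName = file.getOriginalFilename();
        Path fileName = (originalName == null) ? null : Paths.get(originalName).getFileName();
        if (fileName == null) {
            throw new IllegalArgumentException("Upload has no usable filename");
        }
        Path uploadDir = Paths.get("/uploads");
        Path outputPath = uploadDir.resolve(fileName).normalize();
        if (!outputPath.startsWith(uploadDir) || outputPath.equals(uploadDir)) {
            throw new IllegalArgumentException("Invalid filename: " + originalName);
        }
        try (InputStream inputStream = file.getInputStream();
             OutputStream outputStream = Files.newOutputStream(outputPath)) {
            byte[] buffer = new byte[8192]; // 8KB buffer
            int bytesRead;
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, bytesRead);
            }
        }
        // File written with constant memory usage
    }
}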

The key changes:

  1. Use try-with-resources: ensures both streams are closed, even if the copy fails
  2. Stream with a buffer: only 8KB of file data lives in memory at any time
  3. No full-size byte array: the file never fully loads into heap

This handles a 500MB file the same way it handles a 5KB file: constant ~8KB memory footprint.
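The manual read/write loop can also be handed to the JDK. Here is a minimal sketch using only java.nio; the class and method names (StreamCopy, copyToFile) are made up for illustration, and in the service you would pass file.getInputStream() as the source. Files.copy moves the data through a small internal buffer, so heap usage stays bounded in the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Spring: streams any InputStream to a file.
final class StreamCopy {
    private StreamCopy() {}

    /**
     * Copies the stream to the target path without loading it fully into memory.
     * Closes the source stream and returns the number of bytes written.
     */
    static long copyToFile(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            // REPLACE_EXISTING overwrites a previous upload with the same name
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The returned byte count can be compared against file.getSize() as a cheap sanity check that the whole upload reached disk.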

When Each Method Makes Sense

Use getBytes() When:

Small, bounded files
public String calculateHash(MultipartFile file) throws IOException {
    // Acceptable here only because the size is capped before getBytes() is called
    if (file.getSize() > 10_000_000) { // 10MB limit
        throw new IllegalArgumentException("File too large for this operation");
    }
    byte[] bytes = file.getBytes();
    // DigestUtils.md5Hex comes from Apache Commons Codec
    return DigestUtils.md5Hex(bytes);
}

Conditions for getBytes():

  • File size is guaranteed small (under 10MB is a reasonable threshold)
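  • The consuming API needs the whole content as a byte[] or random access to it (hashing and encryption alone do not qualify: both can be fed incrementally)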
  • Size validation happens before calling getBytes()
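Hashing by itself does not force the whole file into memory, because java.security.MessageDigest accepts data chunk by chunk. Here is a minimal sketch of a streaming checksum under that assumption; StreamingHash and its md5Hex method are hypothetical names for this example, and the stream you pass in would be file.getInputStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes an MD5 hex digest with a fixed 8KB buffer.
final class StreamingHash {
    private StreamingHash() {}

    static String md5Hex(InputStream source) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen
            throw new IllegalStateException("MD5 not available", e);
        }
        try (InputStream in = source) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            // Feed the digest one chunk at a time instead of one big array
            while ((bytesRead = in.read(buffer)) != -1) {
                digest.update(buffer, 0, bytesRead);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

With this shape, the 10MB cap is no longer needed for the checksum itself; it only matters when a later step really needs the bytes in one array.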

Use getInputStream() When:

Unpredictable or large files
public void uploadToS3(MultipartFile file, String bucket, String key) throws IOException {
    // In a real service, inject one shared S3Client instead of creating one per call
    S3Client s3 = S3Client.create();
    PutObjectRequest request = PutObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .contentLength(file.getSize())
            .build();
    // The SDK reads from the stream as it sends, rather than buffering the whole file
    try (InputStream inputStream = file.getInputStream()) {
        s3.putObject(request, RequestBody.fromInputStream(inputStream, file.getSize()));
    }
}

Conditions for getInputStream():

  • File size is unknown or potentially large
  • Processing can be done incrementally (line by line, chunk by chunk)
  • Writing to another stream destination (file, database, cloud storage)

Real-World Example: Processing Large CSV Files

A colleague had this code for importing CSV data:

Original approach - OOM with large files
public void importCsv(MultipartFile file) throws IOException {
    // Loads entire CSV into memory!
    String content = new String(file.getBytes(), StandardCharsets.UTF_8);
    String[] lines = content.split("\n");
    for (String line : lines) {
        processRow(line);
    }
}

It worked for test files with 100 rows. Production CSVs had 500,000 rows.

The streaming fix:

Streaming CSV processing
public void importCsv(MultipartFile file) throws IOException {
    try (InputStreamReader reader = new InputStreamReader(file.getInputStream(), StandardCharsets.UTF_8);
         BufferedReader br = new BufferedReader(reader)) {
        String line;
        int lineNumber = 0;
        while ((line = br.readLine()) != null) {
            lineNumber++;
            processRow(line);
            // Batch commit every 1000 lines
            if (lineNumber % 1000 == 0) {
                commitBatch();
            }
        }
        // Commit the final partial batch (fewer than 1000 rows)
        if (lineNumber % 1000 != 0) {
            commitBatch();
        }
    }
}

Now memory usage stays flat regardless of CSV size.
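The read-then-batch pattern can be pulled out into a reusable helper. The sketch below is one possible shape using only the JDK; BatchedLineReader and forEachBatch are hypothetical names, and the handler stands in for whatever processRow and commitBatch do in the real import. At most batchSize lines are held in memory at a time, and the last partial batch is still delivered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: reads lines lazily and hands them over in fixed-size batches.
final class BatchedLineReader {
    private BatchedLineReader() {}

    /** Returns the total number of lines read. */
    static long forEachBatch(InputStream source, int batchSize,
                             Consumer<List<String>> batchHandler) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long totalLines = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                totalLines++;
                if (batch.size() == batchSize) {
                    // Hand over a copy so the handler may keep it after we clear
                    batchHandler.accept(new ArrayList<>(batch));
                    batch.clear();
                }
            }
        }
        // Deliver the final partial batch, if any
        if (!batch.isEmpty()) {
            batchHandler.accept(new ArrayList<>(batch));
        }
        return totalLines;
    }
}
```

One caveat: splitting on line breaks is only safe for CSVs without quoted fields that contain newlines. For those, a CSV library that reads from a Reader keeps the same streaming property.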

Comparison Summary

Aspect          | getBytes()           | getInputStream()
----------------|----------------------|----------------------
Memory Usage    | O(file size)         | O(buffer size)
Max File Size   | Limited by heap      | Virtually unlimited
Code Complexity | Simple               | Slightly more complex
Suitable For    | Small, bounded files | Any file size
Risk            | OutOfMemoryError     | Minimal

The Lesson

What works in development often fails in production. Test files are small; real user data is unpredictable.

My rule now: default to getInputStream(). Only use getBytes() when I can prove the file size is bounded and small (with an explicit size check).
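When the bytes really are needed in one array, the size check can be enforced on the stream itself rather than on the reported size. Here is a minimal sketch, assuming Java 11 or newer for InputStream.readNBytes(int); BoundedReads and readAtMost are hypothetical names for this example, and the source would be file.getInputStream().

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: a size-capped alternative to reading everything blindly.
final class BoundedReads {
    private BoundedReads() {}

    /**
     * Reads the whole stream into a byte array, but refuses input larger than
     * maxBytes. Closes the source stream.
     */
    static byte[] readAtMost(InputStream source, int maxBytes) throws IOException {
        if (maxBytes < 0 || maxBytes == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxBytes out of range: " + maxBytes);
        }
        try (InputStream in = source) {
            // Ask for one byte more than allowed so oversize input is detectable
            byte[] data = in.readNBytes(maxBytes + 1);
            if (data.length > maxBytes) {
                throw new IllegalArgumentException("Input exceeds " + maxBytes + " bytes");
            }
            return data;
        }
    }
}
```

The heap cost is then capped at roughly maxBytes per call no matter what the client sends.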

The production server hasn’t crashed since the fix. Memory usage stays stable even during heavy upload periods.

Common Mistakes That Compound Memory Issues

Multiple copies in memory
// BAD: Creates multiple copies
byte[] bytes = file.getBytes(); // Copy 1: byte array
String str = new String(bytes); // Copy 2: String
JSONObject json = new JSONObject(str); // Copy 3: parsed object
// Each copy multiplies memory usage

Spring Boot Multipart Configuration

application.properties
# Limit file upload size (prevents processing huge files)
spring.servlet.multipart.max-file-size=50MB
spring.servlet.multipart.max-request-size=50MB
# Enable multipart parsing (already the default in Spring Boot)
spring.servlet.multipart.enabled=true

These limits provide a safety net, but streaming is still essential for handling files near the limit.

Why Files Are Stored in /tmp First

Spring’s MultipartFile implementation typically stores uploads in a temporary directory. When you call getBytes(), it reads from that temp file into memory. When you call getInputStream(), it streams from the temp file.

This means even with streaming, there’s disk I/O happening. For true streaming without temp files, you’d need to configure Spring to use streaming multipart handling—but that’s a topic for another post.

Final Words

My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can contact me by email.
