MultipartFile.getBytes() vs getInputStream() in Spring Boot: Memory Management for File Uploads
Our production server crashed. Again.
```
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.String.<init>(String.java:411)
    at java.lang.String.<init>(String.java:621)
    at com.example.service.FileService.uploadFile(FileService.java:42)
```

I stared at the logs. The file upload endpoint had been working perfectly in development. But in production, a 200MB file upload killed the JVM.
The Problem: getBytes() Loads Everything Into Memory
Here’s what I had written:
```java
@Service
public class FileService {

    public void uploadFile(MultipartFile file) throws IOException {
        // This looks innocent enough
        byte[] bytes = file.getBytes();

        // Process the bytes...
        String content = new String(bytes, StandardCharsets.UTF_8);

        // Save to storage
        saveToDatabase(content);
    }
}
```

In my tests, I was uploading small files: 5KB, 10KB, maybe 100KB. Everything worked fine.
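But getBytes() does exactly what it says: it loads the entire file into heap memory as a byte array. A 200MB file needs at least 200MB of heap for that array, and decoding it into a String, as the code above does, allocates a second full-size copy; the stack trace shows the allocation failing inside the String constructor. And if multiple users upload large files simultaneously? The JVM spirals into GC thrashing before finally throwing OutOfMemoryError.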
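To put rough numbers on the concurrent case, here is a back-of-the-envelope sketch. The class and method names are hypothetical, and the figure of two full copies per upload (one byte array plus one decoded String) is an assumption; the real multiplier depends on the character encoding and on what else the request allocates.

```java
/** Back-of-the-envelope heap estimate for fully buffered uploads (illustrative only). */
final class UploadHeapEstimate {

    /**
     * @param fileSizeBytes     size of each uploaded file in bytes
     * @param concurrentUploads number of uploads being handled at the same time
     * @param copiesPerUpload   full in-memory copies held per upload (assumed value,
     *                          e.g. 2 for a byte[] plus a String decoded from it)
     * @return estimated heap in bytes occupied by upload data alone
     */
    static long estimateHeapBytes(long fileSizeBytes, long concurrentUploads, long copiesPerUpload) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(Math.multiplyExact(fileSizeBytes, concurrentUploads), copiesPerUpload);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long needed = estimateHeapBytes(200 * mb, 5, 2);
        System.out.println("Estimated heap needed: " + (needed / mb) + " MB");
    }
}
```

Under those assumptions, five simultaneous 200MB uploads already need about 2000MB of heap for the upload data alone.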
Understanding the Two Approaches
What getBytes() Actually Does
```java
// MultipartFile.getBytes() essentially does this:
public byte[] getBytes() throws IOException {
    InputStream is = getInputStream();
    byte[] bytes = new byte[(int) getSize()]; // Allocates full file size!
    // ... reads entire stream into array
    return bytes;
}
```

The entire file content sits in your JVM heap. For a 500MB upload:

```
Heap Memory with getBytes():
┌────────────────────────────────────────────────┐
│ ████████████████████████████████████████████   │  500MB+ occupied
└────────────────────────────────────────────────┘
```

What getInputStream() Does
```java
// getInputStream() returns a stream that reads on-demand
InputStream is = file.getInputStream();

// You control how much memory is used
byte[] buffer = new byte[8192]; // Only 8KB in memory
int bytesRead = is.read(buffer);
```

With streaming, memory usage stays constant:

```
Heap Memory with getInputStream():
┌────────────────────────────────────────────────┐
│ ████                                           │  ~8KB buffer
└────────────────────────────────────────────────┘
```

The Fix: Streaming with getInputStream()
I rewrote the upload method:
```java
@Service
public class FileService {

    public void uploadFile(MultipartFile file) throws IOException {
        // NOTE: getOriginalFilename() is supplied by the client; validate it
        // before using it in a path in real code
        Path outputPath = Paths.get("/uploads", file.getOriginalFilename());

        try (InputStream inputStream = file.getInputStream();
             OutputStream outputStream = Files.newOutputStream(outputPath)) {

            byte[] buffer = new byte[8192]; // 8KB buffer
            int bytesRead;

            while ((bytesRead = inputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, bytesRead);
            }
        }

        // File written with constant memory usage
    }
}
```

The key changes:
- Use try-with-resources: ensures both streams are closed, even if the copy fails
- Stream with a buffer: only 8KB of file data lives in memory at any time
- No full-size byte array: the file never fully loads into heap
This handles a 500MB file the same way it handles a 5KB file: constant ~8KB memory footprint.
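The same copy loop is available in the JDK as Files.copy(InputStream, Path, CopyOption...), which also buffers internally. The sketch below uses it and, as an extra precaution, keeps only the last path segment of the client-supplied file name so it cannot point outside the upload directory. The class name UploadStorage and the method store are hypothetical; in a Spring service the caller would pass file.getInputStream() and file.getOriginalFilename().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UploadStorage {

    /**
     * Streams {@code in} to a file inside {@code uploadDir} and returns the number
     * of bytes written. The caller remains responsible for closing the stream.
     */
    static long store(InputStream in, Path uploadDir, String clientFilename) throws IOException {
        if (clientFilename == null) {
            throw new IllegalArgumentException("Missing file name");
        }
        // Keep only the final path segment, treating both separator styles alike,
        // so a name like "../../etc/passwd" is reduced to "passwd".
        String normalized = clientFilename.replace('\\', '/');
        String baseName = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (baseName.isEmpty() || baseName.equals(".") || baseName.equals("..")) {
            throw new IllegalArgumentException("Invalid file name: " + clientFilename);
        }

        Path root = uploadDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        Path target = root.resolve(baseName);

        // Files.copy reads and writes through a small internal buffer,
        // so heap usage does not grow with the size of the upload.
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uploads-demo");
        byte[] payload = "hello upload".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(payload)) {
            long written = store(in, dir, "../notes.txt");
            System.out.println(written + " bytes written to " + dir.resolve("notes.txt"));
        }
    }
}
```

Whether silently reducing a suspicious name to its last segment is the right policy, rather than rejecting the upload or generating a server-side name, is a design choice this sketch does not settle.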
When Each Method Makes Sense
Use getBytes() When:
```java
public String calculateHash(MultipartFile file) throws IOException {
    // Acceptable: the size is checked first, so the array is bounded
    if (file.getSize() > 10_000_000) { // 10MB limit
        throw new IllegalArgumentException("File too large for this operation");
    }

    byte[] bytes = file.getBytes();
    return DigestUtils.md5Hex(bytes);
}
```

Conditions for getBytes():
- File size is guaranteed small (under 10MB is a reasonable threshold)
- You need the whole content in memory at once (random access, or an API that only accepts a byte[])
- Size validation happens before calling getBytes()
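When the size cannot be bounded, a checksum does not actually require the whole file in memory: java.security.MessageDigest accepts data in chunks. A minimal sketch using only the JDK follows; the class name StreamingHash is hypothetical, and MD5 is used only to mirror the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingHash {

    /** Computes the MD5 digest of a stream as lowercase hex, 8KB at a time. */
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform is required to provide
            throw new IllegalStateException("MD5 not available", e);
        }

        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            // Feed each chunk to the digest; nothing is retained between reads
            digest.update(buffer, 0, bytesRead);
        }

        byte[] hash = digest.digest();
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(md5Hex(in));
        }
    }
}
```

In a Spring service this would be called with file.getInputStream() inside a try-with-resources block. If the DigestUtils in the example above is the Apache Commons Codec class, it also has an md5Hex overload that takes an InputStream.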
Use getInputStream() When:
```java
public void uploadToS3(MultipartFile file, String bucket, String key) throws IOException {
    PutObjectRequest request = PutObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .contentLength(file.getSize())
            .build();

    // A real service would usually reuse one shared S3Client instead of creating one per call
    try (S3Client s3 = S3Client.create();
         InputStream inputStream = file.getInputStream()) {
        // S3 SDK streams directly - no memory spike
        s3.putObject(request, RequestBody.fromInputStream(inputStream, file.getSize()));
    }
}
```

Conditions for getInputStream():
- File size is unknown or potentially large
- Processing can be done incrementally (line by line, chunk by chunk)
- Writing to another stream destination (file, database, cloud storage)
Real-World Example: Processing Large CSV Files
A colleague had this code for importing CSV data:
```java
public void importCsv(MultipartFile file) throws IOException {
    // Loads entire CSV into memory!
    String content = new String(file.getBytes(), StandardCharsets.UTF_8);

    String[] lines = content.split("\n");
    for (String line : lines) {
        processRow(line);
    }
}
```

It worked for test files with 100 rows. Production CSVs had 500,000 rows.
The streaming fix:
```java
public void importCsv(MultipartFile file) throws IOException {
    try (InputStreamReader reader = new InputStreamReader(file.getInputStream(), StandardCharsets.UTF_8);
         BufferedReader br = new BufferedReader(reader)) {

        String line;
        int lineNumber = 0;

        while ((line = br.readLine()) != null) {
            lineNumber++;
            processRow(line);

            // Batch commit every 1000 lines
            if (lineNumber % 1000 == 0) {
                commitBatch();
            }
        }

        // Commit the final partial batch, if any
        if (lineNumber % 1000 != 0) {
            commitBatch();
        }
    }
}
```

Now memory usage stays flat regardless of CSV size.
Comparison Summary
| Aspect | getBytes() | getInputStream() |
|---|---|---|
| Memory Usage | O(file size) | O(buffer size) |
| Max File Size | Limited by heap | Virtually unlimited |
| Code Complexity | Simple | Slightly more complex |
| Suitable For | Small, bounded files | Any file size |
| Risk | OutOfMemoryError | Minimal |
The Lesson
What works in development often fails in production. Test files are small; real user data is unpredictable.
My rule now: default to getInputStream(). Only use getBytes() when I can prove the file size is bounded and small (with an explicit size check).
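One way to make that size check hard to forget is to put it inside the read itself. The sketch below is a hypothetical helper (BoundedReader and readAtMost are not Spring or JDK names) and assumes Java 11 or later for InputStream.readNBytes(int). It reads at most the allowed number of bytes plus one, and treats that extra byte as proof that the stream is too large.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedReader {

    /**
     * Reads the stream fully into memory, but never buffers more than
     * maxBytes + 1 bytes. Throws IOException if the stream is longer than maxBytes.
     */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        if (maxBytes < 0 || maxBytes == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxBytes out of range: " + maxBytes);
        }
        // Ask for one byte more than allowed: receiving it means the limit was exceeded
        byte[] data = in.readNBytes(maxBytes + 1);
        if (data.length > maxBytes) {
            throw new IOException("Content exceeds limit of " + maxBytes + " bytes");
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] small = new byte[100];
        try (InputStream in = new ByteArrayInputStream(small)) {
            System.out.println("Read " + readAtMost(in, 1024).length + " bytes");
        }
    }
}
```

Compared with checking file.getSize() first, this bounds what is actually read rather than what was declared, at the cost of reading a little of an oversized stream before rejecting it.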
The production server hasn’t crashed since the fix. Memory usage stays stable even during heavy upload periods.
Related Knowledge
Common Mistakes That Compound Memory Issues
```java
// BAD: Creates multiple copies
byte[] bytes = file.getBytes();          // Copy 1: byte array
String str = new String(bytes);          // Copy 2: String
JSONObject json = new JSONObject(str);   // Copy 3: parsed object

// Each copy multiplies memory usage
```

Spring Boot Multipart Configuration
```properties
# Limit file upload size (prevents processing huge files)
spring.servlet.multipart.max-file-size=50MB
spring.servlet.multipart.max-request-size=50MB

# Enable multipart parsing
spring.servlet.multipart.enabled=true
```

These limits provide a safety net, but streaming is still essential for handling files near the limit.
Why Files Are Stored in /tmp First
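Spring's MultipartFile implementation typically stores uploads in a temporary directory before your code runs. With the default Servlet-based multipart support, spring.servlet.multipart.file-size-threshold (0 by default) sets the size above which a part is written to disk, and spring.servlet.multipart.location sets the directory. When you call getBytes(), it reads from that temp file into memory. When you call getInputStream(), it streams from the temp file.
This means even with streaming, there's disk I/O happening. For true streaming without temp files, you'd need to configure Spring to use streaming multipart handling, but that's a topic for another post.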
Final Words
My intention with this article was to share my knowledge and experience in the hope that it helps others. If you want to contact me, you can reach me by email.