How to Use Java 26 Vector API for SIMD Performance
Problem
I have a numerical computation that processes millions of values. My CPU supports SIMD (Single Instruction, Multiple Data), but my Java code doesn't use it explicitly: each operation processes one value at a time.
Here’s my scalar code:
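// Process one value at a time
void scalarComputation(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i++) {
        c[i] = -(a[i] * a[i] + b[i] * b[i]);
    }
}
My CPU has AVX2 instructions that can process 8 floats at once, but I have no direct way to use them from Java. HotSpot's JIT compiler may auto-vectorize a simple loop like this one, but that is not guaranteed and cannot be controlled from the source code.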
Environment
- Java 26 (non-LTS)
- x86_64 CPU with AVX2 support
- Numerical processing workloads
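Before vectorizing anything, it can help to check which vector width the JVM will actually use on your machine. The sketch below (VectorShapeCheck is a hypothetical class name) prints the preferred float species; on an AVX2-only x86_64 machine you would typically expect 256 bits, i.e. 8 float lanes, but the exact result depends on the CPU and JVM settings. It needs --add-modules jdk.incubator.vector at compile and run time.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Reports the float vector shape the JVM prefers on the current CPU.
// Compile and run with: --add-modules jdk.incubator.vector
class VectorShapeCheck {

    // Number of float lanes in the preferred species.
    static int preferredFloatLanes() {
        return FloatVector.SPECIES_PREFERRED.length();
    }

    public static void main(String[] args) {
        VectorSpecies<Float> preferred = FloatVector.SPECIES_PREFERRED;
        System.out.println("Preferred shape: " + preferred.vectorShape());
        System.out.println("Vector size:     " + preferred.vectorBitSize() + " bits");
        System.out.println("Float lanes:     " + preferred.length());
    }
}
```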
What Is the Vector API?
Vector API (JEP 529) is in its eleventh incubator release. It lets you write SIMD operations in pure Java:
Scalar (1 value at a time):
┌───┐
│ a │ → multiply ─┐
└───┘             ├→ add → negate → ┌───┐
┌───┐             │                 │ c │
│ b │ → multiply ─┘                 └───┘
└───┘

SIMD (8 values at once):
┌─────────────────────────┐
│ a0 a1 a2 a3 a4 a5 a6 a7 │ → multiply ─┐
└─────────────────────────┘             ├→ add → negate → ┌─────────────────────────┐
┌─────────────────────────┐             │                 │ c0 c1 c2 c3 c4 c5 c6 c7 │
│ b0 b1 b2 b3 b4 b5 b6 b7 │ → multiply ─┘                 └─────────────────────────┘
└─────────────────────────┘
Same operations, but each instruction processes 8 lanes, so the ideal speedup is up to 8x.
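To see the lane-wise idea in code, here is a minimal sketch (LaneDemo and squareEight are hypothetical names) that loads eight floats into one 256-bit FloatVector and squares all of them with a single mul() call:

```java
import java.util.Arrays;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical demo: one vector operation acts on all 8 lanes at once.
class LaneDemo {
    // 256-bit species = 8 float lanes
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;

    // Squares input[0..7] with a single lane-wise mul() call.
    // input must have at least 8 elements, otherwise fromArray throws IndexOutOfBoundsException.
    static float[] squareEight(float[] input) {
        FloatVector v = FloatVector.fromArray(SPECIES, input, 0); // load lanes 0..7
        return v.mul(v).toArray();                                  // multiply every lane, copy out
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Arrays.toString(squareEight(a)));
        // prints [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]
    }
}
```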
How to Use the Vector API
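First, enable the incubator module. Incubator modules are not resolved by default, so pass the flag both when compiling and when running:
javac --add-modules jdk.incubator.vector VectorComputation.java
java --add-modules jdk.incubator.vector -jar myapp.jar
At startup the JVM prints a warning that an incubator module is in use; that is expected. If your application is itself a named module, declaring requires jdk.incubator.vector; in module-info.java resolves the module instead.
Here's the vectorized version: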
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorComputation {
    // Define vector size (256-bit = 8 floats)
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;

    static void vectorComputation(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);

        // Process 8 floats at a time
        for (; i < upperBound; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            var vc = va.mul(va).add(vb.mul(vb)).neg();
            vc.intoArray(c, i);
        }

        // Handle remaining elements (not divisible by 8)
        for (; i < a.length; i++) {
            c[i] = -(a[i] * a[i] + b[i] * b[i]);
        }
    }
}
Key concepts:
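- VectorSpecies defines the vector shape: here 256 bits, i.e. 8 float lanes
- FloatVector.fromArray() loads 8 consecutive floats into a vector
- Operations like mul(), add(), and neg() apply to every lane of the vector at once
- loopBound() returns the largest multiple of the vector length that is less than or equal to a.length (for example, 16 when a.length is 20), so the vector loop never reads past the end of the array; the scalar loop handles the leftover elements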
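An alternative to the scalar tail loop is a masked final iteration: VectorSpecies.indexInRange() builds a mask that disables lanes past the end of the array, and the masked fromArray()/intoArray() overloads skip those lanes. A sketch assuming b and c are at least as long as a (MaskedTail is a hypothetical name); on some hardware masked memory operations are slower than a scalar tail, so it is worth measuring both.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

// Same kernel as above, but the last partial chunk is handled with a mask
// instead of a separate scalar loop. Needs --add-modules jdk.incubator.vector.
class MaskedTail {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;

    // c[i] = -(a[i]*a[i] + b[i]*b[i]); assumes b.length and c.length >= a.length.
    static void compute(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            // Lane k is set only if i + k < a.length; unset lanes are neither loaded nor stored.
            VectorMask<Float> m = SPECIES.indexInRange(i, a.length);
            var va = FloatVector.fromArray(SPECIES, a, i, m);
            var vb = FloatVector.fromArray(SPECIES, b, i, m);
            va.mul(va).add(vb.mul(vb)).neg().intoArray(c, i, m);
        }
    }
}
```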
What Happens Under the Hood
The JVM compiles vector operations to CPU-specific instructions:
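Java Vector Code
      │
      ▼
JIT Compiler
      │
      ├─→ x86_64 with AVX2    → 256-bit instructions (vmulps, vaddps, and a sign-bit flip for negation)
      │
      ├─→ x86_64 with AVX-512 → up to 16 floats at once
      │
      └─→ ARM with NEON       → 4 floats at once
Same Java code, native SIMD instructions on each platform. No JNI. No native code. The JVM handles platform differences automatically. One caveat: the AVX-512 and NEON widths shown here apply when the code asks for the platform's preferred shape (FloatVector.SPECIES_PREFERRED). With SPECIES_256 hard-coded, as in the example above, AVX-512 hardware still runs 8-lane operations, and a CPU without 256-bit vectors, such as a NEON-only ARM core, can fall back to much slower non-intrinsified code.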
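If you want one source file to pick up wider vectors on AVX-512 and still run efficiently on 128-bit NEON, one option is to request the preferred species instead of hard-coding SPECIES_256. A sketch of the same kernel under that assumption (PortableComputation is a hypothetical class name); the only changes are the species and comments that no longer assume 8 lanes.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Same computation, but the lane count is chosen by the JVM for the current CPU.
// Needs --add-modules jdk.incubator.vector.
class PortableComputation {
    // For example 256-bit on AVX2 or 128-bit on NEON; possibly 512-bit on AVX-512.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static void compute(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);

        // Process SPECIES.length() floats per iteration
        for (; i < upperBound; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            va.mul(va).add(vb.mul(vb)).neg().intoArray(c, i);
        }

        // Scalar tail for the remaining a.length % SPECIES.length() elements
        for (; i < a.length; i++) {
            c[i] = -(a[i] * a[i] + b[i] * b[i]);
        }
    }
}
```

Because loopBound() and length() are queried at run time, nothing else in the loop depends on the vector width.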
Why Eleventh Incubator?
Eleven iterations might seem excessive. But SIMD is tricky:
- Different CPUs have different vector widths
- Edge cases with alignment and overflow
- Interaction with garbage collection
- API ergonomics take time to get right
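Each incubator release refines the API based on developer feedback. The API itself is close to final; according to the JEP, it will stay in incubation until the necessary Project Valhalla features become available as preview features, after which it will be adapted to use them and promoted to preview.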
When to Use the Vector API
Good use cases:
- Image/video processing (pixel operations)
- Machine learning inference
- Scientific computing
- Financial calculations
- Cryptographic operations
- Audio processing
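Several of these workloads (machine learning inference and scientific computing in particular) reduce to dot products and similar reductions. A sketch assuming two float arrays of equal length (DotProduct is a hypothetical name): fma() keeps one running partial sum per lane, and reduceLanes() adds the lanes together at the end.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Vectorized dot product. Needs --add-modules jdk.incubator.vector.
class DotProduct {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Assumes a.length == b.length.
    static float dot(float[] a, float[] b) {
        var acc = FloatVector.zero(SPECIES); // one partial sum per lane
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc);           // acc = va * vb + acc, lane-wise
        }
        float sum = acc.reduceLanes(VectorOperators.ADD); // combine the lanes
        for (; i < a.length; i++) {
            sum += a[i] * b[i];              // scalar tail
        }
        return sum;
    }
}
```

The result can differ in the last bits from a sequential scalar loop, because the additions happen in a different order and fma() rounds only once.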
Not worth it:
- Small arrays (overhead exceeds benefit)
- Branch-heavy code (vectors don’t help)
- Memory-bound operations (CPU waits on RAM)
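Simple per-element conditionals are a partial exception to the branch-heavy rule: instead of branching, you compute a mask with compare() and select lanes with blend(). A sketch of a ReLU-style clamp that replaces negative values with zero (MaskedRelu is a hypothetical name); genuinely branch-heavy logic still does not map well to this style.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Branch-free clamp: out[i] = (x[i] < 0) ? 0 : x[i]. Needs --add-modules jdk.incubator.vector.
class MaskedRelu {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Assumes out.length >= x.length.
    static void relu(float[] x, float[] out) {
        var zero = FloatVector.zero(SPECIES);
        int i = 0;
        int upperBound = SPECIES.loopBound(x.length);
        for (; i < upperBound; i += SPECIES.length()) {
            var v = FloatVector.fromArray(SPECIES, x, i);
            VectorMask<Float> negative = v.compare(VectorOperators.LT, 0f); // lanes where x < 0
            v.blend(zero, negative).intoArray(out, i); // 0 in masked lanes, x elsewhere
        }
        for (; i < x.length; i++) {
            out[i] = x[i] < 0 ? 0f : x[i];             // scalar tail, same rule
        }
    }
}
```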
Performance Expectations
Real-world gains vary:
Operation Type          Speedup vs Scalar
─────────────────────────────────────────
Simple arithmetic       4-8x
Complex expressions     3-6x
Memory-bound            1.5-2x
Small arrays            ~1x (no benefit)
Your mileage depends on CPU, data size, and operation complexity.
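To check your own mileage, a rough timing sketch like the following compares the scalar and vector kernels from this post (RoughTiming and the array size are illustrative assumptions). It is not a rigorous benchmark: a harness such as JMH handles warm-up, dead-code elimination, and statistics properly. HotSpot may also auto-vectorize the simple scalar loop, which can narrow the measured gap.

```java
import java.util.Random;
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Rough scalar-vs-vector timing. Needs --add-modules jdk.incubator.vector.
class RoughTiming {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static void scalar(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) c[i] = -(a[i] * a[i] + b[i] * b[i]);
    }

    static void vector(float[] a, float[] b, float[] c) {
        int i = 0, upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            var va = FloatVector.fromArray(SPECIES, a, i);
            var vb = FloatVector.fromArray(SPECIES, b, i);
            va.mul(va).add(vb.mul(vb)).neg().intoArray(c, i);
        }
        for (; i < a.length; i++) c[i] = -(a[i] * a[i] + b[i] * b[i]);
    }

    // Runs the kernel several times and returns the fastest single run in nanoseconds.
    static long bestNanos(Runnable kernel, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            kernel.run();
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 1 << 20; // about one million floats per array (illustrative; vary it)
        float[] a = new float[n], b = new float[n], c = new float[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) { a[i] = rnd.nextFloat(); b[i] = rnd.nextFloat(); }

        // Warm-up so both kernels are JIT-compiled before the measured runs.
        bestNanos(() -> scalar(a, b, c), 200);
        bestNanos(() -> vector(a, b, c), 200);

        long s = bestNanos(() -> scalar(a, b, c), 50);
        long v = bestNanos(() -> vector(a, b, c), 50);
        System.out.printf("scalar: %d us, vector: %d us, ratio: %.2fx%n", s / 1000, v / 1000, (double) s / v);
    }
}
```

Try sizes that fit in cache as well as larger ones: with three 4 MB arrays, the default size above may already lean toward the memory-bound row of the table.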
Summary
In this post, I showed how to use Java 26’s Vector API for SIMD operations. The key points are:
- Vector API lets you write vectorized code in pure Java
- Same code compiles to optimal instructions on different CPUs
- Use --add-modules jdk.incubator.vector (at compile time and at run time) to enable the module
- Best for large arrays with arithmetic operations
- Eleventh incubator means API is mature but still evolving
The Vector API brings high-performance computing to Java without native code. When it exits incubator status, expect it to become a standard tool for numerical workloads.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to contact me, you can reach me by email: Email me
Here are the most important links from this article, along with some further resources on this topic:
- 👨‍💻 JEP 529: Vector API
- 👨‍💻 Project Panama
- 👨‍💻 Oracle Java 26 Release
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!