ElasticJob DataflowJob vs SimpleJob: Which Job Type Should You Use?
I spent an hour staring at ElasticJob’s documentation, trying to figure out which job type I should use for my batch processing task. SimpleJob looked straightforward—just implement one method. But DataflowJob had two methods and something called “streaming mode.”
The documentation explained what each does, but not when to use which. After building several jobs and learning from mistakes, here’s what I wish someone had told me upfront.
The Core Difference
The choice comes down to one question: Do you need to keep processing items until a queue is empty?
```
               ┌─────────────────────┐
               │      Your Task      │
               └──────────┬──────────┘
                          │
                          ▼
          ┌───────────────────────────────┐
          │ Do you fetch items from       │
          │ DB/queue and process them     │
          │ in batches?                   │
          └───────────────┬───────────────┘
                          │
         ┌────────────────┴────────────────┐
         │                                 │
         ▼                                 ▼
┌─────────────────┐               ┌─────────────────┐
│ YES             │               │ NO              │
│ DataflowJob     │               │ SimpleJob       │
│                 │               │                 │
│ - fetchData()   │               │ - execute()     │
│ - processData() │               │ - One method    │
│ - Streaming     │               │ - One shot      │
└─────────────────┘               └─────────────────┘
```

SimpleJob: The One-Shot Executor
SimpleJob is for tasks that run once per trigger. You put all your logic in a single execute() method.
I started with SimpleJob because it seemed simpler. Here’s what it looks like:
```java
public class ReportGenerationJob implements SimpleJob {

    @Override
    public void execute(ShardingContext context) {
        int shardId = context.getShardingItem();

        // Everything happens here
        List<User> users = userRepository.findByShard(shardId);
        Report report = generateReport(users);
        reportRepository.save(report);

        log.info("Generated report for shard {}", shardId);
    }
}
```

The cron trigger fires, execute() runs once, and you’re done.
When SimpleJob works well:
- Generating daily reports
- Sending scheduled notifications
- Running cleanup tasks
- Calling external APIs
- File operations
The key characteristic: you know exactly what needs to happen when the job fires. There’s no “keep going until empty” logic.
DataflowJob: The Batch Processor
DataflowJob separates data fetching from processing. This separation matters when you have large datasets or queues.
My first attempt at batch processing used SimpleJob:
```java
public class OrderProcessingJob implements SimpleJob {

    @Override
    public void execute(ShardingContext context) {
        // Problem: How many orders to process?
        // Process too few = slow progress
        // Process too many = timeout risk
        List<Order> orders = orderRepository.findUnprocessed(1000);

        for (Order order : orders) {
            processOrder(order);
        }
        // What if there are 5000 more orders waiting?
        // They won't be processed until next cron trigger
    }
}
```

This approach has a flaw: if there are more orders than the batch size, the rest wait until the next trigger. With a 5-minute cron, every leftover batch waits for another trigger, so a large backlog adds 5 minutes of delay per extra batch.
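To put rough numbers on that delay, here is a small back-of-the-envelope sketch. It assumes one fixed-size batch per trigger and uses the figures from the example above (1000 fetched plus 5000 still waiting). The class and method names are made up for illustration and are not part of ElasticJob.

```java
// Back-of-the-envelope drain time when each trigger handles one fixed-size batch.
// All numbers are illustrative, not measurements.
final class BacklogMath {

    /** Number of cron triggers needed to clear the backlog, one batch per trigger. */
    static long triggersToDrain(long backlog, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (backlog + batchSize - 1) / batchSize; // ceiling division
    }

    /** Minutes between the first trigger and the trigger that picks up the last batch. */
    static long minutesUntilLastBatchStarts(long backlog, long batchSize, long cronMinutes) {
        long triggers = triggersToDrain(backlog, batchSize);
        return triggers == 0 ? 0 : (triggers - 1) * cronMinutes;
    }

    public static void main(String[] args) {
        // 6000 pending orders, 1000 per trigger, cron every 5 minutes
        System.out.println(triggersToDrain(6000, 1000));                // 6
        System.out.println(minutesUntilLastBatchStarts(6000, 1000, 5)); // 25
    }
}
```

Under these assumptions the last thousand orders are not even picked up until 25 minutes after the first trigger, regardless of how quickly each batch finishes.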
DataflowJob solves this with streaming mode:
```java
public class OrderDataflowJob implements DataflowJob<Order> {

    @Override
    public List<Order> fetchData(ShardingContext context) {
        int shardId = context.getShardingItem();
        // Fetch a batch of unprocessed orders for this shard
        return orderRepository.findUnprocessed(shardId, 100);
    }

    @Override
    public void processData(ShardingContext context, List<Order> orders) {
        for (Order order : orders) {
            processOrder(order);
            order.markProcessed();
        }
        orderRepository.saveAll(orders);
    }
}
```

Streaming Mode: The Game Changer
Here’s where DataflowJob shines. With streaming enabled, ElasticJob loops:
```
┌─────────────────────────────────────────────────────────────┐
│                   STREAMING MODE ENABLED                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Cron fires ──┐                                            │
│                │                                            │
│                ▼                                            │
│        ┌────────────────┐                                   │
│        │  fetchData()   │                                   │
│        └───────┬────────┘                                   │
│                │                                            │
│                ▼                                            │
│        ┌────────────────┐      ┌──────────────────┐         │
│        │ Returns items? │─YES─►│  processData()   │         │
│        └───────┬────────┘      └────────┬─────────┘         │
│                │                        │                   │
│               NO                        │                   │
│                │                        │                   │
│                ▼                        ▼                   │
│        ┌────────────────┐      ┌──────────────────┐         │
│        │    Job ends    │◄─────│  Loop back to    │         │
│        └────────────────┘      │  fetchData()     │         │
│                                └──────────────────┘         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

Enable streaming mode in your job configuration:
```java
JobConfiguration jobConfig = JobConfiguration.newBuilder("OrderDataflowJob", 3)
    .cron("0/5 * * * * ?")
    .setProperty("streaming.process", "true") // Enable streaming
    .build();
```

When streaming is enabled:
1. fetchData() is called
2. If it returns items, processData() handles them
3. fetchData() is called again
4. Repeat until fetchData() returns null or an empty list
5. Job completes
This means you can process all pending orders in one job execution, not just one batch.
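The loop can be imitated in plain Java to make the control flow concrete. This is a simulation under stated assumptions, not ElasticJob's executor code: `BatchJob` is a hypothetical stand-in for DataflowJob without the ShardingContext parameter, and `queueBackedJob` is an invented in-memory data source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java imitation of the streaming loop; NOT ElasticJob's executor code.
final class StreamingLoopSketch {

    /** Hypothetical stand-in for a DataflowJob: same two steps, no ShardingContext. */
    interface BatchJob<T> {
        List<T> fetchData();
        void processData(List<T> items);
    }

    /** Calls fetch/process until fetch returns null or empty; returns batches processed. */
    static <T> int runStreaming(BatchJob<T> job) {
        int batches = 0;
        List<T> data = job.fetchData();
        while (data != null && !data.isEmpty()) {
            job.processData(data);
            batches++;
            data = job.fetchData();
        }
        return batches;
    }

    /** Drains `pending` in batches of `batchSize`, collecting processed items into `done`. */
    static BatchJob<Integer> queueBackedJob(Deque<Integer> pending, int batchSize, List<Integer> done) {
        return new BatchJob<Integer>() {
            @Override
            public List<Integer> fetchData() {
                List<Integer> batch = new ArrayList<>();
                while (batch.size() < batchSize && !pending.isEmpty()) {
                    batch.add(pending.poll());
                }
                return batch;
            }

            @Override
            public void processData(List<Integer> items) {
                done.addAll(items);
            }
        };
    }

    public static void main(String[] args) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int i = 0; i < 250; i++) {
            pending.add(i);
        }
        List<Integer> done = new ArrayList<>();
        int batches = runStreaming(queueBackedJob(pending, 100, done));
        System.out.println(batches + " batches, " + done.size() + " items"); // 3 batches, 250 items
    }
}
```

With 250 pending items and a batch size of 100, a single run handles batches of 100, 100 and 50, and the fourth fetch comes back empty and ends the loop.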
When I Made the Wrong Choice
I once built a queue consumer using SimpleJob:
```java
public class QueueConsumerJob implements SimpleJob {

    @Override
    public void execute(ShardingContext context) {
        // Process messages from SQS
        while (true) {
            List<Message> messages = sqsClient.receiveMessages(10);
            if (messages.isEmpty()) {
                break; // Queue empty, stop
            }
            for (Message msg : messages) {
                processMessage(msg);
            }
        }
    }
}
```

This worked, but had problems:
- No sharding awareness: The while loop never looked at its assigned shard, so it ran against the whole queue on every instance, causing duplicate processing
- No graceful shutdown: Long-running while loops don’t respond well to job stops
- Mixed concerns: Fetching and processing tangled together
Rewriting with DataflowJob:
```java
public class QueueConsumerJob implements DataflowJob<Message> {

    @Override
    public List<Message> fetchData(ShardingContext context) {
        int shardId = context.getShardingItem();
        // Only fetch messages for this shard
        return sqsClient.receiveMessagesForShard(shardId, 10);
    }

    @Override
    public void processData(ShardingContext context, List<Message> messages) {
        for (Message msg : messages) {
            processMessage(msg);
        }
    }
}
```

With streaming mode, this processes messages until the queue is empty, then stops gracefully.
ShardingContext: Both Types Get It
Both SimpleJob and DataflowJob receive ShardingContext. This gives you:
```java
public void execute(ShardingContext context) {
    // Which shard am I responsible for?
    int shardId = context.getShardingItem();

    // Total shards in the job
    int totalShards = context.getShardingTotalCount();

    // Job name
    String jobName = context.getJobName();

    // Use shardId to partition your data
    List<Order> orders = orderRepository.findByShard(shardId);
}
```

This enables parallel processing across multiple instances. If you configure 3 shards and have 3 instances, each handles one shard.
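How `findByShard` splits the data is up to you. One common approach, sketched here as an assumption rather than as what the repository above actually does, is to assign each record to the shard given by its id modulo the total shard count. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Modulo partitioning sketch; the repository query in the article may work differently.
final class ShardPartitionSketch {

    /** True if the record with this id belongs to the given shard. */
    static boolean belongsToShard(long id, int shardId, int totalShards) {
        return Math.floorMod(id, (long) totalShards) == shardId;
    }

    /** Filters the ids down to the ones owned by one shard. */
    static List<Long> idsForShard(List<Long> allIds, int shardId, int totalShards) {
        List<Long> mine = new ArrayList<>();
        for (long id : allIds) {
            if (belongsToShard(id, shardId, totalShards)) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L);
        System.out.println(idsForShard(ids, 0, 3)); // [3, 6]
        System.out.println(idsForShard(ids, 1, 3)); // [1, 4, 7]
        System.out.println(idsForShard(ids, 2, 3)); // [2, 5]
    }
}
```

Every id lands in exactly one shard, so three instances can work in parallel without touching the same record. In a real repository the same condition would normally be pushed into the query instead of filtering in memory.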
Quick Reference: Which to Choose
| Scenario | Job Type | Why |
|---|---|---|
| Daily report generation | SimpleJob | One-shot task |
| Cleanup expired sessions | SimpleJob | One-shot task |
| Process pending orders | DataflowJob | Batch processing |
| Queue consumer | DataflowJob | Stream until empty |
| ETL pipeline | DataflowJob | Fetch-process pattern |
| Send birthday emails | SimpleJob | Known set of users |
| Sync external data | DataflowJob | Might need multiple batches |
The Mental Model
Think of it this way:
- SimpleJob = “Do this thing now”
- DataflowJob = “Process all items until done”
If your job naturally fits “do X every hour” with a clear endpoint, use SimpleJob.
If your job is “process pending items” and the count varies, use DataflowJob with streaming.
Configuration Differences
Both use similar configuration, but DataflowJob has the streaming option:
```java
// SimpleJob configuration
JobConfiguration simpleConfig = JobConfiguration.newBuilder("MySimpleJob", 3)
    .cron("0 0 2 * * ?") // Daily at 2 AM
    .build();

// DataflowJob configuration
JobConfiguration dataflowConfig = JobConfiguration.newBuilder("MyDataflowJob", 3)
    .cron("0/10 * * * * ?") // Every 10 seconds
    .setProperty("streaming.process", "true")
    .build();
```

The 3 in newBuilder is the shard count: the number of pieces the job is split into, and therefore the most instances that can work on it in parallel.
Common Mistakes to Avoid
Mistake 1: Using SimpleJob with a while loop
```java
// DON'T do this
public void execute(ShardingContext context) {
    while (hasMoreData()) {
        processData();
    }
}
```

This blocks the job thread in a loop the framework cannot see, and nothing in it uses the shard from ShardingContext. Use DataflowJob with streaming instead.
Mistake 2: Not enabling streaming for DataflowJob
Without streaming, DataflowJob only calls fetchData() once per trigger. You lose the “process until empty” behavior.
Mistake 3: fetchData() returning too many items
```java
// DON'T return huge lists
public List<Order> fetchData(ShardingContext context) {
    return orderRepository.findAll(); // Could be millions!
}
```

Return a reasonable batch size (100-1000 typically). Streaming mode will call fetchData() again if needed.
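A bounded fetch might look like the sketch below. It uses an in-memory list in place of a database table, and the `Order` class and `findUnprocessed` helper are hypothetical stand-ins for the repository used in this article. One thing it makes visible: the next fetch only moves forward because the first batch was marked processed. In streaming mode, a fetchData() that keeps returning the same unmarked rows would never come back empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory stand-in for a bounded repository query; names are hypothetical.
final class BoundedFetchSketch {

    /** Minimal order record; `processed` flips once the order has been handled. */
    static final class Order {
        final long id;
        boolean processed;

        Order(long id) {
            this.id = id;
        }
    }

    /** Returns at most `limit` unprocessed orders, lowest id first. */
    static List<Order> findUnprocessed(List<Order> table, int limit) {
        return table.stream()
                .filter(o -> !o.processed)
                .sorted(Comparator.comparingLong((Order o) -> o.id))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> table = new ArrayList<>();
        for (long id = 1; id <= 250; id++) {
            table.add(new Order(id));
        }

        List<Order> first = findUnprocessed(table, 100);
        first.forEach(o -> o.processed = true); // what processData() would do

        List<Order> second = findUnprocessed(table, 100);
        System.out.println(first.size());     // 100
        System.out.println(second.get(0).id); // 101
    }
}
```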
Summary
Choose SimpleJob when your job is a self-contained operation that runs once per trigger. Choose DataflowJob when you need to fetch and process items in batches, especially with streaming mode for continuous processing.
The DataflowJob pattern—separating fetch from process—makes your batch jobs more resilient, shard-aware, and easier to reason about. When in doubt, ask: “Do I need to keep going until there’s nothing left to process?” If yes, DataflowJob is your answer.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.