
What is ElasticJob? A Guide to Distributed Job Scheduling with Apache ShardingSphere

Purpose

When I first encountered the need to process millions of records across multiple databases on a schedule, I reached for Quartz - the de facto standard for Java job scheduling. But as our workload grew, I hit a wall: a single scheduler became a bottleneck, and worse, a single point of failure.

That’s when I discovered ElasticJob - a distributed job scheduling framework from Apache ShardingSphere that solves exactly these problems. In this post, I’ll explain what ElasticJob is, why you might need it, and how to get started.

The Problem with Single-Node Schedulers

Traditional job schedulers like Quartz work great for single-node applications. But when you need to:

  • Process data across multiple databases in parallel
  • Handle millions of records that won’t fit in one job execution
  • Ensure jobs continue running even if a server fails
  • Scale horizontally by adding more processing nodes

…a single-node scheduler becomes the bottleneck. You could deploy multiple instances, but then you’d need to build your own coordination logic to prevent duplicate job execution.

What ElasticJob Provides

ElasticJob is a distributed scheduling solution that handles the hard problems for you:

ElasticJob Architecture Overview
+------------------+      +------------------+
|    Job Node 1    |      |    Job Node 2    |
|  (Shards 0, 1)   |      |  (Shards 2, 3)   |
+--------+---------+      +--------+---------+
         |                         |
         +------------+------------+
                      |
                      v
           +----------------------+
           |      ZooKeeper       |
           |  (Coordination Hub)  |
           +----------------------+
                      |
         +------------+------------+
         |                         |
+--------v---------+      +--------v---------+
|    Job Node 3    |      |    Job Node 4    |
|    (Standby)     |      |  (Shards 4, 5)   |
+------------------+      +------------------+

Key Features

1. Sharded Execution

A job is split into N shard items (numbered 0 to N-1), which are distributed automatically across the cluster nodes. Each node processes its assigned items independently. Your job code decides which slice of the data each item covers.

2. Automatic Failover

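If a node fails, its shards are automatically reassigned to surviving nodes, with no manual intervention required. By default this takes effect the next time the job is triggered. If you enable failover in the job configuration (failover(true) on the JobConfiguration builder), a surviving node can take over shards that were interrupted mid-run without waiting for the next trigger.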

3. Dynamic Scaling

When you add or remove nodes, ElasticJob redistributes the shards automatically before the next execution. No restart needed.
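The following simplified Java sketch shows the averaging idea behind ElasticJob's default sharding strategy, which as far as I know is AVG_ALLOCATION. Each node gets totalShards / nodes consecutive items, and any leftover items go one each to the first nodes. The ShardAllocation class and its integer node indexes are my own illustration, not ElasticJob's API. The real strategy works on the job instances registered in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardAllocation {

    // Assign shard items 0..totalShards-1 to nodes 0..nodeCount-1:
    // contiguous blocks first, then leftover items one per node in order.
    public static Map<Integer, List<Integer>> allocate(int nodeCount, int totalShards) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        int perNode = totalShards / nodeCount;
        for (int node = 0; node < nodeCount; node++) {
            List<Integer> items = new ArrayList<>();
            // Each node first receives a contiguous block of perNode items
            for (int i = node * perNode; i < (node + 1) * perNode; i++) {
                items.add(i);
            }
            result.put(node, items);
        }
        // Fewer than nodeCount items remain; hand them out starting at node 0
        int remainderStart = perNode * nodeCount;
        for (int item = remainderStart; item < totalShards; item++) {
            result.get(item - remainderStart).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // The same 6-shard job on three nodes, then after one node is removed
        System.out.println(allocate(3, 6)); // {0=[0, 1], 1=[2, 3], 2=[4, 5]}
        System.out.println(allocate(2, 6)); // {0=[0, 1, 2], 1=[3, 4, 5]}
    }
}
```

When the cluster shrinks from three nodes to two, the shard items move onto the survivors without any change to the job code. That property is what makes scaling transparent to your job logic.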

4. Multiple Job Types

  • Java Jobs: Write your job as a Java class (for example a SimpleJob, or a DataflowJob for fetch-then-process workloads)
  • Script Jobs: Run shell scripts, Python, or any executable
  • HTTP Jobs: Call REST endpoints on a schedule

How It Works

ElasticJob uses ZooKeeper as the coordination layer. Here’s what happens when a job runs:

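Job Execution Flow
1. Each job node registers itself with ZooKeeper
2. The nodes elect a leader through ZooKeeper; the leader computes the
   shard assignment and writes it back to ZooKeeper
3. At the scheduled time:
   - Each node executes its assigned shard(s)
   - Reports execution status to ZooKeeper
4. If a node fails:
   - Its ZooKeeper session expires and its registration disappears
   - The surviving nodes are notified, and the orphaned shards are
     reassigned to healthy nodes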

This design means you don’t need to build any coordination logic - ElasticJob handles it all through ZooKeeper.
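Step 4 of the flow can be modeled with a toy in-memory sketch. A TreeMap stands in for ZooKeeper, and removing a node's entry stands in for its ephemeral registration disappearing when the session expires. Orphaned items go to whichever survivor currently owns the fewest. ToyRegistry and its methods are hypothetical names. ElasticJob's real logic (leader election, resharding, optional failover) runs through ZooKeeper listeners and is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of shard reassignment after a node failure; not ElasticJob code.
public class ToyRegistry {

    // node name -> shard items currently owned by that node
    private final Map<String, List<Integer>> assignments = new TreeMap<>();

    public void register(String node, List<Integer> items) {
        assignments.put(node, new ArrayList<>(items));
    }

    // Simulates a session expiry: the node's items become orphaned and are
    // handed, one at a time, to the surviving node with the fewest items.
    public void sessionExpired(String node) {
        List<Integer> orphaned = assignments.remove(node);
        if (orphaned == null || assignments.isEmpty()) {
            return; // unknown node, or no survivors to take over
        }
        for (int item : orphaned) {
            String target = null;
            for (Map.Entry<String, List<Integer>> e : assignments.entrySet()) {
                if (target == null || e.getValue().size() < assignments.get(target).size()) {
                    target = e.getKey();
                }
            }
            assignments.get(target).add(item);
        }
    }

    public Map<String, List<Integer>> assignments() {
        return assignments;
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        registry.register("node-1", List.of(0, 1));
        registry.register("node-2", List.of(2, 3));
        registry.register("node-4", List.of(4, 5));
        registry.sessionExpired("node-2");
        System.out.println(registry.assignments()); // {node-1=[0, 1, 2], node-4=[4, 5, 3]}
    }
}
```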

Getting Started

Add the Dependency

pom.xml
<dependency>
    <groupId>org.apache.shardingsphere.elasticjob</groupId>
    <artifactId>elasticjob-bootstrap</artifactId>
    <version>3.0.5</version>
</dependency>

Create Your Job

Implement the SimpleJob interface:

MyDataJob.java
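import java.util.List;
import org.apache.shardingsphere.elasticjob.api.ShardingContext;
import org.apache.shardingsphere.elasticjob.simple.job.SimpleJob;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyDataJob implements SimpleJob {

    private static final Logger log = LoggerFactory.getLogger(MyDataJob.class);

    // Data and DataRepository are placeholders for your own types; wire the
    // repository in however your application does data access
    private DataRepository repository;

    @Override
    public void execute(ShardingContext context) {
        // Get the shard item assigned to this call (0 .. totalShards - 1)
        int shardId = context.getShardingItem();
        int totalShards = context.getShardingTotalCount();
        // Process data for this shard only
        List<Data> data = fetchDataForShard(shardId, totalShards);
        process(data);
        log.info("Processed {} records for shard {}", data.size(), shardId);
    }

    private List<Data> fetchDataForShard(int shardId, int totalShards) {
        // Example: shard by ID using modulo (id % totalShards == shardId)
        return repository.findByIdModulo(shardId, totalShards);
    }

    private void process(List<Data> data) {
        // Your business logic for this shard's records goes here
    }
}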

The key insight: each call to execute() handles exactly ONE shard item. If a node owns several items, ElasticJob calls execute() once per item, concurrently on a thread pool. Across the cluster, every item gets processed.
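The modulo approach in fetchDataForShard works because every ID belongs to exactly one shard item. The following stdlib-only check demonstrates that property. An in-memory list of IDs stands in for the hypothetical repository.findByIdModulo query. Math.floorMod is used instead of % so that negative IDs would still map to a valid item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class ModuloSharding {

    // IDs owned by one shard item: floorMod(id, totalShards) == shardId.
    // Stands in for a repository query such as the hypothetical findByIdModulo.
    public static List<Long> idsForShard(List<Long> allIds, int shardId, int totalShards) {
        return allIds.stream()
                .filter(id -> Math.floorMod(id, totalShards) == shardId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ids = LongStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        int totalShards = 4;
        List<Long> seen = new ArrayList<>();
        for (int shard = 0; shard < totalShards; shard++) {
            List<Long> mine = idsForShard(ids, shard, totalShards);
            System.out.println("shard " + shard + " -> " + mine);
            seen.addAll(mine);
        }
        // Every ID is processed exactly once across all shards
        System.out.println(seen.size() == ids.size() && seen.containsAll(ids)); // true
    }
}
```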

Configure the Job

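ElasticJobConfig.java
// Named ElasticJobConfig rather than JobConfiguration so it does not clash with
// ElasticJob's own org.apache.shardingsphere.elasticjob.api.JobConfiguration.
// Imports omitted: the ScheduleJobBootstrap package differs between releases.
@Configuration
public class ElasticJobConfig {

    // init() connects to ZooKeeper; close() releases the connection on shutdown
    @Bean(initMethod = "init", destroyMethod = "close")
    public CoordinatorRegistryCenter registryCenter() {
        return new ZookeeperRegistryCenter(
            new ZookeeperConfiguration("localhost:2181", "elastic-job")); // server list, namespace
    }

    // schedule() registers the job with the registry center and starts its trigger
    @Bean(initMethod = "schedule", destroyMethod = "shutdown")
    public ScheduleJobBootstrap myDataJobScheduler(CoordinatorRegistryCenter registryCenter) {
        JobConfiguration jobConfig = JobConfiguration.newBuilder(
                MyDataJob.class.getName(), // Job name
                4)                         // Total number of shards
            .cron("0 0 2 * * ?") // Run at 2 AM daily
            .shardingItemParameters("0=beijing,1=shanghai,2=guangzhou,3=shenzhen")
            .build();
        return new ScheduleJobBootstrap(registryCenter, new MyDataJob(), jobConfig);
    }
}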
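Two details in this configuration are worth unpacking. First, the cron string uses Quartz syntax with a leading seconds field (second, minute, hour, day-of-month, month, day-of-week), so "0 0 2 * * ?" fires at 02:00:00 every day. Second, shardingItemParameters attaches a label to each shard item, which the job can read at runtime with context.getShardingParameter(). ElasticJob parses that string for you. The hypothetical parser below only illustrates the item=value format and shows how the node that owns item 2 ends up with "guangzhou".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShardingParameters {

    // Parse "0=beijing,1=shanghai" into {0=beijing, 1=shanghai}.
    // Illustration only: ElasticJob does this parsing itself and exposes the
    // value for the current item via ShardingContext.getShardingParameter().
    public static Map<Integer, String> parse(String spec) {
        Map<Integer, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result;
        }
        for (String pair : spec.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("expected item=value, got: " + pair);
            }
            result.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> params = parse("0=beijing,1=shanghai,2=guangzhou,3=shenzhen");
        // The node that owns shard item 2 would process the guangzhou slice
        System.out.println(params.get(2)); // guangzhou
    }
}
```

In MyDataJob, you could read context.getShardingParameter() instead of using the modulo query, so that each shard selects one city's records.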

When to Use ElasticJob

I found ElasticJob particularly useful for:

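+-------------------------------------------------+----------------------------------------+
| Use Case                                        | Why It Works                           |
+-------------------------------------------------+----------------------------------------+
| Batch data processing across multiple databases | Shards can target different databases  |
| Distributed log processing                      | Scale horizontally as log volume grows |
| Scheduled report generation across regions      | Each shard handles one region          |
| Data synchronization jobs                       | Failover ensures no data is skipped    |
+-------------------------------------------------+----------------------------------------+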

ElasticJob vs Other Solutions

Here’s how I compare it with alternatives:

Comparison Matrix
+----------------+------------+---------------------+--------------------+-----------+
| Feature        | ElasticJob | Quartz              | Spring Batch       | XXL-Job   |
+----------------+------------+---------------------+--------------------+-----------+
| Distributed    | Yes        | Cluster only        | Yes (partitioning) | Yes       |
| Failover       | Automatic  | Automatic (cluster) | Partial            | Automatic |
| Sharding       | Built-in   | No                  | Manual config      | Built-in  |
| Coordination   | ZooKeeper  | Database            | JobRepository (DB) | Database  |
| Learning Curve | Medium     | Low                 | High               | Medium    |
+----------------+------------+---------------------+--------------------+-----------+

Common Pitfalls

ZooKeeper dependency: ElasticJob requires ZooKeeper for coordination. Run ZooKeeper as a replicated ensemble (typically 3 or 5 servers) so that losing a single ZooKeeper server doesn't halt scheduling.

Shard count: Choose the shard count carefully. Too few shards leave nodes idle, and too many add coordination overhead. I typically start with a shard count of twice the number of nodes.

Idempotent jobs: Since jobs can be retried after failures, your job logic should be idempotent - running the same shard twice shouldn’t cause data corruption.
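One common way to achieve idempotency is to key every write by the record's ID, so that a re-run overwrites rows instead of adding to them. The minimal sketch below uses an in-memory map to stand in for the target table. In a database, this would be an upsert on the primary key. IdempotentShardWriter and Row are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdempotentShardWriter {

    // Hypothetical input record: a primary key plus the value to store
    public record Row(long id, int amount) {}

    // Stands in for the target table, keyed by primary key
    private final Map<Long, Integer> table = new HashMap<>();

    // Upsert by ID: processing the same rows twice leaves the same final state
    public void processShard(List<Row> rows) {
        for (Row row : rows) {
            table.put(row.id(), row.amount());
        }
    }

    public int total() {
        return table.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        IdempotentShardWriter writer = new IdempotentShardWriter();
        List<Row> shard = List.of(new Row(1, 100), new Row(5, 250));
        writer.processShard(shard);
        writer.processShard(shard); // simulated re-run after a failover
        System.out.println(writer.total()); // 350
    }
}
```

If put were replaced with table.merge(row.id(), row.amount(), Integer::sum), the re-run would double the total to 700. That is exactly the kind of corruption to avoid.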

Summary

ElasticJob simplifies distributed job scheduling by handling shard distribution, failover, and coordination through ZooKeeper. It’s ideal for Java applications that need scalable, fault-tolerant scheduled jobs without building custom coordination logic.

If you’re struggling with single-node scheduler limitations or building your own distributed job coordination, give ElasticJob a try. The initial setup cost (ZooKeeper) pays off quickly when you need horizontal scaling and automatic failover.

Final Words + More Resources

My intention with this article was to share my knowledge and experience with others. If you want to get in touch, you can reach me by email: Email me

Here are the most important links from this article, along with some further resources on this topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
