What is ElasticJob? A Guide to Distributed Job Scheduling with Apache ShardingSphere
Purpose
When I first encountered the need to process millions of records across multiple databases on a schedule, I reached for Quartz - the de facto standard for Java job scheduling. But as our workload grew, I hit a wall: a single scheduler became a bottleneck, and worse, a single point of failure.
That’s when I discovered ElasticJob - a distributed job scheduling framework from Apache ShardingSphere that solves exactly these problems. In this post, I’ll explain what ElasticJob is, why you might need it, and how to get started.
The Problem with Single-Node Schedulers
Traditional job schedulers like Quartz work great for single-node applications. But when you need to:
- Process data across multiple databases in parallel
- Handle millions of records that won’t fit in one job execution
- Ensure jobs continue running even if a server fails
- Scale horizontally by adding more processing nodes
…a single-node scheduler becomes the bottleneck. You could deploy multiple instances, but then you’d need to build your own coordination logic to prevent duplicate job execution.
What ElasticJob Provides
ElasticJob is a distributed scheduling solution that handles the hard problems for you:
    +------------------+        +------------------+
    |    Job Node 1    |        |    Job Node 2    |
    |  (Shards 0, 1)   |        |  (Shards 2, 3)   |
    +--------+---------+        +--------+---------+
             |                           |
             +-------------+-------------+
                           |
                           v
                +----------------------+
                |      ZooKeeper       |
                |  (Coordination Hub)  |
                +----------------------+
                           |
             +-------------+-------------+
             |                           |
    +--------v---------+        +--------v---------+
    |    Job Node 3    |        |    Job Node 4    |
    |    (Standby)     |        |  (Shards 4, 5)   |
    +------------------+        +------------------+
Key Features
1. Sharded Execution
Jobs are split into N shards and distributed automatically across cluster nodes. Each node processes its assigned shards independently.
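To make the split concrete, here is a minimal sketch of one even-split scheme in plain Java. It is not ElasticJob's own code: `ShardAllocator` and `assign` are hypothetical names, and the real strategy classes may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits shard indexes across nodes as evenly as possible. */
final class ShardAllocator {

    private ShardAllocator() { }

    /**
     * Returns one list of shard indexes per node, in node order.
     * Each node first gets a contiguous block of (totalShards / nodeCount) shards;
     * leftover shards are then handed out one per node, starting with the first.
     */
    static List<List<Integer>> assign(int totalShards, int nodeCount) {
        if (totalShards < 0 || nodeCount <= 0) {
            throw new IllegalArgumentException("need totalShards >= 0 and nodeCount > 0");
        }
        int perNode = totalShards / nodeCount;
        List<List<Integer>> result = new ArrayList<>();
        for (int node = 0; node < nodeCount; node++) {
            List<Integer> shards = new ArrayList<>();
            for (int i = node * perNode; i < (node + 1) * perNode; i++) {
                shards.add(i);
            }
            result.add(shards);
        }
        // Hand out the remainder (always fewer than nodeCount shards) one per node.
        int next = 0;
        for (int shard = perNode * nodeCount; shard < totalShards; shard++) {
            result.get(next++).add(shard);
        }
        return result;
    }
}
```

With this sketch, `ShardAllocator.assign(8, 3)` yields `[[0, 1, 6], [2, 3, 7], [4, 5]]`: no node carries more than one shard above any other.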
2. Automatic Failover
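If a node fails, its shards are reassigned to surviving nodes without manual intervention. By default the cluster reshards before the next scheduled run; with the job’s failover option enabled, another node can also pick up the shards that were interrupted mid-run.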
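The reassignment idea can be sketched in a few lines of plain Java. This is an illustration under simple assumptions (orphaned shards are spread round-robin over survivors), not ElasticJob's implementation; `FailoverSketch`, `reassign`, and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: moves a failed node's shards onto the surviving nodes. */
final class FailoverSketch {

    private FailoverSketch() { }

    /**
     * Returns a new assignment in which the shards of failedNode are spread
     * round-robin over the remaining nodes; the input map is left untouched.
     */
    static Map<String, List<Integer>> reassign(Map<String, List<Integer>> assignment,
                                               String failedNode) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : assignment.entrySet()) {
            if (!entry.getKey().equals(failedNode)) {
                result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        List<Integer> orphaned = assignment.getOrDefault(failedNode, List.of());
        if (result.isEmpty() && !orphaned.isEmpty()) {
            throw new IllegalStateException("no surviving node to take over shards " + orphaned);
        }
        List<String> survivors = new ArrayList<>(result.keySet());
        for (int i = 0; i < orphaned.size(); i++) {
            result.get(survivors.get(i % survivors.size())).add(orphaned.get(i));
        }
        return result;
    }
}
```

Starting from `{node-1=[0, 1], node-2=[2, 3], node-4=[4, 5]}`, losing `node-2` gives `{node-1=[0, 1, 2], node-4=[4, 5, 3]}`. Shards that were already healthy stay where they were.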
3. Dynamic Scaling
When you add or remove nodes from the cluster, shards redistribute automatically. No restart needed.
4. Multiple Job Types
- Java Jobs: Write your job as a Java class
- Script Jobs: Run shell scripts, Python, or any executable
- HTTP Jobs: Call REST endpoints on a schedule
How It Works
ElasticJob uses ZooKeeper as the coordination layer. Here’s what happens when a job runs:
1. The job node registers with ZooKeeper.
2. Shards are assigned to the node. An elected leader instance computes the assignment and stores it in ZooKeeper.
3. At the scheduled time:
   - The node executes its assigned shard(s).
   - It reports status to ZooKeeper.
4. If a node fails:
   - ZooKeeper detects the failure.
   - The orphaned shards are reassigned to healthy nodes.
This design means you don’t need to build any coordination logic - ElasticJob handles it all through ZooKeeper.
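The lifecycle above can be mimicked with a small in-memory stand-in for the registry. This is a toy model, not ElasticJob: `InMemoryCoordinator`, its methods, and the `reshardNeeded` flag are hypothetical, it uses plain round-robin for brevity, and the real framework relies on ZooKeeper sessions, watches, and leader election instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: an in-memory stand-in for the ZooKeeper-backed registry. */
final class InMemoryCoordinator {

    private final int totalShards;
    private final TreeSet<String> liveNodes = new TreeSet<>();
    private final Map<String, List<Integer>> assignment = new TreeMap<>();
    private boolean reshardNeeded = true;

    InMemoryCoordinator(int totalShards) {
        this.totalShards = totalShards;
    }

    /** Step 1: a node joins, so the current assignment becomes stale. */
    void register(String node) {
        if (liveNodes.add(node)) {
            reshardNeeded = true;
        }
    }

    /** Step 4: a node is detected as gone, so the assignment is stale again. */
    void markFailed(String node) {
        if (liveNodes.remove(node)) {
            reshardNeeded = true;
        }
    }

    /** Steps 2 and 3: reshard if membership changed, then return each node's shards. */
    Map<String, List<Integer>> trigger() {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live nodes to run the job");
        }
        if (reshardNeeded) {
            assignment.clear();
            List<String> nodes = new ArrayList<>(liveNodes);
            for (String node : nodes) {
                assignment.put(node, new ArrayList<>());
            }
            for (int shard = 0; shard < totalShards; shard++) {
                assignment.get(nodes.get(shard % nodes.size())).add(shard);
            }
            reshardNeeded = false;
        }
        return new TreeMap<>(assignment);
    }
}
```

With four shards and nodes `a` and `b` registered, the first `trigger()` returns `{a=[0, 2], b=[1, 3]}`. After `markFailed("b")`, the next `trigger()` returns `{a=[0, 1, 2, 3]}`. Recomputing lazily at trigger time keeps membership churn between runs from causing repeated reshuffles.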
Getting Started
Add the Dependency
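    <dependency>
        <groupId>org.apache.shardingsphere.elasticjob</groupId>
        <artifactId>elasticjob-bootstrap</artifactId>
        <version>3.0.5</version>
    </dependency>
Check Maven Central for the coordinates that match your release: the 3.0.x releases have published the core module as elasticjob-lite-core.
Create Your Job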
Implement the SimpleJob interface:
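    import java.util.List;

    import org.apache.shardingsphere.elasticjob.api.ShardingContext;
    import org.apache.shardingsphere.elasticjob.simple.job.SimpleJob;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Data, repository, and process() are placeholders for your own
    // persistence and business code; they are not part of ElasticJob.
    public class MyDataJob implements SimpleJob {

        private static final Logger log = LoggerFactory.getLogger(MyDataJob.class);

        @Override
        public void execute(ShardingContext context) {
            // Get the shard assigned to this invocation
            int shardId = context.getShardingItem();
            int totalShards = context.getShardingTotalCount();

            // Process data for this shard only
            List<Data> data = fetchDataForShard(shardId, totalShards);
            process(data);

            log.info("Processed {} records for shard {}", data.size(), shardId);
        }

        private List<Data> fetchDataForShard(int shardId, int totalShards) {
            // Example: shard by ID using modulo
            return repository.findByIdModulo(shardId, totalShards);
        }
    }
The key insight: your job logic only needs to handle ONE shard. ElasticJob ensures all shards get processed across the cluster.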
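To see why handling one shard is enough, here is a small sketch of the modulo split that `findByIdModulo` hints at, done in memory for clarity. `ModuloSharding` and `idsForShard` are hypothetical names; a real repository would usually push the same condition into the query rather than filter in Java.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: selects the record IDs that belong to one shard. */
final class ModuloSharding {

    private ModuloSharding() { }

    /** Keeps the IDs whose remainder modulo totalShards equals shardId. */
    static List<Long> idsForShard(List<Long> allIds, int shardId, int totalShards) {
        if (totalShards <= 0 || shardId < 0 || shardId >= totalShards) {
            throw new IllegalArgumentException("require 0 <= shardId < totalShards");
        }
        return allIds.stream()
                // floorMod keeps the remainder in [0, totalShards) even for negative IDs
                .filter(id -> Math.floorMod(id, totalShards) == shardId)
                .collect(Collectors.toList());
    }
}
```

For IDs 1 through 10 and four shards, shard 1 receives `[1, 5, 9]`. Every ID lands in exactly one shard, so running all four shards covers the full data set with no overlap.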
Configure the Job
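    import org.apache.shardingsphere.elasticjob.api.JobConfiguration;
    import org.apache.shardingsphere.elasticjob.lite.api.bootstrap.impl.ScheduleJobBootstrap;
    import org.apache.shardingsphere.elasticjob.reg.base.CoordinatorRegistryCenter;
    import org.apache.shardingsphere.elasticjob.reg.zookeeper.ZookeeperConfiguration;
    import org.apache.shardingsphere.elasticjob.reg.zookeeper.ZookeeperRegistryCenter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // Package names follow the 3.0.x releases; adjust them if your version differs.
    // The class is named ElasticJobConfig so it does not clash with ElasticJob's JobConfiguration.
    @Configuration
    public class ElasticJobConfig {

        // init() opens the ZooKeeper connection before the bean is used.
        @Bean(initMethod = "init")
        public CoordinatorRegistryCenter registryCenter() {
            return new ZookeeperRegistryCenter(
                    new ZookeeperConfiguration("localhost:2181", "elastic-job"));
        }

        // ElasticJob 3.x schedules jobs through ScheduleJobBootstrap
        // (JobScheduler belongs to the older 2.x API); schedule() starts the cron trigger.
        @Bean(initMethod = "schedule")
        public ScheduleJobBootstrap myDataJobBootstrap(CoordinatorRegistryCenter registryCenter) {
            JobConfiguration jobConfig = JobConfiguration.newBuilder(
                            MyDataJob.class.getName(), // job name
                            4)                         // total number of shards
                    .cron("0 0 2 * * ?") // run at 2 AM daily
                    .shardingItemParameters("0=beijing,1=shanghai,2=guangzhou,3=shenzhen")
                    .build();
            return new ScheduleJobBootstrap(registryCenter, new MyDataJob(), jobConfig);
        }
    }
When to Use ElasticJob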
I found ElasticJob particularly useful for:
| Use Case | Why It Works |
|---|---|
| Batch data processing across multiple databases | Shards can target different databases |
| Distributed log processing | Scale horizontally as log volume grows |
| Scheduled report generation across regions | Each shard handles one region |
| Data synchronization jobs | Failover ensures no data is skipped |
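The region use case maps directly onto the `shardingItemParameters` string from the configuration ("0=beijing,1=shanghai,..."). ElasticJob resolves that mapping itself and hands the value for the current shard to the job through `ShardingContext.getShardingParameter()`. The sketch below only illustrates the idea in plain Java; `ShardParameters` and `parse` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: parses a "0=beijing,1=shanghai" style parameter string. */
final class ShardParameters {

    private ShardParameters() { }

    /** Returns shard index -> parameter value, in the order given. */
    static Map<Integer, String> parse(String spec) {
        Map<Integer, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result;
        }
        for (String pair : spec.split(",")) {
            String[] keyValue = pair.trim().split("=", 2);
            if (keyValue.length != 2) {
                throw new IllegalArgumentException("expected item=value but got: " + pair);
            }
            result.put(Integer.parseInt(keyValue[0].trim()), keyValue[1].trim());
        }
        return result;
    }
}
```

Parsing `"0=beijing,1=shanghai,2=guangzhou,3=shenzhen"` gives a map in which shard 2 resolves to `guangzhou`, so the node running shard 2 would build the Guangzhou report and nothing else.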
ElasticJob vs Other Solutions
Here’s how I compare it with alternatives:
| Feature | ElasticJob | Quartz | Spring Batch | XXL-Job |
|---|---|---|---|---|
| Distributed | Yes | No (cluster) | Yes (with partitioning) | Yes |
| Failover | Automatic | Manual | Partial | Automatic |
| Sharding | Built-in | No | Manual config | Built-in |
| Coordination | ZooKeeper | Database | None | Database |
| Learning Curve | Medium | Low | High | Medium |
Common Pitfalls
ZooKeeper dependency: ElasticJob requires ZooKeeper for coordination. Make sure you have a reliable ZooKeeper cluster running.
Shard count: Choose shard count carefully. Too few = underutilized cluster; too many = coordination overhead. I typically start with shard count = number of nodes * 2.
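The arithmetic behind that rule of thumb is easy to check. The helper below is a hypothetical sketch (`ShardCountHeuristic` is not an ElasticJob class) that assumes shards are spread as evenly as possible.

```java
/** Illustrative only: the "nodes * 2" rule of thumb and the load it implies. */
final class ShardCountHeuristic {

    private ShardCountHeuristic() { }

    /** Starting point suggested in the text: two shards per node. */
    static int suggestedShardCount(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return nodeCount * 2;
    }

    /** Largest number of shards any single node must run: ceil(totalShards / nodeCount). */
    static int maxShardsPerNode(int totalShards, int nodeCount) {
        if (totalShards < 0 || nodeCount <= 0) {
            throw new IllegalArgumentException("need totalShards >= 0 and nodeCount > 0");
        }
        return (totalShards + nodeCount - 1) / nodeCount;
    }
}
```

For three nodes the suggestion is six shards, two per node. If one node drops out, the two survivors run three shards each, so the load stays balanced. With only three shards, one survivor would carry twice the load of the other.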
Idempotent jobs: Since jobs can be retried after failures, your job logic should be idempotent - running the same shard twice shouldn’t cause data corruption.
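One common way to get idempotency is a completion marker per run and shard. The sketch below keeps the marker in memory purely for illustration; `IdempotentShardRunner`, `runOnce`, and the `runId` format are hypothetical, and a real job would keep the marker in a durable store such as a database table.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

/** Illustrative only: skips a shard run that has already completed. */
final class IdempotentShardRunner {

    // In production this marker would live in a durable store, not in memory.
    private final Set<String> completed = ConcurrentHashMap.newKeySet();

    /**
     * Runs work for the given shard unless this (runId, shardId) pair already finished.
     * Returns true if the work ran, false if it was skipped.
     */
    boolean runOnce(String runId, int shardId, IntConsumer work) {
        String key = runId + "#" + shardId;
        if (completed.contains(key)) {
            return false;
        }
        work.accept(shardId);
        // Mark as done only after the work succeeded, so a crash mid-run is retried.
        completed.add(key);
        return true;
    }
}
```

Calling `runOnce("2024-01-01", 0, ...)` twice executes the work once and skips the retry. The marker does not stop two nodes from running the same shard at the same moment, so the writes themselves should still be safe to repeat, for example by using upserts.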
Summary
ElasticJob simplifies distributed job scheduling by handling shard distribution, failover, and coordination through ZooKeeper. It’s ideal for Java applications that need scalable, fault-tolerant scheduled jobs without building custom coordination logic.
If you’re struggling with single-node scheduler limitations or building your own distributed job coordination, give ElasticJob a try. The initial setup cost (ZooKeeper) pays off quickly when you need horizontal scaling and automatic failover.