jobber

v0.0.0-...-d24e03e Published: Oct 30, 2022 License: MIT

Repository

github.com/nmjmdr/jobber

README

Jobber

Jobber is a job queue service implemented using Go and Redis. It is scalable and resilient. Each component can be scaled horizontally. It can recover jobs from workers that fail to execute them and assign them to other workers.

Version 2.0 of Walrus

Jobber is version 2 of the Walrus project (https://github.com/nmjmdr/walrus).

Jobber simplifies the design and improves the project structure. Unlike Walrus, it does not have a built-in scheduler (a scheduler is used to schedule jobs for execution at a given point in time). It implements a dispatcher, a worker and a recoverer.

Jobber can be easily extended to perform the functionality of Walrus by adding the scheduler component to it.

Job Type

Jobber has the concept of a job type. Each job type gets its own worker queue. Any number of worker instances can run to execute jobs, depending on the load on the worker queue. A future enhancement would be to auto-scale the number of workers depending on the number of jobs queued in the worker queue.

Recovering a job

Jobber supports recovering a job. If a worker fails during the execution of a job, the job can be recovered and executed by another worker. Recovery is supported using the concept of a visibility timeout.

Visibility timeout

When a worker picks up a job to execute, it has a fixed amount of time in which to complete it. Within this time window the job is not visible to other workers. This time period is called the visibility timeout. If the worker fails to complete the job within it, a recoverer process recovers the job and pushes it back onto the worker queue.

Jobber API

The API supports an operation to queue a job:

Request

POST: https://<host>/jobber/queue
Body:
{
    "type": "job-type",
    "payload": { 
        /* free-form json payload */
    }
}

Response

201 Created
{
    "job-id": "uuid"
}
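As an illustration of the request and response shapes above, here is a minimal client sketch. The helper names (`queueJob`, `newStubServer`), the `send-email` job type and the stub job id are hypothetical; the stub server only stands in for a Jobber host so the example runs offline, and it assumes the API answers with status 201.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// queueRequest mirrors the request body shown above.
type queueRequest struct {
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

// queueResponse mirrors the response body shown above.
type queueResponse struct {
	JobID string `json:"job-id"`
}

// queueJob posts a job to <baseURL>/jobber/queue and returns the job id.
// It is a hypothetical helper, not part of the Jobber code base.
func queueJob(client *http.Client, baseURL, jobType string, payload interface{}) (string, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	body, err := json.Marshal(queueRequest{Type: jobType, Payload: raw})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/jobber/queue", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out queueResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.JobID, nil
}

// newStubServer stands in for a Jobber API host (assumption: it replies 201
// with a "job-id" field, as in the response example above).
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req queueRequest
		if r.Method != http.MethodPost || r.URL.Path != "/jobber/queue" ||
			json.NewDecoder(r.Body).Decode(&req) != nil || req.Type == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusCreated)
		json.NewEncoder(w).Encode(queueResponse{JobID: "stub-job-id"})
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()
	id, err := queueJob(srv.Client(), srv.URL, "send-email", map[string]string{"to": "user@example.com"})
	fmt.Println(id, err)
}
```

Against a real deployment, `baseURL` would be `https://<host>` and the returned id would be the UUID generated by the API.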

Design details

I have discussed the design details in this 4-part video series

Design

Sequence

The API accepts a new job and pushes it onto a queue named job_queue_job-type.
When a worker instance is ready to take a new job, it tries to acquire a lock on the job (job-id). The lock is set to expire automatically after the visibility_timeout period.
If successful, it then does an RPOPLPUSH, popping the job from the worker queue and pushing it onto in_process_queue.
The worker then works to finish the job and deletes it from in_process_queue.
If the worker is unable to finish the job and delete it from in_process_queue, the recoverer process pushes it back onto the worker queue.

How does the Recoverer recover jobs?

Recoverer regularly scans the in_process_queue for jobs that do not have an active lock
If it finds a job present in the in_process_queue but without an active lock, it then pops that job and pushes it back onto the worker queue
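The recovery rule above can be sketched with an in-memory model. This is not the project's code: the `state` type, its fields and `recoverHead` are hypothetical names, plain slices stand in for the Redis lists, and a map stands in for the expiring lock keys. The sketch inspects only the first entry of the in-process list, which is an assumption about what "scanning" means here.

```go
package main

import "fmt"

// state models the pieces the recoverer touches (hypothetical names).
type state struct {
	workerQueue []string        // jobs waiting for a worker
	inProcess   []string        // jobs claimed by a worker
	locked      map[string]bool // job id -> lock still active (not expired)
}

// recoverHead looks at the first job in inProcess. If that job has no active
// lock, it is moved back to workerQueue and its id is returned. Otherwise
// nothing changes and "" is returned.
func (s *state) recoverHead() string {
	if len(s.inProcess) == 0 {
		return ""
	}
	job := s.inProcess[0]
	if s.locked[job] {
		return "" // a worker still owns this job
	}
	s.inProcess = s.inProcess[1:]
	s.workerQueue = append(s.workerQueue, job)
	return job
}

func main() {
	s := &state{
		inProcess: []string{"job-1"},
		locked:    map[string]bool{"job-1": true},
	}
	fmt.Printf("%q\n", s.recoverHead()) // lock is active, nothing is recovered

	delete(s.locked, "job-1") // simulate the lock expiring
	fmt.Printf("%q\n", s.recoverHead())
	fmt.Println(s.workerQueue)
}
```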

Steps followed by the worker

The worker follows these steps:

Read the head of the queue
Try and lock the job
If it cannot, return and go back to waiting for the next job
If the lock is acquired, the worker pops the job from the worker queue and pushes it onto in_process_queue (it does this using RPOPLPUSH so that the pop and push are done in a single step)
Meanwhile, if the recoverer tries to recover the job, it finds an active lock on it and returns
Process the job
Delete from in_process_queue
Delete the lock

If the worker fails while processing the job, the job remains in in_process_queue and the lock expires. The recoverer can then recover the job.

Currently the recoverer attempts to recover only the job at the head of the queue. It does not look further down the queue. This should not be a problem as long as the visibility timeouts are small and it is not critical to recover jobs early.

Implementation of Visibility timeout

Visibility timeout is implemented using SETNX with expiry. SETNX sets a key only if it does not exist. The lock attempts to create a new key for the given job id using SETNX. If it can create the key, a lock has been successfully placed on the job. The key is set to expire after the visibility timeout period.

Note that I have not yet used the Redlock mechanism (https://redis.io/topics/distlock) and have only done a SETNX without a random value. The drawback is that in a master-slave Redis setup, if the master goes down, there is a chance that a valid lock could be removed by another process (in our case the recoverer). This is currently not handled, but the lock can easily be enhanced to handle it.
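For illustration only, the random-value idea can be sketched as follows: each holder stores a random token as the lock value, and a release succeeds only when the caller presents the matching token. This is an in-memory model with hypothetical names (`tokenLocks`, `acquire`, `release`), not the Jobber lock and not a full Redlock implementation. With real Redis the compare-and-delete step would have to run atomically on the server, for example in a script.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// tokenLocks maps a job id to the random token of its current holder.
type tokenLocks map[string]string

// newToken returns a random 128-bit token as a hex string.
func newToken() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// acquire stores a fresh token for the job if it is not locked and returns
// that token. Expiry is left out of this sketch.
func (l tokenLocks) acquire(jobID string) (string, bool) {
	if _, held := l[jobID]; held {
		return "", false
	}
	token := newToken()
	l[jobID] = token
	return token, true
}

// release deletes the lock only if the caller presents the holder's token,
// so one process cannot remove a lock that belongs to another.
func (l tokenLocks) release(jobID, token string) bool {
	current, held := l[jobID]
	if !held || current != token {
		return false
	}
	delete(l, jobID)
	return true
}

func main() {
	locks := tokenLocks{}
	token, ok := locks.acquire("job-1")
	fmt.Println(ok)

	fmt.Println(locks.release("job-1", "some-other-token")) // false: not the holder
	fmt.Println(locks.release("job-1", token))              // true: holder releases
}
```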

Future enhancements

Making the recoverer look through the whole in_process_queue to recover jobs, rather than only the job at its head

Directories

Path	Synopsis
common
constants
models
redisqueue
redisqueue/mock_redisqueue	Package mock_redisqueue is a generated GoMock package.
dispatcher
mock_dispatcher	Package mock_dispatcher is a generated GoMock package.
dlock
mock_dlock	Package mock_dlock is a generated GoMock package.
hosts
api
recoverer
sampleworker
recoverer
worker
