Lock Your Cronjobs, Enjoy Your Sleep

By Kevin van Zonneveld (@kvz)
If you use EC2 you may have heard of Tim Kay's aws command-line tool. It provides access to most of Amazon's API and is less cumbersome in day-to-day use than Amazon's own CLI utilities.
A lesser-known tool by Tim Kay is solo. It's basically one line of Perl, but it's incredibly useful for defeating a common problem with cronjobs: overlap.
The Problem
You've probably dealt with this before: you write a pretty neat yourscript.sh
and schedule it to run every minute or so on your production server.
One night, your server reaches a load of 90 and you get pagerdutied to fix it.
You log in, which takes about 15 minutes, manage to execute ps auxf,
and it appears your server now has 8325 instances of yourscript.sh running.
What happened?!
Maybe there was an infinite loop in your script, maybe there were NFS timeouts,
maybe you tried to update a database that was write-locked during a backup;
whatever the cause, there was overlap 8324 times, and this should never happen.
Not even once.
The Solution
One way to defeat it is to write perfect code with zero external dependencies that could increase your script's execution time.
But since that is never going to happen ; ) I recommend taking a look at solo.
Tim Kay realised that an operating system typically allows only one process to listen on a given port, and made clever use of that as a locking mechanism.
The Flow
/usr/bin/solo -port=3000 /path/to/yourscript.sh

- Solo tries to open port 3000
- Can it open port 3000? Then it starts yourscript.sh
- Can't open port 3000? Never mind, yourscript.sh is probably still running; solo will try again next time
Naturally this beats working with lock/PID files: an open port is directly tied to a running process, so there is no chance of inconsistency and no orphaned PID files to detect and clean up.
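To make the trick concrete, here is a minimal sketch of the idea in Perl (the language solo itself is written in). This is not Tim Kay's actual code, just an illustration of using a bound port as a lock:

#!/usr/bin/perl
# Minimal sketch of the port-as-lock idea; not solo's real source.
use strict;
use warnings;
use IO::Socket::INET;

my ($port, @cmd) = @ARGV;
die "usage: $0 PORT COMMAND [ARGS...]\n" unless $port and @cmd;

# Only one process can listen on a given port. If a previous invocation
# is still running, the bind fails and we exit quietly, which is exactly
# the behaviour you want from a cronjob.
my $lock = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => $port,
    Proto     => 'tcp',
    Listen    => 1,
) or exit 0;

# Run the command as a child while this process keeps holding the port;
# the kernel releases the lock the moment we exit, crash or no crash.
my $status = system @cmd;
exit($status >> 8);

Holding the socket in the parent and running the script with system() is what keeps the port taken for the script's entire lifetime.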
The Example
Your crontab could, for example, look like this:
$ crontab -e
* * * * * /usr/bin/solo -port=3001 /path/to/yourscript1.sh
* * * * * /usr/bin/solo -port=3002 /path/to/yourscript2.sh
*/10 * * * * /usr/bin/solo -port=3003 /path/to/yourscript3.sh
You can now be sure that only one instance of each script will run at any given time.
Clever chaps may realise this can also be used as a keepalive system for daemon-like scripts. However, I suggest looking into monit or upstart for that.
The Installation
This is what makes solo great: it has basically zero dependencies (ok, Perl, but I'll assume you have that) and is a breeze to deploy.
$ sudo curl -q https://raw.github.com/timkay/solo/master/solo -o /usr/bin/solo \
&& sudo chmod a+x $_
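To check that the install worked and see the locking in action, hold the lock with a long sleep in one terminal and try again in a second one (the port number here is arbitrary; any free port will do):

$ /usr/bin/solo -port=3009 sleep 60   # terminal 1: binds the port and runs sleep
$ /usr/bin/solo -port=3009 sleep 60   # terminal 2: port is taken, returns immediately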
Happy crontab -eing, and happy dreams with few pagerduties, you've earned it :)
The Alternative
As Jason mentioned in the comments, if your jobs don't necessarily need to finish running, but just need to restart without overlap, another option is to use timeout. Here -s9 sends SIGKILL, and 50s keeps the job safely inside its one-minute schedule:
$ crontab -e
* * * * * timeout -s9 50s /path/to/yourscript1.sh
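The two should also compose, assuming solo simply runs whatever command follows its -port argument, as the examples above suggest: timeout caps the runtime, while solo guards against overlap in case the timeout itself isn't enough:

* * * * * /usr/bin/solo -port=3001 timeout -s9 50s /path/to/yourscript1.sh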
The Next Level: Go Distributed With Cronlock
Ok, so solo is the bomb in terms of simplicity for running 1 instance of 1 script on 1 server.
But what if you want to make sure only 1 instance of 1 script can run across many servers? Install the cronjobs on just one server? Hm.. what if it goes down? That means someone will have to intervene, chances are they will forget, and your nodes aren't really expendable anymore.
Especially in volatile environments where nodes come & go as they please, you want cronjobs to be the responsibility of the collective, not of just 1 machine.
For this purpose I wrote Cronlock.
The Good
- You can deploy all nodes equally and install all cronjobs on all servers; if a node goes down, another will make sure your jobs are executed
The Bad
- It relies on a central Redis server. If your cluster already relies on Redis, you're not adding a new dependency or SPOF. If your cluster doesn't, reconsider using Cronlock.
The Ugly
- I use straight up Bash for everything. I don't even use redis-cli to communicate with Redis. This is because I want deployment to be as easy as with solo. Just a

$ sudo curl -q https://raw.github.com/kvz/cronlock/master/cronlock -o /usr/bin/cronlock \
&& sudo chmod a+x $_

and you're set.
You can visit Cronlock on GitHub for docs on how to configure it.
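To give you a flavour, a crontab using Cronlock can look much like the solo one. I'm assuming CRONLOCK_HOST here as the way to point it at your Redis server; check the README for the exact configuration variables:

$ crontab -e
* * * * * CRONLOCK_HOST=redis.example.com /usr/bin/cronlock /path/to/yourscript1.sh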
Legacy Comments (16)
These comments were imported from the previous blog system (Disqus).
Hello,
I'm having problems with my Synology NAS, it is super slow. Can you help me?
Kind regards,
Hans
Best to ask around at Synology support (the forum).
If your jobs don't need to necessarily finish running, but just restart without overlap, another option is "timeout".
Eg. */1 * * * * timeout 59 /usr/bin/php yourCronFile.php
Thanks Jason, I've updated the post!
What's wrong with using flock? Just wrap your script.sh in
( flock 9; script.sh ) 9>script.lock
There's no sleep-loop or anything; flock will simply wait until script.lock is no longer locked by another flock. If you want it to exit when script.sh is already running, simply change that to
( if flock -n 9; then script.sh; fi ) 9>script.lock
How would this work in a distributed environment where all nodes are deployed equally, as they should be replaceable?
This is a very old problem.
Even though cronlock is quite impressive, imho it tries to solve locking & distributed problems from a cron perspective, instead of questioning why cron is not used for distributed stuff.
This adds unnecessary complexity (locking ports etc).
What you'd want is just a jobqueue without overlapping jobs, not cron.
The locking problem can be solved by giving a job a TTL, and letting a worker reject the job when a similar one is being processed in the queue.
If distributed is not needed, then cron + flock will do fine.
Have you heard about run-one? http://manpages.ubuntu.com/...
I did not - looks like a great replacement for `solo`, however in a distributed setup you'd still need a global lock.
Yes, I tried `solo` on Ubuntu 14.04 but it doesn't work (I don't know why, or how to debug :( ) and I was looking for options
How would somebody go about implementing a global lock? Would it be a centralized server that holds the advisory lock file?
solo command not found
Well, I did something similar, but I run the cron job via a browser window, so every user is essentially running a cron job. If a cron job is writing, it locks the file and no other cron jobs operate. Before I did this it was being corrupted, but when I put this into practice it worked like a charm, plus the added bonus of not having to run cron jobs on a shared host etc. Result!
FYI - the location has changed to https://raw.githubuserconte...
Hi guys... My problem is that some cronlocks insert a Redis key with an incorrect date (NOW+1Day) :|. Has anybody had this issue?