If you use EC2 you may have heard of Tim Kay's aws command-line tool. It provides access to most of Amazon's API and is less cumbersome than Amazon's own CLI utilities in day-to-day use.
A lesser-known tool by Tim Kay is solo. It's basically one line of Perl, but it's incredibly useful for defeating a common problem with cronjobs: overlap.
The Problem
You've probably dealt with this before: you write a pretty neat yourscript.sh and schedule it to run every minute or so on your production server. One night, your server reaches a load of 90 and you get pagerdutied to fix this.
You log in, which takes about 15 minutes, manage to execute ps auxf, and it appears your server now has 8325 instances of yourscript.sh running. What happened?!
Maybe there was an infinite loop in your script, maybe there were NFS timeouts, or maybe you tried to update a database that had write locks during backup. Whatever the cause, there was overlap 8324 times, and this should never happen. Not even once.
The Solution
One way to defeat it is to write perfect code and have 0 external dependencies that can increase your script's execution time.
But since that is never going to happen ;) I recommend taking a look at solo.
Tim Kay realised that operating systems typically allow only 1 process to listen on a given port, and made clever use of that as a locking mechanism.
The Flow
/usr/bin/solo -port=3000 /path/to/yourscript.sh
- Solo tries to open port 3000
- Can it open port 3000?
  - Start yourscript.sh
- Can't open port 3000?
  - Never mind, yourscript.sh is probably still running, will try again next time
Naturally this beats working with lock/PID files: an open port is directly tied to a running process, so there are no orphaned PID files to detect and clean up, and the chance of the lock getting out of sync with the process is practically zero.
The Example
Your crontab could e.g. look like this:
$ crontab -e
* * * * * /usr/bin/solo -port=3001 /path/to/yourscript1.sh
* * * * * /usr/bin/solo -port=3002 /path/to/yourscript2.sh
*/10 * * * * /usr/bin/solo -port=3003 /path/to/yourscript3.sh
You can now be sure that only one instance of each script will run at any given time.
Clever chaps may realise this can be used as a keepalive system for daemon-like scripts. However, I suggest looking into monit or upstart for that.
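Each job in the crontab above needs its own port, and assigning them by hand gets tedious as the list grows. One possible approach, sketched here with hypothetical helpers (port_for and solo_line are not part of solo, and the 20000-29999 range is arbitrary), is to derive the port from the script path. Two paths can hash to the same port, so check the output for duplicates before installing it.

```shell
#!/usr/bin/env bash
# port_for PATH: derive a port between 20000 and 29999 from a script path.
# Hypothetical helper; the range is arbitrary and collisions are possible.
port_for() {
  local crc
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( 20000 + crc % 10000 ))
}

# solo_line SCHEDULE PATH: print a crontab entry that wraps PATH in solo.
solo_line() {
  printf '%s /usr/bin/solo -port=%s %s\n' "$1" "$(port_for "$2")" "$2"
}

solo_line '* * * * *' /path/to/yourscript1.sh
solo_line '*/10 * * * *' /path/to/yourscript3.sh
```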
The Installation
This is what makes solo great: it has basically 0 dependencies (OK, Perl, but I'll assume you have that) and is a breeze to deploy.
$ sudo curl -q -L https://raw.github.com/timkay/solo/master/solo -o /usr/bin/solo \
&& sudo chmod a+x /usr/bin/solo
Happy crontab -e'ing, and happy dreams with few pagerduties. You've earned it :)
The Alternative
As Jason mentioned, if your jobs don't necessarily need to finish running, but just need to restart without overlap, another option is to use timeout:
$ crontab -e
* * * * * timeout -s9 50s /path/to/yourscript1.sh
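The 50s limit keeps each run shorter than the one-minute schedule, so a run is always gone before the next one starts. As a sketch, a wrapper could also record when a job was cut short. The name run_capped is hypothetical, and the 137 check assumes GNU timeout, which reports 128 + 9 when the job is killed with SIGKILL.

```shell
#!/usr/bin/env bash
# run_capped LIMIT CMD...  (hypothetical wrapper around timeout)
# Run CMD, kill it with SIGKILL once LIMIT has passed, and note it on stderr.
run_capped() {
  local limit=$1; shift
  timeout -s9 "$limit" "$@"
  local status=$?
  if [ "$status" -eq 137 ]; then
    echo "killed after $limit: $*" >&2
  fi
  return "$status"
}
```

In a crontab this would be called as run_capped 50s /path/to/yourscript1.sh from a small script that defines the function first.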
The Next Level: Go Distributed With Cronlock
Ok so solo is the bomb in terms of simplicity for running 1 instance of 1 script on 1 server.
But what if you want to make sure only 1 instance of 1 script can run across many servers? Install the cronjobs on just one server? Hm... what if it goes down? That means someone will have to intervene, chances are they will forget, and your nodes aren't really expendable.
Especially in volatile environments where nodes come & go as they please, you want cronjobs to be the responsibility of the collective, not just 1 machine.
For this purpose I wrote Cronlock.
The Good
- You can deploy all nodes equally and install all cronjobs on all servers. If a node goes down, another will make sure your jobs are executed.
The Bad
- It relies on a central Redis server. If your cluster already relies on Redis, you're not adding a new dependency or SPOF (single point of failure). If your cluster doesn't, reconsider using Cronlock.
The Ugly
- I use straight-up Bash for everything. I don't even use redis-cli to communicate with Redis. This is because I want deployment to be as easy as with solo. Just a
$ sudo curl -q -L https://raw.github.com/kvz/cronlock/master/cronlock -o /usr/bin/cronlock \
&& sudo chmod a+x /usr/bin/cronlock
and you're set.
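Talking to Redis without redis-cli is feasible because Bash can open TCP connections through /dev/tcp and the Redis protocol (RESP) is simple text. The sketch below is not Cronlock's code. It only illustrates the general idea of a shared lock using Redis's SET with the NX and EX options, and the names try_redis_lock, REDIS_HOST and REDIS_PORT are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# resp_encode ARG...: encode arguments as a RESP array of bulk strings.
resp_encode() {
  local LC_ALL=C arg   # C locale so ${#arg} counts bytes, not characters
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    printf '$%d\r\n%s\r\n' "${#arg}" "$arg"
  done
}

# try_redis_lock KEY TTL  (hypothetical helper)
# Ask Redis to set KEY only if it does not exist yet, expiring after TTL seconds.
# Succeeds only when Redis answers +OK, meaning no other node holds the key.
try_redis_lock() {
  local key=$1 ttl=$2 reply
  { exec 3<>"/dev/tcp/${REDIS_HOST:-127.0.0.1}/${REDIS_PORT:-6379}"; } 2>/dev/null || return 1
  resp_encode SET "$key" "$HOSTNAME" NX EX "$ttl" >&3
  IFS= read -r -t 2 reply <&3
  exec 3<&-
  [[ $reply == +OK* ]]
}
```

A job would then be guarded with something like try_redis_lock lock:yourscript1 60 && /path/to/yourscript1.sh on every node, and only the node that gets the key runs it.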
You can visit Cronlock on GitHub for docs on how to configure it.