If you use EC2, you may have heard of Tim Kay's aws command-line tool. It provides access to most of Amazon's API and is less cumbersome than Amazon's own CLI utilities in day-to-day use.
A lesser-known tool by Tim Kay is solo. It's basically one line of Perl, but it's incredibly useful for defeating a common problem with cronjobs: overlap.
You've probably dealt with this before: you write a pretty neat yourscript.sh and schedule it to run every minute or so on your production server. One night, your server reaches a load of 90 and you get pagerdutied to fix it. You log in, which takes about 15 minutes, succeed in executing ps auxf, and it appears your server now has 8325 instances of yourscript.sh running. What happened?!
Maybe there was an infinite loop in your script, maybe there were NFS timeouts, or maybe you tried to update a database that was write-locked during a backup. Whatever the cause, there was overlap 8324 times, and this should never happen. Not even once.
One way to defeat it is to write perfect code and have zero external dependencies that can increase your script's execution time.
But since that is never going to happen ;) I recommend taking a look at solo.
Tim Kay realised that an operating system typically lets only one process bind to a given port at a time, and solo makes clever use of that as a locking mechanism.
/usr/bin/solo -port=3000 /path/to/yourscript.sh
- Solo tries to open port 3000
  - Can it open port 3000?
    - Run yourscript.sh; the port stays open until the script exits
  - Can't open port 3000?
    - Never mind, yourscript.sh is probably still running; it will try again next time
Naturally, this beats working with lock/PID files: an open port is directly tied to a running process, and the OS frees it as soon as that process exits, so there is no risk of inconsistency and no orphaned PID files to detect and clean up.
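For contrast, here is the kind of naive PID-file lock the paragraph above warns about. It is a deliberately flawed, hypothetical helper (naive_pidfile_lock is a made-up name): if a run crashes or is killed before removing its lock file, the stale file blocks every future run until someone cleans it up by hand. A real PID-file setup would also have to check whether the recorded PID is still alive and avoid the race between testing for the file and writing it.

```bash
#!/usr/bin/env bash
# naive_pidfile_lock LOCKFILE
# Hypothetical, intentionally naive lock: succeeds only if LOCKFILE is absent.
naive_pidfile_lock() {
  local lockfile=$1
  if [ -e "$lockfile" ]; then
    return 1              # looks locked, even if the owner died long ago
  fi
  # Race window: another run could create the file between the test and here.
  echo "$$" > "$lockfile"
}

# Simulate a run that "crashes": it takes the lock in a subshell and exits
# without ever deleting the file.
lockfile=$(mktemp -u)
( naive_pidfile_lock "$lockfile" )

# Every later run now believes the job is still running.
if naive_pidfile_lock "$lockfile"; then
  echo "acquired"
else
  echo "blocked by stale lock file $lockfile"
fi
rm -f "$lockfile"
```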
Your crontab could, for example, look like this:
$ crontab -e
* * * * * /usr/bin/solo -port=3001 /path/to/yourscript1.sh
* * * * * /usr/bin/solo -port=3002 /path/to/yourscript2.sh
*/10 * * * * /usr/bin/solo -port=3003 /path/to/yourscript3.sh
Because each script gets its own port, you can now be sure that only one instance of each script will run at any given time.
What makes solo great is that it has basically zero dependencies (OK, Perl, but I'll assume you have that) and is a breeze to deploy:
$ sudo curl -q https://raw.github.com/timkay/solo/master/solo -o /usr/bin/solo \
  && sudo chmod a+x $_
Happy crontab -eing, and sweet dreams with few pagerduties. You've earned it :)
As Jason mentioned, if your jobs don't necessarily need to finish but just need to restart without overlap, another option is to use timeout to kill each run before the next one starts:
$ crontab -e
* * * * * timeout -s9 50s /path/to/yourscript1.sh
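The arithmetic behind that line: cron starts a new run every 60 seconds and timeout kills the old one with SIGKILL after at most 50, so runs should not overlap as long as the deadline stays below the cron interval. Below is a minimal sketch of a wrapper around that idea (run_with_deadline is a hypothetical name) that also reports when a run was cut short. It assumes GNU coreutils timeout, which documents exit status 124 for a timeout and 137 (128+9) when the command is killed with the KILL signal.

```bash
#!/usr/bin/env bash
# run_with_deadline DURATION COMMAND [ARGS...]
# Hypothetical wrapper: SIGKILL the command after DURATION (e.g. 50s) so
# it cannot overlap the next cron run, and log when that happens.
run_with_deadline() {
  local deadline=$1
  shift
  local status=0
  timeout -s9 "$deadline" "$@" || status=$?
  # GNU timeout: 124 = timed out, 137 = command was sent SIGKILL (128+9).
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "killed after $deadline: $*" >&2
  fi
  return "$status"
}
```

Calling run_with_deadline 50s /path/to/yourscript1.sh behaves like the crontab line above, plus a line on stderr whenever a run is cut short. A job that legitimately exits with 124 or 137 on its own would be misreported by this sketch.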
The Next Level: Go Distributed With Cronlock
Ok so solo is the bomb in terms of simplicity for running 1 instance of 1 script on 1 server.
But what if you want to make sure only 1 instance of 1 script can run across many servers? Install the cronjobs on just one server? Hm... what if it goes down? Then someone will have to intervene, chances are they will forget, and your nodes aren't really expendable.
Especially in volatile environments where nodes come & go as they please, you want cronjobs to be the responsibility of the collective, not just 1 machine.
For this purpose I wrote Cronlock.
- You can deploy all nodes equally and install all cronjobs on all servers; if a node goes down, another one will make sure your jobs are executed
- It relies on a central Redis server. If your cluster already relies on Redis, you're not adding a new dependency or SPOF. If it doesn't, reconsider using Cronlock.
- I use straight-up Bash for everything. I don't even use redis-cli to communicate with Redis, because I want deployment to be as easy as with solo. Just a
$ sudo curl -q https://raw.github.com/kvz/cronlock/master/cronlock -o /usr/bin/cronlock \
  && sudo chmod a+x $_
and you're set.
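To give an idea of what talking to Redis from plain Bash can look like, here is a minimal sketch. It is not Cronlock's actual code: the function names are hypothetical, and it assumes bash with /dev/tcp support and a Redis server speaking the standard RESP protocol. The lock is a single SET key value NX EX ttl command, which only succeeds if no other node currently holds the key, while the TTL makes sure a lock held by a crashed node eventually expires.

```bash
#!/usr/bin/env bash
# resp_command ARG...
# Encode a command as a RESP array of bulk strings, the wire format
# redis-cli would otherwise produce for us.
resp_command() {
  local arg len
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    len=$(printf '%s' "$arg" | wc -c)       # length in bytes, not characters
    printf '$%d\r\n%s\r\n' "$((len))" "$arg"
  done
}

# redis_lock HOST PORT KEY TTL_SECONDS
# Hypothetical helper. Returns 0 if this node got the lock, 1 if another
# node holds it, 2 if Redis could not be reached.
redis_lock() {
  local host=$1 port=$2 key=$3 ttl=$4 reply=""
  {
    resp_command SET "$key" "${HOSTNAME:-unknown}" NX EX "$ttl" >&3
    IFS= read -r reply <&3
  } 3<>"/dev/tcp/$host/$port" || return 2
  # Redis replies "+OK" when the key was set, and a null bulk string
  # ("$-1") when NX prevented it because another node holds the lock.
  [ "${reply%$'\r'}" = "+OK" ]
}
```

Every node could then run the same cron entry and only start the job when the lock was acquired, e.g. redis_lock redis.example.internal 6379 lock:yourscript1 55 && /path/to/yourscript1.sh, where the host and key are placeholders. The TTL is a trade-off: if a job runs longer than the TTL, another node may start it while the first is still busy.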
You can visit Cronlock on GitHub for docs on how to configure it.