
Lock Your Cronjobs, Enjoy Your Sleep

Kevin van Zonneveld (@kvz)

If you use EC2 you may have heard of Tim Kay's aws command-line tool. It provides access to most of Amazon's API and is less cumbersome than Amazon's own CLI utilities in day-to-day use.

A lesser known tool by Tim Kay is solo. It's basically one line of Perl, but it's incredibly useful to defeat a common problem with cronjobs: overlap.

The Problem

You've probably dealt with this before: you write a pretty neat yourscript.sh and schedule it to run every minute or so on your production server. One night, your server reaches a load of 90 and you get pagerdutied to fix it.

You log in, which takes about 15 minutes, finally succeed in executing ps auxf, and it appears your server now has 8325 instances of yourscript.sh running. What happened?!

Maybe there was an infinite loop in your script, maybe there were NFS timeouts, maybe you tried to update a database that was write-locked during a backup. Whatever the cause: there was overlap 8324 times, and this should never happen. Not even once.

The Solution

One way to defeat it is to write perfect code with zero external dependencies that could increase your script's execution time.

But since that is never going to happen ;) I recommend taking a look at solo.

Tim Kay realised that an operating system only ever allows one process to listen on a given port, and solo makes clever use of that as a locking mechanism.

The Flow

  • /usr/bin/solo -port=3000 /path/to/yourscript.sh
  • Solo tries to open port 3000
  • Can it open port 3000?
    • Start yourscript.sh
  • Can't open port 3000?
    • Never mind, yourscript.sh is probably still running, will try again next time

Naturally this beats working with lock/PID files: a bound port is tied directly to a running process, so there is no risk of inconsistency and no orphaned PID files to detect and clean up. When the process exits, for whatever reason, the kernel releases the port automatically.
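
To make the flow concrete, here is a minimal Perl sketch of the same port-as-lock idea. This is not solo's actual source, just an illustration; the localhost bind address, usage message, and exit codes are my assumptions:

#!/usr/bin/perl
# portlock.pl -- hypothetical sketch of a port-based lock, not solo itself
use strict;
use warnings;
use Socket;

my ($port, @cmd) = @ARGV;
die "usage: $0 port command [args...]\n" unless $port && @cmd;

# Only one process can bind a given port, so a successful bind() is the lock.
socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
  or die "socket: $!";
bind($sock, sockaddr_in($port, inet_aton('127.0.0.1')))
  or exit 0;    # port taken: the previous run is still going, back off quietly

# Hold the socket while the job runs; the kernel releases it automatically
# when this process exits, even if it crashes.
system @cmd;
exit $? >> 8;

Run perl portlock.pl 3000 /path/to/yourscript.sh from two shells at once and the second invocation exits immediately.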

The Example

Your crontab could, for example, look like this:

$ crontab -e
*    * * * * /usr/bin/solo -port=3001 /path/to/yourscript1.sh
*    * * * * /usr/bin/solo -port=3002 /path/to/yourscript2.sh
*/10 * * * * /usr/bin/solo -port=3003 /path/to/yourscript3.sh

You can now be sure that only one instance of each script will run at any given time. Note that each script needs its own dedicated port.

Clever chaps may realise this can also be used as a keepalive system for daemon-like scripts. However, I suggest looking into monit or upstart for that.

The Installation

This is what makes solo great: it has basically zero dependencies (OK, Perl, but I'll assume you have that) and is a breeze to deploy.

$ sudo curl -sSL https://raw.githubusercontent.com/timkay/solo/master/solo -o /usr/bin/solo \
  && sudo chmod a+x $_
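
A quick way to see the lock in action, as described in the flow above (sleep stands in for your real job):

$ /usr/bin/solo -port=3000 sleep 60 &
$ /usr/bin/solo -port=3000 sleep 60

The second invocation returns immediately, because the first one is still holding port 3000.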

Happy crontab -eing, and happy dreams with few pagerduties. You've earned it :)

The Alternative

As Jason mentioned in the comments, if your jobs don't necessarily need to finish running, but just need to restart without overlap, another option is to use timeout, with a limit (50s here) that is shorter than the cron interval, so the job is killed before the next invocation starts:

$ crontab -e
* * * * * timeout -s9 50s /path/to/yourscript1.sh

The Next Level: Go Distributed With Cronlock

OK, so solo is the bomb in terms of simplicity for running 1 instance of 1 script on 1 server.

But what if you want to make sure only 1 instance of 1 script runs across many servers? Install the cronjobs on just one server? Hm, what if it goes down? Someone will have to intervene, chances are they will forget, and your nodes aren't really expendable anymore.

Especially in volatile environments where nodes come & go as they please, you want cronjobs to be the responsibility of the collective, not just 1 machine.

For this purpose I wrote Cronlock.

The Good

  • You can deploy all nodes equally and install all cronjobs on all servers; if a node goes down, another will make sure your jobs are executed

The Bad

  • It relies on a central Redis server. If your cluster already relies on Redis, you're not adding a new liability or SPOF. If it doesn't, reconsider using Cronlock.

The Ugly

  • I use straight-up Bash for everything. I don't even use redis-cli to communicate with Redis. This is because I want deployment to be as easy as with solo. Just a
$ sudo curl -sSL https://raw.githubusercontent.com/kvz/cronlock/master/cronlock -o /usr/bin/cronlock \
  && sudo chmod a+x $_

and you're set.

You can visit Cronlock on GitHub for docs on how to configure it.
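
For a quick taste, a crontab entry could look like this hypothetical sketch (if I recall the README correctly, the CRONLOCK_HOST environment variable points cronlock at your central Redis server; verify the exact variable names against the docs):

$ crontab -e
* * * * * CRONLOCK_HOST=redis.example.com /usr/bin/cronlock /path/to/yourscript1.sh

Every node can carry the exact same line; whichever node grabs the lock in Redis runs the job, and the others skip that round.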

Legacy Comments (16)

These comments were imported from the previous blog system (Disqus).

Hans

Hello,

I'm having problems with my Synology NAS, it is super slow. Can you help me?

Kind regards,

Hans

Kev van Zonneveld

Best to ask around at the Synology support forum.

Jason

If your jobs don't need to necessarily finish running, but just restart without overlap, another option is "timeout".

E.g. */1 * * * * timeout 59 /usr/bin/php yourCronFile.php

Kev van Zonneveld

Thanks Jason, I've updated the post!

flock

What's wrong with using flock? Just wrap your script.sh in

( flock 9; script.sh ) 9>script.lock

There's no sleep-loop or anything; flock will simply wait until script.lock is not locked by another flock. If you want it to exit if script.sh is already running, simply change that to

( if flock -n 9; then script.sh; fi ) 9>script.lock

Kev van Zonneveld

How would this work in a distributed environment where all nodes are deployed equally, as they should be so that they stay replaceable?

coderofsalvation

This is a very old problem.
Even though cronlock is quite impressive, imho it tries to solve locking & distributed problems from a cron perspective, instead of questioning whether cron should be used for distributed stuff at all.
This adds unnecessary complexity (locking ports etc.).

What you'd want is just a jobqueue without overlapping jobs, not cron.

The locking problem can be solved by giving a job a TTL, and letting a worker reject the job when a similar one is being processed in the queue.

If distributed is not needed, then cron + flock will do fine.

j0anj0an

Have you heard about run-one? http://manpages.ubuntu.com/...

Kev van Zonneveld · 1 like

I did not - looks like a great replacement for `solo`, however in a distributed setup you'd still need a global lock.

j0anj0an

Yes, I tried `solo` on Ubuntu 14.04 but it doesn't work (I don't know why, or how to debug it :( ) and I was looking for options.

Ere-Philip · 1 like

How would somebody go about implementing a global lock? Would it be a centralized server that holds the advisory lock file?

jay

solo command not found

Ere-Philip · 2 likes

Well, I did something similar, but I run the cron job via a browser window, so every user is essentially running a cron job. If a cron job is writing, it locks the file and no other cron jobs operate. Before I did this it was getting corrupted, but when I put this in practice it worked like a charm, plus the added bonus of not having to run cron jobs on a shared host etc. Result!

Evie · 1 like

FYI - the location has changed to https://raw.githubuserconte...

Federico Aguirre · 73 likes

Hi guys... My problem is that some cronlocks insert a Redis key with an incorrect date (NOW+1 day) :|. Has anybody had this issue?