"Simplicity is prerequisite for reliability."

Edsger W. Dijkstra

As our experience grows, we learn from past mistakes and discover what's truly important in reliable systems. When designing systems, simplicity is an often-heard mantra, but it isn't applied nearly as much as it's spoken of. I'm guilty of this too. I think it's mainly because engineers love to, well, engineer :) and will naturally try to outsmart problems by throwing more tech at them.

Article vs Article

In the light of this, I revisit my 2008 article Enhance PHP session management. The article explains how you can use a central memcache server to store sessions for performance & scalability purposes.

Having something shared when you can avoid it is asking for problems, and I was just throwing unneeded tech at this: network protocols, PECL modules, configuration. All of it is vulnerable to bugs, maintenance, performance penalties and outages.

Using the 2007 article Create turbocharged storage using tmpfs, we can defeat some of this over-engineering and take a simpler approach to speeding up sessions in PHP. We'll store them decentralized in memory by mounting RAM onto the existing /var/lib/php5 session directory on each of your application servers, which I will call nodes from now on.

Make Session Dir Live in RAM

Add this to your /etc/fstab:

# Make PHP Sessions live in RAM
tmpfs /var/lib/php5 tmpfs size=300M,atime 0 0

This will make sure the 300MB RAM device will be available on your next reboot as well.

300MB is a lot.

You can decrease it later on by changing the /etc/fstab entry and executing mount -o remount /var/lib/php5
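To pick a smaller size with some evidence, it can help to look at how much space the current sessions actually take. A minimal sketch, assuming PHP's default file handler naming (sess_<id>); session_stats is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# session_stats DIR
# Prints "<number of session files> <kilobytes used>" for DIR.
# Assumption: sessions are stored by PHP's file handler as sess_<id>.
session_stats() {
    dir=$1
    # Count only regular files that look like PHP session files.
    count=$(find "$dir" -type f -name 'sess_*' | wc -l | tr -d '[:space:]')
    # du -sk reports the total size of DIR in 1024-byte blocks.
    kbytes=$(du -sk "$dir" | awk '{ print $1 }')
    echo "$count $kbytes"
}
```

Running session_stats /var/lib/php5 as root around peak traffic gives a rough lower bound for the size= option; leave some headroom on top of it.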

Activate & Migrate Existing Sessions

Then execute:

$ # Create a temporary place for current sessions
$ mkdir -p /tmp/phpsessions/

$ # Move current sessions to it
$ mv /var/lib/php5/* /tmp/phpsessions/

$ # Activate our ramdisk
$ mount -a

$ # Move the current sessions back
$ mv /tmp/phpsessions/* /var/lib/php5/

$ # Remove the temporary placeholder
$ rmdir /tmp/phpsessions
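After these steps it is worth confirming that the session directory really is backed by RAM. A minimal sketch, assuming a Linux node that exposes /proc/mounts; fs_type_at is a hypothetical helper name:

```shell
#!/bin/sh
# fs_type_at MOUNTPOINT [MOUNTS_FILE]
# Prints the filesystem type mounted at MOUNTPOINT, reading a
# /proc/mounts-style table (fields: device mountpoint fstype options ...).
# Prints nothing when MOUNTPOINT is not a mountpoint.
fs_type_at() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # The last matching line wins, since later mounts shadow earlier ones.
    awk -v mp="$mountpoint" \
        '$2 == mp { type = $3 } END { if (type != "") print type }' \
        "$mounts_file"
}

# Example use:
#   [ "$(fs_type_at /var/lib/php5)" = "tmpfs" ] || echo "sessions are on disk" >&2
```

If this prints nothing for /var/lib/php5, the mount did not happen and sessions are still being written to the underlying disk.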

Advantages

What's nice about saving sessions in a tmpfs device compared with saving in memcache is:

  • you can migrate to this solution without logging people out :)
  • nothing needs to be installed
  • instead of throwing errors, it degrades gracefully to disk storage if the tmpfs mount fails
  • you can restart/flush/upgrade any existing memcache instances without people losing sessions
  • it uses the default /var/lib/php5 directory, so no .ini changes, and PHP's garbage collector will still purge old sessions
  • it takes away a bottleneck & single point of failure in your architecture
  • it's just a mountpoint, so existing monitoring tools will automatically trigger alerts when you need to allocate more space
  • no locking issues with AJAX calls (though I believe this is fixed in memcached-3.0.4beta)
  • no protocol overhead
  • less tech, so less prone to errors & bugs, easier upgrade process
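Since the session store is an ordinary mountpoint, the kind of check a monitoring tool runs against it is simple. A rough sketch of such a check, where usage_pct and check_usage are hypothetical helper names and the threshold is whatever your monitoring would normally use:

```shell
#!/bin/sh
# usage_pct DIR
# Prints how full the filesystem holding DIR is, as an integer percentage.
usage_pct() {
    # df -P prints one POSIX-format line per filesystem;
    # the fifth column is the capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_usage DIR LIMIT
# Returns 0 while usage is below LIMIT percent, non-zero otherwise.
check_usage() {
    [ "$(usage_pct "$1")" -lt "$2" ]
}

# Example use:
#   check_usage /var/lib/php5 80 || echo "session tmpfs is filling up" >&2
```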

Decentralizing

Now this doesn't work in clusters without Sticky Sessions. But you've got to ask yourself: in huge clusters, do you really want Shared Sessions? The bigger the cluster, the more vulnerable you'll become, as a shared session store really only adds a bottleneck & single point of failure to your architecture.

With decent load balancers like EC2's ELB, Pound or HAProxy it becomes child's play to implement Sticky Sessions, so that people keep ending up on the node that has their session.

When you're designing to tolerate failure, this architecture is much more robust than depending on anything shared.

Yes, some people will be logged out when you shut down a node (vs all when your session store goes down).

To counter this you could:

  • drain a node's connections before you take it down for planned maintenance, so that nobody is affected
  • rsync sessions between nodes if it's crucial that all sessions survive an outage

This could even be automated so that nodes can cover for each other. Whether it's worth the investment depends on your application. Are your nodes likely to go down completely? How many customers will get logged out? What kind of data is lost?
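As a rough idea of what the rsync option could look like, here is a minimal sketch. It assumes rsync is installed on the nodes and that they can reach each other over SSH; sync_sessions and the host name node2 are made up for illustration:

```shell
#!/bin/sh
# sync_sessions SRC DEST
# Copies session files from SRC to DEST, where DEST may be a local path
# or a remote rsync target such as "node2:/var/lib/php5" (hypothetical host).
sync_sessions() {
    # -a preserves ownership, permissions and modification times;
    # --update skips files that are newer on the receiving side, so a node
    # never overwrites a fresher copy of a session it is already serving.
    rsync -a --update "$1"/ "$2"/
}

# Example use:
#   sync_sessions /var/lib/php5 node2:/var/lib/php5
```

Nothing is deleted on the receiving side here; expired sessions are left for PHP's own cleanup on each node. How often to run it is a trade-off between network traffic and how many recent logins you can afford to lose.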

Even if your session store is clustered and uses persistent storage like Redis or MySQL (not the right tool for the job, people), network outages, maintenance and misconfiguration can still hurt you badly, logging out all customers or, worse, throwing errors throughout your platform.

Problems will be bigger and harder to solve.

Whereas if the RAM mountpoint fails, /var/lib/php5 just degrades gracefully to normal disk-based storage. That makes sessions slower on that one node, but at least you'll still be serving customers.

I welcome your thoughts on this!