Survive Heavy Traffic With Your Webserver
By Kevin van Zonneveld (@kvz)
Recently, two of my articles reached the Digg frontpage on the same day. My web server isn't state of the art, and it had to handle gigantic amounts of traffic. Still, it served pages to visitors swiftly, thanks to a lot of optimizations. This is how you can prevent heavy traffic from killing your server.
About This Article
There are many things you can do to speed up your website. This article focuses on practical things that I used, without spending any money on additional hardware or commercial software.
In this article I assume that you're already familiar with system administration and hosting / creating websites. In the examples I use Ubuntu; if you use another distro, just make some minor adjustments (like package management) and everything should work as well.
Beware, if you don't know what you're doing you could seriously mess up your system.
Cache PHP Output
Every time a request hits your server, PHP has to do a lot of processing: all of your code has to be compiled & executed for every single visit, even though the outcome of all this processing is identical for visitor 21600 and visitor 21601. So why not save the flat HTML generated for visitor 21600, and serve that to visitor 21601 as well? This relieves your web server and database server, because less PHP often means fewer database queries.
Now, you could write such a system yourself, but there's a neat package in PEAR called Cache_Lite that can do this for us. Its benefits:
- it saves us the time of reinventing the wheel
- it's been thoroughly tested
- it's easy to implement
- it's got some cool features like lifetime, read/write control, etc.
Installing is like taking candy from a baby. On Ubuntu I would:
$ sudo aptitude install php-pear
$ sudo pear install Cache_Lite
And we're ready to use one of our most important assets!
To learn exactly how to implement Cache_Lite into your code I've written another article called: Speedup your website with Cache_Lite.
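But to give you a feel for it right away, here's a minimal sketch of Cache_Lite in action (the cacheDir value and the render_page() function are hypothetical examples):

<?php
require_once 'Cache/Lite.php';

// cache generated pages for one hour
$cache = new Cache_Lite(array(
    'cacheDir' => '/var/www/www.mysite.com/ramdrive/', // trailing slash required
    'lifeTime' => 3600,
));

$id = $_SERVER['REQUEST_URI']; // one cache entry per URL

if ($html = $cache->get($id)) {
    echo $html; // cache hit: no PHP processing, no database queries
} else {
    $html = render_page();     // hypothetical function that builds your page
    $cache->save($html, $id);  // store it for the next visitor
    echo $html;
}
?>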
Create Turbo Charged Storage
With the PHP caching mechanism in place, we take away a lot of stress from your CPU & RAM, but not from your disk. This can be solved by creating a storage device in your system's RAM, like this:
$ mkdir -p /var/www/www.mysite.com/ramdrive
$ mount -t tmpfs -o size=500M,mode=0744 tmpfs /var/www/www.mysite.com/ramdrive
Now the directory /var/www/www.mysite.com/ramdrive is not located on your disk, but in your system's memory. And that's about 30 times faster :) So why not store your PHP cache files in this directory? You could even copy all static files (images, css, js) to this device to minimize disk IO. Two things to remember:
- All files in your ramdrive are lost on reboot, so create a script to restore files from disk to RAM
- The ramdrive itself is lost on reboot, but you can add an entry to /etc/fstab to prevent that (see the sketch below)
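As a sketch, the /etc/fstab entry could look like this (matching the mount command above):

tmpfs /var/www/www.mysite.com/ramdrive tmpfs size=500M,mode=0744 0 0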
To learn exactly how to tackle the above, I've written another article called: Create turbocharged storage using tmpfs.
Leave Heavy Processing to Cronjobs
For example: I count the number of visits to every single article. But instead of updating a counter for an article on every visit (which involves row locking and a WHERE clause), I use simple and relatively cheap SQL INSERTs into a separate table.
The gathered data is processed every 5 minutes by a separate PHP script that's automatically run by my server. It counts the hits per article, deletes the gathered data, and updates the grand total in a separate field in my article table. So in the end, accessing the hit count of an article takes no extra processing time or heavy queries.
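As a minimal sketch of the idea (table and column names are hypothetical):

-- every visit does one cheap INSERT into a log table:
INSERT INTO article_hits_log (article_id) VALUES (42);

-- every 5 minutes, the cronjob rolls the log up into the grand totals
-- (in production you'd wrap these two statements in a transaction):
UPDATE articles a
JOIN (
  SELECT article_id, COUNT(*) AS hits
  FROM article_hits_log
  GROUP BY article_id
) h ON h.article_id = a.article_id
SET a.total_hits = a.total_hits + h.hits;
DELETE FROM article_hits_log;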
If you want more in-depth information on writing cronjobs, I've written another article called: Schedule tasks on Linux using crontab.
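The crontab entry itself could be as simple as this (the script path is a made-up example):

*/5 * * * * php /var/www/www.mysite.com/cron/process_hits.php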
Optimize Your Database
Use the InnoDB Storage Engine
If you use MySQL, the default storage engine for tables is MyISAM. That's not ideal for a high-traffic website, because MyISAM uses table-level locking: during an UPDATE, nobody can access any other record in the same table. It puts everyone on hold!
InnoDB, however, uses row-level locking, which ensures that during an UPDATE nobody can access that particular row until the locking transaction issues a COMMIT.
phpMyAdmin allows you to easily change a table's storage engine in the Operations tab. Though it has never caused me any problems, it's wise to first create a backup of the table you're going to ALTER.
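If you prefer the command line over phpMyAdmin, converting a table is a one-liner (using the blog_posts table from the query examples below):

ALTER TABLE `blog_posts` ENGINE = InnoDB;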
Use Optimal Field Types
Wherever you can, make integer fields as small as possible: not by changing the length, but by changing the actual integer type. The length is only used for padding.
So if you don't need negative numbers in a column, always make the field unsigned. That way you can store maximum values in minimum space (bytes). Also make sure foreign keys have matching field types, and place indexes on them. This will greatly speed up queries.
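As a sketch, shrinking a hypothetical status column and indexing a foreign key could look like this:

-- TINYINT UNSIGNED stores 0-255 in a single byte; a plain INT takes four
ALTER TABLE `blog_posts`
  MODIFY `status` TINYINT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY `author_id` INT UNSIGNED NOT NULL,
  ADD INDEX (`author_id`);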
In phpMyAdmin there's a link called Propose Table Structure. Take a look sometime; it will try to tell you which fields can be optimized for your specific database layout.
Queries
Never select more fields than strictly necessary. Sometimes when you're lazy you might do a:
SELECT * FROM `blog_posts`
even though a
SELECT `blog_post_id`,`title` FROM `blog_posts`
would suffice. Normally that's OK, but not when performance is your no.1 priority.
Tweak the MySQL Config
Furthermore, there are quite a few things you can tweak in the my.cnf file, but I'll save those for another article, as they're a bit outside this article's scope.
Save Some Bandwidth
Save Some Sockets First
Small optimizations make for big bandwidth savings when volumes are high. If traffic is a big issue, or you really need that extra server capacity, you could throw all your CSS code into one big .css file, and do the same with your JS code. This will save you some Apache sockets that other visitors can use for their requests. It will also give you better compression ratios, should you choose to use mod_deflate or to compress your JavaScript with Dean Edwards' Packer.
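Combining the files can be as crude as concatenating them (the filenames are just examples):

$ cat reset.css layout.css theme.css > all.css
$ cat lib.js plugins.js site.js > all.js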
I know what you're thinking. No, don't inline all the CSS and JS into the main page. You still really want this separation to:
- make use of the visitor's browser cache. Once they've got your CSS, it won't be downloaded again
- not pollute your HTML with that stuff
And Now Some Bandwidth ; )
- Limit the number of images on your site
- Compress your images
- Eliminate unnecessary whitespace or even compress JS with tools available everywhere.
- Apache can compress the output before it's sent back to the client through mod_deflate. This results in a smaller page being sent over the Internet, at the expense of CPU cycles on the web server. For servers that can afford the CPU overhead, this is an excellent way of saving bandwidth. But when the CPU itself is the bottleneck, I would turn all compression off to save those extra cycles.
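Should you decide to enable mod_deflate anyway, a minimal sketch for your Apache config (assuming the module is loaded) would be:

AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript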
Store PHP Sessions in Your Database
If you use PHP sessions to keep track of your logged-in users, then you may want to have a look at PHP's session_set_save_handler function. With this function you can overrule PHP's session handling system with your own class, and store sessions in a database table or in Memcached.
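A minimal sketch of how the registration works; the sess_* functions are hypothetical stubs that you'd flesh out with queries against your sessions table:

<?php
function sess_open($save_path, $name) { return true; }
function sess_close()                 { return true; }
function sess_read($id) {
    // SELECT session_data FROM sessions WHERE session_id = '$id'
    return '';
}
function sess_write($id, $data) {
    // REPLACE INTO sessions (session_id, session_data, expires_at) VALUES (...)
    return true;
}
function sess_destroy($id) {
    // DELETE FROM sessions WHERE session_id = '$id'
    return true;
}
function sess_gc($max_lifetime) {
    // DELETE FROM sessions WHERE expires_at < UNIX_TIMESTAMP()
    return true;
}

// register the handlers before starting the session
session_set_save_handler('sess_open', 'sess_close', 'sess_read',
                         'sess_write', 'sess_destroy', 'sess_gc');
session_start();
?>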
Now, a key attribute to success is to make this table's storage engine MEMORY (also known as HEAP). This stores all session information (which should be tiny variables) in the database server's RAM, taking disk IO stress off your web server. It also allows you to share the sessions with multiple web servers in the future, so that if you're logged in on server A, you're also logged in on server B, making load balancing possible.
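A sketch of what such a table could look like (note that the MEMORY engine doesn't support TEXT or BLOB columns, so session_data has to be a VARCHAR; the sizes are examples):

CREATE TABLE sessions (
  session_id   VARCHAR(32)   NOT NULL,
  session_data VARCHAR(3000) NOT NULL,
  expires_at   INT UNSIGNED  NOT NULL,
  PRIMARY KEY (session_id)
) ENGINE = MEMORY;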
Sessions on tmpfs
If it's too much of a hassle to store sessions in a MEMORY database, storing the session files on a ramdisk is also a good option to gain some performance. Just make /var/lib/php5 live in RAM. To learn exactly how to do this, I've written another article called: Create turbocharged storage using tmpfs.
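In short, something like this (the size is an example; check your distro's default permissions on /var/lib/php5 first):

$ sudo mount -t tmpfs -o size=64M,mode=1733 tmpfs /var/lib/php5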
Sessions in Memcached
I recently (22nd of June, '08) found another (better) way to store sessions in a cluster-proof, resource-cheap way, and dedicated a separate article to it called: Enhance PHP session management.
More Tips
Some other things to google for if you want even more:
- eAccelerator
- memcached
- tweak the apache config
- squid
- turn off apache logging
- Add 'noatime' in /etc/fstab on your web and data drives to prevent disk writes on every read
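For that last tip, the fstab entry could look like this (device, mount point and filesystem are examples; adjust them to your setup):

/dev/sda3 /var/www ext3 defaults,noatime 0 2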
Legacy Comments (49)
These comments were imported from the previous blog system (Disqus).
That php session class is gold :) I had something like that myself but it's not half as good as that one!
Storing session files on a ramdisk also gains some performance.
Hi Nima, excellent idea, I've updated the article. Thanks!
You mentioned that compressing JS and CSS will save bandwidth but use CPU.
Surely if you cache the compressed result then you only have to do it once. Everyone's a winner then.
@ Simon: There I'm not talking about JS obfuscation or compression, but about compression on the Apache level, which cannot be cached.
Good Article
Adding 'noatime' in /etc/fstab on your web and data drives... prevents the file system updating 'access time' each and every time a file is accessed.
@ Ray: I don't know if you need that with a RAM device, but it's a good tip anyway so thanks!
Not a tip for after you are on Digg but rather one to help you know whether you are ready for Digg.
If you receive steady, regular traffic, make sure your server's CPU usage rarely goes above 30%. This might seem low but remember that Digg can drive a lot of traffic to your site in a very short amount of time. Whenever the load on our servers reaches 40% at peak time we buy another one and put it into the load balancer. This is a rule of thumb and works fairly well for us. We run 70-odd websites this way, some of which receive over 300,000 unique visitors per day and have survived day-long front page Diggings without degrading performance. We also go over 130MBits/sec while being Dugg although, to be fair, some of the pages could be a little lighter... 4MB is normal for a home page, isn't it? :-P
eAccelerator is the BIZ NIZ!!!! awesome article!
Regards,
http://www.olemera.com/loan...
A few tips for MySQL checks: http://hackmysql.com/mysqlr... has an excellent reporting tool for looking at your logs.
Also: on the note of using deflate in apache: most webservers have CPU to spare but no memory to spare and thus mod_deflate might be handy (connections are handled faster etc and thus apache can handle requests more quickly, thus reducing the concurrent load)
Great share, I will definitely put these ideas into practice.
Thanks.
Excellent..........:) it makes more people happy.
Indeed a nice article, and congratulations on your Digg positions.
Just to mention it: tmpfs is allowed to use virtual memory to swap pages back to disk. Maybe you don't want that? ramfs won't behave in that manner.
If you've got enough RAM, your files are cached by the OS (here Ubuntu) anyway.
Check the free command. So there is no need to put them in a ramdisk. iostat will show no disk activity then.
@ brainextender: Didn't know that, thanks for the heads-up!
I must say that this will be a useful article for large projects.
This could be a good part of server load balancing.
Tonu
Software Engr.
ha ha i wish i had the traffic on my blog to worry about things like this! :)
@ ephman: write about things like this and you will : ) chicken-egg situation ; )
I rarely make comments on blogs, but this article was so good I had to. It touches on a lot of different areas that can take you months to fully optimize. I know, because it took me quite a few months until I was satisfied.
Not only are these good tips for heavy-traffic sites, but good tips in general for a speedy and responsive website.
@ brant: that's nice of you, thanks! :) Though some things are already outdated again. I think I'm going to have to do another version some day.
Nice article, some of the mentioned techniques might be too much for a normal admin, but there are some nice ideas in there.
You did not mention PHP opcode cachers like APC and XCache - they can reduce the load by simply installing them and letting them cache your PHP scripts.
Also, your MyISAM and InnoDB tip is no general fact; it has to be decided wisely based on your setup and website.
You also forgot to mention moving static files to the Amazon S3 cloud, or simply using Lighttpd or Nginx for static files.
@ Julius Beckmann: Yeah like I said, I should probably redo this again some time cause the article is almost 2 years old now.
I don't know if anyone has mentioned it yet, but nginx is an excellent webserver for those who need to save memory/CPU while serving the maximum amount of users. I use nginx in conjunction with php-fpm and xcache, and things fly.
This is a great article!
@ DV: Yeah like I said, I should probably redo this again some time cause the article is more than 2 years old now : )
excellent article, nicely done
awesome tips. your website has been a pot of gold for me, keep up the good work!
thank you
very good
thank you
very good
To optimize your MySQL queries further, use the LIMIT clause.
e.g. SELECT username FROM table WHERE id='1' LIMIT 1
That way MySQL will end the query as soon as the WHERE clause is satisfied and it has 1 record (or however many records you need).
Also, always use persistent MySQL connections like pconnect()...
Memcached does have some interesting advantages/disadvantages.
Memcached is best used for high read/low write situations. However, sessions are re-written on every script execution, which means it's faster to store your sessions in a DB. But if you have data sets that update infrequently, then it's better to use Memcached.
A problem I've also discovered with Memcached is when using multiple Memcached servers (using the php binary, not the pecl module) and one of those servers loses connectivity, Apache starts throwing segfaults. This includes cases where you flush 1 Memcached server, but not all of them.
@ Matt Kukowski & Tom: Thanks for chiming in!
I am a newbie in blogging and I am really impressed by the above article. I'll try to implement most of the things from this article on my website to save money.
Thanks for your article.
As far as performance goes, what would you suggest is the best way to add another server into the apache mix?
Would it be installing a private cloud with Eucalyptus? Perhaps an Ubuntu cluster - more for reliability really? What about that old SETI concept, the cluster of workstations (COW)? Does that exist in any form today?
@ Andrew: Not sure if I really understand your question. But if you mean: what's the best way to scale webservers, the answer is there is none. It really depends on your specific situation. But if money is an issue: there's a lot you can do with LVS. So that means a Linux-based load balancer dividing traffic between as many webservers as you like. There are many other ways, but LVS is very powerful considering it costs nothing and is kernel-based.
Hi Kevin.. It looks as if that is exactly what I am after. Thank you. I assume this is what you mean.. http://www.linuxvirtualserv...
@ Andrew: Exactly. It can be pretty rough, but there are nice wrappers out there that can make it a breeze: ldirectord is one. Just a perl script that reads a simplified config file and feeds lvs (using the ipvsadm command) the rules that are needed to balance the traffic.
If you want something slower but easier, have a look at Pound. That doesn't rewrite IP packets at the kernel level, but just forwards layer 7 traffic. So yeah: slower, but easier, and in some cases (different networks/whatever) the only option.
Thank you! This script worked great for me. I'd been searching for about four hours and finally found it.
Very good article
I want to use the Cache PHP Output technique. Thanks for the guide.
try caching mysql slow queries in xcache - helps a lot ;) -> http://lifeofguenter.de/kil...
nice article. I'm going to have to try a few of these this week.
I was under the impression that MyISAM is faster for most use cases (which is why it's the default), and that InnoDB is most useful when you've got a large number of parallel UPDATE/INSERTs going on in the same table. That is, even if you've got 10,000 people reading your website all at the same time, row-level locking isn't going to make a bit of difference as far as read speeds -- it's only if you've got a large number of people editing content (e.g. a busy web forum, or image posting site, perhaps) that you're going to see a significant performance boost from switching to MyISAM. And since most sites are doing much more SELECTing than UPDATE/INSERTing, MyISAM is generally a better choice. There are also other considerations, such as fulltext searching, which MyISAM can do, but not InnoDB; and MyISAM's smaller resource usage. (Google for 'myisam vs. innodb' for more detailed comparisons)
Anyhow, I think in many cases, you might actually be doing more harm than good by switching to InnoDB.
Oops -- meant to say "...see a significant performance boost from switching to InnoDB" there.
And I also forgot to say thanks for the useful post.
Another good way to increase the number of pages the server can render is to move all the resources to S3, so the server only has to take care of rendering pages and not serving static content.
Thank you for that advice - I think using a RAM disk would be a lot quicker and easier to set-up than trying to install and learn APC or any of the other caching libraries.
Also, I think it's worth mentioning that Apache can be optimised to serve different kinds of requests from the same server; using lighttpd (instead of or alongside Apache) is also an option.
I have no idea how to do these things and it sounds great. Willing to hire you to help me if interested. :)
Thanks for the info :)
I really liked the advice of storing php sessions in a mysql table using memory engine and using it for load balancing.
Great blog you have here.. It's hard to find high quality writing like yours these days. I seriously appreciate individuals like you! Take care!!
Thanks for the easy-to-understand and informative article.