Survive Heavy Traffic With Your Webserver

By Kevin van Zonneveld (@kvz)

Recently, two of my articles reached the Digg frontpage on the same day. My web server isn't state of the art, and it had to handle a gigantic amount of traffic. But it still served pages to visitors swiftly, thanks to a lot of optimizations. This is how you can prevent heavy traffic from killing your server.

About This Article

There are many things you can do to speed up your website. This article focuses on practical things that I used myself, without spending any money on additional hardware or commercial software.

In this article I assume that you're already familiar with system administration and hosting / creating websites. In the examples I use Ubuntu, but if you use another distro, just make some minor adjustments (like package management) and everything should work as well.

Beware, if you don't know what you're doing you could seriously mess up your system.

Cache PHP Output

Every time a request hits your server, PHP has to do a lot of processing: all of your code has to be compiled and executed for every single visit, even though the outcome of all this processing is identical for visitor 21600 and visitor 21601. So why not save the flat HTML generated for visitor 21600, and serve that to visitor 21601 as well? This will relieve your web server's resources, and your database server's too, because less PHP often means fewer database queries.

Now you could write such a system yourself, but there's a neat package in PEAR called Cache_Lite that can do this for us. Benefits:

  • it saves us the time of reinventing the wheel
  • it's been thoroughly tested
  • it's easy to implement
  • it's got some cool features like lifetime, read/write control, etc.

Installing is like taking candy from a baby. On Ubuntu I would:

$ sudo aptitude install php-pear
$ sudo pear install Cache_Lite

And we're ready to use one of our most important assets!

To learn exactly how to implement Cache_Lite into your code I've written another article called: Speedup your website with Cache_Lite.
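
In the meantime, here's a minimal sketch of what full-page caching with Cache_Lite can look like. The cache directory (borrowing the ramdrive path from the next section) and the one-hour lifetime are example values, not recommendations:

<?php
require_once 'Cache/Lite.php';

// Example values; tune cacheDir and lifeTime (in seconds) to your setup
$cache = new Cache_Lite(array(
    'cacheDir' => '/var/www/www.mysite.com/ramdrive/cache/',
    'lifeTime' => 3600,
));

// One cache entry per URL
$id = md5($_SERVER['REQUEST_URI']);

if ($html = $cache->get($id)) {
    // Cache hit: serve the stored HTML and skip all processing
    echo $html;
} else {
    // Cache miss: render the page as usual, then store the result
    ob_start();
    // ... your normal page generation here ...
    $html = ob_get_clean();
    $cache->save($html, $id);
    echo $html;
}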

Create Turbo Charged Storage

With the PHP caching mechanism in place, we take a lot of stress away from your CPU & RAM, but not from your disk. That can be solved by creating a storage device in your system's RAM, like this:

$ sudo mkdir -p /var/www/www.mysite.com/ramdrive
$ sudo mount -t tmpfs -o size=500M,mode=0744 tmpfs /var/www/www.mysite.com/ramdrive

Now the directory /var/www/www.mysite.com/ramdrive is not located on your disk, but in your system's memory. And that's about 30 times faster :) So why not store your PHP cache files in this directory? You could even copy all static files (images, CSS, JS) to this device to minimize disk IO. Two things to remember:

  • All files in your ramdrive are lost on reboot, so create a script that restores files from disk to RAM
  • The ramdrive itself is also lost on reboot, but you can add an entry to /etc/fstab (like the one below) so it's recreated automatically
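
For example, this /etc/fstab line (matching the mount command above) recreates the ramdrive on every boot:

tmpfs  /var/www/www.mysite.com/ramdrive  tmpfs  size=500M,mode=0744  0  0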

To learn exactly how to tackle the above, I've written another article called: Create turbocharged storage using tmpfs.

Leave Heavy Processing to Cronjobs

For example: I count the number of visits to every single article. But instead of updating a counter for an article on every visit (which involves row locking and a WHERE clause), I use simple and relatively performance-cheap SQL INSERTs into a separate table.

The gathered data is processed every 5 minutes by a separate PHP script that's run automatically by my server. It counts the hits per article, then deletes the gathered data and updates the grand totals in a separate field in my article table. So in the end, accessing the hit count of an article takes no extra processing time or heavy queries.
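
Here's the idea sketched in SQL; the article_hits and articles table and column names are made up for illustration:

-- On every page view: a cheap append, no locks on the articles table
INSERT INTO article_hits (article_id, hit_date) VALUES (123, NOW());

-- In the 5-minute cronjob: fold the raw hits into the grand totals, then flush
SET @cutoff = NOW();
UPDATE articles a
  JOIN (SELECT article_id, COUNT(*) AS hits
          FROM article_hits
         WHERE hit_date <= @cutoff
         GROUP BY article_id) h
    ON a.article_id = h.article_id
   SET a.hit_count = a.hit_count + h.hits;
DELETE FROM article_hits WHERE hit_date <= @cutoff;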

If you want more in-depth information on writing cronjobs, I've written another article called: Schedule tasks on Linux using crontab.
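
For reference, the crontab entry driving such a script is a one-liner (the script path here is hypothetical):

*/5 * * * * /usr/bin/php /var/www/www.mysite.com/cron/process_hits.php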

Optimize Your Database

Use the InnoDB Storage Engine

If you use MySQL, the default storage engine for tables is MyISAM. That's not ideal for a high-traffic website, because MyISAM uses table-level locking: during an UPDATE, nobody can access any other record of the same table. It puts everyone on hold!

InnoDB, however, uses row-level locking. Row-level locking ensures that during an UPDATE, nobody can access that particular row until the locking transaction issues a COMMIT.

phpMyAdmin allows you to easily change the table type in the Operations tab. Though it has never caused me any problems, it's wise to first create a backup of the table you're going to ALTER.
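
If you'd rather skip phpMyAdmin, the same change is a single statement (using the blog_posts table from the query examples below):

ALTER TABLE `blog_posts` ENGINE = InnoDB;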

Use Optimal Field Types

Wherever you can, make integer fields as small as possible: not by changing the length, but by changing the actual integer type. The length is only used for padding.

So if you don't need negative numbers in a column, always make the field unsigned. That way you can store maximum values with minimum space (bytes). Also make sure foreign keys have matching field types, and place indexes on them. This will greatly speed up queries (see the sketch below).
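
Here's a sketch of what that can look like; the hit_count and category_id columns are made up for illustration:

-- A counter that can never go negative: make it unsigned
ALTER TABLE `blog_posts`
  MODIFY `hit_count` INT UNSIGNED NOT NULL DEFAULT 0;

-- A foreign key: match the referenced column's type exactly, and index it
ALTER TABLE `blog_posts`
  MODIFY `category_id` SMALLINT UNSIGNED NOT NULL,
  ADD INDEX (`category_id`);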

In phpMyAdmin there's a link called Propose Table Structure. Take a look sometime; it will try to tell you which fields can be optimized for your specific database layout.

Queries

Never select more fields than strictly necessary. Sometimes when you're lazy you might do a:

SELECT * FROM `blog_posts`

even though a

SELECT `blog_post_id`,`title` FROM `blog_posts`

would suffice. Normally that's OK, but not when performance is your no.1 priority.

Tweak the MySQL Config

Furthermore, there are quite a few things you can do in the my.cnf file, but I'll save that for another article, as it's a bit outside this article's scope.

Save Some Bandwidth

Save Some Sockets First

Small optimizations make for big bandwidth savings when volumes are high. If traffic is a big issue, or you really need that extra server capacity, you could throw all CSS code into one big .css file, and do the same with the JS code. This will save you some Apache sockets that other visitors can use for their requests. It will also give you better compression ratios, should you choose to use mod_deflate or compress your JavaScript with Dean Edwards' Packer.

I know what you're thinking. No, don't throw all the CSS and JS into the main page. You still really want this separation to:

  • make use of the visitor's browser cache. Once they've got your CSS, it won't be downloaded again
  • not pollute your HTML with that stuff

And Now Some Bandwidth ; )

  • Limit the number of images on your site
  • Compress your images
  • Eliminate unnecessary whitespace, or even compress your JS with one of the many tools available.
  • Apache can compress the output before it's sent back to the client through mod_deflate (see the sketch below). This results in a smaller page being sent over the Internet, at the expense of CPU cycles on the web server. For servers that can afford the CPU overhead, this is an excellent way of saving bandwidth. But when CPU is your bottleneck, I would turn all compression off to save those extra cycles.
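
A minimal mod_deflate setup looks something like this (Apache 2; the MIME types listed are just common picks):

<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript
</IfModule>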

Store PHP Sessions in Your Database

If you use PHP sessions to keep track of your logged-in users, then you may want to have a look at PHP's session_set_save_handler function. With this function you can override PHP's session handling system with your own class, and store sessions in a database table or in Memcached.

Now a key attribute to success is to make this table's storage engine MEMORY (also known as HEAP). This stores all session information (which should be tiny variables) in the database server's RAM. That takes disk IO stress away from your web server, and also lets you share the sessions with multiple web servers in the future: if you're logged in on server A, you're also logged in on server B, making it possible to load balance.
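
Here's a minimal sketch of such a handler. The sessions table layout and the mysqli credentials are assumptions, not prescriptions; note that MEMORY tables can't hold TEXT or BLOB columns, hence the VARCHAR:

<?php
// Sketch only; assumed backing table:
//   CREATE TABLE sessions (
//     id      VARCHAR(32)   NOT NULL PRIMARY KEY,
//     data    VARCHAR(2048) NOT NULL,
//     expires INT UNSIGNED  NOT NULL
//   ) ENGINE = MEMORY;

$db = new mysqli('localhost', 'user', 'password', 'mysite');

function sess_open($path, $name) { return true; }
function sess_close()            { return true; }

function sess_read($id)
{
    global $db;
    $stmt = $db->prepare('SELECT data FROM sessions WHERE id = ? AND expires > UNIX_TIMESTAMP()');
    $stmt->bind_param('s', $id);
    $stmt->execute();
    $stmt->bind_result($data);
    return $stmt->fetch() ? $data : '';
}

function sess_write($id, $data)
{
    global $db;
    $expires = time() + (int)ini_get('session.gc_maxlifetime');
    $stmt = $db->prepare('REPLACE INTO sessions (id, data, expires) VALUES (?, ?, ?)');
    $stmt->bind_param('ssi', $id, $data, $expires);
    return $stmt->execute();
}

function sess_destroy($id)
{
    global $db;
    $stmt = $db->prepare('DELETE FROM sessions WHERE id = ?');
    $stmt->bind_param('s', $id);
    return $stmt->execute();
}

function sess_gc($maxlifetime)
{
    global $db;
    return (bool) $db->query('DELETE FROM sessions WHERE expires < UNIX_TIMESTAMP()');
}

session_set_save_handler('sess_open', 'sess_close', 'sess_read', 'sess_write', 'sess_destroy', 'sess_gc');
session_start();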

Sessions on tmpfs

If it's too much of a hassle to store sessions in a MEMORY database, storing session files on a ramdisk is also a good option to gain some performance. Just make /var/lib/php5 live in RAM. To learn exactly how to do this, I've written another article called: Create turbocharged storage using tmpfs ».
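
The quick version looks something like this (the size and mode are guesses to adjust for your setup; Debian/Ubuntu's default session dir uses mode 1733):

$ sudo mount -t tmpfs -o size=64M,mode=1733 tmpfs /var/lib/php5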

Sessions in Memcached

I recently (June 22nd, 2008) found another (better) way to store sessions in a cluster-proof, resource-cheap way, and dedicated a separate article to it called: Enhance PHP session management.

More Tips

Some other things to google if you want even more:

  • eAccelerator
  • memcached
  • tweak the apache config
  • squid
  • turn off apache logging
  • Add 'noatime' in /etc/fstab on your web and data drives to prevent a disk write on every read (see the example below)
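
Such an fstab line could look like this; the device, mount point and filesystem are of course placeholders for your own:

/dev/sda1  /var/www  ext3  defaults,noatime  0  2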

Legacy Comments (49)

These comments were imported from the previous blog system (Disqus).

Unrated.be
Unrated.be·

That php session class is gold :) I had something like that myself but it's not half as good as that one!

Nima
Nima·

Storing session files on a ramdisk also gains some performance.

Kevin
Kevin·

Hi Nima, excellent idea, I've updated the article. Thanks!

Simon
Simon·

You mentioned that compressing JS and CSS will save bandwidth but use CPU.

Surely if you cache the compressed result then you only have to do it once. Everyone's a winner then.

Kevin
Kevin·

@ Simon: There I'm not talking about JS obfuscation or compression, but I'm talking about compression on the Apache level, which cannot be cached.

Yang
Yang·

Good Article

Ray
Ray·

Adding 'noatime' in /etc/fstab on your web and data drives ... prevents the filesystem from updating the 'access time' each and every time a file is accessed.

http://www.oneunified.net/blog

Kevin
Kevin·

@ Ray: I don't know if you need that with a RAM device, but it's a good tip anyway so thanks!

Dave
Dave·

Not a tip for after you are on Digg but rather one to help you know whether you are ready for Digg.

If you receive steady, regular traffic, make sure your server's CPU usage rarely goes above 30%. This might seem low but remember that Digg can drive a lot of traffic to your site in a very short amount of time. Whenever the load on our servers reaches 40% at the peak time we buy another one and put it into the load balancer. This is a rule-of-thumb and works fairly well for us. We run 70-odd websites this way, some of which receive over 300,000 unique visitors per day and have survived day-long front page Diggings without degrading performance. We also go over 130MBits/sec while being Dugg although, to be fair, some of the pages could be a little lighter... 4MB is normal for a home page isn't it? :-P

Gregory
Gregory·

eAccelerator is the BIZ NIZ!!!! awesome article!

Regards,
http://www.olemera.com/loan...

Robin Speekenbrink
Robin Speekenbrink·

A few tips for MySQL checks: http://hackmysql.com/mysqlr... has an excellent reporting tool for looking at your logs.

Also: on the note of using deflate in apache: most webservers have CPU to spare but no memory to spare and thus mod_deflate might be handy (connections are handled faster etc and thus apache can handle requests more quickly, thus reducing the concurrent load)

Dilli R. Maharjan
Dilli R. Maharjan·

Great Share, I will definitely put these ideas into practice.
Thanks.

ahydra
ahydra·

Excellent..........:) it makes more people happy.

brainextender
brainextender·

Indeed a nice article, and congratulations on your Digg positions.

Just to mention it: tmpfs is allowed to use virtual memory to swap pages back to disk. Maybe you don't want that? ramfs won't behave in that manner.

If you've enough RAM, your files are cached by the OS (here Ubuntu) anyway.

Check the free command. So there is no need to put them in a ramdisk; iostat will show no disk activity then.

Kev van Zonneveld
Kev van Zonneveld·

@ brainextender: Didn't know that, thanks for the heads-up!

M A Hossain Tonu
M A Hossain Tonu·

I must say that this will be a useful article for large projects.

This could be a good part of server load balancing.

Tonu
Software Engr.

ephman
ephman·

ha ha i wish i had the traffic on my blog to worry about things like this! :)

Kev van Zonneveld
Kev van Zonneveld·

@ ephman: write about things like this and you will : ) chicken-egg situation ; )

Brant Tedeschi
Brant Tedeschi·

I rarely make comments on blogs, but this article was so good I had to. It touches on a lot of different areas that can take you months to fully optimize. I know, because it took me quite a few months until I was satisfied.

Not only are these good tips for heavily trafficked sites, but good tips in general for a speedy and responsive website.

Kev van Zonneveld
Kev van Zonneveld·

@ brant: that's nice of you thanks! :) Though some things are already outdated again. I think I'm going to have to do another version some day.

Julius Beckmann
Julius Beckmann·

Nice article; some of the mentioned techniques might be too much for a normal admin, but there are some nice ideas in there.

You did not mention PHP opcode cachers like APC and XCache. They can reduce the load simply by installing them and letting them cache your PHP scripts.

Also, your MyISAM and InnoDB tip is no general fact; it has to be weighed wisely against your setup and website.

You also forgot to mention moving static files to the Amazon S3 cloud, or simply using Lighttpd or Nginx for static files.

Kev van Zonneveld
Kev van Zonneveld·

@ Julius Beckmann: Yeah like I said, I should probably redo this again some time cause the article is almost 2 years old now.

vindimy
vindimy·

I don't know if anyone has mentioned it yet, but nginx is an excellent webserver for those who need to save memory/CPU while serving the maximum number of users. I use nginx in conjunction with php-fpm and xcache, and things fly.
This is a great article!

Kev van Zonneveld
Kev van Zonneveld·

@ DV: Yeah like I said, I should probably redo this again some time cause the article is more than 2 years old now : )

earth host
earth host·

excellent article, nicely done

Shawn
Shawn·

awesome tips. your website has been a pot of gold for me, keep up the good work!

aobeda
aobeda·

thank you
very good

Guest
Guest·

To optimize your MySQL queries further, use the LIMIT clause.

e.g. SELECT username FROM table WHERE id='1' LIMIT 1

That way MySQL will end the query as soon as the WHERE clause is satisfied, and with 1 record (or however many records you need).

Also, always use persistent MySQL connections like pconnect()...

Tom
Tom·

Memcached does have some interesting advantages/disadvantages.

Memcached is best used for high-read/low-write situations. However, sessions are re-written on every script execution, which means it's faster to store your sessions in a DB. But if you have data sets that update infrequently, then it's better to use Memcached.

A problem I've also discovered with Memcached is that when using multiple Memcached servers (using the PHP binary, not the PECL module) and one of those servers loses connectivity, Apache starts throwing segfaults. This includes cases where you flush one Memcached server, but not all of them.

Kev van Zonneveld
Kev van Zonneveld·

@ Matt Kukowski & Tom: Thanks for chiming in!

azhar
azhar·

I am a newbie in blogging and I am really impressed by the above article. I'll try to implement most of the things from this article on my website to save money.

Andrew
Andrew·

Thanks for your article.
As far as performance goes, what would you suggest is the best way to add another server into the Apache mix?
Would it be installing a private cloud with Eucalyptus? Perhaps an Ubuntu cluster, more for reliability really? What about that old SETI concept, the cluster of workstations (COW)? Does that exist in any form today?

Kev van Zonneveld
Kev van Zonneveld·

@ Andrew: Not sure if I really understand your question. But if you mean: what's the best way to scale webservers, the answer is: there is none. It really depends on your specific situation. But if money is an issue: there's a lot you can do with LVS. That means a Linux-based loadbalancer dividing traffic between as many webservers as you like. There are many other ways, but LVS is very powerful considering it costs nothing and is kernel-based.

Andrew
Andrew·

Hi Kevin.. It looks as if that is exactly what I am after. Thank you. I assume this is what you mean.. http://www.linuxvirtualserv...

Kev van Zonneveld
Kev van Zonneveld·

@ Andrew: Exactly. It can be pretty rough, but there are nice wrappers out there that can make it a breeze: ldirectord is one. Just a Perl script that reads a simplified config file and feeds LVS (using the ipvsadm command) the rules that are needed to balance the traffic.

If you want something slower but easier, have a look at Pound. That doesn't rewrite IP packets at the kernel level, but just forwards layer 7 traffic. So yeah: slower, but easier, and in some cases (different networks/whatever) the only option.

emily moore
emily moore·

Thank you! This script worked great for me. I've been searching for about four hours and finally found

vikash
vikash·

Very good article

Chris
Chris·

I want to use the Cache PHP Output tip. Thanks for the guide.

lifeofguenter
lifeofguenter·

try caching mysql slow queries in xcache - helps a lot ;) -> http://lifeofguenter.de/kil...

frank
frank·

nice article. I'm going to have to try a few of these this week.

Jesse R. Taylor
Jesse R. Taylor·

I was under the impression that MyISAM is faster for most use cases (which is why it's the default), and that InnoDB is most useful when you've got a large number of parallel UPDATE/INSERTs going on in the same table. That is, even if you've got 10,000 people reading your website all at the same time, row-level locking isn't going to make a bit of difference as far as read speeds -- it's only if you've got a large number of people editing content (e.g. a busy web forum, or image posting site, perhaps) that you're going to see a significant performance boost from switching to MyISAM. And since most sites are doing much more SELECTing than UPDATE/INSERTing, MyISAM is generally a better choice. There are also other considerations, such as fulltext searching, which MyISAM can do, but not InnoDB; and MyISAM's smaller resource usage. (Google for 'myisam vs. innodb' for more detailed comparisons)

Anyhow, I think in many cases, you might actually be doing more harm than good by switching to InnoDB.

Jesse R. Taylor
Jesse R. Taylor·

Oops -- meant to say "...see a significant performance boost from switching to InnoDB" there.

And I also forgot to say thanks for the useful post.

kanjiroushi
kanjiroushi·

Another good way to increase the number of pages the server can render is to move all the resources to S3, so the server only has to take care of rendering pages and not providing static content.

Noel
Noel·

Thank you for that advice - I think using a RAM disk would be a lot quicker and easier to set up than trying to install and learn APC or any of the other caching libraries.

Also, I think it's worth mentioning that Apache can be optimised to serve different kinds of requests from the same server; using lighttpd (instead of or alongside Apache) is also an option.

Jdouk
Jdouk·

I have no idea how to do these things and it sounds great. Willing to hire you to help me if interested. :)

Reynold
Reynold·

Thanks for the info :)

I really liked the advice of storing PHP sessions in a MySQL table using the MEMORY engine and using it for load balancing.

Marlon
Marlon·

Great blog you have here. It's hard to find high-quality writing like yours these days. I seriously appreciate individuals like you! Take care!!

Gopal Aggarwal
Gopal Aggarwal·

Thanks for the easy-to-understand and informative article.