Let's say your site is becoming a big success, and as a result it's getting slower and slower. There are several things you can do without buying additional hardware. One of them is to put Squid in front of Apache as a cache.

In this article I assume you have basic knowledge of server administration. Be careful, because you could really mess things up if you've got no clue what you're doing. I warned you!

Squid?

If you don't have the money to buy additional hardware, you should know that it's always an option to install Squid on the same server that Apache runs on. This is how it works:

  • We're going to install Squid
  • We're going to run Apache on port 8080
  • We're going to run Squid on port 80
  • When a request from a web browser reaches port 80, Squid will first check if it has the result stored in its cache.
  • If so, it is:
      • served to the web browser immediately without troubling the Apache server
  • If not, it is:
      • fetched from the Apache server
      • stored in the cache for the next time
      • served to the web browser
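
The hit/miss flow above can be sketched as a small shell script. This is only a toy model to make the logic concrete, not how Squid works internally: handle_request and fetch_from_apache are made-up names, a temporary directory stands in for Squid's cache, and no real HTTP request is made.

```shell
#!/bin/sh
# Toy model of the hit/miss flow; this is not how Squid is implemented.
# A temporary directory stands in for Squid's cache.
cache_dir=$(mktemp -d)
trap 'rm -rf "$cache_dir"' EXIT

# Hypothetical stand-in for a request to Apache on port 8080.
fetch_from_apache() {
    echo "response for $1"
}

handle_request() {
    # Turn the URL into a safe file name.
    key=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    if [ -f "$cache_dir/$key" ]; then
        # Hit: serve the stored copy, Apache is not contacted.
        echo "HIT: $(cat "$cache_dir/$key")"
    else
        # Miss: fetch from Apache, store for the next time, then serve.
        fetch_from_apache "$1" > "$cache_dir/$key"
        echo "MISS: $(cat "$cache_dir/$key")"
    fi
}

handle_request /index.html   # first request: miss
handle_request /index.html   # second request: hit
```

The second request for the same URL never reaches fetch_from_apache, which is exactly the load you are trying to take off Apache.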

Now that you have an idea of the logic behind Squid, let's put it to use!

Let's do this

Installing Squid is easy: just use your distro's package manager. On Ubuntu it would look like this:

$ sudo aptitude install squid

You can make Apache run on port 8080 by editing the file: /etc/apache2/ports.conf

Listen 127.0.0.1:8080

Now let's edit squid's config file: /etc/squid/squid.conf

# Define the HTTP port
http_port 123.123.123.123:80 vhost vport=8080 defaultsite=www.example.com
# Specify the local and remote peers
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=server1

# Tell squid which domains to forward to which servers
acl sitedomains dstdomain .example.com
cache_peer_access server1 allow sitedomains
# some restriction definitions
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
#acl webcluster src 87.233.132.114
acl webcluster src 87.233.132.112/28
acl purge method PURGE
acl CONNECT method CONNECT

# some restrictions
http_access allow manager localhost
http_access allow manager webcluster
http_access deny manager
http_access allow purge localhost
http_access allow purge webcluster
http_access deny purge
# Make sure that access to your accelerated sites is allowed
http_access allow sitedomains
# Deny everything else
http_access deny all

# Do not cache cgi-bin, ? urls, posts, etc.
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
acl POST method POST
no_cache deny QUERY
no_cache deny POST
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
refresh_pattern .              60       100%     4320

# Do not cache 404s 403s, etc
negative_ttl 0 minutes
# Debug info in cache.log?
# debug_options ALL,1 33,2

# Cache properties
cache_mem 500 MB
maximum_object_size_in_memory 2048 KB
cache_replacement_policy heap LRU
memory_replacement_policy heap LRU
cache_dir ufs /var/spool/squid 20000 16 256
access_log /var/log/squid/access.log squid
hosts_file /etc/hosts

The things you might want to change are the IP address, the domain name and cache_mem, and I've placed some comments in the config for you to read. Some extra notes on:

  • cache_mem 500 MB

Squid claims this much RAM; change it to fit your needs. Check how much memory is available on your server, and limit cache_mem so that other processes keep enough room.

  • 123.123.123.123

Change this to your public IP address.

  • example.com

Change this to your domain name.

You may want to play with the config a little more. Every site is different and some sites just don't like being cached, but this should definitely get you started. That reminds me: you'll have to restart the services in order for this to work, of course.

$ sudo /etc/init.d/apache2 restart
$ sudo /etc/init.d/squid restart
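
To check that requests really pass through Squid after the restart, one option is to look at the X-Cache response header, which Squid normally adds to its replies ("HIT from ..." or "MISS from ..."). The helper below is a minimal sketch: cache_status is a name I made up, and www.example.com is the placeholder domain from the config.

```shell
# Reads HTTP response headers on stdin and prints HIT, MISS or UNKNOWN,
# based on the X-Cache header. cache_status is a hypothetical helper name.
cache_status() {
    awk 'tolower($1) == "x-cache:" { print toupper($2); found = 1; exit }
         END { if (!found) print "UNKNOWN" }'
}

# Usage, assuming your site is www.example.com:
# curl -s -I http://www.example.com/ | cache_status
```

Request the same URL twice: a MISS followed by a HIT suggests the page is being cached. UNKNOWN means no X-Cache header was found, so the reply probably did not come through Squid.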

Some Final Notes

  • If your web application gives you a hard time, consider caching only media files like JPGs, FLVs, etc., and have the rest passed straight through to Apache. It's the safest setup, and it can still save you quite a bit of disk I/O on the server.
  • You can use .htaccess files to control what kind of files should be cached, and for how long.
  • It could be that your web statistics (AWStats or Webalizer, maybe) display incorrect graphs because they parse Apache log files, and the log files contain fewer records because Squid is handling a lot of the requests. You could:
      • teach your stats program to read Squid's log files
      • use a stats program like Google Analytics, which does not interfere because the clients direct a separate request to a stats server.
  • A nice overview of Squid's configuration options can be found here.
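
On the statistics note above: before pointing a stats program at Squid's log, you can get a quick feel for how much work Squid takes off Apache with a few lines of shell. This is a rough sketch that assumes the native "squid" log format set by the access_log line in the config, where the fourth field looks like TCP_HIT/200 or TCP_MISS/200. squid_hit_summary is a made-up name.

```shell
# Reads Squid's native access log on stdin and reports how many
# requests were answered from the cache.
# Assumes the "squid" log format, where field 4 looks like TCP_HIT/200.
squid_hit_summary() {
    awk '{
        total++
        split($4, result, "/")          # result[1] = TCP_HIT, TCP_MISS, ...
        if (result[1] ~ /HIT/) hits++
    }
    END { printf "%d of %d requests served from cache\n", hits, total }'
}

# Usage, with the log path from the config above:
# squid_hit_summary < /var/log/squid/access.log
```

Every request counted as a hit is one that never shows up in Apache's own log, which is why the Apache-based graphs look too low.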