Beautify URLs

Readable URLs are nice. A well-made website has a logical layout, with sensible folder and file names, and as few technical details as possible. On the best-designed sites, readers can often guess a URL correctly. Clean URLs are great because they:

  • allow search engines to spider your site better, which can help your ranking
  • are easy to remember
  • hide the underlying technology, which gives would-be attackers fewer hints
  • look nice
  • are easier to link to
  • reduce the number of typos

Before we go any further, let's first see what I mean by an ugly URL:

http://yourdomain.com/index.php?cat_id=12&artcl_id=9&act=edit

This just does not look professional, and it invites people to tamper with your variables. So how can we turn it into a nice URL like this one?

http://yourdomain.com/blog_posts/edit/beautify_urls/

There are several ways, but the one I use relies on an Apache module called mod_rewrite. This module can take any requested URL and rewrite it before your pages are accessed. You tell mod_rewrite what to do by writing an .htaccess file.

OK, let's do this

Create a file in your web root (the directory with your main index), call it .htaccess (include the dot), and add the following line, which tells Apache to switch on the rewrite engine of the module I just mentioned:

RewriteEngine on

On the next line we need to tell the module to secretly rewrite the new, beautiful URLs to the ugly URLs that lie beneath (we still need those, otherwise your site won't function, right?). So let's say we want to rewrite /blog to /index.php?page=blog:

RewriteRule ^([a-zA-Z0-9_]+)$ /index.php?page=$1

What just happened?

Let's explain what all these nasty characters (together they form a regular expression) mean:

  • ^ marks the beginning of the requested path
  • ([a-zA-Z0-9_]+) is the part that captures the page name; it breaks down as follows:
      • ( ) capture whatever matches between these
      • [ ] any one of the characters listed between these will do
      • + match one or more of those characters in a row
      • a-z match any lowercase letter
      • A-Z match any uppercase letter
      • 0-9 match any digit
      • _ let's also match the underscore character
  • $ marks the end of the requested path

Everything matched between the parentheses is stored in a variable: $1. For the request blog, this variable now contains blog, and we can use it to rewrite blog to /index.php?page=blog. The beauty of this is that it works for other words too.
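
To make the breakdown above concrete, here is the same rule again with a few sample requests traced in comments. The sample paths are made up for illustration. One detail not spelled out above: inside an .htaccess file, Apache strips the leading slash before testing the pattern, which is why the pattern starts with ^ and not with ^/.

```apache
# Same rule as above, annotated. In .htaccess (per-directory) context the
# pattern is tested against the path without its leading slash.
#
#   /blog       -> tested as "blog"       -> matches, $1 = "blog"
#   /About_us2  -> tested as "About_us2"  -> matches, $1 = "About_us2"
#   /blog/      -> tested as "blog/"      -> no match ("/" is not in the class)
#   /my-post    -> tested as "my-post"    -> no match ("-" is not in the class)
RewriteEngine on
RewriteRule ^([a-zA-Z0-9_]+)$ /index.php?page=$1
```

If your page names contain other characters, such as hyphens, they would have to be added to the character class between the square brackets.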

Making it more solid

So much for the basics. If you want to know more about regular expressions, .htaccess, Apache, or mod_rewrite, I suggest you look them up elsewhere; this article is not about those.

You may find that blog is now secretly rewritten to /index.php?page=blog, but what about blog/? How about first visibly redirecting people from blog to blog/, and then secretly rewriting that to /index.php?page=blog? For this we need the following .htaccess file:

RewriteEngine on
RewriteRule ^([a-zA-Z0-9_]+)$ /$1/ [R]
RewriteRule ^([a-zA-Z0-9_]+)/$ /index.php?page=$1

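Notice the new line in the middle? It says: rewrite blog to blog/ and let the visitor know. This is done with the [R] flag, which turns the rewrite into a visible redirect (an HTTP 302 by default), so the address bar in the browser changes to blog/.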

The last line then secretly rewrites blog/ to /index.php?page=blog.
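
The [R] flag also accepts a status code, and mod_rewrite has an [L] flag that stops processing further rules once a rule has matched. Here is a sketch of the same file using both. Making the trailing-slash redirect permanent is an assumption for this example, not something the setup above requires.

```apache
RewriteEngine on
# Visible redirect: blog -> blog/ with HTTP 301 (permanent) instead of the default 302
RewriteRule ^([a-zA-Z0-9_]+)$ /$1/ [R=301,L]
# Internal rewrite: blog/ -> /index.php?page=blog, then stop processing rules
RewriteRule ^([a-zA-Z0-9_]+)/$ /index.php?page=$1 [L]
```

Browsers cache 301 responses aggressively, so it can be easier to test with a plain [R] first and switch to R=301 once the rules behave.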

But what if I have more than one variable to pass on?

What you could do is just repeat yourself in the .htaccess file like so:

RewriteEngine on
RewriteRule ^([a-zA-Z0-9_]+)$ /$1/ [R]
RewriteRule ^([a-zA-Z0-9_]+)/$ /index.php?page=$1

RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)$  /$1/$2/ [R]
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)/$ /index.php?page=$1&subpage=$2

There are a lot of things you can do with mod_rewrite. You can add conditions, create complicated regexes, and so on. If you're interested, just google for it.
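
As a small taste of conditions, here is a sketch using RewriteCond, which makes the rule that follows it apply only when the condition holds. The two tests below skip the rewrite when the request points at a file or directory that really exists, so a real directory (say, a hypothetical images/) is still served normally. The QSA flag is an optional extra that appends any query string from the original request.

```apache
RewriteEngine on
# Leave requests for existing files (-f) and directories (-d) alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# blog/ -> /index.php?page=blog, keeping extra parameters such as ?sort=date
RewriteRule ^([a-zA-Z0-9_]+)/$ /index.php?page=$1 [L,QSA]
```

RewriteCond lines only affect the single RewriteRule directly after them, so they have to be repeated for every rule that needs them.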

Doesn't work?

Two things need to be in place for this to work:

1. The .htaccess file needs to be allowed to control the rewrite module. Make sure the virtual host configuration contains:

AllowOverride All

2. The rewrite module must be enabled. On Debian-based systems, run the following in a terminal as root, and restart Apache afterwards:

$ a2enmod rewrite
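
For the first requirement, AllowOverride is only valid inside a <Directory> section, so in the virtual host it would sit roughly as follows. The server name and the /var/www/html path are placeholders; use the values from your own setup.

```apache
<VirtualHost *:80>
    ServerName yourdomain.com
    DocumentRoot /var/www/html

    <Directory /var/www/html>
        # Let .htaccess files in this directory tree override server settings
        AllowOverride All
    </Directory>
</VirtualHost>
```

If All feels too broad, AllowOverride FileInfo is the override class that covers the mod_rewrite directives.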