Code spends more time being read then being written. I think naturally this is true for queries as well. So it might help if we teach ourselves some guidelines as how to nicely format them.

I've searched but at the time of writing, could not find a public Style Guide for SQL formatting. I'll try to keep this guide short and pragmatic, so that it has a chance of actually being read & stuck to : )

This targets MySQL but should work as well for other dialects/engines.

We have a couple of tools at our exposal inside SQL to format our queries.

  • SQL whitespace is ignored by MySQL's parser, so let's use it to make our lives easier, right?
  • Dummy conditions have no effect in SQL statements. Things like: WHERE 1.

Ok let's just begin with some of the SQL layout habits we settled on at our company.

All Clauses Get a Newline

This will provide a clear seperation of different parts of the query, making it more easy to read & comprehend. And it will enable you to better maintain the query cause you can jump between different clauses bij just pressing up & down.

Before

SELECT * FROM `books` ORDER BY `title`

After

SELECT *
FROM `books`
ORDER BY `title`

N.b.: Whitespace is ignored by MySQL's parser, but not by MySQL's query cache on versions <5 , thanks to 'foobar' for pointing that out. So if you want query cache to pay off.. Pick one format & stick with it ; )

Accompanied Fieldnames Get Their Own Newline

This will allow you to very easily (or even dynamically) add or remove certain fields from the query.

Before

SELECT `id`, `title`, `rating`
FROM `books`
ORDER BY `title`

After

SELECT
  `id`,
  `title`,
  `rating`
FROM `books`
ORDER BY `title`

Lonely Fieldnames Stay on the Same Line

This will keep the query more or less compact and avoid that even the simplest query takes up 10 lines.

Before

SELECT
  `id`,
  `title`
FROM
  `books`
ORDER BY
  `title`

After

SELECT
  `id`,
  `title`
FROM `books`
ORDER BY `title`

Where 1

So now that we've played with whitespace a bit, there are other things that have no effect in SQL as well. Like 1. This short expression can really help us define conditions in a more uniform way.

Let's look at an example.

Before

SELECT `id`
FROM `books`
WHERE
  `published` = 'yes'
  AND `rating` > 5`

Hm, too bad, published = 'yes' is formatted differently from rating > 5 because it doens't have the AND word. We'd have to account for that every time we change the conditions, or if we were generating this query automatically: 'Always prefix with AND... UNLESS it's the first condition'. Bad for layout. Bad for automation.

After

SELECT `id`
FROM `books`
WHERE 1
  AND `published` = 'yes'
  AND `rating` > 5`

One other big advantage of writing your conditions in such a uniform format, is that it becomes really easy to conditionally add — or temporarily turn off conditions with MySQL Comments:

SELECT `id`
FROM `books`
WHERE 1
  -- AND `published` = 'yes'
  AND `rating` > 5`

..without breaking SQL syntax. Cause remember that if I had done this with the before query, I would have gotten a syntax error.

The Performance Hit of 1

While this dummy syntax provides developers with some great comfort, of course we have to make sure this addition doesn't come at a price. MySQL guru Erwin Bleeker benchmarked on multiple occasions with 1 bilion queries (no cache), to find that the results only differed by one hundreds of a second on average.

This is how he benchmarked:

mysql> select benchmark(1000000000, (select SQL_NO_CACHE 1 from employees WHERE 1 limit 1));
+-----------------------------------------------------------------------------------+
| benchmark(1000000000, (select SQL_NO_CACHE 1 from medewerkers WHERE 1 limit 1)) |
+-----------------------------------------------------------------------------------+
| 0                                         |
+-----------------------------------------------------------------------------------+


1 row in set (24.84 sec)


mysql> select benchmark(1000000000, (select SQL_NO_CACHE 1 from employees limit 1));
+-------------------------------------------------------------------------+
| benchmark(1000000000, (select SQL_NO_CACHE 1 from medewerkers limit 1)) |
+-------------------------------------------------------------------------+
| 0                                     |
+-------------------------------------------------------------------------+


1 row in set (24.83 sec)

Where 0

As you may have guessed, our dummy condition also works for OR queries. Just negate the dummy condition: 0, look:

SELECT `id`
FROM `books`
WHERE 0
  OR `published` = 'yes'
  OR `rating` > 5`

Joins

Let me just show you how I go about this:

SELECT
  `authors`.`id`,
  `authors`.`name`,
  `authors`.`birthday`,
  COUNT(`books`.`id`) AS book_pub_cnt
FROM `authors`
LEFT JOIN `books` ON (1
  AND `books`.`author_id` = `authors`.`id`
  AND `books`.`published` = 'yes'
)
WHERE 1
  AND `authors`.`alive` = 'yes'
GROUP BY `authors`.`id`
ORDER BY `authors`.`name`

Furthermore

You may have noticed that I:

Use Backticks to Enclose All Database Entities

That's just good habit, this way if you ever have an ambiguous fieldname ('active', or 'status', could be interpretted as statements), your database will know that you mean the fieldname or table, and not the statement.

Use Single Quotes to Enclose Strings

So you can enclose the entire query in double quotes, and be able to use variables from PHP without concatenation. On the other hand: You should really use prepared statements, and in my eyes concatination is better than let PHP automatically substitute your vars. But hey, if you need to choose anyway, might as well be single quotes.

Nest with Two Spaces vs Tabs

I tend to do this in PHP & other languages as well. It allows for consistent layout in all possible editors & views. The tab character sometimes also have unwanted (e.g. autocomplete) impact when you paste into a console. There's an interesting post about it here.

Conventions in General

Sometimes conventions rely more on taste than reason. Still it's helpful to settle on a single 'taste':

It's irrelevant if people drive on the right or left side of the road. As long as they all do the same :)

While I'm not implying there will be fatal accidents if many different styles are deployed, it will contribute to the success of your project if you are able to agree on coding standards, and SQL is no exception.

This is version 1.2.0 of this style guide. I'm happy to hear what you're using, or if you'd like to see additions, just leave a comment so I can update accordingly! Sadly when moving to a new comment system many comments where lost, but we started at 0.0.1 so quite some improvements have already been contributed. Thank you!