12/15/2005

SQL "Beautifying"

Read PL/SQL Beautifier at Tim Hall's blog. I like the idea of a built in SQL, and PL/SQL, formatter for Oracle. If Oracle creates a built-in formatter, I'd like it to be extensible. Have a default format and give the user (us) the ability to tweak it to our liking. Once the "beautifier" comes out and it is integrated into 3rd party tools, you'll have one location to tweak your SQL and PL/SQL formatting style. That sounds good to me. While you're at it, throw in an asserter so we can just say DBMS_BEAUTIFIER.assert_is_formatted( v_sql_or_plsql_clob ).


However, I'm writing to discuss "good" and "bad" SQL formats. I've found that many of the suggested standards are either poor for readability or poor for maintainability.

First some bad:
1) Aligning the keywords

SELECT DUMMY
FROM DUAL
WHERE 1 = 1
AND 2 = 2


Why is it bad? On its own, it doesn't make a difference. If you try and follow that format in PL/SQL, it becomes both hard to maintain and hard to read. Also, try writing fluent SQL while trying to figure out how many spaces you need to enter before starting your keyword. Formats should be as close to natural writing as possible

2) Commas before columns

SELECT
DUMMY d1
, DUMMY d2
, DUMMY d3
FROM DUAL


This is annoying because the comma isn't the most important thing about each row: THE COLUMN IS. This format is used to "help" people remember commas after each non-terminal column. If a programmer can remember to put a semi-colon after every single line of code they make, they can remember to put a comma after non-terminal columns. The most important information should be first


And some good:
1) Alias everything in the from list and use those aliases for EVERY column reference.
This adds both readability and maintainability:

  • Easier to determine from where columns are coming - readability

  • May be more self-documenting - readability

  • Easier to switch which tables or views are referenced - maintainability

  • Allows the addition of columns without breaking the query - maintainability




SELECT
ORDER_ID,
CUSTOMER,
SALES_TOTAL,
PRODUCT,
COMMENTS
FROM ORDERS, ORDER_DETAILS
WHERE ORDER_ID = ORD_ORDER_ID


Where are sales_total and comments coming from?
What if I want to use my ORDERS_SUMMARY view instead of ORDERS?
What if I want to add a comments column to my ORDER_DETAILS table?

Aliasing everything answers all these questions clearly:

SELECT
o.ORDER_ID,
o.CUSTOMER,
od.SALES_TOTAL,
od.PRODUCT,
od.COMMENTS
FROM ORDERS o, ORDER_DETAILS od
WHERE o.ORDER_ID = od.ORD_ORDER_ID


2. Put each item of the select or from list on its own line when it exceed 2 elements
One of the most common changes for a query is adding information or changing from where information is coming. By putting each select and from element on its own line, you not only make it easier to read, but you make it easier to add, delete, and remove from and select elements. It also clears up any confusion between columns used in a function call vs columns in the select list.

3. Put each parameter of a function call on its own line when it exceeds 2 elements
This one is more a rule of thumb (ROT) and should be broken as you see fit. For instance, with decodes, I generally put the item being decoded on its own line and then put pairs of elements on their own line and then the default on its own line. For example:

DECODE(
flag,
'M', 'MONTHLY',
'D', 'DAILY',
'W', 'WEEKLY',
'OTHER'
)

This ROT is especially useful when you begin nesting functions. For example:


SUBSTR( my_str, DECODE( INSTR( my_str, ',' ), 0, LENGTH( my_str ), INSTR( my_str, ',' ) ) )


vs


SUBSTR(
my_str,
DECODE(
INSTR( my_str, ',' ),
0, LENGTH( my_str ),
INSTR( my_str, ',' )
)
)

8 Comments:

Blogger Bob B said...

I forgot to mention the performance benefit of standardly formatted sql. A casing or spacing change hashes to a different sql statement. Therefore, 'select * from dual' != 'select * from duaL'.

If you use the same sql statement in a lot of places and the casing or spacing is different, that means you'll be wasting space in the SGA to store different "copies" of the same sql statement. Which is one of many reasons why packaged cursors are so nice ...

12/15/2005 11:55:00 AM  
Anonymous Anonymous said...

Funny to see that most of the things you say are the ones I myself ended doing after years of working with SQL and PL/SQL.

So: You're not alone!

Cheers.

Carlos.

12/16/2005 02:41:00 AM  
Anonymous Anonymous said...

You can add to the 'alias everything' the benefit that it increases the probability of getting your code right.

How often have you seen the complaint about "why didn't oracle throw an error - this subquery select list references a column that doesn't exist in the table" ?

Almost inevitably the answer is - "because the column exists in a table in the parent query, and if you had alias'ed everything, you (or Oracle) would have seen the error".

12/17/2005 01:28:00 PM  
Blogger William Robertson said...

Totally agree with you on right-alignment of SQL keywords and not PL/SQL keywords. (SQL "AND" has three spaces in front of it, but PL/SQL "AND" does not.) Why on Earth do people do that? Makes absolutely no sense and drives me up the wall.

I disagree on the placement of commas in multiline lists though. Bullet points are clear and easy to read. Sometimes SELECT list expressions span several lines, and a leading bullet-style comma makes it easier to see where the next one begins. Again it drives me up the wall when they are placed in other creative locations.

This is the thing about formatting though. I doubt any two programmers on the planet have the same set of preferences.

btw a new feature of PL/SQL Developer 7.0 beta is configurable naming rules. You can say that parameters should begin with "p_" and be in lowercase and so on, and you will get compiler warnings when code breaks those rules.

12/17/2005 06:03:00 PM  
Blogger William Robertson said...

While I'm at it, I'm afraid I've never bought the performance benefit argument. Why are all these developers re-coding the same SQL anyway, even if they do align it exactly the same way thus saving parse overhead? Shouldn't the application be using packages? And even if it isn't, presumably it reuses the code or, God help it, script that contains the SQL, so however insanely and incomprehensibly it is formatted, it will at least look the same each time to the SQL cache.

12/17/2005 06:13:00 PM  
Blogger Bob B said...

The more you want a format rule to be followed, the better the reason you should have for following it.

For example, if I were "in charge" of Oracle development for something, the alias rule would be mandatory. I *know* from many experiences that it saves time up front and saves time in maintenance.

I'd prefer comma placement at the end, but if more developers wanted to put them up front, I'd live with that. In fact, I'd happily follow that guideline - to set a good example.


I'm a little leary of auto-formatters and here's why. Toad has a built in "format as you type" that will alter the casing of various groups of words as you tell it. You can make keywords all caps, packages camel case, and functions/procs lower case. I set it up and happily programmed.

A while later, I was fixing a critical package and every time I compiled it I got an ora-0600 type message. I emailed a coworker the instructions on how to fix the code and he compiled it fine. That baffled me for a month, until I found out that Toad was changing the casing of the "String" keyword of a Java stored procedure paramter list. Ugh.

Doesn't mean I won't use a formatter, just that I'm cynical about them :-)

12/18/2005 11:33:00 AM  
Anonymous Anonymous said...

Eddie Awads suggested 3 SQL formatters which can at least partially do what you suggest. His article can be found here:

Like Eddy I like very much the SQLinForm formatter
Version 1 or Version 2 which also formats PL/SQL

Ciao

1/28/2006 04:47:00 PM  
Blogger Icon-1 said...

I prefer right alignment of keywords because it visually appealing and the visible gap between the keywords (left) and their objects (right) allows the brain to parse the statement quickly.

Similarly, the commas before the line provide visible separation and, as William said, act like bullet points.

I agree about the long function lines and nested functions (as well as conditional clauses). I insist on clear indentation.

5/30/2008 02:35:00 PM  

Post a Comment

<< Home