I am a big fan of code readability, but being a fan and expecting the code I write to be readable are two different things. Readable code is easier to understand and much more straightforward to grasp.
One significant difference I find while comparing readable and unreadable code, apart from the obvious, is how many different variables I have to hold in my head at a time while I try to grapple my head around it. If this task is simple, I believe that this is not because of some random act of brilliance but due to careful code construction.
CTEs helps achieve this by its ability to separate data into independent logical entities according to the context of the query.
A random use case.
Let's look at a random use case and check how the query performance and readability can be improved. For a particular report in our e-commerce app, we were querying for each item’s latest n review-comments, in our case, we can set n = 2.
For the first iteration, we can write a simple join and then filter that in our application language (Ruby, Go, PHP etc.)
For an e-commerce system like Amazon, this is not going to scale since the sheer number of items and the associated review comments are guaranteed to bog down the dbms.
Lets introduce a Window Function
In general, a Window Function, is a mathematical function that is zero valued outside of some interval. In case of SQL engines, they operate on rows that are related to the current row, in our case, this can be the item_id. All rows of the join that has a common item_id acts as a partition over which the window function acts.
We rewrite our query to do the following:
- Window function should operate on a partition of same item_ids
- Each of these partitions should be ordered by created time stamp of review_comment.
- Select only the top two from each partition
The sub-query is resulting in a full join table, but now we have comment_row_number
field to let us know that the newest review_comment has a value of 1. All we need to do now is to filter using that column.
We have used row_number() function from PostgreSQL as the window function here. According to the docs, any built-in or user-defined aggregate function can be used as a window function.
Putting It All Together
Even though we have our answer, our query has become pretty hard to read. We can get better readability by introducing a Common Table Expression.
Well, that is it folks. We have put together two concepts, readability and performance to create something which is the best of both worlds.