I’ve recently been exploring options to calculate median and quartiles in my Postgres database. If you’re familiar with quartiles you know how handy they can be. There’s a few different options in the Postgres universe to accomplish this, so I figured I would give them all a whirl and see which was the friendliest (and fastest) on my CPU.
I’m using the “batting” table for Sean Lahman’s baseball database as my proof of concept. The table has just under 100,000 rows. Not too big, but a good test case. For my example here, I’m using the “r” column, which indicates total runs scored for a season.
The R Method
PL/R is a popular Postgres extension. If you haven’t checked it out, I would highly recommend it.
Pros: Very fast and simple code. Only one line of actual R code used to build this function.
Cons: R isn’t pre-installed on most systems and PLR isn’t shipped with Postgres. Takes a bit of system config., but not too much.
The Python Method
I’m a big fan of Python in general. It’s currently one of my favorite ETL languages. Here I’m using yet another great Postgres extension called PL/Python.
Pros: Python is pre-installed on most Linux and Mac systems making set up a breeze.
Cons: Surprisingly slow! Python ties to pipe all the data into the interpreter and then back out again as a function result. Too much system cost for me!
The C Method
Everyone remember C from your CS-101 class in college? Yeah, that’s why no one likes to write it. Fortunately, this is a pre-packaged Postgres extension written in C called Quantile. I’m not going to post the mile-long C code here, but you can see it on the GitHub repo.
Pros: BLAZING FAST! Returned an array faster than native SQL could calculate a median! I ended up putting the R solution into production because PL/R has room for further application, but if I were looking for speed and nothing else, the Quantile extension is the clear winner.
Cons: A third-party extension, so you’re at the mercy of the developers to keep things updated. This particular repo is about four years old and looks to be updated on a regular basis.
comments powered by Disqus