# Partitioning Without Partitions

Posted on October 5, 2016
Tags: SQL

One of the common modern features of relational databases is the ability to partition tables. “Partition” can mean a few different things, but in this case, I’m talking about the common use-case of 1- taking a large table, 2- splitting it up into smaller tables, but 3- still accessing those tables as if they were a single table. There are some benefits to partitioning, particularly around maintenance tasks. Smaller tables mean swifter table scans, faster table operations, and faster histogram generation, not to mention the opportunity for more widely varied histograms for your data.

As mentioned, though the feature is relatively modern. Oracle added it in 1997, SQL Server in 1998 (sort of), and later in 2005 (for real). DB2 didn’t add it till 2007 (seriously). PostgreSQL added table inheritance (which is commonly used to implement partitioning) somewhere prior to 2000.

But the need predates 1997. And, well, DB2 users were out in the cold until less than a decade ago. What did you do if you needed to handle Big Data1 in 1995?

### UNION [ALL]

As it turns out, enterprising hackers can do all sorts of amazing things. See, there’s a SQL operator called UNION, which is a name that absolutely makes sense if you think about data in a very specific way.

Remember sets? Maybe as a data structure, maybe from formal logic, maybe from advanced math? Well, if you think about the results of a SQL SELECT as a set, then the UNION operator does exactly what you think it would: it takes the results of two different SELECT statements and delivers the set of unique results2.

For example3:

=# SELECT generate_series(1,5);
generate_series
-----------------
1
2
3
4
5
=# SELECT generate_series(1,5) UNION SELECT generate_series(4,8);
generate_series
-----------------
1
2
3
4
5
6
7
8

Note that this is a real set-like union – the numbers 4 and 5 were part of both SELECT results, but only showed up once in the output.

…. Which might not be what you want. Often times, maybe even most often, you want all of the results. That’s what the UNION ALL operator does:

=# SELECT generate_series(1,5) UNION ALL SELECT generate_series(4,8);
generate_series
-----------------
1
2
3
4
4
5
5
6
7
8

Now we’re talking. That may not have set-theoretical utility, but it’s much more useful for the pragmatics of table partitioning.

### CREATE VIEW foo

With UNION ALL, partitioning actually becomes relatively straightforward (if still hacky and a bit brittle). You just create your partitions as separate tables, and then slap them together in a big view:

=# CREATE TABLE foo2014 (d date, a int, b text);
=# CREATE TABLE foo2015 (d date, a int, b text);
=# CREATE TABLE foo2016 (d date, a int, b text);
=# CREATE VIEW foo AS
SELECT * FROM foo2014
UNION ALL
SELECT * FROM foo2015
UNION ALL
SELECT * FROM foo2016;

Et, voila! You can now select data from one place, and have your DB search a bunch of places. To play with a live demo, check this sqlfiddle (and consider donating to SQL Fiddle).

Now, this is hacky. Unless you can hang triggers on your views, you probably need to send all of your INSERT and UPDATE traffic to the partitions directly, rather than to the “table” itself. And your database may not be smart enough to avoid looking at all of your partitions for each query4.

And it’s brittle. I do hope you automated partition creation, because if your columns are inconsistently named or typed, then you’ll have problems. And each time you need a new partition, you have to re-create the view.

But it works. And it works in anything that conforms to SQL99. It even works in sqlite (though its usefulness there is debatable, since sqlite databases are a single file on-disk anyway).

There’s not much use for this technique now that we have more mature and robust partitioning capabilities available to us, though.

Tomorrow I’ll walk through some non-partitioning uses for these techniques that might be a little more relevant.

1. Where “Big” is relative to the time. We’re talking gigadata, not teradata, and certainly not petadata.

2. SQL99 defines INTERSECT and EXCEPT operators, as well, for all of your tables-as-sets needs.

3. Using PostgreSQL (specifically v9.5.4). I cheated slightly and used ORDER BY clauses to actually generate the results, for readability purposes. The SQL as-written will return results in a potentially-unstable order.

4. Most are, but sometimes require extra configuration. And all the ones I know of rely on constraints on each partition table, and possible mirrors of those constraints in each SELECT clause in the view.

I'd love to have a conversation, but I'm not interested in having a comments section. Instead, ping me on twitter or some other media. I'll try to link to responses from the relevant post page.