Death, Taxes, and Storing Hierarchical Data in a Relational Database

A quick update to the Buildering.net situation. Basically, we're going to convert their install of Ikonboard (no link, because really, you shouldn't use it ever), to Drupal's kinda kludgy "Advanced" Forum module.

The benefit is a unified login system, which is good enough to outweigh the loss of "real" forum features. (Personally, I prefer forums that concentrate on the basics, too many buttons confuse me!).

Of course, this whole exercise has reinforced my long held hatred towards database migrations. (In fact, I faintly remember swearing to never do another one of these again many times over the years.)

On the other hand, it's given me a lot of insight into the underpinnings of Drupal's database structure, which I suppose can only help in future endeavors. (Well, I had to find at least one positive right?).

StackOverflow

I ran into an issue with StackOverflow this week, which happened to have been covered by the podcast as well (week #21).

I'm still quite enjoying the site, and am trying to absorb as much info as possible on random bits of information I've never really thought about before (my favourite kind!). However, in taking a look at the limited number of topics that I would consider myself "fairly knowledgeable" about, I definitely noticed that incomplete answers can be quite frustrating.

One question in particular was answered somewhat incompletely (the answer was correct, but was not as correct as it could have been). However, the less correct version had already been voted up excessively, and the more correct version sat lonely at the bottom of the page.

I'm not quite sure what the solution is to those situations. You are allowed to comment specifically on the original question if you have a higher reputation level (sadly, mine is nothing to write home about).

Anyhow, I'm interested in seeing how that plays out, as it may be just a case of needing a higher user base to self correct these sorts of inconsistencies.

Oh right, hierarchical data

One thing I see a lot of, or at least, enough of that I feel compelled to write about it, is the storing of hierarchical data in a relational database.

I think part of the problem is, we're so used to relational databases now, and even if we might not admit to it, generally feel they are the superior option.

When presented with hierarchical data, our first thought generally is to squeeze it unceremoniously into a relational structure.

However, if we take a step back, we see that we use (and trust, for the most part), hierarchical databases in many other areas. Think of the file system, for example. Not a lot of people would advocate storing files in a relational database (I'm sure there are outlying circumstances, as with most things), and instead are happy with allowing the file system to handle it, while putting pointers to the data in our relational database.

Even XML, for the most part, we allow to exist in it's naturally hierarchical form.

If you do need to store hierarchical data in a database though (and we see this a lot with nested navigation systems, for example), there are a few choices to be had.

This older article sums up the choices, but I think is still a great read.

For those who may be new to storing hierarchical data relationally, I'd suggest taking a look at both methodologies and choosing the one that best fits your data needs. If you have a data set with controlled (or limited amount of) nesting, it will probably be easier to just stick to the adjacency list model.

However, if your tree can expand quite large (perhaps a CMS which allows users to create large navigational structures), the nested set model allows you to perform more efficient operations on the data.