Wednesday, November 10, 2010

Of Horses and Carts

As developers, we like to code.  We want to write code.  It's what we do.  So, naturally, when a project begins, what we want to do is dive in and start writing code.  From a proper planning perspective, this is generally frowned upon, and for good reason.  When you're just starting to plan, haven't fleshed out the details, and don't have a firm grasp on the actual requirements (not just the documented requirements that some business user wrote down) is precisely when you shouldn't be etching the software's logic in stone.

But this reality can easily be (and often is) misconstrued as a mandate to not write any code just yet.  This is a fallacy.  Writing code isn't the problem.  Writing code that's etched in stone is the problem.  And mandating against writing code, which is merely the vehicle for the problem rather than the problem itself, very easily leads to not solving the problem at all, but instead just moving it somewhere else.  Somewhere sinister.  The data model.

We've been writing software for years, and we generally know how it goes.  First you build your database and model out your tables, then you write your code to sit on top of that.  Right?  Almost every developer still does this out of habit.  That's how everyone has always done it, so it must be the way.

Sadly, and at the cost of untold man-hours, it is not the way.  But it's just such common practice that people continue to behave in this manner out of nothing more than habit.  It's what they know, it's how they think, and it's a tried and true approach that management understands so it's the safe route.  (Safe for the developer, not for the ongoing maintenance of the software.)

What is essentially happening here is that the early attempt at solidifying the requirements is being etched in stone in the database instead of in the code.  And raise your hand if you think that refactoring a database later in the life cycle of the software is significantly more difficult than refactoring the code.  That's what I thought.

It all comes back to my favorite of favorites... separation of concerns.  You may be using proper IoC, you may be putting in hard assembly or even service boundaries between your layers.  But you haven't fleshed out all of those dependencies.  The overall structure, in every direction, still depends on its core.  And when you first begin designing the software you are essentially designing its core.  The choice is yours... Should the core be the data model or should the core be the domain model?

Let's go with the common approach, the data model.  You build your ER diagram, create your tables, map your keys, create your association tables for those pesky many-to-many relationships, etc.  You now have a core database upon which your software will sit.  Essentially, you now have this (pardon my crude diagrams):

[Diagram: the data model sits at the core, with the rest of the software layered around it.]
Your layers are separated, and that's all well and good.  But notice a subtle dependency there.  The overall shape of your software is governed by its core.  There's no getting around this, not unless you do what will likely amount to more abstraction than you need in a highly decoupled service architecture.  (Get ready for tons of DTOs and "class explosion" with that.)  Even if these layers are broken apart by assembly and dependency-injected and all that happy fun stuff, there's still the underlying fact that your software's core is its data model.  What happens if that data model ever needs to change, or if you need to move to a different data store entirely?  A lot of work, that's what.
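To make that concrete, here's a minimal sketch (the names are made up for illustration) of the shape data-model-first code tends to take: classes that mirror table rows, with the business rules living in loose functions on top of them.

```python
class OrderRow:
    """Mirrors an ORDERS table row: one attribute per column, no behavior."""
    def __init__(self, order_id, customer_id, status, total):
        self.order_id = order_id
        self.customer_id = customer_id
        self.status = status  # raw column value, e.g. 'OPEN'
        self.total = total

def cancel_order(row):
    """The business rule lives outside the object, keyed to column values."""
    if row.status != 'OPEN':
        raise ValueError("only open orders can be cancelled")
    row.status = 'CANCELLED'
```

Every rule like this one is welded to the column layout; rename or split a column and you're hunting through all the code that reaches into the rows.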

Consider instead shifting your core a little bit.  Imagine for a moment breaking that cardinal rule that "thou shalt not code first" and actually begin the design by creating your domain models.  In code.  What about the database?  You can figure that out later.  Or, at least in my case, hopefully a trained data modeler can help you figure it out later.  (Developers like to think we're also data modelers, but most of us just aren't.  A lot of that comes from the fundamental differences in design and function between object-oriented thinking in code and relational thinking in an RDBMS.)  Now, you have this:

[Diagram: the domain model sits at the core, with the data model as just another layer around it.]
The structural dependency is still there, but the core has shifted.  Your data model was built to accommodate your domain model, instead of the other way around.  With this approach, data persistence is simply an interface which interacts with the domain, no different than the UI or anything else that hooks into the central domain.  The idea here is to be able to refactor things more easily, especially in the data model (where significant growth can lead to unforeseen performance problems and scaling issues not evident in the original design), without impacting the entire system.
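As a rough sketch of that shift (hypothetical names, with an in-memory stand-in where a real data store would go), the domain model can define the persistence interface it needs, and the data model implements it later:

```python
from abc import ABC, abstractmethod

class Order:
    """Domain object: the data and the rules that govern it live together."""
    def __init__(self, order_id):
        self.order_id = order_id
        self.cancelled = False

    def cancel(self):
        if self.cancelled:
            raise ValueError("order already cancelled")
        self.cancelled = True

class OrderRepository(ABC):
    """A port defined by the domain; persistence implements it, not vice versa."""
    @abstractmethod
    def get(self, order_id): ...

    @abstractmethod
    def save(self, order): ...

class InMemoryOrderRepository(OrderRepository):
    """Stand-in adapter; a relational or document store plugs into the same port."""
    def __init__(self):
        self._orders = {}

    def get(self, order_id):
        return self._orders[order_id]

    def save(self, order):
        self._orders[order.order_id] = order
```

Swapping the data store now means writing a new adapter against `OrderRepository`, not reshaping the domain.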

Many times this boils down to a cultural problem, really.  Businesses have spent decades with the understanding that "the data is paramount."  While there is generally truth to this statement, it should not be stretched to mean that the database is the core of your system and the only thing that matters.  After all, the engine which drives that data plays a fairly critical role in your business.  Unless you're dealing with simple forms-over-data applications and simple rails-style interfaces, you would probably do well to consider the importance of all that business logic.

A common analogy in the English language is "putting the cart before the horse."  And you know how developers love analogies...  The cart is your data.  It's the payload that's being transported.  The horse is your engine.  It drives the data to and fro.  In the age-old struggle between cart-makers and horse-breeders there is a debate over which is the more important part of the system.  Without the horse, the cart doesn't move.  Without the cart, the horse has nothing to do.  Both are valid points to be sure, but when designing the system which natural construct ends up being the core?  No matter how well you abstract your horse-to-cart interface, there's still a natural architectural dependency in the system.  And it's a hell of a lot easier to build a cart that fits your horse than to breed a horse that fits your cart.


  1. Your first diagram assumes that they even have a domain model.

    I would like to add the phrase "smart data structures and dumb code works better than the other way around" from Eric S. Raymond.

    What I have noticed is that people who design the data model first seem to see the information as just that: data.  Then they produce the code that has to operate on that data for everything.  I personally see this as producing a dumb data structure, meaning that all of your code has to be smarter.

    In a domain model, things are object-oriented, which is to say the data is wrapped with intelligence.  All the other code interacts with this intelligence rather than with the data directly, meaning there's a lot less to worry about.

    Designing the database first doesn't mean you can't end up with a good domain model, but I think you are seriously hurting yourself.

    I will also say that going database first means that everything has to have a place in the database.  I am starting to find it annoying to have enum-style tables in the database, or tables that probably would have been fine as a hardcoded class.  Not everything has to be configurable.  Also, I don't like seeing too many enums being switch/if'ed upon outside of construction zones.

    Not to mention that I feel starting closer to how the user thinks about the problem helps greatly in producing more intelligent and useful software.

    Forms-over-data gives the user a data-centric interface and forces them to unintuitively translate the knowledge in their head into it.  The user most likely wants a task/workflow-centric method that aligns with that knowledge.
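One way to picture the commenter's "data wrapped with intelligence" (a hypothetical example, not from the post): a value object that enforces its own rules, so none of the surrounding code has to re-check them.

```python
class EmailAddress:
    """Smart data: validation and normalization live with the value itself."""
    def __init__(self, value):
        if "@" not in value:
            raise ValueError("not an email address: %r" % value)
        self.value = value.lower()  # normalize once, at the boundary
```

A dumb structure here would be a bare string, and every piece of code touching it would need its own checks.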

  2. One thing that irks me about designing the database first is that it assumes all data must be stored in relational tables in a single database. As you mention, making lookup tables for every little enum gets a little tiring. Also, some data just isn't relational. Period. If a data model includes a table with a key and an XML blob, that's a good indication that some of that data should be put into a document or object database, or maybe even flat XML files if it calls for that.

    Or, based on the needs of the software, maybe the whole of the data model should be partitioned into a couple different databases with considerably more focus placed on performance. Some parts of the business model may be very write-heavy and others very read-heavy. One big normalized structure wouldn't really account for the different "flavors" of data storage required.

    I definitely like your point about how the business users think in terms of process flow, not in terms of data-at-rest storage. Sitting in meetings with the business users and going over an ER diagram isn't necessarily the best use of their time, nor is it necessarily the best way to get business information from them.
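The lookup-table complaint in these comments can be sketched like this (a hypothetical example): a small, fixed set of values lives as a hardcoded enum in code, with behavior hanging off it in one place instead of being switch/if'ed all over the codebase.

```python
from enum import Enum

class OrderStatus(Enum):
    """A small, fixed value set kept in code instead of a database lookup table."""
    OPEN = "open"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"

def can_cancel(status):
    """Behavior keyed to the enum, defined once, instead of scattered ifs."""
    return status is OrderStatus.OPEN
```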

  3. Yeah, it's part of DDD that you try to work at their level, figuring out a common language with which to reliably communicate and think about their domain, and distilling that language, and the knowledge they communicate to you, into the application's domain model.

    Now that you have the domain model, you can build up processes around it that are valid, since you are interacting with an intelligent domain model that represents how they think and talk about what they do.

    You can't really understand what the persistence and reporting requirements are until you have started to model the domain.  Certainly an RDBMS (or swiss-army database) can provide for all of your needs, and an ORM can get you back and forth, but that doesn't mean it's the best option.

    In Applying DDD, Nilsson suggests starting with the domain and, during its development, taking breaks to think about and work on the persistence.  I think this is because the domain model really is the core, but you'll need to evaluate any compromises on persistence to keep yourself realistic.  Going for true persistence ignorance could dig you into some holes.