Friday, June 22, 2012

Documenting Assumptions

When estimating and planning a project, we're often told to "document our assumptions." It certainly sounds reasonable enough, and the resulting document is indeed an important one to create. For example, if there's a requirement that users can "upload forms" to a web application, and that requirement isn't any more specific than that, then I'm going to write down a few assumptions about that requirement. I'm going to assume that the "forms" are static files (such as Word or PDF). I'm going to assume that no validation will be performed and that the files will simply be saved as-is for a person to review. I'm going to assume that the application isn't providing the users with these files and that they're just something the user has. And so on.

Are these all of my assumptions? No, not at all. They're just the ones that I wrote down in the document.

The problem with "documenting your assumptions" arises when the document is referenced, not when it is created. It's referenced if and when something goes wrong in some way and we need to look back at our paper trail to figure out whose fault it is. (Which, when I put it that way, seems kind of... unproductive.) That's because it's a document which serves no other purpose than to cover the asses of everyone involved. It's created because we all know that something will come up during the course of the project that wasn't foreseen, and we want a paper trail indicating that it's not our fault.

And so it leads to the inevitable response... "This isn't in your documented assumptions." That's a cop-out, plain and simple. And it's a cop-out because "documenting your assumptions" is an unreasonable task. It's not unreasonable in the sense that we shouldn't do it. As I said, it's an important document. Some assumptions are critical because they help provide more insight into the situation. "PDF? Oh, no, that's not what I was picturing at all. Let's clarify so we can more accurately plan..." It's unreasonable because writing down all unknowns is logically impossible.

Think about it for a minute. Open up a text editor right now and write down everything you don't know. I could give you all the time in the world and the task wouldn't become any more possible to complete. You can't write down everything you don't know because you don't know what those things are.

Using such a document to share insights between team members and to uncover potential problems early in the process is productive and useful. Using such a document to cover our asses for the later inevitability that something unexpected will happen is at worst childish and at best a remnant of a strict waterfall development methodology, complete with red tape and bureaucracy.

Let's start documenting all of my assumptions for this project:
  • I assume that the client's IT infrastructure will not fail and block my work. Ok, that's reasonable. It's fair that if I'm on-site and their network goes down that I may be blocked and that it's their responsibility to unblock me.
  • I assume that the file sizes for any transferred documents will be reasonably small for uploading/downloading on the web. Again, that's fair.
  • I assume that the client wants to use a standard HTML input element (potentially styled to look better) for users to select a file. Well, why wouldn't we? Oh, someone's actually asked for something more custom than that before? Didn't they understand how web pages work? Oh, then I guess it's a fair assumption. I've just never thought to include it in one of these documents before.
  • I assume that the client's holiday/vacation schedule will not prevent me from interacting with key business stakeholders. Well... Ok. That one's a little uncommon, but I guess it makes sense when you think about it. If the client has a ridiculous vacation plan and employees are never there then I can't be expected to get anything done.
  • I assume that the building will not be overrun by jihadists. See, now this is just silly. Why is this even necessary? Well, it's an assumption. Take it or leave it.
  • I assume that the project sponsor's wife won't leave him and put him in an emotional state that makes working very difficult. Hey now, don't make this personal.
  • I assume that my wife won't leave me and put me in an emotional state that makes working very difficult. (Funny story, actually, but I digress.) Wait, how is this covering your own ass again?
  • I assume that gremlins won't... Ok, stop. This is stupid.

Sure, it is kind of stupid. But it's meant to illustrate a point. There are things we don't know, and so we ask for clarification. And then there are things we don't know that we don't know, and so we don't know to ask for clarification. Take that third one above for example. Would you have specified that? Would it even have occurred to you to ask? What other non-web things do you think a client might "have in mind" when they're writing their web application requirements for you? You might be surprised.

The fact is, we can't document everything we don't know. If you replace every request to "document your assumptions" with a request to "write down everything you don't know, including the things you don't even know that you don't know" then it becomes pretty clear how absurd the notion is.

And this is why we have agile programming. We fundamentally look at the problem differently. Instead of battling the futility of documenting everything that we don't know, we simply assume that there exist things that we don't know and we build the process around that assumption.

We're not settling for not knowing. We're not giving up on trying to know. We're just indicating openly and honestly that we recognize the fact that we don't know everything and that we will learn new things throughout the process which will cause us to change plans. (And, really, "open and honest" communication and transparency are a hell of a lot more productive than everyone trying to generate enough of a paper trail to cover their own asses.)

We will learn new things throughout the project. Instead of fighting those things from the start, embrace them when they happen. Assimilate them into the project so that we can keep moving forward. Time spent trying to document everything we don't know is wasted time. Time spent trying to figure out who we can blame for something is wasted time. Don't stall on what we don't know; just move forward with what we do know. Then adjust when the list of things we do know grows.

After all, the requirements and assumptions and all of the other documents are not reality. They are a perception of reality at a point in time. As we move forward with our efforts, we will learn more about reality. Perceptions will change. Reality, however, will not change. So let's adjust the process to make way for reality, instead of trying to fight reality to fit our own previous perceptions.

Monday, June 18, 2012

Ode to a Project

To report, or not to report, that is the question:
Whether 'tis Nobler in the mind to suffer
The Slings and Arrows of outrageous Scope Change,
Or to take Arms against a Sea of requirements,
And by delivering end them: to write, to wait
No more; and by a write, to say we end
The head-aches, and the thousand Natural shocks
That Projects are heir to? 'Tis a consummation
Devoutly to be wished. To write to send,
To send, perchance to Complete; Ay, there's the rub,
For in that completion of projects, what dreams may come,
When we have shuffled off this mortal software,
Must give us pause. There's the respect
That makes Calamity of so long projects:
For who would bear the Whips and Scorns of extensions,
The Oppressor's budget, the proud man's Architecture,
The pangs of reported Bugs, the Code's delay,
The insolence of Meetings, and the Spurns
That consultants merit of the unworthy takes,
When he himself might his Deadline make
With a bare Laptop? Who would Installations bear,
To grunt and sweat under a weary Production Release,
The undiscovered Country, from whose bourn
No Deployment returns, Puzzles the will,
And makes us rather bear those ills we have,
Than fly to others that we know not of.
Thus Status Reports do make Cowards of us all,
And thus the Native hue of Project Status
Is sicklied o'er, with the progress cast in Yellow,
And enterprises of great pitch and moment,
With this regard their Responses turn awry,
And lose the name of Agile. Soft you now,
The Project Manager? In thy Status Reports
Be all my code remembered.

Saturday, June 9, 2012

Not All Reference Types Are Entities

I'm working on a scheduling system as part of a side-project, and I couldn't help but notice something interesting about how I'm designing it.  The system has a kind of abstraction of events, as events aren't always simple "calendar events" in the traditional sense.

Originally the requirement was that "events need to be able to repeat."  Well, any calendar system can do that.  So I wanted to know why other calendar systems aren't meeting the need of the application.  With a little more back-and-forth in a small domain modeling exercise, it became clear that the requirement by itself didn't really state the business need.

As it turned out, events don't really need to "repeat" in the traditional sense.  More specific to the business need, events need to be able to have multiple instances within the same event.  For example, a particular "event" might happen "every day for a week" or "every Tuesday for a month" and so on.  And it can get pretty complex, such as "from 14:00 to 17:00 every Wednesday for a month and a half, except for the fourth instance because the venue has something else booked, so that one will be on Thursday instead, and will start at 14:30 instead of 14:00."  Just making an event be "repeatable" won't cut it.

The solution is simple.  Events don't have dates and times associated with them.  Instead, Events contain a collection of Sessions which represent individual "instances" in this case.  So the Event has a name and a description and other attributes, including a Location.  It also has a collection of Sessions which each have a Start date/time, a Stop date/time, and an optional overriding Location.
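A minimal sketch of that shape, written in Python rather than the project's C# just to keep it short.  The class names follow the post, but the field names, the `location_of` helper, and the sample dates are assumptions made up for illustration.  It builds the "every Wednesday for a month and a half, except the fourth instance" example; the 17:00 stop time for the moved session is also an assumption, since the requirement only mentions the new start time.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Location:
    name: str


@dataclass(frozen=True)
class Session:
    start: datetime
    stop: datetime
    location: Optional[Location] = None  # optional override of the Event's Location


@dataclass
class Event:
    name: str
    description: str
    location: Location
    sessions: List[Session] = field(default_factory=list)

    def location_of(self, session: Session) -> Location:
        # A Session uses the Event's Location unless it carries its own.
        if session.location is not None:
            return session.location
        return self.location


# Six Wednesdays, 14:00 to 17:00, starting on a hypothetical date.
first = Session(datetime(2012, 6, 6, 14, 0), datetime(2012, 6, 6, 17, 0))
sessions = [
    replace(first,
            start=first.start + timedelta(weeks=i),
            stop=first.stop + timedelta(weeks=i))
    for i in range(6)
]

# The fourth instance moves to Thursday and starts at 14:30.
sessions[3] = Session(datetime(2012, 6, 28, 14, 30), datetime(2012, 6, 28, 17, 0))

workshop = Event("Workshop", "Hypothetical example event",
                 Location("Main Hall"), sessions)
```

Nothing about the exception needed a special "repeat rule" here.  The odd session is just another item in the collection with different values.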

So the database structure is also simple.  An Events table, a Sessions table, and even a Locations table just for some little extra normalization (and because I plan to do more with Locations in the future in this system).  As I designed the code, however, something didn't sit right with the concept of identity on these data structures.  As a matter of reflex, I included an ID on the Sessions table.  You know, so the system can uniquely identify a Session.  But... why?

I was specifically thinking back to a previous project at work where entity identity was a significant problem.  In that project, a technical decision was made at some point prior to my involvement that every table in the database would have a GUID as an ID, and a software framework used that to uniquely identify all of the data entities.  This led to a pretty serious problem in the data because the business had a very different definition of "identity" for their entities.  An ID value (especially a GUID) meant nothing to the business.  They were thinking in business terms and defining what attributes of an entity identified that entity.  (Essentially, the business was thinking correctly and the technical design was artificially limiting them.)

So, what uniquely identifies a Session in my case?  Well, nothing important.  In fact, in the absence of an Event, a Session is meaningless.  My domain doesn't even need to fetch Sessions from a repository by themselves.  They should be attached as attributes to an Event when fetched from the Events repository, that's all.  They should never need to be fetched individually outside of the context of an Event.

That is... Sessions are not entities.  They are not, individually and atomically, a representation of a meaningful business concept.  They are attributes attached to an entity... the Event.  Sure, Sessions have their own table in the database.  (This is a technical concern, not a domain concern.)  They even have an ID to uniquely identify them.  (This is a technical concern, not a domain concern.)  They are even reference types in the code, not value types.  (This is a technical concern, not a domain concern.)  But they are not an entity in and of themselves.  (This is a domain concern.)

At the level of the programming language being used (C#), they are not value types.  But the business isn't concerned with the intricacies of C#.  The business is concerned with the domain.  And as far as the domain is concerned, Sessions are value types.  You don't care which Session you're talking about, and if you blow one away and replace it with another one of identical values then the two are indistinguishable.  The values are all that's important, not the unique identity thereof.
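To make "indistinguishable" concrete, here is a small sketch, again in Python as a stand-in for the C# in the actual project.  It assumes a cut-down `Session` with only a start and a stop.  A frozen dataclass generates equality and hashing from the field values, which is roughly the behavior the domain wants even though each instance is still a separate object in memory.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Session:
    start: datetime
    stop: datetime


a = Session(datetime(2012, 6, 6, 14, 0), datetime(2012, 6, 6, 17, 0))
b = Session(datetime(2012, 6, 6, 14, 0), datetime(2012, 6, 6, 17, 0))

same_value = (a == b)    # compared field by field
same_object = (a is b)   # two separate objects at the language level
deduplicated = {a, b}    # equal values collapse into one set member
```

Here `same_value` is true while `same_object` is false, which is the split between the domain's view and the language's view.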

In real life, contrast this with something like a human being.  Have you ever known someone with the same name as you?  The same birthday?  The same address (like a family member)?  Any other identical attributes?  It's unlikely that you would know someone who shared all of these attributes with you, but it's not impossible.  You may need to explicitly seek out such a person and manually line up all of your attributes, but it can be done.  (Within reason of the attributes, of course.)

Does that mean the two of you are now the same person?  No, not at all.  You are unique entities.  Your attributes are simply values, they are not the entity itself.  Values are often used to identify an entity in the absence of a unique identifier.  (For example, business users may uniquely identify customers by their phone number.  This may be good enough for a particular business, even though it's possible to have collisions.  The phone number isn't the actual identity of the person, it's just a value used by the business to distinguish customers.)  But they're just values, not identity.
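The contrast with an entity can be sketched the same way.  This is an illustration only: the `person_id` field is a hypothetical identifier (the business might well choose something else), and the names and dates are invented.  The point is that equality for an entity ignores the attribute values and looks only at identity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(eq=False)  # do not generate value-based equality
class Person:
    person_id: int    # hypothetical identifier, an assumption for this sketch
    name: str
    birthday: date
    address: str

    def __eq__(self, other):
        # Entities are equal only when they share an identity.
        return isinstance(other, Person) and self.person_id == other.person_id

    def __hash__(self):
        return hash(self.person_id)

    def attributes(self):
        return (self.name, self.birthday, self.address)


me = Person(1, "Pat Smith", date(1980, 1, 1), "1 Main St")
namesake = Person(2, "Pat Smith", date(1980, 1, 1), "1 Main St")
me_after_moving = Person(1, "Pat Smith", date(1980, 1, 1), "9 Elm St")

same_attributes = me.attributes() == namesake.attributes()
same_person = (me == namesake)
still_me = (me == me_after_moving)
```

Two people with identical attributes stay distinct, and one person with a changed attribute stays the same person.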

In this project, Events will have a unique identifier as well.  An ID column in the database, which is a simple incrementing integer.  It's likely that the business will internally identify Events differently, of course.  And the software will need to account for this.  A combination of values may be used to identify an Event, including perhaps even the collection of Sessions.

But a Session by itself doesn't need an identity.  No more so than your address needs an identity.  Your house has an identity, and the value of its address is the most common way to identify it.  But the address itself doesn't need an identity.  It's not the entity, it's just an attribute value.

This actually reminds me of another project I worked on some time ago when I was working in North Carolina.  We were modeling a fairly complex data model for a project and we brought in the database guy to help us.  He went about doing very database-y things, including standard relational normalization.  A lot of what he showed us was very helpful.  The concept of super-typing tables to achieve a kind of inheritance model in the data was new to me, for example, but made a lot of sense and made the design much cleaner and simpler.

But there was one case where his model didn't make sense to me.  Naturally, there was an Addresses table to store the addresses of various other entities.  People, client businesses, anything that had one or more addresses as an attribute of it.  And, being a relational database expert, he naturally normalized that data.  But he took it a step further.  His goal was to prevent data duplication within the Addresses table.  So if two or more other entities had the same address, they should refer to a single record in that table.

Should they?  This is where I disagreed with the design.  At the time I articulated the concern with a simple use case... Suppose two people share a mailing address.  For example, two business contacts at the same office.  One of those people moves (transfers to a different office).  So someone updates his address.  But wait... They just updated the address for both people, and for that office, and for any other entity using that address.  This is no good.
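The failure is easy to reproduce with plain dictionaries standing in for the tables.  The table layout, names, and street values below are all made up for illustration and are not the schema from that project.

```python
# One shared row in a hypothetical Addresses table.
addresses = {1: {"street": "100 Office Park Dr", "city": "Raleigh"}}

# Two contacts pointing at the same address row.
people = {
    "alice": {"address_id": 1},
    "bob": {"address_id": 1},
}


def address_of(person):
    return addresses[people[person]["address_id"]]


# Bob transfers to another office, and someone edits "his" address.
address_of("bob")["street"] = "200 Branch Ave"

# Alice never moved, but her address changed along with his.
alice_street = address_of("alice")["street"]
```

After the update, `alice_street` holds the new street as well, because there was only ever one record to change.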

The conclusion was to simply allow data duplication in the Addresses table.  The more we thought about the technical implementation, the more inescapable that conclusion became.  There was still some mild objection to data duplication, but the objectors couldn't think of a more elegant solution.

There's a reason they couldn't.  They were trying to do something against the domain.  The domain made it very clear that addresses were not entities.  They don't exist by themselves, they don't have any individual meaning, and they don't need to have identity.  Addresses are values, not entities.  They exist only as attributes to entities.  If one is used more than once, that's ok.  If you delete one and replace it with another one of the same value, it's the same one.  (The technical implementation will need to maintain the relationship, of course, but that's a technical concern and not a domain concern.)
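Treating the address as a value might look like the sketch below.  It is one possible reading of that conclusion, with hypothetical `Address` and `Contact` types: the address is immutable, each owner holds its own value, and "moving" means assigning a different value rather than editing a shared record.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Address:
    street: str
    city: str


@dataclass
class Contact:
    name: str
    address: Address


office = Address("100 Office Park Dr", "Raleigh")
alice = Contact("Alice", office)
bob = Contact("Bob", office)

# Bob transfers: he gets a new address value, and nothing is edited in place.
bob.address = replace(bob.address, street="200 Branch Ave")

# Deleting an address and recreating it with the same values changes nothing.
recreated = Address("100 Office Park Dr", "Raleigh")
unchanged = (alice.address == recreated)
```

Alice keeps her original address, and the recreated value compares equal to the one she already had.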

Just because something has a POCO in your software doesn't mean it's an entity.  Just because something has its own table in your database doesn't mean it's an entity.  The domain defines what is an entity and what is not.  The technical implementation needs to reflect the domain's definitions, not present its own definitions in the name of some convention or habit of the technical implementors.