Friday, November 2, 2012

Single Responsibility, or It's OK To Duplicate Code

The Single Responsibility Principle doesn't mean that code should "do one thing." Nor does it mean that any given piece of functionality should exist in only one place. It means that code should have one, and only one, reason to change.

But what is a reason to change?

A reason to change can be any number of things... A new requirement, a clarification, an additional feature. It can come from any part of the business. To understand what a reason to change is, we need to understand what a responsibility is. Because the responsibility is what gives us the reason to change.

The responsibility addressed by the Single Responsibility Principle is the role to which the code answers. It's the owner of that piece of functionality within the business. It's the person who gives you the new requirement that drives a change to the code. And, as stated by the principle, any given piece of code should have only one owner.
The responsibility is not what the code does, it's for whom the code does it.
Let's say, for example, you're building a handful of web applications to be used by a handful of different lines of business within a company. Each line of business has its own leadership, and within each one there are departments with fairly autonomous leadership, etc. Ultimately, there is a whole slew of stakeholders each claiming some measure of ownership throughout the code.

Now, there's a piece of functionality that's somewhat shared among several of these groups. The first instinct is to avoid duplication and increase code re-use by making this functionality once and having everybody use it, right? Well, sure, that makes sense. And it's a worthy goal. But at what cost do we sometimes pursue that goal?

Note that I said it's "somewhat" shared. There are differences. At first it seems the same for everyone, so you design it once and re-use it everywhere. You build the system around this re-use. You assume it in everything you do. But then something happens. One of the lines of business adds a requirement to this module.

Now what do you do? Let's look at some common options...

1. You put conditional checks in the code so that it behaves one way sometimes and another way other times. First of all, this sounds a lot like a violation of the Interface Segregation Principle as well. You're essentially attempting to create one catch-all interface to handle multiple use cases. Those conditionals will add up, and they will make support and debugging difficult. How many different things does that one module need to do? How many code paths does it need to have? How are you going to test all of those? (A sketch of where this tends to lead follows the list below.)

2. You tell the stakeholder that they can't make that requirement because "the system wasn't designed for it." I've seen a lot of developers convince stakeholders of this nonsense. They sure as hell can make that requirement; the developer is just too lazy or too committed to his own design to actually meet the business needs, which should be his priority in the first place. The stakeholder shouldn't buy into that excuse, not one bit.

3. You make the change. And, of course, the other stakeholders for the other roles to which this code is responsible are going to see that change and wonder why someone who doesn't own their functionality just changed it. And you're going to have some explaining to do, probably laced with more excuses.
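
Here's a hedged sketch of where that first option tends to lead. Every name in it (LineOfBusiness, Invoice, SharedInvoiceFormatter) is hypothetical, a stand-in for whatever your shared module actually does:

using System;

public enum LineOfBusiness { Retail, Wholesale, Government }

public class Invoice
{
    public decimal Total { get; set; }
    public string PurchaseOrderNumber { get; set; }
}

public class SharedInvoiceFormatter
{
    public string Format(Invoice invoice, LineOfBusiness owner)
    {
        var header = "INVOICE";

        if (owner == LineOfBusiness.Government)
        {
            // The new requirement from one line of business lands here...
            header += " / PO " + invoice.PurchaseOrderNumber;
        }
        else if (owner == LineOfBusiness.Wholesale)
        {
            // ...the next one lands here...
            header += " (NET 30)";
        }
        // ...and every branch after that is another code path to support, debug,
        // and re-test on behalf of stakeholders who never asked for any of it.

        return header + Environment.NewLine + "Total: " + invoice.Total.ToString("C");
    }
}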

The simple fact of the matter is that the Single Responsibility Principle was violated in this case. Even if you go with the first option and the other stakeholders are kept in the dark, you still changed their module. Their use cases still need to be re-tested. (Or are you just going to assure everybody that it works and there's no need to re-test it? I hope QA doesn't buy into that nonsense.)

Because of this violation, any change to that one piece of code is going to require re-testing and re-validating a lot of stuff because there are too many responsibilities in play. And every separate business role/entity who has a stake in that code has every right to demand to know why their module was changed to support someone else's requirement.

Now, this is coming off a little overly harsh, and I don't intend it as such. The developer's mistake might not have been in the code. Not until he started to make excuses for it, anyway.

Prior to the new requirement, everything was abstracted into a common module. Duplication was eliminated, all was well. It was the advent of the new requirement that changed this. If the developer had no reason to think that this functionality would ever change then there was no reason to duplicate it. But once that requirement came in, the developer had a choice. Refactor out the new responsibility into a separated concern, or throw a quick fix at it and make excuses.

How many times have we seen the latter? How many times have we done the latter?
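
For contrast, here's a sketch of the refactoring option, reusing the hypothetical Invoice type (and using directive) from the sketch above. Each owner gets its own formatter behind a shared interface, and the new requirement lives with the role that owns it:

public interface IInvoiceFormatter
{
    string Format(Invoice invoice);
}

public class RetailInvoiceFormatter : IInvoiceFormatter
{
    public string Format(Invoice invoice)
    {
        return "INVOICE" + Environment.NewLine + "Total: " + invoice.Total.ToString("C");
    }
}

public class GovernmentInvoiceFormatter : IInvoiceFormatter
{
    public string Format(Invoice invoice)
    {
        // The government-only requirement lives here, and only here. A change for this
        // stakeholder touches nothing that anybody else owns or has to re-test.
        return "INVOICE / PO " + invoice.PurchaseOrderNumber + Environment.NewLine +
               "Total: " + invoice.Total.ToString("C");
    }
}

Yes, the two Format methods share some keystrokes. Whether that counts as duplication is exactly the question that comes next.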

I know what you're thinking... Won't this lead to duplication? Isn't duplication bad? Well, yes. Duplication in code is bad. But let's define duplication:
Duplication in code happens when two pieces of code do the same thing for the same reason.
That last part is the key thing to notice here. It's not entirely duplication if it's doing the same thing for a different reason. And that reason may be that it's a different responsibility. Again, we strive throughout our careers to eliminate duplicated code... but at what cost?

Oftentimes duplication is orthogonal to coupling. Consider the above example. That module tightly coupled those lines of business. Those individual responsibilities, those separate concerns, were inextricably joined together. Change one, the others change with it. Sure, we can hide that change behind conditionals and such, but that doesn't fix the problem. It only creates more problems.

How far are we willing to push the cause of removing all duplication? After all, looking through my code I sure do have a lot of loop constructs. I have a ton of these (with better variable names, of course):

foreach (var thing in things)

Is that duplication? Should I strive to eliminate all of that duplication to avoid having those same keywords and keystrokes over and over again throughout the code? Of course not, that's silly. It's taking the notion to an unnecessary extreme.

So, having established that there's an unnecessary extreme... And assuming you'll stipulate that the opposite end of the spectrum, not avoiding duplication at all, is also a bad thing... We are left with a spectrum. And as with any spectrum, there's a pretty wide range of possibilities. To lean to one side and say that all duplication must be avoided is to do so at the cost of tight coupling.
Duplication is orthogonal to coupling.
It's ultimately a matter of cost. What will it cost to support that code? To oversimplify the cost analysis...

If we duplicate the same responsibility across multiple modules (obviously undesirable but included here for completeness) then support costs for changes to that responsibility are O(n) where n is the number of times it's duplicated.

If we religiously collapse all duplication into single pieces of globally shared code, thereby increasing coupling, then support costs for changes to any one responsibility are O(n) where n is the number of responsibilities in that code.

If, when we encounter a second responsibility, we refactor out the code which is owned by that responsibility (duplicating some things, if we count duplication by keystrokes alone), then there is an up-front development cost that's O(1): the refactoring itself. From then on, the cost of any given change to that one responsibility is also O(1).

I'm no mathologist, but the one-time up-front cost sounds a lot more attractive than the unknown ongoing cost. Sure, as developers the second option to abstract everything is more appealing. It's a more interesting puzzle to solve. (Sidebar: I once heard a great quote on that... "With enough levels of abstraction you never need to solve the actual problem." -Java Architect) But solving interesting puzzles isn't why we're here. You can always solve that one on your own time. (And if, in doing so, you come up with something even better for the business then kudos.) We're here to write software that supports and enhances the business.

Ok, I know the math was an oversimplification. It's just there to illustrate the point. In the long run, separating responsibilities into their own modules reduces support costs. Even at the cost of duplication. After all, it's not really duplication if two pieces of code are doing the same thing but for different reasons.

The Single Responsibility Principle and the Interface Segregation Principle are more valuable to the codebase than one developer's desire to do something clever.

Tuesday, September 25, 2012

iPhone vs. Android? Seriously?

I just had a brief and interesting conversation with a friend on Facebook about the usual iPhone vs. Android debate. This time it was different, though, because I don't think either of us exhibited any measure of "fanboyism" and instead just had a fairly calm and mature discussion on the topic. I know, what the hell, right?

But it left me thinking about the whole debate from a 10,000-foot view. It's all the usual stuff as you'd expect. Apple just released a new product, the market is abuzz, the sales are soaring, and everybody who doesn't ride that train is touting their own warez as being superior in various ways. The argument is tired, but it's important.

An interesting statement was made, though. "Android is working its way into everything." It reminded me of the various pie charts we've all seen over the past few quarters, watching Android's install base creep up past the iOS install base. And now Android fans love that pie chart. Because it proves that their system is winning, right?

Doesn't it? Or... does anybody really care?

Sure, Android has the larger install base. And it will continue to have that. But so what? Where are the profits? Where is the brand recognition? Does anybody who owns an Android device know or care what Android is? Or that Google is behind it? They don't call it Android, they call it by the device name. Take that same pie chart and add the dividing lines between the actual devices, then which piece is bigger?

On the other hand, when somebody owns an iPhone they know it's an iPhone. They know it's Apple. They know that they want to buy other things from Apple. They go to an Apple store and talk with Apple people about the Apple products that they're buying from Apple. The brand is right there in everything they do. And so are the profits.

Six years ago, Apple pulled a new product out of their collective ass. They conjured it out of thin air. Quite literally any competitor in that space could have done the same thing. Microsoft, Google, Nokia, Samsung, Sony, Blackberry, Palm, anybody. But they didn't. Apple did. And that one product unquestionably turned the mobile phone market on its ear. Overnight everybody suddenly had a lot of catching up to do.

And they've caught up. Good for them. Competition is a good thing. Well, at least the technology has caught up. But the money sure hasn't. Look at what Apple had going at the time...

Rewind six to eight years and look at the iPod. How close was that to 100% market saturation for the MP3 player consumer base? It was a household word, for God's sake. There were tons of other MP3 players out there, but who cared? They were all "non-iPods." The iPod was the market leader by a wide margin. That one simple product revolutionized and dominated the MP3 player market.

Then Apple rode that capital into the iTunes Music Store. The music industry in general was a bit more complex and had bigger players in it, but Apple still rocked that boat. The people had spoken, and what they said was that they wanted to download music. Everybody else in the industry fought the market, while Apple capitalized on it. They opened up the iTunes Music Store, told the music industry that there's a new set of rules in town, and the market went with it. Profits soared.

Then Apple rode that capital into the iPhone. This one single product has since dwarfed the entire corporate empire of Apple's longest running rival, Microsoft. And it didn't take long for that to happen, as opposed to the decades Microsoft has been riding the enormous former gap between the two. And again, it was a revolutionary product. Again, overnight everybody had a lot of catching up to do.

And while everybody was catching up in that market, Apple had a few more tricks up their sleeves. Next in line was the MacBook Air. Sure, it didn't have quite the impact that the iPhone had. Not by a long shot. But what did it do to the ultra-light laptop market? Was there even an ultra-light laptop market before that? Sure there was, but it wasn't serious. Then that Air came out, and it was beautiful. The price was at a premium though, mostly because it was the first of its kind and the technology (SSDs anyone?) was pretty pricey at the time. And what has the market done since then? Have you looked at Dell's ultra-light laptops recently? They look remarkably similar to Airs. Everybody's do. Because they're playing catch up.

Ok, let's let that market play catch up for a while. In the meantime, we've got an iPad to release. A what now? An iPad. You know that fledgling tablet market that everybody's been trying to open up for years? Apple opened it up. Wide. And dominated it. Just as revolutionary to the tablet market as the iPhone was to the mobile phone market, the iPad quickly became the standard against which all others would be measured. And all others have been playing catch up ever since. Have you used any of those "competing" tablets? Most of them are pretty awful. And so another entire market/industry was turned on its ear.

I'll grant you that the iPhone 5 isn't revolutionary. It's a pretty standard upgrade from the iPhone 4S. (I'm still buying one, of course, mainly because I don't have a 4S. I have a 4. And the difference between the 4 and the 5 is staggering.) I'll grant you that Android has a wider install base. I'll grant you that Windows 8 and Windows RT are exciting and innovative. (About time, Microsoft. The last time this happened was Windows 95.) I'll fully stipulate that Apple didn't revolutionize any markets or turn any industries on their ear this year.

But, again, so what? Do you really expect them to change the world every year? Feel entitled much? Ok, so maybe this post reads like Apple fanboyism, and maybe the language I've used supports that. I'll admit to being a fan of Apple products. I enjoy the brand, and I'm fairly loyal to it. And isn't that the point? Don't take my word for it, don't even take those various install base pie charts' words for it. Listen to the one thing we're all in this for... Listen to the money. Riding wave after wave of revolutionary product, building a line-up that dominates everything it touches and a brand that permeates the very lives of its legions of customers, Apple has risen to become the most valuable corporate entity in the history of capitalism.

If that isn't the bottom line, what is?

So, sure, Apple didn't innovate much this year. Their captain died at the helm. Wouldn't it be somewhat noble of us to grant them quarter before continuing the fight? Ok, while that was a nice analogy (and I've been looking for an opportunity to work "granting quarter" into a conversation for a while now), it doesn't really matter. They're still riding the biggest wave around.

This year they're leaning pretty heavily on the brand and the marketing. So? They built that wave, let them ride it for a moment. They're still on top. A single product release didn't send shockwaves rippling through the industry, but it's a bit early to be sounding the death knell for anybody sitting on that kind of capital. The anti-Microsoft crowd (myself included, of course) has been trying to toll that bell for years. But that mountain of money is tough to move. And cute little toys like Linux just don't have the market force to move it.

Speaking of Linux, isn't that what Android is made of? Ah, yes, that's what we were talking about. Android devices vs. Apple devices. Or, again, a vast sea of varied devices vs. a unified and immensely successful brand. Is that really a competition?

Technologically there are plenty of devices which compete with or even surpass Apple's devices, at least by some measures. Maybe not all, but some. I've still yet to find one that has the same... je ne sais quoi... as Apple's devices. They compete or even surpass on a piece here and there, but you'd be hard pressed to demonstrate one that presents as compelling an overall experience. I guess the price points help with that, though. The competing ones are cheaper. So if you want something with a better Feature X than an iDevice, you can find one at a better price. As long as Feature X is all you care about, I guess.

But I'll say again... so what?

Remember that scene at the end of Pirates Of Silicon Valley when Steve Jobs realized that Bill Gates had beaten him to market saturation?  That Microsoft had won?
Steve Jobs: We're better than you are! We have better stuff.
Bill Gates: You don't get it, Steve. That doesn't matter!
My how the tables have turned.

You Can't Help Everybody

I love Stack Overflow, that much is certain. But why? What is it that makes Stack Overflow such an attractive place for professional software developers? What is that quality contained therein which just makes the experience... better?

It's the signal-to-noise ratio, hands down. I've been on forums, I've been in newsgroups, I've subscribed to email lists, etc. They all suffer from the same problem... noise. And lots of it. Where does this noise come from? Well, the Internet of course. But these things are on the Internet, just like Stack Overflow is. So how is it that they get all of this noise and we don't?

There's something about the Stack Overflow community which, while not acting as a "walled garden" by keeping the community close and private, does present a barrier to entry for noise. It's a very simple yet radically unconventional approach to an online community. Quite simply, not everybody is welcome at Stack Overflow.
Not everybody is welcome at Stack Overflow.
I told you it was radically unconventional. It even sounds rude, doesn't it? But it's true, and it works. Don't get me wrong, we welcome everybody by default. But not everybody belongs there. And the community does a fantastic job chasing away unwelcome participants.

Throughout other online communities, numbers are important. More users means more advertising revenue, more content, more incentive for even more users, etc. So not only is everybody welcomed, but the maintainers do whatever they can to try to cater to everybody. Adding countless features, making countless modifications, etc. And what ends up happening is that they lose the focus of the community itself. They dilute the purpose in order to try to cater to a wider audience.

By trying to be everything for everybody, they end up being nothing of substance to anybody.

Stack Overflow takes a different approach. We vehemently defend the core purpose of the community, and while we welcome anybody who wants to participate we also reject anybody who doesn't want to be a part of that core purpose. The purpose itself is simple... To provide a place for software developers to ask questions and get answers about the software they're developing.

To illustrate, I came across an interesting example today of somebody who isn't welcome in our community. Take a look:
To be fair, some of the comments came off as a bit rude. We do have a problem with that. But this is hardly the worst of it. The real problem here, clearly, was the person asking the "question." They didn't want to clarify, they didn't want to try to work toward a solution to their problem, they just wanted to argue.

This person was a clear example of a help vampire. They were more interested in arguing about the question than in improving it, and it desperately needed improvement. Where does this sense of entitlement for help from others originate? The Internet, I guess.

So, not having received... whatever it is this person was looking for, they rage-quit. And that's ok. Nobody from Stack Overflow is going to contact them. Nobody is going to chase them down and try to make nice. The community will do just fine without them. Essentially, the creation of content like this is unwelcome. (Though I can at least give the user a tip of my hat for pro-actively deleting the question, otherwise we would have had to clean up after them.)

Is the user now no longer welcome at Stack Overflow? That's up to the user. It's really the content that's unwelcome. So if the user continues to submit such content then the user would demonstrate themselves as being unwelcome. If the user improves the content, then the user becomes welcome. The point is that it's the user's decision, not ours. We simply represent the community and the standards therein. The user can participate or move along. (Or rage quit, if they'd rather do that.)

We're not out to help everybody. We're out to help each other. Anybody is welcome to be a part of that if and only if they have the same purpose... to help each other.

Monday, September 10, 2012

Clean Code Screencast

I've finally finished recording and moderately editing my first screencast. As I mentioned in a previous post, this was considerably more difficult and time-consuming than expected. By comparison, giving a presentation in front of a live audience is much easier and far less aggravating. I had to break this into segments because doing it all in one take just wasn't possible, and then edit those segments together after multiple takes of each.

The outcome also isn't as clean as I'd eventually like it to be. There are some audio problems which I've identified and need to work on. I also don't like the preview image that YouTube is using by default. Overall, there's touch-up work I'll have to get better at over time.

In any event, it's done. This was originally something I'd worked on with a colleague or two, and the original version was internal to my employer. After it was well-received there, and then again by a second internal audience, I decided to clean up the whole thing and start to build what might be a small series of these.

The presentation itself is a simple introduction to Clean Code, inspired by Robert C. Martin's book of the same name. It's a pretty simple and straightforward review of some of the core concepts presented in the book. For much more in-depth information (from a better public speaker with better videos) I highly recommend Uncle Bob's Clean Coders video series. They're not free, but a personal license is worth every penny.

There are additional presentations I'm working on as well, and I hope to continue to practice these screencasts over time. But for now just finally finishing this one is a huge sigh of relief for me. Lessons have been learned, practice has been had, and I hope to get better at this.

So without further ado, here's the screencast. Enjoy!

Thursday, September 6, 2012

Testing Private Methods

You shouldn't test private methods.

There, wasn't that simple? Perhaps too simple, so allow me to explain...

This debate seems to pop up on the internet and amid various developer groups from time to time. Should we test private methods? It's a simple yes-or-no question (or, even simpler, it's a "no" question), and there seems to be an encampment of zealous opinions on both sides. I, of course, am no exception to that. The thing is, I've yet to hear any good argument in favor of such a practice. And I don't believe there is one.

First of all, consider the argument that we should "test everything." With that statement I wholeheartedly agree. But how do you define "everything"? Certainly you should test all outward-facing (read: public) functionality. So, by extension, this would also test all private functionality, would it not? If you have private functionality that isn't being used by the public functionality... Then why do you have it? If nothing is using it, get rid of it.

Suppose you do test your private methods, perhaps through some trickery of the language or some reflection over assemblies or whatever. Then what do you do if your implementation changes?

Let's say you have some repository which uses a database and you write tests against that functionality. But the repository also has private helper methods for handling common database tasks, so you write tests against those as well. Now let's say you swap out that repository implementation with one that saves to an XML file. The public functionality should be identical so as to not be a breaking change to the rest of the code. But it is a breaking change to the tests. The private helper methods are all different, so the tests need to be updated in order to test the same interface.

This, of course, is unacceptable.

Private is just that... private. You mark something as private when you feel that it's a detail which needn't be known to the rest of the code. It's not part of the interface. Well, the tests are part of the rest of the code. They're not special. They're not some one-off thing that hangs onto the side of the codebase. They're classes and methods like anything else. If something is private, your tests shouldn't know about it. And if they shouldn't know about it, then certainly they shouldn't try to directly use it.
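
To put the earlier repository example in code, here's a minimal sketch with hypothetical names (Customer, ICustomerRepository, DatabaseCustomerRepository) and NUnit. The test knows only the public contract; whatever private helpers the implementation uses, and whether it talks to a database or an XML file, are invisible to it:

using System.Collections.Generic;
using NUnit.Framework;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ICustomerRepository
{
    void Save(Customer customer);
    Customer GetById(int id);
}

public class DatabaseCustomerRepository : ICustomerRepository
{
    // Stand-in storage; imagine private helper methods doing real database work in here.
    private readonly Dictionary<int, Customer> _store = new Dictionary<int, Customer>();

    public void Save(Customer customer) { _store[customer.Id] = customer; }
    public Customer GetById(int id) { return _store[id]; }
}

[TestFixture]
public class CustomerRepositoryTests
{
    [Test]
    public void ASavedCustomerCanBeRetrievedById()
    {
        // Swap in an XML-backed implementation here and this test doesn't change at all.
        ICustomerRepository repository = new DatabaseCustomerRepository();

        repository.Save(new Customer { Id = 42, Name = "Test Customer" });

        Assert.AreEqual("Test Customer", repository.GetById(42).Name);
    }
}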

Seriously, can anybody present a compelling reason why a private method should ever be tested? Why code which by design is entirely unknown should somehow be examined?

Thursday, August 30, 2012

Recordings Are Coming, I Promise

I've given this Clean Code presentation of mine at work a couple of times. It's pretty simple, just a sort of intro into Robert Martin's book, and it's mainly targeting junior developers. But it's been very well-received at work and so I figured I'd do a screencast of it to share more widely. Indeed, I've been wanting to do screencasts of other things as well, so this will be a good first step into that medium.

As it turns out, this is really bloody difficult to do. I've been recording take after take, always finding something wrong. Editing isn't exactly a perfect process either, since I'm not satisfied with it if it doesn't seamlessly line up. So I continue to try these takes over and over.

A phone rings in the background, the kids come into the room and require attention, I stammer or make a mistake, the battery on my presentation tool runs down, I try taking the laptop somewhere else to do a quick recording attempt but the battery runs low, other system notifications pop up, etc.

This is hard. By comparison, actual public speaking is a piece of cake. You just do it and it's done. But privately recording something for public consumption, that's a whole other beast entirely. When standing up in front of an audience, a stammer quickly becomes a forgotten thing of the past or a mistake can be corrected. On a persisted medium like a video, that's not acceptable. It's there forever for all to see. Perfection is more critical.

This is going to take longer than I expected.

Friday, August 17, 2012

Addresses in a Post-Google World

It comes up a lot in business applications... How should we design a form to accept addresses? We see all kinds of examples. There's the baseline simple approach:
  • Address
  • City
  • State
  • Zip
Then of course people want to add a second address line, so you end up with the very poorly-named variant:
  • Address1
  • Address2
  • City
  • State
  • Zip
And then, almost inevitably, it spirals out of control from there. Should we separate street number from street name? Street type? County? Country? Should we pre-populate from a table of zip codes? Where do we get that data? How do we keep it up to date? What about other address elements? What about international addresses? Strange middle-of-nowhere addresses that don't fit this model? How does the post office do it? Can we copy their format? Can that format work without the miles of red tape and bureaucracy to support it?

It goes on and on. And, thinking about it now, it's all terribly silly. Why are we still trying to solve this same problem over and over? Every company ends up with its own formats, every format gets broken by some edge case at some point, etc. It ends up being a lot of work, and what's the gain?

Is there a simpler, more universal way?

Well, yes. Google already did it. (And Bing, and MapQuest, etc.) The form is actually quite simple and intuitive:
  • Address
Done. That's it. The user knows how to type their address. They've done it a million times. You don't have to instruct them on the various components that you think their address should have. Just let them type what they've written for years. And you have their address. The address they use as they know it.

But what about the components? What if I want to report on this data somehow and filter records by state or zip or some other metric? Well, that's your problem. It's not the user's problem. Don't give them an ugly form that might not even work for them just because you have some back-end requirement that isn't meaningful to their user experience.

This doesn't mean you can't get this granular data. It just means that you don't need to get it from the user. There are plenty of address locating and geocoding services out there which will give you a hell of a lot more structured data than whatever random business user you assigned to this project would have designed in their bad UX form. Get this data behind the scenes. Don't trouble the user with it.

For example, let's look at the data you get from Google when searching an address:
{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "1600",
               "short_name" : "1600",
               "types" : [ "street_number" ]
            },
            {
               "long_name" : "Amphitheatre Pkwy",
               "short_name" : "Amphitheatre Pkwy",
               "types" : [ "route" ]
            },
            {
               "long_name" : "Mountain View",
               "short_name" : "Mountain View",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "Santa Clara",
               "short_name" : "Santa Clara",
               "types" : [ "administrative_area_level_2", "political" ]
            },
            {
               "long_name" : "California",
               "short_name" : "CA",
               "types" : [ "administrative_area_level_1", "political" ]
            },
            {
               "long_name" : "United States",
               "short_name" : "US",
               "types" : [ "country", "political" ]
            },
            {
               "long_name" : "94043",
               "short_name" : "94043",
               "types" : [ "postal_code" ]
            }
         ],
         "formatted_address" : "1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA",
         "geometry" : {
            "location" : {
               "lat" : 37.42310540,
               "lng" : -122.08239880
            },
            "location_type" : "ROOFTOP",
            "viewport" : {
               "northeast" : {
                  "lat" : 37.42445438029150,
                  "lng" : -122.0810498197085
               },
               "southwest" : {
                  "lat" : 37.42175641970850,
                  "lng" : -122.0837477802915
               }
            }
         },
         "types" : [ "street_address" ]
      }
   ],
   "status" : "OK"
}
That sure is a lot of very structured data. Was your random business user who defines the requirements for that form going to be able to get all of that out of the user? What would that form have looked like? It even gives back a "formatted address" in response to something otherwise unformatted based on what it found. So even if the user puts in something that looks messy you can replace it with something cleaner. You even get coordinates for God's sake.
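
As a rough illustration, here's a hedged sketch of pulling that structure out behind the scenes from a single free-form address field. It assumes the Google Geocoding JSON endpoint that produced the response above (plus whatever API key the service requires these days) and uses Json.NET for parsing; substitute whichever geocoding service and JSON library you actually use:

using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public class GeocodedAddress
{
    public string FormattedAddress { get; set; }
    public string State { get; set; }
    public string PostalCode { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class Geocoder
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<GeocodedAddress> ResolveAsync(string whateverTheUserTyped)
    {
        var url = "https://maps.googleapis.com/maps/api/geocode/json?address=" +
                  Uri.EscapeDataString(whateverTheUserTyped);

        var response = JObject.Parse(await Client.GetStringAsync(url));
        var result = response["results"].FirstOrDefault();
        if (result == null)
            return null; // nothing matched; keep the raw text the user typed and move on

        return new GeocodedAddress
        {
            // The formatted_address can quietly replace whatever messy text the user entered.
            FormattedAddress = (string)result["formatted_address"],
            State = Component(result, "administrative_area_level_1"),
            PostalCode = Component(result, "postal_code"),
            Latitude = (double)result["geometry"]["location"]["lat"],
            Longitude = (double)result["geometry"]["location"]["lng"]
        };
    }

    private static string Component(JToken result, string type)
    {
        // Dig the requested piece out of address_components by its type tag.
        var match = result["address_components"]
            .FirstOrDefault(c => c["types"].Any(t => (string)t == type));
        return match == null ? null : (string)match["long_name"];
    }
}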

Seriously, stop having your users enter their address components separately. It's as bad as giving them multiple text inputs for the components of a phone number. You don't need that level of granularity in your data, and if you do there are much more comprehensive and much more reliable services to give you that data based on what your users enter into the form.

Sunday, July 22, 2012

Pick Any Two Constraints

Most of us are familiar with an industry adage:
Faster, cheaper, better... Pick any two.
The idea is simple. You want something soon (faster), you don't want to spend a lot (cheaper), and you want it to be a quality product (better). However, you're not going to get everything you want. Reality doesn't work that way. And so you choose the two that you want more and settle for those.

  • If you want your software delivered quickly and to be of high quality, it won't be cheap.
  • If you want your software cheap and of high quality, it won't be delivered soon.
  • If you want your software delivered quickly and cheap, it won't be of high quality.
I've encountered plenty of people in the industry who are familiar with this, but for some reason it never sinks in with customers. Maybe project sponsors are just so accustomed to getting everything they want that they can't imagine having to pick something to sacrifice? That's a cop-out answer, so no. Or, perhaps we're not communicating the concept in a way that they truly understand? This is a much more productive approach, because it puts the onus on us to better facilitate that communication, which is our professional responsibility.

So instead of choosing which two you want, instead think of it like this:
Schedule, budget, quality... Choose the two by which the project is constrained.
Unless the project leadership has a complete and unerring understanding of the entire project (which I've never seen happen, so let's just assume for a moment that it's not realistic), the project will require some "wiggle room." In this triangle of constraints, it's going to move along an axis somewhere:


The good news is, you get to decide where:


So which two of the three axes will be fixed for the duration of the project, and which one will be allowed to fluctuate? That's for the business to decide. The most important part is that this be communicated explicitly and without confusion before work begins. It's easy for assumptions to be made when communication isn't explicit, and attempting to scope and document those assumptions is a fool's errand.

Ah, but there's a catch. (Isn't there always?) One of these three axes is a lot more difficult to quantify than the other two. How do we, as an industry, measure software quality? What metrics do we employ? It's a bit of a difficult question, isn't it? Now take it a step further and try to answer that question in a way that a non-developer can understand. We can talk about unit tests and code coverage and continuous integration all day long, but a project sponsor isn't going to hear us. "It's all Greek to him" and MEGO Syndrome sets in quickly.

This is a fair response. After all, how much attentiveness and interest is the developer going to be able to maintain if the project sponsor steers the discussion toward budget forecasts and marketing predictions? I can assure you that the moment somebody breaks out some spreadsheets to talk about finances, my attention span is entirely dissolved. (Some of you may enjoy such topics, of course. But I'm sure there are others which thoroughly glaze your eyes and put you to sleep. So the point remains.)

Ay, there's the rub. And I wish I had an answer right now, but I don't. Not yet anyway. So the challenge is ours to defend our axis of quality. However, the decision itself still belongs to the business. Just because we can't quantify it doesn't mean we should sweep it under the rug.
If you constrain the project by schedule and budget, quality will slip.
Plain and simple. The stakeholder(s) will ask us to quantify that, and right now we can't. (If you can, please speak up. The industry needs to know what you have to say.) We can quantify schedule, and we can quantify budget. Those are pretty clear and understandable metrics. And so (unerringly in my career so far) the business will choose those two metrics as the fixed axes. Those are what they understand, so those are what they choose to control.

The quality axis isn't in their scope of control, so choosing that one would mean relying very heavily on you to control it. It would leave them with no guarantee that they're controlling any more than one axis, which is no comforting situation at all. So, again, the onus is on you to communicate and "sell" this idea. At least for now.

The critical piece, however, is that the idea is at least communicated. The business has every reason and every right to sacrifice quality in order to meet schedule and budget. It's unfortunate, and it leaves a sour taste in our mouths as engineers, but it's their decision and not ours. Don't make the decision for them. Don't speak only in terms of two constraints.

As professionals it's not only our responsibility to craft a quality product, it's even more importantly our responsibility to communicate the risks to that quality when discussing the product with the business. And, conversely, as professionals it's the responsibility of the business to understand the reality of what we're communicating (provided we can communicate it effectively.) Or, to put it another way... They won't be mad if you set the expectation. They'll be mad if you make their decision for them.

Saturday, July 14, 2012

Observations of an Offshore Team

There's certainly no shortage of offshore teams in software development. The stereotype of an Indian sweatshop of programmers and the garbage code they produce is well known throughout the industry. And I'd be lying if I said I hadn't seen that stereotype in action again and again. But I can't let that sour my perception of all offshoring in general. Particularly because my company now has an office in the Philippines for providing low-cost development work.

First of all, the Philippines is a far cry from India. The country and the people are vastly different in every way. I'm not going to compare apples and oranges, and besides that's outside the scope of these observations anyway. I was in the Philippines, not in India. So I can't speak to the latter. But hopefully what I've learned in working with the former can be applied somewhat universally.

The difference between working remotely with an offshore team and actually going there is profound. The perceptions, from both sides of the proverbial fence, change entirely when sitting in the same room. And I can't recommend the experience enough for any company which partners with an offshore team. Naturally, it would be cost-prohibitive to always work face-to-face in this situation. But sending at least one or two representatives to work directly with the team makes all the difference in the world.

If any company or team expects that they can simply toss requirements over a fence and have production software tossed back over at a specified date, they're deluding themselves. It's an absurd notion to begin with, and years of industry headaches have confirmed that. Just being able to have casual conversations over lunch or dinner regarding the business domain and intent of the project can save untold costs in what would otherwise have been software that didn't capture the actual business intent. Understanding the why is vastly more important to software development than meeting a set of defined specifications.

There's also the psychological aspect to it. A team is more of a team when they know each other. Prior to this trip, everyone in the Manila office was a name and sometimes a voice, nothing more. An IM window, an email, a conference call... That was the extent of it. Now they're not just names on a form somewhere. Now I dare say I can call them friends. Or at the very least colleagues. A forced workplace joke over a conference call pales in comparison to a Friday night out to grab a drink or two. The social aspect of the team helps us to communicate and to understand each other more readily. The "fence" is torn down.

(As an aside, I regret to point out that I wasn't able to socialize nearly as much as I had hoped. So the fence is still partially there. Oddly enough, I ended up socializing more with members of adjacent teams than with my own, which is a bit unfortunate. But even going out to lunch with my team from time to time was still markedly better than IMs and emails.)

But even just the perception from "our side of the fence" entirely changes when you go there and meet the people on the other end of the emails. It's one thing to memorize someone's name, but it's another thing entirely to shake their hand and have a conversation with them. It's one thing to be numerically aware of the time zone difference, but it's another thing entirely to work the same shift as them and experience what it's like to have time-shifted hours. (After all, it's easy for us to label a rowdy background din on a conference call as being unprofessional. But honestly, what do you think an office of software developers who are all friends with each other is going to be like after 9:00 PM?)

Then there's the change in perspective on the little things around the office. What does the office look like? How is it laid out? What sort of equipment are they using? What assumptions do we have about our own "normal" office environments actually translate to that office, and what assumptions don't? You might be surprised.

For example, there's a labor/tools asymmetry between office environments in the US and many office environments abroad. In fact, this asymmetry is specifically why companies in the US delegate work abroad. To put it simply, and hopefully not disrespectfully... people are cheap. In the US, if there's something slowing down a team and it can be solved by the purchase of some technology, the choice is clear. Don't let anything get in the way of the people. Tools are disposable. But in many offices abroad, tools are expensive (more expensive than they are here, actually, due to varying market forces), while people are easy to find and easy to replace.

It's harsh, and it's distasteful to me, but it's a reality that pervades the industry. I like to think that my company is different, and we're aware that it's going to take some time to convince our new team members of that fact. As I've said many times before, I hate referring to people as "resources." People are individuals. They are contributors. They are team members. They are not "resources." And actually meeting your team and getting to know them helps a lot to make that distinction.

I really like our team. And I truly believe that there's real development talent there just waiting to be nurtured. I don't want to simply delegate tasks to them. I want to work with them. And that difference, I believe, will make all the difference in the resulting software. Maybe not on the first project, maybe not on the second, but in the long run.

Saturday, July 7, 2012

The 90/9 Rule

We've all heard of the 80/20 rule, right? It feels like it's one of the most commonly-cited principles in the software development industry, in my experience. And there are always subtle flavors of it when cited. More often than not, it seems like it's in reference to "getting bang for your buck" on a software project.

And getting "bang for your buck" is a rather important concept in business software. Purists like myself are always compromising to the almighty dollar. And the reason for that is simple... We don't own the software we write. The person who writes the checks makes the rules, plain and simple. And that person is very interested in quantifying the value of what they get for the money they spend.

To help illustrate "bang for your buck" in the world of software, I've come up with a rule of my own. I call it "The 90/9 rule." And it's essentially like Zeno's Paradox of Achilles and the Tortoise applied to software development. Think of it as such:

  • For a reasonable expense, you can achieve 90% of your intended goal.
  • To achieve 90% of the remainder (thus, 99% total), double that expense.
  • To achieve 90% of the remainder (thus, 99.9% total), double that expense.
  • ad infinitum...
A key thing to notice is that it never reaches 100%. No matter how much expense one incurs, one will always have a compromise somewhere. The only question one needs to ask is where one is willing to draw that line between expense and intended results.

This isn't to say that we're not going to deliver complete and valuable functionality. It just means that there will be some things which the business intended or imagined that won't quite fit into the project. It always happens. And I dare say that a primary root cause of delays and shoddy results on a project is when the business won't let go of that full 100% intent. They won't let the reality of what's happening alter their perception of what they want to happen.

Considering the 90/9 rule in a project brings that part of reality to light. By accepting the balance of expense vs. intent, the business can (should) more effectively manage expectations and focus the spending of that expense on the 90% that's realistically feasible within the scope of that expense. And, if more is needed, the business can balance that against the added expense.

As an illustration of this rule, consider the simple matter of system availability, or uptime:

  • 90% (one nine) is pretty easy to achieve and definitely within a reasonable expense.
  • 99% (two nines) is still reasonable, but requires a little more expense. A proper server vs. some resurrected old laptop for example.
  • 99.9% (three nines) is going to require considerably more. That's less than half a day of downtime per year. Consumer hardware probably won't cut it at all, nor will a consumer internet connection for the server to sit on, etc.
  • 99.99% (four nines) is getting really expensive now. That additional small amount of availability (a difference of a few hours of downtime in a given year) is going to require redundancy across multiple sites.
  • 99.999% (five nines) will require solid redundancy. The expense to guarantee this level of availability is no small matter at all.
  • and so on...
The exponential growth in expense is easy to see. What's also easy to see is that 100% isn't on the scale. 100% doesn't exist. No amount of expense is going to account for every possibility. You can get very close to 100%, but you won't reach it. There will be a risk somewhere.
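
To make those bullets concrete, here's a quick back-of-the-envelope sketch of roughly how much downtime each level actually allows in a year; the arithmetic is the only thing in it taken from the list above:

using System;

public static class NinesOfAvailability
{
    public static void Main()
    {
        // Roughly how much downtime each level of availability permits in a year.
        var hoursPerYear = 365.25 * 24;

        foreach (var availability in new[] { 0.90, 0.99, 0.999, 0.9999, 0.99999 })
        {
            var downtimeHours = (1 - availability) * hoursPerYear;
            Console.WriteLine("{0:P3} uptime allows roughly {1:N1} hours of downtime per year",
                              availability, downtimeHours);
        }

        // 99.9% works out to about 8.8 hours per year (the "less than half a day" above),
        // and each additional nine cuts that allowance by a factor of ten while, per the
        // 90/9 rule, the expense keeps roughly doubling.
    }
}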

The illustration of high availability focuses on hardware, but the same holds for the availability of the software. How solid, validated, and bug-free does 90% reliable software need to be? How solid, validated, and bug-free does 99.999% reliable software need to be? You can imagine the amount of planning, testing, validating, refactoring, and overall hardening of the system that would be required to achieve that level. And you can imagine how expensive it's going to be, bringing in experts from every field used by the software (architecture, infrastructure, database, UI, etc.) in order to achieve that level.

Think about the 90/9 rule on your current project or your next project. How well does it map to your expectations vs. the project sponsor's expectations?

Monday, July 2, 2012

No Time To Test?

I recently had a conversation with a colleague and we briefly touched on the subject of TDD. It wasn't the focus of the conversation at all, so I didn't want to derail things when he said this, but I can't get past what was said...
Clients rarely want to pay for the extra time to have tests.
I'm not at all shocked by the statement. Quite the contrary, I'd be pleasantly surprised to find anything else. But the commonality of the statement alone doesn't justify it. If our clients are refusing to pay more for test suites in the software that we write, then it's a failure on our part to properly address the issue.

We all know how it goes. A client has a project they want done; we put together an estimate with a rough breakdown of the work; and inevitably we need to cut a few corners to get the numbers to line up with expectations. What's the first thing to get dropped? "Well, our users can test the software, so we don't need all this testing in the project budget."

When a statement like that is uttered, this is what a developer hears:
We don't need you to prove that your software works. You can just claim that it does.
Seriously, that's exactly what I hear. But of course it never goes down like that. What does end up happening? We all know the story. Bugs are filed, users are impatient, proper QA practices aren't followed, and so on and so on. Time adds up. We spend so much time chasing the "bugs" that we lose time constructing and polishing the product itself. So the end result is something that barely limps over the finish line, held up by a team of otherwise skilled and respected people who can maintain the thin veneer of polish just long enough, already in dire need of clean-up work on day one.

So tell me again how we didn't have time to validate our work.

Recently I've been committing a lot of my free time to a side project. It's a small business (which in no way competes with my employer, mind you, for the record) where I've been brought in to help with technology infrastructure and, for the most part, write some software to help streamline operations and reduce internal costs. Since I essentially have carte blanche authority to define everything about this software, I'm disciplining myself to stick with one central concept... TDD

I'm not the best at it, and I'll readily admit to that. My unit tests aren't very well-defined, and my code coverage isn't as good as it could be yet. But I'm getting better. (And that's partly the idea, really... to improve my skills in this area so that I can more effectively use them throughout my professional career.) However, even with my comparatively amateur unit testing (and integration testing) skills, I've put together a pretty comprehensive suite on my first pass at the codebase.

And you know what? It honestly makes development go faster.

Sure, I have to put all this time and effort into the tests. But what's the outcome of that effort? The code itself becomes rock-solid. At any point I can validate the entire domain at the click of a button. Even with full integration tests which actually hit databases and external services, the whole thing takes only a few minutes to run. (Most of that time is spent with set-up and tear-down of the database for each atomic test.)
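
For the curious, here's a hedged sketch of that per-test set-up and tear-down rhythm, using NUnit and SQL Server purely as stand-ins; the database name, connection string, and fixture below are placeholders, not the side project's actual code:

using System.Data.SqlClient;
using NUnit.Framework;

[TestFixture]
public class CalendarRepositoryIntegrationTests
{
    // Placeholder connection string; point it at whatever test server you actually use.
    private const string Master = "Server=(local);Database=master;Integrated Security=true";

    [SetUp]
    public void CreateFreshDatabase()
    {
        Execute("CREATE DATABASE IntegrationTests");
        // ...then run schema scripts so every test starts from the same known state.
    }

    [TearDown]
    public void DropDatabase()
    {
        Execute("ALTER DATABASE IntegrationTests SET SINGLE_USER WITH ROLLBACK IMMEDIATE");
        Execute("DROP DATABASE IntegrationTests");
    }

    [Test]
    public void TheDatabaseIsFreshForEveryTest()
    {
        // A real test would exercise a repository against the newly created database;
        // the point here is only the set-up/tear-down around each atomic test.
        Assert.Pass();
    }

    private static void Execute(string sql)
    {
        using (var connection = new SqlConnection(Master))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}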

What do I need to work on in the code right now? Run the tests and see what fails. What was I doing when I got interrupted yesterday and couldn't get back to it until today? Run the tests and see what fails. Everything passes? Grab the next task/story/etc. The time savings in context switching alone are immense.

Then there's the time savings in adding new features. Once I have my tests, coding the new feature is a breeze. I've already defined how it's supposed to work. If I find that it's not going to work like that, I go back to the tests and adjust. But the point is that once it's done, it's done. So often in our field we joke about the definition of the word "done." Does that mean the developers are handing it off to QA? That QA is handing it off to UAT? That the business has signed off? With TDD it's simple. All green tests = done. I finish coding a feature, press a button to run the full end-to-end automated integration suite of tests, take a break for a couple of minutes, and it's done.

And what's more, the tests are slowly becoming a specification document in and of themselves. Test method names like AnEventShouldHaveAParentCalendar() and AnEventShouldHoldZeroOrMoreSessions() and AnEventShouldHaveNoOverlappingSessions() sound a lot like requirements to me. And I keep adding more of these requirements. Once in a while, when developing in the domain, I'll realize that I've made an assumption and that I need to write another test to capture that assumption. How often does that happen in "real projects"? (Sure, you "document the assumption." But where does that go? What effect does that have? I wrote a test for it. If the test continues to pass, the assumption continues to be true. We'll know the minute it becomes false. It's baked into the system.)
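
And here's a hedged sketch of what a couple of those specification-style tests might look like. The Event, Calendar, and Session classes below are hypothetical stand-ins for the side project's actual domain, defined only so the tests compile:

using System.Collections.Generic;
using NUnit.Framework;

public class Calendar
{
    public Calendar(string name) { Name = name; }
    public string Name { get; private set; }
}

public class Session { }

public class Event
{
    public Event(string title, Calendar parentCalendar)
    {
        Title = title;
        ParentCalendar = parentCalendar;
        Sessions = new List<Session>();
    }

    public string Title { get; private set; }
    public Calendar ParentCalendar { get; private set; }
    public IList<Session> Sessions { get; private set; }
}

[TestFixture]
public class EventSpecifications
{
    [Test]
    public void AnEventShouldHaveAParentCalendar()
    {
        var calendar = new Calendar("Company Events");
        var evt = new Event("Annual Meeting", calendar);

        Assert.AreSame(calendar, evt.ParentCalendar);
    }

    [Test]
    public void AnEventShouldHoldZeroOrMoreSessions()
    {
        var evt = new Event("Annual Meeting", new Calendar("Company Events"));

        Assert.IsEmpty(evt.Sessions); // zero sessions is a perfectly valid event
    }
}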

Think about it in terms of other professions... Does the aircraft manufacturer not have time to test the airplane in which you're flying? Does the auto mechanic not have time to test your brakes that he fixed? Or, even closer to home with business software, does your accounting department not have time to use double-entry bookkeeping? Are you really paying those accountants to do the same work twice? Yes, yes you are. And for a very good reason. That same reason applies here.

I've been spouting the rhetoric for years, because I've known in my heart of hearts that it must be true. Now on this side project I'm validating my faith. No time to test? Honey, I don't have time not to test. And neither do you if you care at all about the stability and success of your software.

Friday, June 22, 2012

Documenting Assumptions


When estimating and planning a project, we're often told to "document our assumptions." It certainly sounds reasonable enough, and indeed is a very important document to create. For example, if there's a requirement that users can "upload forms" to a web application, and that requirement isn't any more specific than that, then I'm going to write down a few assumptions about that requirement. I'm going to assume that the "forms" are static files (such as Word or PDF). I'm going to assume that no validation will be performed and that the files will simply be saved as-is for a person to review them. I'm going to assume that the application isn't providing the users with these files and that they're just something the user has. And so on.

Are these all of my assumptions? No, not at all. They're just the ones that I wrote down in the document.

The problem with "documenting your assumptions" arises when the document is referenced, not when it is created. It's referenced if and when something goes wrong in some way and we need to look back at our paper trail to figure out whose fault it is. (Which, when I put it that way, seems kind of... unproductive.) That's because it's a document which serves no other purpose than to cover the asses of everyone involved. It's created because we all know that something will come up during the course of the project that wasn't foreseen, and we want a paper trail indicating that it's not our fault.

And so it leads to the inevitable response... "This isn't in your documented assumptions." That's a cop-out, plain and simple. And it's a cop-out because "documenting your assumptions" is an unreasonable task. It's not unreasonable in the sense that we shouldn't do it. As I said, it's an important document. Some assumptions are critical because they help provide more insight into the situation. "PDF? Oh, no, that's not what I was picturing at all. Let's clarify so we can more accurately plan..." It's unreasonable because writing down all unknowns is a logical fallacy.

Think about it for a minute. Open up a text editor right now and write down everything you don't know. I could give you all the time in the world and the task wouldn't become any more possible to complete. You can't write down everything you don't know because you don't know what those things are.

Using such a document to share insights between team members and to uncover potential problems early in the process is productive and useful. Using such a document to cover our asses for the later inevitability that something unexpected will happen is at worst childish and at best a remnant of a strict waterfall development methodology, complete with red tape and bureaucracy.

Let's start documenting all of my assumptions for this project:
  • I assume that the client's IT infrastructure will not fail and block my work. Ok, that's reasonable. It's fair that if I'm on-site and their network goes down that I may be blocked and that it's their responsibility to unblock me.
  • I assume that the file sizes for any transferred documents will be reasonably small for uploading/downloading on the web. Again, that's fair.
  • I assume that the client wants to use a standard HTML input element (potentially styled to look better) for users to select a file. Well, why wouldn't we? Oh, someone's actually asked for something more custom than that before? Didn't they understand how web pages work? Oh, then I guess it's a fair assumption. I've just never thought to include it in one of these documents before.
  • I assume that the client's holiday/vacation schedule will not prevent me from interacting with key business stakeholders. Well... Ok. That one's a little uncommon, but I guess it makes sense when you think about it. If the client has a ridiculous vacation plan and employees are never there then I can't be expected to get anything done.
  • I assume that the building will not be overrun by jihadists. See, now this is just silly. Why is this even necessary? Well, it's an assumption. Take it or leave it.
  • I assume that the project sponsor's wife won't leave him and put him in an emotional state that makes working very difficult. Hey now, don't make this personal.
  • I assume that my wife won't leave me and put me in an emotional state that makes working very difficult. (Funny story, actually, but I digress.) Wait, how is this covering your own ass again?
  • I assume that gremlins won't... Ok, stop. This is stupid.
Sure, it is kind of stupid. But it's meant to illustrate a point. There are things we don't know, and so we ask for clarification. And then there are things we don't know that we don't know, and so we don't know to ask for clarification. Take that third one above for example. Would you have specified that? Would it even have occurred to you to ask? What other non-web things do you think a client might "have in mind" when they're writing their web application requirements for you? You might be surprised.

The fact is, we can't document everything we don't know. If you replace every request to "document your assumptions" with a request to "write down everything you don't know, including the things you don't even know that you don't know" then it becomes pretty clear how absurd the notion is.

And this is why we have agile programming. We fundamentally look at the problem differently. Instead of battling the futility of documenting everything that we don't know, we simply assume that there exist things that we don't know and we build the process around that assumption.

We're not settling for not knowing. We're not giving up on trying to know. We're just indicating openly and honestly that we recognize the fact that we don't know everything and that we will learn new things throughout the process which will cause us to change plans. (And, really, "open and honest" communication and transparency is a hell of a lot more productive than everyone trying to generate enough of a paper trail to cover their own asses.)

We will learn new things throughout the project. Instead of fighting those things from the start, embrace them when they happen. Assimilate them into the project so that we can keep moving forward. Time spent trying to document everything we don't know is wasted time. Time spent trying to figure out who we can blame for something is wasted time. Don't stall on what we don't know and just move forward with what we do know. Then adjust when the list of things we do know grows.

After all, the requirements and assumptions and all of the other documents are not reality. They are a perception of reality at a point in time. As we move forward with our efforts, we will learn more about reality. Perceptions will change. Reality, however, will not change. So let's adjust the process to make way for reality, instead of trying to fight reality to fit our own previous perceptions.

Monday, June 18, 2012

Ode to a Project

To report, or not to report, that is the question:
Whether 'tis Nobler in the mind to suffer
The Slings and Arrows of outrageous Scope Change,
Or to take Arms against a Sea of requirements,
And by delivering end them: to write, to wait
No more; and by a write, to say we end
The head-aches, and the thousand Natural shocks
That Projects are heir to? 'Tis a consummation
Devoutly to be wished. To write to send,
To send, perchance to Complete; Ay, there's the rub,
For in that completion of projects, what dreams may come,
When we have shuffled off this mortal software,
Must give us pause. There's the respect
That makes Calamity of so long projects:
For who would bear the Whips and Scorns of extensions,
The Oppressor's budget, the proud man's Architecture,
The pangs of reported Bugs, the Code's delay,
The insolence of Meetings, and the Spurns
That consultants merit of the unworthy takes,
When he himself might his Deadline make
With a bare Laptop? Who would Installations bear,
To grunt and sweat under a weary Production Release,
The undiscovered Country, from whose bourn
No Deployment returns, Puzzles the will,
And makes us rather bear those ills we have,
Than fly to others that we know not of.
Thus Status Reports do make Cowards of us all,
And thus the Native hue of Project Status
Is sicklied o'er, with the progress cast in Yellow,
And enterprises of great pitch and moment,
With this regard their Responses turn awry,
And lose the name of Agile. Soft you now,
The Project Manager? In thy Status Reports
Be all my code remembered.

Saturday, June 9, 2012

Not All Reference Types Are Entities

I'm working on a scheduling system as part of a side-project, and I couldn't help but notice something interesting about how I'm designing it.  The system has a kind of abstraction of events, as events aren't always simple "calendar events" in the traditional sense.

Originally the requirement was that "events need to be able to repeat."  Well, any calendar system can do that.  So I wanted to know why other calendar systems aren't meeting the need of the application.  With a little more back-and-forth in a small domain modeling exercise, it became clear that the requirement by itself didn't really state the business need.

As it turned out, events don't really need to "repeat" in the traditional sense.  More specific to the business need, events need to be able to have multiple instances within the same event.  For example, a particular "event" might happen "every day for a week" or "every Tuesday for a month" and so on.  And it can get pretty complex, such as "from 14:00 to 17:00 every Wednesday for a month and a half, except for the fourth instance because the venue has something else so that one will be on Thursday instead, and will start at 14:30 instead of 14:00."  Just making an event be "repeatable" won't cut it.

The solution is simple.  Events don't have dates and times associated with them.  Instead, Events contain a collection of Sessions which represent individual "instances" in this case.  So the Event has a name and a description and other attributes, including a Location.  It also has a collection of Sessions which each have a Start date/time, a Stop date/time, and an optional overriding Location.
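To make that concrete, here's a minimal sketch of what the model looks like.  The property names are just my illustrative choices; LocationOverride in particular is simply my name for the "optional overriding Location."

using System;
using System.Collections.Generic;

public class Location
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Event
{
    public int Id { get; set; }                        // Events do have a real identity (more on that below)
    public string Name { get; set; }
    public string Description { get; set; }
    public Location Location { get; set; }

    // "Repeating" is just a collection of instances within the same Event.
    public ICollection<Session> Sessions { get; set; }
}

public class Session
{
    // Deliberately no Id here; in the domain a Session means nothing outside its Event.
    public DateTime Start { get; set; }
    public DateTime Stop { get; set; }
    public Location LocationOverride { get; set; }     // null means "use the Event's Location"
}

The "every Wednesday for a month and a half, except the fourth one" case then isn't a recurrence rule at all; it's just whatever set of Sessions actually gets added to the Event.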

So the database structure is also simple.  An Events table, a Sessions table, and even a Locations table just for a little extra normalization (and because I plan to do more with Locations in the future in this system).  As I designed the code, however, something didn't sit right with the concept of identity on these data structures.  As a matter of reflex, I included an ID on the Sessions table.  You know, so the system can uniquely identify a Session.  But... why?

I was specifically thinking back to a previous project at work where entity identity was a significant problem.  In that project, a technical decision was made at some point prior to my involvement that every table in the database would have a GUID as an ID, and a software framework used that to uniquely identify all of the data entities.  This led to a pretty serious problem in the data because the business had a very different definition of "identity" for their entities.  An ID value (especially a GUID) meant nothing to the business.  They were thinking in business terms and defining what attributes of an entity identified that entity.  (Essentially, the business was thinking correctly and the technical design was artificially limiting them.)

So, what uniquely identifies a Session in my case?  Well, nothing important.  In fact, in the absence of an Event, a Session is meaningless.  My domain doesn't even need to fetch Sessions from a repository by themselves.  They should be attached as attributes to an Event when fetched from the Events repository, that's all.  They should never need to be fetched individually outside of the context of an Event.

That is... Sessions are not entities.  They are not, individually and atomically, a representation of a meaningful business concept.  They are attributes attached to an entity... the Event.  Sure, Sessions have their own table in the database.  (This is a technical concern, not a domain concern.)  They even have an ID to uniquely identify them.  (This is a technical concern, not a domain concern.)  They are even reference types in the code, not value types.  (This is a technical concern, not a domain concern.)  But they are not an entity in and of themselves.  (This is a domain concern.)

At the level of the programming language being used (C#), they are not value types.  But the business isn't concerned with the intricacies of C#.  The business is concerned with the domain.  And as far as the domain is concerned, Sessions are value types.  You don't care which Session you're talking about, and if you blow one away and replace it with another one of identical values then the two are indistinguishable.  The values are all that's important, not the unique identity thereof.
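If you wanted the code to say that out loud, you could give the Session from the earlier sketch value-based equality.  This is purely a sketch; whether the ceremony is worth it is a separate question.

using System;

public class Session
{
    public Session(DateTime start, DateTime stop, Location locationOverride = null)
    {
        Start = start;
        Stop = stop;
        LocationOverride = locationOverride;
    }

    public DateTime Start { get; private set; }
    public DateTime Stop { get; private set; }
    public Location LocationOverride { get; private set; }

    // Two Sessions with identical values are, as far as the domain cares, the same Session.
    public override bool Equals(object obj)
    {
        var other = obj as Session;
        return other != null
            && Start == other.Start
            && Stop == other.Stop
            && object.Equals(LocationOverride, other.LocationOverride);   // reference comparison is fine for a sketch
    }

    public override int GetHashCode()
    {
        return Start.GetHashCode() ^ Stop.GetHashCode();
    }
}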

In real life, contrast this with something like a human being.  Have you ever known someone with the same name as you?  The same birthday?  The same address (like a family member)?  Any other identical attributes?  It's unlikely that you would know someone who shared all of these attributes with you, but it's not impossible.  You may need to explicitly seek out such a person and manually line up all of your attributes, but it can be done.  (Within reason for the attributes in question, of course.)

Does that mean the two of you are now the same person?  No, not at all.  You are unique entities.  Your attributes are simply values, they are not the entity itself.  Values are often used to identify an entity in the absence of a unique identifier.  (For example, business users may uniquely identify customers by their phone number.  This may be good enough for a particular business, even though it's possible to have collisions.  The phone number isn't the actual identity of the person, it's just a value used by the business to distinguish customers.)  But they're just values, not identity.

In this project, Events will have a unique identifier as well.  An ID column in the database, which is a simple incrementing integer.  It's likely that the business will internally identify Events differently, of course.  And the software will need to account for this.  A combination of values may be used to identify an Event, including perhaps even the collection of Sessions.

But a Session by itself doesn't need an identity.  No more so than your address needs an identity.  Your house has an identity, and the value of its address is the most common way to identify it.  But the address itself doesn't need an identity.  It's not the entity, it's just an attribute value.

This actually reminds me of another project I worked on some time ago when I was working in North Carolina.  We were modeling a fairly complex data model for a project and we brought in the database guy to help us.  He went about doing very database-y things, including standard relational normalization.  A lot of what he showed us was very helpful.  The concept of super-typing tables to achieve a kind of inheritance model in the data was new to me, for example, but made a lot of sense and made the design much cleaner and simpler.

But there was one case where his model didn't make sense to me.  Naturally, there was an Addresses table to store the addresses of various other entities.  People, client businesses, anything that had one or more addresses as an attribute of it.  And, being a relational database expert, he naturally normalized that data.  But he took it a step further.  His goal was to prevent data duplication within the Addresses table.  So if two or more other entities had the same address, they should refer to a single record in that table.

Should they?  This is where I disagreed with the design.  At the time I articulated the concern with a simple use case... Suppose two people share a mailing address.  For example, two business contacts at the same office.  One of those people moves (transfers to a different office).  So someone updates his address.  But wait... They just updated the address for both people, and for that office, and for any other entity using that address.  This is no good.

The conclusion was to simply allow data duplication in the Addresses table.  The more we thought about the technical implementation, the more inescapable that conclusion became.  There was still some mild objection to data duplication, but the objectors couldn't think of a more elegant solution.

There's a reason they couldn't.  They were trying to do something against the domain.  The domain made it very clear that addresses were not entities.  They don't exist by themselves, they don't have any individual meaning, and they don't need to have identity.  Addresses are values, not entities.  They exist only as attributes to entities.  If one is used more than once, that's ok.  If you delete one and replace it with another one of the same value, it's the same one.  (The technical implementation will need to maintain the relationship, of course, but that's a technical concern and not a domain concern.)
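In code, that conclusion falls out pretty naturally: the address is just a value hanging off whatever entity owns it.  A rough sketch (the class and property names here are mine, not from that project):

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
}

public class Person
{
    public int Id { get; set; }                // the Person is the entity; it has identity
    public string Name { get; set; }

    // The Address is just an attribute value owned by this Person.  Updating it never
    // touches anybody else's address, even if the values happen to be identical.
    public Address MailingAddress { get; set; }
}

How the tables underneath store that (duplicated rows, owned child records, whatever) is the technical implementation's problem, not the domain's.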

Just because something has a POCO in your software doesn't mean it's an entity.  Just because something has its own table in your database doesn't mean it's an entity.  The domain defines what is an entity and what is not.  The technical implementation needs to reflect the domain's definitions, not present its own definitions in the name of some convention or habit of the technical implementors.

Saturday, May 26, 2012

Dropping The "I"

I could have sworn that I'd written something a while back about dropping the "I" in front of interfaces, but I can't seem to find it.  Maybe I had merely intended to write it.  Oh well.  In any event, what I may or may not have said before is still working very well for me.

The idea, at least when I heard it, came from good old Uncle Bob.  It's simple, really.  While I'm not entirely familiar with the idiomatic conventions in most other languages, in C# (or .NET in general, I suppose) the convention has always been to prefix interfaces with an "I".  Something like this:

public interface IModelRepository { }
public class ModelRepository : IModelRepository { }

But... why?  The IDE knows it's an interface and not a class.  So you're not doing the IDE a favor.  The developers know it's an interface because, well, it's an interface.  And the code itself shouldn't care.  It's an "encoding" in the type that harkens back to things like Hungarian Notation.  And it's unnecessary.

A while back I adopted another convention for my code.  Something more like this:

public interface ModelRepository { }
public class ModelRepositoryImplementation : ModelRepository { }

Doesn't that just feel... cleaner?  There's no unnecessary encoding, and each name states exactly what it does.  The interface is a repository for some model; the class is an implementation of that repository.  Two concerns, separated.

It looks cleaner throughout the rest of the codebase as well, because the rest of the codebase doesn't care about the implementation.  It cares only about the interface.  The specific module which wires up the implementation to the interface (the service locator, if you will) is all that cares about it.  And within that code it's also much cleaner.  After all, semantically what is that code doing?  Is it supplying a repository for an I-repository?  Or is it supplying an implementation for a repository?  It just makes more sense to name the constructs by what they are and what they do.
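For what it's worth, the wiring code reads the same way.  Here's a rough sketch using a hand-rolled composition root rather than any particular container; ModelRepository and ModelRepositoryImplementation are the names from the snippet above, and ModelService is just a made-up consumer.

public interface ModelRepository { }
public class ModelRepositoryImplementation : ModelRepository { }

public static class CompositionRoot
{
    public static ModelRepository CreateModelRepository()
    {
        // The one place in the codebase that knows (or cares) which
        // implementation backs the ModelRepository interface.
        return new ModelRepositoryImplementation();
    }
}

// Everything else depends only on the interface:
public class ModelService
{
    private readonly ModelRepository _repository;

    public ModelService(ModelRepository repository)
    {
        _repository = repository;
    }
}

Read it aloud and it says exactly what it does: supply an implementation for a repository.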

Now, sadly, this isn't common practice.  As a consultant I often have to go with what the community collectively prescribes because the code I write will be handed off to some developer with whom I will never work.  Most likely some "resource" who was hired specifically for a support role.  (I hate calling people "resources" by the way.)

So more often than not I have to go with the I-prefix simply because it's a more commonly known pattern and is expected and understood by more people.

But, for my code (and for code where I'm the team lead on a proper ongoing team dynamic, not a drive-by consultant implementation) I love this.  Clean.

Thursday, May 24, 2012

Agile Means "No End Game"

I've seen a lot of places maintaining their own flavors of "agile" (or, more specifically, their own flavors of Scrum).  And this is good; agile is all about self-organizing teams and finding what works for you.  You should develop your own flavor.  Well... To an extent...

The most prominent barrier I see is adoption at every level of the business.  How many proponents of agile have given their pitch, presented the benefits, even attached precious numbers and values to the whole thing, only to be met with the same question:  "That sounds great.  So when can I have my finished system?  8 months?  10?  What will it cost?"

Something was lost in the exchange.  There are a couple of problems with the response:
  1. When you can have it is up to how you steer priorities.
  2. There's no such thing as a "finished system."
The first one is the more obvious of the two.  When can you have it?  We don't know.  That's the point, really.  Even if we were to estimate and plan the whole thing up front, we couldn't give you an accurate number.  It might be wildly high, in which case you'll waste time and money on something you could have had sooner.  Or it could be wildly low, in which case you'll burn out your team and end up with an inferior product.  Right now, before any discovery has been done, is the worst possible time to make such an estimate.

This is the basic paradigm of agile.  You're not getting work done faster; you're not getting work done better; you're trading one piece of control for another.  You're gaining the ability to quickly and easily change direction and adapt to new discoveries, and you're giving up long-term planning to get it.

Of course, you don't want to give up anything.  You're a CFO, you want numbers.  This is going to go on a budget sheet and be locked away in a cabinet somewhere and, well, forgotten.  You demand these numbers.  But... did you ever really get these numbers in the first place?  How much do you value wildly inaccurate guesses?  Why is writing down a number so critical that you're willing to do it at the worst possible point in the project?  Wouldn't you rather have more control of the number going forward?

It requires a change in how you want to handle those numbers.  We can't tell you with certainty that you're going to need to give us $1.6M for an 8-month project.  What we can tell you with certainty is that, if you can allocate $200K per month as a budget, we can begin work immediately.  With each passing month (hell, with each passing week) you'll have more information presented to you about the status of the project.  We could get $200K into it and realize it's going to be a lot more, or even a lot less.  You'll have feedback much sooner and can adjust and steer accordingly.

This begins to touch upon the second item above.  The lack of an end game for the system.  Now our budget is periodic, not absolute.  You can adjust it at will, even cut it entirely, and the team will respond accordingly.  Want to cut it down to $100K per month?  No problem.  You'll get half the velocity out of the team.  It's not the end of the world (though it could mean the end of a job for team members, so standard turnover cautions apply), it just means that you're turning down a knob somewhere to get less throughput from an ongoing process.

It makes sense from the perspective of software itself.  We all know that the real costs are not in development, they're in support.  That ongoing $200K per month isn't going to go to waste after the initial production release.  In fact, if your plan is to bring in a team temporarily, have them build a system and release it to production, and then send them on their merry way... Get ready to spend a lot of money.  Software does indeed rot, sir.

If, on the other hand, you maintain an ongoing budget for the team in general, support and upgrades will just be a standard part of that budget.  Bug fixes, new features, etc.  And, as I said before, you can tweak that budget without a problem.  Even cut it entirely (if you decide, as a business, that you want to cut support for your software entirely).

Because there is no end game.  Software doesn't stop.  It's never complete.  It's an ongoing cycle (assuming your business genuinely relies on this product, of course, and it's not just a temporary scaffold that will be thrown away after you've used it).  Unless you plan on your business coming to a stop, don't plan on the software which supports your business coming to a stop.

The amount of money you spend on your enterprise software can fluctuate.  And, again, it can even dry up entirely in a dire enough situation.  But the point is that it's a periodic cost, not an absolute one.  Your big $1.6M project that's planned from the beginning is going to cost you a lot more than $1.6M, you just don't know it yet.  Which means you haven't planned for it yet.  That additional cost won't come out until after the project is "done."  Operational costs, IT costs... it'll all be there.

"You don't know that yet" is, in fact, exactly the point we're trying to relate to you in this pitch.  We don't know things yet either.  So, rather than make up numbers, we're proposing a paradigm shift in how we manage uncertainty.  Instead of pretending that we do know, let's assume that we don't know and plan our approach accordingly.  To adjust for this uncertainty, we build agility into the process.  So that as soon as we do know something, we can immediately react.  As a whole team.  As an entire enterprise.

"Agile" isn't just something the developers do.  It's something the corporate culture adopts.  And it either adopts it wholly or not at all.

Saturday, May 19, 2012

The Case for Service Accounts

It's a common question when writing an application.  Should the application access the database (and other services) as each individual user, or as a single application-specific service account?  This is an especially important question in any multi-tier service-oriented system.  There are pros and cons to either approach, of course.  But for my entire career I've always preferred the service account approach.  This is for a number of reasons.

First of all, you may not have an internal authentication/authorization context for every user.  The application may, for example, be a public-facing web application.  (I have actually heard someone suggest that user registration for a public website be handled through Active Directory.  Too bad I didn't get to see them try to implement that; it would have been a blast.)

Second, in multi-tier environments, passing that context from machine to machine is hard.  Have you ever had to maintain Kerberos in a Windows environment?  It's not only really bloody terrible, but from a more concrete business perspective it will devour operational expenses both in terms of downtime for users and constant support from IT staff who would be better off doing other things.

But there's a more pressing concern with the difference between user contexts and service accounts.  And it's one that's probably most often used as a reason for user contexts, even though it's really a reason for service accounts... Auditing.

"We want to maintain the user context throughout the system so that we can audit the user's actions throughout the system."

Are you sure that's a good idea?  Let's think about it for a minute...

When a user (we'll call her Mary) performs an action in the application, you definitely want to audit the action she performed.  But what was that action?  Did Mary edit a record in a database?  Did Mary call a web service?  What did Mary actually do?

The last time I saw this kind of user auditing in place, every record in the database had an audit log attached to it indicating a history of what changes were made (inserts, updates, and deletes) and which user made the change.  So the business can say with confidence that at 1:37:14 PM on May 17th 2012 Mary edited this record in the database.

Mary did no such thing.  Your audit trail is wrong.


Mary didn't edit a record in the database.  Mary performed an action in the application.  The application edited the record in the database.  Or perhaps even called a web service (which is another application entirely) and that application edited the record in the database.  Mary was completely removed from this action.  What Mary did in the application may have translated directly to a database record edit.  She may have even thought that she was directly editing a record in a database, like in old Access applications.  But the application actually performed that action.  Mary merely performed a request.

What if the application does more things unknown to Mary?  Maybe Mary told the application to update the record for a patient in a medical system.  The application updates the record, performs some other business logic on the new data, updates some other records, kicks off some tasks, etc.  Mary isn't aware of any of this.  And, more importantly, Mary didn't tell the application to do any of this.  All she did was tell the application to update some patient data.  So why should all of these other things be attributed to Mary?

Worse, what if the application has a bug in it?  What if this medical application accidentally updates the wrong patient's data?  Why would you want to blame Mary for that?  The audit trail would be very wrong, arguably criminally wrong.  Mary performed a request to update the data for her patient.  The application received that request, but did something completely different.  Mary certainly didn't do this.  The application did it.  Arguably even the developer did it.

If Mary opens up SQL Server Management Studio and directly edits data, then that should definitely use her personal credentials to audit those changes.  But that's not what's happening here.

What should be audited is the actual request Mary made to the application.  And any request that application makes of other systems in response to Mary's request.  Database edits, service calls, etc.  Each should be audited in its own right.  That audit trail can link these requests together for reporting purposes, so that one can easily determine which application requests were made in response to a user's request.  But the user's request should be the only thing attributed to the user, and that attribution should be as close to the user's actual actions as possible (that is, as close to the UI as possible) to reduce the risk of bugs changing the requests before the audit occurs.
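As a rough sketch of the shape I have in mind (all of the names here, like AuditEntry, PatientDemographicsHandler, and "svc-patient-app", are made up purely for illustration):

using System;

public class AuditEntry
{
    public Guid RequestId { get; set; }       // links a user's request to the work done on its behalf
    public string Actor { get; set; }         // e.g. "mary.smith" or "svc-patient-app"
    public string Action { get; set; }        // what was actually requested or actually done
    public DateTime OccurredAtUtc { get; set; }
}

public class PatientDemographicsHandler
{
    private readonly Action<AuditEntry> _audit;   // whatever audit sink the application uses

    public PatientDemographicsHandler(Action<AuditEntry> audit)
    {
        _audit = audit;
    }

    public void UpdateDemographics(string userName, int patientId)
    {
        var requestId = Guid.NewGuid();

        // Attributed to Mary: the request she actually made, captured as close to the UI as possible.
        _audit(new AuditEntry
        {
            RequestId = requestId,
            Actor = userName,
            Action = "Requested demographic update for patient " + patientId,
            OccurredAtUtc = DateTime.UtcNow
        });

        // ... the application does its work here, connecting to the database as its service account ...

        // Attributed to the application: what it actually did, linked back to Mary's request.
        _audit(new AuditEntry
        {
            RequestId = requestId,
            Actor = "svc-patient-app",
            Action = "Updated demographics record for patient " + patientId,
            OccurredAtUtc = DateTime.UtcNow
        });
    }
}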

Sunday, May 13, 2012

"The Business Person Doesn't Need To Know That"

At my current project there's some pretty significant involvement from the business in the details and minutiae of the application design.  For better or for worse, each and every business user wants to know exactly how things are going to work, and wants to put their two cents into the design.  It's a lot of cooks in the kitchen, and as a developer it's easy to complain about that...  But I should really make the best of it.  After all, it could be worse.

I'm reminded of a previous job at a bank that no longer exists.  (I've said it before and I'll say it again...  A bank going bankrupt is irony at its best, and it's also an astounding example of a business running itself into the ground.)  At one point during my brief tenure there I was tasked with fixing a bug in an application owned by the HR department.  It was a basic one-off HR utility, grabbing data from here and there and moving some bits around so they could do something.  No big deal.  But recently it had suddenly stopped working.  And it stopped working in a weird way.

Normally it would iterate through every employee in the company, connect each one to some other data in some manner relevant to HR, and update some data in a transactional database owned by the application (and, thus, by the HR department).  Then it would email some reports to key individuals based on the new state of the data.

Note that I keep pointing to the fact that this is owned by the HR department.  That's a significant part of the point I'm trying to make.

The "weird way" in which it "stopped working" was that it would run as scheduled, update some data, and send some reports.  But it didn't update everything.  It took a little cajoling with the HR point of contact (which I think was also head of HR... we'll call her the Product Owner) to get more relevant information (she's used to being protective of this data... but I'm not interested in the real-world values of the strings and integers, just the fact that there are strings and integers), but the pattern which was eventually uncovered was that it only "worked" for employees whose surnames began with A-F (semi-inclusive), and never for any employees whose surnames were F+ (semi-inclusive).

The pattern indicates pretty clearly what, or at least where, the problem lies.  Something is wrong with the data for an employee whose surname begins with F, and the application is silently failing.  And this data is relatively new, as the system only recently stopped working.  (Though, in hindsight, "recently" in bank-time could have been two years while the department awaited approval and navigated the sea of red tape to get a bug fixed.)

So I set about debugging.  The arduous process of stepping through somebody else's code and examining values.  And let me assure you that this code was a mess.  I was told that it was written by somebody who didn't really know .NET very well, and was accustomed to developing in another system.  I take statements like that with a grain of salt, understand.  And I do this because the errors weren't necessarily .NET-specific errors.  If the developer didn't use LINQ or didn't use built-in functionality to perform common tasks, that's one thing.  But we're talking about serious logical problems.  Looping through data in unimaginably inefficient ways in order to find a value that you already have in another data structure, things like that.

But I digress...

I tracked down the problem, and wrote a detailed analysis of it.  This was earlier in my career, and while I don't have the analysis anymore I am confident that it could have been worded much more professionally.  So I approach this particular anecdote with a modicum of understanding of the events as they unfolded.  Nevertheless, I sent this analysis to my supervisor and to the Product Owner.

The analysis essentially laid out the logic surrounding the error, indicating specifically where and why the error occurred, the logical nature of the error (the application was assuming that every entity in Active Directory was an employee, and that every "employee" email address was unique... the falseness of the former assumption led to the falseness of the latter assumption), etc.

I was proud of this analysis.  I'd not only found the nature of the problem, but I'd also found a prime candidate for refactoring and cleanup, which would result in an application that's easier to maintain and dramatically more efficient for the users.  So the analysis was also a request for clarification.  What should the logic be?  Clearly, the logic as it stood was inaccurate.  But without any further knowledge of what the application should be doing (again, the code didn't give much indication of this), I needed collaboration with the Product Owner.

"She doesn't know or care about these details; It's your job to fix it."

Those weren't my supervisor's exact words, I don't think, but that's essentially the gist of it.  And the other people in my group agreed.  More vehemently than respectfully (again, earlier in my career), I disagreed.

How can the Product Owner not know or not care about the business logic driving her application?  Sure, the implementation details are unimportant to her.  That's where the developer's "magic" comes into play.  But the logic?  That is most certainly owned by the Product Owner.  I can't tell her what her application is supposed to do.  I can suggest logic to achieve the desired business outcome, but that logic needs her collaboration.  She can't just toss some vague concept over a wall and expect somebody to toss back a completed application.

Can she?

"That's just how we do things here."  If you've read my previous posts, you've seen that one.  It's the same place.

Again, I completely and wholly disagree.  At its core, the application was "broken" because it logically defined the unique identifier for an employee as an email address.  Does HR use this to uniquely identify an employee?  If so, then I agree that there's a technical implementation detail which broke that logic.  Nonetheless, it's useful information for the Product Owner.  Apparently enterprise-wide that particular unique identifier isn't as reliable as originally assumed, and something in the business logic may need to be adjusted.  But then... If that's not how HR uniquely identifies an employee, then the application isn't expressing the business rules of the HR department.  The Product Owner definitely needs to know about that.  And, for lack of additional information, the Product Owner needs to work with the developer to correct the logic so that it accurately and completely reflects the business rules.

Either way, the Product Owner needs to be involved.  But... She doesn't care about such details?  At another previous job I once said, "If the people who want this software don't care about it, why should I?"  It's kind of passive-aggressive, but the point remains.  It's either important or it isn't.  It's either worth your attention or it isn't.

If your software isn't worth your attention, expect bad software.  And don't be surprised when you get it.  (Loosely translated: Don't come crying to me when you get everything you want.  I'll help you fix it, but I won't take accountability for the parts that weren't under my control.)

Ok, I'm slipping back into passive-aggressive mode.  I do apologize for that.  It's something I need to correct.  But, again, I digress...

My point is that everybody needs to care about the software.  Everybody who's involved, anyway.  The developer, the tech lead, the project manager, the tester, the product owner, the data analyst, the executives who run the whole company... everybody.  Anybody who doesn't care about the project doesn't belong on the team.

Wait... the executives?

Yes, the executives.  Everybody who has a stake in the software.  This particular example of a stakeholder highlights a particular scenario, though.  Not everybody has the time to personally be involved in the software.  This doesn't mean they shouldn't care, it simply means that they delegate the responsibility to somebody else and trust that somebody.  And that somebody needs to deliver.  They need to care.

So, maybe the Product Owner in this case didn't have the time to personally be involved.  Then delegate.  And trust your delegation.  That means that if you delegate your ownership of the software entirely to the developer, then understand that whatever the developer does is owned by you.  The developer is responsible for his or her decisions, naturally.  But as the person who delegated the effort, so are you.

The previous developer made the wrong decisions.  Don't make the same mistake again.

Your software is as important as you make it.  If you just do some hand-waving and walk away, what results do you expect?

So, for better or for worse, at least my current client cares.  The business person does need to know.  And the business person wants to know.  Which is a pretty good position to have.