Tuesday, January 17, 2012

The Cost of Quality

There's been a lot of discussion on my employer's Yammer boards lately regarding the value and importance of unit tests.  The discussions were mostly inspired by a recent post by one of our former colleagues, as well as by a response from one of my favorite authors.  Since we're a consultancy, the topic comes up often on our team.  After all, having a comprehensive suite of passing tests is the way to demonstrate that what we delivered is what we promised.

(On a side note, I found Uncle Bob's response to be a bit too preachy and dogmatic for my tastes.  Don't get me wrong: I'm a huge fan of his work, and I completely agree with what he's saying.  I just found how he said it this time to be a bit too emotionally charged.  Of course, he has no shortage of other material which conveys the point in a more client-friendly manner, and he's entitled to the occasional rant just like the rest of us.  So while I would argue his point to clients, I just wouldn't use that particular blog post to support my argument.)

Amid the discussions were references to various studies and papers on the cost of quality in software.  A quick Google search led me to this, which itself references further works, and so on.  This turned out to be fantastic material for making the case for unit tests and good code design when trying to convince non-technical decision makers (managers et al.).

I particularly like the segregation of costs into buckets.  Something like this:

[Figure: four cost buckets arranged left to right: Prevention Costs → Appraisal Costs → Internal Failure Costs → External Failure Costs]

The magnitude of the costs can clearly be seen as increasing from left to right.  Being able to manage software quality with automated tools is very cheap, whereas external failures that drive away customers are very expensive.  Clear and simple.
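To make the escalation concrete, here's a tiny sketch.  The relative multipliers are purely illustrative assumptions on my part, not figures from any of the papers above; only the left-to-right ordering matters:

```python
# An illustrative model of the four cost-of-quality buckets.
# The multipliers are assumptions chosen purely to show the
# left-to-right escalation; they are not figures from any study.
BUCKETS = [
    ("Prevention Costs", 1),          # design, reviews, automated tests
    ("Appraisal Costs", 5),           # manual QA, inspections
    ("Internal Failure Costs", 25),   # defects caught in-house; rework
    ("External Failure Costs", 100),  # defects that reach customers
]

def cost_of_defect(bucket, base_cost=100):
    """Cost of handling one defect in the given bucket."""
    return base_cost * dict(BUCKETS)[bucket]

for name, _ in BUCKETS:
    print(f"{name}: ${cost_of_defect(name):,}")
```

Whatever the real numbers are for a given shop, the shape is the same: each bucket to the right is a multiple of the one to its left.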

But there's more.

I'm sure we've all been in this situation.  We've all tried to push for up-front investment in code quality (which is poorly worded, because I still maintain that it takes equal or less effort to write good code than bad code in the first place, but that's another discussion entirely), or tried to explain to the owner of a legacy codebase that their code is of poor quality and heavy re-factoring is going to be necessary.  And the counterarguments, from a management perspective, are often centered on cost.  I've heard many variations on the same central argument:
"Rather than invest too much in the code, we can just focus more on QA.  We can have junior (read: inexpensive) resources do our testing, developers can test their changes, etc.  Just validate your work and it'll be fine."
So, where does this fall in the buckets?  Clearly the manager is saying that he would prefer to sink his money into Appraisal Costs rather than Prevention Costs.  So, let's take the case of a legacy codebase that just, quite simply, doesn't allow for any kind of automated testing.  Whether it was written by amateurs who didn't know what they were doing, whether management actively decided that automated testing was not to be considered, or whatever the reason... that's not important.  The net result is the same.  Automated testing can't be done in this case without a major up-front investment.  So the business has decided to prefer Appraisal Costs:

[Figure: the same buckets, with the business's spending sitting squarely in the Appraisal Costs bucket]

A simple business decision, and the business has every right to make that decision.  So now that the decision has been made, that's the end of it, right?  Management decided to standardize on Appraisal Costs, so that's the standard going forward, right?

Wrong.  See, things change.  This decision would be the end of the discussion if this change to the software were the end of the changes.  Is your business growing or changing?  Is your software growing or changing to accommodate it?  I certainly hope so, because a lack of change is usually a sign that the business is in trouble or not adapting to market forces.

The software is getting more complex.  And that poorly designed code isn't helping.  If you haven't seen this happen, understand that I envy your career.  Of my past four jobs, three have exhibited this behavior in one way or another.  QA was having a harder and harder time keeping up, and the software was only getting worse.  Basically, this was happening:

[Figure: the cost box drifting rightward, out of Appraisal Costs and into Internal Failure Costs, blocked by a big red strike]

From the perspective of somebody who cares about costs, that's a frightening trend.  Management has already decided that Prevention Costs are not a consideration.  But now that decision is bleeding into Appraisal Costs.  And we're starting to feel some Internal Failure Costs.  And decisions like "We should educate our users on the known issues" or "We should document the bugs better" don't really solve the problem.  They just add documentation to the problem.  (Consider other problems in life.  I could explain to you at length why your car won't work.  But just putting some gas in it will be more productive in the long run.)

I hate to run around yelling that the sky is falling, but the costs speak for themselves.  If one takes a step back and considers the simplest view of the facts:
  • The business decided to standardize on Appraisal Costs
  • The business has incurred Internal Failure Costs
These two facts together lead me to a single conclusion:
  • The business' decisions don't control its costs
Again, frightening.  Can that box be pushed back to the left?  Sure, but it'll cost you.  (Ok, I kind of intended that pun.  But it's true.)  But how much resistance does that big red strike provide?  How much effort is it going to take to move that box to the left?

I can't answer that question.  So, instead, I propose an alternate approach entirely.  Don't move the box to the left.  Don't fight to stay in Appraisal Costs.  Instead, start focusing on Prevention Costs.  Start... re-factoring.  Don't fight the code you have, replace it with the code you want.

(Yes, managers... replace.  I know you spent money on that code, but that doesn't mean it's worth anything.  That code is increasing your costs.  If you're in a position where you can't replace your legacy code, that means you're not in charge.  The legacy code is in charge.  It's making the decisions for you.  Code should be replaced.  Even good code should be replaced.  With better code.)
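One way to start that replacement without a leap of faith is a characterization test: first pin the legacy behavior down as executable facts, then write the code you want and hold it to the same facts.  The discount functions below are hypothetical stand-ins of my own, not anyone's real code:

```python
# A characterization-test sketch: pin down what the legacy code
# actually does, then replace it and hold the replacement to the
# same observable behavior.  legacy_discount is a hypothetical
# stand-in for inherited, untested logic.
def legacy_discount(order_total):
    # Imagine this arrived with no tests and no documentation.
    if order_total > 100:
        return order_total * 0.9
    return order_total

# Step 1: capture the current behavior as executable facts.
def characterize():
    assert legacy_discount(50) == 50
    assert legacy_discount(200) == 180.0

# Step 2: write the code you want, and make it pass the same facts.
def discount(order_total, threshold=100, rate=0.10):
    """Clearer replacement with the same observable behavior."""
    if order_total > threshold:
        return order_total * (1 - rate)
    return order_total

def replacement_matches_legacy():
    for total in (0, 50, 100, 101, 200):
        assert discount(total) == legacy_discount(total)

characterize()
replacement_matches_legacy()
```

And notice the side effect: those characterization tests become the first members of the automated suite, which is exactly the kind of spending that lands in the Prevention Costs bucket.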

To continue the loosely-held 10,000-foot view, consider this:

[Figure: spending shifted into Prevention Costs, routing around the big red strike as it pushes into Internal Failure Costs]

Yes, the reality of the situation has continued to bleed into Internal Failure Costs.  In the short run, that can happen.  We can try to minimize it, but it can happen.  The thing is... It was going to happen anyway.  We didn't create that problem by taking this route.  In fact, we've lessened it.  The big red strike is continuing its rightward journey.  But that's OK.  We've gone around it entirely.

Does it cost time and money to do this re-factoring?  Of course.  Everything costs time and money.  It's just a question of how you want the graphs to trend.  We've already established that forces beyond the control of management have caused that big red strike to move and grow.  But we can make use of the forces we do control to mitigate the problem.

So keep focusing on those Prevention Costs.  Aim for this:

[Figure: spending standardized on Prevention Costs, with the big red strike grown large across the buckets to the right]

Great Scott, that big red strike has grown.  It's out of control!  Well, yes.  That's right.  And now, because we've standardized on Prevention Costs, we don't need to control those other forces.

Now what will happen if that big red strike continues to grow?
Costs will decrease.

Now we're no longer fighting external forces.  Where we were previously fighting (that is, spending money) for the sole purpose of maintaining Appraisal Costs, now we're not fighting anything at all.

We are the music makers; We are the dreamers of dreams.
