Tuesday, June 1, 2010

It's OK To Hard-Code, And Other Things Your Company Might Not Know

We've all seen our fair share of "coding standards" on various projects and in various jobs. It could be as harmless as "please keep these enums in alphabetical order" to something as insidious as "all values must be in a config file." While I'm sure every such edict came from a well-intentioned policy maker, I fear that many times they were born out of a knee-jerk reaction to a problem that occurred some time in the past. And in the effort of solving that problem, more problems were introduced.

What basically ends up happening is a variant of the Law Of Instrument. Basically, the edict is made from on high: "I have found a hammer that solves this problem, so all future work must be done only with nails to avoid other problems." And what's worse, these edicts very easily become canon and spread like a plague as developers raised to believe such dogma move to other jobs and climb the ladder of experience.

This isn't to say that coding standards in general are a bad thing. In fact, to say that would make me just as guilty of applying an over-generalized rule to the complex ecosystem of software development as those whom I decry here. No, standards are useful, provided they are created and applied to be useful. After all, keeping a large collection of enums in alphabetical order (provided they are named something intuitive and useful) seems like a pretty good idea. But there exist commonly-enforced standards in many companies that do more to stifle software development than to standardize it.

Let's take a look at a few examples drawn from my own experience (thankfully, not all of these apply at my current job)...

1) Thou Shalt Make Ugly Variable Names


I've never liked Hungarian Notation. But man, some shops love it. And, invariably, they almost always get it wrong. Hungarian Notation basically means that you prefix your variables with useful shorthand to easily identify what those variables represent. Somewhere along the way, somebody thought that it means to prefix the variables with what class they represent, and that pox has spread.

I think it was the timing, honestly. When Hungarian Notation was spreading throughout the industry, you didn't have fancy IDEs that did a lot of the grunt work for you. If you're programming in a text editor devoid of context or syntax, you want your code to be easy to read. You don't want to have to look for the variable definition to know what type it is. But in this modern age of intellisense, there is no reason on God's green earth to require that all strings begin with "str" or all text boxes begin with "tbx" or any other such nonsense.

Determining the type of a variable now involves nothing more than hovering your mouse over it. Its declaration is a context menu away. And for the love of all things holy, it doesn't need to be over-abbreviated. We're not short on bytes here, you can write out full words. There is no award for keeping all your variable names exactly eight characters. (Unless you consider being impaled on a pike a reward, because we may be able to arrange something.) If you refer to the network ID of the current user as "cusrntid" then nobody will know what it is just from looking at it. Your naming convention and variable standardization has failed, you may as well have called it "x" or something equally pointless.

So please, name your variables properly. They don't have to be a specific notation across your entire enterprise. They don't have to be a specific length, or abbreviate things in specific ways. They just need to quickly and simply convey what they mean in the context of the code. If you're in a class called HRDataRepository, in a function called GetEmployeeNames, and there's a variable that declares an ADO command object, guess what that variable does? How did you know it holds a SELECT command for the Employee table in the HR database if I didn't even tell you the official standard-conforming name of the variable yet?

Rigid standards to variable names won't make your ugly, complicated, difficult-to-trace code any easier to support. Honestly, do you want your developers spending more thought cycles on fitting your broken conventions, or do you want them to apply those thought cycles to actually solving problems that mean something to your business?

2) Thou Shalt Keep Line Count To A Minimum


Line count? Line count? Ok, we already covered that the number of bytes occupied by your code is no longer an issue. Turns out 640K wasn't enough for anybody, and we've grown to accept that. We've moved on. But more to the point, we're talking about readability here. So let me start off by making a bold statement, followed by a clarifying disclaimer:

There is nothing inherently wrong with writing a function that's 1,000 lines long.

There is something inherently wrong with writing a confusing, difficult-to-trace function that's 1,000 lines long.

See the difference? Maybe your business process involves 1,000 steps in one atomic operation. Maybe that's just how much code it takes to accomplish the single task at hand. It happens. And as long as that code is easy to understand, it's OK. But when you start forcing your developers to keep their line count down, guess what happens? Their line size increases. Now you've done it, Mr. Developer Manager. You've made something terrible happen. Consider the following piece of code:

employees.ForEach(e =>
{
  var dataRow = SomeFunction(e);
  returnObj.Add(
    new EmployeeDTO{
      Name = dataRow["Name"],
      Email = dataRow["Email"],
      Department = dataRow["DepartmentName"],
      //... there are more fields
      // and some of them require other
      // function calls and deeper nests
    }
  );
});

And that's a relatively simple example. The arrowhead can get a lot deeper than that. But look at it. It's just one statement, a call to ForEach() on a generic list. Do you want that on one line in order to keep your precious line count down? Can you handle something like that on your conscience?

I know why you want your functions small. The article I referenced earlier even put forth the rule that closing braces shouldn't be beyond the same page view as the opening braces. There's good reason for that, in most cases. You should be able to see an entire code block without scrolling so you can grok the whole picture. I applaud that, it's a good thing to try to do. But if that suggestion causes you to re-factor your functions in unintuitive ways, then you didn't get the message.

What if another coding standard requires that all the functions in your class be kept in alphabetical order? Related to making intuitive variable names, you also want intuitive function names. So your atomic 1,000-line process is broken up into dozens of functions that perform dozens of pieces. And now tracing that process involves jumping around the code file like a monkey on crack, instead of following a simple step-by-step process in a single function.

3) Thou Shalt Comment Everything; // declaring that one should comment everything


The title of this section should be pretty self-explanatory. Comments are great, and they should be used often. The same can be said of oxygen. But, like oxygen, keeping it in moderation is in your best interests. If you write too many comments then you'll probably end up with a high noise-to-signal ratio. The average value of each comment drops, and the important ones more easily go unnoticed.

When you have a block of code that does something obvious, you probably don't need to describe it very much in comments. Taking a previous example, you have a function that gets employee names from a database. You know what it does from its name. It takes no input, it returns a list of strings as output. The function itself opens a database connection, selects names from an employee table into a variable, closes the connection and returns the variable. It doesn't do anything unexpected or unintuitive. So, please, don't waste your time commenting on every line in the function.

Does your function filter employees by some known employee status? Explain that in a comment. Does it perform a sort on the employees for a reason that's probably clear somewhere else in the code? Explain that in a comment, including where else it's important. Those are business needs that are not immediately obvious when reading the code. But your code that opens a connection to the database, runs a query and closes the connection... it's pretty obvious what that does and why it does it.

I remember a professor back in school who required that every line of code be described in a comment. It led to gems like this (but at least with better variable/function names, left out here for brevity):

for (int i = 0; i < foo.length; i++){ // begin a for loop
  DoSomething(foo[i]); // call DoSomething on foo[i]
  DoSomethingElse(foo[i]); // call DoSomethingElse on foo[i]
  if (i > SOME_CONSTANT){ // check if i is greater than SOME_CONSTANT

    break; // exit the for loop early
  } // end if

} // end for

So now you've written the same code twice. Once in C++, once in plain English. Congratulations. So what does it do? Why are you terminating the loop early based on a pre-defined constant? Commenting every line led to commenting each line. Individually. That stripped all context from the block of code as the developer was too busy commenting on the line of code instead of the block. A quick comment before the loop could have explained what we're doing and why that constant is important. Also, if those functions/variables are named properly and if that conditional isn't there, then none of it needs comments if it's doing something obvious.

Don't write more comments. Write better comments, keeping in mind that sometimes "better" is to not bother writing one at all.

4) Thou Shalt Create Bloated And Unmaintainable Config Files


Want to know a secret? It's OK to hard-code values. Not everything needs to be pulled out into a config file. If that statement caused you to cringe or in some way be taken aback, then please keep reading. This isn't just for you, this is for everybody who ever works on your code.

It's generally a good idea to put changeable values in config files so that they can be easily changed on the fly without re-building and re-deploying the software. Paths to network shares, connection strings, email recipient addresses, etc. These values might change, and they might need to be changed quickly and without re-testing and re-deploying everything. But, as is the general theme of this article, there is such a thing as taking it too far.

I once worked with a man who insisted that all literal values must come from a config file. Have you ever seen code that didn't have a single string or number literal in it? It just doesn't look right. It also led to a number of problems.

First, there was the question of how to pull strings from a config file without using strings to reference the config values. His approach was to have a single class of hundreds of static constants to hold the config value keys. This was the only "approved" place to put string literals, and they had to adhere to strict conventions.

Second, everything was a string. In his attempt to get strings out of his code, he introduced more strings into his code. There is a huge difference between 1 and "1" and that difference can be very important. You know it, I know it, the compiler knows it. Making all values enter the application as strings meant that he had to do tons of type checking and error trapping that he otherwise wouldn't have needed. It was a lot of work, and it was fragile.

Third, it was fragile. (Yes, that was worth saying in both paragraphs.) The code required that the massive config file be just right, or things could go horribly wrong. No matter how much type checking and error trapping you put in there, someone will find a way to enter a config value that will break your code. This is especially true when your config file is massive and a standard small percentage of human error becomes a very real number.

Users just don't need that much control over your software. They don't need to change the text of an error message or, God help you, the name of the stored procedure that you call. Little Bobby Tables might stop by to haunt you later. Let your users change things they understand. But if you don't expect them to be able to write the code themselves then you shouldn't expect them to manage every aspect of how that code behaves either. Separate your configured values from your hard-coded values with a little bit of common sense.

To give another example, I recently worked with a guy at my current job on a relatively simple project. I was writing some code that involved a case statement. Basically, it was dividing up code paths based on the type of a loan in some bank software. Loan type 430 is mortgage, loan types 1-699 (except 430) are commercial, loan types 700+ are consumer. Those are the three options. Any more options or any more granular differentiating among loan types will require code changes. A config value here isn't going to offer you any advantages.

Will the loan types ever change? Seriously, have you ever worked at a bank? Nothing changes. Ever. You may as well be etching that code into stone because it's going to outlast you and everything you hold dear. Should the users be able to configure this? Probably not. They might not even know the ID numbers, they might only know loan types by their names or some other distinguishing characteristic. And are there really that many different types of loans? My guess is that someone picked a 3-digit ID to cover future expansion and just broke it up by a perceived 70/30 split in the types of loans available. Then mortgages eventually broke some standard.

The developer with whom I was working saw my hard-coded values (== 430, <= 699, else) and said "the only constant is change." I get what you're trying to say, and I very much agree with the statement. But follow the spirit of the law rather than the letter of the law wherever possible. If anything about these values ever changes, isn't that going to require more code changes to accommodate that anyway? This is fairly core business logic. How much effort do you really want to spend re-factoring out into a config file values that probably aren't going to change.

Don't be afraid to hard-code values from time to time. It's just easier to support in many cases. Yes, there will be developers who curse your name for it. "The genius decided to hard-code it" will be a phrase spoken of your work from time to time. But come on, we all curse the names of previous developers from time to time. Nobody wants to work on somebody else's code.

"But," you may ask, "shouldn't we do what we can to avoid having to re-build and re-deploy our code? After all, deploying is such a hassle." Is it? Why? Building and deploying your code are the most automated steps in the entire process. Writing/debugging/testing is the hard stuff. If building and deploying in your environment are difficult and painful then your environment is broken. Stop breaking your code to match your environment and start fixing your environment. If change is indeed the only constant then we should be actively embracing it rather than grudgingly accepting it.

5) Thou Shalt Tightly Couple With Third Parties Who Don't Know Or Care About You


"Log4Net is nice, but our standard here is to use the Logging Application Block in Microsoft Enterprise Library." Well, dear company, aside from the fact that you're ignoring the reduced overhead and increased performance of the former in this particular case, your standard on the latter is broken. You did it wrong. The problem isn't with the library itself, it's with the fact that you were somehow convinced that you needed to pick one.

Consider, instead, the following statement: "Our standard here is to log to our ILogger interface, to write all info and above events to [some database or some message queue of some kind] and to email all error and above events to [some distribution list]." It took a little more thought, but it's a lot more of a standard than that statement in the previous paragraph.

In the first example, no real standard was created at all. You chose a library to use, but defined nothing about how it should be used. Some applications can write logs to a file on some server that nobody knows about, some can write to a database table somewhere, some can email the developer who wrote the code and nobody else, etc. That's not a standard. All that does is tightly couple your code with some third party library that you can't control or support. You've gained no benefits.

In the second example, you can still use your precious third party library. Nothing's stopping you. But, more importantly, you can use a different third party library if you ever want to. Or you can roll your own. Nothing's stopping you. Maybe you're not using dependency injection and aren't into the whole interfaces thing. That's fine, whatever. But the point here is that these "standards" on external dependencies should be abstracted wherever possible.

So don't let your company be so content in the fact that they "chose a standard" and that all of their applications adhere to that same standard. They haven't made anything easier to support in doing so. And don't you dare walk in there with a smug sense of open-source satisfaction and declare that the standard is now being switched to Log4Net. If that's the case, then you missed the point entirely.

OK, that was a long post with a good bit of ranting mixed in there. Let's sum up. Development standards can be a very good thing, but they have to be applied properly to the problems at hand. Deciding on the wrong standard and forcing it down the throats of your developers impedes productivity. Your developers are intelligent people who can solve problems in intelligent ways. Stop trying to make the entire process a matter of copying and pasting "approved" pieces of code in the right order.

You should choose your standards and your acceptable coding practices for your domain. But choose wisely.

5 comments:

  1. Wow that was a hell of a multi-part rant.

    Hungarian Notation, oh how I loathe you. I think it made more sense when in the VB6 days. I believe back then there wasn't much of a scope being enforced on variables in the language so people would declare all variables and initializing values at the top of a function. So yeah, you might be interested in knowing what the hell a variable held when you were 70 lines in and no variable declarations in sight. Oh and then there was the whole 'option explicit' and Variant type. I better stop myself before I start ranting about VB6 and the coding conventions of that thing.

    Of course I program in Python which is dynamically typed and the smallest scope is a function block (so a variable declared in an if statement is visible to the whole function) and no type identifying notation anywhere nor any intellisense really (I use VIM for Python). Perhaps it's a shift in thinking that occurred. They happen almost way too often in our field.

    As for line count of functions, I do agree that functions shouldn't get too long. I think it has to do with expressiveness. If you introduce a new word and it takes you a 1,000 sentences using primitive words to describe what it means then that's a lot for a person to take in at once. If you can break up the definition of that word into smaller more expressive words, I think this is easier to digest. A person can gradually understand it and the words could be reused to describe new things. Although I don't think the only reason to do this is for reuse.

    ReplyDelete
  2. It's a matter of good judgement, really. In most cases, yes, long involved functions and classes should be avoided. It's a good general rule of thumb to break things into smaller discrete chunks of code.

    The point, however, is to remember _why_ we do that. We don't break up our functions strictly because of line count, we break them up because it makes them easier to understand and manage. If, however, a rare case is encountered where doing this makes the code more difficult to follow, then it shouldn't be broken up just for the sake of line count.

    Where people usually say "always keep functions small" I'd rather say "always keep functions simple."

    ReplyDelete
  3. Yeah I agree. I was telling the new guy here that we are in a field where there are not really perfect ideas and shitty ideas. There is a lot of grey area since almost any idea will be a good one in the right context.

    ReplyDelete
  4. Now HERE is an example of an IT standard that I'm sure was very well-intentioned but, um, went wrong somehow:

    http://thedailywtf.com/Articles/Secured-Typing.aspx

    ReplyDelete
  5. What's also awesome is that nobody from IT stayed. So after the extremely frustrating week the company would still not know anything about setting up that software. Seems like you would want someone to learn about it.

    ReplyDelete