Monday, June 28, 2010

Dependency Injection

Hey guys what are your views of DI? It does seem partly ritual to make sure things are testable, but I suppose it also is to ensure you can easily do context sensitive configurations or the ability to replace functionality. Of course if most of your interfaces end up only have a single implementation does it seem like too much overhead? Or too convoluted?

I'm personally a fan of the pattern and think that the (although somewhat annoying but also minor) overhead of maintaining interfaces is worth the cost of having a consistent scheme of how things get plugged together. Also I don't think it's too convoluted to follow now that I have experience with the pattern and workflow.

Anyway what do you guys think about the practice? What do you find are the pros and cons? Do you have any run ins with people new to it? How do they feel about it? etc..etc

Saturday, June 26, 2010

Area 51 Proposal

In case anybody's interested, I have a proposal up at Area 51 for a stack exchange site regarding development methodologies.  If you're interested, show your support by following and contributing!

http://area51.stackexchange.com/proposals/9543/development-methodologies

Saturday, June 19, 2010

Tiny DSLs In The Code

I wanted to explore a little on the LINQ vs. lambda that Dave mentioned. I really don't have a direction for this post but wanted to get some thoughts down and see what others were thinking. Also it should be noted that I'm talking mostly about the IQueryable abilities and the two methods of using them (LINQ: from x in list where x.valid select x, and lambda: list.Where(x => x.valid))

The lambda syntax he mentions is just strictly straight up C#. You chain together the methods giving them the lambda expressions for the various actions to perform on the queryable you are working with. You get to build up what you want on a lazy evaluated data structure and it flows fairly cleanly inside C#.

There are some advantages to this, someone familiar with the basic idea of method chaining and the language idea of lambda expressions can easily understand the flow of the code. This sort of internal DSL gives us the ability to take advantage of language features and try to produce more expressive code. Several OSS libraries in the .NET world have certainly picked up on the fluent interface idea which is an internal DSL practice.

Then we have the LINQ syntax. It allows us to also build up what we want and have it lazily evaluated but now the you are working in a language that isn't really a part of C#. Although LINQ is a language feature of C# it still is more of an external DSL. It's a small language implemented on its own. Thankfully the compiler fully understands this DSL and can convert it to the expressions to form a fully valid C# program. You can still chain actions off of it and pass it around, but you are given a new language to express the idea with and not only relying on C#.

Now someone needs to know this new language in order to understand your code and not just C#. Perhaps LINQ is a bit of a simple example, but this new language can provide create expressiveness and higher levels of thinking. These are generally great but again now someone needs to know two languages. This second language though is purposefully designed to be smaller, stricter, more focused, and more expressive in its domain. Naturally I'm speaking about DSLs and their advantages and disadvantages.

I wonder if DSLs will generally catch on. Like I mentioned in a comment on Dave's post the Ruby world seems like enjoy some DSLs. They are still really internal DSLs since they are implemented in and used in Ruby which is a general programming language but Ruby's flexibility in syntax allows for them to produce some pretty interesting results. I suppose the most famous is RSpec and how it can help a developer very expressively and cleanly say what his/her code is suppose to do. This does mean that you now have to learn how to write this new DSL but in return it offers a more natural and less ambiguous way to write what you mean.

.NET does have Boo which is pretty cool language. It's a general statically typed programming language on the CLR. Python inspired syntax but it has a very extensible compiler and a built in idea of macros. You can actually define new syntax in Boo with Boo. So you could actually write the core of the system in C# and then have areas that you write in a DSL in Boo that when compiled gets tied into the system (such as maybe a rules engine on customer pricing). Or you write the whole thing in Boo.

Now all this DSL stuff certainly brings more overhead and higher level of entry for a new developer so I'm curious what everyone's thoughts are. Do you think they are worth it? Has promise? Some people feel that proper use of DSLs or just in general polyglot environments are going to be what allows us to manage complexity better and improve our disciple. Instead of hitting everything with a hammer, you could add in a saw, coat hanger, and duct tape.

The .NET ecosystem of languages has certainly expanded with F# and IronPython being supported languages, let alone all the others that are out there. The Java world seems to embrace their ecosystem of languages and although many feel Java the language has become fairly stale the JVM has still been continuing to explode with ideas. Perhaps I should write a post about my thoughts on the big differences I feel between the .NET and Java world (mostly about open mindedness and OSS).

Anyway again, What do you guys think about polyglot environments and DSLs both internal and external?

Thursday, June 17, 2010

Natural Language Constructs

Yesterday I casually mentioned to Sean that in my code I seem to vastly prefer lambda over LINQ syntax. He translated that to mean that I probably prefer chaining my actions from one object to the next, or something like that. And, you know what? He was right. I thought about it a bit more today and what I discovered was the reason why I've never liked certain static method calls in C# (actually, I've never liked static methods in general and only grudgingly accept them where needed).

Taking this a bit further, I narrowed down a few key examples of static method calls I've never liked. Let's take a look at what are probably the two most common for me:

string.IsNullOrEmpty(input);
string.Format(input, args);

I can't tell you how many times I've begun typing a variable to check if it's null, only to remember that it's a string and have to back-track for a moment to make the method call. Or how many times I've begun typing a string only to later remember that it needs to be formatted and to back-track for a moment, etc. It's just a little thing, but it irks me a little bit every time.

(In fact, this is probably the reason why it took so long for me to catch on to using string.Format() instead of just concatenating strings together, which I've always done until recently. I hope all the places I've ever worked will forgive that.)

But no longer must I accept these things. For now I can write extension methods! Behold:

public static string FormatWith(this string s, params object[] args)
{
  return string.Format(s, args);
}

public static string IsNullOrEmpty(this string s)
{
  return string.IsNullOrEmpty(s);
}

You may be wondering if that second one will break if it's called on a truly null variable. It doesn't. Man, I love C#.

I can't speak for everyone else, but to me this makes the code so much easier to write. I feel like a weight has been lifted. It's all in how we look at what we're writing, I guess. For me it's about the subject at hand. In this case, the variable. I prefer to start with the variable and chain my commands to it accordingly. (This is also why I hate output parameters and things like Array.Reverse(), which by the way I've also replaced with an extension method). I mean, compare for yourself:

if (string.IsNullOrEmpty(s)) return false;
vs.
if (s.IsNullOrEmpty()) return false;

return string.Format("The value is: {0}", s);
vs.
return "The value is: {0}".FormatWith(s);

At the very least, it's less typing. But, more to the point, it starts with the subject and then proceeds with the actions on that subject. That's how we learned to speak English, that's how some of us want to write our code. It just seems like a more natural language construct, doesn't it?

Either way, I've been making a lot of these extensions, and getting ideas for more from various sources online. StackOverflow has a great thread on the topic.

[Edit: I renamed IsEmptyOrNull() to IsNullOrEmpty() because, unlike Format(), the compiler doesn't seem to complain that it's a name already in use. It can tell the difference between the two.]

Thursday, June 10, 2010

Domain Sketch

I think this is the rough sketch that I wanted the architect to do back at my old job.  But now, as I've been diving deeper into this stuff myself (I'll admit, there was just no motivation to improve anything back there), I've drawn this for my teammates here on my cubicle's white board.
 
It's a basic diagram of the domain layout I'm working towards here, and if you were at my last job then you may notice it borrows heavily from that design.  I've changed a good bit in the interaction between the handlers and the processors to accommodate the different application ecosystem at my new job, but there wasn't enough room on this little board to go into much detail there.

One thing that may help in pushing this idea at my current employer (being a financial institution) is the auditing at the single entry point.  If the client applications are as dumb as possible and all of the business logic is in the Processors and all of the actual data persistence/interaction is in the Repositories, then having that single entry point is a convenient place to keep a detailed audit trail of all requests that come in to the domain.

Friday, June 4, 2010

Tony Hasn't Written Anything Yet

Tony, I know you're super busy with the new job and the long commute. (I ran into Roy the other day by the way, he said you guys don't regularly carpool anymore. Something about a "falling out." Some of us were wondering which of you would snap first.) I have to admit, though, I've been looking forward to that WPF article you were thinking about writing.

Anyway, to get to the point. If you want a simple way to contribute to this blog that doesn't involve writing an article, we could certainly use a better HTML template. You're an admin, you should be able to edit such things. If not, let me know.

I'd tried using some canned templates that I found online, but the various "widgets" used by this blog broke them in terribly amusing ways. And, well, let's face facts. You're a better HTML/CSS guy than I am (and probably better than the people who make most of the canned Blogger templates I found).

If you don't have the time, that's cool. I can start putting something together. I just figured I'd extend it to you since you're the resident expert and you might enjoy it.

Edit: Here's an example of one I liked that probably just needs a little tweaking:

Tuesday, June 1, 2010

It's OK To Hard-Code, And Other Things Your Company Might Not Know

We've all seen our fair share of "coding standards" on various projects and in various jobs. It could be as harmless as "please keep these enums in alphabetical order" to something as insidious as "all values must be in a config file." While I'm sure every such edict came from a well-intentioned policy maker, I fear that many times they were born out of a knee-jerk reaction to a problem that occurred some time in the past. And in the effort of solving that problem, more problems were introduced.

What basically ends up happening is a variant of the Law Of Instrument. Basically, the edict is made from on high: "I have found a hammer that solves this problem, so all future work must be done only with nails to avoid other problems." And what's worse, these edicts very easily become canon and spread like a plague as developers raised to believe such dogma move to other jobs and climb the ladder of experience.

This isn't to say that coding standards in general are a bad thing. In fact, to say that would make me just as guilty of applying an over-generalized rule to the complex ecosystem of software development as those whom I decry here. No, standards are useful, provided they are created and applied to be useful. After all, keeping a large collection of enums in alphabetical order (provided they are named something intuitive and useful) seems like a pretty good idea. But there exist commonly-enforced standards in many companies that do more to stifle software development than to standardize it.

Let's take a look at a few examples drawn from my own experience (thankfully, not all of these apply at my current job)...

1) Thou Shalt Make Ugly Variable Names


I've never liked Hungarian Notation. But man, some shops love it. And, invariably, they almost always get it wrong. Hungarian Notation basically means that you prefix your variables with useful shorthand to easily identify what those variables represent. Somewhere along the way, somebody thought that it means to prefix the variables with what class they represent, and that pox has spread.

I think it was the timing, honestly. When Hungarian Notation was spreading throughout the industry, you didn't have fancy IDEs that did a lot of the grunt work for you. If you're programming in a text editor devoid of context or syntax, you want your code to be easy to read. You don't want to have to look for the variable definition to know what type it is. But in this modern age of intellisense, there is no reason on God's green earth to require that all strings begin with "str" or all text boxes begin with "tbx" or any other such nonsense.

Determining the type of a variable now involves nothing more than hovering your mouse over it. Its declaration is a context menu away. And for the love of all things holy, it doesn't need to be over-abbreviated. We're not short on bytes here, you can write out full words. There is no award for keeping all your variable names exactly eight characters. (Unless you consider being impaled on a pike a reward, because we may be able to arrange something.) If you refer to the network ID of the current user as "cusrntid" then nobody will know what it is just from looking at it. Your naming convention and variable standardization has failed, you may as well have called it "x" or something equally pointless.

So please, name your variables properly. They don't have to be a specific notation across your entire enterprise. They don't have to be a specific length, or abbreviate things in specific ways. They just need to quickly and simply convey what they mean in the context of the code. If you're in a class called HRDataRepository, in a function called GetEmployeeNames, and there's a variable that declares an ADO command object, guess what that variable does? How did you know it holds a SELECT command for the Employee table in the HR database if I didn't even tell you the official standard-conforming name of the variable yet?

Rigid standards to variable names won't make your ugly, complicated, difficult-to-trace code any easier to support. Honestly, do you want your developers spending more thought cycles on fitting your broken conventions, or do you want them to apply those thought cycles to actually solving problems that mean something to your business?

2) Thou Shalt Keep Line Count To A Minimum


Line count? Line count? Ok, we already covered that the number of bytes occupied by your code is no longer an issue. Turns out 640K wasn't enough for anybody, and we've grown to accept that. We've moved on. But more to the point, we're talking about readability here. So let me start off by making a bold statement, followed by a clarifying disclaimer:

There is nothing inherently wrong with writing a function that's 1,000 lines long.

There is something inherently wrong with writing a confusing, difficult-to-trace function that's 1,000 lines long.

See the difference? Maybe your business process involves 1,000 steps in one atomic operation. Maybe that's just how much code it takes to accomplish the single task at hand. It happens. And as long as that code is easy to understand, it's OK. But when you start forcing your developers to keep their line count down, guess what happens? Their line size increases. Now you've done it, Mr. Developer Manager. You've made something terrible happen. Consider the following piece of code:

employees.ForEach(e =>
{
  var dataRow = SomeFunction(e);
  returnObj.Add(
    new EmployeeDTO{
      Name = dataRow["Name"],
      Email = dataRow["Email"],
      Department = dataRow["DepartmentName"],
      //... there are more fields
      // and some of them require other
      // function calls and deeper nests
    }
  );
});

And that's a relatively simple example. The arrowhead can get a lot deeper than that. But look at it. It's just one statement, a call to ForEach() on a generic list. Do you want that on one line in order to keep your precious line count down? Can you handle something like that on your conscience?

I know why you want your functions small. The article I referenced earlier even put forth the rule that closing braces shouldn't be beyond the same page view as the opening braces. There's good reason for that, in most cases. You should be able to see an entire code block without scrolling so you can grok the whole picture. I applaud that, it's a good thing to try to do. But if that suggestion causes you to re-factor your functions in unintuitive ways, then you didn't get the message.

What if another coding standard requires that all the functions in your class be kept in alphabetical order? Related to making intuitive variable names, you also want intuitive function names. So your atomic 1,000-line process is broken up into dozens of functions that perform dozens of pieces. And now tracing that process involves jumping around the code file like a monkey on crack, instead of following a simple step-by-step process in a single function.

3) Thou Shalt Comment Everything; // declaring that one should comment everything


The title of this section should be pretty self-explanatory. Comments are great, and they should be used often. The same can be said of oxygen. But, like oxygen, keeping it in moderation is in your best interests. If you write too many comments then you'll probably end up with a high noise-to-signal ratio. The average value of each comment drops, and the important ones more easily go unnoticed.

When you have a block of code that does something obvious, you probably don't need to describe it very much in comments. Taking a previous example, you have a function that gets employee names from a database. You know what it does from its name. It takes no input, it returns a list of strings as output. The function itself opens a database connection, selects names from an employee table into a variable, closes the connection and returns the variable. It doesn't do anything unexpected or unintuitive. So, please, don't waste your time commenting on every line in the function.

Does your function filter employees by some known employee status? Explain that in a comment. Does it perform a sort on the employees for a reason that's probably clear somewhere else in the code? Explain that in a comment, including where else it's important. Those are business needs that are not immediately obvious when reading the code. But your code that opens a connection to the database, runs a query and closes the connection... it's pretty obvious what that does and why it does it.

I remember a professor back in school who required that every line of code be described in a comment. It led to gems like this (but at least with better variable/function names, left out here for brevity):

for (int i = 0; i < foo.length; i++){ // begin a for loop
  DoSomething(foo[i]); // call DoSomething on foo[i]
  DoSomethingElse(foo[i]); // call DoSomethingElse on foo[i]
  if (i > SOME_CONSTANT){ // check if i is greater than SOME_CONSTANT

    break; // exit the for loop early
  } // end if

} // end for

So now you've written the same code twice. Once in C++, once in plain English. Congratulations. So what does it do? Why are you terminating the loop early based on a pre-defined constant? Commenting every line led to commenting each line. Individually. That stripped all context from the block of code as the developer was too busy commenting on the line of code instead of the block. A quick comment before the loop could have explained what we're doing and why that constant is important. Also, if those functions/variables are named properly and if that conditional isn't there, then none of it needs comments if it's doing something obvious.

Don't write more comments. Write better comments, keeping in mind that sometimes "better" is to not bother writing one at all.

4) Thou Shalt Create Bloated And Unmaintainable Config Files


Want to know a secret? It's OK to hard-code values. Not everything needs to be pulled out into a config file. If that statement caused you to cringe or in some way be taken aback, then please keep reading. This isn't just for you, this is for everybody who ever works on your code.

It's generally a good idea to put changeable values in config files so that they can be easily changed on the fly without re-building and re-deploying the software. Paths to network shares, connection strings, email recipient addresses, etc. These values might change, and they might need to be changed quickly and without re-testing and re-deploying everything. But, as is the general theme of this article, there is such a thing as taking it too far.

I once worked with a man who insisted that all literal values must come from a config file. Have you ever seen code that didn't have a single string or number literal in it? It just doesn't look right. It also led to a number of problems.

First, there was the question of how to pull strings from a config file without using strings to reference the config values. His approach was to have a single class of hundreds of static constants to hold the config value keys. This was the only "approved" place to put string literals, and they had to adhere to strict conventions.

Second, everything was a string. In his attempt to get strings out of his code, he introduced more strings into his code. There is a huge difference between 1 and "1" and that difference can be very important. You know it, I know it, the compiler knows it. Making all values enter the application as strings meant that he had to do tons of type checking and error trapping that he otherwise wouldn't have needed. It was a lot of work, and it was fragile.

Third, it was fragile. (Yes, that was worth saying in both paragraphs.) The code required that the massive config file be just right, or things could go horribly wrong. No matter how much type checking and error trapping you put in there, someone will find a way to enter a config value that will break your code. This is especially true when your config file is massive and a standard small percentage of human error becomes a very real number.

Users just don't need that much control over your software. They don't need to change the text of an error message or, God help you, the name of the stored procedure that you call. Little Bobby Tables might stop by to haunt you later. Let your users change things they understand. But if you don't expect them to be able to write the code themselves then you shouldn't expect them to manage every aspect of how that code behaves either. Separate your configured values from your hard-coded values with a little bit of common sense.

To give another example, I recently worked with a guy at my current job on a relatively simple project. I was writing some code that involved a case statement. Basically, it was dividing up code paths based on the type of a loan in some bank software. Loan type 430 is mortgage, loan types 1-699 (except 430) are commercial, loan types 700+ are consumer. Those are the three options. Any more options or any more granular differentiating among loan types will require code changes. A config value here isn't going to offer you any advantages.

Will the loan types ever change? Seriously, have you ever worked at a bank? Nothing changes. Ever. You may as well be etching that code into stone because it's going to outlast you and everything you hold dear. Should the users be able to configure this? Probably not. They might not even know the ID numbers, they might only know loan types by their names or some other distinguishing characteristic. And are there really that many different types of loans? My guess is that someone picked a 3-digit ID to cover future expansion and just broke it up by a perceived 70/30 split in the types of loans available. Then mortgages eventually broke some standard.

The developer with whom I was working saw my hard-coded values (== 430, <= 699, else) and said "the only constant is change." I get what you're trying to say, and I very much agree with the statement. But follow the spirit of the law rather than the letter of the law wherever possible. If anything about these values ever changes, isn't that going to require more code changes to accommodate that anyway? This is fairly core business logic. How much effort do you really want to spend re-factoring out into a config file values that probably aren't going to change.

Don't be afraid to hard-code values from time to time. It's just easier to support in many cases. Yes, there will be developers who curse your name for it. "The genius decided to hard-code it" will be a phrase spoken of your work from time to time. But come on, we all curse the names of previous developers from time to time. Nobody wants to work on somebody else's code.

"But," you may ask, "shouldn't we do what we can to avoid having to re-build and re-deploy our code? After all, deploying is such a hassle." Is it? Why? Building and deploying your code are the most automated steps in the entire process. Writing/debugging/testing is the hard stuff. If building and deploying in your environment are difficult and painful then your environment is broken. Stop breaking your code to match your environment and start fixing your environment. If change is indeed the only constant then we should be actively embracing it rather than grudgingly accepting it.

5) Thou Shalt Tightly Couple With Third Parties Who Don't Know Or Care About You


"Log4Net is nice, but our standard here is to use the Logging Application Block in Microsoft Enterprise Library." Well, dear company, aside from the fact that you're ignoring the reduced overhead and increased performance of the former in this particular case, your standard on the latter is broken. You did it wrong. The problem isn't with the library itself, it's with the fact that you were somehow convinced that you needed to pick one.

Consider, instead, the following statement: "Our standard here is to log to our ILogger interface, to write all info and above events to [some database or some message queue of some kind] and to email all error and above events to [some distribution list]." It took a little more thought, but it's a lot more of a standard than that statement in the previous paragraph.

In the first example, no real standard was created at all. You chose a library to use, but defined nothing about how it should be used. Some applications can write logs to a file on some server that nobody knows about, some can write to a database table somewhere, some can email the developer who wrote the code and nobody else, etc. That's not a standard. All that does is tightly couple your code with some third party library that you can't control or support. You've gained no benefits.

In the second example, you can still use your precious third party library. Nothing's stopping you. But, more importantly, you can use a different third party library if you ever want to. Or you can roll your own. Nothing's stopping you. Maybe you're not using dependency injection and aren't into the whole interfaces thing. That's fine, whatever. But the point here is that these "standards" on external dependencies should be abstracted wherever possible.

So don't let your company be so content in the fact that they "chose a standard" and that all of their applications adhere to that same standard. They haven't made anything easier to support in doing so. And don't you dare walk in there with a smug sense of open-source satisfaction and declare that the standard is now being switched to Log4Net. If that's the case, then you missed the point entirely.

OK, that was a long post with a good bit of ranting mixed in there. Let's sum up. Development standards can be a very good thing, but they have to be applied properly to the problems at hand. Deciding on the wrong standard and forcing it down the throats of your developers impedes productivity. Your developers are intelligent people who can solve problems in intelligent ways. Stop trying to make the entire process a matter of copying and pasting "approved" pieces of code in the right order.

You should choose your standards and your acceptable coding practices for your domain. But choose wisely.