Monday, October 31, 2011

The Various Flavors of Validation

I recently wrote a post about Microsoft's Data Annotations and how they can mislead one into believing that one's data is being validated when it really isn't. Essentially, what I'm against is relying on these annotations within the domain models for validation. All they really do is tell certain UI implementations (MVC3 web applications, for example) about their data validation. And they only do this if the UI implementation accepts it. The UI has to check the ModelState, etc. They're not really validating the data, they're just providing suggestions that other components may or may not heed.
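
To make that concrete, here's a quick sketch (the AnnotatedBand class is made up just for illustration) showing that an annotated object happily exists in an "invalid" state until some component explicitly runs the annotations against it, which is roughly all the MVC ModelState check does:

public class AnnotatedBand
{
  [Required]
  public string Name { get; set; }
}

// Elsewhere...
var band = new AnnotatedBand();
// The "required" Name is null, and nothing has complained.

// The annotations only do anything when somebody explicitly evaluates them,
// which is essentially what MVC model binding does to populate ModelState.
var results = new List<ValidationResult>();
var isValid = Validator.TryValidateObject(
    band, new ValidationContext(band, null, null), results, true);
// isValid is false, but only because we asked.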

This led to the question among colleagues... Where should the validation live? There is as much debate over this as there are tools and frameworks to assist in the endeavor. So my opinion on the matter is just that, an opinion. But as far as I'm concerned the validation should live everywhere. Each part of the system should internally maintain its own validation.

The idea is simple. There should be multiple points of validation because there are multiple reasons for validating the data. Will this lead to code duplication? In some cases, quite possibly. And I wholeheartedly agree that code duplication is a bad thing. But just because two lines of code do the same thing, are they really duplicated? It depends. If they do the same thing for the same reason and are generally in the same context, then yes. But if they coincidentally do the same thing but for entirely different reasons, then no. They are not duplicated.

Let's take a look at an example. My domain has a model:

public class Band
{
  private string _name;
  public string Name
  {
    get
    {
      return _name;
    }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("Name cannot be empty.");
      _name = value;
    }
  }

  private string _genre;
  public string Genre
  {
    get
    {
      return _genre;
    }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("Genre cannot be empty.");
      _genre = value;
    }
  }

  private Band() { }

  public Band(string name, string genre)
  {
    Name = name;
    Genre = genre;
  }
}

As before, this model is internally guaranteeing its state. There are no annotations on bare auto-properties to suggest that a property shouldn't be null. There is actual code preventing a null or empty value from being used in the creation of the model. The model can't exist in an invalid state. Data validation exists here.
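
Just to drive that home, here's a quick sketch of what consuming code sees (the values are made up, of course):

// Valid input: the model is created and guaranteed to be in a valid state.
var band = new Band("Rush", "Progressive Rock");

// Invalid input: the model never comes into existence at all.
// This throws an ArgumentException ("Name cannot be empty.") on the spot.
var broken = new Band(string.Empty, "Rock");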

Now, it stands to reason that the UI shouldn't rely solely on this validation. Why? Because it would make for a terrible user experience. Any invalid input would result in an exception that the application has to handle. Even if the exception is handled well and the user is presented with a helpful message, the user would still see only the first error on any given attempt. Trying to submit a large form with lots of invalid fields would become an irritating trial-and-error process.

So we need to do more validation. The model is validating the business logic. But the UI also needs to validate the input. It's roughly the same thing, but it's for an entirely different reason. The UI doesn't care about business logic, and the domain doesn't care about UX. So even though both systems need to "make sure the Name isn't empty" they're doing it for entirely different reasons.

Naturally, because I don't want tight coupling and because my UI and my domain should have the freedom to change independently of one another, I don't want my UI to be bound to the domain models. So for my MVC3 UI I'll go ahead and create a view model:

public class BandViewModel
{
  [Required]
  public string Name { get; set; }
        
  [Required]
  public string Genre { get; set; }
}

There's a lot more validation that can (and probably should) be done, but you get the idea. This is where these data annotations can be useful, as long as the developers working on the MVC3 application are all aware of the standard being used. (And it's a pretty common standard, so it's not a lot to expect.) So the controller can make use of the annotations accordingly:

public class BandController : Controller
{
  // "repository" is assumed to be a band repository available to the controller
  // (e.g. provided via constructor injection); its declaration is omitted here.

  public ActionResult Index()
  {
    return View(
        repository.Get()
                  .Select(b => new BandViewModel {
                      Name = b.Name,
                      Genre = b.Genre
                  })
    );
  }

  [HttpGet]
  public ActionResult Edit(string name)
  {
    if (ModelState.IsValid)
    {
      var band = repository.Get(name);
      return View(
        new BandViewModel
        {
          Name = band.Name,
          Genre = band.Genre
        }
      );
    }
    return View();
  }

  [HttpPost]
  public ActionResult Edit(BandViewModel band)
  {
    if (ModelState.IsValid)
    {
      repository.Save(
        new Domain.Band(band.Name, band.Genre)
      );
      return RedirectToAction("Index");
    }

    return View(band);
  }
}

The data annotations work fine on the view models because they're a UI concern. The controller makes use of them because the controller is a UI concern. (Well, ok, the View is the UI concern within the context of the MVC3 project. The entire project, however, is a UI concern within the context of the domain. It exists to handle interactions with users to the domain API.)

So the validation now exists in two places. But for two entirely different reasons. Additionally, the database is going to have validation on it as well. The column which stores the Name value is probably not going to allow NULL values. Indeed, it may go further and have a UNIQUE constraint on that column. This is data validation, too. And it's already happening in systems all over the place. So the "duplication" is already there. Makes sense, right? You wouldn't rely entirely on the UI input validation for your entire system and just store the values in unadorned columns in a flat table, would you? Of course not.

We're up to three places validating data now. The application validates the input and interactions with the system, the domain models validate the business logic and ensure correctness therein, and the database (and potentially data access layer, since at the very least each method should check its inputs) validates the integrity of the data at rest.
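
As a rough sketch of that last point, a repository method might guard its inputs like this (the repository class here is purely illustrative):

public class BandRepository
{
  public void Save(Band band)
  {
    // The data access layer isn't re-checking business rules; it's simply
    // refusing to work with input it can't meaningfully persist.
    if (band == null)
      throw new ArgumentNullException("band");

    // ...map the model onto the data store and persist it...
  }
}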

Well, since this is a web application, we have the potential for a fourth place to validate data: the client-side UI. It's not necessary, of course. It's just a matter of design preference. Weigh the cost of post-backs against the cost of maintaining a little more code, determine your bandwidth constraints, etc. But if you end up wanting that little extra UX goodness to assist the user in filling out your form before posting it, then you can add more validation:

@model BandViewModel
<body>
  @Html.ValidationSummary("Please fix your errors.")
  <div>
    @using (Html.BeginForm("Edit", "Band"))
    {
      <fieldset>
        Name: @Html.TextBox("Name", Model.Name)<br />
        Genre: @Html.TextBox("Genre", Model.Genre)
      </fieldset>
      <button type="submit">Save</button>
    }
  </div>
  <script type="text/javascript">
    $(document).ready(function () {
      $('#Name').blur(function () {
        if ($(this).val() == '') {
          $(this).css('background', '#ffeeee');
        }
      });
      $('#Genre').blur(function () {
        if ($(this).val() == '') {
          $(this).css('background', '#ffeeee');
        }
      });
    });
  </script>
</body>

(You'll want prettier forms and friendlier validation, of course. But you get the idea.)

For a simple form like this it's not necessary, but for a larger and more interactive application it's often really nice to have client-side code to assist the user in interacting with the application. But once again, this is a duplication of logic. As before, however, it's duplicating the logic for an entirely different purpose. A purpose outside the scope of the other places where the logic exists. The end result is the same, but the contextual meaning is different.

So where does validation live?
  • In the domain models - Validate the state of the models at every step.  Don't allow invalid models to be created and then rely on some .IsValid() construct.  Prevent invalid state from ever existing on the data in motion.
  • In the database - Validate the integrity of the data at rest.  Don't allow invalid data to be persisted.  One application may be reliable enough to provide only valid data, but others may not.  And admins have a habit of directly editing data.  Make sure the data model rigidly maintains the integrity of the data.
  • In the application - Validate the user's input into the system.  Present friendly error messages, of course.  But this is the first line of defense for the rest of the system.  Armed with the knowledge that the business logic won't tolerate invalid data, make sure it's valid before even consulting with the business logic.
  • In the client-side UI - Provide data validation cues to assist the user and maintain a friendly UX.

That's a lot of validation. But notice how each one performs a distinctly different function for the overall system. It's repeating the same concept, but in a different context.

Now what happens if something needs to change? Well, then it'll need to be changed anywhere it's applicable. Is that duplicated effort? Again, it depends. Maybe all you need to change is the UI validation for some different visual cues. Maybe you need to make a more strict or less strict application without actually changing the business logic. Maybe you need to change some validation in the database because you're changing the data model and the new model needs to enforce the business logic differently.

If you change the core business logic then, yes, you will likely have to make changes to the other parts of the system. This isn't the end of the world. And it makes sense, doesn't it? If you change the validation logic in the domain models then you've changed the shape of your models. You've changed your fundamental business concepts in your domain specific language. So naturally you'll want to check the database implementation to make sure it can still persist them properly (and look for any data migration that needs to happen on existing data), and you'll want to check your applications and interfaces to make sure they're interacting with the models correctly (and look for any cases where the logical change presents a drastic UI change since users may need to be warned and/or re-trained).

It's not a lot of code. And it's not a lot to remember. There is a feeling of duplication which we as developers find distasteful, but it's keeping the concerns separated and atomic in their own implementations. Besides, if you miss one then your automated tests will catch it quickly. Won't they?
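
For what it's worth, the domain rules are the easiest ones to pin down with a test. A minimal sketch, assuming NUnit and the Band model from above:

[TestFixture]
public class BandTests
{
  [Test]
  public void Cannot_create_a_band_with_an_empty_name()
  {
    Assert.Throws<ArgumentException>(() => new Band("", "Rock"));
  }

  [Test]
  public void Cannot_create_a_band_with_an_empty_genre()
  {
    Assert.Throws<ArgumentException>(() => new Band("Rush", ""));
  }
}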

Monday, October 17, 2011

Startup Weekend

This weekend I was at Startup Weekend in Boston, and words fail me in an attempt to describe how awesome (and exhausting) the event was.  I'm definitely glad I went, and will be attending these events in the future.  The entrepreneurial spirit and drive throughout the weekend was inspiring, not to mention the friends and business contacts made while working together on a project like that.

The idea behind the weekend is simple.  People pitch their start-up business ideas the first night, everybody votes on which ones they want to implement, and the top projects (I think it was 16 of them) recruit teams among the rest of the attendees to develop the business plan, prototype, etc. throughout the weekend.  There was a wide variety of ideas being pitched, so there was easily something for everyone.

Being a web developer by trade, I was specifically looking for something where I could flex those particular muscles.  There was no shortage of mobile phone apps being developed, and maybe I'll do one of those next time.  But I wanted to stick to a web application this time so I chose a team in that space.  (I'm going to hold off on marketing the end result here until we've vetted the system a little more.  I'm not entirely sure why, and I don't have the wherewithal to explain it until the caffeine sets in, so I'm just going to go with my gut for now.)

The really interesting part for me was that I ended up not really doing a lot of development.  I knew that was going to be a challenge going into the event because I don't really do start-up work.  I'm not a cowboy who throws code at something to rapidly build a prototype.  I'm a methodical enterprise developer.  A whole product in a weekend?  It takes us at least a week just to talk about building a product, then a few more weeks for requirements and design.

So I ended up in a role where I sort of oversaw the development, making decisions about the technology so the developers could keep moving forward, and trying to drive it all toward the business goal.  When I didn't have a hands-on development task (the occasional jQuery widget to fit into the designer's vision, for example), I helped with the business development and product envisioning.

It was kind of weird, really.  Not heads-down coding?  Standing around the designer's desk and helping the group envision their product presentation?  Helping to film a commercial?  These are not my normal tasks.  But, honestly, it worked.  I like to think I was pretty good at it, and I definitely enjoyed it.  Don't get me wrong, I wanted to code.  But the goal wasn't to have a fully-functional website by the end of the weekend.  (Although some of the other teams seem to have managed that, or something close to it, but those were different products with different business drivers.)  The goal was to make the final pitch at the end of the event.

So I imagine I'll be doing a lot more development as we continue with this endeavor.  (That's right, we weren't playing around.  Nobody there was just toying with the notion.  These people are really building start-ups and I really want to be a part of that.)  Though there's still much to be discussed.  I mean, what we did write was in PHP.  I don't know if I want to do that again.  We'll see how things go once the team has decompressed from the weekend and re-grouped, perhaps next weekend.

We're not building "the next Facebook" or anything of that sort.  But the team leader had an idea and pitched the idea and we've come together to make it happen.  Whether or not it works is for the market to decide.

Wednesday, October 12, 2011

LINQ2GEDCOM

I haven't done much with that gNealogy project in a while, mostly because it's in a state where the next step is visualizations of the data and I'm just not much of a UI guy in that regard. Maybe a flash of inspiration will hit me someday and I'll try something there. But for now I'm content to wait a little longer until Adrian has some spare clock cycles to make some pretty visualizations for the project.

One thing never quite sat right with me, though. I'd encapsulated the data access behind some repositories to hide the mess, but it was still a mess. (And still is, until I replace it with my new thing.) I wanted to make something better. Something more streamlined. And, after reading C# in Depth (almost done with it), something more elegant and idiomatic of the language.

So I've set out to create a LINQ data context for GEDCOM files. Much of the data context and IQueryable stuff came from a tutorial I found online. There are several of them, but this one seemed just right for what I was doing. I've extended the functionality to add more entity types, made the tutorial-driven stuff generic for those types, etc.
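
To give a rough idea of the shape I'm going for, querying a GEDCOM file ends up looking something like this (the type and property names here are purely illustrative, not necessarily what's in the repository right now):

// Illustrative only; the real API in LINQ2GEDCOM may differ.
using (var context = new GedcomContext(@"C:\Temp\family.ged"))
{
  var smiths = from individual in context.Individuals
               where individual.LastName == "Smith"
               select individual;

  foreach (var person in smiths)
    Console.WriteLine(person.FirstName + " " + person.LastName);
}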

All in all, I'm really happy with where this project is going so far. There's still a lot to do, and there may still be better ways to do a lot of it. But what it supports already in terms of file data and interacting with the data source (including writing changes back to the file, something that gNealogy never had) is already pretty cool. There's a bit of clean-up left to do in the wake of new features, especially after today's changes, so it's not production-ready by any means. But the main thing is... it's a lot of fun.

So this leaves the current state of my GitHub repositories as:
  • LINQ2GEDCOM - Currently in active development.  Lots of features to add.  My favorite work yet.
  • gNealogy - Not actively being developed right now, but still good.  I need to add the use of LINQ2GEDCOM once that's ready, which would make this one a lot simpler.  I'd also like to change around some of the overall structure here.  The main thing is, it's ready for visualization development if anybody wants to do that.
  • FormGuard - That simple little jQuery plugin I wrote a while back.  Nothing special, certainly nothing to write home about, just a quick proof of concept for myself in terms of writing a jQuery plugin.  Hopefully a stepping stone to more as the need arises.
  • MarkovSmirnoff - A silly little Markov Chain text generator I wrote because Trevor and I wanted to generate random text for some stuff.  It's not elegant, it's not particularly great, it's just fun to play with.
  • CommonUtilities - Nothing special here at all.  This is mostly a placeholder for a personal utility library I'd like to keep around.  I put some of my older stuff in here, but the implementations need to be improved quite a bit.  There's lots to be added here.
And just when I thought that was enough, Trevor had an awesome idea for a game that I want to implement in HTML5/JavaScript.  Hopefully we can get that project off the ground so I can share it here as it grows.  But it's the kind of project that's going to take a lot of planning and design work before any actual coding can begin, so it'll probably be a while.

I love it when a plan comes together.

Sunday, October 9, 2011

Boston Application Security Conference

Yesterday I attended BASC 2011 and had a pretty good time. There were some particularly interesting talks and demos, and I'd be lying if I said I didn't learn a thing or two about web and mobile application security. I'm far from an expert, of course, but it's always good to immerse myself in some knowledge from time to time in the hopes that some useful bits will stick.

In the effort to retain some of it, I figured I'd write a short post on what I experienced while I was there...

8:30 - Breakfast

Sponsored by Rapid7. Good selection of bagels and spreads, lots to drink. Good stuff.

9:00 - Keynote session by Rob Cheyne.

Great stuff. Not only was he an engaging speaker, but he had a lot of good insights on information security in the enterprise. Not just from a technical perspective, but more importantly from a corporate mindset perspective. (On a side note, he looked a lot like the bad guy from Hackers. Kind of funny.)

My favorite part was a story he told about groupthink. There was an experiment conducted a while back on primate social behavior involving 5 monkeys in a closed cage. (Not a small cage or anything like that, there was plenty of room.) In the center of the cage was a ladder and at the top was a banana. Naturally, the monkeys tried to climb the ladder to retrieve it. But any time one of them was just about to reach the banana, they were all sprayed with ice cold water. So very quickly the monkeys learned that trying to get the banana was bad.

One of the monkeys was removed and replaced with a new monkey. From this point on the water was never used again. Now, the new monkey naturally tried to reach the banana. But as soon as he began trying, the other monkeys started beating on him to stop him. He had no idea what was going on, but they had learned that if anybody went for the banana then they'd all be sprayed. The new monkey never knew about the water, but he did quickly learn that reaching for the banana was bad.

One by one, each monkey was replaced. And each time the results were the same. The new monkey would try to get the banana and the others would stop him. This continued until every monkey had been replaced. So, in the end, there was a cage full of monkeys who knew that trying to get the banana was bad. But none of them knew why. None of them had experienced the cold water. All they knew was that "this is just how things are done around here." (Reminds me of the groupthink at a financial institution for which I used to work. They've since gone bankrupt and no longer exist as a company.)

It's a great story for any corporate environment, really. How many places have you worked where that story can apply? From an information security perspective it's particularly applicable. Most enterprises approach security in a very reactive manner. They get burned by something once, they create some policy or procedure to address that one thing, and then they go on living with that. Even if that one thing isn't a threat anymore, even if they didn't properly address it in the first place, even if the threat went away on its own but they somehow believe that their actions prevent it... The new way of doing things gets baked into the corporate culture and becomes "just the way we do things here."

A number of great quotes and one-liners came from audience participation as well. One of the attendees said, "Security is an illusion, risk is real." I chimed in with, "The system is only as secure as the actions of its users." All in all it was a great talk. If for no other reason than to get in the right mindset, enterprise leaders (both technical and non-technical) should listen to this guy from time to time.

10:00 - "Reversing Web Applications" with Andrew Wilson from Trustwave SpiderLabs.

Pretty good. He talked about information gathering for web applications, reverse engineering to discern compromising information, etc. Not as engaging, but actually filled with useful content. I learned a few things here that seem obvious in terms of web application security in hindsight, but sometimes we just need someone to point out the obvious for us.

11:00 - "The Perils of JavaScript APIs" with Ming Chow from the Tufts University Department of Computer Science.

This one was... an interesting grab bag of misinformation. I don't know if it was a language barrier, though the guy seemed to have a pretty good command of English. But the information he was conveying was in many cases misleading and in some cases downright incorrect.

For example, at one point he was talking about web workers in JavaScript. On his PowerPoint slide he had some indication of the restriction that web workers will only run if the page is served over HTTP. That is, if you just open the file locally with "file://" then they won't run. Seems fair enough. But he said it as "web workers won't run on the local file system, they have to be run from the server." Somebody in the group asked, "Wait, do they really run on the server? That is, does the page actually kick off a server task for this? It doesn't run on the local computer?" He responded with "Um, well, I don't really know. But I do know that it won't run from the local file system, it runs from the server." Misleading at best, downright incorrect at worst.

He spent his time talking about how powerful JavaScript has become and that now with the introduction of HTML5 the in-browser experience is more powerful than ever. He said that it's come a long way from the days of simple little JavaScript animations and browser tricks and now we can have entire applications running just in the browser. During all of this, he kept indicating that it's become "too powerful" and that it introduces too many risks. Basically he was saying that client-side code can now become very dangerous because of these capabilities and if the application is client-side then anybody can hack into it.

At one point he said, "Now we can't trust data from the client." Now? Seriously? We've never been able to trust data from the client. This is nothing new. Is this guy saying that he's never felt a need to validate user input before? That's a little scary. Most of the insights he had on the state of JavaScript were insights from a couple years ago. Most of his opinions were misled. Most of his information was incorrect. I shudder to think what he's doing to his students at Tufts.

(By the way, the JavaScript image slideshow on the Tufts University Department of Computer Science website is currently broken at the time of this writing, at least in Safari. Loading the images blocks the loading of the rest of the page until they're complete; the images cycle through very rapidly (within about a second total) on the initial load, and then they no longer cycle at all. I wonder if this guy wrote it.)

12:00 - Lunch

Sponsored by Source. Sandwiches and wraps. Tons of them. Not bad, pretty standard. Good selection, which is important. And there were plenty left for people to snack on throughout the rest of the day.

1:00 - "OWASP Mobile Top 10 Risks" with Zach Lanier of Intrepidus Group.

Good speaker, good information. The talk basically covered the proposed OWASP top ten list of mobile security threats, which people are encouraged to evaluate and propose changes to. He explained the risks and the reasons behind them very well. I don't know much about mobile development, but this list is exactly the kind of thing that should be posted on the wall next to any mobile developer. Any time you write code that's going to run on a mobile platform, refer back to the list and make sure you're not making a mistake that's been made before.

2:00 - "Don't Be a Victim" with Jack Daniel from Tenable.

This man has an epic beard. So, you know, that's pretty awesome. He's also a fun speaker and makes the audience laugh and all that good stuff. But I don't know, I didn't really like this presentation. It was lacking any real content. To his credit, he warned us about this at the beginning. He told us the guy in the other room has more content and that this talk is going to be very light-hearted and fun. I understand that, I really do. But I think there should at least be some content. Something useful. Otherwise don't bother scheduling a room for this, just make it something off to the side for people to take the occasional break.

The whole talk was through metaphor. I can see what he was trying to get at, but he never wrapped it up. He never brought the metaphor back to something useful. Imagine if Mr. Miyagi never taught Daniel how to fight, he just kept having him paint fences and wax cars. The metaphor still holds, and maybe Daniel will one day understand it, but the lesson kind of sucks. The premise was stretched out to the point of being razor-thin. The entire hour felt like an introduction to an actual presentation.

It was mostly a slideshow of pictures he found on the internet, stories about his non-technical exploits (catching snakes as a kid in Texas, crap like that), references to geek humor, and the occasional reference to the fact that he was wearing a Star Trek uniform shirt during the presentation. Was he just showing off his general knowledge and his geekiness?

Don't get me wrong. He seemed like a great guy. He seemed fun. I'm sure he knows his stuff. I'm sure he has plenty of stories about how he used to wear an onion on his belt. But this seemed like a waste of time.

3:00 - "Binary Instrumentation of Programs" with Robert Cohn of Intel.

This was one of the coolest things I've ever seen. He was demonstrating for us the use of a tool he wrote called Pin which basically edits the instructions of running binaries. He didn't write it for security purposes, but its implications in that field are pretty far-reaching. (Its implications in aspect-oriented programming also came up, which is certainly of more interest to me. Though this is a bit more machine-level than my clients would care to muck with.)

A lot of the talk was over my head when talking about binaries, instruction sets (the guy is a senior engineer at Intel, I'm sure he knows CPU instruction sets like the back of his hand), and so on. But when he was showing some C++ code that uses Pin to inject instructions into running applications, that's where I was captivated. Take any existing program, re-define system calls (like malloc), add pre- and post-instruction commands, etc. Seriously bad-ass.

Like I said, the material doesn't entirely resonate with me. It's a lot closer to the metal than I generally work. But it was definitely impressive and at the very least showed me that even a compiled binary isn't entirely safe. Instructions can be placed in memory from anything. (Granted, I knew this of course, but you'd be surprised how many times a client will think otherwise. That once something is compiled it's effectively sealed and unreadable. This talk makes a great example and demonstration against that kind of thinking.)

4:00 - "Google & Search Hacking" with Francis Brown of Stach & Liu.

Wow. Just... wow. Great speaker, phenomenal material. Most of the time he was mucking around in a tool called Search Diggity. We've all seen Google hacking before, but not like this. In what can almost be described as "MetaSploit style" they aggregated all the useful tools of Google hacking and Bing hacking into one convenient package. And "convenient" doesn't even begin to describe it.

The first thing he demonstrated was hacking into someone's Amazon EC2 instance. In under 20 seconds. He searched for a specific regular expression via Google Code, which found tons of hits. Picking a random one, he was given a publicly available piece of source code (from Google Code or Codeplex or GitHub, etc.) which contained hard-coded authentication values for someone's Amazon EC2 instance. He then logged into their EC2 instance and started looking through their files. One of the files contained authentication information for every administrative user in that company.

Seriously, I can't make this stuff up. The crowd was in awe as he jumped around the internet randomly breaking into things that are just left wide open through sloppy coding practices. People kept asking if this is live, they just couldn't believe it.

One of the questions was kind of a moral one. Someone asked why he would help create something like this when its only use is for exploitation. He covered that very well. That's not its only use. The link above to the tool can also be used to find their "defense" tools, which use the same concepts. Together they provide a serious set of tools for someone to test their own domains, monitor the entire internet for exploits to their domains (for example, if an employee or a contractor leaks authentication information, this would find it), monitor the entire internet for other leaked sensitive data, etc. By the end of the talk he was showing us an app on his iPhone which watches a filtered feed that their tool produces by monitoring Google/Bing searches and maintaining a database of every exploit it can find on the entire internet. Filter it for the domains you own and you've got an IT manager's dream app.

Another great thing about this tool is that it doesn't just look for direct means of attack. Accidental leaks are far more common and more of a problem, and this finds them. He gave one example of a Fortune-100 company that had a gaping security hole that may otherwise have gone unnoticed. The company owned tons of domains within a large IP range, and he was monitoring it. One of the sites he found via Google for that IP range stuck out like a sore thumb. It was nothing business-related, but instead a personal site for a high school reunion for some class from the 70s.

Apparently the CEO of this company (I think he said it was the CEO, it was definitely one of the top execs) was using the company infrastructure to host a page for his high school reunion. Who would have ever noticed that it was in the same IP range as the rest of the company's domains? Google notices, if you report on the data in the right way. Well, apparently this site had a SQL injection vulnerability (found by a Google search for SQL error message text indexed on that site). So, from this tiny unnoticeable little website, this guy was able to exploit that vulnerability and gain domain admin access to the core infrastructure of a Fortune-100 company. (Naturally he reported this and they fixed the problem.)

The demo was incredible. The tools demonstrated were incredible. The information available on the internet is downright frightening. Usually by this time in the day at an event like this people are just hanging around for the raffles and giveaways and are tuning out anything else. This presentation was the perfect way to end the day. It re-vitalized everyone's interest in why we were all at this event in the first place. It got people excited. Everybody should see this presentation. Technical, non-technical, business executives, home users, everybody.

5:00 - Social Time

Sponsored by Safelight. Every attendee got two free beers (or wine, or various soda beverages) while we continued to finish the leftovers from lunch. And not crappy beers either. A small but interesting assortment of decent stuff, including Wachusett Blueberry Ale, which tastes like blueberry pancake syrup but not as heavy.

5:30 - Wrap-Up

OWASP raffled off some random swag, which is always cool. One of the sponsors raffled off an iPad, which is definitely the highlight of the giveaways. For some reason, though, the woman who won it seemed thoroughly unenthused. What the hell? If she doesn't want it, I'll take it. My kids would love an iPad.

6:00 - Expert Panel

Admittedly, I didn't stay for this. I was tired and I wanted to get out of there before everybody was trying to use the bathroom, the elevators, and the one machine available for paying for garage parking. So I left.

All in all, a great little conference. I'm definitely going to this group's future events, and I'd love to work with my employer on developing a strategic focus on application security in our client projects. (Again, I'm no expert. But I can at least try to sell them on the need to hire experts and present it to clients as an additional feature, which also means more billable hours. So it's win-win.)

One thing I couldn't help but notice throughout the event was a constant series of issues with the projectors. This is my third or fourth conference held at a Microsoft office and this seems to be a running theme. It always takes a few minutes and some tweaking to get projectors in Microsoft offices to work with Windows laptops. Someday (hopefully within a year or two) I'm going to be speaking at one of these local conferences (maybe Code Camp or something of that nature), and I'm going to use my Mac. Or my iPad. Or my iPhone. And it's going to work flawlessly. (Note that one person was using a Mac, and the projector worked fine, but he was using Microsoft PowerPoint on the Mac and that kept failing. I'll be using Keynote, thank you very much.)

Tuesday, October 4, 2011

Disposable Resources in Closures

I've been reading Jon Skeet's C# in Depth and the chapter on delegates left me with an interesting question. (Maybe he answered this question and I missed it, it's kind of a lot to take in at times and I'm sure I'll go back through the book again afterward.)

Within the closure of a delegate, variables can be captured and references to them retained within the scope of the delegate. This can lead to some interesting behavior. Here's an example:

class Program
{
  static void Main(string[] args)
  {
    var del = CreateDelegate();
    del();
    del();
  }

  static MethodInvoker CreateDelegate()
  {
    var x = 0;
    MethodInvoker ret = delegate
    {
      x++;
    };
    return ret;
  }
}

Stepping through this code and calling "x" in the command window demonstrates what's happening here. When you're in Main(), "x" doesn't exist. The command window returns an error saying that it can't find it. But with each call to the delegate, internally there is a reference to a single "x" which increments each time (thus ending with a value of 2 here).

So the closure has captured a reference to "x" and that data remains on the heap so that the delegate can use it later, even though there's no reference to it in scope when the delegate isn't being called. Pretty cool. But what happens if that reference is something more, well, IDisposable? Something like this:

class Program
{
  static void Main(string[] args)
  {
    var del = CreateDelegate();
    del();
    del();
  }

  static MethodInvoker CreateDelegate()
  {
    var x = 0;
    using (var txt = File.OpenWrite(@"C:\Temp\temp.txt"))
    {
      MethodInvoker ret = delegate
      {
        x++;
        txt.Write(Encoding.ASCII.GetBytes(x.ToString()), 0, x.ToString().Length);
      };
      return ret;
    }
  }
}

This looks a little scary, given the behavior we've already seen. Is that "using" block going to dispose of the file handle? When will it dispose of it? Will that reference continue to exist on the heap when it's not in scope?

Testing this produces an interesting result. The reference to "x" continues to work as it did previously, and the reference to "txt" also seems to be maintained. But the file handle is no longer open. It appears that when the CreateDelegate() method returns, that "using" block does properly dispose of the resource. The reference still exists, but the file is now closed, and attempting to write to it when the delegate is first called results in the expected exception.

So let's try something a little messier:

class Program
{
  static void Main(string[] args)
  {
    var del = CreateDelegate();
    del();
    del();
  }

  static MethodInvoker CreateDelegate()
  {
    var x = 0;
    var txt = File.OpenWrite(@"C:\Temp\temp.txt");
    MethodInvoker ret = delegate
    {
      x++;
      txt.Write(Encoding.ASCII.GetBytes(x.ToString()), 0, x.ToString().Length);
    };
    return ret;
  }
}

Now we're not disposing of the file handle. Given the previous results, this outcome is no surprise. Once the delegate is created, the file handle is open and is left open. (While stepping through in the debugger I'd go back out to the file system and try to rename the file, and indeed it wouldn't let me because the file was in use by another process. It wouldn't even let me open it in Notepad.) Each call to the delegate successfully writes to the file.

It's worth noting that the file handle was properly disposed by the framework when the application terminated. But what if this process doesn't terminate in an expected way? What if this is a web app or a Windows service? That file handle can get pretty ugly. It's worth testing those scenarios at a later time, but for now let's just look at what happens when the system fails in some way:

class Program
{
  static void Main(string[] args)
  {
    var del = CreateDelegate();
    del();
    del();
    Environment.FailFast("testing");
  }

  static MethodInvoker CreateDelegate()
  {
    var x = 0;
    var txt = File.OpenWrite(@"C:\Temp\temp.txt");
    MethodInvoker ret = delegate
    {
      x++;
      txt.Write(Encoding.ASCII.GetBytes(x.ToString()), 0, x.ToString().Length);
    };
    return ret;
  }
}

The behavior is the same, including the release of the file handle after the application terminates (which actually surprised me a little, but I'm glad it happened), except for one small difference. No text was written to the file this time. It would appear that the captured file handle in this case doesn't flush the buffer until either it's disposed or some other event causes it to flush. Indeed, this was observed in Windows Explorer as I noticed that the file continued to be 0 bytes in size while the delegate was being called. In this last test, it stayed 0 bytes because it was never written to. In the previous test, it went directly from 0 bytes to 2 bytes when the application exited.
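
One small tweak to CreateDelegate() suggests that buffering really is the culprit here. If the delegate flushes the stream after each write, the bytes should make it out to the file as soon as the delegate is called, even though the handle still isn't properly disposed:

static MethodInvoker CreateDelegate()
{
  var x = 0;
  var txt = File.OpenWrite(@"C:\Temp\temp.txt");
  MethodInvoker ret = delegate
  {
    x++;
    var bytes = Encoding.ASCII.GetBytes(x.ToString());
    txt.Write(bytes, 0, bytes.Length);
    txt.Flush();  // push the buffered bytes out to the OS right away
  };
  return ret;
}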

I wonder if anybody has ever fallen into a trap like this in production code. I certainly hope not, but I guess it's possible. Imagine a home-grown logger which just holds a delegate to a logging function that writes to a file. That log data (and, indeed, all log data leading up to it) will be lost if the underlying stream is never properly flushed or disposed. And it may not be entirely intuitive to developers working in the system unless they really take a look at that logging code (which they shouldn't have to do, it should just work).
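
To make that scenario concrete, here's a sketch of the kind of home-grown logger I mean (the names are made up). Every call writes into the captured stream's buffer, and if nothing ever flushes or disposes that stream before the process dies, those messages never reach the disk:

public static class HomegrownLogger
{
  // The captured FileStream lives as long as the delegate does.
  private static readonly Action<string> _log = CreateLogger();

  private static Action<string> CreateLogger()
  {
    var stream = File.OpenWrite(@"C:\Temp\app.log");
    return message =>
    {
      var bytes = Encoding.ASCII.GetBytes(message + Environment.NewLine);
      stream.Write(bytes, 0, bytes.Length);
      // Nothing here flushes or disposes the stream, so a crash (or a
      // FailFast) loses everything still sitting in its buffer.
    };
  }

  public static void Log(string message)
  {
    _log(message);
  }
}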

I kind of want to come across something like this in a legacy application someday, if for no other reason than to see what the real-world fallout would look like.

Data Annotations and Mixing Concerns

I'm sure you can tell that I'm a huge proponent of separation of concerns. Modular components shouldn't mix, they should be properly encapsulated and be concerned only with what they're supposed to do, not what other components are supposed to do. Many times, this puts me at odds with the various tooling support that's out there. This is especially true here in the Microsoft stack where I comfortably live.

I came across an example yesterday that got me thinking about these tools and how they mix one's concerns. Here's the code that I saw:

public class Application
{
  [Required]
  [Display(Name = "First Name")]
  public string ApplicantFirstName { get; set; }

  [Required]
  [Display(Name = "Last Name")]
  public string ApplicantLastName { get; set; }

  [Display(Name = "Birth Date")]
  public DateTime ApplicantBirthDate { get; set; }

  [Required]
  [Display(Name = "Cellphone Number")]
  public string ApplicantCellphoneNumber { get; set; }

  [Display(Name = "Postal Address")]
  public string PostalNumber { get; set; }

  [Display(Name = "Suburb")]
  public string ApplicantSuburb { get; set; }

  [Display(Name = "City")]
  public string ApplicantCity { get; set; }

  [Display(Name = "Post Code")]
  public string ApplicationPostalCode { get; set; }

  [Required]
  [Display(Name = "Email Address")]
  public string ApplicantEmailAddress { get; set; }
}

It's pretty simple, just a POCO with some basic fields on it. But what immediately bothered me was all those annotations on the properties. At first I just thought, "Those really pollute the model and make it look messy." But I couldn't just accept that thought. I needed to quantify my opinion in a more concrete way.

Let's look at the first attribute that we see:

[Required]

What does this do? Well, as far as I can tell, it instructs the UI tooling to make this a "required" field in whatever form is being generated for this model. But is it really required? I guess that depends on where you think the validation lives. In this case, the validation happens only in the UI, yet that UI concern is baked into the model. Not ideal, if you ask me. And since the validation is only in the UI, this is perfectly legal:

var app = new Application();
// The "required" field currently has no value
app.ApplicantFirstName = string.Empty;
// The "required" field still has no value

Not very "required" if you ask me. Sure, the UI has some automated magic in place if you use some specific tools, but what if you don't use those tools? What if you have different UIs? One of them may be enforcing this (maybe), but others aren't.

It occurs to me then that this sort of thing is handy for very simple applications. Applications where you don't have different layers interacting across different enterprise levels. Only one application will ever use this model, it will always use the same tools and technologies, no other application will ever share its database, etc. But, honestly, isn't that a very limiting attitude? Have we never seen a situation where someone says "this will always be true" and then a new requirement comes in which breaks that assumption?

So this annotation leaves us with a polluted model. But more to the point, it leaves us with a false sense of validation. That field isn't actually required. It just says it is. The model should be the source of truth for enforcing business requirements, and this model isn't doing that. How about something like this instead?

public class Application
{
  private string _applicantFirstName;
  public string ApplicantFirstName
  {
    get { return _applicantFirstName; }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("First Name is a required field");
      _applicantFirstName = value;
    }
  }

  private Application() { }

  public Application(string firstName)
  {
    if (string.IsNullOrWhiteSpace(firstName))
      throw new ArgumentException("First Name is a required field");
    ApplicantFirstName = firstName;
  }
}

Now that's a required field. Was that so difficult to write? Not really. Is it difficult to understand? I don't think so. Sure, it's significantly more code than the data annotation version. But it works. It works everywhere. No matter what code uses this model, no matter what tooling generates the UI, no matter what you do this model requires that field. It can't exist without it. The business logic is strictly enforced in the business layer, not sometimes casually suggested to some UI layers.
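
A quick sketch of what that buys you:

// There is simply no way to get an Application without a first name.
var app = new Application(null);           // throws ArgumentException
var app2 = new Application("   ");         // throws ArgumentException
var app3 = new Application("Jane");        // fine, and guaranteed valid

// And it stays valid. This throws instead of quietly blanking the field.
app3.ApplicantFirstName = string.Empty;    // throws ArgumentException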

(Sure, you'll also want to write more validation logic in the UI to make for a better user experience. Just getting an exception within the controller on the first incorrect field isn't ideal. We'll cover validation a lot more in a later post. Suffice it to say, there's more than one kind of validation. And it's OK to write the same checks in multiple places as long as they're for different purposes.)

Now let's take a look at that other annotation:

[Display(Name = "First Name")]

Display? This is the model, not the UI. Why are display concerns happening in the model? Again, this makes it easy for some auto-generating tooling to create forms for you. I get that. But what happens when you want to change that form? You either can't because the tooling doesn't support what you want to do, or you have to change the model for a simple UI change. I don't particularly like either of those options.

And what happens when this model is being used by multiple applications within the business? Marketing wants the field displayed as "First Name" in their application because users (customers) understand it. HR, however, wants it to be called "Given Name" because their users are accustomed to that label in another off-the-shelf system they use. So who wins?

Since the UI is being defined in the model, your options are to either create a second model as a completely separate application for the other department or to tell the departments that they need to agree on a name because it can't be different. Again, what kind of options are those? It's far, far too limiting. The field label in the UI is entirely a UI concern. It can be "First Name" or "Given Name" or "Foo Baz" or anything it's required to be within that UI. The application-level logic translates the UI fields to models to interact with the business logic. At least, that's the way it should be.
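
In other words, each application gets its own view model with its own display concerns, and each translates back to the same domain model. A rough sketch (the view model names are just for illustration):

// Marketing's application labels the field the way its customers expect...
public class CustomerApplicationViewModel
{
  [Required]
  [Display(Name = "First Name")]
  public string FirstName { get; set; }
}

// ...while HR's application labels the very same piece of data differently.
public class HrApplicationViewModel
{
  [Required]
  [Display(Name = "Given Name")]
  public string FirstName { get; set; }
}

// Each application then maps its own view model onto the one domain model,
// e.g. new Application(viewModel.FirstName) inside that application's controller.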

I get that these things reduce the amount of code people have to write for simple applications, I really do. But I just see this sort of thing as being far too limiting. Personally, I'd much rather have an understanding of constructors and property setters and exception handling and domain modeling than an understanding of one particular Microsoft widget tool that works only in specific environments (where that tool is supported and used).

Maybe I'd be happier if I developed a taste for Microsoft's Kool-Aid and enjoyed it in blissful ignorance. But I just can't. We live in an age of frameworks and I truly enjoy how they've made things easier by adding more layers of abstraction to the foundations of software development. I even use a number of frameworks all the time (StructureMap and Agatha are still high on my list of favorites, and that's not even mentioning the various DAL frameworks I use often). But there's a difference between using a framework to help you do something and letting the framework completely take over your development.

Don't be afraid to write a little code. Saving a few lines isn't worth it at the expense of mixing your concerns.