Monday, October 31, 2011

The Various Flavors of Validation

I recently wrote a post about Microsoft's Data Annotations and how they can mislead one into believing that one's data is being validated when it really isn't. Essentially, what I'm against is relying on these annotations within the domain models for validation. All they really do is tell certain UI implementations (MVC3 web applications, for example) how the data ought to be validated. And they only do this if the UI implementation chooses to listen: the UI has to check the ModelState, etc. They're not really validating the data; they're just providing suggestions that other components may or may not heed.
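To make that concrete, here is a minimal sketch. AnnotatedBand and AnnotationDemo are hypothetical names made up for illustration; the only real API used is Validator.TryValidateObject from System.ComponentModel.DataAnnotations. The point is that nothing happens until some component explicitly asks for validation.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical model that relies only on an annotation for "validation".
public class AnnotatedBand
{
  [Required]
  public string Name { get; set; }
}

public static class AnnotationDemo
{
  // Returns the errors found when a caller explicitly runs the validator.
  // Nothing in AnnotatedBand itself ever triggers this check.
  public static List<ValidationResult> Validate(AnnotatedBand band)
  {
    var results = new List<ValidationResult>();
    var context = new ValidationContext(band, null, null);
    Validator.TryValidateObject(band, context, results, true);
    return results;
  }
}
```

Creating new AnnotatedBand { Name = null } succeeds without complaint. Only a call such as AnnotationDemo.Validate(band) reports the missing name, and only if somebody remembers to make that call.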

This led to the question among colleagues... Where should the validation live? There is as much debate over this as there are tools and frameworks to assist in the endeavor. So my opinion on the matter is just that, an opinion. But as far as I'm concerned the validation should live everywhere. Each part of the system should internally maintain its own validation.

The idea is simple. There should be multiple points of validation because there are multiple reasons for validating the data. Will this lead to code duplication? In some cases, quite possibly. And I wholeheartedly agree that code duplication is a bad thing. But just because two lines of code do the same thing, are they really duplicated? It depends. If they do the same thing for the same reason and are generally in the same context, then yes. But if they coincidentally do the same thing but for entirely different reasons, then no. They are not duplicated.

Let's take a look at an example. My domain has a model:

public class Band
{
  private string _name;
  public string Name
  {
    get
    {
      return _name;
    }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("Name cannot be empty.");
      _name = value;
    }
  }

  private string _genre;
  public string Genre
  {
    get
    {
      return _genre;
    }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("Genre cannot be empty.");
      _genre = value;
    }
  }

  private Band() { }

  public Band(string name, string genre)
  {
    Name = name;
    Genre = genre;
  }
}

As before, this model is internally guaranteeing its state. There are no annotations on bare auto-properties to suggest that a property shouldn't be null. There is actual code preventing a null or empty value from being used in the creation of the model. The model can't exist in an invalid state. Data validation exists here.

Now, it stands to reason that the UI shouldn't rely solely on this validation. Why? Because it would be a terrible user experience. Any invalid input would result in an exception to be handled by the application. Even if the exception is handled well and the user is presented with a helpful message, it would still mean that the user sees only the first error on any given attempt. Trying to submit a large form with lots of invalid fields would be an irritating trial-and-error process.
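A rough sketch of that "first error only" behavior follows. The Band class is a compact copy of the model above so the example stands alone; FirstErrorDemo and TryCreate are hypothetical helpers standing in for whatever the application would do with the exception.

```csharp
using System;

// Compact copy of the Band model above, so the sketch stands alone.
public class Band
{
  private string _name;
  private string _genre;

  public string Name
  {
    get { return _name; }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("Name cannot be empty.");
      _name = value;
    }
  }

  public string Genre
  {
    get { return _genre; }
    set
    {
      if (string.IsNullOrWhiteSpace(value))
        throw new ArgumentException("Genre cannot be empty.");
      _genre = value;
    }
  }

  public Band(string name, string genre)
  {
    Name = name;
    Genre = genre;
  }
}

public static class FirstErrorDemo
{
  // Hypothetical helper: returns the message a UI would see if it relied
  // only on the domain model, or null when construction succeeds.
  public static string TryCreate(string name, string genre)
  {
    try
    {
      new Band(name, genre);
      return null;
    }
    catch (ArgumentException ex)
    {
      return ex.Message;
    }
  }
}
```

With both fields blank, FirstErrorDemo.TryCreate("", "") reports only the Name problem. The Genre problem stays hidden until the first one is fixed and the form is submitted again.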

So we need to do more validation. The model is validating the business logic. But the UI also needs to validate the input. It's roughly the same thing, but it's for an entirely different reason. The UI doesn't care about business logic, and the domain doesn't care about UX. So even though both systems need to "make sure the Name isn't empty" they're doing it for entirely different reasons.

Naturally, because I don't want tight coupling and because my UI and my domain should have the freedom to change independently of one another, I don't want my UI to be bound to the domain models. So for my MVC3 UI I'll go ahead and create a view model:

public class BandViewModel
{
  [Required]
  public string Name { get; set; }
        
  [Required]
  public string Genre { get; set; }
}

There's a lot more validation that can (and probably should) be done, but you get the idea. This is where these data annotations can be useful, as long as the developers working on the MVC3 application are all aware of the standard being used. (And it's a pretty common standard, so it's not a lot to expect.) The controller can then make use of the annotations accordingly:

public class BandController : Controller
{
  public ActionResult Index()
  {
    return View(
        repository.Get()
                  .Select(b => new BandViewModel {
                      Name = b.Name,
                      Genre = b.Genre
                  })
    );
  }

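  [HttpGet]
  public ActionResult Edit(string name)
  {
    // ModelState has nothing meaningful to validate for a bare lookup key,
    // so check the lookup itself. (Assumes repository.Get returns null when
    // no band matches.)
    var band = string.IsNullOrWhiteSpace(name) ? null : repository.Get(name);
    if (band == null)
      return HttpNotFound();

    return View(
      new BandViewModel
      {
        Name = band.Name,
        Genre = band.Genre
      }
    );
  }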

  [HttpPost]
  public ActionResult Edit(BandViewModel band)
  {
    if (ModelState.IsValid)
    {
      repository.Save(
        new Domain.Band(band.Name, band.Genre)
      );
      return RedirectToAction("Index");
    }

    return View(band);
  }
}

The data annotations work fine on the view models because they're a UI concern. The controller makes use of them because the controller is a UI concern. (Well, ok, the View is the UI concern within the context of the MVC3 project. The entire project, however, is a UI concern within the context of the domain. It exists to handle users' interactions with the domain API.)

So the validation now exists in two places. But for two entirely different reasons. Additionally, the database is going to have validation on it as well. The column which stores the Name value is probably not going to allow NULL values. Indeed, it may go further and have a UNIQUE constraint on that column. This is data validation, too. And it's already happening in systems all over the place. So the "duplication" is already there. Makes sense, right? You wouldn't rely entirely on the UI input validation for your entire system and just store the values in unadorned columns in a flat table, would you? Of course not.

We're up to three places validating data now. The application validates the input and interactions with the system, the domain models validate the business logic and ensure correctness therein, and the database (and potentially data access layer, since at the very least each method should check its inputs) validates the integrity of the data at rest.
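As a minimal sketch of that last layer, here is a hypothetical in-memory stand-in for a data access component. InMemoryBandStore and Insert are invented names, not part of the repository used above; the dictionary key plays the role that a UNIQUE constraint on the Name column would play in a real database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a data access layer.
public class InMemoryBandStore
{
  // The dictionary key stands in for a UNIQUE constraint on Name.
  private readonly Dictionary<string, string> _genresByName =
      new Dictionary<string, string>();

  public int Count
  {
    get { return _genresByName.Count; }
  }

  public void Insert(string name, string genre)
  {
    // Check inputs here too, regardless of what callers claim to have validated.
    if (name == null) throw new ArgumentNullException("name");
    if (genre == null) throw new ArgumentNullException("genre");

    if (_genresByName.ContainsKey(name))
      throw new InvalidOperationException("A band with that name already exists.");

    _genresByName.Add(name, genre);
  }
}
```

The checks look a lot like the ones in the domain model, but they protect something different: whatever ends up stored, no matter which caller tried to store it.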

Well, since this is a web application, we have the potential for a fourth place to validate data: the client-side UI. It's not necessary, of course. It's just a matter of design preference. Weigh the cost of post-backs against the cost of maintaining a little more code, determine your bandwidth constraints, etc. But if you end up wanting that little extra UX goodness to assist the user in filling out your form before posting it, then you can add more validation:

<body>
  @Html.ValidationSummary("Please fix your errors.")
  <div>
    @using (Html.BeginForm("Edit", "Band"))
    {
      <fieldset>
        Name: @Html.TextBox("Name", Model.Name)<br />
        Genre: @Html.TextBox("Genre", Model.Genre)
      </fieldset>
      <button type="submit">Save</button>
    }
  </div>
  <script type="text/javascript">
    $(document).ready(function () {
      // Highlight a required field when it is left empty, and clear the
      // highlight again once the user has filled it in.
      $('#Name, #Genre').blur(function () {
        var isEmpty = $.trim($(this).val()) === '';
        $(this).css('background', isEmpty ? '#ffeeee' : '');
      });
    });
  </script>
</body>

(You'll want prettier forms and friendlier validation, of course. But you get the idea.)

For a simple form like this it's not necessary, but for a larger and more interactive application it's often really nice to have client-side code to assist the user in interacting with the application. But once again, this is a duplication of logic. As before, however, it's duplicating the logic for an entirely different purpose. A purpose outside the scope of the other places where the logic exists. The end result is the same, but the contextual meaning is different.

So where does validation live?
  • In the domain models - Validate the state of the models at every step.  Don't allow invalid models to be created and then rely on some .IsValid() construct.  Prevent invalid state from ever existing on the data in motion.
  • In the database - Validate the integrity of the data at rest.  Don't allow invalid data to be persisted.  One application may be reliable enough to provide only valid data, but others may not.  And admins have a habit of directly editing data.  Make sure the data model rigidly maintains the integrity of the data.
  • In the application - Validate the user's input into the system.  Present friendly error messages, of course.  But this is the first line of defense for the rest of the system.  Armed with the knowledge that the business logic won't tolerate invalid data, make sure it's valid before even consulting with the business logic.
  • In the client-side UI - Provide data validation cues to assist the user and maintain a friendly UX.

That's a lot of validation. But notice how each one performs a distinctly different function for the overall system. It's repeating the same concept, but in a different context.

Now what happens if something needs to change? Well, then it'll need to be changed anywhere it's applicable. Is that duplicated effort? Again, it depends. Maybe all you need to change is the UI validation for some different visual cues. Maybe you need to make a more strict or less strict application without actually changing the business logic. Maybe you need to change some validation in the database because you're changing the data model and the new model needs to enforce the business logic differently.

If you change the core business logic then, yes, you will likely have to make changes to the other parts of the system. This isn't the end of the world. And it makes sense, doesn't it? If you change the validation logic in the domain models then you've changed the shape of your models. You've changed your fundamental business concepts in your domain-specific language. So naturally you'll want to check the database implementation to make sure it can still persist them properly (and look for any data migration that needs to happen on existing data), and you'll want to check your applications and interfaces to make sure they're interacting with the models correctly (and look for any cases where the logical change presents a drastic UI change, since users may need to be warned and/or re-trained).

It's not a lot of code. And it's not a lot to remember. There is a feeling of duplication which we as developers find distasteful, but it's keeping the concerns separated and atomic in their own implementations. Besides, if you miss one then your automated tests will catch it quickly. Won't they?
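One way such a test might look, sketched without assuming any particular test framework: feed the same inputs to the UI-level rules and the domain-level rules and check that they give the same verdict. Band and BandViewModel here are trimmed stand-ins for the classes above, and ValidationAgreement is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Trimmed stand-ins for the domain model and view model shown earlier.
public class Band
{
  public string Name { get; private set; }
  public string Genre { get; private set; }

  public Band(string name, string genre)
  {
    if (string.IsNullOrWhiteSpace(name))
      throw new ArgumentException("Name cannot be empty.");
    if (string.IsNullOrWhiteSpace(genre))
      throw new ArgumentException("Genre cannot be empty.");
    Name = name;
    Genre = genre;
  }
}

public class BandViewModel
{
  [Required]
  public string Name { get; set; }

  [Required]
  public string Genre { get; set; }
}

public static class ValidationAgreement
{
  // True when the view model's annotations accept the input.
  public static bool UiAccepts(string name, string genre)
  {
    var model = new BandViewModel { Name = name, Genre = genre };
    return Validator.TryValidateObject(
        model, new ValidationContext(model, null, null),
        new List<ValidationResult>(), true);
  }

  // True when the domain model can be constructed from the input.
  public static bool DomainAccepts(string name, string genre)
  {
    try { new Band(name, genre); return true; }
    catch (ArgumentException) { return false; }
  }

  // True when both layers give the same verdict for the given input.
  public static bool Agree(string name, string genre)
  {
    return UiAccepts(name, genre) == DomainAccepts(name, genre);
  }
}
```

A test suite would call ValidationAgreement.Agree with a handful of valid and invalid inputs. If someone tightens the rule in one layer and forgets the other, one of those calls starts returning false.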
