Monday, March 24, 2014

Don't Judge Too Quickly

Writing software is a team effort. And as with any team effort, we have to work together. This includes the ever-dreaded notion of having to read, understand, and perhaps even maintain code written by somebody else. Maybe that "somebody else" is a close and trusted colleague, maybe it is a client-appointed counterpart, maybe it is a team member who left the team long ago. Regardless of who originated the code we're looking at, that developer is in some way a part of the overall team.

And let's be honest here... We don't like working on other people's code. It's not something to be ashamed of. After all software development is in many ways a creative endeavor, an ill-defined Venn intersection between creativity, engineering, salesmanship, sarcasm, and caffeine. And who wants to create on top of something that's already been created? A painter prefers a canvas to be blank. A sculptor prefers an unbroken stone. But these analogies extend only so far in our craft, and clinging too tightly to them detracts from the team.

So how do we approach existing ("legacy") code? Hopefully not with judgements and preconceptions.

Over the years I've heard lots of good advice from my manager. Recently he made a suggestion which particularly resonated with me, "Don't pass judgement on their code for at least a month." My initial internal reaction was to remind myself that "I know these code smells all too well" and that "I've had enough conversations with these guys to know that they have no idea what they're doing." The sheer arrogance of these thoughts quickly became apparent. Clearly I needed to do a little less deciding and a little more observing.

What can be observed is the history of the codebase. Why certain things were done certain ways, what other factors have been in place (including the factors which have long-since vanished after having left their discernible mark in the code). What becomes clear during this period is the subtle reminder that our clients and counterparts are not unintelligent, neither malicious nor insane, and that as reasonable people with reasonable goals they have time and again enacted reasonable decisions with the information they had. Maybe that information was incorrect or incomplete, or maybe that information has changed over time.

"This legacy code is a mess" is in every way a cop-out. It's how we throw in the towel, declaring to the world that we are not in a position to help. Even if it's correct, it's not productive. (This reminds me of another nugget of management wisdom I've heard, "Being right isn't good enough.") The productive approach is to understand the decisions of yesterday so that we can better advise on the decisions of today.

A significant reason why this advice resonates with me is because, during this "month of observation," a curious turning of the tables has occurred. While at this client, I've written some of the best code in my life. It's a small project, and one in which I was explicitly given freedom to demonstrate some technologies, patterns, and practices which could be applied elsewhere in the company's software.  It was both a solution to a simple problem as well as a proof of concept for solution patterns in general. And almost immediately after I completed development of this codebase the client hired a new Solution Architect who took one look at it and didn't like it.

It wasn't what he expected. Or, rather, it wasn't what he would have written. And because it didn't fit his mold, it was to him in every way a "legacy mess." How could this be? Patterns are documented, concerns are separated, test coverage is high, each individual fully-encapsulated component is elegant in its simplicity. (At least, I think so.) And it was everything he didn't want in an architecture.

Who is he to tell me that my baby is ugly? Well, who am I to tell any client that their baby is ugly?

What he hasn't been able to tell me is any meaningful reason why my code is "bad." If he could, I'd be happy to listen to his points. Any opportunity to learn and improve is welcome. But "I don't like it" isn't such an opportunity. So, following that same reasoning, what meaningful points can I provide the client about their code? Why don't I like it?

Those points can be productive and helpful, as long as they're backed by meaningful information. And that information starts with understanding the existing codebase and why each component is what it is. Despite years of experience, that understanding doesn't come from a cursory look at the code but rather from time spent "in the trenches" working, studying, observing. Doing so "for at least a month" sounds like as good a rule of thumb as any.

Monday, March 17, 2014

Dependency Injection Screencast

Here's a screencast of a presentation I've given a few times at local events, called "What Is Dependency Injection?" It's basically an intro-to-DI for an audience of developers (both new and experienced) who haven't been exposed to DI implementations but have been hearing about it, or perhaps developers who haven't used it but suddenly need to and want to know what it's all about.

Enjoy!

Friday, March 14, 2014

Refactoring Screencasts VI

Recently I've once again had the occasional opportunity to sit in a (mostly) quiet room and record some more screencasts, so here's the Dealing With Generalization series. Enjoy!

Pull Up Field

Pull Up Method

Pull Up Constructor Body

Push Down Method

Push Down Field

Extract Subclass

Extract Superclass

Extract Interface

Collapse Hierarchy

Form Template Method

Replace Inheritance With Delegation

Replace Delegation With Inheritance

Tuesday, March 11, 2014

On .on(), or Event Delegation Explained

Just as I'm quite sure there are straightforward concepts unfamiliar to me in technologies and architectures with which I have little or no experience, so too are there such concepts which I take for granted each day but which befuddle otherwise capable developers who simply haven't had exposure to them. (Of course, this one also befuddles developers who really should understand it by now, but I digress.) So perhaps I can "do the universe a solid" and attempt to share that which I have and, for the most part, have been oblivious to the fact that there are others who do not have it.

In this case, I'm speaking of jQuery's .on() function. We've all seen the recommendations to use that, though the more I see such recommendations on, say, Stack Overflow answers the more I discover that there's no shortage of developers who treat .on() as a magic spell. An incantation which, when invoked, solves their problem. Or at least should, and when it doesn't they are again confused.

Allow me to start by stating a couple of unshakable truths regarding this sorcery:
  1. .on() is not magic.
  2. Event handlers are attached to elements, not to selectors.
  3. When an event handler is attached to an element, the selector is evaluated once at that time and never again.
Some of that sounded pretty matter-of-fact as far as blanket statements go, and indeed perhaps I could wordsmith it better. But I wanted to sound that way, because I want to get your attention. You may point out that .on() uses event delegation which clearly means the selector is evaluated again at another time. However, if you know what you're talking about then you know that's kind of a misleading statement because you're referring to a different selector. And if you don't know what I meant by that statement then this article is for you.

First, let's consider the following code:

<button class="editButton">Edit</button>
<script type="text/javascript">
$('.editButton').click(function () {
  // handle the button click event
});
</script>

A very simple and straightforward jQuery event handler. When the button is clicked, the handler code runs. In the highly dynamic world of AJAX and dynamic DOM structures, however, this straightforward approach doesn't always work. A developer then asks, "Why doesn't my click event handler execute for edit buttons that are added to the page via AJAX?" A very reasonable novice question, often met with an equally novice answer... "Just use .on() instead."

This might lead the developer to try this approach:

<button class="editButton">Edit</button>
<script type="text/javascript">
$('.editButton').on('click', function () {
  // handle the button click event
});
</script>

Now the developer is using .on(), but he didn't solve the problem. The magic spell didn't work. So the question is re-asked and further clarification is provided, again in the form of an incantation, "Use 'document' and add a second selector." This exchange leads the developer here:

<button class="editButton">Edit</button>
<script type="text/javascript">
$(document).on('click', '.editButton', function () {
  // handle the button click event
});
</script>

This "works" in the strictest definition of the word. (That is, when comparing "works" with "doesn't work" as an equally vague description of events.) It solves the problem of giving a man a fish so that he may eat today. And in a world where quarterly earnings are valued above long-term sustainability that may be enough. But as developers we don't necessarily live in that world. We have to stomach it from time to time, but our world is about long-term sustainability. The code we write needs to be understandable by whoever has to support it, including ourselves. That is, the developer needs to understand why this incantation "works."

Let's switch gears for a moment and discuss the DOM and events therein. We're all familiar with the fact that when you click something, it raises a click event. (There are plenty of other events as well, but for the purpose of this discussion we'll just focus on the most common of them... click.) Some of us may also be familiar with the fact that events "bubble up" in the DOM, which is at the heart of how .on() performs its "magic." Consider an HTML structure:

<body>
  <div>
    <button class="editButton">Edit</button>
  </div>
</body>

When you click on the button, the button invokes its click event and any handlers for that event attached to that element are thus invoked. But then the event "bubbles up" the structure. After the button invokes it event, the div invokes its click event, invoking any handlers attached to the div. Then the body invokes its click event for handlers attached to it. The top-level html tag also then invokes its click event, and the document object at the very top of the DOM invokes its click event.

That's a lot of click events for a single click. So how does this relate to the "working" incantation above? In that code, look at the element to which we're actually binding the click event:

<button class="editButton">Edit</button>
<script type="text/javascript">
$(document).on('click', '.editButton', function () {
  // handle the button click event
});
</script>

We're binding it to the document. Let's assume for a moment that the HTML being dynamically changed via AJAX calls is the div and the button. This means we could also have bound to the body or the html (though I don't think I've ever seen that latter one in the wild) and it would still "work." This is because any element in the DOM, when raising this event, propagates the event upwards through the structure of the DOM. (Unless, of course, propagation is explicitly stopped.)

Indeed, every click anywhere in the DOM is going to invoke the click event handler(s) on the document object. This is where that other selector in .on() comes in. It's a filter for the event propagation, indicating to jQuery that this particular function should only be invoked when the originating element matches that selector. That selector is evaluated when the event is invoked, whereas the selector for assigning the event was evaluated only once when the page was loaded.

Had we done this, our event handler would execute for any click anywhere on the page:

<button class="editButton">Edit</button>
<script type="text/javascript">
$(document).on('click', function () {
  // handle the button click event
});
</script>

This is also why the developer's first attempt to use .on() "worked" in the sense that his initially loaded button(s) still invoked the handler, because it still attached that handler to the selected elements. But as new elements are added to the DOM, those elements have no handlers attached to them and so it "didn't work" for them.

So basically, the event originates at the clicked element (in this case a button) and travels upward through the HTML elements all the way to the top. That event can be "handled" anywhere along the way. The sorcery of .on() then is simply that it's handled further up the chain instead of at the element itself. This allows us to dynamically alter elements within the document without losing the handler, because even though newly added elements don't have their own handlers assigned they do still "bubble up" their events to parent elements.

It's common to attach these handlers to the document object, though any common parent element will "work." Let's say you have a containing div and within that div you add/remove elements. As long as the div element isn't removed then you can keep your event handler there. (For a large and complex page it's probably best not to put all handlers on the document object, if for no other reason than saner organization of code.) Any common parent over the dynamic elements will "work."

It's not magic, and it's important to understand which parts of the code are evaluated at what time. The selector which targets the element(s) on which we're assigning a handler is evaluated once and never again. The selector which filters the child element(s) which originated the event is evaluated each time the event occurs.

Monday, March 3, 2014

Persistence Ignorance with Entity Framework

I've never been shy about the fact that I don't like Entity Framework. It's not because I think it's a bad framework, it's just that for my code it didn't seem to fit the task. And after all, the tools should support the need. The reason for this was mainly because I didn't want a data access framework to pollute my otherwise dependency-free domain.

For a lot of applications this isn't a problem, but I'm not working on those applications. For the stuff I work on, keeping dependencies isolated is very valuable. Additionally, any introductory walk-through of Entity Framework usually involved one of two things:
  1. Generate the models from the database. I really, really don't like doing this. Again, for a lot of applications this is fine. But more often than not I'm working in a fairly complex domain and need to have some real business-logic control over the models. I often find myself insisting such truths as, "Not every table is a business entity" or "Relational structures and object-oriented structures don't always match perfectly." In the end, as an object-oriented developer I like to have control over my models. (Also, I hate EDMX files. A lot.)
  2. With the advent of "Code First," generate the database from the models. While I certainly support the idea of starting with the domain modeling, and while I think it's an interesting approach to generate the database structure, does any significant real-world project actually do this? Are you really going to explain to your IT manager that they don't need to worry about their database schema and the framework is just going to handle that for them? Good luck with that.
So, really, what I've always wanted the framework to be able to provide for me was just a mapping between the models I create and the database schema I create. (Understanding that different models created in different applications may also map (albeit differently) to those same database tables.) And while I always thought that Entity Framework could do this, it wasn't obvious and I never had a compelling reason to dig into it.

But times change, and if you're doing data access work on the Microsoft stack these days then you'd be remiss not to be following along with Entity Framework. So recently I had an opportunity to really dig into it on a fairly simple project and see if I could achieve real persistence ignorance in an application. And, much to my surprise, I did. And very easily at that. So let's take a look at what's involved here...

First let's start with the models. (I told you I really support that.) Note that what follows is, of course, simplified for this example. But it works just fine. Now, in this business domain we have a concept of an Occurrence, with a stripped-down model here:

public class Occurrence
{
    public int ID { get; private set; }
    public bool InterestingCase { get; set; }
    public string Comments { get; set; }
    public Status ReviewStatus { get; set; }

    public string FieldToIgnore { get; private set; }

    public virtual ICollection<ContributingFactor> ContributingFactors { get; private set; }

    protected Occurrence()
    {
        ContributingFactors = new List<ContributingFactor>();
    }

    public Occurrence(bool interestingCase, string comments)
        : this()
    {
        InterestingCase = interestingCase;
        Comments = comments;
        ReviewStatus = Status.New;
    }

    public enum Status
    {
        New,
        In_Progress,
        Finalized
    }
}

public class ContributingFactor
{
    public int ID { get; private set; }
    public string Factor { get; private set; }

    protected virtual Occurrence Occurrence { get; private set; }
    public static readonly Expression<Func<ContributingFactor, Occurrence>> OccurrenceProperty = c => c.Occurrence;
    public int OccurrenceID { get; private set; }

    protected ContributingFactor() { }

    public ContributingFactor(string factor)
        : this()
    {
        Factor = factor;
    }
}

Simple enough, and as I said it's been stripped-down of a lot of business logic. (For example, the real one has a bunch of logic in the setters for tracking historical changes to the model at the field level, as well as a custom implementation of ICollection which prevents modifying the collection without using other custom methods on the model, again to ensure tracking of changes to values.) Let's identify a couple of the interesting parts:
  • The ID setter is private. This is because identifiers are system-generated by the backing data store and consuming code should never need to set one.
  • There's a contrived FieldToIgnore in the model. This is present in the example solely to demonstrate how the Entity Framework mappings can be set to ignore fields. By default it tries to make sense of every field, and a complex model can have a lot of calculated or otherwise not-persisted properties.
  • The collection of child objects is virtual, this helps Entity Framework populate it. It's kind of an example of EF leaking into the domain, but there's no compelling reason in the domain not to make it virtual, so I'm fine with it.
  • On the child object there's a protected reference to the parent object. Normally this might be public, but since the child can't exist outside the context of the parent then there's no need to be able to navigate back to the parent. Indeed, being able to do so introduces an infinite recursion when serializing and would require additional workarounds.
  • Also on the child object is a reference to the ID of the parent object. This will become clear when we discuss our database structure. Essentially it's needed because it's part of the key in my table. (I tend to use the identity column and the parent foreign key column as the primary key when a child is explicitly identified by its parent.)
  • You're probably also noticing that Expression property. That's another example of the Entity Framework implementation kind of leaking into the domain, sort of. It's there because the mappings in the DAL are going to need to reference that Occurrence property in order to map the relational structure. But since it's protected, they can't. Honestly, I've since discovered that having property references like this can be very handy for objects, so it's quickly becoming a more common practice for me anyway.
  • The child object is intended to be something of an immutable value type. The parent is the aggregate root and is the real domain entity, the child is just a value that exists on the parent.
A simple enough domain for an example. Now, our persistence-ignorant domain is going to need a repository and, just for good measure, a unit of work:

public interface OccurrenceRepository
{
    IQueryable<Occurrence> Occurrences { get; }
    void Add(Occurrence model);
    void Remove(Occurrence model);
}

public interface UnitOfWork : IDisposable
{
    OccurrenceRepository OccurrenceRepository { get; }
    void Commit();
}

That looks persistence-ignorant enough for me. So consuming code might look something like this:

using (var uow = IoCContainerFactory.Current.GetInstance<UnitOfWork>())
{
    var occurrence = new Occurrence(true, "This is a test");
    occurrence.ContributingFactors.Add(new ContributingFactor("Test Factor"));
    uow.OccurrenceRepository.Add(occurrence);
    uow.Commit();
}

Now, in order to get that to work (glazing over the IoC implementation, which isn't relevant to this example, but just know that I'm using a simple home-grown service locator) we're going to need our DAL implementation. That's the project which will hold the reference to Entity Framework. But before we get to that, let's create our tables:

CREATE TABLE [dbo].[Occurrence] (
    [ID]              INT            IDENTITY (1, 1) NOT NULL,
    [Comments]        NVARCHAR (MAX) NULL,
    [InterestingCase] BIT            NOT NULL,
    [Status]          INT            NOT NULL,
    CONSTRAINT [PK_Occurrence] PRIMARY KEY CLUSTERED ([ID] ASC)
);

CREATE TABLE [dbo].[ContributingFactor] (
    [ID]           INT            IDENTITY (1, 1) NOT NULL,
    [OccurrenceID] INT            NOT NULL,
    [Factor]       NVARCHAR (250) NOT NULL,
    CONSTRAINT [PK_ContributingFactor] PRIMARY KEY CLUSTERED ([ID] ASC, [OccurrenceID] ASC),
    CONSTRAINT [FK_ContributingFactor_Occurrence] FOREIGN KEY ([OccurrenceID]) REFERENCES [dbo].[Occurrence] ([ID])
);

Again, as you can see, I'm using a composite key on the child table. Everything else is pretty straightforward. Best of all so far is that nothing has referenced Entity Framework. The domain, the database, the application... They're all entirely ignorant of the specific implementation of the DAL. So now let's move on to that DAL implementation.

Entity Framework has objects which are analogous to the repository and the unit of work, called DbSet and DbContext respectively. Let's start with the unit of work implementation:

public class UnitOfWorkImplementation : DbContext, UnitOfWork
{
    public DbSet<Occurrence> DBOccurrences { get; set; }

    private OccurrenceRepository _occurrenceRepository;
    public OccurrenceRepository OccurrenceRepository
    {
        get
        {
            if (_occurrenceRepository == null)
                _occurrenceRepository = new OccurrenceRepositoryImplementation(this);
            return _occurrenceRepository;
        }
    }

    static UnitOfWorkImplementation()
    {
        Database.SetInitializer<UnitOfWorkImplementation>(null);

        // A fix for a known issue with EF
        var instance = SqlProviderServices.Instance;
    }

    public UnitOfWorkImplementation() : base("Name=EFBlogPost") { }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Configurations.Add(new OccurrenceMap());
        modelBuilder.Configurations.Add(new ContributingFactorMap());
    }

    public void Commit()
    {
        SaveChanges();
    }
}

The interesting bits here are:
  • A reference to the DbSet which will correspond to the repository.
  • A late-bound repository property to implement the interface. Note that we're passing a reference of the unit of work itself to the constructor, we'll see why when we build the repository.
  • A static initializer to set the database's initializer. (Also present is a small fix for an issue I spent an hour or so researching online regarding the SqlProvider. Without this line of code that doesn't do anything, Entity Framework was failing to load the provider at runtime. With it, no problems.)
  • The constructor passes a hard-coded connection string name to the DbContext's constructor. Feel free to make this as dynamic as you'd like.
  • An override for OnModelCreating which specifies our mappings. We'll create those in a minute.
  • The Commit implementation for the interface, which just called SaveChanges on the DbContext.
So far so good, pretty simple and with minimal code. Now let's see the repository implementation:

public class OccurrenceRepositoryImplementation : OccurrenceRepository
{
    private UnitOfWorkImplementation _unitOfWork;

    public IQueryable<Occurrence> Occurrences
    {
        get { return _unitOfWork.DBOccurrences; }
    }

    public OccurrenceRepositoryImplementation(UnitOfWorkImplementation unitOfWork)
    {
        _unitOfWork = unitOfWork;
    }

    public void Add(Occurrence model)
    {
        _unitOfWork.DBOccurrences.Add(model);
    }

    public void Remove(Occurrence model)
    {
        _unitOfWork.DBOccurrences.Remove(model);
    }
}

Even simpler, with even less code. It's basically just a pass-through to the DbSet object referenced on the unit of work implementation. (Indeed, you can just pass the DbSet itself in the constructor instead of the whole unit of work, but this plays nicer with dependency injector graphs.) So far there hasn't been much ugliness from the framework at all. How about the mappings...

public class OccurrenceMap : EntityTypeConfiguration<Occurrence>
{
    public OccurrenceMap()
    {
        ToTable("Occurrence").HasKey(o => o.ID).Ignore(o => o.FieldToIgnore);
        Property(o => o.ID).IsRequired().HasColumnName("ID").HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
        Property(o => o.InterestingCase).IsRequired().HasColumnName("InterestingCase");
        Property(o => o.Comments).IsUnicode().IsOptional().HasColumnName("Comments");
        Property(o => o.ReviewStatus).IsRequired().HasColumnName("Status");
    }
}

public class ContributingFactorMap : EntityTypeConfiguration<ContributingFactor>
{
    public ContributingFactorMap()
    {
        ToTable("ContributingFactor").HasKey(c => new { c.ID, c.OccurrenceID });
        HasRequired(ContributingFactor.OccurrenceProperty).WithMany(o => o.ContributingFactors).HasForeignKey(c => c.OccurrenceID);
        Property(c => c.ID).IsRequired().HasColumnName("ID").HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
        Property(c => c.Factor).IsRequired().IsUnicode().HasColumnName("Factor");
    }
}

Still pretty simple and straightforward. Indeed, I've written more code than I really needed to here. A lot of this mapping logic is implicit in the framework, but making it explicit I think makes it more clear and at a cost of very little additional code. Again, the interesting bits:
  • In the parent object's map we set the field to ignore as an extension of defining the table and key.
  • We explicitly tell it when a column is an identity. For the parent object this actually isn't a problem, the framework figures this out. However for the child object it doesn't figure it out (because of the composite key) and needs to be explicitly specified. I just added the specification to the parent object's mapping as well for consistency.
  • The child object uses an anonymous type to define the composite key.
  • The child object requires that a parent object exists.
And, well, that's about it. The code runs as-is and consuming code can instantiate a unit of work, interact with it, commit it, and be done with it. Change tracking is all handled by the framework. Even setting those private fields, since the framework internally uses reflection to do all of this.

In fact, if you inspect your models at runtime you'll find that they're not actually your models, they're a dynamically-built type which inherit from your models (having used reflection to define the type). This is how the framework gets into your domain in this case, to perform entity tracking and all that good stuff. Which is great, because it means that I don't have to actually change my models to account for Entity Framework. The framework does that at runtime, leaving my design-time code unpolluted and re-usable.

All things considered, and I never thought I'd say this... I'm really liking Entity Framework at this point.

(If interested, the code from this post is available here.)