Monday, March 3, 2014

Persistence Ignorance with Entity Framework

I've never been shy about the fact that I don't like Entity Framework. It's not because I think it's a bad framework; it just never seemed to fit the task in my code. And after all, the tools should support the need. Mainly, I didn't want a data access framework to pollute my otherwise dependency-free domain.

For a lot of applications this isn't a problem, but I'm not working on those applications. For the stuff I work on, keeping dependencies isolated is very valuable. Additionally, any introductory walk-through of Entity Framework usually involved one of two things:
  1. Generate the models from the database. I really, really don't like doing this. Again, for a lot of applications this is fine. But more often than not I'm working in a fairly complex domain and need to have some real business-logic control over the models. I often find myself insisting on such truths as, "Not every table is a business entity" or "Relational structures and object-oriented structures don't always match perfectly." In the end, as an object-oriented developer I like to have control over my models. (Also, I hate EDMX files. A lot.)
  2. With the advent of "Code First," generate the database from the models. While I certainly support the idea of starting with the domain modeling, and while I think it's an interesting approach to generate the database structure, does any significant real-world project actually do this? Are you really going to explain to your IT manager that they don't need to worry about their database schema and the framework is just going to handle that for them? Good luck with that.
So, really, what I've always wanted the framework to provide for me was just a mapping between the models I create and the database schema I create. (With the understanding that different models, created in different applications, may also map to those same database tables, albeit differently.) And while I always thought that Entity Framework could do this, it wasn't obvious and I never had a compelling reason to dig into it.

But times change, and if you're doing data access work on the Microsoft stack these days then you'd be remiss not to be following along with Entity Framework. So recently I had an opportunity to really dig into it on a fairly simple project and see if I could achieve real persistence ignorance in an application. And, much to my surprise, I did. And very easily at that. So let's take a look at what's involved here...

First let's start with the models. (I told you I really support that.) Note that what follows is, of course, simplified for this example. But it works just fine. Now, in this business domain we have a concept of an Occurrence, with a stripped-down model here:

public class Occurrence
{
    public int ID { get; private set; }
    public bool InterestingCase { get; set; }
    public string Comments { get; set; }
    public Status ReviewStatus { get; set; }

    public string FieldToIgnore { get; private set; }

    public virtual ICollection<ContributingFactor> ContributingFactors { get; private set; }

    protected Occurrence()
    {
        ContributingFactors = new List<ContributingFactor>();
    }

    public Occurrence(bool interestingCase, string comments)
        : this()
    {
        InterestingCase = interestingCase;
        Comments = comments;
        ReviewStatus = Status.New;
    }

    public enum Status
    {
        New,
        In_Progress,
        Finalized
    }
}

public class ContributingFactor
{
    public int ID { get; private set; }
    public string Factor { get; private set; }

    protected virtual Occurrence Occurrence { get; private set; }
    public static readonly Expression<Func<ContributingFactor, Occurrence>> OccurrenceProperty = c => c.Occurrence;
    public int OccurrenceID { get; private set; }

    protected ContributingFactor() { }

    public ContributingFactor(string factor)
        : this()
    {
        Factor = factor;
    }
}

Simple enough, and as I said it's been stripped of a lot of business logic. (For example, the real one has a bunch of logic in the setters for tracking historical changes to the model at the field level. It also has a custom implementation of ICollection which prevents modifying the collection without going through other custom methods on the model, again to ensure that changes to values are tracked.) Let's identify a couple of the interesting parts:
  • The ID setter is private. This is because identifiers are system-generated by the backing data store and consuming code should never need to set one.
  • There's a contrived FieldToIgnore in the model. This is present in the example solely to demonstrate how the Entity Framework mappings can be set to ignore fields. By default it tries to make sense of every field, and a complex model can have a lot of calculated or otherwise not-persisted properties.
  • The collection of child objects is virtual. This lets Entity Framework override the property in the types it builds at runtime (more on those at the end), which is how it can populate the collection on demand. It's kind of an example of EF leaking into the domain, but there's no compelling reason in the domain not to make it virtual, so I'm fine with it.
  • On the child object there's a protected reference to the parent object. Normally this might be public, but since the child can't exist outside the context of the parent, there's no need to be able to navigate back to the parent. Indeed, a public back-reference creates a cycle (parent to child to parent) which sends a naive serializer into infinite recursion and would require additional workarounds.
  • Also on the child object is a reference to the ID of the parent object. This will become clear when we discuss our database structure. Essentially it's needed because it's part of the key in my table. (I tend to use the identity column and the parent foreign key column as the primary key when a child is explicitly identified by its parent.)
  • You're probably also noticing that Expression property. That's another example of the Entity Framework implementation kind of leaking into the domain, sort of. It's there because the mappings in the DAL are going to need to reference that Occurrence property in order to map the relational structure, and since the property is protected they can't reference it directly. The public static expression is built inside the class, where the property is visible, and hands the mappings a reference to it. Honestly, I've since discovered that having property references like this can be very handy for objects, so it's quickly becoming a more common practice for me anyway.
  • The child object is intended to be something of an immutable value object. The parent is the aggregate root and is the real domain entity; the child is just a value that exists on the parent.
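
To make that Expression trick a little more concrete, here's a minimal sketch of the idea outside of Entity Framework. The Invoice and LineItem types are made up for this illustration (they stand in for Occurrence and ContributingFactor), and the DescribeMappedMember helper is only my guess at the kind of inspection a mapping layer does, not the framework's actual code:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical parent type, standing in for Occurrence.
public class Invoice
{
    public string Number { get; set; }
}

// Hypothetical child type, standing in for ContributingFactor.
public class LineItem
{
    // Not visible to code outside this class hierarchy.
    protected Invoice Invoice { get; private set; }

    // Built inside the class, where the protected property is in scope.
    public static readonly Expression<Func<LineItem, Invoice>> InvoiceProperty = l => l.Invoice;

    public LineItem(Invoice invoice)
    {
        Invoice = invoice;
    }
}

public static class ExpressionDemo
{
    // Reads the member the expression points at without ever touching the property itself.
    public static string DescribeMappedMember()
    {
        var member = (MemberExpression)LineItem.InvoiceProperty.Body;
        return member.Member.DeclaringType.Name + "." + member.Member.Name;
    }

    public static void Main()
    {
        // Writing someLineItem.Invoice here would not compile: the property is protected.
        Console.WriteLine(DescribeMappedMember()); // LineItem.Invoice
    }
}
```

The outside code never gets to read the parent through the child, but it can still name the property, which is all a mapping needs.
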
A simple enough domain for an example. Now, our persistence-ignorant domain is going to need a repository and, just for good measure, a unit of work:

public interface OccurrenceRepository
{
    IQueryable<Occurrence> Occurrences { get; }
    void Add(Occurrence model);
    void Remove(Occurrence model);
}

public interface UnitOfWork : IDisposable
{
    OccurrenceRepository OccurrenceRepository { get; }
    void Commit();
}

That looks persistence-ignorant enough for me. So consuming code might look something like this:

using (var uow = IoCContainerFactory.Current.GetInstance<UnitOfWork>())
{
    var occurrence = new Occurrence(true, "This is a test");
    occurrence.ContributingFactors.Add(new ContributingFactor("Test Factor"));
    uow.OccurrenceRepository.Add(occurrence);
    uow.Commit();
}
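
Since consuming code only ever sees those two interfaces, nothing stops us from swapping in something that isn't a database at all. Here's a rough sketch of an in-memory stand-in, the sort of thing that might back a unit test. The InMemoryOccurrenceRepository and InMemoryUnitOfWork names are hypothetical (they aren't part of the project), and the Occurrence model is trimmed down so the example stands on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down copy of the domain model so this sketch compiles by itself.
public class Occurrence
{
    public int ID { get; private set; }
    public bool InterestingCase { get; set; }
    public string Comments { get; set; }

    public Occurrence(bool interestingCase, string comments)
    {
        InterestingCase = interestingCase;
        Comments = comments;
    }
}

public interface OccurrenceRepository
{
    IQueryable<Occurrence> Occurrences { get; }
    void Add(Occurrence model);
    void Remove(Occurrence model);
}

public interface UnitOfWork : IDisposable
{
    OccurrenceRepository OccurrenceRepository { get; }
    void Commit();
}

// Hypothetical test double: a plain list behind the repository interface.
public class InMemoryOccurrenceRepository : OccurrenceRepository
{
    private readonly List<Occurrence> _items = new List<Occurrence>();

    public IQueryable<Occurrence> Occurrences { get { return _items.AsQueryable(); } }
    public void Add(Occurrence model) { _items.Add(model); }
    public void Remove(Occurrence model) { _items.Remove(model); }
}

// Hypothetical test double: Commit only counts calls, there is nothing to flush.
public class InMemoryUnitOfWork : UnitOfWork
{
    private readonly InMemoryOccurrenceRepository _repository = new InMemoryOccurrenceRepository();

    public int CommitCount { get; private set; }
    public OccurrenceRepository OccurrenceRepository { get { return _repository; } }
    public void Commit() { CommitCount++; }
    public void Dispose() { }
}

public static class InMemoryDemo
{
    // Same shape as the consuming code in the post, minus the service locator.
    public static int CountInterestingCases()
    {
        using (UnitOfWork uow = new InMemoryUnitOfWork())
        {
            uow.OccurrenceRepository.Add(new Occurrence(true, "This is a test"));
            uow.OccurrenceRepository.Add(new Occurrence(false, "Nothing to see here"));
            uow.Commit();
            return uow.OccurrenceRepository.Occurrences.Count(o => o.InterestingCase);
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountInterestingCases()); // 1
    }
}
```

Note that this stand-in applies changes immediately instead of waiting for Commit, so it's only a fair substitute for tests which don't depend on that distinction.
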

Now, in order to get that to work (glossing over the IoC implementation, which isn't relevant to this example, but just know that I'm using a simple home-grown service locator) we're going to need our DAL implementation. That's the project which will hold the reference to Entity Framework. But before we get to that, let's create our tables:

CREATE TABLE [dbo].[Occurrence] (
    [ID]              INT            IDENTITY (1, 1) NOT NULL,
    [Comments]        NVARCHAR (MAX) NULL,
    [InterestingCase] BIT            NOT NULL,
    [Status]          INT            NOT NULL,
    CONSTRAINT [PK_Occurrence] PRIMARY KEY CLUSTERED ([ID] ASC)
);

CREATE TABLE [dbo].[ContributingFactor] (
    [ID]           INT            IDENTITY (1, 1) NOT NULL,
    [OccurrenceID] INT            NOT NULL,
    [Factor]       NVARCHAR (250) NOT NULL,
    CONSTRAINT [PK_ContributingFactor] PRIMARY KEY CLUSTERED ([ID] ASC, [OccurrenceID] ASC),
    CONSTRAINT [FK_ContributingFactor_Occurrence] FOREIGN KEY ([OccurrenceID]) REFERENCES [dbo].[Occurrence] ([ID])
);

Again, as you can see, I'm using a composite key on the child table. Everything else is pretty straightforward. Best of all so far is that nothing has referenced Entity Framework. The domain, the database, the application... They're all entirely ignorant of the specific implementation of the DAL. So now let's move on to that DAL implementation.

Entity Framework has objects which are analogous to the repository and the unit of work, called DbSet and DbContext respectively. Let's start with the unit of work implementation:

public class UnitOfWorkImplementation : DbContext, UnitOfWork
{
    public DbSet<Occurrence> DBOccurrences { get; set; }

    private OccurrenceRepository _occurrenceRepository;
    public OccurrenceRepository OccurrenceRepository
    {
        get
        {
            if (_occurrenceRepository == null)
                _occurrenceRepository = new OccurrenceRepositoryImplementation(this);
            return _occurrenceRepository;
        }
    }

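    static UnitOfWorkImplementation()
    {
        // Passing null turns off database initialization: EF won't try to create or alter the schema.
        Database.SetInitializer<UnitOfWorkImplementation>(null);

        // A fix for a known issue with EF: referencing the provider type in code keeps
        // EntityFramework.SqlServer.dll from being left out of the consuming project's output.
        var instance = SqlProviderServices.Instance;
    }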

    public UnitOfWorkImplementation() : base("Name=EFBlogPost") { }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Configurations.Add(new OccurrenceMap());
        modelBuilder.Configurations.Add(new ContributingFactorMap());
    }

    public void Commit()
    {
        SaveChanges();
    }
}

The interesting bits here are:
  • A reference to the DbSet which will correspond to the repository.
  • A lazily instantiated repository property to implement the interface. Note that we're passing a reference to the unit of work itself to the constructor. We'll see why when we build the repository.
  • A static constructor to set the database initializer. Setting it to null tells Entity Framework not to try to create or modify the database, since we manage the schema ourselves. (Also present is a small fix for an issue I spent an hour or so researching online regarding the SqlProvider. Without this line of code that doesn't appear to do anything, Entity Framework was failing to load the provider at runtime. With it, no problems.)
  • The constructor passes a hard-coded connection string name to the DbContext's constructor. Feel free to make this as dynamic as you'd like.
  • An override for OnModelCreating which specifies our mappings. We'll create those in a minute.
  • The Commit implementation for the interface, which just calls SaveChanges on the DbContext.
So far so good, pretty simple and with minimal code. Now let's see the repository implementation:

public class OccurrenceRepositoryImplementation : OccurrenceRepository
{
    private UnitOfWorkImplementation _unitOfWork;

    public IQueryable<Occurrence> Occurrences
    {
        get { return _unitOfWork.DBOccurrences; }
    }

    public OccurrenceRepositoryImplementation(UnitOfWorkImplementation unitOfWork)
    {
        _unitOfWork = unitOfWork;
    }

    public void Add(Occurrence model)
    {
        _unitOfWork.DBOccurrences.Add(model);
    }

    public void Remove(Occurrence model)
    {
        _unitOfWork.DBOccurrences.Remove(model);
    }
}

Even simpler, with even less code. It's basically just a pass-through to the DbSet object referenced on the unit of work implementation. (Indeed, you can just pass the DbSet itself in the constructor instead of the whole unit of work, but this plays nicer with dependency injection graphs.) So far there hasn't been much ugliness from the framework at all. How about the mappings...

public class OccurrenceMap : EntityTypeConfiguration<Occurrence>
{
    public OccurrenceMap()
    {
        ToTable("Occurrence").HasKey(o => o.ID).Ignore(o => o.FieldToIgnore);
        Property(o => o.ID).IsRequired().HasColumnName("ID").HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
        Property(o => o.InterestingCase).IsRequired().HasColumnName("InterestingCase");
        Property(o => o.Comments).IsUnicode().IsOptional().HasColumnName("Comments");
        Property(o => o.ReviewStatus).IsRequired().HasColumnName("Status");
    }
}

public class ContributingFactorMap : EntityTypeConfiguration<ContributingFactor>
{
    public ContributingFactorMap()
    {
        ToTable("ContributingFactor").HasKey(c => new { c.ID, c.OccurrenceID });
        HasRequired(ContributingFactor.OccurrenceProperty).WithMany(o => o.ContributingFactors).HasForeignKey(c => c.OccurrenceID);
        Property(c => c.ID).IsRequired().HasColumnName("ID").HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
        Property(c => c.Factor).IsRequired().IsUnicode().HasColumnName("Factor");
    }
}

Still pretty simple and straightforward. Indeed, I've written more code than I really needed to here. A lot of this mapping logic is implicit in the framework, but I think making it explicit makes it clearer, at a cost of very little additional code. Again, the interesting bits:
  • In the parent object's map we set the field to ignore as an extension of defining the table and key.
  • We explicitly tell it when a column is an identity. For the parent object this actually isn't a problem; the framework figures it out. For the child object, however, it doesn't figure it out (because of the composite key), so it needs to be explicitly specified. I just added the specification to the parent object's mapping as well for consistency.
  • The child object uses an anonymous type to define the composite key.
  • The child object requires that a parent object exists.
And, well, that's about it. The code runs as-is and consuming code can instantiate a unit of work, interact with it, commit it, and be done with it. Change tracking is all handled by the framework. That even includes setting the properties with private setters, since the framework internally uses reflection to do all of this.
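
If the idea of something outside the class setting a private setter seems odd, here's a small sketch of the general mechanism using plain .NET reflection. To be clear, this is not Entity Framework's actual code, just an illustration of why private setters and a protected constructor don't get in a framework's way. The Widget type and the Materialize helper are made up for the example:

```csharp
using System;
using System.Reflection;

// Hypothetical model with the same restrictions as the models in the post:
// private setters and a protected parameterless constructor.
public class Widget
{
    public int ID { get; private set; }
    public string Name { get; private set; }

    protected Widget() { }

    public Widget(string name)
        : this()
    {
        Name = name;
    }
}

public static class ReflectionDemo
{
    // Builds a Widget without using its public constructor or any public setter.
    public static Widget Materialize(int id, string name)
    {
        // Passing true for nonPublic allows the protected parameterless constructor.
        var widget = (Widget)Activator.CreateInstance(typeof(Widget), true);

        // The getters are public, so GetProperty finds the properties;
        // SetValue then invokes the private setters.
        typeof(Widget).GetProperty("ID").SetValue(widget, id, null);
        typeof(Widget).GetProperty("Name").SetValue(widget, name, null);

        return widget;
    }

    public static void Main()
    {
        var widget = Materialize(42, "loaded from a row");
        Console.WriteLine(widget.ID + ": " + widget.Name); // 42: loaded from a row
    }
}
```

Consuming code still can't assign an ID, which is the whole point of making the setter private in the first place.
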

In fact, if you inspect the models the framework hands back to you at runtime you'll find that they're not actually your models. They're instances of a dynamically-built type which inherits from your model (the framework having defined that type at runtime). This is how the framework gets into your domain in this case, to perform entity tracking and all that good stuff. Which is great, because it means that I don't have to actually change my models to account for Entity Framework. The framework does that at runtime, leaving my design-time code unpolluted and re-usable.
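
To picture what such a runtime type amounts to, here's a hand-written approximation. The real thing is generated by the framework and certainly does more than this, so treat Basket, BasketProxy and the loader delegate as assumptions made purely for illustration. What it does show is why the collection on the model needed to be virtual:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model; the virtual collection mirrors ContributingFactors.
public class Basket
{
    public virtual ICollection<string> Items { get; private set; }

    protected Basket()
    {
        Items = new List<string>();
    }
}

// Hand-written stand-in for a type the framework would build at runtime.
public class BasketProxy : Basket
{
    private readonly Func<ICollection<string>> _load;
    private ICollection<string> _loaded;

    public BasketProxy(Func<ICollection<string>> load)
    {
        _load = load;
    }

    // Only possible because the base property is virtual.
    public override ICollection<string> Items
    {
        // Runs the loader on first access, then reuses the result.
        get { return _loaded ?? (_loaded = _load()); }
    }
}

public static class ProxyDemo
{
    public static int LoadCount;

    // Consuming code asks for a Basket and never learns about the proxy type.
    public static Basket Fetch()
    {
        return new BasketProxy(() =>
        {
            LoadCount++; // stands in for a database round trip
            return new List<string> { "first", "second" };
        });
    }

    public static void Main()
    {
        Basket basket = Fetch();
        Console.WriteLine(basket.GetType().BaseType == typeof(Basket)); // True
        Console.WriteLine(basket.Items.Count); // 2
        Console.WriteLine(basket.Items.Count); // 2
        Console.WriteLine(LoadCount);          // 1
    }
}
```

The model itself knows nothing about loading; all of that lives in the derived type, which is exactly the separation I was after.
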

All things considered, and I never thought I'd say this... I'm really liking Entity Framework at this point.

(If interested, the code from this post is available here.)
