Friday, June 20, 2014

Fun with Expression Trees

A common pattern I find in my MVC/WebAPI code is that my controller actions usually open a unit of work (or just a read-only repository therein), perform some query or insert/update something, and then return. It's a simple thing to expect a controller action to do, after all. But sometimes I come across an entity model where it's a bit difficult to put much of the logic on the model, and instead it ends up in the controller actions.

For example, imagine a system where your models are a fairly complex graph of objects, versions of those objects, dynamic properties in the form of other child objects, versions therein as well, etc. It's a common pattern when one builds a framework in which to configure an application, as opposed to building just an application itself.

Now in this system, imagine that you want all of your queries against "Entity" models to only ever return the ones which have not been soft-deleted. (Viewing soft-deleted records would be a special case, and one we haven't built yet.) Well, if you have a lot of queries against those entities in their repository, those queries are all going to repeat the same ".Where()" clause. Perhaps something like this:

return someRepository.Entities
                     .Where(e => e.Versions
                                  .OrderByDescending(v => v.VersionDate)
                                  .FirstOrDefault()
                                   .Deleted != true)
                     .Where(// etc.

That is, you only ever want Entity records where the most recent Version of that Entity is not in a "Deleted" state. It's not a lot of code (at least, not in this simplified example), but it is repeated code all over the place. And for more complex examples, it's a lot of repeated code. And more importantly than the repetition, it's logic which conceptually belongs on the model. A model should be aware of whether or not it's in a "Deleted" state. The controller shouldn't necessarily care about this, save for just invoking some logic that exists on the model.

At first one might simply add a property to the Entity model:

public bool IsDeleted
{
    get { return Versions.OrderByDescending(v => v.VersionDate)
                         .FirstOrDefault()
                         .Deleted == true; }
}

Then we might use it as:

return someRepository.Entities
                     .Where(e => !e.IsDeleted)
                     .Where(// etc.

That's all well and good from an object-oriented perspective, but if you're using Entity Framework (and I imagine any number of other ORMs) then there's a problem. Is the ORM smart enough to translate "IsDeleted" so that it runs on the database? Is it going to have to materialize every record first and then apply this ".Where()" clause in code? Or is it just going to throw an error and not run the query at all? It certainly won't be the first option (with Entity Framework in this case the query fails with a NotSupportedException, because the provider can't translate a plain CLR property into SQL), and that's no good.

We want as much query logic as possible to run on the database for a number of reasons:
  • It's less data moving across the wire.
  • It's a smaller memory footprint for the application.
  • SQL Server is probably a lot better at optimizing queries than any code you or I write in some random web application.
  • It's a lot easier and more standard to horizontally scale a SQL Server database than a custom application.
So we really don't want to materialize all of the records so that our object oriented models can perform their logic. But we do want the logic itself to be defined on those models because, well, object oriented. So what we need on the model isn't necessarily a property, what we need is an expression which can be used in a Linq query.

A first pass might look something like this:

public static Func<Entity, bool> IsNotDeleted = e => e.Versions
                                                      .OrderByDescending(v => v.VersionDate)
                                                      .FirstOrDefault()
                                                      .Deleted != true;

Which we can then use as:

return someRepository.Entities
                     .Where(Entity.IsNotDeleted)
                     .Where(// etc.

This is a good first step. However, if you profile the SQL database when this executes you'll find that the filtering logic still isn't being applied in the SQL query, but rather still in-memory in the application. This is because a "Func<>" doesn't get translated through Linq To Entities, and remains in Linq To Objects. In order to go all the way to the database, it needs to be an "Expression<>":

public static Expression<Func<Entity, bool>> IsNotDeleted = e => e.Versions
                                                                  .OrderByDescending(v => v.VersionDate)
                                                                  .FirstOrDefault()
                                                                  .Deleted != true;

Same code, just wrapped in a different type. Now when you profile the database you'll find much more complex SQL queries taking place. Which is good, because as I said earlier SQL Server is really good at efficiently handling queries. And the usage is still the same:

return someRepository.Entities
                     .Where(Entity.IsNotDeleted)
                     .Where(// etc.

Depending on how else you use it though, you'll find one key difference. The compiler wants to use it on an "IQueryable<>", not things like "IEnumerable<>" or "IList<>". So it's not a completely drop-in replacement for in-code logic. But with complex queries on large data sets it's an enormous improvement in query performance by offloading the querying part to the database engine.
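
If you do also need the same logic against collections that are already in memory, one simple option (just a sketch on my part, not something the model above requires) is to compile the expression into a plain delegate:

// A minimal sketch: the Expression<Func<Entity, bool>> can be compiled into a
// plain Func<Entity, bool> for Linq To Objects scenarios.
// "inMemoryEntities" is a hypothetical, already-materialized collection.
List<Entity> inMemoryEntities = someRepository.Entities.ToList();

Func<Entity, bool> isNotDeleted = Entity.IsNotDeleted.Compile();

var stillActive = inMemoryEntities.Where(isNotDeleted).ToList();

Compiling has a small one-time cost, so if you go this route it's worth caching the compiled delegate rather than calling Compile() on every use.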

There was just one last catch while I was implementing this. In some operations I want records which are "Deleted", and in some operations I want records which are not "Deleted". And obviously this doesn't work:

return someRepository.Entities
                     .Where(!Entity.IsNotDeleted)
                     .Where(// etc.

How should one invert the condition then? I could create a second expression property called "IsDeleted", but that's tacky. Not to mention it's still mostly repeated logic that would need to be updated in both places should there ever be a change. And honestly, even this "IsNotDeleted" bothers me from a Clean Code perspective because positive conditionals are more intuitive than negative conditionals. I should have an "IsDeleted" which can be negated. But how?

Thanks to some help from good old Stack Overflow, there's a simple way. And it all comes down to, again, expression trees. Essentially what's needed is an extension which wraps an expression in a logical inverse. This wrapping of the expression would continue through the expression tree until it's translated at the source (SQL Server in this case). Turns out to be a fairly simple extension:

public static Expression<Func<T, bool>> Not<T>(this Expression<Func<T, bool>> f)
{
    return Expression.Lambda<Func<T, bool>>(Expression.Not(f.Body), f.Parameters);
}

See, while there's no ".WhereNot()" or ".Not()" in our normal Linq extensions, the Expression API does give us an Expression.Not() for building a logical negation node in the tree. And now with this extension we can wrap our expression. First let's make it a positive condition:

public static Expression<Func<Entity, bool>> IsDeleted = e => e.Versions
                                                               .OrderByDescending(v => v.VersionDate)
                                                               .FirstOrDefault()
                                                               .Deleted == true;

Now let's get records which are deleted:

return someRepository.Entities
                     .Where(Entity.IsDeleted)
                     .Where(// etc.

And records which are not deleted:

return someRepository.Entities
                     .Where(Entity.IsDeleted.Not())
                     .Where(// etc.

Profile the database again and we see that all of the logic is still happening SQL-side. And for the inverted ones, the generated SQL query just wraps the whole condition and negates it exactly as we'd expect it to.

Now, we can still have our calculated properties on our models and we can still do a lot with those models in memory once they're materialized from the underlying data source. But in terms of just querying the data, where performance is a concern (which isn't always the case, admit it), having some expression trees on our models allows us to still encapsulate our logic a bit while making much more effective use of the ORM and the database.

Wednesday, May 21, 2014

Continuous Integration with TFS and ClickOnce

My current project is pretty fast-paced, so we need some good infrastructure to keep mundane concerns out of our way. As an advocate of eliminating cruft in the development process, I naturally wanted to implement a fully-automated continuous integration setup with building, testing, and publishing of the applications involved. Everybody's done this plenty of times with web applications, but it turns out that it's not quite so common with databases and ClickOnce applications. Since all three are included in this project, this is as good a time as any to figure out how to unify them all.

First some infrastructure... We're using Visual Studio 2013, SQL Server (different versions, so standardizing on 2008R2 as a target), and TFS. (I'm actually not certain about the TFS version. It's definitely latest or close to it, but not really being a "TFS guy" I don't know the specifics. It just works for what we need, I know that much. Beyond that, the client owns the actual TFS server and various controllers and agents.)

The entire solution consists of:
  • A bunch of class libraries
  • A WebAPI/MVC web application
  • A WPF application
  • A bunch of test projects
  • A database project (schema, test data)
The goals for the build server are:
  • A continuous integration build which executes on every check-in. (We're actually using gated check-ins too. I don't like gated check-ins, but whatever.)
  • A test build which executes manually, basically any time we want to deliver something to QA.
And each build should:
  • Compile the code
  • Execute the tests
  • Deploy the database
  • Deploy the web application (with the correct config file)
  • Deploy the ClickOnce WPF application (with the correct config file)
Some of this is pretty much out-of-the-box, some of it very much is not. But with a little work, it's just about as simple as the out-of-the-box stuff. So let's take a look at each one...

Compile The Code

This one is as out-of-the-box as it gets. I won't go into the details of creating a build in TFS, there's no shortage of documentation and samples of that and it's pretty straightforward. One thing I did do for this process, however, was explicitly define build configurations in the solution and projects for this purpose. We're all familiar with the default Debug and Release configurations. I needed a little more granularity, and knew that the later steps would be a lot easier with distinct configurations, so I basically deleted the Release configuration from everything and added a CI and a Test configuration. For now all of their settings were directly copied from Debug.

I used the default build template, and set each respective build (CI and Test) to build the solution with its configuration (Any CPU|CI and Any CPU|Test). Simple.

Execute The Tests

Again, this one is built-in to the TFS builds. Just enable automated tests with the build configuration and let it find the test assemblies and execute them.

Here's where I hit my first snag. I saw this one coming, though. See, I'm a stickler for high test coverage. 100% ideally. (Jason, if you're reading this... let it go.) We're not at 100% for this project (yet), but we are pretty high. However, at this early stage in the project, a significant amount of code is in the Entity Framework code-first mapping files. How does one unit test those?

The simplest way I found was to give the test assembly an App.config with a valid connection string and, well, use the mappings. We're not testing persistence or anything, just mapping. So the simplest and most direct way to do that is just to open a unit of work (which is just a wrapper for the EF context), interact with some entities, and simply dispose of the unit of work without committing it. If valid entities are added to the sets and no exceptions are thrown, the mappings worked as expected. And code coverage analysis validates that the mappings were executed during the process.
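
As a rough sketch of what one of these mapping tests looks like (the UnitOfWork and entity names here are illustrative, not the project's actual types):

[TestMethod]
public void EntityMappingsAreValid()
{
    // Uses the connection string from this test project's App.config.
    // The unit of work is just a wrapper around the EF context.
    using (var unitOfWork = new UnitOfWork())
    {
        unitOfWork.Entities.Add(new Entity { Name = "mapping smoke test" });

        // Any query forces EF to build and validate its mappings against
        // the database. No SaveChanges(), so nothing actually persists.
        var count = unitOfWork.Entities.Count();

        Assert.IsTrue(count >= 0);
    }
    // Disposing without committing discards the pending insert.
}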

However, this is technically an integration test in the sense that EF requires the database to exist. It does some inspection of that database for the initial mappings. That's kind of what we're testing, so we kind of need a database in place. Perhaps we could write some custom mock that pretends to be a SQL database, but that sounds overly-complicated. For the simplest approach, let's just see if we can deploy the database as part of the build. The upside is that this will validate the database schema as part of the build anyway, which is just more automated testing. And automated testing is a good thing.

So...

Deploy The Database

A lot has changed in SQL Server projects between Visual Studio 2010 and Visual Studio 2012/2013. The projects themselves no longer respect build configurations like they used to. But there's a new mechanism which effectively replaces that. When you right-click on the database project to publish it, you can set all of your options. Then you can save those options in an XML file. So it seemed sensible to me to save them in the project, one for each build configuration. (From then on, publishing from within Visual Studio just involves double-clicking on the XML file for that publish configuration.)

These XML files contain connection strings, target database names, any SQL command variables you want to define, and various options for deployment such as overwriting data without warning or doing incremental deploys vs. drop-and-create deploys. Basically anything environment-specific about your database deployments.
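
For reference, one of these saved profiles (CI.publish.xml in my case) ends up looking roughly like this; the shape is the standard SSDT publish profile, but the values here are purely illustrative:

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <!-- Illustrative target and options; yours will differ per environment. -->
    <TargetDatabaseName>MyDatabase_CI</TargetDatabaseName>
    <TargetConnectionString>Data Source=dbserver;Integrated Security=True</TargetConnectionString>
    <DeployScriptFileName>MyDatabase_CI.sql</DeployScriptFileName>
    <BlockOnPossibleDataLoss>False</BlockOnPossibleDataLoss>
    <CreateNewDatabase>False</CreateNewDatabase>
  </PropertyGroup>
  <ItemGroup>
    <SqlCmdVariable Include="Environment">
      <Value>CI</Value>
    </SqlCmdVariable>
  </ItemGroup>
</Project>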

Now we need the TFS builds to perform a database publish. Since TFS builds are basically a TFS workflow surrounding a call to MSBuild, adding MSBuild arguments in the build configuration seemed like a simple way to perform this with minimal effort. First I included the Publish target for the build:
/t:Build /t:Publish
Then I also needed to specify to deploy the database and which publish file to use:
/t:Build /t:Publish /p:DeployOnBuild=true /p:SqlPublishProfilePath=CI.publish.xml
I'm actually not entirely sure at this point if all of those are necessary, but it works well enough. We're targeting both the Build and Publish targets and setting a couple of project properties:
  • DeployOnBuild - This is the one about which I'm not entirely certain. This setting doesn't exist in my project files in source control, but it seems to be the setting needed for the SQL Server project to get it to publish. (Or maybe it's used by one of the other projects by coincidence? This was all a bit jumbled together while figuring it out so that's certainly possible.)
  • SqlPublishProfilePath - This is a setting in the SQL Server project file to tell it which of those XML files to use for its publish settings.
This executes as part of the MSBuild step and successfully deploys to the target CI database (or fails if the code is bad, which is just as useful a result), which means that the updated database is in place and ready by the time the TFS workflow reaches the test projects. So when the unit (and integration) tests execute, the CI database is ready for inspection by Entity Framework. All I needed to do was add an App.config to that particular unit test project with the connection string for the CI database.

But wait... What happens when we want to run those tests locally? If we change the connection string in the App.config then we run the risk of checking in that change, which would break the CI build for no particular reason. (Side note: I loathe when developers say "I'll just keep that change locally and not check it in." They always check it in at some point and break everybody else. Keep your team's environments consistent, damn it.) And App.configs don't have transforms based on build configuration like Web.configs do. (Side note: What the crap, Microsoft? Seriously, why is this not a thing? Web.config transforms have been around for, like, a billion years.)

We're going to need to include the correct configs for the WPF application deployments in a later step, so let's add a step...

Use A Different App.config

There are various solutions to perform transforms on an App.config. I tried a few of them and didn't much care for any of them. The most promising one was this tool called SlowCheetah, which came highly recommended by somebody with lots of experience in these sorts of things. But for reasons entirely unknown to me, I just couldn't get the damn thing to work. I'm sure I was missing a step, but it wasn't obvious so I continued to look for other solutions.

We can use post-build xcopy commands, but we really don't want to do that. And based on my experience with projects which use that option I guarantee it will cause problems later. But one component of this solution does make sense... Keeping the App.configs in separate files for each build configuration. It'll likely be easier than trying to shoehorn some transform into the process.

After much research and tinkering, I found a really simple and sensible solution. By manually editing the items in the project file I can conditionally include certain files in certain build configurations. So first I deleted the Item entries for the App.config and its alternates from the csproj file, then I added this:

<ItemGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
  <None Include="App.config" />
</ItemGroup>
<ItemGroup Condition=" '$(Configuration)|$(Platform)' == 'CI|AnyCPU' ">
  <None Include="Configs\CI\App.config" />
</ItemGroup>
<ItemGroup Condition=" '$(Configuration)|$(Platform)' == 'Test|AnyCPU' ">
  <None Include="Configs\Test\App.config" />
</ItemGroup>

Notice the alternate App.config files in their own directories. The great thing about this approach is that Visual Studio doesn't respect the Condition attributes and shows all of the files. Which is great, because as a developer I want to be able to easily edit these files within the project. But when MSBuild comes along, it does respect the Condition attributes and only includes the files for the particular build configuration being built.

So now we have App.configs being included properly for each build configuration. When running tests locally, developers' Visual Studios will use the default App.config in the test project. When building/deploying on the server, MSBuild will include the specific App.config for that build configuration, so tests pass in all environments without manual intervention. This will also come in handy later when publishing the ClickOnce WPF application.

Next we need to...

Deploy The Web Application

Web application deployments are pretty straightforward, and I've done them about a million times with MSBuild in the past. There are various ways to do it, and I think my favorite involves setting up MSDeploy on the target server. The server is client-owned though and I don't want to involve them with a lot of setup, nor do I want to install things there myself without telling them. So for now let's just stick with file system deployment and we can get more sophisticated later if we need to.

So to perform a file system deploy, I just create a network share pointing at the already-set-up IIS site and add some more MSBuild arguments:
/p:PublishProfile=CI /p:PublishUrl=\\servername\sharename\
The PublishUrl is, obviously, the target path on the file system. The PublishProfile is a new one to me, but it works roughly the same as the XML files for the database publishing. When publishing the web application from within Visual Studio the publish wizard saves profiles in the Properties folder in the project. These are simple XML files just as before, and all we need to do here is tell MSBuild which one to use. It includes the environment-specific settings you'd expect, such as the type of deploy (File System in this case) or whether to delete existing files first, etc. (Now that I'm looking at it again, it also includes PublishUrl, so I can probably update that in the XML files and omit it from the MSBuild arguments. This is a work in progress after all.)
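
For reference, a file-system publish profile of this sort looks roughly like the following (the property names are the ones the wizard generates; the values here are only illustrative):

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <!-- Illustrative file-system publish settings for the CI environment. -->
    <WebPublishMethod>FileSystem</WebPublishMethod>
    <LastUsedBuildConfiguration>CI</LastUsedBuildConfiguration>
    <LastUsedPlatform>Any CPU</LastUsedPlatform>
    <ExcludeApp_Data>False</ExcludeApp_Data>
    <publishUrl>\\servername\sharename\</publishUrl>
    <DeleteExistingFiles>False</DeleteExistingFiles>
  </PropertyGroup>
</Project>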

At this point all we need to do is...

Deploy The ClickOnce WPF Application

This one was the least straightforward of them all, mainly for one particular reason. A ClickOnce application is mostly useful in that it detects new versions on the server and automatically upgrades the client when it executes. This detection is based on the version number of what's deployed, but how can we auto-increment that version from within the TFS build?

It auto-increments, well, automatically when you publish from within Visual Studio. And most people online seem content with that. But the whole point here is not to manually publish, but rather to have a continuously deployed bleeding edge version of the application which can be actively tested, as well as have as simple and automated a QA deployment as possible (queue a build and walk away, basically). So we need TFS and/or MSBuild to auto-increment the build number. And they weren't keen on doing that.

So first things first, let's get it publishing at all before worrying about the build number. Much like with the publish profiles for the web application, this involved walking through the wizard once in Visual Studio just to get the project settings in place. Once in place, we can examine what they are in the csproj file and set them accordingly in the MSBuild arguments:
/p:PublishDir=\\servername\anothersharename\ /p:InstallUrl=\\servername\anothersharename\ /p:ProductName=HelloWorldCI
The PublishDir and InstallUrl are for ClickOnce to know where to put the manifest and application files from which clients will install the application. The ProductName is just any unique name by which the application is known to ClickOnce, which would end up being its name in the Start Menu and the installed programs on the computer. (At this time I'm actually not sure how to get multiple different versions to run side-by-side on a client workstation. I'm sure it involves setting some other unique environment-specific value in the project, I'm just not sure what.)

So now clients can install the ClickOnce application from the network share, it has the correct App.config (see earlier), and everything works. However, at this time it doesn't detect new versions unless we manually update the version number. And we don't like "manually" around here. Researching and tinkering to solve this led me down some deep, dark rabbit holes. Colleagues advised and assisted, much Googling and Stack Overflowing was done, etc. I was very close to defining custom Windows Workflow actions and installing them on the server as part of the build workflow to over-write the csproj file after running it through some regular expressions. This was getting drastically over-complicated for my tastes.

While pursuing this approach, the big question was where I would persist the incrementing number. It needed to exist somewhere because each build would need to know what the value is before it can increment it. And I really didn't like the idea of putting it in a database somewhere just to support this one thing. Nor did I like the idea of storing it in the csproj file or any other file under source control because that would result in inconsistencies as gated check-in builds are queued. Then it hit me...

We have an auto-incrementing number on TFS. The ChangeSet number.

Now, I've never really edited the build workflow templates before. (Well, technically that's not entirely true. I did make some edits to one while specifically following steps from somebody else's blog post in order to build a SharePoint project before. But it was more SharePoint than TFS and I had no idea what I was actually doing. So I didn't retain much.) And as such I didn't really know what capabilities were in place or how to reference/assign values. But with a little tinkering and researching, I put together something really simple.

First, I added a step to the build workflow template just before the MSBuild step. It was a simple Assign workflow item, and the value it was changing was MSBuildArguments (which is globally available throughout the workflow). Logically it basically amounts to:

MSBuildArguments.Replace("$ChangeSet$", BuildDetail.SourceGetVersion.Replace("C", String.Empty))

That is, it looks for a custom placeholder in the arguments list called $ChangeSet$ and replaces it with the ChangeSet number, which is also globally-available in the workflow as SourceGetVersion on the BuildDetail object. This value itself needs to have its "C" replaced with nothing, since ChangeSet numbers are prepended with "C". Now that I have the "persisted auto-incrementing" number, I just need to apply it to the project settings. And we already know how to set values in the csproj files:
/p:ApplicationRevision=$ChangeSet$ /p:MinimumRequiredVersion=1.0.0.$ChangeSet$
And that's it. Now when the ClickOnce application is published, we update the current version number as well as the minimum required version to force clients to update. Somewhere down the road we'll likely need to update the first three digits in that MinimumRequiredVersion value, but I don't suspect that would be terribly difficult. For now, during early development, this works splendidly.

So at this point what we have is:

  • Explicit build configurations in the solution and projects
  • XML publish profiles for the database project and the web application project
  • Web.config transforms and App.config alternate files
  • Conditional items in the csproj for the App.config alternate files, based on build configuration
  • A workflow step to replace MSBuild argument placeholders with the ChangeSet number
  • A list of MSBuild arguments:
    • /t:Build /t:Publish /p:DeployOnBuild=true /p:PublishProfile=CI /p:SqlPublishProfilePath=CI.publish.xml /p:PublishUrl=\\servername\sharename\ /p:PublishDir=\\servername\someothersharename\ /p:InstallUrl=\\servername\someothersharename\ /p:ApplicationRevision=$ChangeSet$ /p:MinimumRequiredVersion=1.0.0.$ChangeSet$ /p:ProductName=HelloWorldCI
Replace "CI" with "Test" and we have the Test build. If we want to create more builds (UAT? Production?) all we need to do is:

  • Create the build configurations
  • Create the config files/transforms
  • Create the publish profiles
  • Set up the infrastructure (IIS site, network shares)
  • Create a new TFS build with all the same near-default settings and just replace "CI" in the MSBuild arguments with the new configuration
And that's it. The result is a fully-automated continuous integration and continuous deployment setup. Honestly, I've worked in so many environments where a build/deploy consisted of a long, involved, highly manual, and highly error-prone process. Developers and IT support were tied up for hours, sometimes days, trying to get it right. What I have here, with a few days of research and what boils down to an hour or two of repeatable effort, is a build/deploy process which involves:

  • Right-click on the build definition in TFS
  • Select "Queue New Build"
  • Go grab a sandwich and take a break
I love building setups like this. My team can now accomplish in two mouse clicks what other teams accomplish in dozens of man-hours.

Monday, April 28, 2014

Composition... And Coupling?

Last week I had an interesting exchange with a colleague. We were discussing how some views and view models are going to interact in a WPF application we’re building, and I was proposing an approach which involves composition of models within parent models. Apparently my colleague is vehemently opposed to this idea, though I’m really not certain why.

It’s no secret that the majority of my experience is as a web developer, and in ASP.NET MVC I use composite models all the time. That is, I may have a view which is a host of several other views and I bind that view to a model which is itself a composition of several other models. It doesn’t necessarily need to be a 1:1 ratio between the views and the models, but in most clean designs that ends up happening if for no other reason than both the views and the models represent some atomic or otherwise discrete and whole business concept.

The tooling has no problem with this. You pass the composite model to the view, then in the view where you include your “partial” views (which, again, are normal views from their own perspective) you supply to that partial view the model property which corresponds to that partial view’s expected model type. This works quite well and I think distributes functionality into easily re-usable components within the application.

My colleague, however, asserted that this is “tight coupling.” Perhaps there’s some aspect of the MVVM pattern with which I’m unaware? Some fundamental truth not spoken in the pattern itself but known to those who often use it? If there is, I sure hope somebody enlightens me on the subject. Or perhaps it has less to do with the pattern and more to do with the tooling used in constructing a WPF application? Again, please enlighten me if this is the case.

I just don’t see the tight coupling. Essentially we have a handful of models, let’s call them Widget, Component, and Thing. And each of these has a corresponding view for the purpose of editing the model. Now let’s say our UI involves a single large “page” for editing each model. Think of it like stepping through a wizard. In my mind, this would call for a parent view acting as a host for the three editor views. That parent view would take care of the “wizard” bits of the UX, moving from one panel to another in which the editor views reside. Naturally, then, this parent view would be bound to a parent view model which itself would consist of some properties for the wizard flow as well as properties for each type being edited. A Widget, a Component, and a Thing.
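
To make that shape concrete, here's a minimal sketch (the class and property names are mine, purely for illustration):

// Hypothetical child view models, each usable on its own with its own view.
public class WidgetViewModel { /* Widget editing state and commands */ }
public class ComponentViewModel { /* Component editing state and commands */ }
public class ThingViewModel { /* Thing editing state and commands */ }

// The parent "wizard" view model composes the children and owns only the
// wizard-specific state (which step is active, navigation, and so on).
public class EditWizardViewModel
{
    public WidgetViewModel Widget { get; set; }
    public ComponentViewModel Component { get; set; }
    public ThingViewModel Thing { get; set; }

    public int CurrentStep { get; set; }
}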

What is being tightly coupled to what in this case?

Is the parent view model coupled to the child view models? I wouldn’t think so. Sure, it has properties of the type of those view models. In that sense I suppose you could say it has a dependency on them. But if we were to avoid such a dependency then we wouldn’t be able to build objects in an object-oriented system at all, save for ones which only recursively have properties of their own type. (Which would be of very limited use.) If a Wizard shouldn’t have a property of type Widget then why would it be acceptable for it to have a property of type string? Or int? Those are more primitive types, but types nonetheless. Would we be tightly coupling the model to the string type by including such a property?

Certainly not, primarily because the object isn’t terribly concerned with the value of that string. Granted, it may require specific string values in order to exhibit specific behaviors or perform specific actions. But I would contend that if the object is provided with a string value which doesn’t meet these criteria, it should still be able to handle the situation in some meaningful, observable, and of course testable way. Throw an ArgumentException for incorrect values, silently be unusable for certain actions, anything of that nature as the logic of the system demands. You can provide mock strings for testing, just like you can provide mock Widgets for testing. (Though, of course, you probably wouldn’t need a mocking framework for a string value.)

Conversely, are the child view models tightly coupled to the parent view model? Again, certainly not. The child view models in this case have no knowledge whatsoever of the parent view model. Each can be used independently with its corresponding view regardless of some wizard-style host. It’s by coincidence alone that the only place they’re used in this particular application (or, rather, this particular user flow or user experience) is in this wizard flow. But the components themselves are discrete and separate and can be tested as such. Given the simpler example of an object with a string property, I think we can agree that the string type itself doesn’t become coupled to that object.

So… Am I missing something? I very much contend that composition is not coupling. Indeed, composition is a fundamental aspect of object-oriented design in general. We wouldn’t be able to build rich object systems without it.

Monday, April 21, 2014

Say Fewer, Better Things

Last week, while beginning a new project with a new client, the client made an interesting observation about me. As is usual with a new project, the week was filled with meetings and discussions. And more than once the project sponsor explicitly said to me, "Feel free to jump in here as well." Not in a snarky way mind you, he just wanted to make sure I'm not waiting to speak and that my insights are brought to the group. At one point he said, "I take it you're the strong silent type, eh?"

Well, I like to think so.

In general it got me thinking, though. It's no secret that I'm very much an introvert, and that's okay. So for the most part I have a natural tendency to prefer not speaking over speaking. But the more I think about it, the more I realize that there's more to it than that.

As it turns out, in a social gathering I'm surprisingly, well, social. I'm happy to crack a joke or tell a story, as long as I don't become too much a center of attention. If I notice that happening, I start to lose my train of thought. In small groups though it's not a problem. In a work setting, however, I tend not to jump in so much. It's not that I'm waiting for my turn to speak, it's that I'm waiting for my turn to add something of value.

This is intentional. And I think it's a skill worth developing.

I've had my fair share of meetings with participants who just like to be the center of the meeting. For lack of a better description, they like to hear themselves talk. The presence of this phenomenon varies wildly depending on the client/project. (Luckily my current project is staffed entirely by professionals who are sharp and to the point, for which I humbly thank the powers that be.) But I explicitly make it a point to try not to be this person.

Understand that this isn't because I don't want to speak. This is because I do want to listen. I don't need (or even really want) to be the center of attention. I don't need to "take over" the meeting. My goal is to simply contribute value. And I find that I can more meaningfully contribute value through listening than through speaking.

I'll talk at some point. Oh, I will definitely talk. And believe me, I'm full of opinions. But in the scope of a productive group discussion are all of those opinions relevant? Not really. So I can "take that offline" in most cases. A lot of that, while potentially insightful and valuable, doesn't necessarily add value to the discussion at hand. So rather than take the value I already know and try to adjust the meeting/discussion/etc. to fit my value, I'd rather absorb the meeting/discussion/etc. and create new value which I don't already know which targets the topic at hand.

That is, rather than steer the meeting toward myself, I'd rather steer myself toward the meeting. And doing so involves more listening than talking. Sometimes a lot more.

In doing so, I avoid saying too much. Other people in the meeting can point out the obvious things, or can brainstorm and openly steer their trains of thought. What I'll do is follow along and observe, and when I have a point to make I'll make it. I find this maximizes the insightfulness and value of my points, even if they're few and far between. And that's a good thing. I'd rather be the guy who made one point which nobody else had thought of than the guy who made a lot of points which everybody else already knew. The latter may have been more the center of attention, but the former added more value.

Listen. Observe. Meticulously construct a mental model of what's being discussed. Examine that model. And when the room is stuck on a discussion, pull from that model a resolution to that discussion. After all, concluding a discussion with a meaningful resolution is a lot more valuable than having participated in that discussion with everybody else.

Thursday, April 10, 2014

Agile and Branching

I've recently interacted with an architect who made a rather puzzling claim in defense of his curious and extraordinarily inefficient source control branching strategy. He said:
"One of the core principles of agile is to have as many branches as possible."
I didn't have a reply to this statement right away. It took a while for the absurdity of it to really sink in. He may as well have claimed that a core principle of agile is that the oceans are made of chocolate. My lack of response would have been similar, and for the same reason.

First of all, before we discuss branching in general, let's dispense with the provable falsehoods of his statement. The "core principles" of agile are, after all, highly visible for all to see:
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation 
  • Customer collaboration over contract negotiation 
  • Responding to change over following a plan 
That is, while there is value in the items on the right, we value the items on the left more.
Pretty succinct. So let's look at them one-by-one in this case:



Individuals and interactions over processes and tools

The intent here is pretty clear. This "core principle" is to favor the team members and how they interact, not to favor some particular tool or process. If a process which works well with one team doesn't work well with another team, that other team shouldn't adopt or adhere to that process. Put the team first, not the process or the tool.

Source control is a tool. Branching is a process. To favor such things despite a clear detriment to the team and to the business value is to explicitly work against the very first principle of agile. Indeed, not only does agile as a software development philosophy make no claim about tools and processes, it explicitly says not to do so.

Working software over comprehensive documentation

In this same environment I've often heard people ask that these processes and strategies at least be documented so that others can understand them. While that may put some balm on the wound in this company, it's not really a solution. If you document a fundamentally broken process, you haven't fixed anything.

The "core principle" in this case is to focus on delivering a working product. Part of doing that is to eliminate any barriers to that goal. If the developers can't make sense of the source control, that's a barrier.

Customer collaboration over contract negotiation

In this case the "customer" is the rest of the business. The "contract" is the requirements given to the development team for software that the business needs. This "negotiation" takes the form of any and all meetings in which the development team and the business team plan out the release strategy so that it fits all of the branching and merging that's going to take place.

That negotiation is a lie, told by the development team, and believed by the rest of the business. There is no need for all of this branching and merging other than to simply follow somebody's process or technical plan. It provides no value to the business.

Responding to change over following a plan

And those plans, so carefully negotiated above, become set in stone. Deviating from them causes significant effort to be put forth so that the tools and processes (source control and branching) can accommodate the changes to the plan.



So that's all well and good for the "core principles" of agile, but what about source control branching? Why is it such a bad thing?


The problem with branching isn't the branching per se, it's the merging. What happens when you have to merge?

  • New bugs appear for no reason
  • Code from the same files changed by multiple people has conflicts to be manually resolved
  • You often need to re-write something you already wrote and that was already working
  • If the branch was separated for a long time, you and team members need to re-address code that was written a long time ago, duplicating effort that was already done
  • The list of problems goes on and on...
Merging is painful. But, you might say, if the developers are careful then it's a lot less painful. Well, sure. That may be coincidentally true. But how much can we rely on that? Taken to an extreme to demonstrate the folly of it, if the developers were "careful" then the software would never have bugs or faults in the first place, right?

Being careful isn't a solution. Being collaborative is a solution. Branching means working in isolated silos, not interacting with each other. If code is off in a branch for months at a time, it will then need to be re-integrated with other code. It already works, but now it needs to be made to work again. If we simply practice continuous integration, we can make it work once.

This is getting a bit too philosophical, so I'll step back for a moment. The point, after all, isn't any kind of debate over what have become industry buzz-words ("agile", "continuous integration", etc.) but rather the actual delivery of value to the business. That's why we're here in the first place. That's what we're doing. We don't necessarily write software for a living. We deliver business value for a living. Software is simply a tool we use to accomplish that.

So let's ask a fundamental question...
What business value is delivered by merging branched code?
The answer is simple. None. No business value at all. Unless the actual business model of the company is to take two pieces of code, merge them, and make money off of that, then there is no value in the act of merging branches. You can have those meetings and tell those lies all you like, but all you're doing is trying to justify a failing in your own design. (By the way, those meetings further detract from business value.)

Value comes in the form of features added to the system. Or bugs fixed in the system. Or performance improved in the system. And time spent merging code is time not spent delivering this value. It's overhead. Cruft. And the "core principles" of agile demand that cruft be eliminated.

The business as a whole isn't thinking about how to best write software, or how to follow any given software process. The business as a whole is thinking about how to improve their products and services and maximize profit. Tools and processes related to software development are entirely unimportant to the business. And those things should never be valued above the delivery of value to the business.