Friday, April 15, 2011

Why I think CouchDB is Awesome

Recently I have gotten a little practice at thinking about how awesome CouchDB is. I gave a presentation on this unique technology, shot entirely from the hip, and the topic of its advantages came up recently on the project's main mailing list. I figured summarizing some of this would make a good blog post. So... here is the list of my standouts:


Easy horizontal scale path:
It's HTTP. Guess what already exists in large numbers and is well understood? HTTP load balancers and caching systems. It has master-master replication built in. Your database logic lives in a design document, which is just another document that can be replicated across nodes.
Vertical scaling:
With support for Android and iOS you can now take the same database and use the built-in replication to move it up and down between mobile, desktop, and server. Awesome.
HTTP JSON:
Anything can talk to it. You don't have to have a middleware piece to talk to it for many web application scenarios. This can greatly simplify things.
Changes API:
You can have a lot (I mean a lot) of systems monitoring for changes and apply filters to get just what you want. 
Concurrent at its core:
Thanks to Erlang and the ideas of MVCC you can have many connections to the database with no worries. I think I have read that some test cases found that you would run out of IO bandwidth and ports before CouchDB would stop responding. It might be slow (you would probably see timeouts), but it's up and talking. Write speed is limited since writes are serialized on a node, but in either case you could load balance that.
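
To make the HTTP JSON and Changes API points a bit more concrete, here is a minimal Python sketch using only the standard library. It builds (but does not send) the request that stores a document, plus the URL for a filtered changes feed. The host and port (a local CouchDB on its default port 5984), the database name `blog`, the document contents, and the filter name `posts/published` are all assumptions made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB, default port
DB = "blog"                         # hypothetical database name


def put_document_request(doc_id, doc):
    """Build (but do not send) the HTTP request that stores a JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{DB}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def changes_url(since=0, filter_name=None):
    """Build the URL of the database's _changes feed, optionally filtered."""
    params = {"since": since}
    if filter_name is not None:
        # Filters live in a design document and are named "designdoc/filtername".
        params["filter"] = filter_name
    return f"{BASE_URL}/{DB}/_changes?{urlencode(params)}"


req = put_document_request("hello", {"type": "post", "title": "Hello"})
print(req.get_method(), req.full_url)
print(changes_url(since=10, filter_name="posts/published"))
```

With a server actually running, passing `req` to `urllib.request.urlopen` would send it. The point is that nothing beyond a plain HTTP client and a JSON encoder is involved.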


Now this wouldn't be completely fair I suppose if I didn't list some of the pain points:

Documentation is a bit spread out.
You really have to read the book and mix in the wiki and blogs.
Auth/Auth system is pretty primitive:
Sounds like there are some ideas around to improve this, and the current system will probably get you through most cases. If it doesn't get the job done, of course, you could just add a middle layer to manage these things. Using other database technologies you would probably have a middle layer anyway that managed auth/auth, so it's probably not that much of a pain point.
Learning Curve:
You pretty much have to study CouchDB for a little while before you grok it. There aren't a lot of features to make you feel at home when coming from a traditional database technology. The biggest one, obviously, is that any real querying is done via predefined map/reduce views. These are computed incrementally and have no side effects. It's hard to get a decent mental model of using documents and map/reduce.
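
For what it's worth, here is a rough sketch of the mental model that eventually worked for me. Real CouchDB views are usually written as JavaScript functions inside a design document; this is only a plain Python simulation of the idea, not CouchDB's API. The sample documents and the function names (`map_words_by_author`, `build_view`, `update_view`, `reduce_by_key`) are hypothetical.

```python
from collections import defaultdict

docs = [
    {"_id": "a", "type": "post", "author": "ann", "words": 120},
    {"_id": "b", "type": "post", "author": "bob", "words": 300},
    {"_id": "c", "type": "comment", "author": "ann", "words": 15},
    {"_id": "d", "type": "post", "author": "ann", "words": 80},
]


def map_words_by_author(doc):
    """Map step: a pure function of ONE document that yields (key, value) rows."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def _sort(rows):
    # Rows are ordered by key, then by the id of the document that emitted them.
    return sorted(rows, key=lambda row: (row[0], row[2]))


def build_view(documents, map_fn):
    """Run the map function over every document to build the full index."""
    return _sort(
        (key, value, doc["_id"]) for doc in documents for key, value in map_fn(doc)
    )


def update_view(rows, changed_doc, map_fn):
    """Incremental update: only the changed document is mapped again."""
    kept = [row for row in rows if row[2] != changed_doc["_id"]]
    fresh = [(key, value, changed_doc["_id"]) for key, value in map_fn(changed_doc)]
    return _sort(kept + fresh)


def reduce_by_key(rows):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value, _doc_id in rows:
        totals[key] += value
    return dict(totals)


view = build_view(docs, map_words_by_author)
print(reduce_by_key(view))
```

Because the map function only ever sees one document and has no side effects, changing a document means re-running the map for that document alone. That is the property that lets a view be kept up to date incrementally.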

There are probably more for both the pro and the con list but that sums up what I tried to hit recently.

Monday, April 4, 2011

Source Control and Inverting the Motivation

Sorry I haven't written in a while.  I've been starting a new job (yes, again) and, you know, a transition sort of consumes a lot of time and effort.  Also, this one involves relocation.  But so far I've been unquestionably happy with the results.

Anyway, I've been working at a client site for a few weeks now as we define and document a project.  One of the subjects that needed to be addressed is source control.  They were previously hosting somewhere off-site, and they're looking to us for recommendations on what kind of setup they'll want as they move forward.  Not a problem for us, of course.  We specialize in helping clients.  So, for the purpose of this client, it's pretty cut and dried.

But the experience as a whole got me thinking a lot more about source control.  Sure, we all have our favorite products that we know inside and out (for the most part, anyway) and it often just boils down to Yet Another Religious War among geeks who all think their preferred product is better.  That's our culture, it's how we think.  But, thinking about it, there's a key aspect of source control that I feel is all too often overlooked.  The component of developer motivation.

Let me explain...

"Source control" is more than just a repository for code.  It's more than just a repository which tracks in minute detail any changes made to code.  The name itself implies a greater purpose.  "Control."  For it to be successful or useful by any measure, it needs to be a controlled point of access to code changes.  Not just track them, but insist on tracking them.  To this end, source control needs to motivate developers to use it.  It's something we've all touched upon before.  A source control system should facilitate what we're doing, not get in our way.  It should never create a barrier to what we're doing.  It should never motivate us not to use it.

Consider an example...
  • System 1 has a source control repository which holds the code.  There isn't really anything closely resembling continuous integration, nor is there even an automated deployment process.  Deployment to any environment, be it test or production, is done by a developer.  He compiles the code locally, sets up the configs locally, and copies the output to the server.
  • System 2 has a source control repository which holds the code.  Upon check-in (at least to the main repository), the continuous integration system compiles the code and runs its automated tests.  Once everything passes, it automatically deploys to a target environment.  Maybe not the official QA environment, maybe it's just an internal shared Dev environment.  But it's deployed somewhere and is immediately live.
Now, given these two systems, which one has source "control"?  Well, think of it from the point of view of a developer.  If you wanted to make a change and see the result of that change, each system has a shortest path to make that happen.  And any manager or professional of any kind will tell you that the shortest path is the path you should always expect people to take.

In System 1, the shortest path is to make the change, compile it, and copy it to the server.  In System 2, the shortest path is to make the change and check it in to source control.  The difference is clear.  System 2 motivates the developer to use the source control system.  System 1 inverts the motivation.  Such a setup actually provides incentive for the developer not to use source control.

Sure, the developer knows the importance of source control.  He wants to use it.  But you know how it goes... He's just going to skip it this one time because he just needs to quickly see the result of this change in the test environment.  He'll update the source control later.  Really, he will.  You would, right?

Both of these systems host the source code in a central repository controlled by the business.  Both of them provide tracking of any code changes made in the source control system.  But only one of them controls the source.  Only one of them acts as a gatekeeper.  Yes, in System 2 a developer can still circumvent the source control.  But it's a hassle.  And his changes will be overwritten on the next check-in.  So why would he?  In System 1, source control sits outside the path of development.  It's an extra step.  In System 2, source control is the path of development.

And it happens more often than we think.  Ask yourself... How do you track changes to your database schema/procedures/views/etc.?  That's source code too, after all.  Those are a major part of the overall system and their changes absolutely must be tracked.  But even in System 2 above, how are database changes tracked?

I've seen plenty of System 2s out there which treat database changes exactly like System 1.  (Hell, I've even helped design and painstakingly maintain such a setup.  So I know about the motivation.)  It's a question I ask when I encounter a new setup at a new job or a new client site:
  • Me: "How do you track your database changes?"
  • Them: "Oh, just like our code.  We keep it in the source control repository."
  • Me: "How are the deployments handled?"
  • Them: "Simple, we run the scripts against the database."
  • Me: "Manually?"
  • Them: "Ya, why not?  Developers check in their change scripts and then run them at deployment time."
  • Me: "Doesn't that invert the motivation?"
  • Them: "Doesn't that do who with the what now?"
  • Me: "Well, the physical act of recording the change in source control is itself the extra step.  There's no motivation to actually do it.  The target database doesn't know the difference.  It's source, but it's not controlled."
It's not enough to just keep a repository of your source code.  It's not enough to track changes to that repository in minute detail.  It's not enough until you've created an environment where the shortest path from developer to server is through source control.  Not as an extra step, not as a to-do item on a deployment checklist pinned up next to the developer's desk, not as something off to the side which needs to be remembered.  That's just a source repository.  Source control needs to be in the path.  Not in the way, mind you, because that just provides the developer with motivation to go around it.  It needs to be part of the path of least resistance.
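
One possible way to put database changes in the path, sketched here under my own assumptions rather than as a prescription, is a small migration runner that the build server executes on every check-in.  Change scripts live in the repository, and the only way a script reaches the database is by being checked in.  The example uses SQLite so it stays self-contained; the function name `apply_pending`, the bookkeeping table `schema_migrations`, and the numbered file names are hypothetical.

```python
import sqlite3
from pathlib import Path


def apply_pending(conn, scripts_dir):
    """Apply every checked-in .sql script this database has not seen yet."""
    # Bookkeeping table: one row per script that has already been run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}

    newly_applied = []
    # Sorted file names give a stable order, e.g. 001_..., 002_...
    for script in sorted(Path(scripts_dir).glob("*.sql")):
        if script.name in applied:
            continue
        conn.executescript(script.read_text())
        conn.execute(
            "INSERT INTO schema_migrations (name) VALUES (?)", (script.name,)
        )
        conn.commit()
        newly_applied.append(script.name)
    return newly_applied
```

Called from the same automated step that deploys the code, for example `apply_pending(sqlite3.connect("dev.db"), "db/changes")` with both paths being placeholders, this makes checking in the script the shortest route to seeing the change on the server.  Running the script by hand becomes the extra step instead.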