Simple Activity Logging in Ruby

Activity logs are one of the most requested (and, consequently, most under-used) features in an application. Managers love the idea of having a full history of everyone’s activity, even though their startup will probably pivot before they have a chance to look at these logs. Jokes apart, though, the data in activity logs is a tremendous source of business value that can help you iterate ever closer to your users’ requirements. So it helps to have a simple pattern for creating activity logs.

The whole problem with activity logging is that it straddles the Controller and Model layers. The activity is performed on a model, but the model has no knowledge of who is performing it; that knowledge is the preserve of the Controller. There have been many approaches to this problem: setting global variables (ouch! not thread-safe), using cattr_accessor and hacking around with threads, or the approach recommended in http://rails-bestpractices.com/posts/47-fetch-current-user-in-models, which relies on thread-local variables.

The other layer of complexity, of course, is the question of how to actually do the logging. Who should do it? When should it be done? Traditionally, approaches have centred on before/after callbacks in the model, i.e. a before_save :get_attributes to capture the original state of the object and then an after_save :log to write the activity log. We all know the problems with before and after hooks: they add behaviour to the model that is orthogonal to the model’s main concern, which is persistence. Such hooks also need to be stubbed out while testing. Worse, adding hooks means we have no control over when things get logged and when they don’t. If tomorrow we want to create a secret backdoor for the NSA to silently change data behind people’s backs, we can’t do that without changing the model, which means a logging concern is forcing the class that deals with persistence to change. Holy SRP violation, Batman!

If you’ve been in this situation, fear no more: there’s a very simple way of solving this problem. Here goes. Consider the following situation.

We have Users, who can have Foos, and we also have Staff members who do data entry and can modify Foos that belong to other users. In our ActivityLog, we want to persist the following stuff

Now, here’s the absurdly simple solution. Are you ready? Ok. In your controller, instead of calling @foo.save, say @foo.save_with_log(current_user). Got it? Ok, I’ll say it again.

Simple? We need the model to know about a user? Pass it in as a parameter!

Now, what about the model side? As much as we hate mixins as a form of inheritance/delegation, logging is one of those cases where mixins are a good approach (I’m willing to be convinced otherwise, though). This behaviour is shared across all classes that need logging and has nothing to do with the behaviour of the class itself, so we’ll make a nice little module called ActivityLogger which looks like this. Please note, this is pseudocode (it hasn’t been tested).

and we update our model thus

The log_with :foo_activity_log line is important. In the save_with_log and update_with_log methods that we included from ActivityLogger, we inject a dependency on the class that does the actual logging for us, in this case FooActivityLog. To save (or update) and log, we need the object’s attributes, a handle to the user or staff member performing the action and, alongside these, a hash of any other data that the actual logger can use as it deems fit. To achieve all this, the ActivityLogger module translates a call to @foo.save_with_log(staff) into a call to FooActivityLog.save_with_log(@foo, staff, options). So, what does the actual logger look like? Something like this
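A minimal, runnable sketch of the three pieces described above (the ActivityLogger mixin, a model that calls log_with, and a FooActivityLog) might look like the following. It assumes plain Ruby objects instead of an ORM: Foo#save and Foo#update are hypothetical stand-ins that always succeed, and the in-memory entries list and the Entry struct are illustrative assumptions rather than a real activity_logs table.

```ruby
# Sketch only: plain Ruby objects stand in for ActiveRecord/DataMapper models.

# Base class for concrete loggers. Entries live in memory here; a real app
# would persist them (assumption).
class ActivityLog
  Entry = Struct.new(:action, :subject, :actor, :before, :after, :options)

  # Each subclass keeps its own list because @entries is a class-level ivar.
  def self.entries
    @entries ||= []
  end

  # Save the object, then record who saved it and its resulting state.
  def self.save_with_log(object, actor, options = {})
    saved = object.save
    entries << Entry.new(:create, object.class.name, actor, nil, object.attributes, options) if saved
    saved
  end

  # Snapshot the original attributes, update, then log before and after.
  def self.update_with_log(object, actor, attrs, options = {})
    before = object.attributes
    updated = object.update(attrs)
    entries << Entry.new(:update, object.class.name, actor, before, object.attributes, options) if updated
    updated
  end
end

# Subclasses can override either class method, e.g. to log nested models too.
class FooActivityLog < ActivityLog; end

module ActivityLogger
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # log_with :foo_activity_log names the logger class; resolved lazily.
    def log_with(logger_name)
      @logger_name = logger_name
    end

    def activity_logger
      Object.const_get(@logger_name.to_s.split("_").map(&:capitalize).join)
    end
  end

  # @foo.save_with_log(staff) => FooActivityLog.save_with_log(@foo, staff, options)
  def save_with_log(actor, options = {})
    self.class.activity_logger.save_with_log(self, actor, options)
  end

  def update_with_log(actor, attrs, options = {})
    self.class.activity_logger.update_with_log(self, actor, attrs, options)
  end
end

class Foo
  include ActivityLogger
  log_with :foo_activity_log

  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Hypothetical persistence stand-ins: they always succeed.
  def save
    true
  end

  def update(attrs)
    @attributes = @attributes.merge(attrs)
    true
  end
end
```

In a Rails controller the call would then be @foo.save_with_log(current_user); the model never has to go looking for the user.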

This approach has several advantages, which accrue from favouring the explicit over the implicit.

  • We now have two explicit methods, save_with_log and update_with_log, so we don’t need to mess with before and after callbacks.
  • The actual activity log model inherits from ActivityLog, so we can override behaviour as we wish. For example, while saving nested models we might want to call save_with_log on the nested model as well; we can implement such methods in the appropriate child class of ActivityLog.
  • We inject controller variables explicitly, such as the current user or staff member performing the action. This saves us from having to mess with thread-local variables and makes this functionality very easy to test.

So, in short, being explicit about all our dependencies gives us a tonne of advantages over other approaches that depend on magic or callbacks.

What do you think?

Breaking up Fat Models With Delegation

Hope everyone had a relaxing Christmas with loads of great food, booze and gossip. I know I did!

So, not being a classically trained CS guy, I have no idea what this pattern is called, but I find myself using it more and more to break up fat models. It involves a lot of explicit delegation in the class, but no decorator classes. I’m still wondering whether this pattern is a “good thing” or not, so I’d appreciate comments.

Use case: your model has been collecting a lot of crufty methods that have little to do with the core logic of the model. Classic examples are methods like full_name, which deal only with presentation, or some orthogonal logic like currency conversion. There have been several approaches to fixing the presentation issue, the most notable of which is Draper, but I didn’t enjoy using Draper. For one, I don’t want to call UserDecorator.decorate(User.get(params[:id])). Why? I just don’t, OK?! Just kidding. In my experience, decorating large arrays (think a CSV dump of last month’s data) takes f.o.r.e.v.e.r, and damned if I didn’t get some subtle bugs with DataMapper associations on the decorated class. I didn’t dig too deep, being a shallow and easily influenced guy, and instead started looking for other solutions. I present mine below
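Here is a minimal sketch of what such a delegate could look like, assuming a User with first_name and last_name. The names ViewDelegate and view_delegate_class are illustrative: the delegate class is injected through the constructor, the delegate is built lazily and memoized, and it falls back to the wrapped object through method_missing.

```ruby
# Presentation-only methods live here rather than in User.
class ViewDelegate
  def initialize(object)
    @object = object
  end

  def full_name
    "#{first_name} #{last_name}" # first_name/last_name come from @object
  end

  private

  # Anything the delegate doesn't define is forwarded to the wrapped object,
  # so inside this class the implicit self behaves like the original object.
  def method_missing(name, *args, &block)
    @object.respond_to?(name) ? @object.public_send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    @object.respond_to?(name) || super
  end
end

class User
  attr_reader :first_name, :last_name

  # The delegate class is injected, so it can be chosen at runtime.
  def initialize(first_name, last_name, view_delegate_class: ViewDelegate)
    @first_name = first_name
    @last_name = last_name
    @view_delegate_class = view_delegate_class
  end

  def view_delegate_class=(klass)
    @view_delegate_class = klass
    @view_delegate = nil # drop the delegate memoized from the old class
  end

  # Explicit delegation: one line per delegated method, no magic.
  def full_name
    view_delegate.full_name
  end

  private

  # Built lazily on first use, then memoized.
  def view_delegate
    @view_delegate ||= @view_delegate_class.new(self)
  end
end
```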

So there are several funky things about this approach.

  • First of all, it is totally explicit. There is absolutely zero magic going on here.
  • Secondly, because of the injected dependency, we can choose the class we would like to use to present our object at runtime.
  • Thirdly, it saves us from having to explicitly decorate our objects.
  • Fourthly, the ViewDelegate object gets access to all of the original object’s methods via method_missing. This means that, for all practical purposes, the implicit self in the ViewDelegate class is the original object. Thus any object that responds to the required methods can use this delegate, not just a User.
  • Fifthly, the delegate objects are not instantiated until one of the delegated methods is called, which might have performance implications. Of course, the ViewDelegate can be memoized as well.
  • Lastly, this can be used for any type of delegation, not just for presentation. For example, one might choose to delegate a height method to a MetricUnitDelegate or to an ImperialUnitDelegate, depending on the context.
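To illustrate that last point, here is a hypothetical sketch of the height example. Person, height_cm and both delegate classes are assumptions, and heights are assumed to be stored in centimetres; the caller picks the delegate class at runtime and Person#height simply forwards to it.

```ruby
# Hypothetical unit delegates; the wrapped person stores height in centimetres.
class MetricUnitDelegate
  def initialize(person)
    @person = person
  end

  def height
    "#{@person.height_cm} cm"
  end
end

class ImperialUnitDelegate
  CM_PER_INCH = 2.54

  def initialize(person)
    @person = person
  end

  # Round to whole inches, then split into feet and inches.
  def height
    total_inches = (@person.height_cm / CM_PER_INCH).round
    feet, inches = total_inches.divmod(12)
    "#{feet}'#{inches}\""
  end
end

class Person
  attr_reader :height_cm

  def initialize(height_cm, unit_delegate_class: MetricUnitDelegate)
    @height_cm = height_cm
    @unit_delegate_class = unit_delegate_class
  end

  # Explicit delegation; which class answers depends on what was injected.
  def height
    unit_delegate.height
  end

  private

  def unit_delegate
    @unit_delegate ||= @unit_delegate_class.new(self)
  end
end
```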

I have no idea if this is a good or even an original approach. Would love to hear from you in the comments.

Param Objects in Rails

In Ruby, everything is an object and this unembarrassed object-orientation gives Ruby much of its power and expressiveness. Being able to call methods on an Integer and override the behaviour of strings are just two of the awesome things that one can do due to this design decision in Ruby.

In Rails, however, there are sadly large swathes that are not object oriented, and in my opinion these areas tend to be the most painful parts of Rails. A case in point, and the one we discuss in this blog post, is the params hash.

The params hash contains all the input sent by the user to your application. It is, in effect, the way your user communicates with your API. It is the job of the controller to respond to these requests, and for that it must do a number of things with the params. First of all, it must check whether the user is allowed to send this request. For example, is the user trying to maliciously update some forbidden attribute on the object, perhaps a deleted_at field or something similar? When it comes to functionality like search, we can also use Ruby’s metaprogramming abilities to DRY up our views and controllers by allowing the user to send a param like scope=to_call and dynamically calling that scope on the model. But what if the user sends scope=delete? We don’t want to Quotation.send("delete") now, do we?
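One common guard, sketched here in plain Ruby with a hypothetical in-memory Quotation class, is to check the requested scope against an explicit allow-list before calling public_send, so that scope=delete falls through to an unscoped result instead of reaching a destructive method.

```ruby
# Hypothetical in-memory stand-in for a Quotation model with named scopes.
class Quotation
  Record = Struct.new(:id, :status)
  RECORDS = [Record.new(1, "draft"), Record.new(2, "sent"), Record.new(3, "accepted")].freeze

  def self.all
    RECORDS
  end

  def self.draft
    RECORDS.select { |q| q.status == "draft" }
  end

  def self.sent
    RECORDS.select { |q| q.status == "sent" }
  end

  # A destructive class method that user input must never reach.
  def self.delete
    raise "Quotation.delete must not be reachable from params"
  end
end

# Only names on this list may be turned into method calls.
ALLOWED_SCOPES = %w[draft sent].freeze

def scoped_quotations(scope_param)
  if ALLOWED_SCOPES.include?(scope_param)
    Quotation.public_send(scope_param)
  else
    Quotation.all # unknown or malicious scope names are ignored
  end
end
```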

Also, providing a natural-looking API to the user often means we have to massage a lot of the parameters before we can send them on to our business logic. The API between the user and the controller may be very different from the API between the controller and the models, so a fair amount of massaging is sometimes needed. Currently, this is all done in controllers, so one often sees things like sanitize_params or, worse, code like this

Note the many conditionals as we try and protect ourselves from malicious code and try to massage the params into something that makes sense to our models.

Consider the following use case. You have a list of Quotations, each with a particular status, and in your model you’ve defined some nice scopes. You want to make these scopes available to the user through a scope=foo parameter. Additionally, there are several search terms you can pass in, and certain fields that must not be settable from the params. All of this becomes sooooo much easier if you just objectify the params hash. As a bonus, your controllers become even thinner and your param validations and whatnot become ridiculously easy to test.

I believe the proper term for this is Primitive Obsession, eloquently discussed here by our very own Piotr Solnica. I agree with him, based largely on my own experience with this particular smell hanging around the params hash.

My solution is to turn the params hash into a class of its own. In fact, I make one class for every controller action where I need it. For example, here’s the QuotationsIndexParams.rb file
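A minimal sketch of such a params class is shown below. To keep it runnable without gems it uses plain Ruby instead of Virtus, and the attribute names (scope, q, page) and the list of allowed scopes are assumptions for illustration.

```ruby
# Params object for QuotationsController#index (plain Ruby stand-in for Virtus).
# Only declared attributes are read; anything else, e.g. deleted_at, is ignored.
class QuotationsIndexParams
  ALLOWED_SCOPES = %w[draft sent accepted].freeze

  attr_reader :scope, :search, :page

  def initialize(raw)
    raw = raw.transform_keys(&:to_s)
    # Unknown scope names (say, "delete") become nil instead of a method call.
    @scope  = ALLOWED_SCOPES.include?(raw["scope"]) ? raw["scope"] : nil
    @search = raw["q"].to_s.strip
    @page   = [raw["page"].to_i, 1].max
  end

  def scope?
    !scope.nil?
  end

  def search?
    !search.empty?
  end

  # The massaged, model-facing view of the request.
  def to_h
    { scope: scope, search: (search if search?), page: page }.compact
  end
end
```

A controller would then build QuotationsIndexParams.new(params) and hand that object, rather than the raw hash, to the model layer; each rule above can be unit-tested without a request.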

So easy, simple and clear. If you don’t already know Virtus, check it out. It’s going to be in the new DataMapper2.

And now that we have full control over our params, our controller can become simple as well.

How easy is that?

In case you want to be more permissive about the params you accept, you can use an OpenStruct instead of using Virtus. This way, you do not need to know beforehand the attributes you’ll be setting.
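A sketch of the permissive variant, using Ruby’s standard OpenStruct. The PermissiveParams name and the choice to still strip a small deny-list of keys are assumptions, not something the approach requires.

```ruby
require "ostruct"

# Permissive params object: every key sent becomes a reader method, so the
# attributes need not be declared up front. A small deny-list is still applied.
class PermissiveParams < OpenStruct
  FORBIDDEN_KEYS = %i[deleted_at].freeze

  def initialize(raw)
    super(raw.transform_keys(&:to_sym).reject { |key, _| FORBIDDEN_KEYS.include?(key) })
  end
end
```

The trade-off is that typos and unexpected keys pass through silently, which is exactly what the declared-attribute version guards against.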

In general, it seems to emerge that there is one layer missing in Rails. Something between the Controller and everything else. Between the Controller and the Models, we often have to put Service Objects to keep our controllers free of logic. Between the Controller and the Views, there have been several attempts at Presenters or Decorators like Draper and so on. Jim Gay in his book Clean Ruby has proposed using Contexts or UseCases to simplify our applications.

I am slowly beginning to think that perhaps MVC is only half the picture, and objectifying params is just one step in a multi-front war on spaghetti code in our Rails apps. While the concepts presented in this post are quite simple to implement, I’m wondering whether a nice params-objectifying gem might be a good idea so we don’t have to roll custom solutions each time. I have a great name for it too: parampara!

Decouple your APIs from their implementation

A common pattern seen across Rails applications is the following update action in controllers

Convention over configuration is a very powerful way of getting developers up the curve quickly. However, it can lead to a certain amount of automatic programming that obscures the possibility of creating beautiful APIs with our standard RESTful actions. In the above case, years can go by before Rails programmers even begin to notice that the tight coupling between the route, the controller action and the model is just an illusion. In fact, resource != controller + model!

Be RESTful

A resource is a completely different layer of abstraction from the controller or the model. Controllers and models are elements of an MVC framework; resources are the nouns in the language of the web. We use Models and Controllers to implement resources in our web application, but breaking the coupling between routes, models and controllers is one step toward Rails nirvana.

Why is this important?

I’ve found it very helpful to adhere strictly to a RESTful architecture. This means thinking of everything as a resource that responds to the seven default RESTful actions (index, show, new, create, edit, update and destroy). This keeps your interface clean and your controllers lean and simple to test, and pushes your application logic into the model layer where it rightfully belongs.

A “for example”

Let’s take a look at a simple case like a board game. Once a game is set up, the following things can be done to it:

  • a player can join
  • a player can leave
  • a player can make a move
  • a player can resign

All of these constitute an “update” to the game resource. It might be tempting to start adding controller actions like join_game, leave_game, move and resign. Let’s see what happens if we do.

Interesting. Our controller is exploding and is not very DRY at all. Is there some way we can be more RESTful about this? Here’s technique number 1

Decouple your Controllers and your Models

Rails nested resources to the rescue! We declare Players and Moves as nested resources of Game

Please note, there is no model called Player. A Player is nothing but a User. Secondly, MovesController#create doesn’t call @move.save. It calls @game.add_move, so all the logic that truly belongs in Game is called from the Moves controller. Thus, we’ve created a resource called Player out of the User model, and our Moves resource uses the Game model’s API to add moves to the game. There is no spoon!
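The following plain-Ruby sketch shows the shape of that split. Game, its add_player/remove_player/add_move methods and the controller-shaped MovesController are hypothetical stand-ins (no Rails involved), and the turn-taking rule is an assumption added only so that add_move has some game logic to enforce.

```ruby
# Game owns the rules; nested "players" and "moves" resources just call it.
class Game
  class IllegalMove < StandardError; end

  attr_reader :players, :moves

  def initialize(max_players: 2)
    @max_players = max_players
    @players = []
    @moves = []
  end

  # Backs PlayersController#create: a "Player" is just a User joining a game.
  def add_player(user)
    raise ArgumentError, "game is full" if players.size >= @max_players
    players << user unless players.include?(user)
  end

  # Backs PlayersController#destroy.
  def remove_player(user)
    players.delete(user)
  end

  # Backs MovesController#create; the rules live here, not in the controller.
  def add_move(user, move)
    raise IllegalMove, "#{user} is not playing" unless players.include?(user)
    raise IllegalMove, "not #{user}'s turn" unless user == next_player
    moves << [user, move]
  end

  private

  # Assumed rule: players take turns in the order they joined.
  def next_player
    players[moves.size % players.size]
  end
end

# Controller-shaped wrapper: create delegates to Game#add_move instead of
# building and saving a Move record.
class MovesController
  def initialize(game, current_user)
    @game = game
    @current_user = current_user
  end

  def create(params)
    @game.add_move(@current_user, params.fetch(:square))
    :created
  rescue Game::IllegalMove
    :unprocessable_entity
  end
end
```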

Decouple your API from your actions

So let’s say that even after thinking really hard you can’t find a resource that gives you the API you want. An example? Oh, let’s say an article going through the states “moderated”, “published”, “unpublished” and so on. Instead of adding separate actions like approve, publish and unpublish, we can decouple our API from our actions and drive all these state changes through the update action.

Note that we’ve replaced the update_attributes call with Article#update, which can contain all the convoluted logic needed to deal with the parameters sent. As you can see, we’ve decoupled our API from our controller and our model. The shape of the API as exposed to the outside world has very little to do with the internal implementation in terms of models.
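A plain-Ruby sketch of an Article#update that turns a requested state in the params into a state transition. The state names come from the example above; the transition table and the string-keyed params are assumptions. The controller’s update action would then only need something like if @article.update(params[:article]).

```ruby
# One update method decides what a requested state change means.
class Article
  # Assumed transition table: current state => states it may move to.
  TRANSITIONS = {
    "draft"       => %w[moderated],
    "moderated"   => %w[published],
    "published"   => %w[unpublished],
    "unpublished" => %w[published]
  }.freeze

  attr_reader :state, :title

  def initialize(title:, state: "draft")
    @title = title
    @state = state
  end

  # Returns true on success, false (changing nothing) if the move is illegal.
  def update(params)
    requested = params["state"]
    if requested && requested != state
      return false unless TRANSITIONS.fetch(state, []).include?(requested)
      @state = requested
    end
    @title = params["title"] if params.key?("title")
    true
  end
end
```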

Your API is the little jewel of your app. Users of your app will judge you based on the intuitiveness and consistency of your API. Therefore, it is not a good idea to shoehorn your API to fit current Rails conventions. Rather, you can and should try as far as possible to decouple your API from its implementation.

Another advantage - if your app is to work at any kind of serious scale, at some time you are going to have to consider a polyglot implementation. Maybe you hand over to a service written in Go or call an external web-service. At this point, a clean separation between your app’s API and its implementation will really stand you in good stead.

Readable Specs == Business Value

Consider the situation of a product manager or software architect overseeing the work of several dozen, if not hundreds, of programmers. If you have been in this situation, you know what the problems are:

  • Ensuring delivery quality is a nightmare - you are spending as much time verifying the work of the junior programmers as you are solving problems.
  • It is very difficult to maintain a clear separation between the architect and the implementation team. Architects are expensive and should not be concerning themselves with low-level implementation details. Also, architects need to think at a different level of abstraction than the implementers. However, without this clear separation, you find that a lot of the senior programmers’ time is wasted on low-level issues.
  • Levels of supervision being high, code reviews become time consuming and expensive. Bugs slip through regardless and such bugs become expensive in terms of client satisfaction.
  • Programmers leave frequently and new programmers need to come up the curve quickly in order to avoid interruptions to service delivery.
  • Good programmers are hard to find. One has to make do with locally available talent which most likely cannot deliver working code without supervision. Because good programmers are hard to find, they are expensive. Also, they bring about “key-man” risk in your organisation.

If your organisation is not serious about writing tests for your code, it is obvious that the product delivery manager is going to have a hard time supervising the output. It is very time consuming to read through source code and one has to verify the correctness of the behaviour by hand. This is clearly not scalable.

If you are writing specs, you’re in a much better position. Reading the specs is much easier than reading code and passing specs mean that the code is performing as desired. However, it is possible to turn the humble spec into a powerful force multiplier and generate huge amounts of business value through the simple expedient of making the spec readable.

Consider the following three examples. Which one would you rather work with?

First, the actual implementation itself.

This is very cumbersome to work with. The reader has to try and follow along the logic in their mind, an impossible task. Failing that, they must test the functionality by hand which makes the whole process very slow and time-consuming. Clearly not a scalable process. This is why the software industry developed automated testing.

Then, a naively written spec of questionable readability

This is much better. It tests the behaviour of the program automatically, so once you have verified that the spec itself is properly written, passing tests give you a lot of confidence in the behaviour of the software. However, the spec is very long and verbose, and it is possible that the spec itself has bugs in it. Secondly, lazy programmers can slip in tests that do not actually test anything.

And lastly, a readable loan spec

Ah! So much easier to work with. A casual glance reveals the intention of the spec, and passing specs give confidence in the behaviour. A delivery manager working with the last example is going to be orders of magnitude more productive than one working with the first. In short, if you are not writing specs, you are living in the dark ages. If you are, then making them more readable is perhaps the single most effective thing you could do to improve productivity.

Last weekend we ran a TDD course with a client, and here’s the spec file that emerged from it. I consider this a very readable spec file, and my thesis is that, were I the delivery manager on this project, I would have achieved complete decoupling between the architecture and the implementation of the product. Here’s the spec for a simple tic-tac-toe game.

Readable specs let the product manager or architect simply forget about implementation issues and concern themselves only with API and behaviour issues. Specs like these will enable you to manage many more projects than you currently do. With specs like these, you can add people to the team seamlessly, since the specs act as a very readable form of documentation. Readable specs completely decouple your architecture from its implementation. This is a huge win considering the manpower churn that goes on in the industry.

The entire RSpec team must be saluted for creating this amazing piece of software. Whenever I look at a Ruby codebase these days and I don’t see a spec/ directory, I do a little head-shake inside. There is just so much value hidden inside this gem! David Chelimsky and team, do take a bow!

If you’d like your team to write readable specs, give me a shout.

Painless Controller Testing In Rails

There seems to be reasonable support for the opinion that controllers do not require testing and that integration tests or acceptance tests are sufficient testing for controllers. I think in large part this opinion arises because controllers are hard to test. In this post, I’d like to share a technique I use for painless controller testing, but before that we can try and answer the question whether acceptance/integration tests are sufficient tests for the controllers as well.

In my opinion, what an application does and the visual representation of those actions are two completely different things. To give a trivial example, when testing the create action using Capybara one might say page.should have_text("Item was saved"). Is this a sufficient test? Could there be a situation where the controller thinks an item has been saved but it actually hasn’t? I think so. A much more comprehensive test would be to specify expect { post :create, {:item => {...}}}.to change(Item, :count).by(1) in a controller test. Secondly, if tomorrow an edgy designer says “hang on, this message is really boring, we want the flash message to say ‘Awesomesauce! You’re bloody brilliant you are!’”, then you have no way of knowing that your controller is OK but your views are out of date. Coupling the UI with the functionality is unnecessary complexity, which brings us neatly to our next point.

Controllers are the API of your application. They are how your application talks to the outside world and are a major piece of your architecture. This code does some very specific things and it must be under test coverage. Additionally, testing is less about catching bugs than it is about coaxing a solid architecture out of your various constraints and requirements. Test Driven Design really works, and the payoffs are so fundamental that I for one am never going back to the old way of doing things. I’m not saying TDD is the silver bullet of software development, just that it is probably the easiest way to come up the curve when it comes to thinking architecturally. My previous articles on Test Driven Design and State Machines all came out of rigorously applying TDD and really taking a step back to address the issues that were making my tests painful. The payoff is huge: code that is very modular and easy to maintain.

So I took the same approach to controller testing. The first step is to take a naive approach. If you look at the controller spec file that RSpec generates, it does a pretty good job of enumerating the responsibilities of a controller. Let’s quickly go over these:

  • variable assignment - does the controller gather the correct data from various places?
  • message expectations - does the controller call the correct methods with the correct parameters on the correct object?
  • response handling - does the controller render the correct template or redirect to the correct URL?

Here’s the RSpec output

These are the classic responsibilities of the controller. Technically, it is not the controller’s responsibility to worry about the side effects of the methods it calls. However, for some issues there is no better place to put this spec, and I personally prefer this functionality to be tested in the controller. RSpec agrees and does the same:

  • side effects - does the controller add/delete items as requested? Specifically, you will often see an expect { ... }.to change(Foo, :count).by(1).

In a simpler world, this would be enough to decide whether the controller works as specified. For better or for worse, our world is not so simple, and our controllers have additional responsibilities, specifically authentication and authorization. Deciding whether to allow a particular action to a particular user is the responsibility of the controller, and this adds a lot of pain to our tests: now we need one set of tests for each user role. Already we start to feel the pain, but we want to delineate it more precisely, so we plough through it, ending up with RSpec output that looks like this.

OK, this is just for two user roles and we already have a spec file of 400 lines. Clearly an unsustainable situation! A spec should be easy to read and comprehensible at a glance. We need to refactor. Ploughing through the pain of our repetitive tests, we learned a few things about our controller. The first is that the only action that behaves differently from the others is index. The reason is that when a user is not authorised to edit or update some object, a CanCan::AccessDenied exception is raised. It is only for the index action that we might want to return different objects based on access control. This means we only need to log in with real credentials to test correct assignment of data for the index action, and we do so with a small shared example group like so

Voila, we have a one-liner to check data assignment for any given user role.

Now we move on to testing the other actions. Since these only need to know whether the user is authorised or not, we don’t need to actually log in. This is a good place to use a stub, because the stubbed functionality is unlikely to change. Here’s a look at the little DSL we might implement to help keep our tests DRY.
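The idea behind stubbing authorization can be shown without RSpec or CanCan. In this plain-Ruby sketch, AccessDenied, ItemsController, StubAbility and the as_authorised helper are all hypothetical: the controller asks an injected ability whether an action is allowed, and the test hands it a stub that just answers yes or no, so no user ever logs in.

```ruby
# Hypothetical stand-in for the error an authorization library raises.
class AccessDenied < StandardError; end

# Tiny controller-shaped object; the ability and the store are injected.
class ItemsController
  def initialize(ability, store)
    @ability = ability
    @store = store
  end

  def destroy(id)
    authorize!(:destroy, :item)
    @store.delete(id)
    :redirect_to_index
  end

  private

  def authorize!(action, subject)
    raise AccessDenied unless @ability.can?(action, subject)
  end
end

# A stub ability: no users, sessions or roles, just a fixed answer.
StubAbility = Struct.new(:allowed) do
  def can?(_action, _subject)
    allowed
  end
end

# DSL-ish helper: run an action as an authorised or unauthorised caller.
def as_authorised(allowed, store)
  yield ItemsController.new(StubAbility.new(allowed), store)
end
```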

One nice thing about this approach is that it preserves the informative failure messages from RSpec.

The whole example including testing authentication, authorisation, message expectations, variable assignment, response handling and side effects is about 80 lines of RSpec with about 40 lines of shared examples.

The whole example is available here: https://github.com/svs/painless_controller_tests/blob/master/spec/controllers/items_controller_spec.rb

Easy Tag Search With Sequel

One of the most under-rated libraries in the Ruby world, in my opinion, is the brilliant SQL builder Sequel. Sequel allows you to build SQL statements in Ruby and compose them using all kinds of nice logic and Turing-complete programming language goodness. So why, in a world of ActiveRecord and DataMapper, would we need something that helps us write SQL?

Well, ActiveRecord and DataMapper are ORMs. They exist in order to persist your objects to the database and work fine as long as the particular object is the correct abstraction through which to approach your desired functionality. Sometimes however, you need more than an ORM. You need to be able to look at all the data in your database holistically. You need something where the level of abstraction is the Relational Algebra, so you can make broad, sweeping statements about your entire database. One typical use case for such a level of abstraction would be a search feature. When you search, you probably want to span a lot of tables in your database and ORMs handle such a use case clumsily - you don’t really want to be firing off one query per table now do you?

In digiDoc, we offer powerful tag search as described in this video here:

You can actually get this entire functionality, including things not shown in the video, in about 75 lines of Ruby code (150 if you care about readability), and that is what judicious use of Sequel will get you.

Here’s the code -

As you can see, Sequel provides a beautiful abstraction of SQL and allows you to exploit the full power of your data store.

The tag filter bits where we compare for “must have” and “must only have” are based on a clever trick I read on StackOverflow (sorry, can’t find the link). Instead of comparing against a given set of tags, we aggregate, per record, the count of tags matching the given criteria and then check whether that count matches the number of tags we’re looking for. This would have been extremely difficult without changing the level of abstraction. As Uncle Bob says, perspective is everything.
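The counting trick can be sketched without a database. Below, Tagging rows stand in for a hypothetical taggings(document_id, tag) table; Sequel is not used, and the comments show roughly equivalent SQL. “Must only have” is interpreted here as “every tag on the document is in the given set”, which is an assumption.

```ruby
# Rows of a hypothetical taggings(document_id, tag) table.
Tagging = Struct.new(:document_id, :tag)

# document_id => distinct tags, i.e. GROUP BY document_id.
def tags_by_document(taggings)
  taggings.group_by(&:document_id).transform_values { |rows| rows.map(&:tag).uniq }
end

# "Must have": count the wanted tags each document carries and keep those whose
# count equals the number wanted. Roughly:
#   SELECT document_id FROM taggings WHERE tag IN (...)
#   GROUP BY document_id HAVING COUNT(DISTINCT tag) = <number of wanted tags>
def must_have(taggings, wanted)
  wanted = wanted.uniq
  tags_by_document(taggings).select { |_, tags| (tags & wanted).size == wanted.size }.keys
end

# "Must only have" (assumed meaning: no tags outside the set): the count of
# matching tags must equal the document's total tag count. Roughly:
#   SELECT document_id FROM taggings GROUP BY document_id
#   HAVING COUNT(*) = COUNT(CASE WHEN tag IN (...) THEN 1 END)
def must_only_have(taggings, allowed)
  tags_by_document(taggings).select { |_, tags| (tags & allowed).size == tags.size }.keys
end
```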

These days, I am using Sequel in all my projects. I cannot think of a better way to work with aggregation, reporting, search and other stuff that doesn’t have to do with object persistence.

If you liked this article, and would love your web-apps to have similar cool features built with the same level of care and consideration, consider hiring our new Ruby consulting firm - Sealink Consulting. We’re based in Mumbai but available to work remotely. Email svs at this domain to get in touch.

Are Comments a Code Smell? It Depends

Of late there have been numerous posts for and against comments in source code. On the anti side, we have for example DHH, whose recent post Clarity over Brevity in Method and Variable Names on the 37signals blog calls out comments as a code smell saying that the code should be self explanatory. Or Jeff Atwood’s post here. Comments are of course not without a cost, and once written, they have to be updated if the code is updated. On the pro-side, we have numerous posts saying comments should actually be more prominent than the code as they are an invaluable source of documentation. So what’s the answer? As usual, for any question worth asking, the answer is - it depends.

A fundamental property of good software is that it is easy to change it, which means that it is easy to understand the code. Good programmers therefore write code that is easy to understand. Comments are one very important tool in achieving the desired communication, but is there a way to write comments without having the overhead of maintaining them?

Kent Beck recently wrote a piece called Naming From the Outside In, in which he discusses a very interesting concept: that various parts of your system change at different speeds. A commenter there linked to an illuminating article about rates of change in buildings and their implications for architecture (http://www.scottraymond.net/2003/5/19/pace-layers/). Here we have a clue as to how to write comments with a vastly reduced maintenance burden.

Every piece of code that you write has three Is associated with it - intent, interface and implementation. All of these change at different speeds. In an object oriented language for example, the intent behind creating a class almost never changes, the public interface changes infrequently or in small increments while the implementation is frequently in flux due to refactorings and other activities.

The Intent of a class must be commented. While each individual function might be quite self explanatory, it cannot convey the intent of a class as a whole. Also, the cognitive load of reading a whole class in order to understand what it does can be greatly reduced by starting the class off with comments that convey the intent of the class.

As for the Interface, i.e. the public API, you should be documenting it with comments that feed into YARD, TomDoc or any other automatic documentation generating tools.

Implementation should be more or less self-documenting. It is here that we want to avoid the overhead of maintaining comments, as the code is free to change fast. Here, we push all our complex code down into private methods with descriptive names and don’t bother with comments about the implementation. This is because the intent of the implementation will already be documented in the specs, which have to change along with the code anyway.
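A hypothetical illustration of the three layers: the ReceiptReport class, the Receipt struct and monthly_totals are assumptions, while by_year_and_month echoes the method name used below. The class carries an intent comment, the public method carries YARD tags, and the private grouping logic is left to its name and the specs.

```ruby
require "date"

# Intent: summarise a user's receipts so reports can show spending per month.
class ReceiptReport
  Receipt = Struct.new(:date, :amount)

  # @param receipts [Array<ReceiptReport::Receipt>] the receipts to summarise
  def initialize(receipts)
    @receipts = receipts
  end

  # Total spend for each calendar month.
  #
  # @return [Hash{Array<Integer> => Numeric}] [year, month] => total amount
  def monthly_totals
    by_year_and_month.transform_values { |receipts| receipts.sum(&:amount) }
  end

  private

  # Implementation detail: named rather than commented, and pinned down by specs.
  def by_year_and_month
    @receipts.group_by { |r| [r.date.year, r.date.month] }
  end
end
```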

In the above example, we hide away the complicated list comprehension behind a descriptive method name like by_year_and_month. This frees the reader from the burden of having to comprehend the various maps, group_bys and so on that we resort to in order to massage our data into the expected format. This is an example of self-documenting code. As for the complex list comprehension, well, if you want to know exactly how that works, you should be able to find something in the specs that says describe "by_year_and_month" do

So, to sum up: comments needn't be a code smell if we take care to comment only the slow-moving parts of our code, such as the intent of a class and its public API. For everything else, there's self-documenting code. Push your complexity down into private methods, which may be hard for humans to read and carry no comments, as long as there are specs that express the intended behaviour.

Why I Migrated Away From MongoDB

I recently completed a migration from MongoDB to PostgreSQL for one of my apps, digiDoc. I'd like to tell you why.

To be honest, the decision to use MongoDB was an ill-thought-out one. Lesson learned: thoroughly research any new technology you introduce into your stack, know its strengths and weaknesses well, and evaluate honestly whether it fits your needs, no matter how much hype surrounds it. What follows is also not a litany of the usual complaints against MongoDB, such as data corruption, the global write lock, shard configuration and so on. For our use case, MongoDB failed at a much more basic level.

digiDoc is all about converting paper documents like receipts and business cards into a searchable database, so a document database seemed like a logical fit(!). Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into: document databases are remarkably hard to run aggregations on, and aggregating the data and presenting meaningful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I was constantly fighting with unwieldy map-reduce constructs when all I wanted was SELECT SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>. I even contributed some patches to mongoid-map-reduce, but the whole experience of aggregating data with MongoDB was so ugh that I couldn't bring myself to work on the app beyond a point. That is, of course, a bad place to be.
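To see why the map-reduce route feels unwieldy, here is a plain-Ruby sketch of the three phases a map-reduce aggregation goes through (emit key/value pairs, group them by key, reduce each group) to compute what SELECT category, SUM(amount) FROM receipts GROUP BY category returns in a single statement. It is an in-memory illustration only: MongoDB's own map-reduce takes JavaScript functions, and the `category` field and sample receipts are hypothetical.

```ruby
# Hypothetical receipts, as plain hashes standing in for documents.
receipts = [
  { category: "travel", amount: 120 },
  { category: "food",   amount: 15 },
  { category: "travel", amount: 30 },
  { category: "food",   amount: 10 }
]

# Map: emit a [key, value] pair for every document.
emitted = receipts.map { |receipt| [receipt[:category], receipt[:amount]] }

# Shuffle: collect all emitted values under their key.
grouped = emitted.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

# Reduce: fold each key's list of values into a single total.
totals = grouped.transform_values { |amounts| amounts.reduce(0, :+) }
# totals: {"travel" => 150, "food" => 25}
```

Every extra filter or grouping key has to be threaded through these phases by hand, whereas in SQL it is another clause.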

Secondly, with a document database, you lose the independence of your data access paths. People keep complaining that JOINs make your data hard to scale. Well, the converse is also true: not having JOINs makes your data an intractable lump of mud. Consider something as simple as an audit trail. In a document database, this would normally be a set of embedded documents in the item being audited. Now, let's say you want to see a list of all the actions performed by a particular user. Boing! Trapped. You have to load every document in the database, extract the audit trail from each one, then filter it in your app for the user you're looking for. Just the thought of what that would do to my hardware was enough to turn me off the whole idea. JOINs are cool! And guess which problem you are more likely to have: needing JOINs, or scaling beyond Facebook?
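A toy, in-memory sketch of the access-path problem, using hypothetical data. With audit entries embedded in each document, asking "what did bob do?" means walking every document and every entry inside it. With audit entries kept as rows in their own table (simulated here by an array plus a hash standing in for an index on the user column, roughly SELECT * FROM audit_entries WHERE user_name = 'bob'), the same question is a single lookup.

```ruby
# Document style: each receipt embeds its own audit trail.
receipts = [
  { id: 1, audit_trail: [{ user: "alice", action: "created" },
                         { user: "bob",   action: "edited" }] },
  { id: 2, audit_trail: [{ user: "bob", action: "created" }] }
]

# Answering "what did bob do?" touches every document and every entry.
bobs_actions_embedded = receipts.flat_map do |receipt|
  receipt[:audit_trail]
    .select { |entry| entry[:user] == "bob" }
    .map { |entry| entry.merge(receipt_id: receipt[:id]) }
end

# Relational style: audit entries are rows pointing back at their receipt.
audit_entries = [
  { receipt_id: 1, user: "alice", action: "created" },
  { receipt_id: 1, user: "bob",   action: "edited" },
  { receipt_id: 2, user: "bob",   action: "created" }
]

# A hash keyed by user stands in for a database index on that column,
# so the same question becomes one lookup.
entries_by_user = audit_entries.group_by { |entry| entry[:user] }
bobs_actions_relational = entries_by_user.fetch("bob", [])
```

Both paths return the same entries; the difference is how much data has to be loaded to get there, and the relational layout can also be queried from the receipt side with a JOIN.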

Thirdly, MongoDB is quite feature-poor. Perhaps this has changed in the last few months, but when I last looked, something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model holding all your search data in lower case. Now my model, which I have carefully constructed to adhere to the Single Responsibility Principle, needs callback hooks to save this search string every time it is updated. And if I add a new field to the model? Time to regenerate all the search strings. I can only conclude that MongoDB is a well-funded and elaborate troll.

Fourthly, and this one completely blew my mind: somewhere in the stack of MongoDB, Mongoid and mongoid-map-reduce, type information was being lost. I thought we were scaling hard when one of our customers suddenly had 1111 documents overnight. Imagine my disappointment when I realised it was actually four 1s added together, which had become strings along the way. Now, in this case perhaps the fault was mine, but somehow I can't see this happening with Postgres. And when you add up all the hours it takes to deal with these niggling problems and the surprising lack of features, the conclusion is pretty obvious.
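The "1111" surprise is easy to reproduce in plain Ruby: `+` adds Integers but concatenates Strings, so the same sum gives a very different answer once the values have lost their type. The `Integer()` coercion at the end is one defensive option for values coming out of a loosely typed store; it is an illustrative assumption, not what digiDoc actually did.

```ruby
# Four document counts, as they should arrive from the database.
counts = [1, 1, 1, 1]
total = counts.reduce(:+) # => 4

# The same counts after they have silently become Strings along the way.
string_counts = %w[1 1 1 1]
string_total = string_counts.reduce(:+) # => "1111" (String#+ concatenates)

# Coercing explicitly restores the arithmetic. Unlike String#to_i, which
# quietly returns 0 for input like "abc", Integer() raises ArgumentError.
coerced_total = string_counts.map { |count| Integer(count) }.reduce(:+) # => 4
```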

Fifthly, what was I getting in return for dealing with all this? Web scale doesn't interest me much. digiDoc is tiny, and RDBMSs have proven themselves at whatever scale we're likely to achieve. Was it, then, the lack of an enforced schema? Thinking about it, though, schemas are wonderful. They take all the constraints about your data and put them in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different, and I'd have no way of knowing. Such fuzzy, schemaless data models encourage loose thinking and undisciplined object orientation.

I completed the migration to Postgres yesterday. Very happy. Aggregation is a breeze, search is a breeze, and we've built some pretty powerful tag search and management features that do not even bear thinking about in MongoDB. Postgres turns your data store into what it was always meant to be: a mere detail in the scheme of your app, not an overarching presence forcing you to adapt to its requirements.

Lesson learned: be very circumspect when turning your back on 40 years of computer science.

Clarity over Brevity in Method Calls

A method name is actually just one part of a method invocation, and since methods take arguments, one can use the arguments to say far more clearly what the method is doing than by restricting oneself to the method name. In fact, methods with names like make_person_an_outside_subscriber_if_all_accesses_revoked are just begging for a plethora of siblings like make_person_an_inside_subscriber_if_all_accesses_not_revoked, make_person_an_outside_observer_if_some_accesses_revoked and so on. I am not sure I would like to work with such an API, let alone maintain it.

Instead, my suggestion is to aim for clear method invocations. Clear method invocations are a much better indicator of a well-thought-out API, one where reading and writing the code starts to feel very natural. For example, the method call above would read just as naturally if written as

@person.set_role(:to => :outside_subscriber, :if => :all_accesses_revoked?)

Just as readable and without the deleterious effects on your API.

Interestingly, the definition of set_role suffers a bit, since it has to unpack its arguments from an options hash, and here's where we have an aesthetic tradeoff.
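Here is a minimal sketch of what that definition might look like in Ruby 1.9, where the trailing hash in the call arrives as a single options argument. The `Person` class, its list of accesses and the default `:inside_subscriber` role are hypothetical stand-ins; the point is the unpacking boilerplate, and the TomDoc needed to explain which keys the hash accepts.

```ruby
# Hypothetical Person model, just enough to exercise set_role.
class Person
  attr_reader :role

  def initialize(accesses)
    @accesses = accesses
    @role = :inside_subscriber
  end

  # Public: Change the person's role, optionally only when a predicate holds.
  #
  # options - The Hash options:
  #           :to - The Symbol role to assign (required).
  #           :if - The Symbol name of a predicate method on this person
  #                 (optional). When given, the role only changes if the
  #                 predicate returns true.
  #
  # Returns the new role Symbol, or nil if the predicate returned false.
  def set_role(options = {})
    new_role  = options.fetch(:to) # raises KeyError when :to is missing
    condition = options[:if]
    return nil if condition && !public_send(condition)

    @role = new_role
  end

  def all_accesses_revoked?
    @accesses.empty?
  end
end
```

Nothing stops a caller from misspelling a key here: an unknown key such as `:iff` is silently ignored, which is part of the aesthetic cost.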

However, I would much rather write some TomDoc on the method than pollute my API. Clearly this is a matter of choice. The ideal solution, of course, would be named arguments, which will arrive in Ruby 2.0 as keyword arguments.
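For comparison, here is a sketch of the same hypothetical method using keyword arguments. Ruby 2.0 introduced optional keyword arguments and Ruby 2.1 added required ones (`to:` below), so this sketch assumes Ruby 2.1 or later. One wrinkle: `if` is a reserved word, so an `if:` keyword cannot be read as a bare local variable and has to be fetched with `binding.local_variable_get`, also available from 2.1.

```ruby
# Hypothetical Person model, just enough to exercise set_role.
class Person
  attr_reader :role

  def initialize(accesses)
    @accesses = accesses
    @role = :inside_subscriber
  end

  # Public: Change the person's role, optionally only when a predicate holds.
  #
  # to - The Symbol role to assign (required keyword).
  # if - The Symbol name of a predicate method on this person (optional keyword).
  #
  # Returns the new role Symbol, or nil if the predicate returned false.
  def set_role(to:, if: nil)
    # `if` is a reserved word, so read the argument through the binding.
    condition = binding.local_variable_get(:if)
    return nil if condition && !public_send(condition)

    @role = to
  end

  def all_accesses_revoked?
    @accesses.empty?
  end
end
```

The call site stays exactly as readable, and Ruby itself now rejects a missing `to:` or a misspelled keyword with an ArgumentError, checks that an options hash only gets with extra hand-written code.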