Information Driven Architectures

Below are some initial thoughts on having a single Information API, not just for applications but for the entire host organisation. Throughout this post, I use the word “information” distinctly from “data”, with information being the transformation of data into something useful (for some value of useful).

Most applications written today solve the problems they are required to solve by modelling business concepts and their underlying data. This leads to having domain objects like Users, Posts, Comments, Payments and so on, and corresponding tables in the datastore to store these data. What often gets overlooked in the whole process is the information contained in these basic objects and the relationships between them, and the whole cascade of effects that each new piece of data brings.

For example, there is no table in the datastore that says “Signups on March 12 2014” or “Retention for Cohort 27” or “Friends of foo@bar.com”. Of course there aren’t such tables. We use relational databases precisely so we won’t have to model any and every concept that can be thought of but we can, instead, generate them on demand from simpler data.

And it works. Worked. Up to a point. With the demands of Internet business what they are these days, business users - customer support and marketing for example - need this data to be correct and up to date and their use cases are often as important as those of paying customers. With increasing scale and complexity, generating these tables on demand is prohibitively expensive. In many cases, relational databases are not up to the task.

There have been two attacks on the problem, both of which have their own trade-offs. The first has been “web-scale” document databases. The second has been hosted petabyte-scale data warehouses such as Redshift and BigQuery.

Document stores try to take away the problem of generating information from data on demand. Instead, the information is stored in a ready-to-read format and reading it takes next to no computation. This scales, but as a downside, the data becomes unwieldy to use. For example, nested documents are expensive to access, and all the lovely features of relational databases such as independent data access and ad-hoc joins are lost.

Large scale, parallel, hosted data warehouses are another attack on the problem. They are now cheap enough to be available to everyone but they often entail having expertise in data warehouse modelling (a black art, by all accounts), batched ETL processes which introduce another point of monitoring and failure and of course, these warehouses cannot be used to drive application behaviour in real time as they are not what one would call “production” infrastructure. Their use case is for internal customers and they are often out of sync with the production datastore by as much as a day.

So it seems an intractable problem - give up relational databases or embrace ETL and data warehouses? There must be a third way.

And there is. It’s what I like to call Information Driven Architectures. This essentially entails that we stop looking at the user-facing application as a special case. Instead, we have to see it as one component of an information concentrating entity - the business - which has a whole continuum of information needs. Some of them move fast and in real-time, such as scores in a game that your users play, while others move slowly, like the monthly report that goes to investors or the annual tax filings. Seeing one of them as a first-class concern while ignoring the other is bound to lead to a lopsided emphasis on some data as opposed to others. Such lopsidedness is necessarily unhealthy to the prospects of the business.

On the other hand, seeing every piece of data in the organisation as part of a whole brings some interesting new perspectives into the mix.

First of all, we are now obliged to include all kinds of datastores into the mix. Postgres and Elasticsearch are obvious candidates, but your read-replicas, your monitoring APIs, Google Analytics, Google Drive Spreadsheets and Mailchimp stats are all parts of this same organism. In order to drive healthy feedback loops between users, marketers and product managers, all of them need to be looked at as first class citizens of the information landscape within the organisation.

So, for starters, we agree that monolithic data architectures are not possible any more. Data must live where it feels at home. Transactional data must live in relational databases. Realtime data must live in memory. Search indexes in Elasticsearch, user behaviour in Mixpanel, and stuff you don’t mind losing in MongoDB. Here lives a heterogeneous collection of “Data APIs”, which deal with the low-level issues of getting data into the system. Your ORMs and event-tracking javascript snippets are part of this ecosystem.

This of course leads us to the second conclusion - Information, being available in a variety of data stores, needs a unifying API. This is a restatement of the principle which says that one should code to interfaces, not implementations. Today, you’re more than happy to generate weekly user cohorts from your transactional datastore, but soon it is going to be time to have that done with Redshift. Can you make this change without breaking every single client that needs that report?

An Information Driven Architecture is not just for the app that users use. It’s for the whole organisation. And for this, above the low level “Data APIs”, we need an “Information API” for the whole organisation. This is the API which does indeed have URLs which point to “Signups on March 12 2014” or “Retention for Cohort 27” or “Friends of foo@bar.com”. The various implementations of these concepts will change and evolve over time, which is why it becomes essential to have a unifying interface for every concept that makes sense for the organisation. It is this unifying API that will enable seamless data flow throughout the organisation.
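As a purely illustrative sketch - the class, endpoints and method names below are hypothetical, not an existing system - such an API might look like this:

    # Hypothetical sketch of an organisation-wide Information API.
    # Each question gets a stable name; which datastore answers it is an
    # implementation detail hidden behind the interface.
    #
    #   GET /information/signups?date=2014-03-12
    #   GET /information/retention?cohort=27
    #   GET /information/friends?email=foo@bar.com
    class InformationAPI
      def initialize(sources)
        # e.g. { signups: SqlSignupSource.new, retention: WarehouseRetentionSource.new }
        @sources = sources
      end

      def signups(date)
        @sources[:signups].count(date: date)      # today a SQL query, tomorrow Redshift
      end

      def retention(cohort)
        @sources[:retention].rate(cohort: cohort)
      end
    end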

A valuable corollary of this architecture is that information changes at various speeds, and this architecture enables us to control how many resources are devoted to keeping up with changes. A weekly report that is calculated each time it is requested is insanity. Such a report should be calculated on first read and cached for the rest of the week. Having a unified information architecture enables us to control how, when and where data are turned into information. This control enables us to economise on resources while still having available information that is relevant.

To sum up:

  • A viable Information API must be able to unify all underlying datastores of the organisation and present the data therein in desired formats.
  • Such an API must use the same vocabulary as the domain being modelled, and as such work at a higher level of abstraction than the “implementation” level, which would speak SQL or some such dialect. This shields consumers from implementation details.
  • Such an API must also be able to specify when information is updated, when it is to be cached and when those caches are to be invalidated.
  • Information disseminating processes must only use the information API. Weekly reports mailed to users and realtime dashboards alike must run off the information API. There will be exceptions of course, but as organisations embrace service-oriented architectures and the Information service becomes a first-class citizen of the infrastructure, these exceptions will be fewer and further between, restricted increasingly to use cases where latency requirements are very aggressive.

Once this is done, data can be written as per the usual process, but the conversion of that data into information is now mediated by an organisation-wide information API.

These are admittedly initial and scattered thoughts - what I like to call a “lean blog post” :-). I’d love to hear feedback on the concept: https://news.ycombinator.com/item?id=8418006.

Monocultures

This is a post about programming, but I am going to draw on industrial agriculture and its resultant monoculture to use as an analogy for software development, so bear with me as I take some time to lay down a common context.

The popular story is that subsequent to famine after famine, India embarked on a revolutionary program to modernise her agriculture sector. Scientific methods of agriculture replaced methods that had remained largely unchanged for centuries. Yields skyrocketed and farm productivity soared in the Punjab. Famine has been banished from the subcontinent and now granaries are full to bursting. The official line is that the Green Revolution has been an unqualified success, and its progenitor, Dr Norman Borlaug, was even awarded the Nobel Peace Prize. (Of this last fact, you can make what you wish.)

On the other side of the argument are some pretty prominent environmentalists. In her book ‘The Violence of the Green Revolution’, Vandana Shiva sheds some highly unflattering light on the claims made by industrial agriculture. She does not argue that industrial monocultures do not yield more crop than conventional agriculture. In fact, she agrees wholeheartedly. Trouble is that crop yield is a very narrow metric to measure farm output by. Farms are not factories that churn out grains but are complex ecosystems which do not submit to a teleological approach.

Shiva argues that food crops are but one of the things that conventional farms produce, and the increase in yield comes at the cost of these other things. For example, rice with thinner stalks yields more rice per acre, but the chaff and stalks which used to feed the farm animals are no longer produced and have to be bought for money by the farmer, making the farm and the farmer a little less self-sufficient. The natural pest control that used to grow in the fields does not grow any more and industrial pesticides need to be used. Seed banks are useless as now only seeds resistant to the chemical pesticides can survive in this environment. The inevitable result is that what used to be a self-sustaining ecosystem has turned into a monoculture that requires vast amounts of inputs from external sources in order to sustain itself. The self-sufficient farmer has become dependent on forces outside his control. Monocultures are less resistant to pests and climate change. The whole system, in effect, works until the day it doesn’t, at which point it collapses hard. Never bet against the second law of thermodynamics, it seems.

Which of these is true? Which is better? These are all questions that can be endlessly debated. What should be reasonably uncontroversial is that when one chooses one over the other, one should do it with full knowledge of the choice. The danger is in choosing without knowing what one chooses. And it’s this unconscious choice that I’d like to address.

Just like agriculture has its ‘fake’ metric of yield, the startup software delivery mechanism is also measured in similar terms. Things like lines-of-code are of course too barbaric to merit much discussion these days, but the more dangerous metric in startup-land is, of course, time-to-market. We’ve all heard these old saws before - Startups win when they get to market first. No startup ever failed because of bad code. Get Shit Done. Ship it. etc. etc.

All very well, but who stops to consider the cost of a feature? At one client, the outsourced developer team said they’d have X feature set ‘ready by Friday even if we have to work all night’. And who, my friend, is going to maintain the code you wrote at 3 am after 5 straight 16 hour days? Startup product teams think they are selling kitchen appliances. Actually they are more like the Agriculture Minister of a developing country in a time of climate change. When you don’t know whether you will have a drought or a flood next year, when you don’t know whether temperatures will be three degrees above or seven below long term averages, what do you optimise for? The answer seems obvious - resilience. You build the farm sector that works under the widest variety of possible outcomes. Yet when delivering MVPs, startup product owners often worry more about specific feature sets than about their ability to deal with change. And this is where the danger of the monoculture bites.

If you have little money, many competitors, low barriers to entry and a shifting technological landscape, will you still optimise for a given feature set? Or will your optimisations be more general? Optimise perhaps for team cohesion over team size? For feature ownership rather than feature delivery? For sustainability of processes over lines of code written? For reputation over outcomes? For adaptability over expertise? For relationships over transactions?

Are you playing with new languages and platforms or is it ruby ruby ruby rails rails rails all the time? Are you experimenting with Service Oriented Architectures or is it Rails monoliths all the time? How many different message queues have you experimented with? How many conference videos did you see last week? Did you get time to sit quietly and just think about that tough problem? Did you attend a javascript meetup where they talked about testing front-ends? Have you booted up a Raspberry Pi…?

Time to market is important. It’s just not the only important thing. Here are some other things that are important:

  • developers that stay on the team a long time - developers are not fungible and good developers that stay on a team for a long time can and do contribute heavily. Are you optimising for keeping your best developers around for ever?
  • side projects - “hey boss, I just deployed my side project this weekend and Capistrano 3 works great. Let me take our legacy deploy system and upgrade it.” Is this a conversation worth having? You can’t have it if you’re working 100 hour weeks.
  • "We need a new internal facing webapp for customer support. How long will it take?" - "Two weeks". - "Take four. Build it in clojure."
  • "Take the rest of the day off from putting code into production and prepare your slides for the local javascript meetup"

Most of the imaginary conversations above revolve around taking the focus off code in production and focussing instead on the processes that lead to good production code. Effective software development teams, like farms, are complex ecosystems that do not yield to teleological approaches. Code is just the stuff that sustains startup revenues, but if your development processes start becoming monocultures, you will need increasing amounts of energy to keep them productive. Project managers, quarterly reviews, Gantt charts and so on will eat up your resources much more surely than giving developers time to explore new technologies and sending them to conferences.

Startups fail for a variety of reasons. Getting to market second is the biggest fear of most entrepreneurs, but they’d do better to develop a healthy fear of developer churn and excessive managerial overhead.

As my final point, I’d like to point to HelpShift, a product built by a company initially known as InfinitelyBeta. I only have an outsider’s view on events, and what happened is well known - they started with a product called paisa.com, which failed. They then iterated with a help-desk system which pivoted into a help-desk solution for mobile apps. This happened over a period of 4 or so years. In all these years, never once did I meet a stressed developer, never once did I hear of ‘discipline’ being imposed, of Gantt charts and product managers and so on. Notwithstanding early failure, investment from well known funds, setting up offices in the Valley and eventual success, I never felt any change in my friends at HelpShift. Always cool, always more interested in the path than the outcomes, generally not worried about specific instances of success or failure and instead working to build an organism that thrives in many environments. As far as I can tell (I’m no insider, btw), they seem to have invested in building an awesome team of A+ developers that bring a wide variety of skills to the table and can adapt to changing circumstances.

So, when you choose to ‘hire 20 developers and churn out features fast’, know what you’re buying into. Monocultures can last a long time and even help you win. But they are fragile. And over anything more than the very short term turn out to be very expensive.

Hacker News Link

Bitcoin FUD

TLDR - Cryptocurrencies are the future of currency, not of money.

There’s a lot of talk about bitcoin these days, and from the tone of an article you can generally tell who’s got theirs and who doesn’t. The spectacular rise of BTC from a fraction of a cent to $1000 has basically split opinion into two camps. The ‘I-got-mine-see-how-smart-I-am’ camp goes on and on about how BTC are going to make them rich, and the ‘oh-noes-i-have-no-bitcoins-i-suck-bitcoins-suck’ camp keeps predicting doom for BTC and eventual retribution for those who are violating the commandments against avarice and pride. In this to-and-fro, it’s easy to lose sight of the long term. BTC has been around for a very short while in the grand scheme of things, so let’s try and see where this whole scene might end up over the longer term.

Before we begin, let’s draw a distinction between wealth, money and currency. These definitions are quite amorphous and endlessly debated but let’s take these generally as givens for our purposes.

  • Wealth is all the stuff you have which is valuable - your house, land, bank deposits, stocks and bonds and so on are your wealth.
  • Money is that part of your wealth that is denominated in a fashion that other people are generally happy to accept. So your bank deposits are money but your 30 shares of Apple are not money and neither is your house. No one is going to accept a fraction of a share of Apple in exchange for a sack of potatoes, but they will accept fractions of a dollar. Money is in this sense a widely accepted token of wealth and as such represents wealth in its most liquid form.
  • And lastly, when the ‘money’ gets a token that can be handed over to another person to signify change of ownership of the money, that token is known as currency. Rupees, Dollars, Cents etc. are forms of currency.

Currency gets its value from the money it is a token for. Money gets its value in very interesting ways. Here’s a small riddle I like to ask people new to these concepts - Is gold money because it is valuable or is gold valuable because it is money? Think about that for a bit (or not) and then read on….

Money gets its value from at least two sources that I can think of:

  • the properties of the thing being used as money and
  • the social, political and economic milieu (and of late, technological as well as we see with BitCoin)

For millennia, gold has been the ultimate form of money because of its intrinsic properties. Easy divisibility, resistance to counterfeiting, a stable supply, indestructibility and the lack of any other use made gold a very convenient token of wealth. Because of these properties, people were happy to accept gold in exchange for goods and services, and because people were happy to accept gold, gold became more and more useful as money.

However, we haven’t been using gold as money for several decades now. 99% of money in the world today gets its value not from the paper it is printed on but from the entire dynamic geo-political system that collectively subscribes to the notion that money has value. It’s a brilliant trick, this one - not easily pulled off and even more rarely sustained. These days money gets its value chiefly from three sources:

  • You can pay taxes with it (since, like forever)
  • You can repay debts with it and
  • You can buy oil with it

The second point was the one I found most interesting. If you look at a Federal Reserve note, it says that the note is legal tender for all debts, public and private. What that means is that if Moe owes you money and he offers you dollars, you have to accept them, no matter your opinion of the US Dollar. If Moe goes to a judge and says “Here’s the two thousand dollars I owe”, the judge will cancel Moe’s debt, even if the USD has gone the way of the Zimbabwean Dollar in the meantime and 2000 dollars won’t even get you a cup of coffee any more.

If you look carefully at the above three points, they all have one thing in common, and we can conclude that money these days gets its value from the coercive power of the state. In fact, any money that doesn’t intrinsically have value as a token of exchange due to its own properties gets its value from the coercive power of government. This is your common or garden variety ‘fiat’ currency system.

So where does that leave BitCoin? Pretty much squarely in the former camp. BitCoin is the new gold, with some pretty major differences. Like gold, it is indestructible, uncounterfeitable (allegedly), infinitely divisible and so on. As for the differences, there are two major ones - one in BitCoin’s favour and one working against it. The first is that you can send BTC down a wire and they can end up almost instantaneously anywhere in the world, blithely ignoring national boundaries and so on. The second factor, which works against BitCoin, is that there is nothing stopping anyone from starting a competing currency. The supply of BitCoin is limited, but the supply of cryptocurrencies is potentially infinite. My belief, as of this writing, is that the latter effect will preclude BitCoin from becoming a global store of value.

Also, open questions about the crypto and about the legal and regulatory acceptance of BitCoin represent risks to its value. It can’t be denied that not knowing who, what or where made BitCoin is quite a hurdle to its legitimacy. Is it a CIA backdoor? Do we really understand the crypto? These are open questions. Imagine if no one knew who had made Linux. Would it still power almost the entire Internet? When national fiat currencies collapse, they give plenty of warning and there are huge resources and geopolitical machinations that step in to prevent outright collapse. There is no such system protecting BitCoin. A loss of confidence in BitCoin can happen overnight, and there will be no one there with the power to keep the currency going as a viable proposition. These risks mean that people will hold on to BitCoins only as far as their risk appetite goes.

So, what’s the value of a BTC? Essentially, its value lies in being able to provide frictionless, anonymous, instantaneous payments. For BTC to be anything more than a fringe player on the world financial stage, it must offer frictionless exchange with existing currency systems. For this, it will remain in demand. However, for a currency to truly become a force in the financial system, it must also operate as a viable, long-term store of value. And this is probably not going to happen with BitCoin due to the risks inherent in its structure, recent events notwithstanding.

The only reason people are demanding bitcoin right now is that the price is rising. The number of things you can buy with it is not proportional to the value of the coins in circulation. This is a characteristic of bubbles.

If you’re in any doubt about whether currencies intrinsically have value, just think - when the Euro launched, did people rush to buy it? Not really (I was there). The Euro was backed basically by faith in European monetary and currency institutions, and the Euro did add value by taking away the currency risk from intra-European transactions, but beyond that, no new value was added by changing the currency. Currencies only measure value, they do not create it.

If you’re buying BitCoin simply because the price is rising, consider yourself warned.

Roll your own web framework in half an hour

i.e. Fun With ActionDispatch

So, a long time ago, in a land far far away there once lived a powerful programmer and this programmer had created a web framework called Rails. Rails was a massive and monolithic framework and about as hard to tame as a dragon and so the Ruby community went to work on solving that particular problem. Armed with hexes, swords and gems, the community surrounded the dragon and cut him up into tiny pieces with the result that what we today know as Rails is mostly a curated (or omakase, iyw) collection of modular functionality that actually works independently of the framework. The dragon got turned into a bunch of tiny and not too ugly dragons that play well with others.

So it has come to pass that making your own web framework in Ruby has now become a trivial undertaking, one that with practice can be completed in the first ad break of the Saturday night Bollywood blockbuster on Sony TV.

So, what’s in a framework then? In order to complete our framework before Katrina Kaif finishes eating her chocolate, we’re going to cheat a bit and change the definition of web framework. Well, we’re going to change it from Rails’ definition to something more in tune with the idea from Bob Martin’s eye-opening talk, Architecture: The Lost Years (http://www.youtube.com/watch?v=WpkDN78P884 - highly recommended if you haven’t seen it already) - a web framework is something that makes it possible to serve your application on the web.

By this definition, the web framework only does the http-y things, basically translating HTTP requests into method calls to the domain logic of your application and returning nice status codes and so on to complete the response. It also handles sessions, cookies and user authentication. All the domain logic, including persisting data, is handled by your application, which sits inside its own gem.

So before we go any further, let’s answer the question that is burning in everyone’s mind - Why did Sarah Lund do it? Err….sorry…wrong audience. The question we’re going to answer is - For helvede dude! Why make your own framework?

Seriously?! ‘Because we can’ doesn’t answer anything!

Let me say this upfront. The chance that your new framework gets used by anyone apart from you answers true to #nil? However, there’s no doubt that you’re going to have a better understanding of many of the components that make up your day-to-day experience as a Ruby developer.

Also, Rails controllers are - how do I put this gently - pretty weird! There you are reading about this wonderful thing called Single Responsibility Principle and then you look at your Rails controller blithely ignoring object oriented purity and yeah, it makes you think. Don’t take my word for it - Gary Bernhardt nails it right here

Maybe Rails is just a little too omakase for you. Maybe you don’t entirely trust the LiveController. Maybe you just want to know what makes up a web framework. Whatever your reason for rolling your own framework, Rack and ActionDispatch have your back.

In the following example, we’re going to build a little chess playing API. Very little. Just enough to prove that we have a real web application. The source code for all this is here: https://github.com/svs/ryowf

Rack

Everyone knows what Rack is, right? It sits behind your web server, turns your web requests into nice Ruby hashes and provides a uniform API so you and your pair don’t have to spend hours arguing over names. A Rack app is anything that responds to #call(env) with something like [200, {}, ["hello world"]]. Simple. Our little Rack app looks like this.
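Something along these lines (a sketch - the version in the actual repo may differ in detail):

    # config.ru -- sketch of the top-level Rack app
    require_relative "router"

    class App
      def call(env)
        # hand the request off to our router, which is itself a Rack app
        Router.new.call(env)
      end
    end

    run App.new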

We have a method called #call which receives the env (which in turn contains the request and associated data) and responds with something Rack can send back to the web server. The first thing we need to do is to figure out which functionality was requested, for which we need a router. Routers are built with ActionDispatch.

ActionDispatch

ActionDispatch is a lovely little library (it ships inside the actionpack gem) which lets you do this -
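roughly this, that is (a minimal example, not lifted from Rails or the repo):

    require "action_dispatch"

    routes = ActionDispatch::Routing::RouteSet.new
    routes.draw do
      # `to:` accepts any Rack endpoint, here a bare lambda
      get "/ping", to: ->(env) { [200, { "Content-Type" => "text/plain" }, ["pong"]] }
    end

    # the route set is itself a Rack app: routes.call(env) will dispatch
    # a GET /ping to the lambda above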

It’s basically the Rails router and the backbone of our web framework. Our router looks basically like this
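(a sketch - the real router lives in the repo):

    # router.rb
    require "action_dispatch"

    class Router
      def initialize(&routes)
        @route_set = ActionDispatch::Routing::RouteSet.new
        @route_set.draw(&routes)
      end

      # ActionDispatch matches the request and calls whichever handler the
      # matching route points at
      def call(env)
        @route_set.call(env)
      end
    end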

The router in turn responds to call by calling the “handler”.

A small word about the design of our framework here. Since we don’t like Controllers that do 10 things, our controller only does two or three. Still a violation of SRP but hey - a tremendous improvement. So what we’re aiming for is for every controller action to be its own class. For example, instead of saying
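something like the familiar Rails controller (this snippet is illustrative):

    # app/controllers/games_controller.rb -- the classic Rails style
    class GamesController < ApplicationController
      def index
        render json: Game.all
      end

      def show
        render json: Game.find(params[:id])
      end

      def create
        render json: Game.create(params[:game]), status: 201
      end
    end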

we want to say something like this
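(the sketch below is my reconstruction - the real ControllerAction base class lives in the repo, and respond_with/params here are helpers it is assumed to provide):

    # controller_action.rb -- sketch of the base class
    require "rack"

    class ControllerAction
      def call(env)
        @request = Rack::Request.new(env)
        public_send(@request.request_method.downcase)   # "GET" -> #get, "POST" -> #post
      end

      # query/form params plus the path params (:id etc.) that ActionDispatch
      # stored in the env, all with string keys
      def params
        path_params = @request.env.fetch("action_dispatch.request.path_parameters", {})
        @request.params.merge(Hash[path_params.map { |k, v| [k.to_s, v] }])
      end

      private

      def respond_with(status, body)
        [status, { "Content-Type" => "application/json" }, [body.to_json]]
      end
    end

    # one class per action
    module Games
      class Index < ControllerAction
        # only reachable via GET, because only #get is defined
        def get
          respond_with 200, Chess::Game.all      # domain logic lives in its own gem
        end
      end

      class Create < ControllerAction
        def post
          respond_with 201, Chess::Game.create(params["game"])
        end
      end
    end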

This has a number of advantages. Each class is now back to doing way fewer things. We cannot accidentally expose an action to GET requests because the methods are named according to the request method.

The controller action is basically now calling out to the domain logic and returning a response. If you want more control over how the JSON is formatted, hand off to a formatter of your choice; use ActiveModel::Serializers or rabl or roar or any of a number of lovely presenters.

With this in mind, we can write our little chess playing API like so
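(again a sketch - the routes, handler names and the Chess::Game domain calls are illustrative; the real thing is at https://github.com/svs/ryowf):

    # more handlers, same shape as the ones above
    module Games
      class Show < ControllerAction
        def get
          respond_with 200, Chess::Game.find(params["id"]).board
        end
      end
    end

    module Moves
      class Create < ControllerAction
        def post
          game = Chess::Game.find(params["id"])
          move = game.move(from: params["from"], to: params["to"])
          respond_with 201, move
        end
      end
    end

    # and the routes wire paths to handlers
    ROUTER = Router.new do
      post "/games",           to: Games::Create.new   # start a new game
      get  "/games/:id",       to: Games::Show.new     # current board state
      post "/games/:id/moves", to: Moves::Create.new   # make a move
    end

    # config.ru then simply does: run ROUTER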

Conclusion

Router.rb + ControllerAction.rb are together 43 LOC. That’s all you need to get a decent router and a decent controller DSL to put your app on the web. Need authentication - use warden. Need authorization - use any of the authorization gems. Need caching - you get the idea. Nothing in this approach precludes you from using the gems you love.

Everything else that Rails provides is basically either a massive convenience or insupportable bloat, depending on how you look at it. Rails does make the life of developers very easy by providing helpful rake tasks, caching, turbolinks, helpers, multiple environment support, code reloading, the asset pipeline, a freaking ORM INSIDE THE WEB FRAMEWORK!! It saves hundreds of man-weeks of arguing about where to put stuff (simple: put anything anywhere). Some of these things are required if you’re generating HTML on the server, but if, like the rest of us who enjoy programming, you are mostly writing APIs and microservices, half of Rails is already not required. For the job of exposing your business logic to the web, Rails seems more and more like overkill.

One of the goals behind the beautiful and very successful modularisation of Rails 3 was indeed to let a thousand web frameworks bloom, and I would say the Rails core team has done an admirable job. Whereas previously Rails was an all-or-nothing proposition, we now have the freedom to choose our individual tools, whether they be ORMs, templating engines or what have you. These same freedoms now allow us to step out into the world with really lean, focussed code in case we want to eschew the excessive ceremony of Rails.

This is just one of the many reasons I love Ruby and the Ruby community so much!

rBus is now open source

Just a small note that the codebase that powers rBus is now open source.

Personally, I believe that the problems of urban mobility are some of the most important problems of the urban age and the solutions for these cannot come from a single source. Secondly, rBus is inspired very much by the open source software movement and is intended to work as a community-based solution to a shared problem. Thirdly, I have had more than one request to deploy the rBus code for other entrepreneurs looking to solve the same problem. Under these circumstances, the proprietary nature of the rBus source code was becoming a stumbling block which has, happily, now been removed.

The source code is available at http://github.com/svs/rbus under an Apache 2 license. You may use the code but not the rBus trademark or logo. I hope to maintain the canonical branch of this source code and so do please contribute your modifications back to this repo.

Also, very pleased to have some more green squares in my github profile!

Extreme Decoupling FTW

After spending years writing fat, ugly classes, I suppose it is inevitable that the pendulum swings the other way and we head towards ‘wat? you got a class for that..?!?!?!?’ territory. For the moment though, the wins are huge and keep coming.

Dependency Injection is just a big word for explicitly passing in the stuff that your object is going to depend on. Recently, I’ve been working on an app that automatically goes and makes reservations using certain APIs. Testing it is always problematic because one has to constantly stub out the actual API call with something that doesn’t contractually oblige you to several thousand rupees of payments :-) Also, the frontend is being worked on by a separate team and I don’t want the dev/staging server to make actual bookings during development. With DI, dealing with situations like this is easy.

While building the app, I decided that nothing was going to talk to anything without talking to something else first. Here’s a rough sketch of the architecture.
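In code, it is roughly this (the class names come from the post; the method signatures are my guesses):

    class TripBooker
      def initialize(vendor_selector: VendorSelector.new,
                     credential_selector: CredentialSelector.new)
        @vendor_selector     = vendor_selector
        @credential_selector = credential_selector
      end

      def book(trip)
        # nothing talks to anything without talking to something else first
        @vendor_selector.vendors_for(trip).map do |vendor|
          credentials = @credential_selector.credentials_for(vendor)
          VendorBooker.new(vendor, credentials).book(trip)
        end
      end
    end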

The TripBooker first calls out to a VendorSelector service which provides a list of vendors based on any filtering rules that might apply. Then, each vendor is passed into a VendorBooker service that does the booking with that vendor. It also calls out to the CredentialSelector service to choose an appropriate set of credentials for the calls. What does the VendorBooker look like?
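Something like this (a sketch - the constant lookup and the submit! call are my own illustration of the wiring described next):

    class VendorBooker
      def initialize(vendor, credentials)
        @vendor      = vendor          # e.g. :foo
        @credentials = credentials
      end

      def book(trip)
        prefix   = @vendor.to_s.capitalize                                 # "Foo"
        customer = lookup("#{prefix}Customer").new(trip.user)
        booking  = lookup("#{prefix}Booking").new(customer, trip, @credentials)
        lookup("#{prefix}BookingResponse").new(booking.submit!)            # submit! is illustrative
      end

      private

      def lookup(class_name)
        Object.const_get(class_name)
      end
    end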

Oooh….more indirection. The VendorBooker creates objects of class FooCustomer and FooBooking and uses the class FooBookingResponse to parse the results from the booking. Internally, FooBooking calls the wrapper class to the API with the appropriate parameters. FooBooking is the translator class that translates from a generic Booking object into one that fits with Vendors::Foo's idea of what a booking is. The API wrapper class Vendors::Foo has no idea about anything that just happened above. It merely accepts some arguments to the constructor and makes appropriate calls to the foo.com API.

So, what does four levels of indirection get you? Easy pluggability. As I mentioned, I’d had to stub out the calls to the foo API during testing, and I was about to deploy the app onto a staging server so our frontend team could write code against it, but I didn’t want to actually make bookings. So, I decided that I would have a different setup for development, staging and testing. Basically, during testing, I make VendorSelector return [:dummy] as the vendor. Then the classes DummyBooking and DummyBookingResponse return a dummy booking without making any API calls.
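In code, roughly (the environment check and the class internals are illustrative):

    class VendorSelector
      def vendors_for(trip)
        # outside production, only the dummy vendor is ever selected
        return [:dummy] unless ENV["RACK_ENV"] == "production"
        Vendor.active_for(trip).map(&:key)    # Vendor.active_for is illustrative
      end
    end

    class DummyBooking
      def initialize(*)
      end

      # no API call is ever made
      def submit!
        { "status" => "confirmed", "reference" => "DUMMY-000" }
      end
    end

    class DummyBookingResponse
      def initialize(raw)
        @raw = raw
      end

      def confirmed?
        true
      end
    end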

By being explicit about the dependencies at every step of the way and making sure each class only does one thing, we’ve made life super easy when we need to introduce new behaviour in particular situations.

This is just one of the ways in which DI helps massively. Here’s a flowcharty diagram

Simple Activity Logging in Ruby

Activity Logs are one of the most requested (and consequently, most under-used) features in an application. Managers love the idea of having a full history of everyone’s activity, even though their startup will probably pivot before they have a chance to look at these logs. Jokes apart though, the data in activity logs are a tremendous source of business value which can help you iterate ever closer to your users’ requirements. So, it helps to have a nice pattern handy to simplify the creation of activity logs.

The whole problem with activity logging is that it straddles the Controller and Model layers. The activity is being performed on a model, but the model has no knowledge of who is performing the activity. This knowledge is the preserve of the Controller. There have been many approaches to solving this problem. Some of them include setting global environment variables (ouch! not thread safe), using cattr_accessor and hacking around with threads and so on, or perhaps the approach recommended in http://rails-bestpractices.com/posts/47-fetch-current-user-in-models, which uses thread-local variables.

The other layer of complexity, of course, arises from the question of how to actually do the logging. Who should do it? When should it be done? Traditionally, approaches have centered around before/after callbacks in the model, i.e. doing a before_save :get_attributes to capture the original state of the object and then an after_save :log to write the activity log. We all know what the problems are with before and after hooks - they add behaviour to the model which is orthogonal to the model’s main concern, which is persistence. Such hooks need to be stubbed out while testing, and adding hooks means we do not have any control over when stuff gets logged and when it doesn’t. If tomorrow we want to create a secret backdoor for the NSA to silently change data behind people’s backs, we can’t do that without changing the model, which means that a concern with logging is causing the class that deals with persistence to change - Holy SRP Violation, Batman!

If you’ve been in this situation, then I can say, fear no more - there’s a very simple way of solving this problem. Here goes - consider the following situation

We have Users, who can have Foos, and there are also Staff members who do data entry and can modify Foos that belong to other users. In our ActivityLog, we want to persist the following:
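roughly, the record being acted upon, the User or Staff who acted on it, what the action was, and a snapshot of the changed attributes. A table along these lines would do (the field names are my reconstruction, not the original schema):

    class CreateFooActivityLogs < ActiveRecord::Migration
      def change
        create_table :foo_activity_logs do |t|
          t.references :foo                        # the record acted upon
          t.references :actor, polymorphic: true   # the User or Staff who did it
          t.string     :action                     # "save", "update", ...
          t.text       :logged_changes             # serialised attribute snapshot
          t.text       :data                       # any extra context, serialised
          t.timestamps
        end
      end
    end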

Now, here’s the absurdly simple solution. Are you ready? Ok. In your controller, instead of calling @foo.save, say @foo.save_with_log(current_user). Got it? Ok, I’ll say it again.

Simple? We need the model to know about a user? Pass it in as a parameter!

Now, what about the model side? As much as we hate mixins as a form of inheritance/delegation, logging is one of those cases where mixins are a good approach (I’m willing to be convinced to the contrary though). This is behaviour that is shared across all classes that need logging and it actually has nothing to do with the behaviour of the class itself, so we’ll make a nice little module called ActivityLogger which looks like this. Please note, this is pseudo code (it hasn’t been tested)
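The sketch below is reconstructed from the description that follows, in the same pseudo-code spirit:

    module ActivityLogger
      def self.included(base)
        base.extend(ClassMethods)
      end

      module ClassMethods
        # log_with :foo_activity_log  =>  the logger class is FooActivityLog
        def log_with(logger_name)
          @logger_class_name = logger_name.to_s.camelize
        end

        def logger_class
          @logger_class_name.constantize
        end
      end

      def save_with_log(actor, options = {})
        self.class.logger_class.save_with_log(self, actor, options)
      end

      def update_with_log(actor, attributes, options = {})
        self.class.logger_class.update_with_log(self, actor, attributes, options)
      end
    end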

and we update our model thus
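(again a sketch):

    class Foo < ActiveRecord::Base
      include ActivityLogger
      log_with :foo_activity_log   # the class that will do the actual logging
    end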

The log_with :foo_activity_log is important. What we’ve done in the save_with_log and update_with_log methods that we included from ActivityLogger is to inject a dependency on the appropriate class that will do the actual logging for us, in this case FooActivityLog. In order to save (or update) and log, we need the object’s attributes, a handle to the user/staff who is performing the action and, alongside these, we can pass any other data in a hash which the actual logger can use as it deems fit. To achieve all this, what the ActivityLogger module does is translate a call to @foo.save_with_log(staff) into a call to FooActivityLog.save_with_log(@foo, staff, options). So, what does the actual logger look like? Something like this
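(still pseudo code; the column names match the table sketched earlier and are my assumption):

    class FooActivityLog < ActivityLog
      def self.save_with_log(foo, actor, options = {})
        saved = foo.save
        create(foo: foo, actor: actor, action: "save",
               logged_changes: foo.attributes, data: options) if saved
        saved
      end

      def self.update_with_log(foo, actor, attributes, options = {})
        before  = foo.attributes
        updated = foo.update_attributes(attributes)
        create(foo: foo, actor: actor, action: "update",
               logged_changes: { "before" => before, "after" => foo.attributes },
               data: options) if updated
        updated
      end
    end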

There are several advantages to this approach, all of which accrue from favouring the explicit over the implicit.

  • We have two explicit methods now, save_with_log and update_with_log, so we don’t need to mess with before and after callbacks.
  • The actual activity log model inherits from ActivityLog, so we can override behaviour as we wish. For example, while saving nested models we might want to call save_with_log on the nested model as well. For such situations, we can easily implement methods as we deem fit in the appropriate child class of ActivityLog
  • We are injecting controller variables explicitly, such as the current user or staff member performing the action. This saves us from having to mess with thread local variables. It also makes it super easy to test this functionality.

So, in short, being explicit about all our dependencies gives us a tonne of advantages over other approaches that depend on magic or callbacks.

What do you think?

Extract Workflow Objects From Fat Models

If you’re wondering why I’ve got a fat-model obsession, there’s one simple reason - CodeClimate. It is a highly addictive and enjoyable way to learn just how much your code sucks while telling you exactly where you should be focussing your improvements, and once you get started, you just can’t stop. Refactoring extensively so that all your classes are tiny and focussed, and the accompanying feeling of freedom and fluidity, drives your code quality higher and higher.

People have talked at length about the benefits of SRP but all that reading hadn’t prepared me for the actual experience of whittling classes down to doing just one thing.

  • Firstly, you can write really focussed tests. More than half of the files in my models directory now have nothing really to do with Rails. As POROs, testing them becomes super fast.
  • The elimination of churn from the classes is the biggest source of relief for me. Once the class is specced and written, it almost never changes and this gives me a lot more confidence that changes I make in one part of the code won’t have unintended consequences elsewhere.

The whole experience has been so fundamentally mind altering that I would recommend everyone try out CodeClimate.

During this process, naturally, I learned a lot about keeping models focussed on one thing. CodeClimate’s biggest problem always has been with the models that have an embedded workflow in them, and I figured it might be worthwhile to try and extract the workflow into an object all its own. Here’s my story.

To start with, I have the following scenario. The model in question is the Quotation model, which handles the creation of quotations to be sent to clients. The quotations go through various stages, first being approved or rejected by an admin, then being sent to the client and finally being paid. The original model took care of defining and persisting the attributes. In addition it could answer various questions about what all could be done with the quotation in whatever state it happened to be in, as well as handling all state transitions (including sending email and so on).

Broadly, the class looked like this
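(a compressed, illustrative version - the real thing was much bigger):

    class Quotation < ActiveRecord::Base
      belongs_to :trip
      belongs_to :user

      # questions about what can be done in the current state...
      def approvable?
        status == "created"
      end

      def sendable?
        status == "approved"
      end

      def payable?
        status == "sent"
      end

      # ...and every state transition, side effects included
      def approve!(admin)
        update_attributes(status: "approved", approved_by_id: admin.id)
        QuotationMailer.approved(self).deliver        # mailer name is illustrative
      end

      def send_to_client!
        update_attributes(status: "sent")
        QuotationMailer.quotation(self).deliver
      end

      def pay!(payment)
        update_attributes(status: "paid", payment_id: payment.id)
      end
    end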

The Quotation class is the God class of this application. It has a trip associated with it, a duration, it is pooled with other Quotations in shared cabs, it has several fields which represent various orthogonal states (lead quality is one state, the workflow which leads to payment and closure is another), it shows up in the user’s account and so on. The whole class is very complex and was begging to be ripped apart, scoring as it did a big fat F on CodeClimate.

So, among other things (such as creating separate scope and search classes), I started to extract a workflow class from this class using the dependency injection technique I talked about in my previous post. As the model currently stands, there is a very tight coupling between the Quotation class and its workflow. It is also not possible to have more than one workflow field in the model. In addition, there is no way to have separate workflows based on, say, the user associated with the quotation (i.e. different criteria and workflows for different classes of users like guest user, trial user, premium user, etc.). I am happy to report that ripping apart the class and using dependency injection has solved all these problems.

So, the first thing I did was to extract a QuotationStatusPolicy class which would handle all the questions about whether a given quotation was approvable, sendable and so on. i.e.
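(a sketch of the extracted policy object):

    class QuotationStatusPolicy
      def initialize(quotation)
        @quotation = quotation
      end

      def approvable?
        @quotation.status == "created"
      end

      def sendable?
        @quotation.status == "approved"
      end

      def payable?
        @quotation.status == "sent"
      end
    end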

Using this method, it becomes trivially simple to use a different status policy class for various classes of users.

Now, we need to rip out the heart of the beast, the actual workflow itself. This is done as follows
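(a sketch - the transitions and the mailer calls are illustrative):

    class QuotationWorkflow
      def initialize(quotation, policy = QuotationStatusPolicy.new(quotation))
        @quotation = quotation
        @policy    = policy
      end

      def approve!(admin)
        return false unless @policy.approvable?
        @quotation.update_attributes(status: "approved", approved_by_id: admin.id)
        QuotationMailer.approved(@quotation).deliver
        true
      end

      def send_to_client!
        return false unless @policy.sendable?
        @quotation.update_attributes(status: "sent")
        QuotationMailer.quotation(@quotation).deliver
        true
      end

      def pay!(payment)
        return false unless @policy.payable?
        @quotation.update_attributes(status: "paid", payment_id: payment.id)
        true
      end
    end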

The last thing that’s left is to wire up these new classes into the Quotation class. i.e.
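(sketch):

    class Quotation < ActiveRecord::Base
      belongs_to :trip
      belongs_to :user

      # both collaborators are injectable, so different policies and workflows
      # can be swapped in for different classes of users
      def status_policy(policy_class = QuotationStatusPolicy)
        policy_class.new(self)
      end

      def workflow(workflow_class = QuotationWorkflow)
        workflow_class.new(self, status_policy)
      end

      delegate :approvable?, :sendable?, :payable?, to: :status_policy
    end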

All tests passing? We’re good to go!

Breaking up Fat Models With Delegation

Hope everyone had a relaxing Christmas with loads of great food, booze and gossip. I know I did!

So not being a classically trained CS guy, I have no idea what the name for this pattern is, but I find myself using it more and more to break up fat models. It basically involves a lot of explicit delegation in the class, but no decorator classes. I’m still wondering whether this pattern is a “good thing” or not, so I’d appreciate comments.

Use case - your model has been collecting a lot of crufty methods that don’t have anything, really, to do with the logic proper of the model. Classic examples are methods like full_name, which basically only deal with presentation, or some orthogonal logic like currency conversion and so on. There have been several approaches to fixing the presentation issue, the most notable of which has been Draper, but I didn’t enjoy using Draper. For one, I don’t want to call UserDecorator.decorate(User.get(params[:id])). Why? I just don’t, ok?! Just kidding - actually, in my experience, decorating large arrays (think a CSV dump of the last month’s data) takes f.o.r.e.v.e.r, and damned if I didn’t get some subtle bugs with DataMapper associations on the decorated class. I didn’t dig too deep, being a shallow and easily influenced guy, and instead started looking for other solutions. I present mine below.
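(a sketch of the pattern; the class and method names are mine, picked to match the discussion that follows):

    class ViewDelegate
      def initialize(object)
        @object = object
      end

      def full_name
        "#{first_name} #{last_name}"   # first_name/last_name fall through to the wrapped object
      end

      private

      # for all practical purposes, the implicit self here is the original object
      def method_missing(name, *args, &block)
        @object.send(name, *args, &block)
      end

      def respond_to_missing?(name, include_private = false)
        @object.respond_to?(name, include_private) || super
      end
    end

    class User < ActiveRecord::Base
      # the delegate class is injected and only instantiated when a delegated
      # method is actually called; memoize it here if you need to
      def view_delegate(delegate_class = ViewDelegate)
        delegate_class.new(self)
      end

      delegate :full_name, to: :view_delegate
    end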

So there are several funky things about this approach.

  • First of all, it is totally explicit. There is absolutely zero magic going on here.
  • Secondly, because of the injected dependency, we can choose the class we would like to use to present our object at runtime.
  • Thirdly, it saves us from having to explicitly decorate our objects.
  • Fourthly, the ViewDelegate object gets access to all the original object’s methods using method_missing. This means that the implicit self in the ViewDelegate class is the original object for all practical purposes. Thus, any object that responds to the required methods can use this Delegate, not just a User.
  • Fifthly, the delegate objects are not instantiated until one of the delegated methods is called. This might have performance implications. Of course, the ViewDelegate can be memoized as well.
  • Lastly, this can be used for any type of delegation, not just for presentation. For example, one might choose to delegate a height method to a MetricUnitDelegate or to an ImperialUnitDelegate, depending on the context.

I have no idea if this is a good or even an original approach. Would love to hear from you in the comments.

Param Objects in Rails

In Ruby, everything is an object and this unembarrassed object-orientation gives Ruby much of its power and expressiveness. Being able to call methods on an Integer and override the behaviour of strings are just two of the awesome things that one can do due to this design decision in Ruby.

In Rails, however, sadly, there are large swathes which are not object oriented, and in my opinion these areas tend to be the most painful parts of Rails. A case in point, and the one we discuss in this blog post, is the params hash.

The params hash contains all the input sent by the user to your application. It is, in effect, the way your user communicates with your API. It is the job of the controller to respond to these requests, and for that it must do a number of things with the params. It must, first of all, check whether the user is allowed to send this request. For example, is the user trying to maliciously update some forbidden attribute on the object? Perhaps a deleted_at field or something similar? When it comes to functionality like search, we can also use Ruby’s metaprogramming abilities to DRY up our views and controllers by allowing the user to send in a param like scope=to_call and dynamically call the scope on the model. But what if the user sends scope=delete? We don’t want to call Quotation.send("delete") now, do we?

Also, many a time, providing a natural-looking API to the user means we have to massage a lot of the parameters before we can send them on to our business logic. The API between the user and the controller may be very different from the API between the controller and the models, and so a fair amount of massaging sometimes needs to be done. Currently, this is all done in controllers. So, for example, one often sees things like sanitize_params or, worse, code like this
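- something like the following (illustrative, but you have surely seen its cousins in the wild):

    class QuotationsController < ApplicationController
      ALLOWED_SCOPES   = %w[pending approved paid]
      FORBIDDEN_FIELDS = %w[deleted_at user_id]

      def index
        quotations = Quotation.scoped    # .all in Rails 4
        # guard against the user calling arbitrary methods on the model
        if params[:scope].present? && ALLOWED_SCOPES.include?(params[:scope])
          quotations = quotations.send(params[:scope])
        end
        # massage the params into what the model understands
        if params[:from].present? && params[:to].present?
          quotations = quotations.where(created_at: Date.parse(params[:from])..Date.parse(params[:to]))
        end
        render json: quotations
      end

      def update
        attrs      = params[:quotation].reject { |k, _| FORBIDDEN_FIELDS.include?(k.to_s) }
        @quotation = Quotation.find(params[:id])
        @quotation.update_attributes(attrs)
        render json: @quotation
      end
    end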

Note the many conditionals as we try and protect ourselves from malicious code and try to massage the params into something that makes sense to our models.

Consider the following use case. You have a list of Quotations, they all have a particular status and in your model, you’ve gone and defined some nice scopes. You want to make these scopes available to the user by passing in a scope=foo parameter. Additionally, you have several search terms you can pass in and there are certain fields that are forbidden to be set in the params. All of this becomes sooooo much easier if you just objectify the params hash. As a bonus, your controllers become even more thin and your param validations and what not become ridiculously easy to test.

I believe the proper term for this is Primitive Obsession, eloquently discussed here by our very own Piotr Solnica. I agree with him, based largely on my own experience with this particular smell hanging around the params hash.

My solution is to turn the params hash into a class of its own. In fact, I make one class for every controller action where I need it. For example, here’s the QuotationsIndexParams.rb file
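(the attribute names are illustrative):

    # app/params/quotations_index_params.rb
    class QuotationsIndexParams
      include Virtus.model     # plain `include Virtus` on older versions

      ALLOWED_SCOPES = %w[pending approved paid]

      attribute :scope, String
      attribute :from,  Date
      attribute :to,    Date

      # only ever hand back a scope we are happy to call on the model
      def safe_scope
        ALLOWED_SCOPES.include?(scope) ? scope : nil
      end

      def date_range
        (from..to) if from && to
      end
    end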

So easy, simple and clear. If you don’t already know Virtus, check it out. It’s going to be in the new DataMapper2.

And now that we have full control over our params, our controller can become simple as well.
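(a sketch, matching the param object above):

    class QuotationsController < ApplicationController
      def index
        query      = QuotationsIndexParams.new(params)
        quotations = Quotation.scoped
        quotations = quotations.send(query.safe_scope)                if query.safe_scope
        quotations = quotations.where(created_at: query.date_range)   if query.date_range
        render json: quotations
      end
    end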

How easy is that?

In case you want to be more permissive about the params you accept, you can use an OpenStruct instead of using Virtus. This way, you do not need to know beforehand the attributes you’ll be setting.

In general, it seems to emerge that there is one layer missing in Rails. Something between the Controller and everything else. Between the Controller and the Models, we often have to put Service Objects to keep our controllers free of logic. Between the Controller and the Views, there have been several attempts at Presenters or Decorators like Draper and so on. Jim Gay in his book Clean Ruby has proposed using Contexts or UseCases to simplify our applications.

I am slowly beginning to think that perhaps MVC is only half the picture, and objectifying params is just one step in a multi-front war on spaghetti code in our Rails apps. While the concepts presented in this post are quite simple to implement, I’m wondering if a nice params-objectifying gem might be a good idea so we don’t have to roll custom solutions each time. I have a great name for it too - parampara!