Matt Briggs

"Not all code needs to be a factory, some of it can just be origami" -why, the lucky stiff

Data-attributes Are an Anti-pattern

HTML5 has a lot of cool things in it, but the one thing I wish I could remove is data attributes, because of the crimes against clean front-end code that they seem to encourage.

What is this clean web code you speak of?

We have 3 technologies that go into building a web app: HTML, CSS, and JavaScript. All three operate, in their own ways, on an abstract concept called the DOM.

  • css
    This is the language we use to declaratively set the visual properties of our UI. It consists of a path matching syntax, and a series of rules. Clean css is a) readable, b) doesn’t repeat itself too much, and c) modular (i.e. you shouldn’t have styles intended for one thing leak into another thing). CSS is very hard (and frustrating) to learn, and even harder to write well.

  • javascript
    This is the language we use to specify the behaviour of our application. Up until the last 2 years there wasn’t a lot of guidance on how to do this properly, but nowadays there is a ton. Clean javascript is worthy of a book rather than half a paragraph, but for the purpose of this blog post, clean javascript means keeping your behaviour in javascript and your javascript out of the html. Also, your DOM centric code should be segregated from the more abstract code.

  • html
    Html is the language we use to form the base structure of the DOM. Many people confuse HTML with the DOM, but that usually comes from not having much javascript experience. The HTML should be expressing the structure of your interface in a very abstract way. For example, if you have a navigation sidebar, it may look something like this

<nav>
  <header>Pages</header>
  <ul>
    <li><a href="foo.html">foo</a></li>
    <li><a href="foo.html">foo</a></li>
    <li><a href="foo.html">foo</a></li>
    <li><a href="foo.html">foo</a></li>
  </ul>
</nav>

There is nothing talking about whether this sits at the left, right, or bottom of the page. There is nothing that talks about how the links should be pjaxing the main content div of the app. All it describes is a navigation widget at a very high level.

  • the DOM
    This is where all of those things come together. The DOM is the in memory representation of your UI. It has event handlers bound to elements, it has styles, and it changes dynamically. When you hit view source in your browser, you are looking at the html. When you open the web inspector, you are looking at the DOM (made to look like html, due to how confused people are about these things).

The role of data attributes

Data attributes are a new way of serializing information into a DOM node about what it represents, so that you are not forced to use the class attribute improperly. For example, a blog post could look like this

<article class='video' data-publish-date="2012-08-10">
</article>

We are using an article tag to represent the post, its class tells us what type of post it is (a video), and the data attribute is used to tell us something about it. This seems pretty obvious to me: class is for the type of thing being represented, data is for that thing’s data.

Now, with the rails 3 javascript helpers, to send some data to the server via AJAX, you do something like this

<form action="/posts" method="POST" data-remote="true" data-confirm="are you sure you want to post this?">
  <input type="text" name="post[title]" />
  <textarea name="post[body]"></textarea>

  <input type="submit" data-disable-with="Loading…" />
</form>

Now, this looks like a very elegant solution to a common problem. But it’s not really using data attributes the way they are intended to be used.

First we have the data-remote="true". Why would you use a data attribute for something that obviously should be a class? data-disable-with and data-confirm are even worse, since they have a) nothing to do with data, and b) have no business being in the HTML.

Why does it matter that rails co-opts data attributes?

In the small scale, it really doesn’t matter at all. More than that, it works very well. You can make arguments about purity and aesthetics, but at the end of the day, we are co-opting technology that was intended to model papers and blog posts, and using it to build applications. Rails as a whole is meant to build things like Basecamp, which is at the smaller end of mid-sized applications, so if you are building that kind of app then they will serve you well (just like the rest of the default rails stack).

If you are building highly dynamic apps, or larger scale apps, things start to break down. When people are taught by rails that data attributes are a way to configure javascript libraries, you end up with stuff like this

<%= text_field_tag "text_field_import_scenario_#{scenario.id}", "", :style => "width:346px;", :size => 24, :'data-autocomplete-path' => search_scenarios_quote_scenario_path(scenario.quote, scenario), :'data-autocomplete-raw-html' => true, :'data-autocomplete-send-form' => true, :'data-autocomplete-select' => "$j('#import_error_message').html('');$j('#text_field_import_scenario_#{scenario.id}').val(ui.item.name); $j('#object_id_#{scenario.id}').val(ui.item.value);", :'data-autocomplete-after-update-bad' => "$j('#import_link_#{scenario.id}').show(); resizePopup('import_pop_up_#{scenario.id}');" %>

or completely baffling things, like this

data-print-action="check_sku:selected_skus:Item"

One reason we strive for clean code is that it is easy to read. Since HTML is already a very verbose language, this becomes more important. Keeping things simple and focused is the heart of clean HTML, and the previous two examples are almost the antithesis of that. After 10 years we have finally gotten people to stop using inline styles, and the rails community is replacing that with something much worse for maintainable html: inline behaviour.

Ok fine, rails is doing it wrong, but there are still valid use cases, right?

The valid use case for data attributes is when you are doing relatively simple front end work, and jQuerying your way to victory. The javascript community has found that jQuery’s DOM centric approach to code structure is a nightmare past small scale, but if you are in the jQuery sweet spot, then you are also in the data-attribute sweet spot.

If you are doing more complex behaviours and interactions, making the DOM the source of truth is a bad idea. Your source of truth should be objects that wrap data structures and handle syncing those data structures to a server. Beyond that, most of your UI will be rendered by javascript anyway, so you are duplicating information that will not stay in sync.

TL;DR

Data attributes are data, not javascript configuration. The rails way of using them works well in trivial cases, but gets exponentially worse the more complex your use cases get.

Mixins Are Not Always a Refactoring Anti-pattern

Steve Klabnik just posted an interesting post about mixins. Steve is a really smart guy, and I usually agree with him, but I think his justification is a little bit weak in this case.

Mixin Refactoring through Class Gutting

Oh man, he is so right that this is an anti-pattern. It happens a lot in ruby, someone says “Hey, this thing is doing too much. The only method of code reuse I really believe in is mixins, so I’ll just take the implementation, and dump it into a mixin.”

By doing that, you haven’t decreased complexity, you have actually increased it by breaking locality. Steve introduces the idea of reducing complexity through encapsulation (right on), and talks about Data Mapper and Repository. Very OOP, and great solutions, especially in larger systems. Still diggin what Steve has to say.

Method Count as a metric of complexity

Here is where we part ways. Let’s take the Enumerable module in the ruby standard library. It adds 94 methods on to a given thing, with the requirement that that “thing” provides an each method.

But enumerable is an “idea”, and if something is enumerable, you sort of know how to work with it – through those 94 methods.

Steve talks about how encapsulation reduces complexity of the implementation. Well, Enumerable encapsulates the “idea” of enumerating. That means that when providing a public interface, a data structure can focus on its fairly simple implementation, and only provide the most low level and simplest of methods (each), while bringing in Enumerable and letting it do the heavy lifting to give the rich interface that people expect from a ruby data structure.
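To make that contract concrete, here is a minimal sketch. NumberBag is a made-up class that exists only for this illustration; the only enumeration method it implements itself is each, and everything else it answers to comes from Enumerable.

```ruby
# Hypothetical container, invented purely for illustration.
class NumberBag
  include Enumerable

  def initialize(*numbers)
    @numbers = numbers
  end

  # The one method Enumerable requires from its host class.
  def each
    return enum_for(:each) unless block_given?
    @numbers.each { |n| yield n }
    self
  end
end

bag = NumberBag.new(3, 1, 2)

sorted  = bag.sort                # provided by Enumerable
doubled = bag.map { |n| n * 2 }   # provided by Enumerable
has_two = bag.include?(2)         # provided by Enumerable
```

The class body never mentions sort, map or include?, which is the point: the only thing that can break them is breaking each.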

How is that increasing complexity? When I look at Enumerable, it is talking about a single concept. When I look at array, it is talking about a single concept. The only thing I can change to break the implicit protocol between the two is to break the each method at a fundamental level.

Composition would have been a terrible choice here. I think providing 94 stub methods and an internal enumerator object would just increase the complexity, not reduce it. Providing an enumerator as an external thing would have made the api much more of a pain to work with. Inheritance would be better than composition or separation, but the problem is that Array is a data structure, it is not an “Enumerable”. Enumeration is an ability, not the root of a concept. I think the best choice here is a mixin, and that it is fairly obviously the best choice. And I think most people who have implemented data structures in ruby would agree.

So what we have is something that is close to inheritance, but more of a “vertical slice” of functionality. An “ability” rather than a “thing”. This is what mixins give us: the ability to model “abilities” in a concise way.

What is complexity

Rich Hickey defines complexity as an interleaving of ideas. I think that is a great definition. In the case of Enumerable, you are providing significant functionality through providing a simple implementation, and the only interleaving is that each method. Sure, the runtime method count is 94 methods higher, but who cares? When you are calling methods on array, you are thinking of it as a single thing. When you are maintaining array, you don’t have to worry about any interactions with enumerable outside of each.

I think that the amount, and shape, of a mixin’s interaction with its containing class is a good measure of complexity. The amount and shape of a class’s interactions with the internals of a mixin is a great metric of complexity. The only thing the number of runtime methods is telling you is that maybe you should be looking at those other things, which isn’t that great a smell.

The important thing here is interactions.

Large classes often become complex. But it isn’t a property of their runtime method count, or even inherent to their lines of code. It is because large classes and large methods tend to interact in ways that are hard to understand. Small classes can get complex too for the same reasons, but the larger the class, the easier it is to get to that place.

Why “Gut the class and dump it into a mixin” doesn’t work

It doesn’t work because you haven’t tackled the complexity of the interactions in the code. Maybe it needs to get pulled into another class, maybe methods need to get merged together. Or maybe you are just talking about an inherently complex thing, and doing the earlier things will make it worse. In any case it is not the runtime method count that will tell you this, it is analysis of how the class interacts with itself and others.

Complexity Smells

Steve wasn’t writing about complexity smells in a general way, but since I have spent so much time talking about what isn’t a smell, I sort of feel compelled to talk about what is. I am sure he would agree with most, if not all of the following

  • When a mixin mucks with class internals.
  • When a mixin mucks with other mixins.
  • When you read the inheritor of a class, and can’t understand it without reading its parent
  • When you read an inherited class, but can’t understand it without its children
  • When there are so many interactions with other things that you have to read many classes to understand how a single thing works
  • When classes do too many things
  • When classes have too many dependencies
  • When classes are aware of too many other objects
  • When too many other objects are aware of a class

And that is just the tip of the iceberg. I would say that a significant percentage of our job is managing complexity in code; it is a huge and nuanced topic. Mixins are also not a simple thing, and are extremely easy to use in the wrong ways.

The Many Faces of Ruby Callables

One of the most valuable ideas from functional programming is the idea of Higher Order Functions, or functions that take functions as an argument. It is such a good idea that it has become part of pretty much every modern language, whether functional or not. Among the OO imperative languages that have embraced this idea, the ruby community has probably gone the furthest, where it is the first tool a library writer will reach for more often than not.

The language feature required for this style of programming is known as first class functions, meaning functions that can be defined as a variable, passed around, and called by other parts of code. Ruby has four constructs for this, which are all similar, but have slight differences.

The Block

The idea behind blocks is sort of a way to implement really light weight strategy patterns. A block will define a coroutine on the function, which the function can delegate control to with the yield keyword. We use blocks for just about everything in ruby, including pretty much all the looping constructs. Anything outside the block is in scope for the block, however the inverse is not true, with the exception that return inside the block will return from the enclosing method. They look like this

def foo
  yield 'called foo'
end

#usage
foo {|msg| puts msg} #idiomatic for one liners

foo do |msg| #idiomatic for multiline blocks
  puts msg
end

Proc

The best way to think of a proc is that it is the more general form of a block. A block is tied to a specific function (the whole coroutine thing), while a proc is just a variable. This means that you can easily convert a block to a proc.

An interesting use is that you can pass a proc in as a replacement for a block in another method. Ruby has a special character for proc coercion which is &, and a special rule that if the last param in a method signature starts with an &, it will be a proc representation of the block for the method call. Finally, there is a builtin method called block_given?, which will return true if the current method has a block defined. It looks like this

def foo(&block)
  return block
end

b = foo {puts 'hi'}
b.call # hi
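
The example above does not exercise block_given? or passing an existing proc where a block is expected, so here is a small sketch of both. The greet method and the shout proc are made-up names for illustration only.

```ruby
# Hypothetical helper, invented for illustration.
def greet(name)
  if block_given?
    yield name           # hand control to the caller's block
  else
    "hello, #{name}"     # fallback when no block was passed
  end
end

shout = proc { |name| "HELLO, #{name.upcase}!" }

plain      = greet('matz')                     # no block at all
with_block = greet('matz') { |n| "hi #{n}" }   # literal block
with_proc  = greet('matz', &shout)             # proc coerced into the block slot
```

From inside greet there is no difference between the last two calls; & simply puts the proc where a literal block would have gone.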

To go a little further with this, there is a really neat trick that rails added to Symbol (and got merged into core ruby in 1.9). That & coercion does its magic by calling to_proc on whatever it is next to. So adding a Symbol#to_proc that calls itself on whatever is passed in lets you write some really terse code for any aggregation style function that is just calling a method on every object in a list.

class Foo
  def bar
    'this is from bar'
  end
end

list = [Foo.new, Foo.new, Foo.new]

list.map {|foo| foo.bar} # returns ['this is from bar', 'this is from bar', 'this is from bar']
list.map &:bar # returns _exactly_ the same thing

This is fairly advanced stuff, but I think it illustrates the power of this construct.

Lambdas

The purpose of a lambda is pretty much the same as the first class functions in other languages: a way to create an inline function to either pass around, or use internally. Like blocks and procs, lambdas are closures, but unlike the first two they enforce arity, and return from a lambda exits the lambda, not the containing scope. You create one by passing a block to the lambda method, or with the -> syntax in ruby 1.9

l = lambda {|msg| puts msg} #ruby 1.8
l = ->(msg) { puts msg } #ruby 1.9

l.call('foo') # => foo
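
To see the two differences from a plain proc side by side, here is a short sketch. The variable and method names are made up for the example.

```ruby
strict = lambda { |a, b| [a, b] }
loose  = proc   { |a, b| [a, b] }

padded = loose.call(1)     # a proc fills the missing argument with nil

arity_error = nil
begin
  strict.call(1)           # a lambda checks the argument count
rescue ArgumentError => e
  arity_error = e
end

def return_from_lambda
  l = lambda { return :from_lambda }
  l.call                   # return only leaves the lambda
  :after_lambda
end

def return_from_proc
  pr = proc { return :from_proc }
  pr.call                  # return leaves return_from_proc itself
  :after_proc              # never reached
end
```

Calling return_from_lambda carries on past the call and gives back :after_lambda, while return_from_proc is cut short by the return inside the proc.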

Methods

Only serious ruby geeks really understand this one :) A method object is a way to turn an existing function into something you can put in a variable. You get one by calling the method function, and passing in a symbol as the method name. You can rebind a method, or you can coerce it into a proc if you want to show off. A way to rewrite the previous lambda would be

l = lambda &method(:puts)
l.call('foo')

What is happening here is that you are creating a method for puts, coercing it into a proc, passing that in as a replacement for a block for the lambda method, which in turn returns you the lambda. One thing I often use this for is debugging in concert with tap.

[1, 2, 3].map {|i| i * 2}.reduce(:+)

This code maps an array of integers to an array of integers that have been doubled, and then sums them. If you want to see the result of the map, you can do something like this

[1, 2, 3].map {|i| i * 2}.tap(&method(:puts)).reduce(:+)

tap will yield the thing that it is called on to a block, and then return the original thing. So what I am doing is saying turn puts (which takes a single argument) into a method, coerce it into a block, and give it as the implementation for tap, meaning just puts out the value. Since tap returns the original thing, the rest of the method chain will be undisturbed.
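
Going back to the rebinding mentioned at the start of this section, here is a minimal sketch of what that looks like. Greeter and its instances are made up for the example; method, unbind, bind, to_proc and lambda? are the standard calls.

```ruby
# Hypothetical class, invented for illustration.
class Greeter
  def initialize(name)
    @name = name
  end

  def greet
    "hello from #{@name}"
  end
end

alice = Greeter.new('alice')
bob   = Greeter.new('bob')

m = alice.method(:greet)       # a Method object, bound to alice
from_alice = m.call

rebound  = m.unbind.bind(bob)  # same code, different receiver
from_bob = rebound.call

as_proc = m.to_proc            # this is what & does behind the scenes
```

A proc made from a method keeps the lambda behaviour described earlier (as_proc.lambda? is true), so arity is still enforced.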

Going Deeper with &:symbol

Let’s say you are really digging the trick of &:sym, and you have a case where the block is going to yield additional arguments, but you actually WANT those arguments to be passed in as well when the obj.send :sym happens. Symbol#to_proc is basically implemented like this

class Symbol
  def to_proc
    Proc.new { |obj, *args| obj.send(self, *args) }
  end
end

So, &:sym is going to make a new proc, that calls .send :sym on the first argument passed to it. If any additional args are passed, they are globbed up into an array called args, and then splatted into the send method call.
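
A quick way to see the extra arguments being passed along is a method that yields two values, such as inject. This is only a sketch using the built-in Symbol#to_proc, with made-up variable names.

```ruby
add = :+.to_proc

# The first argument becomes the receiver, the rest become the arguments,
# so this is the same as 1.send(:+, 2).
direct = add.call(1, 2)

# inject yields (running_total, item) on every step, so with &:+ the
# running total is the receiver and the item is the extra argument.
total = [1, 2, 3].inject(&:+)

# The long-hand version of the same call, for comparison.
long_hand = [1, 2, 3].inject { |sum, n| sum + n }
```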

Ruby is pretty awesome

A lot of these capabilities exist in other languages, but very few imperative OO communities have run with them the way that rubyists have. A deep understanding of the tools available is an important part of any ruby developer’s journey to becoming an expert at the language. Back when I was looking for some new language to try and was trying to decide whether to roll with ruby or python first, ruby’s block obsession was what made me go ruby.

NPM Style Javascript Is the Conservative Choice

I am sick of talking about semicolons. But after reading some comments on Tom Dale’s recent post on best practices, I think I need to talk about the reasoning behind NPM style, and what it does to your code. It is not about being “cool”, it is about dealing with two of the three types of bugs that are the hardest things to debug in the language.

I don’t care if you use semi-colons or not, that is not what this blog post is about

What this post is about is

  • The problems NPM style is trying to address
  • How NPM style addresses them
  • Some very good reasons why you do not want to use NPM style

Even if you don’t want to use this style of coding, hopefully this post will give you some ideas on how to develop your own techniques for dealing with some of these issues.

The Problem With Commas

Even though we are talking about semi-colons so much, I find you run into bugs with commas in JS far more often. We had a deployment about a year ago that made the app unusable for most of our customers for more than the half day it took for us to find the issue and fix it. It was caused by code that looks something like this

var foo = [
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "foo", id: 123, description: "lorem ipsum"},
];

Can you spot the problem? It’s the last comma inside the array.

There are two huge issues with misplaced commas. First, it is a really easy bug to introduce. Let’s say you are working with backbone, and have something like this

MyModel = Backbone.Model.extend({
  initialize: function(){
    //do some stuff
  },

  canTransitionState: function(newState){
    return this.get("state") == "new" && newState == "published";
  },

  transitionState: function(newState){
    this.set({state: newState});
  }
});

You are looking at the code, and think “You know, canTransitionState is lower level than transitionState, how about I move it down?” You highlight canTransitionState, and press the button in your editor that moves the function down one

MyModel = Backbone.Model.extend({
  initialize: function(){
    //do some stuff
  },

  transitionState: function(newState){
    this.set({state: newState});
  }

  canTransitionState: function(newState){
    return this.get("state") == "new" && newState == "published";
  },
});

You just got hit by the bug. What is worse, let’s say you fix that one, but then decide to delete the last function

MyModel = Backbone.Model.extend({
  initialize: function(){
    //do some stuff
  },

  transitionState: function(newState){
    this.set({state: newState});
  },
});

Now you are hit by a bug that is orders of magnitude worse, since it will be fine in Firefox and Chrome, but will cause IE to die a horrible, confusing death.

How NPM Style solves the problem

NPM style says lead the first line with the opening glyph, prefix all following lines with a comma, and close the thing on its own line. So my first example would be

var foo = [ {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          ];

I know, it looks rather odd. But more important than how things look, there is literally no way you can have a trailing comma if you never put a comma at the end of the line. You can delete or reorder any of the lines without any problem, except for the first one. And the way the first one is prefixed in the same place by a DIFFERENT glyph makes it very hard to forget to treat it as a special case. Finally, when debugging a problem, it is way more obvious when something is wrong in the NPM case

// NPM missing a comma
var foo = [ {name: "foo", id: 123, description: "lorem ipsum"}
          , {name: "fo", id: 13, description: "lorem ipsum"}
          , {name: "foo1", id: 123, description: "lorem"}
            {name: "fooooo", id: 123, description: "lorem ipsum"}
          , {name: "foobar", id: 12, description: "lorem ipsum"}
          , {name: "fobin", id: 123, description: "lorem"}
          , {name: "foo", id: 123, description: "lorem ipsum"}
          ];

// Crockford style
var foo = [
  {name: "foo", id: 123, description: "lorem ipsum"},
  {name: "fo", id: 13, description: "lorem ipsum"},
  {name: "foo1", id: 123, description: "lorem"},
  {name: "fooooo", id: 123, description: "lorem ipsum"}
  {name: "foobar", id: 12, description: "lorem ipsum"},
  {name: "fobin", id: 123, description: "lorem"},
  {name: "foo", id: 123, description: "lorem ipsum"},
];

Now, if I were scanning through hundreds of lines of code without knowing what I am looking for, the first example would leap out at me WAY more than the second.

You might say “In that class example, it would look ridiculous to align everything on the {“. This is true. Which is why I make a compromise, and do the following

MyModel = Backbone.Model.extend({
  initialize: function(){
    //do some stuff
  }

, transitionState: function(newState){
    this.set({state: newState});
  }

, canTransitionState: function(newState){
    return this.get("state") == "new" && newState == "published";
  }
});

It is less reliable than true NPM style, but I find it still gives me the benefit of making the comma placement a lot more obvious, and I also find the first thing in my class tends to change far less than the last thing. It is not as foolproof as the top examples, but it is a definite improvement over Crockford style.

I think commas are a much bigger problem than semicolons, and even if you reject semi-colon first style, you should still switch to comma first, because it will dramatically reduce the chance of hitting one of the worst pitfalls in the language.

The problem with semi-colons

This is a far less common case than commas, but still really nasty due to how hard it is to debug and to catch. Let’s say you are writing Crockford style code, and do something like this. Note that this is silly code, but the problem is not apparent unless you are doing one of a few fairly abnormal things.

function foo(){
  var apples = 1;
  var bananas = 2;
  var array = [apples, bananas];
  var carrots = 3
  (function(){
    var a = 1;
    var b = 2;
    array = array + [apples, a, bananas, b, carrots];
  }());

  for(var i = 0; i < array.length; i += 1){
    console.log("this whole thing is just for distraction: ", array[i]);
  };

  return array + [1,2,3];
};

When you run foo(), you will get Exception: number is not a function. Whaaa?

The problem is that in javascript, whitespace is insignificant for () and for []. So the following are the same thing

foo();
foo
();

// and

foo[1];
foo
[1];

Why javascript would support such insane syntax is beyond me, but the (single) case where this problem happens in the real world is illustrated by my first example: because the carrots assignment was missing a semi-colon, the immediately invoking function on the next line is instead parsed as the argument list of a call on 3, i.e. 3(function(){}()). Confused yet?

There are three main things that make this bug a killer. The first is that the problem occurs on the line after the line that causes the problem. You need to be looking at pairs of lines to figure out what is happening. The second is that what actually goes wrong is a confusing message if you are incredibly lucky. If you are unlucky, it will cause some completely random behavior in your application that you can spend days trying to track down. Lastly, since this problem happens so incredibly rarely, and you are putting semi-colons at the end of every line, it is very hard to a) actually “see” the lack of a semi-colon (for me at least, they fade into the background), and b) actually remember that this is an issue that can happen.

How NPM Mitigates the semicolon problem

Since this happens in one case if you are doing cross platform browser work (a line which starts with an opening parenthesis), and one additional case if you are lucky enough to be guaranteed a relatively new version of ecmascript (a line starting with an opening square bracket), NPM treats those as special cases. So the previous example would be

function foo(){
  var apples = 1
    , bananas = 2
    , array = [apples, bananas]
    , carrots = 3

  ;(function(){
    var a = 1
    var b = 2
    array = array + [apples, a, bananas, b, carrots]
  }())

  for(var i = 0; i < array.length; i += 1){
    console.log("this whole thing is just for distraction: ", array[i])
  }

  return array + [1,2,3]
}

Notice the leading semicolon in front of the immediately invoking function?

Now, you might argue that it is even MORE invisible to not have the leading semicolon. First of all, once you get used to seeing ;(function(){}()), not having that leading semicolon is the thing that makes it look strange. Since it is at the start of the line, the fact it is missing also stands out, which helps me immensely. Lastly, when debugging the problem, you aren’t looking for pairs of lines, you are looking for a single thing (that you can easily grep for).

Wait, aren’t there like, a bajillion other places where no semi-colons will screw you?

There sure are, but those other cases will never happen in real life, so you don’t need to worry. This topic has been discussed at great, great length recently, but if you would like to learn more, I would recommend the following resources

Enough about the semicolons

We are talking about an incredibly rare issue in the wild, and it is really time to stop. The much, much, much more common issue is trailing commas, and really that is the biggest gain from using NPM style javascript.

Why You Should Not Use NPM style javascript

Like everything else in this job, there are no hard rules, only ideas that are good for certain cases. Here are some great reasons not to use NPM style

  • Your editor doesn’t support it

NPM style is quite popular, but not to the point where everything supports it. The snide comment would be that if your editor doesn’t support it, find a better one. But realistically, that is often not possible or desirable. Both Emacs and Vim support it out of the box. If you use Emacs, I would highly recommend installing the excellent JS3 package, which I believe has the best js indentation out of anything out there (and yes, I have tried every popular current editor). In fact, I would say the two best choices currently for javascript work are Emacs with JS3 if you prefer light weight, or IntelliJ WebStorm if you prefer IDEs.

  • I understand the issues, but I don’t think it is worth switching to such a wild coding style because of them

There is nothing wrong with making this choice, the important thing is understanding why you are making it. Hopefully this blog post will help you debug a really nasty class of bugs in the future.

  • I primarily use Java/C#/C++, and this is just too different

It is important to remember that this is really a different language, and that even if you write it like java, there are cases where things work differently (especially semicolon stuff). You probably don’t want to go to this style if you write Java-like code all day, but you should try to think of less drastic ways to adjust your style that will help you avoid these pitfalls.

Massive walls of text are fun

This was a pretty long post, but I think it is important. As a professional developer, it is your job to be educating yourself about the things you use to do your job, and creating processes that help you do it more efficiently and effectively. Most people think that NPM style is about being “different”, or making your code look “cool”. It really isn’t, it is about contorting the way you code for the benefit of writing better javascript that is easier to debug.

Why I Don’t Use Semicolons

This weekend there was the latest of many outcries over the use of semi-colons. The problem came from twitter bootstrap breaking in jsmin due to a lack of semi-colons, fat saying that he wrote perfectly fine js, and that it is a bug in jsmin, followed by crockford declaring his code bad (in the way only crockford can), and that he wouldn’t bring jsmin down to the level of supporting code that bad.

The story hit hacker news, twitter, irc, and probably other places I don’t follow, and caused a great deal of nerd rage over how someone could justify not using semicolons in javascript. The next day, Brendan Eich weighed in, and basically said that automatic semi-colon insertion wasn’t done properly, that if he could go back in time he would have made them fully optional, but that since they aren’t optional in all circumstances, you should probably use them all the time. This basically threw fuel on the fire, and kept the rage going for another day or so.

The Problem With ASI (Automatic Semicolon Insertion) in Javascript

It really depends on who you ask. If you talk to Crockford, he will say something about how the spec is mystifying and the rules are obtuse so you should just always use them. If you talk to someone else who has read the spec, they will tell you that the spec is pretty clear, and well implemented across browser versions. There is really one place where it does not work the way one would expect.

Basically, javascript treats whitespace and newlines before an opening bracket the same way for every type of bracket, which means both of the following are valid js syntax

if (foo === 1) {
  doSomething()
}

if (foo === 1)
{
  doSomething()
}

That is pretty much as one would expect, right? The problem comes with the following

doSomething(1) // is the same as…
doSomething
(1)

// and…

listOfFoo[0] // is the same as…
listOfFoo
[0]

Why anyone would write that into a language is beyond me, because deliberately writing code like the previous examples is absolutely terrible. But apart from aesthetics, where this will get you into trouble is a case like this

function wtf() {
  console.log("this function doesn't actually return anything")
  console.log("So when you call it, it evaluates to undefined")
}

wtf()
(function(){
  // this pattern is known as an immediately invoked function expression (IIFE)
  // it is mostly used to introduce local state

  var foo = 1

  SomeNamespace.funcName = function(){
    console.log(foo)
  }
}())

The above is a semi-realistic scenario where ASI will punch you in the face in javascript. Basically, the intent is to call wtf(), and then run your immediately invoked function. What actually happens is that wtf gets called, returns undefined, and js attempts to call undefined as a function, passing in the result of that inner function as an argument.

Now, getting “undefined is not a function” there would be pretty confusing, and very hard to debug, especially if you don’t know the js quirks around ASI.

So, that’s the problem, how do we deal with it?

There are several schools of thought.

  • One is to not actually explain the real issue, and give some handwaving and muttering about the inconsistencies of ASI, and how you should never rely on it.
  • Another is to learn the reason that bug occurs, and put a semi-colon at the end of every line, whether it needs it or not.
  • A third is to put a semi-colon at the start of any line that would trigger this issue, i.e. one beginning with ( or [, so the second statement in the example above would start with ;(function(){, and omit them in all other instances, since you know it is safe to do so.

The first is the attitude taken by the vast majority of the javascript community. The second and third are taken by a few, more advanced developers. The third, however, is shunned, and causes terrible arguments with people who take the first way of dealing with it.

Why all the hate?

If you write Java code all day, and are appending ; to the end of each line whether it needs it or not, I can totally see it making sense to carry that tradition of needless ceremony over to javascript. If you come from any other language that supports ASI properly, I cannot understand how you could, after learning the facts about what the issue is, come to the conclusion that appending a redundant character to the end of 99.999% of your code is the way to solve a problem that comes up 0.001% of the time.

But my attitude is live and let live, if you want to do something I don’t get, I really don’t mind if it genuinely helps you avoid bugs. I would suggest wrapping all statements with parens, since sometimes they are needed and other times they are not (best to be safe!) and maybe append some //s to the end of your lines, just in case someone wants to add a comment later, it won’t cause syntax errors. If I saw code that looked like this

(var addStuff = function() {
  (var a = 1); //
  (var b = 2); //

  (return (a + b);); //
}); //

I would kind of be scratching my head, and wondering why the person did it that way. But at the end of the day, all of that other extraneous crap has exactly the same amount of reason to be there as the semicolons.

What’s so bad about a few semicolons?

I am going to talk about me, personally, here, since there is a good chance that this stuff falls under the category of “That’s just the way Matt’s brain works”.

First, they are just noise. They have no reason to be there, and noise for me tends to fade into the background. After working with semi-coloned javascript for a while, I don’t even notice the semicolons anymore. This is a really bad thing, because if I miss a semi-colon in the wrong place, I have a hard time linking the code I am looking at to that problem. By contrast, if I lead a line with a semi-colon in the one situation that matters, that stands out to me like a red flag. I can easily tell when it’s there and when it is not there.

Second, javascript is the only language I use that people even think about using semi-colons in. I spend most of my time in ruby, clojure, sass, html, and javascript. JS is the one difference with regards to ASI. That just exacerbates the problem for me, and makes it more likely I will miss a semi colon.

Third, I have zero problem remembering “Lead lines that begin with brackets with a ;”. This is a very simple rule, in a world where I have to remember that keypress is less consistent than keydown, hasLayout, how to vertically align things in css, and why onchange isn’t working when I programmatically change something in IE. You could make an argument that “Add a semi colon to the end of each line” is an even easier rule, but for me it’s not (refer to the previous two points for why).

Lastly, both in this post and in terms of importance, semicolons are ugly. They make an already verbose language just that much more verbose, and I actually care about things like aesthetics in code. This reason is far less important than the previous three, but for me it is just one more reason to not use redundant semi-colons.

The hate

But hey, if you want to use them, I am totally fine with that. If I contribute to your project I will use them too, and if I miss one and you let me know, I will happily fix it. I have a canned regexp replace in emacs for scanning a file for possible semi-colon omission, and I dutifully run it before committing to someone else’s project.

In my experience, this is pretty common for the no-semicolon crowd. But for some reason the attitude of the semi-colon mafia is that of violent and vitriolic hatred. It is not ok that I choose a different way than they do to deal with this problem, it is not ok that I talk about it and try to educate people as to why they are following this practice in the first place, and it is really, really not ok for me to expect to be afforded the same courtesy I offer them.

Why?

I ask myself this every time one of these arguments happens. I genuinely think that it is a lack of knowledge that causes this knee-jerk reaction, but actually imparting that knowledge usually changes nothing. It is a very big mystery to me, and I have seen others in the js community express similar bafflement (including Isaac, one of the leaders of the node community, node developer and creator/maintainer of npm).

Can’t we all just get along?

I would love, just absolutely love to hear a good reason why you need semi-colons everywhere. I don’t buy that javascript developers can’t remember that simple rule, because they deal with far more complex rules every day. I don’t buy that it is a tool issue, because every tool of the current generation has no problems with ASI. I don’t buy that it is a fundamental flaw in ASI, because I use many languages that I don’t write semi colons in, and never seem to have issues with them.

So please convince me. Politely, and in the spirit of coming to an understanding. I don’t understand why you make the choice you make, but I don’t think you are terrible for making it. GitHub has plenty of great developers, and they ban semis. Thomas Fuchs has been a community leader since before there really was a community, and he doesn’t use them. Isaac is one of the leaders in the node.js world, and a great js developer, and he shares most (if not all) of these opinions. Even if you can’t convince me, at least come to the same conclusion that I have on this issue: for whatever reason there are good developers who seem to completely abandon common sense around semi-colons. Accept that and move on.

Awesome Emacs Plugins: CTags

I wanted to write a series of posts on awesome emacs plugins I use, since I have put a lot of time and effort into my emacs configs. The funny thing I find about emacs though is that there is such a massive amount of functionality already provided that most of the neat things plugins do just augment stuff that is already there. So I think most of these posts are going to be a third about emacs, a third about a plugin, and a third about the glue code tying them together :)

Code Tags

The purpose of tags is to parse a codebase, and provide information about its structure (mostly for the purposes of navigation). There are many tools used to create tag index files, emacs even ships with one called etags.

Tags in Emacs

Coming from vim, one of the things I found was that emacs’ tag handling was inferior to vim’s for some reason. Most of the time, when I would do a C-] in vim I would land exactly where I would expect to. In emacs, I would find I needed to jump through the matches far more often to find what I wanted.

CTags

One difference was the tagging program. Vim uses something called exuberant-ctags, while emacs uses something called etags. From what I can tell, for the languages I use (javascript and ruby mostly), exuberant-ctags does a noticeably better job.

Thankfully, ctags actually supports the format emacs is expecting, you just have to pass a -e argument. I only had to slightly modify my normal ctags command, and I had

ctags -e -R --extra=+fq --exclude=db --exclude=test --exclude=.git --exclude=public -f TAGS

The last thing I want to do is have to jump to a terminal and type that out, so I wrote this quick little function in elisp to do the heavy lifting for me

build-ctags
(defun build-ctags ()
  (interactive)
  (message "building project tags")
  (let ((root (eproject-root)))
    (shell-command (concat "ctags -e -R --extra=+fq --exclude=db --exclude=test --exclude=.git --exclude=public -f " root "TAGS " root)))
  (visit-project-tags)
  (message "tags built successfully"))

(defun visit-project-tags ()
  (interactive)
  (let ((tags-file (concat (eproject-root) "TAGS")))
    (visit-tags-table tags-file)
    (message (concat "Loaded " tags-file))))

That may be a bit confusing to people unfamiliar with elisp, so I’ll walk through it

(defun build-ctags ()
  (interactive)

This part means “Make an elisp function called build-ctags, and mark it as interactive so that it can be invoked via M-x”

(let ((root (eproject-root)))
    (shell-command (concat "ctags -e -R --extra=+fq --exclude=db --exclude=test --exclude=.git --exclude=public -f " root "TAGS " root)))

This means “Make a variable called root that is the result of the eproject-root function”. eproject is another library I will cover some other time, but one thing it gives you is a function that returns the root path of the current project. You could just as easily replace it with (rinari-root) (if you use rinari for rails projects) and it would work just as well. I will assume shell-command is self explanatory :)

(visit-project-tags)

The last piece calls the second function, which uses visit-tags-table to replace the currently loaded tag file with what’s on the disk.

With that function, my navigation became a bit more comfortable in emacs, but I still found sometimes emacs would bring me to really strange places. After a bit of research, I found the incredibly obscurely named tags-case-fold-search variable. From the docs:

Documentation: Whether tags operations should be case-sensitive. A value of t means case-insensitive, a value of nil means case-sensitive. Any other value means use the setting of `case-fold-search'.

Setting that to nil (for example with (setq tags-case-fold-search nil) in your config) helped immensely.

etags-select

Now for the actual plugin :) With etags-select, if there is a single result you jump straight to it, but if there are multiple results, it will pop up a window showing them all. n will go to the next match, p to the previous, and enter will select the current result and jump to that line.

I made another small command that I could bind to a key

(defun my-find-tag ()
  (interactive)
  (if (file-exists-p (concat (eproject-root) "TAGS"))
      (visit-project-tags)
    (build-ctags))
  (etags-select-find-tag-at-point))

(global-set-key (kbd "M-.") 'my-find-tag)

That function checks whether the tags file is there. If it is, it reads it; if not, it builds it. Then it runs the plugin function etags-select-find-tag-at-point.

To invoke it, put the point on a symbol and hit M-.

.ctags

Last thing, for some better ctags support for rails, and support for OO javascript, add this to your ~/.ctags

--regex-ruby=/(^|[:;])[ \t]*([A-Z][[:alnum:]_]+) *=/\2/c,class,constant/
--regex-ruby=/(^|;)[ \t]*(has_many|belongs_to|has_one|has_and_belongs_to_many)\(? *:([[:alnum:]_]+)/\3/f,function,association/
--regex-ruby=/(^|;)[ \t]*(named_)?scope\(? *:([[:alnum:]_]+)/\3/f,function,named_scope/
--regex-ruby=/(^|;)[ \t]*expose\(? *:([[:alnum:]_]+)/\2/f,function,exposure/
--regex-ruby=/(^|;)[ \t]*event\(? *:([[:alnum:]_]+)/\2/f,function,aasm_event/
--regex-ruby=/(^|;)[ \t]*event\(? *:([[:alnum:]_]+)/\2!/f,function,aasm_event/
--regex-ruby=/(^|;)[ \t]*event\(? *:([[:alnum:]_]+)/\2?/f,function,aasm_event/

--langdef=js
--langmap=js:.js
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\{/\1/,object/
--regex-js=/([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(/\1/,function/
--regex-js=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\(([^)]*)\)/\1/,function/
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\[/\1/,array/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^"]'[^']*/\1/,string/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^']"[^"]*/\1/,string/

--exclude=*.min.js
--exclude=.git

Navigating with Emacs

I find my experience now much better than it was before, but there is always room for improvement. Any comments, criticisms, or tips that I am missing would be hugely appreciated :)

Why I Like Object#tap

In a recent Destroy All Software screencast, Gary mentioned how he really doesn’t like Object#tap. He was using it in this sort of context

class StoreCache
  def self.for_term(term)
    begin
      CachedScore.for_term(term)
    rescue CachedScore::NoScore
      RockScore.for_term(term).tap do |score|
        CachedScore.save_score(term, score)
      end
    end
  end
end

He said he didn’t understand why people like that syntax so much, when you could just as easily do

class StoreCache
  def self.for_term(term)
    begin
      CachedScore.for_term(term)
    rescue CachedScore::NoScore
      score = RockScore.for_term(term)
      CachedScore.save_score(term, score)
      score
    end
  end
end

with the differences being that the name of the variable is on the left side, and the return is more explicit. I sort of get where he is coming from, but I would not use tap that way.

What Object#tap means to me

I think he (and many others) see Object#tap as meaning “fancy method that gives me a 1 character placeholder variable and implicit return”. I see tap as meaning “tap into the object initialization”, or more practically “This entire expression is related to object initialization.”
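For anyone who hasn’t used it, here is a minimal sketch of what tap actually does: it yields the receiver to the block, then returns the receiver. Point is a made-up struct just for illustration; it is not from the screencast.

```ruby
# Point is a hypothetical value object, used only for illustration.
Point = Struct.new(:x, :y)

# tap yields the receiver to the block and then returns the receiver,
# no matter what the block itself evaluates to.
point = Point.new.tap do |p|
  p.x = 1
  p.y = 2
end

# The block's own value (here the integer 42) is discarded.
same_point = point.tap { |_p| 42 }

puts same_point.equal?(point) # prints true
```

That “returns the receiver” part is what lets the whole initialization read as one expression.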

Typically, I won’t use tap unless there is a high degree of locality, and you are talking about left-side = right-side type code. Something like this

def build_foo
  Foo.new.tap do |f|
    f.bar = "Hi"
    f.baz = "Baz"
  end
end

Building out values on an object is an incredibly common pattern that is logically a single thing. Visually, tap is grouping the code for that pattern. Also, I find it reduces density in a place where the additional verbosity really doesn’t add anything in terms of clarity. At work, we are still using 1.8.7 ree, so when we need ordered hashes (often as identifiers for keys on objects), we have a lot of code that looks like this

UNIT_OF_MEASURES = ActiveSupport::OrderedHash.new
UNIT_OF_MEASURES[1] = "Eaches"
UNIT_OF_MEASURES[2] = "Cases"
UNIT_OF_MEASURES[3] = "Pallets"

I think the move from that to tap style is a significant improvement

UNIT_OF_MEASURES = ActiveSupport::OrderedHash.new.tap do |uom|
  uom[1] = "Eaches"
  uom[2] = "Cases"
  uom[3] = "Pallets"
end

The last thing is the fact that it’s a single expression. I love implicit returns in ruby where your entire method is a single expression, it feels kind of lispy. Something like this

me likey
def foo
  some_predicate? ? "Hi!" : "Bye"
end

However, I am really not a fan of implicit returns when you just end a function with a bare word. If you are writing imperative style of code, I think each statement should actually be a statement that says what it does. Something like this just sort of feels like a mis-use of a language feature.

ugh
def foo
  thing = build_thing
  thing.some_method
  thing
end

This is something that I think falls squarely into personal style. But because of how I enjoy writing more expression oriented code, having an expression for a common pattern is a big plus for me.

Another interesting thing to note is that in rails-land, it is very common to use hash initializers for this kind of thing. Something like this

Post.create! author: current_user,
             published: true,
             category: "some-category"

While that syntax is very minimal, I actually prefer the Object#tap style of api, because I find it gives a clearer separation between plain old method arguments, and object initialization.

Post.create! do |p|
  p.author = current_user
  p.published = true
  p.category = "some-category"
end

Not Hatin On Gary

The dude is awesome, and everyone who is a professional ruby developer really should subscribe to his screencasts. IMO the guy is a master of OO, and his screencasts are worth far more than $10 and 15 minutes of your life per month.

Awesome Emacs Plugins: Evil Mode

I want to do a series of posts on some of the cool emacs plugins I use. Before I do that though, I want to talk a bit about why I use and love emacs. The saying “Care about the code, not the tools” is anathema to me; it is like “Care about breathing, but don’t worry about drinking”. Breathing is incredibly important, I agree, but consuming liquid regularly is pretty high up there on the list too! Anyways, this is my journey through tooling for working with code. This post is going to be a story of epic proportions, with very little “hard” content, but I plan on doing more posts that are more focused on the awesomeness of emacs plugins.

The Dark Years: Integrated Development Environments

I used IDEs for years, and while I appreciated the power, there were some things missing. The first thing was that even with plugins, the barrier to customization was quite high. I love solving problems with code, and while solving other people’s problems is a fun and interesting (and profitable) endeavour, solving your own problems is usually far more satisfying. Second, they are by their nature built for a specific language and platform. Lastly, they are quite slow, and require quite a bit of resources. At my last job, it would take almost 7 minutes to go from a reboot to everything up and pointing to the right things. Now, even with a boatload of scripts, emacs loads in about 1s.

The Cult of Vim

When I started doing rails work, I started taking vim more seriously. Vim gave me the speed, and the customizability. I quickly crafted a set of fairly elaborate configs where everything was exactly how I liked it. But beyond that, I discovered what a joy modal editing is. The best way I can describe it is as a programming language for editing. The ability to think of fragments of code as objects that I can perform functions on is a wonderful and freeing thing. Once it becomes second nature, it feels like I am talking to my editor in a high level way with my fingers. A nice side effect is that my hands never leave the keyboard. It is highly efficient, but efficiency isn’t even the biggest benefit, it is a joy thing. I find it much more enjoyable to edit code with modal editing.

Eventually you reach a point where you want to be writing your own stuff in vim as well, and you have to start learning vimscript, which is possibly the most terrible language ever conceived of. Vim is awesome, vim plugins are awesome, but vimscript is just one big WTF. More than that, I was starting to get into lisp in a pretty big way, and you can’t compare the vim experience to the emacs experience there. I also was sort of frustrated by the lack of ability to show the output of an external process in a separate buffer, especially since I do TDD.

The Light of Emacs

The old arguments between vim and emacs focused on speed, but when you are comparing either to eclipse, the speed difference between vim and emacs now is unnoticeable. Same deal with resources: emacs is sitting at 250megs right now, which is more than vim, but a small fraction of chrome. That brings us to the strengths: emacs does pretty much everything vim does, but better, except for the act of actually editing text. It also does way more than vim can do, some of it quite unique (org mode), useful (regexp-builder), or surprisingly powerful (calc).

The other major thing is elisp vs vimscript. I have a strongly passionate (bordering on irrational) love of lisp, so for me it was not even worth talking about: something lispy vs a really terrible dynamic imperative language, I will choose the lispy thing every time. elisp is far from the greatest lisp out there, but compared to vimscript it is wonderful. There is also a philosophy in emacs that emacs is a platform, with an editor implemented in it. The vim philosophy is that vim is an editor that you can script if that makes you happy. Very different, and since I have such a dedication to my tooling, I definitely appreciate the emacs side.

The thing that was always a sticking point for me was modal editing. As much as I love modal editing, I hate emacs editing. There is something about the way my brain works that makes it extremely hard to remember key chords. I can happily do “ci'blahfT;;.” and have no problems, but c-c m-x t will leave my brain after about 10 minutes of not using it.

After some time lusting over emacs and being defeated by its keybindings, I read about viper-mode, which was a vim emulator. I tried it out, but after trying ci" and having it not work, I realized it wasn’t enough. I poked around the internet a bit, and found out about vimpulse, which promised much more in terms of vim emulation. I actually went a day on vimpulse before switching back; there were just too many inconsistencies and other small things that were missing, or not working the way they should.

Evil.

I went a few months, sort of keeping tabs on emacs, but not really expecting to be able to switch. Eventually I heard about evil, and I eagerly installed it. Wow. After trying vim emulators in many other editors, which range from the level of viper mode to a bit under vimpulse, evil was like a light in the darkness. These guys really “get” vim, and are serious about re-implementing it. I have been using it every day for a few months, and there are only a few things missing.

  • it ignores punctuation. if you have function foo(){}, and your cursor is at the end of the line, in vim db will leave you with function foo, in evil, it leaves you with function. This is because vim will treat punctuation as a word, while evil does not. I still find this regularly frustrates me.
  • there is no :g. There are a ton of : commands missing, but I find the lack of :g the thing I really miss, since most of the other stuff is covered by emacs functions
  • the b text object is broken in js2 mode. I use js2 a lot, love it, and consider it one of the reasons as a web developer to switch from vim to emacs.

Evil for Vim users

First, in emacs, next line/prev line is C-n, C-p. The reason you need to know this is because there are other emacs major modes that have nothing to do with editing, but will usually use C-n/C-p or n/p as the method to navigate to the next and previous item.

Vim applies scripts via various events. Combining two things together means running one event, followed by the other event. In emacs, you are composing modes. Every buffer in emacs has a single major mode, that defines what type of thing you are doing. A major mode could be “ruby”, but it could also be an email client. Emacs is the platform, and a major mode is the application. You also have minor modes, which add additional features to a major mode. I have a plugin called autopair that will keep delimiters balanced (I use delimitMate to do the same thing in vim); that is a minor mode. So your editing experience is basically defined by composing a major mode with multiple minor modes.

This is a really cool thing. If you write vim plugins, usually you see a block where they have to unregister all sorts of things they registered if the filetype changes. That is a non-issue in emacs, because modes are self-contained entities. The bad thing is in the age of web development, we have to deal with files that have multiple major modes (like erb, which can have html, css, ruby, and javascript). This is probably the biggest weakness of emacs at the moment: it was designed with a single major mode per buffer, so you cannot compose major modes. There are many attempted hacks around this, but IMO they are all terrible.

M-x in emacs means “Execute command”. All the commands will be available there. M-x describe-key is extremely useful, you type that then hit a key, and it will tell you what function it is bound to. Once you have that, you can do M-x describe-function and enter that function name to view its documentation.

Lastly, if you are a vim user, you need to know how to bind keys. The idea in emacs is that instead of registering keys at a global level, each mode has a keymap that is self contained. So when you compose your modes, you are also composing keymaps. The syntax for keymaps is

keymaps
(define-key <keymap> key 'function)

; evil defines 3 maps for the various states, so to
; replicate a mapping I had in vim - nmap Y y$

(defun copy-to-end-of-line ()
  (interactive)
  (evil-yank (point) (point-at-eol)))

(define-key evil-normal-state-map "Y" 'copy-to-end-of-line)

That was a custom function definition, but you can also use any built in function after the '.

Everything in emacs I know, I know by its function name. If I use it a lot, I will map it to a key, so I have pretty much completely ignored the real keybinds.

Evil for Emacs Users

I honestly don’t think I am qualified to write this, because I was never a real emacs user. But if you were ever curious about why people go on and on about how awesome vim is, evil will pretty much give you that experience, and you can have it without leaving the editor you already know and love. I believe evil mode is more effective than traditional emacs, but even if it isn’t I am pretty sure it is more enjoyable. I would strongly encourage people to give it a chance for a month or so with a “Learn vim” tutorial, and see how they find thinking in text objects. We are definitely in YMMV territory, but I find it a joy. You may find this post more tailored to your point of view.

Repository Pattern in Rails

I have been working a lot on an app using MongoDB as the datastore, and Mongoid as the OR/M (or ODM to be more specific). In a relational database, you keep your data as segregated as reasonably possible, and then join it together in appropriate ways when you need it. The up side to this is that it is incredibly flexible, and chances are you won’t hit a situation where you need your data in a form that your datastore can’t give to you. The down side is that often, your data has a “natural” way it is joined together, and even though 99.9% of the time you are joining it together in that way, you still pay the cost every time.

In Mongo, your data is stored in that “natural” way in a json format. That means it is harder to shape the data in different ways, but it is free to get the data in the way it is intended to be used.

The Problem

This app was developed using a state-based testing approach, where every test sets up a situation, performs an action, and then asserts on the new state of the world. An interesting side effect of the Mongo way of storing data is that it makes it much harder to test smaller objects in isolation – if a comment is a hash in an array in a post, it is impossible to save without first saving the post. In more complex scenarios, the problem gets much worse, and I am finding that tests that should be rather simple are requiring far more setup than I would expect.

A Solution: The Repository

When it comes to data access, the book Domain Driven Design advocates using a repository layer that separates your domain objects from your data access strategy. This has several benefits:

  • Your domain objects stay simple. Rails developers tend to follow the “Thin controller, fat model” heuristic fairly religiously. There is nothing wrong with that per se, but it sort of implies it is ok to have massive classes that do dozens (if not hundreds) of things, so long as it is the model. The problem with that is that as the complexity of the application grows, these “God” models tend to become harder and harder to maintain – everything interacts with them, and they interact with everything. That kind of situation is what causes even small changes to cause ripples through your entire applications, and makes even simple maintenance tasks become quite daunting.
  • You segregate your interaction with third party code (ActiveRecord, or in my case Mongoid) from the rest of your application. You may say “Why is that necessary when you rarely, if ever change your data storage strategy?” The reason is that you don’t have control over that code, it is managed by a third party. So if they change something, and you are calling their code directly all through your application, your entire application needs to change. I work on quite a large enterprise rails app, and the 2.x -> 3.x upgrade was a huge undertaking, mostly for this reason.
  • By following the Single Responsibility Principle, mocking in tests is a joy. This should address the pain I am feeling doing state based testing with a document datastore, since I believe mockist testing will directly address these problems (I also much prefer mocking, so it is not exactly a direction I am resisting).
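To make the shape of that layering concrete, here is a rough sketch. Every name in it (Comment, InMemoryStore, CommentRepository) is made up for illustration, and the in-memory store is only a stand-in for whatever actually talks to Mongoid; it is not the code from my app.

```ruby
# Hypothetical names throughout; a sketch of the layering, not a real implementation.

# A plain domain object with no persistence code in it.
Comment = Struct.new(:id, :author, :body)

# Stand-in for the real datastore. Only the repository ever talks to it,
# so in a test it can be swapped for a mock.
class InMemoryStore
  def initialize
    @documents = {}
  end

  def write(id, attributes)
    @documents[id] = attributes
  end

  def read(id)
    @documents[id]
  end
end

# The repository is the one place that knows how a Comment is stored.
class CommentRepository
  def initialize(store)
    @store = store
  end

  def save(comment)
    @store.write(comment.id, { author: comment.author, body: comment.body })
    comment
  end

  def find(id)
    attributes = @store.read(id)
    return nil unless attributes

    Comment.new(id, attributes[:author], attributes[:body])
  end
end

repository = CommentRepository.new(InMemoryStore.new)
repository.save(Comment.new(1, "matt", "First!"))
found = repository.find(1)
missing = repository.find(2)
```

Because the store is passed in, a test for anything that uses the repository can hand it a fake and never touch the database.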

First Challenge: Mapping

I have worked on systems in C# with manual object mapping in repositories, and it is a bit of a nightmare. You end up with hundreds of lines of right_side.property = left_side.property code, which apart from being horribly tedious to write, is actually a terrible source of hard to find bugs. Since we don’t have a true data mapper library in ruby, the only way I will even attempt this is if I can automate the process through some clever meta-programming. The first part of that is that I need to be able to retrieve attributes from my domain objects in a simple way, without complicating the objects too much. I also need to be able to get the “schema” out of my model, so I am able to infer what it wants to save and retrieve.
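As a first guess at what that meta-programming might look like, here is a small sketch. Attributes, Post, and Mapper are names I am inventing here, and this is an assumption about the direction rather than a finished design: the domain object declares its attributes once, and the mapper uses that declared “schema” to go to and from a plain hash.

```ruby
# Hypothetical sketch: Attributes, Post and Mapper are names invented for illustration.
module Attributes
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declares an accessor and records its name so the "schema" can be read back later.
    def attribute(name)
      schema << name
      attr_accessor name
    end

    def schema
      @schema ||= []
    end
  end
end

class Post
  include Attributes

  attribute :title
  attribute :body
end

module Mapper
  # Domain object -> plain hash, driven entirely by the declared schema.
  def self.to_document(object)
    object.class.schema.each_with_object({}) do |name, document|
      document[name] = object.public_send(name)
    end
  end

  # Plain hash -> domain object.
  def self.from_document(klass, document)
    klass.new.tap do |object|
      klass.schema.each { |name| object.public_send("#{name}=", document[name]) }
    end
  end
end

post = Post.new
post.title = "Repositories"
post.body = "Some text"

document = Mapper.to_document(post)
copy = Mapper.from_document(Post, document)
```

The point is that adding a field means adding one attribute line, with no hand-written right_side.property = left_side.property code to keep in sync.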

This is going to be a lot of work

The app is currently ~1k loc on the server side, which means it is going to take some time, but is really not an insurmountable task yet. The first step will be figuring out how to tackle the domain model side of the mapping problem, and extracting a domain model out of the current mongoid model. Stay tuned for more!