Sunday, August 8, 2010

Ways to modularize your Ruby code

In this post, I'll recommend several ways to properly organize code in Ruby projects. I'll also explain my reasoning and how I arrived at each solution.

Much of this builds on the excellent work of others in the Ruby community, and I've linked to these other writeups as appropriate in case you want to know more.

This subject is of particular interest to anyone who is packaging a gem library, but it should be handy to anyone who wishes to organize code within any Ruby project.

Namespacing with modules

Let's say you've got an isolated piece of functionality that doesn't depend on anything else, such as a simple key (random string) generator. You intend to always call it with its full namespace, so you don't get it mixed up with similarly named functions. To make something that you can call using Rigatoni::KeyGenerator.generate_key(n), you'd use the following code.

module Rigatoni
  module KeyGenerator
    def self.generate_key(key_length = 5)
      puts "generate_key() called with length #{key_length}"
    end
  end
end

Note that I defined the function as self.generate_key() — the self keyword is crucial.

That's a subtle but critical difference. Without the self, generate_key() would be an instance method: I'd have to include Rigatoni::KeyGenerator first and then run the generate_key() function, which isn't what I want to do.

If I want to save myself a little typing while having some semblance of a namespace to qualify my method call, I can do this:

include Rigatoni
KeyGenerator.generate_key()

In my experience, this is the most straightforward way to modularize a self-contained piece of Ruby code that's meant to be called independently.

Extending functionality through modules

The most common way in which I've seen Ruby modules used is to extend the functionality of existing classes. This is where they're used as mixins to extend a Ruby class.

Even then, there are two ways to extend the functionality of a Ruby class with modules: instance methods and class methods. By default, including a module in your class definition will give you new instance methods.

If you want to define class methods in a module, you have to jump through some extra hoops.

module Moo
  module Ham
    def self.included(base)
      base.extend(ClassMethods)
    end

    module ClassMethods
      def foo()
        puts "foo called"
      end
    end

    def bar()
      puts "bar called"
    end
  end
end

class Bacon
  include Moo::Ham
end

The module we define above is Moo::Ham. We have a dummy class, Bacon, which includes Moo::Ham. It includes both class and instance methods, which we can run with the following example code.

# Class method.
Bacon.foo

# Instance method.
b = Bacon.new
b.bar

This is a longtime Ruby idiom, and John Nunemaker unpacks this in his post, Include vs. Extend in Ruby.

Namespaced classes (for state-dependent modularity)

In the examples up until now, we've dealt only with methods that could be run independently without needing something already in place.

Let's say all you're using to modularize your code is Ruby modules. Then you start to notice that a lot of the methods have the same argument passed in. Either that, or you find yourself setting up or populating some variable again and again in order to perform the work.

module Moo
  module Ham
    def self.some_func_foo(access_key, api_key, x)
      # Set things up with access_key and api_key.

      # Perform work with x.
    end

    def self.some_func_bar(access_key, api_key, y)
      # Set things up with access_key and api_key.

      # Perform work with y.
    end

    def self.some_func_baz(access_key, api_key, z)
      # Set things up with access_key and api_key.

      # Perform work with z.
    end
  end
end

When you start to notice these things, it's time to turn your module into a class.

module Moo
  class Ham
    def initialize(access_key, api_key)
      @access_key = access_key
      @api_key = api_key

      # Do other stuff here to set up what you need.
    end

    def some_func_foo(x)
      # Perform work with x.
    end

    def some_func_bar(y)
      # Perform work with y.
    end

    def some_func_baz(z)
      # Perform work with z.
    end
  end
end

This keeps us from duplicating code. Note that we end up having to instantiate a class because the methods depend on the initial setup work being done, but our code is leaner and meaner this way.

# Old way. Gross.
Moo::Ham.some_func_foo(access_key_one, api_key_one, x)
Moo::Ham.some_func_bar(access_key_one, api_key_one, y)
Moo::Ham.some_func_baz(access_key_one, api_key_one, z)

# New way. Nice.
mh = Moo::Ham.new(access_key_one, api_key_one)
mh.some_func_foo(x)
mh.some_func_bar(y)
mh.some_func_baz(z)

The major takeaway is that modules are not the only way to modularize our Ruby code.

Handling dependencies

Other times, you'll have methods that aren't state-dependent and which don't belong in a class. But they'll have a different kind of dependency: on external gem libraries being present.

Say you have a module, Foo, which you define in a file called foo.rb.

# foo.rb
module Foo
  ABACAB = "abacab"

  def print_foo
    puts "foo"
  end
end

Then say you have another module, Bar, which depends on Foo.

# bar.rb
module Bar
  def self.bar
    require 'foo'
    include Foo

    puts ABACAB
    print_foo()
  end
end

You try to be a good citizen by pulling in Foo only when you need it. So now you're ready to run Bar.bar() from another script, run_bar.rb.

# run_bar.rb
require 'bar'

Bar.bar()

But then you run it, and you get baffling, mixed results. The line referencing the constant ABACAB runs perfectly fine, but the call to print_foo() fails with a NoMethodError. Let's move the require and include of Foo outside Bar's module definition and see what happens.

# bar.rb
require 'foo'
include Foo

module Bar
  def self.bar
    puts ABACAB
    print_foo()
  end
end

When we run run_bar.rb again, we see that this works for us.

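It seems a little wasteful to pull in Foo up front, but the experiment shows why it matters where include runs. Inside self.bar, self is the Bar module, so include Foo mixes Foo into Bar: the constant ABACAB becomes reachable through Bar, but print_foo becomes an instance method for objects that mix in Bar, not a method on Bar itself. At the top of the file, include Foo mixes Foo into Object, so every object, Bar included, responds to print_foo. If we want to call Foo's methods this way, we've got to pull in Foo at the top, outside the module definition and outside the method definition.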

Besides, if we're pulling in bar.rb, our real intent is to go after the full functionality that Bar provides. The functionality that Bar gives us depends on Foo anyway, and won't work at all in its absence. We're not really being wasteful.

Summary

There are two major ways to modularize our Ruby code: modules and classes. Despite the name, modules are not the only way to modularize. Use classes if the proper behavior depends on state.

Start with a module by default. If you find yourself creating too many methods that take in the same parameter again and again, turn your module into a class which populates the initial values when you instantiate it.

When a module depends on other libraries, pull these libraries in at the top of the file, outside the module definition. As the Foo and Bar experiment showed, an include tucked inside a module method can leave the library's methods unreachable from where you call them.

Wednesday, July 14, 2010

JavaScript: Checking for undeclared and undefined variables

Up until now, the JavaScript I've written has typically been in pursuit of a bigger goal. This means that I worked around language quirks on the spot and moved on. I never stopped to consider the nitty gritty details of the language for commitment to long-term memory, because I wanted to get a working end product.

Recently, someone asked me about the best way to check for undefined JavaScript variables. I found myself at a loss, which was alarming to me since doing these kinds of checks is immensely important, practical, and a frequent fact of everyday programming no matter what programming language you're using.

The fact of the matter is that most of the time, I know where my variables are coming from. Something like this would work:

    var x;

    if (x)
        x.some_method();
    else
        alert("we can't use x");

In the vast majority of cases where I had to perform a null check, I knew that the variable was declared somewhere because it was of my own making.

What if you're counting on something having been declared somewhere else and being there? Let's try this.

    if (y)
        y.some_method();
    else
        alert("we can't use y");

In this case, you should get a ReferenceError thrown. You'll see this if you're using the Firebug JavaScript console or the developer tools in Safari or Chrome.

After much experimenting and looking around the web, here's my definitive, reliable, robust, cross-browser way to perform this kind of check:

    if (typeof(z) != "undefined")
        z.some_method();
    else
        alert("we can't use z");

It works when you're the one in charge of declaring the variable, too.

    var w;

    if (typeof(w) != "undefined")
        w.some_method();
    else
        alert("we can't use w");
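
To compare the bare reference and the typeof() check side by side without halting the script, the bare reference can be wrapped in try/catch. This is only a sketch: describe_access and undeclared_thing are made-up names, and it assumes nothing else on the page has declared undeclared_thing.

```javascript
// Sketch: a bare reference to an undeclared name throws,
// while typeof on the same name does not.
// Assumption: nothing else has declared `undeclared_thing`.
function describe_access()
{
    var result = {};

    try {
        // Evaluating an undeclared identifier throws a ReferenceError.
        if (undeclared_thing) {
            result.direct = "truthy";
        } else {
            result.direct = "falsy";
        }
    } catch (e) {
        result.direct = e.name;
    }

    // typeof is safe to use on undeclared identifiers.
    result.via_typeof = typeof undeclared_thing;

    return result;
}

var outcome = describe_access();
console.log(outcome.direct);     // "ReferenceError"
console.log(outcome.via_typeof); // "undefined"
```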

So that fixes our problem. But look closer, and you may have a couple of nagging questions.

Why in the world would you be referencing a variable you didn't declare? If we only ever had to deal with our own code, we should always know where the variables that we're using are coming from. But that is not the case for many of us; we rely on using others' code, third-party modules, or frameworks all the time.

An easy example is if you want to send debugging output to console.log in Firebug. When Firebug is disabled, console isn't a declared JavaScript object. (Safari and Chrome have their developer tools more integrated so this problem doesn't appear.) It helps to first check if there's a debugging console available before you start writing to it.
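
A guard along those lines might look like the following sketch. The helper name safe_log is hypothetical, not something Firebug or the browser provides.

```javascript
// Sketch: only write to the console when one actually exists.
// `safe_log` is a hypothetical helper name, not a built-in.
function safe_log(message)
{
    if (typeof console != "undefined" && console.log) {
        console.log(message);
        return true;    // the message was written
    }
    return false;       // no console available; the message is dropped
}

var was_logged = safe_log("debugging output");
```

Returning true or false is just a convenience here so the caller can tell whether anything was written.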

You could also find yourself in a situation where you're expecting some object to be initialized because it's an external library. If it's loaded over the network, and loaded separately from the rest of your code, you'll want to recover gracefully if it's not available for some reason (like the remote server is down).

Why do we have to compare to the string "undefined" instead of the keyword? Because typeof() always returns a string.

    console.log(typeof(j));         // "undefined"
    console.log(typeof(typeof(j))); // "string"

This seems too simple/short. Will it work in all web browsers? This will work in any modern web browser that supports JavaScript. Just to be thorough, I've personally verified that it works as expected in Internet Explorer 8, Firefox 3.6, Safari 4, and Google Chrome 5.0.

Checking for undefined variables in JavaScript in this way has been around for a long time, and several other sources suggest this method. I wrote this up mainly as an extended explanation for why things are the way they are, and to explicitly address quirks like typeof() always returning a string.

What if it could also be null in some cases? Then you add a check for null. I've limited the check to checking for "undefined" to keep the explanation focused, but yes, in reality you'll want to do an additional check.

    var k;

    if (typeof(k) != "undefined" && k)
        k.some_method();
    else
        alert("we can't use k");

The check for null is pretty straightforward; it's "undefined" that trips us up. Note that, as written, this code will only proceed to check for the null condition if k has been defined. This is because of short-circuiting: if the first condition is false, the && can't be true, so the remaining conditions are never evaluated.
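
One quirk makes the extra check necessary: typeof() reports "object" for null, so the "undefined" comparison on its own lets null through. The sketch below shows this, along with a stricter test that rejects only undefined and null. The helper name is_usable is hypothetical.

```javascript
// Sketch: how typeof treats the values discussed above.
console.log(typeof undefined); // "undefined"
console.log(typeof null);      // "object" (a long-standing language quirk)

// Hypothetical helper: true only when the value is neither undefined
// nor null. The caller must pass a declared variable, or guard an
// undeclared one with typeof first.
function is_usable(value)
{
    return typeof value != "undefined" && value !== null;
}

var declared_only;          // declared, never assigned
var explicitly_null = null;
var real_value = { some_method: function () { return "ok"; } };

console.log(is_usable(declared_only));   // false
console.log(is_usable(explicitly_null)); // false
console.log(is_usable(real_value));      // true
console.log(is_usable(0));               // true, unlike a plain truthiness test
```

The last line is the design trade-off: a bare `&& k` also rejects 0, the empty string, and false, which may or may not be what you want.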

When it comes down to it, the key insight is that JavaScript has "undefined" variables in two senses: first, when they're not even declared, and second, when a variable has been declared but hasn't been assigned a value. Only the second case is undefined in the strict sense; the first throws a ReferenceError if you touch the variable directly.

We also saw that typeof() will always return a string, which means you need to compare its result to a string when you're running a conditional test.

Friday, May 21, 2010

The status quo

Sometimes there are good but non-obvious reasons for keeping the status quo. Competing interests may have, over time, been balanced and counter-balanced to form a functioning ecosystem. And as with any ecosystem, reducing the functioning whole into its constituent parts in the foolish pursuit of extracting isolated benefits is an intractable problem.

Other times, the status quo represents a deeply flawed system, fundamentally broken at the core. Such a system is characterized by people in power whose actions are driven primarily by the overriding interest in preserving their own favorable position, to the recurring detriment of the less favored.

The tough part is looking at a situation and making the call as to which one of these models applies. In some cases, the status quo may be made up of both a well-balanced system with benefits to all, as well as a rotten portion entrenched for no good reason but the preservation of the holders of power. Fixing what's broken is a separate problem, and we should be concerned with fixing the problem only after we have correctly identified it.

Saturday, April 17, 2010

Slowing down as an immersive experience

Through my entire time as a student, I never used Cliffs Notes or SparkNotes in place of assigned reading. I made it a point to read every book I was supposed to, down to the last word.

The trouble for me was that everyone else who resorted to these "study guides" knew enough about the real books to do well in their classes. From the perspective of efficiently using my time to reach the objective of a good-enough, basic understanding for writing essays, discussing the books in class, and impressing teachers, I lost out.

I remember one book in particular. Crime and Punishment really broke a lot of people, many of whom were like me and had, up until then, insisted on reading the book and not the summary booklet. But I was determined. I was hellbent on savoring that book and milking it for all it was worth — calculus, biology, and economics be damned.

As expected, I ended up with the same general recollection of the book's contents as the more reasonable folks who resorted to the Cliffs Notes.

But what I remember most poignantly is the feeling I had while reading that book. I became immersed in it to the point that I'd feel the cold sweat, delirium, and ever-present sense of dread that hounded Raskolnikov (the protagonist in the story) as he ran from the authorities and lived each day with the burden of his guilt.

I highly doubt that it was even possible for anyone who read the Cliffs Notes to experience that.

Whenever I approach any new text, it's in pursuit of that kind of total immersion. I know that I'm not going to remember every detail of what I read, but my ultimate takeaway from all the things I read is not the knowledge to be gleaned, as if I were some kind of one-man strip mining operation for facts. It's in putting myself in a position to be shaped and influenced; I want to see how the mind of another person works by temporarily forcing my mind into the mold of their thought process.

In order to do this, it's absolutely necessary to slow down.

Slowing down allows real life to happen, as events inject themselves into my reading experience. If I take a long enough and serious enough text and stew over it, its applicability quickly becomes apparent when I frame its ideas in the context of whatever I happen to be dealing with in my life.

On the flip side, the shortcomings of a text also make themselves apparent when I slow down and let life happen in between. The opportunity afforded by this perfect setup allows me to weed out and carefully qualify newfangled notions, because when it comes to novel and interesting ideas, I tend to give them the benefit of the doubt and adopt them a little too eagerly. A tempering influence helps, and a tempering influence provided by direct observation is the best that anyone could ask for.

Slow down to savor the richness of a text, and make time for ideas to run up against real situations. That way, you'll get to see how valid these ideas really are.

Thursday, February 11, 2010

JavaScript: Variables in regular expressions

Often, you want to look for a particular pattern within a string. Let's say you know that you want to look for the string "revenue" inside a given string.

function match_string_for_revenue(string_for_searching)
{
    return string_for_searching.match(/revenue/gi);
}

var our_string = "We are looking for ReVeNuE.";
alert( match_string_for_revenue(our_string) );

You can store the pattern as a regular expression in its own JavaScript variable, which makes your code more readable and its intention better known.

function match_string_for_revenue(string_for_searching)
{
    var pattern_to_look_for = /revenue/gi;
    return string_for_searching.match(pattern_to_look_for);
}

var our_string = "We are looking for ReVeNuE.";
alert( match_string_for_revenue(our_string) );

But what if you want to generalize this matching and be able to search for any given pattern? Can you make it vary based on a parameter given, instead of keeping it in the code?

/* NOT GOING TO WORK */
function match_string(string_for_searching, valuable_substring)
{
    var pattern_to_look_for = / + valuable_substring + /gi;
    return string_for_searching.match(pattern_to_look_for);
}

var our_string = "We are looking for ReVeNuE.";
alert( match_string(our_string, "revenue") );

Well, that didn't work. What happened? That last alert() should have given you a null, which means that it wasn't able to match the pattern as expected.
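
The reason is that everything between the slashes of a regular expression literal is taken as the pattern itself; no variable is substituted and no string concatenation happens. A quick way to confirm this is to inspect the literal's source property, as in this small sketch:

```javascript
// The text between the slashes is the pattern, verbatim.
var valuable_substring = "revenue";
var pattern_to_look_for = / + valuable_substring + /gi;

// The variable's value never made it into the pattern.
console.log(pattern_to_look_for.source); // " + valuable_substring + "

// The pattern only matches that literal text (with one or more spaces
// wherever " +" appears), not the word we wanted.
var miss = "We are looking for ReVeNuE.".match(pattern_to_look_for);
var hit = "a  valuable_substring  b".match(pattern_to_look_for);

console.log(miss);         // null
console.log(hit !== null); // true
```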

Are we stuck? No. We just have to work around this using eval().

function match_string(string_for_searching, substring)
{
    // Build the regular expression literal as source text, then evaluate it.
    eval("var pattern_to_look_for = /" + substring + "/gi");
    return string_for_searching.match(pattern_to_look_for);
}

var our_string = "We are looking for ReVeNuE.";
alert( match_string(our_string, "revenue") );

This method will also work with Ruby and any other language that gives you the option to evaluate the contents of a string as code in that language.

Sometimes you'll find yourself having to generate code dynamically from a given string because there's no other way but to write real code in that language in a way that goes beyond merely passing in a string. If the language doesn't support having a string in there during the course of its normal operation, that's where eval() can come in handy.

Important Update (July 14, 2010): It's important, as Matt Austin points out, to consider the security implications of using eval(). See his comment below. Additionally, coderrr rightly pointed out that there's a more proper way to do it using RegExp:

function match_string(string_for_searching, substring)
{
    return string_for_searching.match(new RegExp(substring, "gi"));
}

var our_string = "We are looking for ReVeNuE.";
alert(match_string(our_string, "revenue"));

This is the way I'd recommend doing it in the future. The specific use of eval() that triggered my idea for this post was free from the security concerns Matt raised, but seeing as how this post describes general usage, please use eval() sparingly or consider security implications of the kind Matt described.
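
One caveat with new RegExp(): characters such as . or ( in the search text are still treated as regular expression syntax. If the goal is to find the substring literally, the text can be escaped first. The sketch below shows one way to do that; escape_for_regexp and match_literal are hypothetical names, not built-ins.

```javascript
// Hypothetical helper: backslash-escape regular expression metacharacters
// so the input is matched literally.
function escape_for_regexp(text)
{
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same idea as match_string(), but the substring is escaped first.
function match_literal(string_for_searching, substring)
{
    var pattern = new RegExp(escape_for_regexp(substring), "gi");
    return string_for_searching.match(pattern);
}

var our_string = "Revenue was $1.50 (approx).";
console.log(match_literal(our_string, "$1.50"));    // one match: "$1.50"
console.log(match_literal(our_string, "(approx)")); // one match: "(approx)"

// Without escaping, "." would match any character; with it, this is null.
console.log(match_literal("Revenue was $1x50", "$1.50"));
```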

Tuesday, February 2, 2010

The costs of configurable settings in your web application

By and large, the received wisdom when it comes to configurability and flexible settings for web applications is that more is better. Making something configurable frees up the engineer from having to make mundane (but necessary) changes that can be left to other people. It's desirable to change the state of software without having to rebuild or redeploy the code. There are a lot of obvious benefits to making software configurable, and these benefits are plain to all.

What I'm going to focus on here are the costs of configurability, based on my experience and observations over five years of writing web applications full-time. These costs apply to other kinds of software beyond the realm of web applications. I just choose to limit the scope of discussion to web apps because these costs manifest themselves the most when there's a fast development cycle and rapid iteration.

The following is intended to be a comprehensive, itemized breakdown of the costs of making an application setting configurable by the user. By itemizing and breaking these out separately, my intention is to make it easier to count the costs. For any given web application, it will be easier to see which of these apply, and to what extent.

The up-front cost of developing it as a setting. It comes as no surprise that the up-front cost is higher and requires more time than hardcoding, and is thus the most immediate and salient concern for the lazy programmer. If you're treating it as an investment and the setting is important, then of course it will pay off later. But if you're moving quickly and pushing out new features every week or even several times a week, the up-front costs of the things you're doing will be very important.

The cost of developing visual flexibility to accompany a setting. If your setting involves anything that has an impact on the visual presentation of any pages, be sure to consider that. For example, if you want to support different widths for a column of text, it is probably in your best interest to delineate discrete acceptable values rather than allowing continuous values. You'll know how to handle widths of 100, 150, and 300, for example, and you'll have to test them out to see how they look, but you probably don't want to allow just any width. Other situations may warrant a limited range on values; these values can be continuous, but restricted within a certain range. For example, if you have a setting for the maximum length of a summary text field, you'll probably want to test how your layout handles the minimum and maximum lengths of the summary text.

The cost of attention when sifting through all the other configurable settings. Even if it's relatively easy to change a setting, your love of settings probably means you've got even more settings. If you take the simple approach in laying these out in a list of settings for you to scroll through, there's a cost you incur. The cost of paying attention to this setting in the midst of all your other settings is one reason to stop and think whether you really want it to be configurable, but it's also one way you can force yourself to be more discerning and make only the most critical things configurable.

The cost of informing all stakeholders. This becomes more important as the size of your organization grows. The part to be concerned about here isn't in letting everyone know just that the setting exists or where to find it; that's a simple matter and thus negligible. It's in telling everyone involved and making sure that they understand what the setting does, whether it has any side-effects, valid values for the setting, and whether it's a critical setting that should be changed with caution. It's surprising how easily people forget, especially if it's a rarely used setting.

The cost of determining a default fallback value. How failsafe do you want your code to be? Chances are you'll have some situations — such as bringing up a greenfield installation, restoring from backup, and database unavailability — when you won't have access to the setting. If the setting isn't available for some reason, what kind of default value do you fall back to and how do you proceed? Will this default continue to hold true and be valid as your software matures?

The cost of abstraction and the cognitive load associated with translating it to concrete results. To make some things configurable, it may be necessary to generalize and turn the setting into an abstraction. Take the problem of content management functionality on your custom webapp. You may end up having to manage embedded Flash SWFs, images, simple blocks of text, or any combination of the above. Many content management systems call these "page elements" to encapsulate their full versatility, but this kind of abstraction is another step away from merely "image block" or "text block." What's the URL for the "graphical element" — in other words, the URL for the image? In using a general tool for a specific case, you end up having to refer to things in the most general terms first and translating them into their specific names, rather than calling them just what they are. This costs you time. Time is money, and it adds up.

Now, given these costs, what is one to do? In software engineering and in politics, we resort to compromise. For example, we could hide a setting from users so that only developers or sysadmins can change it. This way, you maintain the configurability that you want and avoid many of the unnecessary costs listed above.

It may also help to know the nature of each setting: are the acceptable values discrete or continuous? If they're discrete, such as officially supported sizes for visual elements, you can impose restrictions on values that you accept so that it's harder for the users of your application to shoot themselves in the foot.
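
As a rough sketch of that idea, each setting can carry its own list of supported values (discrete) or a permitted range (continuous), along with the default to fall back on when the stored value is missing or invalid. Everything here is invented for illustration: the names resolve_setting, setting_rules, column_width, and summary_max_length, as well as the range limits and fallback values.

```javascript
// Hypothetical setting definitions: one discrete, one continuous.
var setting_rules = {
    column_width:       { allowed: [100, 150, 300], fallback: 150 },
    summary_max_length: { min: 50, max: 500, fallback: 200 }
};

// Return the stored value if it passes the rule for that setting,
// otherwise return the rule's fallback.
function resolve_setting(name, stored_value)
{
    var rule = setting_rules[name];

    if (typeof stored_value != "number" || isNaN(stored_value)) {
        return rule.fallback;   // missing or unusable value
    }
    if (rule.allowed) {
        // Discrete setting: accept only officially supported values.
        return rule.allowed.indexOf(stored_value) != -1 ? stored_value : rule.fallback;
    }
    if (stored_value < rule.min || stored_value > rule.max) {
        return rule.fallback;   // continuous setting, out of range
    }
    return stored_value;
}

console.log(resolve_setting("column_width", 300));             // 300
console.log(resolve_setting("column_width", 275));             // 150
console.log(resolve_setting("summary_max_length", 120));       // 120
console.log(resolve_setting("summary_max_length", 9000));      // 200
console.log(resolve_setting("summary_max_length", undefined)); // 200
```

Falling back to the default for an out-of-range value is one choice among several; clamping to the nearest limit would be another.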

Ultimately, in any discussion of costs, no matter how focused and limited in scope, we must remember to consider the benefits. Neither the costs nor benefits considered alone give us a complete view of any situation, and each must be considered to give context to the other. The difference is that costs are difficult and unpleasant to pinpoint and isolate. They don't receive the attention they're due, but who could blame us? We prefer to look on the bright side and think instead about reaping benefits. We enjoy watching our projects and our companies grow. But it's by being aware of our costs and knowing what hampers us that we free ourselves up to build more of what we want.

Wednesday, December 16, 2009

The reasoning behind Lumberjack

So why, in 2009, in a world of RIA frameworks, web-based applications, and a wide variety of blogging engines to choose from, would I write a desktop application in Java targeted exclusively for one company's proprietary blog platform?

First of all, it was tempting to write this as an Adobe AIR application. It would have fit my requirement that it be cross-platform and run as a desktop application, but I've never written anything with Adobe development tools before. Given the limited time I had on weekends to work on it, I wanted to get something written as quickly as possible rather than spending all my time learning a new platform. With Java, I could just hit the ground running, and it was just a matter of referencing the Swing-specific documentation. It boiled down to what was expedient and familiar because it would allow me to build something quickly.

With respect to the issue of making this application web-based, the main point is that I didn't want to start up a browser just to create new posts on Blogger. There's the Blogger Dashboard for that. Now, it's true that one could write a slimmer, lighter, faster-loading web-based client for Blogger without all the heavy clutter of the Blogger Dashboard, but it would still require that I start up a web browser; in the end, I wouldn't end up using it much. I wanted to build something that I would use and keep on using. (I have also been writing web applications for the past five years, and thought it would be fun to write something that ran on the desktop for once.)

Finally, there's the issue of this program being a client specifically for one company's proprietary blogging engine. Why not make it work for WordPress, TypePad, Posterous, and all the other major and popular blogging platforms? For that matter, why am I still using Blogger when there are so many newer, slicker platforms to choose from? I have to admit that I was tempted to switch out from Blogger — WordPress and Posterous in particular have impressed me the most — but when it comes down to it I'd rather have my blog on Google infrastructure than anywhere else. With that said, not everyone feels this way, so they choose newer, snazzier blogging software — which is how Blogger ended up neglected by hobbyist software developers. To this day, I still can't even log in to my Blogger account using Drivel, a desktop client with support for all sorts of blogging engines.