Strive to Limit Integration Points

Last week, I was working on a new feature of TextMe that required a call to one of our external service providers for some data. The call in particular was to look up the carrier for a given mobile number. Sounds simple enough. However, we already had code that integrated with this provider in one component of our architecture, and I needed to make this call from another component.

A couple of options jumped out at me. I could pull the code I needed into a library that could be shared between the components, or implement some form of inter-process communication that would enable me to invoke the service from one component, and have it processed by the component that already integrated with the service provider.

Pulling the code into a library would be the easier of the two to implement, for sure. Like any project of reasonable size, we were already doing this for several other shared pieces of code. Adding one more to the list would be a piece of cake. The second option would require a bit more work. The component that integrates with the service provider runs as a daemon process, so using something straightforward like HTTP to handle the inter-process communication was out of the question. Instead, I’d likely have to utilize the queuing framework that we already had in place. What makes it more difficult is that the queuing library we use only handles asynchronous calls, and this would need to be a synchronous call. Not the end of the world by any means, but without a doubt more complicated than simply sucking the code into a library.
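For the curious, the request/reply pattern that makes a synchronous call possible over an asynchronous channel works roughly like this: tag each request with a correlation ID, publish it, and block until a reply carrying the same ID comes back. Here is a minimal sketch in Java (our actual implementation was in Ruby, and all names here are hypothetical), with an in-process executor standing in for the real queue and daemon:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

// Sketch of synchronous request/reply over an asynchronous channel.
// An in-process executor stands in for the message queue and the
// daemon component; a real implementation would publish to AMQP.
public class SyncOverAsync {
    private final Map<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();
    private final ExecutorService daemonSide =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't hold the JVM open
                return t;
            });

    // "Client" side: publish a request, then block until the reply
    // with the matching correlation ID arrives (or the timeout hits).
    public String call(String request, long timeoutMs) throws Exception {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        publish(correlationId, request);
        try {
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pending.remove(correlationId);
        }
    }

    // "Daemon" side: process the request asynchronously and complete
    // the future of whichever caller is waiting on this correlation ID.
    private void publish(String correlationId, String request) {
        daemonSide.submit(() -> {
            String response = "carrier-for:" + request; // stand-in for the real lookup
            CompletableFuture<String> waiter = pending.get(correlationId);
            if (waiter != null) {
                waiter.complete(response);
            }
        });
    }
}
```

The caller sees an ordinary blocking method; the fact that a queue sits in the middle is invisible to it.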

Even though option one was easier to implement, having two components in the architecture integrate with a third party seemed like a bad idea. Sprinkling integration points throughout your application is usually a recipe for failure, largely because it is only a matter of time before an integration point fails.

If we went with option one, we could have the library handle the failures. However, even if handled properly, failures like this usually have other consequences. For example, if the service never responded, it could cause requests to back up in the given component. Even if we implemented a timeout, it is likely that the timeout would be greater than the average response time, which means our system would take longer to process each request. If you had to deal with a lot of incoming requests at the time of the failure, you could be in for a world of hurt, especially if you had multiple components suffering from this issue.
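To make the timeout point concrete, here is one way a bounded wait might look in Java. This is illustrative, not our actual code; the names, the simulated latency, and the "UNKNOWN" filler value are all hypothetical:

```java
import java.util.concurrent.*;

// A bounded wait on a slow remote call: if the provider doesn't answer
// in time, give up and return filler data rather than letting requests
// back up behind it. The remote call here is simulated with a sleep.
public class TimeoutGuard {
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // don't hold the JVM open for worker threads
        return t;
    });

    public String lookupCarrier(String number, long timeoutMs) {
        Future<String> result = pool.submit(() -> slowRemoteCall(number));
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);  // stop waiting on the provider
            return "UNKNOWN";     // filler data instead of blocking the request
        } catch (Exception e) {
            return "UNKNOWN";
        }
    }

    private String slowRemoteCall(String number) throws InterruptedException {
        Thread.sleep(50); // simulated provider latency
        return "Carrier-X";
    }
}
```

Note that the timeout only caps the damage per request; as described above, a timeout longer than the normal response time still slows every request down while the provider is struggling.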

With option two, we have a bit more control over the situation. First off, we would know there was one, and only one, spot in our architecture that integrated with that particular service. This would allow us to better understand the potential impact of a failure, and the steps that needed to be taken to address it. Second, it would allow us to more easily implement a circuit breaker to prevent the failure from rippling across the system. If the circuit breaker was tripped, we could return an error, some sort of filler data, or queue the request up for processing at a later time. Third, we could potentially add resources to account for the situation. Since the work was being done in a completely different component, if it was simply a matter of increased latency on the part of our service provider, we could always spin up a few more instances of that component to account for the fact that some of the requests may be starting to back up.
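A circuit breaker doesn’t have to be fancy. The sketch below (illustrative, not our production code) trips after a run of consecutive failures and then fails fast with filler data:

```java
import java.util.concurrent.Callable;

// Minimal circuit breaker: after `failureThreshold` consecutive failures
// the breaker opens and every call fails fast with fallback data, so a
// struggling integration point can't drag the rest of the system down.
// A real implementation would also periodically let a trial request
// through (the "half-open" state) to detect recovery.
public class CircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    public CircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    public boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    public String call(Callable<String> operation, String fallback) {
        if (isOpen()) {
            return fallback; // fail fast: don't even attempt the remote call
        }
        try {
            String result = operation.call();
            consecutiveFailures = 0; // success closes the breaker
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            return fallback;
        }
    }
}
```

The key property is that once the breaker is open, callers get an immediate answer instead of piling up behind a dead integration point.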

In his fantastic book, Release It!, Michael Nygard talks about integration points, along with a host of other topics regarding the deployment and support of production software. Any developer who writes code that will eventually run in a production environment (which I hope is EVERY developer) should read this book. Regarding integration points, Michael says the following:

  • Integration points are the number-one killer of systems.
  • Every integration point will eventually fail in some way, and you need to be prepared for that failure.
  • Integration point failures take several forms, ranging from various network errors to semantic errors.
  • Failure in a remote system quickly becomes your problem, usually as a cascading failure when your code isn’t defensive enough.

However, even though integration points can be tough to work with, systems without any integration points are usually not that useful. So, integration points are a necessary evil. Our best tools for keeping them in line are defensive coding, being smart about where you place the integration points in your system, and limiting the number of integration points in the system.

With the help of my colleague Doug Barth, we (mostly Doug) whipped up a synchronous client for the Ruby AMQP library. I then used this code to implement the synchronous queuing behavior I needed to keep the integration point where it belonged. Those interested can find the code on GitHub, at http://github.com/dougbarth/amqp/tree/bg_em.

Increase Design Flexibility by Separating Object Creation From Use

I just finished reading Emergent Design, by Scott Bain. Overall, I thought it was a pretty good book that touched on some important concepts in software design. I’ve read about one particular concept covered in the book a few times before, but the value of it didn’t sink in until I read Emergent Design. This concept states that code that creates an object should be separate from code that uses the object.

Separating code that creates an object from the code that uses the object results in a much more flexible design, which is easier to change. Creating this separation is also very easy to do. By simply avoiding a call to the new operator in the “client” code for the particular object you wish to instantiate, you are able to evolve your code to adjust to a variety of changes, most of which require no changes in the code that uses the object. Let’s walk through an example.

Let’s say we have a logging class, named Logger, that we use to log messages from our application. The class is pretty simple, and looks something like this.

import java.io.FileWriter;
import java.io.IOException;

public class Logger {
    private static final String logFileName = "application.log";
    private FileWriter fileWriter;
    private Class from;
    
    public Logger(Class from) {
        this.from = from;

        try {
            fileWriter = new FileWriter(logFileName, true);
        } catch (IOException e) {
            throw new RuntimeException("Log file '" + logFileName + 
                    "' could not be opened for writing.", e);
        }
    }

    public void log(String message) {
        try {
            fileWriter.write(
                from.getCanonicalName() + ": " + message + "\n");
            fileWriter.flush();
        } catch (IOException e) {
            System.err.println("Writing to the log file failed");
            e.printStackTrace();
        }
    }
}

In our application, we would typically use the Logger class like this:

Logger logger = new Logger(MyClass.class);
logger.log("Some message");

I think this is pretty typical, and seems to be the default pattern. Create the object that you need, and then use it. Simple and straightforward. However, the simplicity comes at the price of limited flexibility. For example, what if I wanted to limit the Logger class to only having one instance? Or, what if I wanted to start logging some messages to the database, and some to the file system? By combining the code that creates the object with the code that uses the object, we’ve greatly limited the ways in which we can evolve our design without affecting existing “client” code. Sure, we can work our way out of it, but since the Logger is a very popular class used by almost every other class in the system, it will require a lot of work to change.

So, how can we avoid this? How can we effectively hide the creation of the object from the code that uses it? The very first item in Effective Java, by Joshua Bloch, is to consider static factory methods instead of constructors. Joshua suggests this for the same reasons Scott suggests separating code that creates the object from code that uses the object in Emergent Design. Instead of making your clients use the new operator to create instances of your object, provide them with a static factory method to do so.

    public static Logger getInstance(Class from) {
        return new Logger(from);
    }
    
    protected Logger(Class from) {
        this.from = from;

        try {
            fileWriter = new FileWriter(logFileName, true);
        } catch (IOException e) {
            throw new RuntimeException("Log file '" + logFileName + 
                    "' could not be opened for writing.", e);
        }
    }

Note that I changed the scope of Logger’s constructor from public to protected. This will discourage other classes outside of the logging package from using it, while leaving the Logger class open for subclassing. With this new method in place, users of this class can now create an instance by doing the following.

Logger logger = Logger.getInstance(MyClass.class);
logger.log("Some message");

It seems silly to provide a method that simply calls new. But doing so adds so much flexibility to the design that Scott considers it a “practice”, or something he does every time without even thinking about it. Abandoning the constructor also opens a few doors. You are no longer required to return an instance of that specific class, giving you the freedom to return any object of that type. You don’t always have to return a new instance, allowing you to implement a cache, or a singleton. You can use this flexibility to your advantage when evolving your design. Let’s see how.

Let’s say we get a request from our accounting department to log messages from code that deals with financial transactions (conveniently located in the net.johnpwood.financial package) to the database. This sounds like the birth of a new type of logger. Because clients are not using the new operator to create new instances of the Logger class, we can easily evolve Logger into an abstract class, keeping the static getInstance() method as the factory method for the Logger class hierarchy. After we have the abstract class, we can create two new subclasses to implement the individual behavior. All of this with no change to how the client uses the logging functionality.

Because the filesystem logger and the database logger don’t have too much in common, the Logger class has been slimmed down quite a bit. What remains is the interface for the Logger subtypes, defined via the abstract log() method, and a factory method to create the proper logger, which is implemented in getInstance().

public abstract class Logger {
    
    public static Logger getInstance(Class from) {
        if (from.getCanonicalName().startsWith(
                "net.johnpwood.financial")) {
            return DatabaseLogger.getInstance(from);
        } else {
            return FilesystemLogger.getInstance(from);
        }
    }
    
    protected Logger() {}
    public abstract void log(String message);
}

We now have two distinct classes that handle logging. FilesystemLogger, which contains most of the old Logger code, and DatabaseLogger. FilesystemLogger should look pretty familiar.

import java.io.FileWriter;
import java.io.IOException;

public class FilesystemLogger extends Logger {
    private static final String logFileName = "application.log";
    private FileWriter fileWriter;
    private Class from;

    public static FilesystemLogger getInstance(Class from) {
        return new FilesystemLogger(from);
    }
    
    protected FilesystemLogger(Class from) {
        this.from = from;
        
        try {
            fileWriter = new FileWriter(logFileName, true);
        } catch (IOException e) {
            throw new RuntimeException("Log file '" + logFileName + 
                    "' could not be opened for writing.", e);
        }
    }

    @Override
    public void log(String message) {
        try {
            fileWriter.write(
                from.getCanonicalName() + ": " + message + "\n");
            fileWriter.flush();
        } catch (IOException e) {
            System.err.println("Writing to the log file failed");
            e.printStackTrace();
        }
    }
}

DatabaseLogger is also pretty simple, since I didn’t bother to implement any of the hairy database code (doesn’t help to illustrate the point…and I’m lazy).

public class DatabaseLogger extends Logger {
    private Class from;
    
    public static DatabaseLogger getInstance(Class from) {
        return new DatabaseLogger(from);
    }
    
    protected DatabaseLogger(Class from) {
        this.from = from;
        establishDatabaseConnection();
    }

    @Override
    public void log(String message) {
        LoggerDataObject dataObject = 
            new LoggerDataObject(from, message);
        dataObject.save();
    }
    
    private void establishDatabaseConnection() {
        // Connect to the database
    }
}

We’ve significantly changed how the Logger works, and the client is totally oblivious to the changes. The client code continues to use the Logger as it did before, and everything just works. Pretty sweet, eh?

As you can imagine, there are many other ways you can evolve your design if you have this separation of creation and use. If we need to create a MockLogger for testing purposes, it can be created in Logger.getInstance() along with the other Logger implementations. The client would never know that it is using a mock. If we ended up creating 10 different loggers, it would be trivial to have Logger.getInstance() delegate the creation of the proper Logger instance to a factory, moving the creation logic out of the Logger class. Again, no changes to the client.
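As a sketch of that MockLogger idea, something like the following would do. The pared-down abstract Logger is included only so the example stands alone; in the real code base, Logger.getInstance() would return the mock when running under test:

```java
import java.util.ArrayList;
import java.util.List;

// A pared-down abstract Logger, standing in for the one developed above
// so this example compiles on its own.
abstract class Logger {
    public abstract void log(String message);
}

// MockLogger records messages in memory instead of writing them anywhere,
// so tests can assert on exactly what was logged. Clients keep calling
// Logger.getInstance() and never know the difference.
class MockLogger extends Logger {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void log(String message) {
        messages.add(message); // capture instead of writing to disk
    }

    public List<String> loggedMessages() {
        return messages;
    }
}
```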

Separating creation from use also allows you to easily evolve your class into a singleton (or any other pattern that controls the number of instances created). This doesn’t make much sense for Logger, since each unique Logger instance contains state. However, it does make sense for some classes. Evolving your class into a singleton simply requires a static instance variable on the class containing the instance of the singleton object, and an implementation of getInstance() that returns the singleton instance. If clients have already been using the getInstance() method to get an instance of the class, then no change would be required on their end. Here’s an example:

public class SomeOtherClass {
    private static SomeOtherClass instance = new SomeOtherClass();
    
    public static SomeOtherClass getInstance() {
        return instance;
    }
    
    private SomeOtherClass() {}
}

It is worth pointing out that static factory methods are not the only way to achieve this separation. Dependency injection frameworks like Spring and Guice do all of this for you. They take on the responsibility of creating the objects, and getting the instances to the code that uses them. If you are a disciplined developer, and never “cheat” by instantiating the objects directly, then all of the same benefits outlined above apply when using a dependency injection framework.
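At its core, what a DI framework automates is plain constructor injection. The hypothetical classes below show the shape of it without any framework at all:

```java
// The essence of dependency injection: the client declares what it needs
// in its constructor and never calls new on its collaborators. A container
// like Spring or Guice simply automates this wiring. All names here are
// illustrative.
interface MessageLog {
    void log(String message);
}

class InMemoryLog implements MessageLog {
    private final StringBuilder out = new StringBuilder();

    public void log(String message) {
        out.append(message).append('\n');
    }

    public String contents() {
        return out.toString();
    }
}

class OrderService {
    private final MessageLog log;

    // Creation happens elsewhere; OrderService only *uses* the log.
    OrderService(MessageLog log) {
        this.log = log;
    }

    void placeOrder(String id) {
        log.log("placing order " + id);
    }
}
```

A container would bind MessageLog to InMemoryLog (or a file-backed implementation) and hand OrderService a fully wired instance, which is exactly the creation/use separation discussed above.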

Like everything in life, there are cons that go along with the pros. Separating the code that creates an object from the code that uses the object is not the default pattern. It is not the norm. It will take time for you and your co-workers to get comfortable with this pattern. API documentation tools don’t “call out” static factory methods like they do constructors. This could have an effect on anybody using your library. Dependency injection frameworks take the creation of objects completely out of your code, moving it to some magical, mysterious land where things just happen, somehow. This also can take some time to get used to, especially for those new to the concept.

However, I feel that the benefits of separating creation from use far outweigh the drawbacks.

In our field, change is a constant. As a profession, we’re gradually learning to stop fighting change, and to start accepting it. This means designing for change. Doing so makes everybody’s life easier, from the customer to the developer. Separating creation from use is one, quick way we can increase the flexibility of our design, with very little up front cost.

Thanks to Mahesh Murthy for reviewing this post.

Build Your Own Sandbox Application

Sandboxes are fun. A simple box full of sand, a bucket, and a shovel somehow opens the imagination like nothing else. You can build anything you want, keep it around for a while if you like it, add to it, subtract from it, or crush it in Godzilla like fashion if you so choose. You can experiment with your creation in any way imaginable, without consequence. Creating things in such a carefree environment can be refreshing, and rewarding.

I think that it is a great idea for developers to have such an environment for themselves where they can try out new technologies, techniques, or processes. Setting up a sandbox nowadays is pretty easy. Machines are cheap, or free if you’re not picky and keep your eyes and ears open. That 4-year-old PC that your cousin is throwing away because he got a brand new Dell for Christmas will fit the bill just fine. Linux is free, runs great on older computers, and has the power and flexibility to host virtually any application. Sign up for a free DNS service such as DynDNS and poof, you’ve got yourself a little server that you can reach over the Internet. My sandbox is a dusty old dual Pentium III with 512 MB of RAM, running Ubuntu Linux. It fits the bill quite nicely if you ask me.

But a sandbox is only half the equation. As developers, we need an application that we can play with in the sandbox. Something we can poke and prod. A sandbox application, so to speak. There are several reasons why you may want to create a sandbox application.

Platform for playing with new technologies

Perhaps the biggest reason for creating and maintaining your own sandbox application is that it can serve as a platform for trying out new technologies. Doing this at work can be tough. Your boss or client may not be thrilled to hear that you completely re-wrote part of the application to use the bleeding edge release of some hot new framework because you “thought it was cool”. But, there’s nothing stopping you from doing it with your own application. Even if you rely heavily on your application, there’s no reason why you can’t fork your code, and give the new technology a try on a separate branch. If it works out you can merge the code into the main branch, and if not you can always abandon the changes. Now, this approach will only work if you have your application under source control (which you should). If you don’t mind giving the world access to your code, GitHub will host your code for free. Otherwise, it’s very easy to set up your own source control system in your sandbox.

Material to blog about

Trying out new technologies, techniques, or processes can also give you plenty of material to blog about. If the technology/technique/process you are tinkering with is hot, it is very likely that many people will be interested in reading about your experience. If you blog frequently about topics that people are interested in, you’ll steadily increase readership. This could be very good for your career, as well known programmers generally don’t have a hard time finding work. Work usually finds them.

Create something that you will use

What’s the point in going through all of this trouble to create something if you never use it? I’ve created a few applications that I use on a regular basis. Not only did these applications address a need that I was facing at the time, but having a sandbox application that you actually use means you will be much more likely to maintain and enhance it.

Looks great on a resume

Employers love to hire people who show an interest in their field outside of work. I’ve found that people passionate about their field are usually better at their jobs than those who show up at 9, work on the same ol’ stuff for 8 hours, and go home. Having a sandbox application shows people that you love what you do so much that you have dedicated time outside of work to create something that you care about. This especially holds true if you’ve put serious thought into your application, and are excited to show it off to anybody who asks to see it.

Release it, and potentially help others

One of my colleagues once said,

Your parents lied to you. You are not special. There’s millions of people out there just like you.

He wasn’t trying to be mean, this time :) He was simply pointing out that you are not alone. If you are facing a problem, odds are there are hundreds or thousands (or more) of people out there who are facing that same problem. Releasing your application could be helping all of these people. Perhaps it would help them so much that they would be willing to pay for it. Wouldn’t that be nice?

Open source your application

Open sourcing your application can be great for several reasons. Perhaps you’ve just finished migrating your application to the latest version of some framework. Not only can you blog about your experience, but you can share the code with others so that they can see exactly how you did it. This could potentially help others looking to migrate their applications to the same framework. You could get feedback from the community about something you could be doing better, and learn something new. People looking at your code could spot a bug, giving you the opportunity to fix it before it affects you (especially if you use your application). Some employers ask to see source code samples as part of the interview process. What sort of reaction do you think you would get if you immediately spit out several repositories that you owned on GitHub for the prospective employer to browse at their leisure? I’m not sure about you, but I’d be pretty impressed.

However, there is a potential drawback in releasing your code to the world. Ironically, it’s the same as one of the benefits. The world can see your code. This can be a bad thing if your code is full of bugs, or sloppy. So if you plan on releasing your code, take the extra care necessary to ensure that the code reflects your best effort. Your code, and you as a developer, will benefit from the extra TLC.

The next big thing

You never know which crazy, off-the-wall idea will turn into the next big thing. Who would have thought that there was a real need for an application that lets you tell the world what you’re doing right now? If your idea for a sandbox application turns into something with real business value, then you never know where it will end up. Large companies will often spend big bucks to buy great ideas. I’d imagine it would be pretty cool to be on the receiving end of one of those deals.

Summary

Creating a sandbox, and a sandbox application is something that every serious developer should do. If nothing more, it will give you a place to tinker with new technologies and grow as a developer. There is little to no cost to set it up, and your imagination is the only limit.

Why Code Coverage Alone Doesn’t Mean Squat

Agile software development is all the rage these days. One of Agile’s cornerstones is the concept of test driven development (TDD). With TDD, you write the test first, and write only enough code to make the test pass. You then repeat this process until all functionality has been implemented, and all tests pass. TDD leads to more modular, more flexible, and better designed code. It also gives you, by the mere nature of the process, a unit test suite that executes 100% of the code. This can be a very nice thing to have.

However, like most things in life, people often focus on the destination, and pay little attention to the journey required to get there. We as human beings are always looking for shortcuts. Some software managers see 100% code coverage as a must-have, not really caring how that goal is achieved. But it is the journey to 100% code coverage that provides the benefits that most people associate with simply having 100% code coverage. Without taking the correct roads, you can easily create a unit test suite that exercises 100% of your code base, and still end up with a buggy, brittle, and poorly designed code base.

100% code coverage does not mean that your code is bug free. It doesn’t even mean that your code is being properly tested. Let me walk through a very simple example.

I’ve created a class, MathHelper that I want to test. MathHelper has one method, average, that takes a List of Integers.

/**
 * Helper for some simple math operations.
 */
public class MathHelper {

    /**
     * Average a list of integers.
     * 
     * @param integerList The list of integers to average.
     * @return The average of the integers.
     */
    public float average(List<Integer> integerList) {
        ...
    }
}

Caving into managerial pressure to get 100% code coverage, we quickly whip up a test for this class. Abracadabra, and poof! 100% code coverage!

[Screenshot: the coverage tool reporting 100% coverage with a green bar]

So, we’re done. 100% code coverage means our class is adequately tested and bug free. Right? Wrong!

Let’s take a look at the test suite we put together to reach that goal of 100% code coverage.

public class MathHelperTest {

    private MathHelper _testMe;

    @Before
    public void setup() {
        _testMe = new MathHelper();
    }

    @Test
    public void poor_example_of_a_test() {
        List<Integer> nums = Arrays.asList(2, 4, 6, 8);
        _testMe.average(nums);
    }
}

Ugh! What are we really testing here? Not much at all. poor_example_of_a_test simply verifies that the call to average doesn’t throw an exception. That’s hardly a test. Now, this may seem like a contrived example, but I assure you it is not. I have seen several tests like this testing production code, and I assume that you probably have too.

So, let’s fix this test by actually adding a test!

    @Test
    public void a_better_example_of_a_test() {
        List<Integer> nums = Arrays.asList(2, 4, 6, 8);
        float result = _testMe.average(nums);
        assertEquals(5.0, result, 0.0001);
    }

Let’s run it, and see what we get.

java.lang.AssertionError: expected:<5.0> but was:<2.0>

Well, that’s certainly not good. How could the average of 2, 4, 6, and 8 be 2? Let’s take a look at the method under test.

    public float average(List<Integer> integerList) {
        long sum = 0;
        for (int i = 0; i < integerList.size() - 1; i++) {
            sum += integerList.get(i);
        }
        return sum / integerList.size() - 1;
    }

Ok, there’s the bug. We’re not iterating over the full list of integers that we have been passed. Let’s fix it.

    public float average(List<Integer> integerList) {
        long sum = 0;
        for (Integer i : integerList) {
            sum += i;
        }
        // Cast before dividing to avoid truncating integer division
        return (float) sum / integerList.size();
    }

We run the test once again, and verify that our test now passes. That’s better. But let’s take a step back for a second. We had a method with unit tests exercising 100% of the code that still contained this very critical, very basic error.

With this bug now fixed, we commit the code to source control, and push a patch to production. All is fine and dandy until we start getting hammered with bug reports describing NullPointerExceptions and ArithmeticExceptions being thrown from our method. Taking another look at the code above, we realize that we have not done any validation of the input parameter to our method. If the integerList is null, the for loop will throw a NullPointerException when it tries to iterate over the list. If the integerList is an empty list, we will end up trying to divide by 0, giving us an ArithmeticException.

First, let’s write some tests that expose these problems. The average method should probably throw an IllegalArgumentException if the argument is invalid, so let’s write our tests to expect that.

    @Test(expected=IllegalArgumentException.class)
    public void test_average_with_an_empty_list() {
        _testMe.average(new ArrayList<Integer>());
    }
    
    @Test(expected=IllegalArgumentException.class)
    public void test_average_with_a_null_list() {
        _testMe.average(null);
    }

We first run the new tests and verify that they fail, with the method throwing the NullPointerException and ArithmeticException rather than the expected IllegalArgumentException. Now, let’s fix the method.

    public float average(List<Integer> integerList) 
            throws IllegalArgumentException {
        
        if (integerList == null || integerList.isEmpty()) {
            throw new IllegalArgumentException(
                "integerList must contain at least one integer");
        }
        
        long sum = 0;
        for (Integer i : integerList) {
            sum += i;
        }
        // Cast before dividing to avoid truncating integer division
        return (float) sum / integerList.size();
    }

We run the tests again, and verify everything now passes. So, there wasn’t just one bug that slipped in, but three! And, all in code that had 100% code coverage!

As I said in the beginning of the post, having a test suite that exercises 100% of your code can be a very valuable thing. If achieved using TDD, you will see many or all of the benefits I list at the top of the post. Having a solid test suite also shields you from introducing regressions into your code base, letting you find and fix bugs earlier in the development cycle. However, it is very important to remember that the goal is not to have 100% coverage, but to have complete and comprehensive unit tests. If your tests aren’t really testing anything, or the right thing, then they become virtually useless.

Code coverage tools are great for highlighting areas of your code that you are neglecting in your unit testing. However, they should not be used to determine when you are done writing unit tests for your code.