Why Code Coverage Alone Doesn’t Mean Squat

Agile software development is all the rage these days. One of Agile’s cornerstones is the concept of test-driven development (TDD). With TDD, you write the test first, and write only enough code to make the test pass. You then repeat this process until all functionality has been implemented, and all tests pass. TDD leads to more modular, more flexible, and better-designed code. It also gives you, by the mere nature of the process, a unit test suite that executes 100% of the code. This can be a very nice thing to have.

However, like most things in life, people often focus on the destination, and pay little attention to the journey required to get there. We as human beings are always looking for shortcuts. Some software managers see 100% code coverage as a must-have, not really caring how that goal is achieved. But it is the journey to 100% code coverage that provides the benefits most people associate with simply having 100% code coverage. Without taking the correct roads, you can easily create a unit test suite that exercises 100% of your code base, and still end up with a buggy, brittle, and poorly designed code base.

100% code coverage does not mean that your code is bug free. It doesn’t even mean that your code is being properly tested. Let me walk through a very simple example.

I’ve created a class, MathHelper, that I want to test. MathHelper has one method, average, that takes a List of Integers.

/**
 * Helper for some simple math operations.
 */
public class MathHelper {

    /**
     * Average a list of integers.
     * 
     * @param integerList The list of integers to average.
     * @return The average of the integers.
     */
    public float average(List<Integer> integerList) {
        ...
    }
}

Caving in to managerial pressure to get 100% code coverage, we quickly whip up a test for this class. Abracadabra, and poof! 100% code coverage!

[Screenshot: code coverage report showing a green bar at 100%]

So, we’re done. 100% code coverage means our class is adequately tested and bug free. Right? Wrong!

Let’s take a look at the test suite we put together to reach that goal of 100% code coverage.

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.List;

import org.junit.Before;
import org.junit.Test;

public class MathHelperTest {

    private MathHelper _testMe;

    @Before
    public void setup() {
        _testMe = new MathHelper();
    }

    @Test
    public void poor_example_of_a_test() {
        List<Integer> nums = Arrays.asList(2, 4, 6, 8);
        _testMe.average(nums);
    }
}

Ugh! What are we really testing here? Not much. poor_example_of_a_test simply verifies that the call to average doesn’t throw an exception. That’s not much of a test at all. Now, this may seem like a contrived example, but I assure you it is not. I have seen tests just like this against production code, and I assume that you probably have too.

So, let’s fix this test by actually adding a test!

    @Test
    public void a_better_example_of_a_test() {
        List<Integer> nums = Arrays.asList(2, 4, 6, 8);
        float result = _testMe.average(nums);
        assertEquals(5.0f, result, 0.001f);
    }

Let’s run it, and see what we get.

java.lang.AssertionError: expected:<5.0> but was:<2.0>

Well, that’s certainly not good. How could the average of 2, 4, 6, and 8 be 2? Let’s take a look at the method under test.

    public float average(List<Integer> integerList) {
        long sum = 0;
        for (int i = 0; i < integerList.size() - 1; i++) {
            sum += integerList.get(i);
        }
        return sum / integerList.size() - 1;
    }

Ok, there’s the bug. We’re not iterating over the full list of integers that we have been passed, and, for good measure, the return statement subtracts an extra 1 from the result. Let’s fix it.

    public float average(List<Integer> integerList) {
        long sum = 0;
        for (Integer i : integerList) {
            sum += i;
        }
        return sum / integerList.size();
    }

We run the test once again, and verify that our test now passes. That’s better. But let’s take a step back for a second. We had a method with unit tests exercising 100% of the code that still contained this very critical, very basic error.

With this bug now fixed, we commit the code to source control, and push a patch to production. All is fine and dandy until we start getting hammered with bug reports describing NullPointerExceptions and ArithmeticExceptions being thrown from our method. Taking another look at the code above, we realize that we have not done any validation of the input parameter. If integerList is null, the for loop will throw a NullPointerException as soon as it tries to iterate over the list. If integerList is an empty list, integerList.size() is 0, and the division will throw an ArithmeticException.

First, let’s write some tests that expose these problems. The average method should probably throw an IllegalArgumentException if the argument is invalid, so let’s write our tests to expect that.

    @Test(expected=IllegalArgumentException.class)
    public void test_average_with_an_empty_list() {
        _testMe.average(new ArrayList<Integer>());
    }
    
    @Test(expected=IllegalArgumentException.class)
    public void test_average_with_a_null_list() {
        _testMe.average(null);
    }

We first verify that the new tests fail: the method still throws the NullPointerException and ArithmeticException from the bug reports rather than the IllegalArgumentException the tests expect. Now, let’s fix the method.

    public float average(List<Integer> integerList) 
            throws IllegalArgumentException {
        
        if (integerList == null || integerList.isEmpty()) {
            throw new IllegalArgumentException(
                "integerList must contain at least one integer");
        }
        
        long sum = 0;
        for (Integer i : integerList) {
            sum += i;
        }
        return sum / integerList.size();
    }

We run the tests again, and verify everything now passes. So, there wasn’t just one bug that slipped in, but three! And, all in code that had 100% code coverage!
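In fact, even our final version of average quietly illustrates the point one more time: sum / integerList.size() is integer (long) division, so the fractional part of the average is truncated before the result is converted to a float. None of the tests above catch it, because 2, 4, 6, and 8 happen to average to a whole number. Here is a sketch of the kind of test a more comprehensive suite would include:

    @Test
    public void test_average_with_a_fractional_result() {
        // 1 + 2 = 3, and 3 / 2 should be 1.5, but the long division
        // inside average truncates it to 1.0 before converting to float
        List<Integer> nums = Arrays.asList(1, 2);
        assertEquals(1.5f, _testMe.average(nums), 0.001f);
    }

That test fails against the method as written; the fix is a single cast (return (float) sum / integerList.size();). Coverage told us we were done long before we actually were.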

As I said in the beginning of the post, having a test suite that exercises 100% of your code can be a very valuable thing. If achieved using TDD, you will see many or all of the benefits I list at the top of the post. Having a solid test suite also shields you from introducing regressions into your code base, letting you find and fix bugs earlier in the development cycle. However, it is very important to remember that the goal is not to have 100% coverage, but to have complete and comprehensive unit tests. If your tests aren’t really testing anything, or the right thing, then they become virtually useless.

Code coverage tools are great for highlighting areas of your code that you are neglecting in your unit testing. However, they should not be used to determine when you are done writing unit tests for your code.

The Google Chart API

At work last week, I spent some time helping to write a small tool to performance- and capacity-test our applications. We thought it would be great if, at the end of the test run, the tool would generate a series of graphs to display the trends of certain key metrics over the course of the test. A quick search of the web for charting tools led me to the Google Chart API.

The Google Chart API is a very slick tool for generating all sorts of graphs. You build the graphs by specifying all of the data for the graph, the graph type, and the graph metadata in the URL. The URLs get a little nasty, but in all fairness, there is a ton of information being conveyed in the URL. So, it’s going to be nasty no matter what you do.
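For example, a simple line chart can be requested with a URL along these lines (illustrative values; cht selects the chart type, chs the image size, and chd carries the data points):

    http://chart.apis.google.com/chart?cht=lc&chs=400x200&chd=t:10,25,40,80

Every aspect of the chart is expressed as yet another URL parameter, which is exactly why the URLs balloon in size.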

My only concern was relying on an external web service to generate the graphs. Since this is just a development tool, it wouldn’t be the end of the world if the Google Chart API went away. We would simply adjust the tool to use some other graphing library. And, as for all of the old charts from previous runs…we wouldn’t want all of those graphs simply “disappearing”. So, instead of using the Google Chart API URL directly in the reports, I do a wget on the URL, and store the image on my file system. I then reference that local image in the reports.
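The wget step is just as easy to do from Java, if you’d rather keep the tool self-contained. A minimal sketch, assuming a chart URL and output path like the illustrative ones above:

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ChartFetcher {
    public static void main(String[] args) throws Exception {
        // illustrative URL; the real tool would build this from the run's metrics
        String chartUrl = "http://chart.apis.google.com/chart"
                + "?cht=lc&chs=400x200&chd=t:10,25,40,80";
        try (InputStream in = new URL(chartUrl).openStream()) {
            // keep a local copy so old reports survive if the API ever goes away
            Files.copy(in, Paths.get("reports/chart.png"),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}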

All in all, the Google Chart API is a very slick, easy to use tool. I’d recommend checking it out next time you need to quickly throw together some graphs.

Falling in Love with DSLs

We were recently given a free day at work to hack on a project that was outside the realm of our normal responsibilities, yet could still be beneficial to the company. We were encouraged to be creative, explore ideas that interested us, and see if we could come up with something to demo at the end of the day.

Service level testing, or functional testing, has been a hot topic at work recently. It’s no secret that we have a very large SOA at Orbitz, powered by Jini. Services, calling services, calling services…you get the picture. Historically, we have not been the best at automating the testing of these services. That is beginning to change. About two years ago, the team I work on developed a test tool that allows us to interact with our services at a very high level, keeping us out of the nitty-gritty details when invoking a service. It’s a command line based tool with a very simple, intuitive syntax. All of the details are accessible if we need them, but more often than not they just get in the way. In fact, we enjoyed working at this level of abstraction so much that I wrote a functional test framework (dubbed jwoodunit by my co-workers) that drove service level tests, written in this same “language”, against our services. It allowed us to pump out, in very short order, tests that were easy to read and easy to maintain. It has only just occurred to me that what we had created was really a Domain Specific Language (DSL).

We have a fairly large quality assurance team that is made up mostly of non-developers. Most of the testing that is done is manual, or driven by an automation tool similar to Selenium. The problem is that manual testing is slow and unreliable, and tools like Selenium tend to be brittle, since they depend on the layout of the HTML page. Driving tests through the browser also prevents you from testing any service that is not directly accessible through the web application. So, for my project, I wanted to see if I could take my team’s DSL, clean it up even more (it is still very “programmy”), and give it to our quality assurance team so that they could test our services in a more reliable fashion.

What I ended up with was a very high level, English-like DSL that I call Trastel (Travel Service Testing Language). Trastel is implemented in Ruby. In fact, it’s safe to say that Trastel is Ruby. I didn’t implement a new language. I simply took advantage of Ruby’s fantastic metaprogramming capabilities to extend the language with functionality that is needed by the tests. An example test is worth 1000 words:

search.flights.on("orbitz").where(
   :origin => "ORD",
   :destination => "LAX",
   :departure_date => "2008/12/10",
   :return_date => "2008/12/15"
)

verify_at_least_one_result

foreach_result do
  verify_equal "ORD", origin
end

This does exactly what you’d expect. It searches flights on Orbitz, flying from Chicago’s O’Hare airport to Los Angeles International Airport on 2008/12/10, returning on 2008/12/15. We then verify at least one result came back, and that the origin of each flight is O’Hare. That’s it.

Implementing this test in Ruby was a breeze. Since everything is an object in Ruby, dynamically adding methods to the Object class gives us the ability to create pseudo-keywords like “search” and “foreach_result”. Trastel also sets an attribute named @response on the test, so the code that implements the verify methods can just check that, instead of requiring you to spell out that you’re checking the response of the service call. foreach_result will iterate over @response if it is an array, yielding to the specified code block for each item in that array, giving us an easy way to check each element. The last bit of magic surrounds the “origin” method. “origin” isn’t a method on Object. It’s a method on the type contained in @response. Thanks to Ruby’s method_missing hook, I can forward that call on to the object in @response and let it tell me what origin means. Nice and easy.

Another great thing about Trastel is that it sucks in Active Support. Active Support brings with it a wealth of extensions to the core Ruby classes, letting us do something like this:

search.flights.on("orbitz").where(
   :origin => "ORD",
   :destination => "LAX",
   :departure_date => 4.weeks.from_now,
   :return_date => 6.weeks.from_now
)

We still do all of the heavy lifting (creation of the service request, execution of the remote service, etc) in Java, using JRuby to get the Ruby code to play nice with the Java code. But, the DSL serves as an excellent means to collect the data needed to drive the test, call the Java code to invoke the service with that data, and verify the data in the response matches what was expected. It successfully keeps the person writing the test from having to know anything about actually invoking the service. Writing tests at this level of abstraction also shields us from the majority of service API changes that may come our way.
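As a rough sketch of how thin that Java/Ruby seam can be, here is one way the Java side might bootstrap a Trastel script using JRuby’s embedding API (TrastelServices and the script path are hypothetical names invented for the example, not the actual tool):

import org.jruby.embed.PathType;
import org.jruby.embed.ScriptingContainer;

public class TrastelRunner {

    // hypothetical facade wrapping request creation and remote service invocation
    public static class TrastelServices {
    }

    public static void main(String[] args) {
        ScriptingContainer ruby = new ScriptingContainer();
        // hand the Ruby DSL a Java object that knows how to invoke our services
        ruby.put("trastel_services", new TrastelServices());
        // run a Trastel test script; its pseudo-keywords call back into the facade
        ruby.runScriptlet(PathType.RELATIVE, "tests/flight_search_test.rb");
    }
}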

With all of the benefits that I can already see with this still very immature DSL, I’m kicking myself for not considering DSLs more seriously in the past. They have been around for quite some time, and have had many well-known developers singing their praises. I’m looking to see what other problems I’m currently facing that could be solved a little more easily by letting me code a little closer to the domain. Had I known that it would be this simple, maybe I would have tried it a long time ago.

Testing your software

Over the past couple of years, I have taken quite an interest in software testing. I’ve been reading up on the subject and playing around with different testing tools and techniques, trying to find ways to better test my software. I know from experience that many developers write off testing as boring, pointless, or too difficult. However, I don’t believe that any of these arguments pans out. I’ve also come to better understand the need for comprehensive tests that run on their own and report any problems that they find.

I’ve found that some of the challenges I’ve faced trying to test my team’s production code have been as difficult as, or more difficult than, solving some of our business problems. Several of these challenges have been very fun to solve. Not only did I have to work out how to test a given piece of code (and those questions were numerous), but also how to go about automating all of our different tests. I am also currently looking into ways to make these automated tests run faster, so we are notified quickly if our continuous integration build breaks.

As for testing being pointless, well, that argument just doesn’t hold up at all. Sure, comprehensive test coverage, even 100% coverage, will not prevent bugs from creeping their way into your code. But, assuming that you write a new test for each bug you find, it will ensure that once you squash a bug, it never comes back. A good test suite acts as a safety net, letting you know immediately if code you are introducing breaks existing code. And those are just the functional benefits. Good tests also serve as excellent documentation on how to use your class or your system. Unlike documentation that lives apart from the code, this documentation never goes out of date; if it does, a test usually breaks to tell you.

Testing can be challenging, but it is far from impossible. In fact, the more you do it, the easier it gets. Testing can be much more difficult on code that was never designed to be tested. Refactoring is the best way to make your code more testable. And, as a side benefit, making your code more testable usually means decoupling it from the rest of the system (always a good thing). However, due to time constraints or other issues, refactoring to this degree may not always be possible. In that case, several tools exist to make testing possible. Mocking tools like EasyMock, jMockit, and JMock allow you to mock out parts of your system that are not decoupled via an interface. Although this is not ideal, it does provide a means for you to test your previously untestable code. If your code is already written to use interfaces over implementations, dependency injection tools like Spring and Guice give you a great means to inject mock objects into your code at runtime.
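To make that concrete, here is a minimal sketch of a mock-based test using EasyMock’s classic API (FareService and PricingClient are hypothetical types invented for the example):

import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class PricingClientTest {

    // hypothetical collaborator we want to keep out of the test
    interface FareService {
        float lookupFare(String origin, String destination);
    }

    // hypothetical class under test, written against the interface
    static class PricingClient {
        private final FareService fares;
        PricingClient(FareService fares) { this.fares = fares; }
        float quoteWithTax(String origin, String destination) {
            return fares.lookupFare(origin, destination) * 1.10f;
        }
    }

    @Test
    public void quotes_fare_plus_tax_without_a_real_service() {
        // record the expected interaction, then switch the mock to replay mode
        FareService mock = createMock(FareService.class);
        expect(mock.lookupFare("ORD", "LAX")).andReturn(100.0f);
        replay(mock);

        assertEquals(110.0f, new PricingClient(mock).quoteWithTax("ORD", "LAX"), 0.001f);
        verify(mock);
    }
}

Because PricingClient depends on an interface, the mock drops in at construction time with no special tooling; this is exactly the seam that dependency injection containers like Spring and Guice let you exploit at runtime.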

You’ve probably noticed that I have not favored one form of testing over another (unit, integration, performance, etc.). That is because everything above applies to all forms of testing. And, in addition to what is listed here, each of these testing types has its own list of benefits. My next several posts will each focus on a particular testing type, and talk about that type in much more detail.

Stay tuned!