Managing Development Data for a Service Oriented Architecture

A service oriented architecture (SOA) provides many benefits. It allows for better separation of responsibilities. It simplifies deployment by letting you only deploy the services that have changed. It also allows for better scalability, as you can scale out only the services that are being hit the hardest.

However, a SOA does come with some challenges. This blog post addresses one of those challenges: managing a common dataset for a SOA in a development environment.

The Problem

With most SOAs there tends to be some sharing of data between applications. It is common for one application to store a key to data which is owned by another application, so it can fetch more detailed information about that data when necessary. It is also possible that one application may store some data owned by another application locally, to avoid calling the remote service in certain scenarios. Either way, the point is that in most SOAs the applications are interconnected to some degree.

Problems arise with this type of architecture when you attempt to run an application in isolation with an interconnected dataset. At some point, Application A will need to communicate with Application B to perform some task. Unfortunately, simply starting up Application B so Application A can talk to it doesn’t necessarily solve your problem. If Application A is trying to fetch information from Application B by key, and Application B does not have that data (the application datasets are not “in sync”), then the call will obviously fail.

Stubbing Service Calls

Stubbing service calls is one way to approach this issue. If Application A stubs out all calls to Application B, and instead returns some canned response that fulfills Application A’s expectations, then there is no need to worry about making sure Application B’s data is in sync. In fact, Application B doesn’t even need to be running. This greatly simplifies your development environment.
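
For example, with a stubbing library like WebMock, a canned response for one of Application B’s endpoints might look something like this inside a test (the host, path, and payload are made up for illustration):

require 'json'
require 'webmock/rspec'

# Any GET to this hypothetical Application B endpoint now returns a canned
# campaign, so Application B does not need to be running or share our data.
stub_request(:get, "http://app-b.example.com/campaigns/42").to_return(
  status: 200,
  headers: { "Content-Type" => "application/json" },
  body: { id: 42, name: "Test Campaign" }.to_json
)

Application A never actually leaves its own process, which is exactly what makes this approach both convenient and, as described below, risky.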

Stubbing service calls, however, is very difficult to implement successfully.

First, the stubbed data must meet the expectations of the calling application. In some cases, Application A will be requesting very specific data. Application A, for example, may very well expect the data coming back to contain certain elements or property values. So any stubbing mechanism must be smart enough to know what Application A is asking for, and know how to construct a response that will satisfy those expectations. In other words, the response simply can’t contain random data. This means the stubbing mechanism needs to be much more sophisticated (complex).

Calls that mutate data on the remote service are especially difficult to handle. What happens when the application tries to fetch information that it just changed via another service call? If the requests are stubbed, it may appear that the call to mutate the data had no effect. This could lead to buggy behavior in the application.

Also, if you stub out everything, you’re not really testing the inter-application communication process. Since you’re never actually calling the service, stubs will completely hide any changes made to the API your application uses to communicate with the remote service. This could lead to surprises when actually running your code in an environment that makes a real service call.

Using Production Data

In order for service calls to work properly in a development environment, the services must be running with a common dataset. Most people I’ve spoken with accomplish this by downloading and installing each application’s production dataset for use in development. While this is by far the easiest way to get up and running with a common dataset, it comes with a very large risk.

Production datasets typically contain sensitive information. A lost laptop containing production data could easily turn into a public relations disaster for your company, and more importantly it could lead to severe problems for your customers. If you’ve ever had personal information lost by a third party then you know what this feels like. Even if your hard drive is encrypted, there is still a chance that a thief could gain access to the data (unless some sort of smart card or biometric authentication system is used). The best way to prevent sensitive information from being stolen is to keep it locked up and secure, on a production server.

Using Scrubbed Production Data

Using a production dataset that has been scrubbed of sensitive information is also an option. This approach will get you a standardized dataset, without the risk of potentially losing sensitive information (assuming your data scrubbing process is free of errors).

However, if your dataset is very large, this may not be a feasible option. My MacBook Pro has a 256GB SSD, and I know of datasets that are considerably larger than that. In addition, you have less control over what your test data looks like, which could make it harder to test certain scenarios.

Creating a Standardized Dataset

The approach we’ve taken at Centro to address this issue is to create a common dataset that individual applications can use to populate their database. The common dataset consists of a series of YAML files, and is stored in a format that is not specific to any particular application. The YAML files are all stored together, with the thought that conflicting data is less likely to be introduced if all of the data lives in the same place.

The YAML files may also contain ERB snippets. We are currently using ERB snippets to specify dates.

- id: 1
  name: Test Campaign
  start_date: <%= 4.months.ago.to_date %>
  end_date: <%= 3.months.ago.to_date %>

Specifying relative dates using ERB, instead of hard coding them, gives us a dataset that will not grow stale with time. Simply re-seeding your database with the common dataset will give you a current dataset.

Manually creating the standardized dataset also lets us expose edge cases that are rarely seen in production data, so we can better test how the application handles them.

Importing the Standardized Data into the Application’s Database

A collection of YAML files by itself is useless to the application. We need some way of getting that data out of the YAML files and into the application’s database.

Each of our applications has a Rake task that reads the YAML files that contain the data it cares about, and imports that data into the database by creating an instance of the model object that represents the data.

This process can be fairly cumbersome. Since the data in the YAML files is stored in a format that is not specific to any particular application, attribute names will often need to be modified in order to match the application’s data model. It is also possible that attributes in the standardized dataset will need to be dropped, since they are unused by this particular application.

We solved this and related issues by building a small library that is responsible for reading the YAML files and providing the Rake tasks with an easy-to-use API for building model objects from the data contained in the YAML files. The library provides methods to iterate over the standardized data, map attribute names, remove unused attributes, and find related data (perhaps in another data file). This API greatly simplifies the application’s Rake task.

In the code below, we are iterating over all of the data in the campaigns.yml file, and creating an instance of our Campaign object with that data.

require 'application_seeds'

namespace :application_seeds do
  desc 'Dump the development database and load it with standardized application seed data'
  task :load, [:dataset] => ['db:drop', 'db:create', 'db:migrate', :environment] do |t, args|
    ApplicationSeeds.dataset = args[:dataset]
    seed_campaigns
  end

  def seed_campaigns
    ApplicationSeeds.campaigns.each do |id, attributes|
      ApplicationSeeds.create_object!(Campaign, id, attributes)
    end
  end
end
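
If the standardized attribute names do not line up with the application’s data model, the attributes can be adjusted before the object is created. The snippet below is an illustrative sketch only: the line_items.yml file and its attribute names are made up, and the renaming is done inline rather than through the library’s mapping helpers.

  def seed_line_items
    ApplicationSeeds.line_items.each do |id, attributes|
      mapped = attributes.dup

      # This application calls the attribute "name"; the shared dataset calls it "title"
      mapped["name"] = mapped.delete("title")

      # Drop attributes from the shared dataset that this application does not use
      mapped.delete("external_notes")

      ApplicationSeeds.create_object!(LineItem, id, mapped)
    end
  end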

With the Rake task in place, getting all of the applications up and running with a standardized dataset is as simple as requiring the seed data library (and the data itself) in the application’s Gemfile, and running the Rake task to import the data.
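
For example, the Gemfile entries might look something like this (application_seeds is the library described below; the gem that packages the shared YAML data is hypothetical):

# Gemfile
gem 'application_seeds'
gem 'our_seed_data'   # hypothetical gem containing the shared YAML files

Running the Rake task defined above (rake application_seeds:load[some_dataset], where the argument selects which dataset to load) then drops, recreates, migrates, and seeds the development database.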

The application_seeds library

The library we created to work with our standardized YAML files, called application_seeds, has been open sourced, and is available on GitHub at https://github.com/centro/application_seeds.

Drawbacks to this Approach

Making it easy to perform real service calls in development can be a double-edged sword. On one hand, it greatly simplifies working with a SOA. On the other hand, it makes it much easier to perform service calls and ignore the potential issues that come with calling out to a remote service. Service calls should be limited, and all calling code should be capable of handling the issues that may result from a call to a remote service (the service being unavailable, high latency, etc.).
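
As a rough sketch of the kind of defensive calling code meant here (the service host, path, and fallback behavior are made up for illustration), even a simple HTTP call should set explicit timeouts and expect failure:

require 'net/http'

def fetch_campaign(id)
  uri = URI("http://app-b.example.com/campaigns/#{id}")

  # Fail fast instead of hanging when the remote service is slow
  Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 5) do |http|
    response = http.get(uri.path)
    response.is_a?(Net::HTTPSuccess) ? response.body : nil
  end
rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, SocketError
  # The remote service is unavailable or too slow -- degrade gracefully
  nil
end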

Another drawback is that test data is no substitute for real data. No matter how carefully it is constructed, the test dataset will never contain all of the possible combinations that a production dataset will have. So, it is still a good idea to test with a production dataset. However, that testing should be done in a secure environment, where there is no risk of the data being lost (like a staging environment).

Why Code Coverage Alone Doesn’t Mean Squat

Agile software development is all the rage these days. One of Agile’s cornerstones is the concept of test driven development (TDD). With TDD, you write the test first, and write only enough code to make the test pass. You then repeat this process until all functionality has been implemented, and all tests pass. TDD leads to more modular, more flexible, and better designed code. It also gives you, by the mere nature of the process, a unit test suite that executes 100% of the code. This can be a very nice thing to have.

However, like most things in life, people often focus on the destination, and pay little attention to the journey required to get there. We as human beings are always looking for shortcuts. Some software managers see 100% code coverage as a must-have, not really caring how that goal is achieved. But it is the journey to 100% code coverage that provides the benefits that most associate with simply having 100% code coverage. Without taking the correct roads, you can easily create a unit test suite that exercises 100% of your code base, and still end up with a buggy, brittle, and poorly designed code base.

100% code coverage does not mean that your code is bug free. It doesn’t even mean that your code is being properly tested. Let me walk through a very simple example.

I’ve created a class, MathHelper, that I want to test. MathHelper has one method, average, that takes a List of Integers.

/**
 * Helper for some simple math operations.
 */
public class MathHelper {

    /**
     * Average a list of integers.
     * 
     * @param integerList The list of integers to average.
     * @return The average of the integers.
     */
    public float average(List<Integer> integerList) {
        ...
    }
}

Caving into managerial pressure to get 100% code coverage, we quickly whip up a test for this class. Abracadabra, and poof! 100% code coverage!

[Image: code coverage report showing a solid green bar at 100% coverage]

So, we’re done. 100% code coverage means our class is adequately tested and bug free. Right? Wrong!

Let’s take a look at the test suite we put together to reach that goal of 100% code coverage.

public class MathHelperTest {

    private MathHelper _testMe;

    @Before
    public void setup() {
        _testMe = new MathHelper();
    }

    @Test
    public void poor_example_of_a_test() {
        List<Integer> nums = Arrays.asList(2, 4, 6, 8);
        _testMe.average(nums);
    }
}

Ugh! What are we really testing here? Not much at all. poor_example_of_a_test is simply verifying that the call to average doesn’t throw an exception. That’s not much of a test at all. Now, this may seem like a contrived example, but I assure you it is not. I have seen several tests like this testing production code, and I assume that you probably have too.

So, let’s fix this test by actually asserting something!

    @Test
    public void a_better_example_of_a_test() {
        List<Integer> nums = Arrays.asList(2, 4, 6, 8);
        float result = _testMe.average(nums);
        assertEquals(5.0, result, 0.001);
    }

Let’s run it, and see what we get.

java.lang.AssertionError: expected:<5.0> but was:<2.0>

Well, that’s certainly not good. How could the average of 2, 4, 6, and 8 be 2? Let’s take a look at the method under test.

    public float average(List<Integer> integerList) {
        long sum = 0;
        for (int i = 0; i < integerList.size() - 1; i++) {
            sum += integerList.get(i);
        }
        return sum / integerList.size() - 1;
    }

Ok, there’s the bug. We’re not iterating over the full list of integers that we have been passed. Let’s fix it.

    public float average(List<Integer> integerList) {
        long sum = 0;
        for (Integer i : integerList) {
            sum += i;
        }
        return sum / integerList.size();
    }

We run the test once again, and verify that our test now passes. That’s better. But, let’s take a step back for a second. We had a method with unit tests exercising 100% of the code that still contained this very critical, very basic error.

With this bug now fixed, we commit the code to source control, and push a patch to production. All is fine and dandy until we start getting hammered with bug reports describing NullPointerExceptions and ArithmeticExceptions being thrown from our method. Taking another look at the code above, we realize that we have not done any validation of the input parameter to our method. If the integerList is null, the for loop will throw a NullPointerException when it tries to iterate over the list. If the integerList is an empty list, we will end up trying to divide by 0, giving us an ArithmeticException.

First, let’s write some tests that expose these problems. The average method should probably throw an IllegalArgumentException if the argument is invalid, so let’s write our tests to expect that.

    @Test(expected=IllegalArgumentException.class)
    public void test_average_with_an_empty_list() {
        _testMe.average(new ArrayList<Integer>());
    }
    
    @Test(expected=IllegalArgumentException.class)
    public void test_average_with_a_null_list() {
        _testMe.average(null);
    }

We first run the new tests and verify that they fail, since the method currently throws a NullPointerException and an ArithmeticException instead of the IllegalArgumentException we expect. Now, let’s fix the method.

    public float average(List<Integer> integerList) 
            throws IllegalArgumentException {
        
        if (integerList == null || integerList.isEmpty()) {
            throw new IllegalArgumentException(
                "integerList must contain at least one integer");
        }
        
        long sum = 0;
        for (Integer i : integerList) {
            sum += i;
        }
        return sum / integerList.size();
    }

We run the tests again, and verify everything now passes. So, there wasn’t just one bug that slipped in, but three! And, all in code that had 100% code coverage!

As I said in the beginning of the post, having a test suite that exercises 100% of your code can be a very valuable thing. If achieved using TDD, you will see many or all of the benefits I list at the top of the post. Having a solid test suite also shields you from introducing regressions into your code base, letting you find and fix bugs earlier in the development cycle. However, it is very important to remember that the goal is not to have 100% coverage, but to have complete and comprehensive unit tests. If your tests aren’t really testing anything, or the right thing, then they become virtually useless.

Code coverage tools are great for highlighting areas of your code that you are neglecting in your unit testing. However, they should not be used to determine when you are done writing unit tests for your code.

The Google Chart API

At work last week, I spent some time helping to write a small tool to performance and capacity test our applications. We thought it would be great if, at the end of the test run, the tool would generate a series of graphs to display the trends of certain key metrics over the course of the test. A quick search of the web for charting tools led me to the Google Chart API.

The Google Chart API is a very slick tool for generating all sorts of graphs. You build the graphs by specifying all of the data for the graph, the graph type, and the graph metadata in the URL. The URLs get a little nasty, but in all fairness, there is a ton of information being conveyed in the URL. So, it’s going to be nasty no matter what you do.
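
For example, a simple line chart of a few made-up data points can be requested with a URL along these lines (cht selects the chart type, chs the image size, chd the data, and chtt the title; the response is a PNG image):

https://chart.googleapis.com/chart?cht=lc&chs=400x200&chd=t:10,25,18,40&chtt=Response+Time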

My only concern was relying on an external web service to generate the graphs. Since this is just a development tool, it wouldn’t be the end of the world if the Google Chart API went away; we would simply adjust the tool to use some other graphing library. But we wouldn’t want the old charts from previous runs to simply “disappear”. So, instead of using the Google Chart API URL directly in the reports, I do a wget on the URL, store the image on my file system, and reference that local image in the reports.
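
Rather than shelling out to wget, the same caching step can also be done in a couple of lines of Ruby (the URL and paths are made up for illustration):

require 'net/http'

chart_url = "https://chart.googleapis.com/chart?cht=lc&chs=400x200&chd=t:10,25,18,40"

# Fetch the generated PNG once and keep a local copy for the report to reference
File.binwrite("reports/images/response_time.png", Net::HTTP.get(URI(chart_url)))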

All in all, the Google Chart API is a very slick, easy to use tool. I’d recommend checking it out next time you need to quickly throw together some graphs.

Falling in Love with DSLs

We were recently given a free day at work to hack on a project that was outside the realm of our normal responsibilities, yet could still be beneficial to the company. We were encouraged to be creative, explore ideas that interested us, and see if we could come up with something to demo at the end of the day.

Service level testing, or functional testing, has been a hot topic at work recently. It’s no secret that we have a very large SOA at Orbitz, powered by Jini. Services, calling services, calling services…you get the picture. Historically, we have not been the best at automating the testing of these services. That is beginning to change. About two years ago, the team I work on developed a test tool that allows us to interact with our services at a very high level, keeping us out of the nitty-gritty details when invoking a service. It’s a command line based tool with a very simple, intuitive syntax. All of the details are accessible if we need them, but more often than not they just get in the way. In fact, we enjoyed working at this level of abstraction so much that I wrote a functional test framework (named jwoodunit by my co-workers) that drove service level tests, written in this same “language”, against our services. It allowed us to pump out service level tests in very short order that were easy to read and easy to maintain. It has only just occurred to me that what we had created was really a Domain Specific Language (DSL).

We have a fairly large quality assurance team that is made up mostly of non-developers. Most of the testing that is done is manual, or driven by an automation tool similar to Selenium. The problem is that the manual testing is slow and unreliable, and tools like Selenium tend to be brittle, since they are based on the layout of the HTML page. That approach also prevents you from testing any service that is not directly accessible through the web application. So, for my project, I wanted to see if I could take my team’s DSL, clean it up even more (it is still very “programmy”), and give it to our quality assurance team so that they could test our services in a more reliable fashion.

What I ended up with was a very high level, English-like DSL that I call Trastel (Travel Service Testing Language). Trastel is implemented in Ruby. In fact, it’s safe to say that Trastel is Ruby. I didn’t implement a new language. I simply took advantage of Ruby’s fantastic metaprogramming capabilities to extend the language with functionality that is needed by the tests. An example test is worth 1000 words:

search.flights.on("orbitz").where(
   :origin => "ORD",
   :destination => "LAX",
   :departure_date => "2008/12/10",
   :return_date => "2008/12/15"
)

verify_at_least_one_result

foreach_result do
  verify_equal "ORD", origin
end

This does exactly what you’d expect. It searches for flights on Orbitz, flying from Chicago’s O’Hare airport to Los Angeles International Airport on 2008/12/10, returning on 2008/12/15. We then verify that at least one result came back, and that the origin of each flight is O’Hare. That’s it.

Implementing this test in Ruby was a breeze. Since everything is an object in Ruby, dynamically adding methods to the Object class gives us the ability to create pseudo-keywords like “search” and “foreach_result”. Trastel also sets an attribute named @response on the test, so the code that implements the verify methods can just check that, instead of requiring you to specify that you’re checking the response of the service call. foreach_result will iterate over the @response if it is an array, yielding to the specified code block for each item in that array, giving us an easy way to check each element. The last bit of magic surrounds the “origin” method. “origin” isn’t a method on the test object. It’s a method on the type contained in the @response. Thanks to Ruby’s method_missing, I can forward that method call onto the object in the @response for it to tell me what origin means. Nice and easy.
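
To make that concrete, here is a rough sketch of the kind of plumbing described above. It is illustrative only, not Trastel’s actual implementation; the module and variable names are made up.

module TrastelKeywords
  # Pseudo-keywords are available in every test because this module is
  # mixed into Object
  def verify_at_least_one_result
    raise "expected at least one result, got none" if Array(@response).empty?
  end

  def verify_equal(expected, actual)
    raise "expected #{expected.inspect}, got #{actual.inspect}" unless expected == actual
  end

  # Iterate over the items in @response, yielding once per item
  def foreach_result
    Array(@response).each do |result|
      @current_result = result
      yield
    end
  end

  # Forward unknown methods (like "origin") to the current result object
  def method_missing(name, *args, &block)
    if @current_result && @current_result.respond_to?(name)
      @current_result.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    (@current_result && @current_result.respond_to?(name, include_private)) || super
  end
end

class Object
  include TrastelKeywords
end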

Another great thing about Trastel is that it sucks in Active Support. Active Support brings with it a wealth of extensions to the core Ruby classes, letting us do something like this:

search.flights.on("orbitz").where(
   :origin => "ORD",
   :destination => "LAX",
   :departure_date => 4.weeks.from_now,
   :return_date => 6.weeks.from_now
)

We still do all of the heavy lifting (creation of the service request, execution of the remote service, etc) in Java, using JRuby to get the Ruby code to play nice with the Java code. But, the DSL serves as an excellent means to collect the data needed to drive the test, call the Java code to invoke the service with that data, and verify the data in the response matches what was expected. It successfully keeps the person writing the test from having to know anything about actually invoking the service. Writing tests at this level of abstraction also shields us from the majority of service API changes that may come our way.

With all of the benefits that I can already see with this still very immature DSL, I’m kicking myself for not considering DSLs more seriously in the past. They have been around quite some time, and have had many well know developers singing their praises. I’m looking to see what other problems I’m currently facing that could be solved a little easier by letting me code a little closer to the domain. Had I know that it would be this simple, maybe I would have tried it a long time ago.