The Beauty of Redis

If I had to name a single piece of software that has impressed me lately, it would undoubtedly be Redis. Not only is this key/value store on steroids blazing fast, it is also very simple and incredibly powerful.


How simple, you ask?

redis> SET mykey "Hello"
redis> GET mykey

That simple.

It’s also a breeze to install and get up and running. The suggested way of installing Redis isn’t to fetch some pre-compiled package for your Linux distribution. It is to download the source code (a tiny 655K tarball) and build it yourself! This can be a real crap shoot for most software, but since Redis only depends on a working GCC compiler and libc, it is not an issue at all. It just works.

After it is installed, you can start it by simply running

redis-server

at the command line. The quickstart guide also has some easy-to-follow instructions on how to start Redis automatically at boot as a daemon.


Redis is a very powerful piece of software. This power, I believe, is a direct result of its simplicity.

Redis is so much more than your run-of-the-mill key/value store. In fact, calling it a key/value store would be like calling the Lamborghini Aventador a car. It would be far more accurate to call Redis a key/data structure store, because Redis natively supports hashes, lists, sets, and sorted sets as well. These data structures are all first-class citizens in the Redis world. Redis provides a host of commands for directly manipulating the data in these structures, covering pretty much any operation you would want to perform on a hash, list, set, or sorted set. It is therefore super simple to increment the value of a key in a hash, push multiple values onto the end of a list, trim a list to a specified range, perform a union between two sets, or even return a range of members in a sorted set, by score, with scores ordered from high to low.
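
Each of those operations is a single command. For example (the key names here are made up):

redis> HINCRBY user:1000 visits 1
redis> RPUSH mylist "a" "b" "c"
redis> LTRIM mylist 0 99
redis> SUNION set1 set2
redis> ZREVRANGEBYSCORE scores 100 0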

This native support for data structures, combined with Redis’ incredible performance, makes it an excellent complement to a relational database. Every once in a while we’ll run into an issue where, despite our best efforts, our relational database simply isn’t cutting the mustard performance-wise for a certain task. Time and time again, we’ve been able to successfully delegate these tasks to Redis.

Here are some examples of what we are currently using Redis for at Signal:

  • Distributed locking. The SETNX command (set value of key if key does not exist) can be used as a locking primitive, and we use it to ensure that certain tasks are executed sequentially in our distributed system.
  • Persistent counters. Redis, unlike memcache, can persist data to disk. This is important when dealing with counters or other values that can’t easily be pulled from another source, like the relational database.
  • Reducing load on the relational database. Creative use of Redis and its data structures can help with operations that may be expensive for a relational database to handle on its own.
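
As a rough sketch of the locking pattern (the key name is made up, and this simplified version ignores lock expiry, which you would want in production to recover from crashed lock holders):

redis> SETNX lock:nightly-report "worker-1"
redis> DEL lock:nightly-report

SETNX returns 1 if the key was set, meaning the lock was acquired, and 0 if the key already exists, meaning someone else holds the lock. Deleting the key releases the lock.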

When Not To Use Redis

Redis stores everything in RAM. That’s one of the reasons why it is so fast. However, it is something you should keep in mind before deciding to store large amounts of data in Redis.

Redis is not a relational database. While it is certainly possible to store the keys of data as the values of other data, there is nothing to ensure the integrity of this data (what a foreign key would do in a relational database). There is also no way to search for data other than by key. Again, while it is possible to build and maintain your own indexes, Redis will not do this for you. So, if you’re looking to store relational data, you should probably stick with a relational database.


It is very clear that the Redis team has put a ton of effort into making sure that Redis remains simple, and they have done an amazing job.

It’s worth pointing out that Redis has some of the best online documentation that I have ever seen. All commands are easy to find and clearly documented, with examples and common patterns of usage. And the examples are interactive! Not sure what the result of a certain command will be? No need to install Redis and fire it up locally. You can simply try it right there in the browser.

With client libraries in virtually every programming language, there is no reason not to give it a try. You’ll be glad you did.

Introducing Tenacity – An ORM Independent Way to Manage Inter-database Relationships

I’m a big believer in polyglot persistence. There are so many (very different) production-ready databases available today that it is becoming more and more common to find applications using more than one database, utilizing the strengths of each. Using the right tool for the job gives me a warm, fuzzy feeling inside.

However, polyglot persistence comes with its own set of drawbacks. One of those drawbacks is the loss of foreign keys, which are very important in maintaining data integrity. Another drawback is that Object/Relational Mapping (ORM) libraries typically focus on a specific database, or type of database. So, writing code that manages relationships between objects backed by different databases hasn’t been nearly as easy as writing code to manage relationships between objects in the same database.

Tenacity’s goal is to address some of these issues. Tenacity is a Ruby gem that provides an ORM independent way of managing relationships between models backed by different databases.

Tenacity works by extending popular Ruby ORM libraries to respond to a set of methods that the tenacity core uses to build and manage relationships between objects. By extending the ORM libraries to implement this interface, tenacity is able to work with the objects in a generic way, without having to know which database is backing a given object. This approach also allows you to continue using your favorite ORM libraries. To use tenacity, you simply need to include Tenacity inside your model.

Tenacity is heavily based on ActiveRecord’s associations, and aims to behave in much the same way, supporting many of the same options.

This initial release of tenacity supports belongs_to, has_one, and has_many associations, and the ActiveRecord, CouchRest, and MongoMapper ORMs. However, there is still plenty of work to be done. Feedback, bug reports, and code contributions are always welcome.
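
As a sketch of what this looks like in practice (the model names are made up, and the t_-prefixed association methods shown here reflect my understanding of the gem’s interface; consult the README for the definitive API), a relationship spanning MongoDB and a relational database might be declared like so:

```ruby
# Illustrative only: a MongoMapper-backed Car related to
# ActiveRecord-backed Wheels living in a different database.
# Requires the tenacity gem and both databases to be configured.
class Car
  include MongoMapper::Document
  include Tenacity

  t_has_many :wheels
end

class Wheel < ActiveRecord::Base
  include Tenacity

  t_belongs_to :car
end
```

From there, the associations are used much like ActiveRecord’s own, even though the two models live in different databases.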

Tenacity is free and open source, and can be found on GitHub at



gem install tenacity

Slides From My WindyCityDB Talk On Polyglot Persistence

The slides from my WindyCityDB talk about Polyglot Persistence have been posted to Slideshare. You can see them here, or embedded in the post below.

The case for Polyglot Persistence was made throughout the day by several of the speakers. Most people seemed to acknowledge that it is very likely a single database will not have all of the tools you need to get your job done. Instead of coercing one database into doing things it was never designed to do, it is becoming more common for applications to use multiple databases, utilizing each for its respective strengths. But, of course, using multiple databases in a single application comes with its own set of issues, and you should make sure there is a real need for Polyglot Persistence before making that leap.

Thank you very much to the WindyCityDB organizers for putting on such a great event. I had a great time, learned a ton, met some interesting people, and participated in some great conversations. What more can you ask for in a tech conference?

Using Multiple Database Models in a Single Application

The days of the relational database being a one-stop shop for all of your persistence needs are over. A new class of application is beginning to emerge with requirements that exceed the capabilities of the relational database. Some of these applications need unlimited scalability or bulletproof fault tolerance, while others may require blazing fast access or flexible data storage. The relational database was simply not designed to meet the needs of this small but growing class. Instead, a new breed of data stores is gaining momentum. These data stores look at data persistence with a fresh set of eyes, diverging considerably from the relational model in order to meet these challenges.

What’s wrong with the relational database?

For 99% of the applications out there, absolutely nothing. The relational database has been the industry standard for data storage over the past 30+ years for good reason. It is an incredibly capable piece of software. Although it may not be the best tool for everything it is used for, it certainly satisfies the needs of the vast majority of applications just fine.

However, while not new, the class of applications mentioned above are becoming more common. These applications either handle enormous amounts of traffic, or deal with tremendous amounts of data. The relational database falls short in a few areas when trying to meet the demands of an application like this.


A single database server is usually not enough to support these requirements. Applications like this need a true database cluster, capable of adding storage space and processing power on the fly without the application even noticing. However, relational databases weren’t designed to operate in a cluster where all machines are capable of reading and writing data. This is largely due to the promises they make regarding data integrity. In order to fulfill these promises, the database needs easy, quick access to all of the data at all times to verify that duplicates aren’t being inserted, constraints aren’t being violated, etc. This quickly becomes a bottleneck when dealing with very large amounts of data.

There are techniques for scaling out relational databases, but they don’t address every concern. One popular technique is to use one or more slave databases for read requests, while continuing to funnel all write requests through the master database. The master database constantly synchronizes with the read-only databases, so the data remains consistent between them. This technique works great for read-heavy applications, but does not help applications that perform just as many creates, updates, and deletes. Data sharding is another popular technique, which involves splitting the data up across several different databases based on some criteria. But this pushes an extraordinary amount of complexity onto the application, which is now responsible for determining which database to use for specific data sets. Master-master replication can be used to keep multiple master databases in sync, so any database server can perform read or write operations. However, for some applications there comes a point where the replication can’t keep up with the traffic.
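
To make the sharding burden concrete, here is a minimal sketch of the kind of routing logic the application ends up owning (the shard names and the modulo scheme are made up for illustration):

```ruby
# The application, not the database, decides which shard holds a record.
SHARDS = ["users_shard_0", "users_shard_1", "users_shard_2"]

def shard_for(user_id)
  # Deterministic mapping: the same user id always routes to the same shard.
  SHARDS[user_id % SHARDS.size]
end

puts shard_for(42)  # => users_shard_0
```

Every query for a user now has to flow through logic like this, and changing the number of shards later means migrating data and updating the mapping, which is where much of the complexity comes from.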

Relational databases are also (intentionally) very strict when it comes to the structure of the data being stored. Data must be broken up into a series of rows and columns. Good object/relation mapping tools hide much of this awkwardness from us, but some applications deal with data that doesn’t map well into rows and columns. A simple key/value store is usually a better fit for applications like this.

How does the new breed address these problems?

The new breed of data stores, called NoSQL databases, make very few promises regarding data integrity. In this new model, data integrity becomes the application’s concern. By not having to enforce any complex data integrity rules, NoSQL databases can scale to levels way beyond that of a relational database. Adding more processing power or storage capacity can be as simple as adding a new machine to the cluster. The database can then store and process the data using any machine in the cluster.

In this model, the data being stored is self contained, and does not rely on any other data in the database. Therefore there is no need for one machine to know anything about the other machines in the cluster. This approach is quite different from the relational model, where data is broken up into multiple tables to eliminate duplicate data, and joined back together when being accessed.

Most of these databases subscribe to a theory called eventual consistency. In situations where duplicate information is scattered across different servers in the cluster, it is not feasible for the database to find all instances of that data and update it as a part of the original operation. Instead, the data will be replicated to the other database servers at a later time. Until that replication takes place, the application will be in an inconsistent state, where simultaneous queries fetching the same data could return different results. Although this sounds terrible, it turns out that in practice it is really not too big of a deal for most applications. Do all customers of an online retailer need to see the exact same set of product reviews 100% of the time? Probably not.

Also, because there are few promises regarding data integrity, NoSQL databases can offer data storage that is much more flexible. The database no longer has to enforce the uniqueness of a column, or ensure that the id of some referenced piece of data actually exists in the database. Some of these databases are true key/value data stores, where you can store just about anything. Others require a certain document format to be used (such as JSON or XML), but still allow you to freely change the contents of that document as you wish.

Still no one-stop shop for persistence

Although NoSQL databases address some issues that relational databases can’t, the opposite is true as well. The relational database offers an unparalleled feature set. While some of these features prevent it from serving the needs of the class of applications described above, they are absolutely required by other classes of applications. In some domains, data integrity is the number one concern. You need look no further than the classic “try to withdraw money from the same account at the same time” example to justify the need for locks and transactions.

For the vast majority of applications out there, relational databases work great. There are a boatload of tools and libraries that support them, and software developers are very familiar with how to use them. It is safe to say that the relational database has secured its spot in IT departments and data centers around the world, and it isn’t going anywhere. It is far from dead.

Polyglot persistence

An increasing number of case studies are appearing that describe real-world applications needing the data integrity offered by the relational database in addition to the benefits offered by NoSQL databases. I believe this trend will continue, as companies are storing more data than ever, and processing that data in ways not previously imagined.

To address these needs, some companies are beginning to run their relational database side-by-side with one or more of the NoSQL alternatives. Extremely large data sets that require scalable storage space and processing power are moved to a NoSQL database, while everything else, especially data that needs its integrity kept in-check, remains in the relational database. The term Polyglot Persistence has been used to describe the use of multiple databases within the same project.

The benefits of polyglot persistence

The benefits are somewhat obvious. By running a relational database side-by-side with a NoSQL database, you get the best of both worlds. Strict enforcement of data integrity from the relational database, and the scalability and flexibility provided by the NoSQL database. This allows you to use the best tool for the job, depending on your use case.


There are a few scenarios where I’ve seen systems take advantage of polyglot persistence. The first scenario involves the need to perform some set of complex calculations on an extremely large data set. The data is either copied or moved from the relational database to the NoSQL database, or inserted directly into the NoSQL database by the application. A cluster of NoSQL database servers can then divide the work, process the data, and aggregate the results. The more machines you have in your cluster, the less time the processing will take. The resulting data can either remain in the NoSQL database or be inserted into the relational database, depending on what needs to be done with the results.

The other scenario takes advantage of the schema-less nature of some NoSQL databases. While it is certainly possible to store a serialized data structure in a single column of a relational database, interacting with that data can be a bit more challenging than if it were in a schema-less, document-oriented database. This use case, after all, is what document-oriented databases were designed for. These databases simply treat the data as a collection of key/value pairs, identified by a unique ID, and provide ways in which you can add structure back into the document so the data inside it can be queried. These databases are great for storing data that can be radically different from document to document, or data whose structure changes constantly.
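
For instance, two documents in the same collection can have completely different shapes, something a fixed relational schema resists (the product data below is made up):

```ruby
# Each document is just key/value pairs under a unique ID; no shared schema.
products = {
  "prod-1" => { "name" => "T-Shirt", "sizes" => ["S", "M", "L"] },
  "prod-2" => { "name" => "Album", "artist" => "Someone", "tracks" => 12 }
}

# Fields can be queried where they are present, and are simply absent elsewhere.
with_sizes = products.values.select { |doc| doc.key?("sizes") }
puts with_sizes.length  # => 1
```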

The challenges of polyglot persistence

Polyglot persistence comes with its own set of challenges. While you potentially get the best of both worlds as far as features go, you also get the complexity and hassle of dealing with not only multiple databases, but multiple database models.

Determining which database to use to store certain data

With more than one database, you now have to decide where to store the data. It’s no longer a given. If you make the wrong decision, you could be looking at a painful migration from one database model to another as a result. To make this decision, you need to carefully examine how the data will be used.

Increased application complexity

Applications also face increased complexity, as they now have to interface with two different (potentially very different) data stores. If done correctly, you should be able to isolate this complexity to the persistence layer of your application, freeing the rest of the application from having to know which database a specific piece of data is coming from. But interfacing with multiple data stores could greatly increase the complexity of that persistence layer. Your application will now need to know:

  • How to connect to each of the databases
  • What database to use for specific sets of data
  • How to handle the different types of errors from each database
  • How to map results from each database back to your application’s object model
  • How to handle queries for information across databases
  • How to mock out the different databases for testing
  • Potentially, how to move data from one database to another

Addressing these concerns could result in a bunch of new application code, and with added code usually comes added complexity, and more bugs.

Increased deployment complexity

In addition to the increased application complexity, you will also face increased deployment complexity.

  • Will you need to provision new hardware to host the new database?
  • How will you backup the data in the new database?
  • How will you manage and control changes to the configuration of the new database?

Training for developers and operational staff

Given that this database will likely be radically different from the relational database that your developers and operational staff are comfortable with, how will you bring them up to speed on how to use and manage this new database? And, given that the majority of the NoSQL databases are still very young, how will you keep your developers and operational staff up to speed with the latest developments on the project?

This is a big issue, especially in companies with large development and operations teams, and needs to be thought through carefully.

  • Is there an expert you can hire to help you get up and running, and mentor your staff?
  • Is there any training available that you can give to your staff?
  • Who can you turn to for support when something goes wrong in production?


I’ve always been a big advocate of using the right tool for the right job. For the past 30 years, the relational database has been the de facto standard for persistence. Creative people have managed to utilize and manipulate it to serve all sorts of different use cases, quite successfully. But just because you can fit a square peg through a round hole if you hit it with a big enough hammer doesn’t necessarily mean that you should.

NoSQL databases can be great tools for addressing data persistence cases that the relational database struggles with. In addition, each NoSQL database brings its own set of strengths and weaknesses to the table. They are becoming very important tools to have around, and I believe that our industry will see a steady increase in the adoption of these tools going forward.