DevOps questions: patterns for multi-data center and multi-cloud applications
I received the following question to firstname.lastname@example.org regarding how to run an application in multiple data centers:
I work for a b2b e-commerce company with a significant private data center footprint. For example, our data tier alone includes a 28 host MySQL cluster, 30 five host Mongo clusters, and Hadoop.
However, we are looking to migrate to cloud-first. We have a small cloud footprint that today is mostly batch oriented (~1300 VM). We've had trouble moving interactive or non-command driven apps to the cloud due to data persistence.
What patterns are you seeing as successful in terms of multi-data center or multi-cloud environments like this? It seems like Cassandra or multi-master multi-data center persistence is key with environment or app local caches (Varnish, Redis) as "pods" is one pattern.
The problem with a data tier that spans multiple data centers (DCs) is, obviously, CAP. Clustering that works fine inside a single DC typically struggles with dropped, out of order, and simply slow packets when communicating between data centers. Those conditions effectively create a continuous, rolling partition between data centers1, which is a significant challenge for applications and databases alike.
Very few meaningful applications are designed to work well under conditions of eventual consistency2. Imagine creating a new user account, for example: a username that’s unique one moment in one data center may not be unique in another DC. The ongoing replication delay creates an ongoing partition between the DCs1. The likelihood of such a problem may seem small at first, but grows at internet scale. Unfortunately, there’s no way to resolve a problem like surprise username collisions without frustrating somebody, and that’s certainly not a desirable trait in e-commerce applications.
My personal experience with NoSQL cross-DC replication (XDCR) is limited to Couchbase. I like how Couchbase works in modern applications and its XDCR features are very appealing. Couchbase’s XDCR will replicate data between DCs and even supports multi-master cross replication between DCs, but it’s not magic, sadly. It still suffers the eventual consistency problem described above, and must perform conflict resolution that might lead to surprising results.
I’d honestly be surprised to learn there’s any database that can magically solve this problem with multiple masters in multiple DCs in a way that also meets our performance expectations. At its core, this is an application problem, and (though I’ve used this quote before), “if you need application-level guarantees, build them into the application level. The infrastructure can’t provide it.” That’s from Tyler Treat, and his blog post on distributed systems is worth a read with questions like this on your mind.
There’s no denying CAP, but we can work around the problem with some very careful thinking about your CAP needs. Joyent faces this very problem in the management of our own data centers. The information about what’s running—all the user instances—in each DC needs to be strongly consistent just within the DC. Joyent's user database, however, is shared across all DCs. Segregating data based on scope and CAP requirements makes it possible to optimize interactions to get the best performance. When a user requests a list of all her instances in a DC, the request is handled within that specific DC. However, if the user makes a request to update her account details, that request is written to the master data center. That write takes longer because it has to cross between DCs, but doing so supports the strong consistency required for valid writes to the user database. The performance implications are minimized because account updates are far less common than other, faster API interactions that deal exclusively with resources in the local data center.
Making the above work requires that the application be aware of the data segregation. Triton is open source (installation walkthrough), so you can see the details of how that works, but it might be easier to make sense of how this is implemented in WordPress. On its own, WordPress can connect to a single database instance, but the HyperDB plugin supports connecting to master and replica instances, sharding data across multiple independent databases (see “dataset” details), and distribution of data across multiple data centers. Importantly, HyperDB forces WordPress to read its writes, to mitigate the worst problems of replication delays.
Despite my suggestions above that the application needs to be designed for running in multiple DCs, it might be possible to separate your application(s) from their data tier with no modifications. A tiered application with database, an object cache like Memcached or Redis, application code, and a web front end, might work just fine with the database in one DC and the rest in another DC. The trade-off you’ll make is in performance of any interactions between the application and the database tier, but those are hopefully minimized by the object cache. If your app is well-enough behaved to read its own writes, you might be able to keep a read-only DB replica in the remote data center and write to the primary DC. On the other hand, you might find that your app is not well behaved. Sloppiness in cache invalidation after writing to the DB is a common problem, for example.
Or, perhaps your application is structured as a microservices app (or something that approaches a microservices app). In theory, the separation of concerns in a microservices app can make them well-suited to distribution across different DCs. In practice, separating the app can uncover situations where multiple apps might share the same database, rather than present that data as a service (the arguments against sharing the same database across multiple microservices is worthy of a separate post). I should say that, for a number of users, the complexity of making an app work across data centers against risk of DC failure is often such that they maintain a small passive footprint in a secondary DC that they can manually fail-over to in case their primary data center goes offline.
And that brings me to the short answer you were hoping for when you asked about successful patterns for multi-data center or multi-cloud environments:
- Understand that there really can only be a single master for any consistent data service
- Keep replicas in secondary data centers
- Manually promote a replica/secondary data center to master if the situation requires
- If you can run your app with a remote data tier, take advantage of that to keep secondary data centers warm
- Build or refactor your app to segregate data based on geographic scope and CAP requirements to maximize performance by minimizing the traffic between data centers
I hope that answers the question and gives some context for how to approach the challenges of architecting an application to run in multiple data centers. Joyent's expertise in operating a public cloud using the same software we offer to those operating their own data centers positions our engineers to assist critical support customers on questions like this. Contact our team to learn more.
As always, send your devops questions to email@example.com and I'll do my best to answer them or find somebody who can.
Technically, this is true even within a single data center. Most of us have suffered consistency problems due to replication delays at one point or another, but the network fallacies of distributed computing are multiplied—along with the problems—when communicating between data centers. ↩ ↩
Applications that can tolerate eventual consistency must typically embrace a data model expressly designed for it, and accept a number of limitations because of it. Most people understand that all members of an eventually consistent system will eventually converge to the same state, but many are surprised to learn that there's no guarantee about what state they converge on. Representing the data as a series of transactions can work well for data that doesn't depend on strict ordering. Conflict free replicated data types work well in eventually consistent systems, but patterns for using them well remain uncommon. ↩
Post written by Casey Bisson