Disaster Recovery (DR) is an important aspect of any cloud deployment. In the words of Amazon’s CTO Werner Vogels, “Everything fails, all the time.” An entire data center, or even an entire region of a cloud provider, can go down. This has already happened to major cloud providers such as Amazon AWS and Microsoft Azure, and it will surely happen again. Both providers readily recommend a Disaster Recovery and Business Continuity strategy that spans multiple regions, so that even if a single geographic region is down, you can keep working from another region. This sounds good in theory, but there are several flaws in relying on multiple regions of a single provider.
Below are 5 reasons why I believe Cross-Region DR will not be as effective as it sounds, and why companies should instead look at Multi-Cloud DR, where a different cloud provider is used for the DR strategy.
1) A single AWS region failure might cause a huge capacity crunch in the other regions used for DR
Many businesses in the USA host their AWS infrastructure in the AWS US-East region, and many of them set up their cross-region DR in the US-West region in California. Now imagine that, for some reason, the US-East region goes down. The load on the US-West California region would increase drastically, possibly bringing that region down too. Even if it survives the additional load, services like Elastic Compute Cloud (EC2) are very likely to run out of capacity: it is hard to believe that AWS keeps enough spare capacity in US-West California for all of the US-East traffic to move there without any impact. The same applies to Azure’s East US and West US regions. By contrast, even if AWS’s US-East region is down, it is highly unlikely that Azure’s East US is down at the same time, and even more unlikely that Azure’s West US is down as well. And because very few companies have a Multi-Cloud DR strategy, the extra traffic that reaches Azure when AWS is down will be negligible.
2) AWS & Azure regions are no longer fully independent, making a global failure scenario more likely
This is possibly one of the reasons none of the cloud providers make it very easy to set up automated DR in another region: they know that one region going down can have a cascading effect on other regions, which might then go down as well, like stacked dominoes. But as more and more customers want cross-region DR and keep asking the cloud providers to make it simpler, most providers now offer services that replicate automatically to another region. For example, in AWS, the Relational Database Service (RDS) now supports read replicas in another region, and Simple Storage Service (S3) can replicate buckets across regions. Azure Storage offers geo-redundant storage (GRS), which replicates data to a secondary region. This means that services like AWS S3, RDS and Azure Storage have effectively become global services that interact across regions, which breaks the “regions are independent” ideology. Imagine a bad code push to S3 in US-East bringing down the S3 service in all regions, even though the code was pushed to just one region. A DR strategy that depends on all regions being independent then falls apart.
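For readers who have not used S3’s cross-region replication, here is a minimal Python sketch of the replication configuration that S3’s PutBucketReplication API accepts, built as a plain dictionary. The role ARN, account ID and bucket names are hypothetical placeholders, and the sketch only builds and prints the configuration; actually applying it requires boto3, AWS credentials, and versioning enabled on both buckets. Note that the source bucket’s configuration points directly at a bucket in another region, which is exactly the kind of cross-region coupling described above.

```python
import json


def build_replication_config(role_arn: str, destination_bucket: str,
                             rule_id: str = "dr-replication") -> dict:
    """Build an S3 replication configuration that copies all new objects to another bucket.

    The destination bucket is expected to live in a different region (the DR region).
    """
    return {
        # IAM role that S3 assumes in order to copy objects on your behalf.
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Priority": 1,
                # An empty prefix matches every object in the bucket.
                "Filter": {"Prefix": ""},
                # Do not copy delete markers, so a delete in the source
                # does not hide the object in the DR bucket.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical placeholders; substitute your own account, role and buckets.
    config = build_replication_config(
        role_arn="arn:aws:iam::123456789012:role/example-replication-role",
        destination_bucket="example-dr-bucket-us-west-1",
    )
    print(json.dumps(config, indent=2))
    # Applying it would look roughly like this (needs boto3, credentials,
    # and versioning enabled on both buckets):
    #   boto3.client("s3").put_bucket_replication(
    #       Bucket="example-primary-bucket", ReplicationConfiguration=config)
```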
3) Data is better protected from accidental deletions when stored in multiple clouds
Let’s say you have all your data on Azure. What if an employee, ignorantly or maliciously, writes a script that deletes all the data and all the backups? Check your cloud setup: how many people in your company have the authority to delete both data and backups? I am sure there are many people with such permissions. So what happens if one of them deletes all the data and backups? (By the way, this did happen to one of our clients.) In this case, having DR on a separate cloud, with no one (or very few people) having access to both cloud environments, can mitigate the issue. It certainly protects against accidental deletion, since the two clouds use different credentials. Even if you are only replicating your data to another cloud, this gives you a chance to recover it.
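The audit suggested above (“how many people can delete both data and backups?”) can be sketched in a few lines of Python. This assumes you have already exported, for each environment, a mapping from each person to a set of permission names. The permission strings `delete:data` and `delete:backups`, the people listed, and the assumption that a person has the same identifier in both clouds are all hypothetical simplifications, not a real AWS IAM or Azure RBAC format.

```python
from typing import Dict, List, Set

# Hypothetical permission names; map your real IAM/RBAC actions onto these.
DELETE_DATA = "delete:data"
DELETE_BACKUPS = "delete:backups"

Permissions = Dict[str, Set[str]]  # person -> set of permission names


def people_who_can_wipe_everything(data_env: Permissions,
                                   backup_env: Permissions) -> List[str]:
    """Return people who can delete the data AND its backups.

    Pass the same mapping twice when data and backups live in one cloud.
    """
    both = data_env.keys() & backup_env.keys()
    return sorted(
        person for person in both
        if DELETE_DATA in data_env[person] and DELETE_BACKUPS in backup_env[person]
    )


if __name__ == "__main__":
    # Hypothetical permission exports.
    azure = {
        "alice": {DELETE_DATA, DELETE_BACKUPS},
        "bob": {DELETE_DATA},
        "ops-admin": {DELETE_DATA, DELETE_BACKUPS},
    }
    aws_dr = {
        "dr-admin": {DELETE_BACKUPS},
        "ops-admin": {"read:backups"},
    }
    # Backups in the same cloud as the data:
    print(people_who_can_wipe_everything(azure, azure))   # ['alice', 'ops-admin']
    # Backups replicated to a separate cloud with separate access:
    print(people_who_can_wipe_everything(azure, aws_dr))  # []
```

The point of passing two mappings is that the answer only shrinks to an empty list when backup access lives in a genuinely separate environment.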
4) Data is better protected from malicious deletions when on multiple clouds
In spite of all the precautions, there is a non-zero chance that the credentials of your cloud environment get compromised. Attackers may hijack your cloud account, delete all user accounts, and/or delete all the data and backups in your cloud environment. While you are trying to recover your account with the help of the cloud service provider, like AWS or Azure, your system is down. Keeping data in two cloud services, and securing the two sets of credentials in different ways, reduces the risk that both are compromised at the same time. This enables one more DR strategy: keeping your environment running even when the primary environment is compromised.
5) It is not very hard to set up parts of Multi-Cloud DR
Having a complete, fully automated DR across different cloud providers like AWS and Azure is hard. But starting small, with a DR strategy for a few key pieces of critical infrastructure across different cloud providers, is not that hard. For example, say you run a service like Pinterest that stores all its images in AWS S3, so S3 is one of your most important services. It is not that hard to build automation that replicates your S3 buckets to Azure Blob Storage, along with logic in your code that starts serving Azure Blob Storage URLs when S3 is deemed to be down. This gives your DR and Business Continuity an enormous boost for a key service like S3.
Thus, if a company wants REAL Disaster Recovery and Business Continuity, it has to think about multi-cloud DR. Multi-cloud DR is definitely more challenging than single-cloud DR, but it is well worth the extra effort. One of our clients processes hundreds of millions of dollars’ worth of payments through infrastructure on AWS. Due to the nature of the business, any downtime or data loss is a huge issue, so we designed and deployed a DR setup for them on Azure. In the next few articles, I will recommend strategies for performing cross-cloud DR between Amazon AWS and Microsoft Azure, and I will also describe a cross-cloud DR setup we have already implemented.
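The S3-to-Azure failover idea from point 5 could look something like the following minimal Python sketch of the “serve Azure Blob Storage URLs when S3 is down” half. The bucket, storage account and container names, the `healthcheck.txt` probe object, and the 30-second re-check interval are hypothetical choices. The replication half (copying objects from S3 into Blob Storage, which Microsoft’s AzCopy tool, for example, can do) is not shown. The health check result is cached so that not every request pays for a probe.

```python
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical endpoints; replace with your own bucket, storage account and container.
S3_BASE = "https://example-images.s3.amazonaws.com"
AZURE_BASE = "https://exampleimages.blob.core.windows.net/images"


def s3_is_healthy(probe_url: str = f"{S3_BASE}/healthcheck.txt",
                  timeout: float = 2.0) -> bool:
    """Return True if a small, known (hypothetical) object can be fetched from S3."""
    try:
        with urllib.request.urlopen(probe_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return False


class FailoverRouter:
    """Builds image URLs, switching to Azure Blob Storage when S3 looks down."""

    def __init__(self, is_healthy: Callable[[], bool] = s3_is_healthy,
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._is_healthy = is_healthy
        self._ttl = ttl_seconds
        self._clock = clock
        self._checked_at: Optional[float] = None
        self._s3_up = True

    def s3_up(self) -> bool:
        # Re-run the health check at most once per TTL window.
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._s3_up = self._is_healthy()
            self._checked_at = now
        return self._s3_up

    def url_for(self, key: str) -> str:
        base = S3_BASE if self.s3_up() else AZURE_BASE
        return f"{base}/{key}"


if __name__ == "__main__":
    # Simulate an S3 outage without touching the network.
    router = FailoverRouter(is_healthy=lambda: False)
    print(router.url_for("pins/cat.jpg"))
    # prints https://exampleimages.blob.core.windows.net/images/pins/cat.jpg
```

A production version would probably require several consecutive failed probes before switching, and it only helps if the Azure copy is kept reasonably up to date.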
Please comment and share if you liked the article. Feel free to ask your questions below and I will get back to you on them.