AWS Pilot Light Part II: An Automated Disaster Recovery Solution

November 8, 2021

Understanding Disaster Recovery:

With the growing number of cyber-attacks, malware, and viruses circulating online, data security needs to be taken seriously. Every business has a few critical systems that must be up and running at all times. To withstand such attacks, every organization needs a Business Continuity Plan (BCP) that is put into action during emergencies that could significantly affect the business. A Disaster Recovery Plan is a subset of the Business Continuity Plan that aims to minimize the impact on software applications.

Disaster Recovery (DR) is the process an organization uses to regain access to its applications and/or data and resume critical business functions after a natural or human-driven disaster. In the event of a disaster, the continued availability of your application depends on the ability to replicate your IT systems and data. The disaster recovery plan stipulates how a company will prepare for a disaster, how it will respond, and what steps it will take to restore operations.

Amazon Web Services (AWS), the leading cloud service provider today, has outlined four strategies for disaster recovery preparation. Each strategy suits specific scenarios.

DR strategies compared

The first one is the Backup & Restore strategy, which I covered in my previous blog, Understand the Vitality of Data Backup & Disaster Recovery Plan in an Organization.

In this blog, we will see how the Pilot Light strategy works and how to plan for Pilot Light recovery.

Introducing Pilot Light:

Among the four Disaster Recovery options outlined by AWS, the choice of a particular DR strategy can be based on the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) it needs to meet.

The Pilot Light strategy is used as a quick recovery solution: the critical core elements of the system are already configured and running in AWS, and this always-on core acts as the pilot light. When an emergency strikes, the DR team rapidly provisions a full-scale production environment around it. The team should weigh the benefits of a lower RTO and RPO against the cost of implementing and operating the strategy.

Confused about RTO and RPO? Here is a quick explanation.

  1. Recovery Point Objective: RPO is the maximum acceptable amount of time since the last data recovery point. This determines the acceptable loss of data between the last recovery point and the interruption of service.
  2. Recovery Time Objective: RTO is the maximum acceptable delay between the interruption of service and restoration of service. This determines the acceptable time window when service is unavailable.

RTO and RPO
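
To make the two definitions concrete, here is a minimal Python sketch that measures the data-loss window and the downtime of an incident and compares them with the objectives. The function names and the incident timeline are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, outage_start: datetime) -> timedelta:
    """Data-loss window: time between the last recovery point and the outage."""
    return outage_start - last_recovery_point


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the outage and the restoration of service."""
    return service_restored - outage_start


def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed it."""
    return achieved <= objective


# Hypothetical incident timeline (assumed values, not taken from a real event).
last_backup = datetime(2021, 11, 8, 12, 40)
outage = datetime(2021, 11, 8, 13, 0)
restored = datetime(2021, 11, 8, 13, 25)

rpo_target = timedelta(minutes=30)
rto_target = timedelta(minutes=30)

print(achieved_rpo(last_backup, outage))   # 0:20:00
print(achieved_rto(outage, restored))      # 0:25:00
print(meets_objective(achieved_rpo(last_backup, outage), rpo_target))  # True
print(meets_objective(achieved_rto(outage, restored), rto_target))     # True
```

In this timeline, 20 minutes of data is lost and the service is down for 25 minutes, so both half-hour objectives are met.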

The Pilot Light strategy offers an RTO and RPO in the tens of minutes. It replicates data from the primary Region to data resources in the recovery Region, such as Amazon RDS instances or Amazon DynamoDB tables, and these data resources are kept ready to serve requests. It requires you to maintain a continuous backup in the recovery Region.

Architecture Diagram:

The left AWS Region is the primary Region that is active, and the right side is the recovery Region that is passive before the failover.

Pilot Light disaster recovery architecture

AWS Services that can be used for Pilot Light DR Solution:

  • Amazon EC2 instances
  • Amazon S3
  • Amazon S3 Glacier
  • AWS Storage Gateway
  • AWS Direct Connect
  • Custom software packages
  • Amazon Machine Images (AMIs)
  • Amazon Route 53
  • Elastic Load Balancing

Pilot Light failover mechanism:

With Pilot Light DR, a minimal version of the environment, hosting the critical functionality of the application, is always running in the cloud. During recovery, a full-scale production environment can be rapidly provisioned around this critical core.
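
The failover sequence can be sketched as a small in-memory simulation in Python. This is a simplified model and not an AWS API: the `RecoveryRegion` class, its fields, and the `fail_over` function are hypothetical names. It assumes the always-on core is a replicated database and that application servers are provisioned only at failover.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryRegion:
    """In-memory stand-in for the passive recovery Region (hypothetical model)."""
    database_role: str = "replica"   # the always-on core: data is replicated here
    app_servers: int = 0             # compute is not provisioned before failover
    receives_traffic: bool = False   # DNS still points at the primary Region
    log: List[str] = field(default_factory=list)

    def promote_database(self) -> None:
        """Make the replicated data store the writable primary."""
        self.database_role = "primary"
        self.log.append("database promoted to primary")

    def scale_out(self, count: int) -> None:
        """Provision application servers around the core."""
        if count < 1:
            raise ValueError("at least one application server is required")
        self.app_servers = count
        self.log.append(f"{count} application servers provisioned")

    def switch_dns(self) -> None:
        """Send user traffic to this Region, but only once it is ready."""
        if self.database_role != "primary" or self.app_servers == 0:
            raise RuntimeError("recovery Region is not ready to take traffic")
        self.receives_traffic = True
        self.log.append("DNS switched to recovery Region")


def fail_over(region: RecoveryRegion, production_capacity: int) -> List[str]:
    """Run the failover steps in order and return the log of actions taken."""
    region.promote_database()
    region.scale_out(production_capacity)
    region.switch_dns()
    return region.log


region = RecoveryRegion()
for step in fail_over(region, production_capacity=4):
    print(step)
```

The order matters in this sketch: traffic is switched last, so users are never routed to a Region whose data store or application tier is not ready.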

Considering the RTO and RPO defined in the operational level agreement (OLA) gives a clear idea of when to use the Pilot Light DR strategy. If the RTO is half an hour, Pilot Light is an acceptable choice because the critical services can be restored within that time. If a disaster occurs at 1:00 p.m. and the RPO is half an hour, the system should recover all data that was in the system before 12:30 p.m. With Pilot Light, only the minimal critical functionality is kept running ahead of a disaster, which keeps the cost low. Across the DR strategies, RTO and RPO decrease as cost increases: backup and restore has the lowest cost and the longest recovery, while multi-site active/active has the highest cost and the shortest recovery. Compared with the backup and restore strategy, Pilot Light takes significantly less time to recover, and systems are up within tens of minutes.
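
The reasoning above can be expressed as a short Python sketch: one function computes the oldest acceptable recovery point for a given RPO, and another picks the cheapest strategy that meets the RTO and RPO targets. The numeric thresholds in `STRATEGIES` are illustrative assumptions, not official AWS figures, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Strategies ordered from cheapest to most expensive. The achievable RTO/RPO
# values are rough, assumed numbers used only to illustrate the trade-off.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=24)),
    ("Pilot Light", timedelta(minutes=30)),
    ("Warm Standby", timedelta(minutes=5)),
    ("Multi-site active/active", timedelta(0)),
]


def oldest_acceptable_recovery_point(disaster_time: datetime, rpo: timedelta) -> datetime:
    """Data written before this moment must be recoverable after the disaster."""
    return disaster_time - rpo


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> Optional[str]:
    """Return the first (cheapest) strategy that satisfies both targets."""
    tightest = min(rto, rpo)
    for name, achievable in STRATEGIES:
        if achievable <= tightest:
            return name
    return None  # only reached for nonsensical negative targets


disaster = datetime(2021, 11, 8, 13, 0)  # 1:00 p.m.
half_hour = timedelta(minutes=30)

print(oldest_acceptable_recovery_point(disaster, half_hour).strftime("%H:%M"))  # 12:30
print(cheapest_strategy(rto=half_hour, rpo=half_hour))  # Pilot Light
```

Under these assumed thresholds, a half-hour RTO and RPO rules out backup and restore, and Pilot Light is the least expensive option that still qualifies.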

Conclusion:

A few things must be considered when implementing an effective DR plan, such as testing, monitoring and alerting, backups, accessibility, automation, and software licensing.

CloudThat provides an end-to-end implementation of Disaster Recovery strategies to safeguard your infrastructure solutions from data loss and build a cost-effective, flexible DR plan that suits your business. Stay tuned to learn more about other DR strategies: Warm-Standby and Multi-site active/active. Learn more about CloudThat’s Consulting and Expert Advisory here.

