• 4 Steps to Disaster Recovery

    Introduction

    Etomic provides many different services and SLAs (service level agreements), so you can select the service that meets your availability and budget needs. These range from something as simple and affordable as on-site and/or off-site backups all the way to fully redundant, geographically dispersed sites. Plus, you can start simple and scale into different DR solutions as your business grows and your needs change.

    1. Identify

    What is a disaster recovery plan, and what is the value you need to protect?

    A disaster recovery plan is your plan to recover from different types of disasters, combined with your tolerance for risk. You must determine your acceptable level of risk and the cost of the DR needed to achieve it. Think of it like buying insurance. It would be great to be insured for everything, 100% of the time, but the cost would be extremely high relative to how often you would ever use it (the risk). So in your insurance plan you assume some risk, in the form of a deductible, in return for a lower rate. The deductible is the amount of risk or expense you can assume or cover without heavily impacting your life. The same is true for your Internet or online business: what level of protection fits your budget, and what level of risk can you assume? See the chart below to understand the probabilities of the different risks.

    Types of disasters

    When you say disaster, the first thing people normally think about is a hurricane, tsunami, etc., but a disaster can also be something as simple as a system or hard drive failure or corruption that results in a loss of data or uptime. Something as simple as backups can be the first level of protection from these types of disasters, and they are very affordable with quick recovery times (note: recovery times can vary depending on the amount of data you have to recover).

    Locational Disasters (5-8% of disasters)

    • Natural Disasters
    • Unnatural Disasters

    Physical destruction of a location and data (or access to location and data). Examples: fire, flood, earthquake, significant power or network outage.

    Data Disasters (92-95% of disasters)

    • Hardware Failure
    • Software Malfunction
    • Virus
    • Human Error

    Data destruction without physical destruction. Examples: hardware failure, virus/hacker attack, software malfunction, human error.

    Data destruction is far more frequent. The ultimate goal of D/R is to get your business restarted in an acceptable timeframe. For some organizations that means within minutes, while for others it means hours or possibly days. The cost of operational downtime varies among businesses and industries. For example, financial firms often calculate that cost in millions of dollars per hour, while other industries calculate operational downtime as thousands per day. These costs include lost business transactions, employee productivity, and customers – not to mention regulatory penalties. The ability to tolerate these losses generally determines business continuity strategy.

    2. Define

    Disaster recovery planning

    Define what a disaster means to your company. Is it revenue, brand integrity, or data (customer data, intellectual property, etc.)? Associate a value with it if the value is not as obvious as revenue. Specifically, you need to define your RPO and RTO.

    RPO (recovery point objective) is, in general, the time between your data backups. For example, if you back up nightly, you have 24 hours between backups. So if backups are done at midnight every night and you have a disaster at 10 PM, you could lose almost a day of data. How frequently your data changes determines how critical that loss will be.
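
    The midnight-backup example can be worked through in a few lines of Python. This is only an illustrative sketch: the function names, the calendar date, and the 4-hour RPO target are assumptions made for the example, not part of any Etomic service.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Return how much recent data is unprotected when the disaster hits."""
    if disaster < last_backup:
        raise ValueError("disaster must not precede the last backup")
    return disaster - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case loss equals the backup interval, so it must not exceed the RPO."""
    return backup_interval <= rpo


# Nightly backup at midnight, disaster at 10 PM the same day (date is arbitrary).
last_backup = datetime(2024, 1, 1, 0, 0)
disaster = datetime(2024, 1, 1, 22, 0)
loss = data_loss_window(last_backup, disaster)
print(loss)  # 22:00:00

# Hypothetical target: the business can tolerate losing at most 4 hours of data.
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))  # False
```

    In this sketch a nightly schedule fails a 4-hour RPO, which would point toward more frequent backups.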

    RTO (recovery time objective) is the length of time it takes to get back up and running after a disaster. That includes the time it takes to get new systems online and run complete restores of your data from backup.
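
    As a rough illustration, recovery time can be estimated as the time to bring new systems online plus the time to restore the data. The sketch below assumes a constant restore throughput; the function name and all of the figures are hypothetical and chosen only to show the arithmetic.

```python
def estimated_recovery_hours(
    provision_hours: float, data_gb: float, restore_gb_per_hour: float
) -> float:
    """Estimate recovery time: provisioning plus a full restore from backup."""
    if restore_gb_per_hour <= 0:
        raise ValueError("restore_gb_per_hour must be positive")
    return provision_hours + data_gb / restore_gb_per_hour


# Hypothetical figures: 2 hours to get new systems online,
# 500 GB to restore at an assumed 100 GB per hour.
estimate = estimated_recovery_hours(
    provision_hours=2.0, data_gb=500.0, restore_gb_per_hour=100.0
)
print(estimate)  # 7.0

rto_target_hours = 8.0  # hypothetical objective
print(estimate <= rto_target_hours)  # True
```

    Because the restore term grows with the amount of data, the same plan can drift past its RTO as the data set grows, which is one reason to re-check the estimate regularly.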

    See the RPO vs. RTO tab for a more detailed description.

    3. Deploy

    Implement the proper infrastructure, systems, and services you need to protect yourself from your defined disaster.

    See the SLA’s tab for more detailed descriptions of SLAs and DR services.

    4. Test

    Test your recovery systems and/or services to ensure they will work if there is a disaster. Ensure that you have regularly scheduled testing and documented procedures to keep your DR plan current with changes to your environment. If your DR is not up to date and compatible with your production environment, recovery may not work in times of disaster.

    Next Steps…

    Once you have identified your company’s value in terms of a disaster and defined your disaster recovery objectives, we are ready to help you. Etomic will create a Disaster Recovery Plan and Scope of Services specifically for your company’s needs and budget.

  • SLA’s

    We can design and provide the high-availability and disaster recovery solution you need to meet your requirements and budget.

    • All high availability configurations can be customized to meet your specific needs and budget.
    • Multiple solutions can be created to meet different business objectives and scaling needs.
    • All availability targets are based upon the minimum redundancy required for each configuration.

    Computing System

    • 99.9%: Single cloud server or computing node (single instance).
    • 99.999%: Cloud servers or computing nodes. Redundant servers across separate physical computing nodes, with load balancing between them (do not assume that your virtual or cloud servers are redundant).
    • 100%: Cloud servers or computing nodes with geographic high availability. Redundant servers load balanced across multiple data centers in different geographic regions at least 300 miles apart.

    Data & Storage

    • 99.99%: System storage. All our storage systems run a redundant RAID array, which means every individual disc has redundancy (a copy of the data on another disc in case one fails). NOTE: RAID or a highly available storage system is NOT a backup. If your stored data or database is corrupted, infected by a virus, or deleted by accident, the data is in most cases not recoverable. This is why you want a separate copy of your data kept in a backup.
    • 99.999%: Data with backups on-site. As mentioned above, backups are a separate copy of your entire server image (including the OS, apps, and data) or of specific data (file-level backups); in most cases we do a combination of both. The reason we do both is that the entire image lets you recover from a system or server failure, but in most cases customers only need to restore specific files from backup because a file was damaged or overwritten.
    • 100%: Data with backups off-site. Off-site backups are the same as on-site backups, but they are kept off-site at another secure site. This is done in case there is a large disaster (e.g. a tsunami) at your primary data center. The only disadvantage of off-site backups is that restoring data can take a little longer because of the added distance.

    Infrastructure & Network

    • 100%: Network. The core network has a 100% SLA, and the network handoff to your equipment has a 99.999% SLA, which is slightly lower due to rare maintenance.
    • 100%: Data Center 2.0 infrastructure. This consists of a fully redundant data center infrastructure (power, UPS, air conditioning, fire systems, etc.) with redundant Internet service providers and routes to the Internet.
    • 100%: Uptime through all natural disasters (multiple data center geographic high availability). This consists of infrastructure installed in at least two geographically diverse data centers. The production server infrastructure installed at the backup facility must be functionally equivalent to the infrastructure at the primary, but it doesn’t necessarily have to be identical. Each facility should be sized to handle the required application load.

    * This % of uptime is based on a single site. That is, all the provisions are in place and we do maintain 100% uptime, but there is no way to avoid something like a natural disaster, so to achieve absolute uptime you need to be in multiple geographically dispersed data centers.

    For your reference:

    Nines of Reliability (hours / minutes / seconds):
    2 9’s (99%) = up to 87.6h (3.65 days) / 5256.0m / 315360.0 seconds of downtime per year.
    3 9’s (99.9%) = up to 8.76h / 525.6m / 31536.0 seconds of downtime per year.
    4 9’s (99.99%) = up to 0.876h / 52.56m / 3153.6 seconds of downtime per year.
    5 9’s (99.999%) = up to 0.0876h / 5.256m / 315.36 seconds of downtime per year.
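
    The figures above follow from multiplying the hours in a year by the fraction of time a service is allowed to be down. A minimal Python sketch of that arithmetic, assuming a 365-day year (8,760 hours) and a helper name chosen for this example:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


# Print hours, minutes, and seconds of allowed downtime for each target.
for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}%: {hours:.4f} h / {hours * 60:.3f} min / {hours * 3600:.2f} s")
```

    The same function can be used for targets that are not in the list, such as 99.95%.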

  • What Is the Difference Between RTO and RPO?

    Both RPO and RTO are critical factors in an effective business continuity plan. Business continuity planning establishes a framework that helps a company recover after a disaster hits. Disasters, by definition, are rarely expected. Fire, flood, and theft are all examples of disasters that can put a poorly organized business under permanently. A continuity plan is a roadmap that details how an organization continues to operate while rebuilding. This is where RPO and RTO come into play.

    Recovery Point Objective

    Recovery point objective, or RPO, is a complex concept geared specifically towards data backup. A business that relies heavily on data is vulnerable during a shutdown. Consider, for example, a company that maintains a database that feeds an ecommerce site. If disaster strikes the datacentre, the inventory disappears or becomes out of date. As part of business continuity planning, management must figure out how long they can afford to have no access to that system before the business fails or, in other words, how much data they can afford to lose before it has serious consequences for the business. The answer is crucial to developing a system backup and disaster recovery schedule.

    If that same business has a revolving inventory, updating the backup every hour improves the odds of recovering after the main system goes down. Companies that have few changes to their database might be able to update once a week and still stay in business. RPO is that limit: the amount of data a business can afford to lose before the failure causes severe problems or shuts it down.

    Recovery Time Objective

    Recovery time objective, or RTO, is simpler. It is a target time for the resumption of a business’s IT activities after a disaster has struck. A business that can afford to take a week before being fully operational again does not need to put as much money into disaster recovery preparation as an organization that needs its doors open within two hours.

    A data entry operation has a short RTO, so the company should invest heavily in disaster recovery systems, maybe even a second DR site. This secondary location would maintain a full system backup with workstations able to support the business if the main office is unable to open. A small boutique would have a longer RTO and not budget for a disaster recovery centre.

    RPO vs. RTO

    RPO is specifically about data backup in order to maintain continuity. It is essential to determining how often a business should schedule data backup on their network. RTO is how long it will take an organization to get back up and running to the Recovery Point Objective.

    Although one does not necessarily have anything to do with the other, they are both elements of disaster recovery and business continuity management. One is about how long the company can survive without data, while the other is about how long it can take to reopen its doors. A company could have an RPO of three days but an RTO of just one. For example, a restaurant may be able to operate without a computer system, but it loses money and inventory while its doors are shut.
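
    Because the two objectives are independent, a recovery plan can satisfy one and miss the other. The sketch below checks a plan against both; the class names, the weekly backup schedule, and the six-hour recovery estimate are hypothetical, while the three-day RPO and one-day RTO come from the restaurant example above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rpo: timedelta  # maximum tolerable data loss, measured in time
    rto: timedelta  # maximum tolerable time to resume operations


@dataclass(frozen=True)
class Plan:
    backup_interval: timedelta     # worst-case data loss under this plan
    estimated_recovery: timedelta  # expected time to get back up and running


def check(plan: Plan, goals: Objectives) -> dict[str, bool]:
    """Evaluate each objective separately, since neither implies the other."""
    return {
        "rpo_met": plan.backup_interval <= goals.rpo,
        "rto_met": plan.estimated_recovery <= goals.rto,
    }


# Restaurant example: RPO of three days, RTO of one day.
restaurant = Objectives(rpo=timedelta(days=3), rto=timedelta(days=1))

# Hypothetical plan: weekly backups, but a quick six-hour recovery.
weekly_backups = Plan(
    backup_interval=timedelta(days=7), estimated_recovery=timedelta(hours=6)
)
print(check(weekly_backups, restaurant))
# {'rpo_met': False, 'rto_met': True}
```

    Here the plan recovers quickly enough but backs up too rarely, so only the backup schedule needs to change.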

    RPO and RTO are important business concepts for companies to consider when developing a system that allows them to survive after disaster strikes. Although not directly related, they are both a necessary part of the process.