Disaster Recovery & Cloud Computing – Real Recovery or a Catastrophe?

When the top management at ‘Help at Home’ sat down to look over a complete BCP/DR solution, they were confronted by an undeniable truth – all of the options were very expensive.
Help at Home, a company of 13,000 workers with a presence in nine states, assisted the elderly and disabled with household chores and health care.
They had all their operations automated on VMware servers, and backups ran on SonicWall’s continuous data-protection appliance. However, the standby systems weren’t backing up operating systems or imaging any servers. The result: in an outage the system could only retrieve files and databases, and all the major application configurations would be lost.
Their first consideration was to make one of their 85 offices a failover site. But duplicating the infrastructure and maintaining the site was costly – $190,000 over three years, largely a dead investment until a disaster struck. Instead, they found a Texas-based cloud infrastructure vendor that pulls their data over a VPN and syncs it with the databases, usually on a nightly basis or more often depending on the criticality of the information. The best part – all this costs only $48,000 over three years, saving almost $140,000.
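The arithmetic behind that decision can be spelled out with the figures quoted above. A minimal sketch – the two dollar amounts come from the article; the function and variable names are illustrative:

```python
# Comparing the two three-year DR options from the article.
# Figures: $190,000 for the duplicated failover site, $48,000 for the
# cloud vendor. Everything else here is illustrative scaffolding.

def three_year_saving(failover_site_cost: int, cloud_dr_cost: int) -> int:
    """Return the three-year saving from choosing the cloud option."""
    return failover_site_cost - cloud_dr_cost

FAILOVER_SITE = 190_000   # duplicate infrastructure at one of the 85 offices
CLOUD_DR = 48_000         # cloud vendor with nightly VPN sync

saving = three_year_saving(FAILOVER_SITE, CLOUD_DR)
print(f"Three-year saving: ${saving:,}")         # $142,000 – "almost $140,000"
print(f"Annualized: ${saving // 3:,} per year")  # roughly $47,333 per year
```

The exact saving is $142,000, which the article rounds down to "almost $140,000".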
No doubt it seems easy and money-saving to let others handle your data backups in a cloud, but there is always another side. Backing up critical business information through a third party makes it more vulnerable and increases security risk, while size limits, bandwidth availability, and data-retrieval capability are additional issues.
If backing up to cloud services is inexpensive, it can be insecure too. If you were given the task of sorting out a disaster recovery plan for a company like Help at Home, what would your main considerations be, and how would you arrive at a decision? Comments appreciated.


  • There are legal issues to consider if your data includes medical, mental-health, or personnel records.

  • Loads of issues – I’ll pick only a couple, so others may add more ;-).
    First of all, assuming one has chosen a proper disaster recovery solution, the MAJOR question will always be how fast you can recover from a disaster. Having all your data, configs, and algorithms backed up properly is one thing; having it all up and running again is another. You’ll be bleeding cash while your systems are offline, so how fast you can be back up and running is THE major issue.
    However, there’s an aspect of your scenario I don’t fully understand. You’re telling us they’re running VMware, probably meaning they’ve virtualized some of their app/server combos. In short, this necessarily means the full app config settings AND supporting hardware settings are configured and stored through the VM management software.
    In this context the term “stand-by system” seems a bit out of place, because it can mean one of two things:
    1. Standby capacity within the virtual realm (auxiliary processing capacity, data storage, etc.) or even a complete, fully synced secondary session.
    2. The stand-by system is the actual app, OS, and physical hardware (which would need to be in absolute sync on all system aspects, which doesn’t make any economic sense).
    In scenario 1 I cannot imagine all application settings being lost, since they’re managed from the management interface, which uses different resources than the virtualized sessions. If the live sessions are struck by disaster, they should be restorable or reinitializable through the management interface, which still possesses all the necessary configs (the data isn’t stored on the same resource). If the management infrastructure has a meltdown, your virtualized sessions should remain running (for some time, at least long enough to recover the management infrastructure).
    Of course, this logic is highly dependent on basic architecture principles being applied consistently throughout the complete chain of services.
    The second scenario, which is theoretically possible, would tell you: leave as soon as you can! 😉
    Does that make sense? Otherwise, ask me.

  • BCP/DR almost always costs money and is almost always a sunk investment until you need it. It’s like any insurance policy – it shows its value when you have to invoke it.
    You can, of course, make the choice not to have a DR solution (although I would strongly advise having a BCP in place). This saves you money but you’re banking on there not being any major issues. Bear in mind that ‘a major issue’ can be anything from a plane landing on your datacentre to a major component of the infrastructure getting fried, to anything in between.
    What concerns me most about the solution you’ve described – and it could just be the way it’s worded – is that you say, “It pulls up their data via a VPN and syncs it with the databases usually on a nightly basis or depending on the criticality of the information.” I’m reading that as saying this cloud backup solution only backs up your data, especially as it syncs with a database. Fine, but what do you do in the “plane lands on your datacentre” scenario? You have to rebuild the solution from scratch; having all your data is pointless if you can’t access it.
    If you’re running VMware, any good cloud provider should be able to suck up your image and create a cloud-based version of it. You then have to get into discussions with your vendor about how resilient you want to make that – are you going to split the capability over a number of hosting sites, for example? This comes down to how rigorous you want your BCP to be and, unfortunately, how much money you have to spend.
    Other considerations (as others have mentioned) are around data integrity and security. You will have to challenge your chosen vendor on these points. Get testimonials, contact other clients, especially those that have had a ‘disaster’, find out how the provider coped. Ask for demonstrations and tests. If you purchase a backup capability, get the provider to prove restores. Simulate disasters, see how they cope.
    There is no easy answer, I’m afraid, although it should be a fairly standard procurement exercise. Set out what you want, get providers to bid for it, see how the bids match your requirements, etc, etc.
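One way to “get the provider to prove restores,” as suggested above, is to verify restored data bit-for-bit against the source. A minimal sketch, assuming you can mount both the live tree and the restored tree on one machine – the paths and the choice of SHA-256 are illustrative, not a vendor API:

```python
# Hedged sketch: hash every file in the source tree and compare against
# the restored tree after a simulated disaster and restore.
import hashlib
from pathlib import Path

def tree_digest(root: str) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest."""
    root_path = Path(root)
    return {
        str(p.relative_to(root_path)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root_path.rglob("*")) if p.is_file()
    }

def verify_restore(source_root: str, restored_root: str) -> list[str]:
    """Return relative paths that are missing or differ after the restore."""
    src, dst = tree_digest(source_root), tree_digest(restored_root)
    return sorted(p for p in src if dst.get(p) != src[p])

# Usage (hypothetical paths): an empty list means every source file
# came back identical.
# mismatches = verify_restore("/data/live", "/data/restored")
```

A real test would also cover databases and application configs, not just files – which is exactly the gap Help at Home’s original setup had.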

  • The best solution would be a Hybrid Cloud – a combination of a Private Cloud and a Public Cloud.
    A private cloud is deployed behind an organization’s firewall and only organization employees can access data in this cloud.
    Hybrid Clouds allow users to delegate what information is stored on site (in the private cloud) and what is stored off-site in a public cloud.
    Rest assured that with most Public Clouds (Amazon, Rackspace, etc.), the information is sent to the cloud over an encrypted SSL connection.
    I have included a link below with more information about Hybrid Clouds.

  • You have some really good answers here. Mykel, I think, brings up the point that everyone was thinking – you’re using VMware that provides much of what you may be looking for. Stuart builds on this with several good points. And finally Michael brings up hybrid clouds, making for a compelling discussion. All great feedback.
    The one thing that caught my attention was your point on cost. I hate to be the “big picture” or elephant-in-the-room guy, but how do you know $190k over 3 years is too much? Stuart touches on this by asking how rigorous the plan needs to be. There is BC and then there is DR – closely related, of course, but they can be simplified as continuation in the face of failure and resumption in the face of a disaster, respectively. You need to start by understanding which elements of IT are critical to the business running and which are essential for the business to function. How is revenue generated, and what information systems support that ability? In your case, you also have to include the potential loss of life or “harm”, which is more about DR than BC, technically speaking.
    The cost of BCDR is inextricably tied to what is important to ensure the business can fulfill its customers’ needs (continue to generate revenue) and to the degree of a recoverable disaster (BTW, not all disasters are recoverable, fiscally speaking).
    You need to do a business impact study, and this doesn’t have to be overly complicated or time-consuming. Look at the specific systems related to core business services, any support systems or services, and then business-process-related systems. Relate these to a timeline from failure to impact (you’ll find they fall on a progressive scale). The purpose is to determine which features of the systems need to be part of the plan. You mention your data is backed up but not app configs, which is a perfect example of this. Therefore, to expand on your example, you need to determine the time required to rebuild the system and applications before restoring the data, the potential rate of occurrence of an offending event, and compare this to the cost and loss to the business… if it’s more than $190k, or whatever the compensating solution costs, then the answer is right there. In short, never back into these things; start from the top and work your way down.
    Of course, many things will play into the selection criteria… security being one, of course (I had to put that in there :). The point is there are many options – too many – so you have to create an envelope of expectations and requirements and work inward from there. That’s my $0.02 anyway. I know it’s big-picture goo, but that’s where the game begins.
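The comparison the comment above describes – expected loss from outages versus the cost of the DR option – can be sketched numerically. A minimal sketch: only the $190k/3-year figure comes from the article; the outage rate, downtime, and hourly cost are hypothetical placeholders a real business impact study would supply.

```python
# Annualized loss expectancy vs. annualized DR cost.
# All inputs except the $190,000 option price are hypothetical.

def annualized_loss(outages_per_year: float, hours_down: float,
                    cost_per_hour: float) -> float:
    """Expected yearly loss from outages without a DR capability."""
    return outages_per_year * hours_down * cost_per_hour

ale = annualized_loss(outages_per_year=0.5,   # one major outage every 2 years
                      hours_down=72,          # 3 days to rebuild from scratch
                      cost_per_hour=4_000)    # lost revenue + penalties per hour

dr_cost_per_year = 190_000 / 3                # failover-site option, annualized

if ale > dr_cost_per_year:
    print("DR solution pays for itself")      # here: $144,000 > ~$63,333
else:
    print("Cheaper to accept the risk")
```

With these placeholder numbers the DR spend is justified; with a lower outage rate or cheaper downtime, the answer flips – which is exactly why the study has to come before the purchase.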
