By John Watkins, inRsite IT Solutions
While the term ‘Business Continuity’ is fairly new, the question behind it has kept system administrators up at night since the PC first made its way into our offices: “How do I ensure my company can continue functioning in the event of an emergency?” Developing a Business Continuity plan (sometimes also called a Backup and Disaster Recovery plan) is a common way for sysadmins to answer that question.
A traditional Backup and Disaster Recovery (BDR) plan addresses what kinds of data are backed up, how often backups run, and how the backup files are stored and retained. This type of BDR plan was the industry standard for a long time, but it only addressed part of the problem.
Even if your files are backed up to a server or NAS at the office, there are still threats that need to be planned for. What if a user is phished and ransomware infects the entire office, including your backup device? What happens when a construction crew down the road cuts your ISP's main fiber feed, and the ISP says it will be days before it's repaired? What happens if a fire in the building destroys all the equipment in your server room? Can your employees do their jobs if a severe hurricane floods your office and they must work from home for a couple of weeks?
Far too often, these questions are only asked after a disaster has occurred and it is too late to do anything about it. Spending the time and effort today to develop and test a Business Continuity plan will save your clients, and yourself, a lot of time, money and headaches tomorrow.
The first thing you should do when developing a Business Continuity plan for a client is to set up a meeting with the client's point(s) of contact (POCs) to discuss which line-of-business (LOB) applications and servers they consider critical, and what their risk tolerance is. Then use the information from that meeting to develop RPO and RTO values for the client. These values provide the basic framework for structuring your backup schedule and retention settings.
The Recovery Point Objective (RPO) defines how much data the client can afford to lose before the loss affects the daily operations of their business. For a banking system, even 1 hour of data loss would be devastating, because these systems process live transactions. On the other hand, a small retail shop that only processes a handful of sales per day may be fine with just a couple of backups per day.
The Recovery Time Objective (RTO) defines the time frame within which applications and systems should be restored after an outage. Critical systems, such as a banking system, have a shorter RTO than low-priority systems and are therefore restored first.
Keep in mind that the examples above are oversimplified to keep this article short. You can (and should) define multiple tiers in each client's BC plan, grouping applications and servers by their RPO/RTO requirements and applying a different SLA to each tier.
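To make the RPO-to-schedule relationship concrete, here is a minimal sketch. It assumes that the worst-case data loss equals the interval between backups (a failure just before the next job runs loses everything since the last one) and ignores how long the backup job itself takes. The function names and the 12-hour retail RPO are illustrative assumptions, not values from any particular backup product.

```python
import math
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # If the failure happens just before the next scheduled backup,
    # everything written since the previous backup is lost.
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # A schedule satisfies the RPO if its worst-case loss fits inside it.
    return worst_case_data_loss(backup_interval) <= rpo


def min_backups_per_day(rpo: timedelta) -> int:
    # Smallest number of evenly spaced daily backups that keeps the
    # interval between them at or below the RPO.
    return math.ceil(timedelta(days=1) / rpo)


# Hypothetical retail-shop RPO: losing up to half a day of sales is acceptable.
retail_rpo = timedelta(hours=12)
print(meets_rpo(timedelta(hours=12), retail_rpo))  # two backups a day
print(meets_rpo(timedelta(hours=24), retail_rpo))  # one nightly backup
print(min_backups_per_day(retail_rpo))
```

With a 12-hour RPO, two evenly spaced backups per day pass and a single nightly job does not; a 15-minute RPO, by contrast, works out to 96 backups per day, which is usually the point where replication or continuous protection becomes the better tool.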
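One way to picture the tiering described above is the small sketch below. The three tier definitions and the example systems are hypothetical; in practice the numbers come out of the client meeting. Each system is placed in the least strict tier that still satisfies both its RPO and RTO, and the restore order sorts systems by RTO so that the most critical ones come back first.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Tier:
    name: str
    rpo: timedelta  # data loss the tier's backup schedule can guarantee
    rto: timedelta  # restore time the tier's SLA promises


@dataclass
class System:
    name: str
    rpo: timedelta  # max acceptable data loss for this system
    rto: timedelta  # max acceptable downtime for this system


# Hypothetical tiers, listed from strictest to least strict.
TIERS = [
    Tier("Tier 1", rpo=timedelta(minutes=15), rto=timedelta(hours=1)),
    Tier("Tier 2", rpo=timedelta(hours=4), rto=timedelta(hours=8)),
    Tier("Tier 3", rpo=timedelta(hours=24), rto=timedelta(hours=72)),
]


def assign_tier(system, tiers=TIERS):
    # Walk from least strict to strictest and return the first tier whose
    # guarantees are at least as tight as the system requires.
    for tier in reversed(tiers):
        if tier.rpo <= system.rpo and tier.rto <= system.rto:
            return tier
    raise ValueError(f"No tier is strict enough for {system.name}")


def restore_order(systems):
    # Shortest RTO first: critical systems are restored before the rest.
    return sorted(systems, key=lambda s: s.rto)


systems = [
    System("File server", timedelta(hours=24), timedelta(hours=72)),
    System("LOB database", timedelta(minutes=15), timedelta(hours=1)),
    System("Email archive", timedelta(hours=8), timedelta(hours=24)),
]
for s in restore_order(systems):
    print(s.name, assign_tier(s).name)
```

Picking the least strict qualifying tier keeps costs down, since stricter tiers mean more frequent backups and faster (more expensive) recovery targets.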
Far too often, MSPs focus on the traditional aspects of business continuity, data backup and recovery, and neglect to plan for the single biggest threat to their clients' productivity: ISP outages. In today's offices, when the internet connection goes down, there isn't much employees can do. Email can't be sent or received, VoIP and eFax services stop working, and SaaS tools like QuickBooks Online and Salesforce become unreachable. Not only does the business lose money through lost employee productivity, it also loses any prospective clients who tried to call, email or place an order during the outage.
Luckily for our clients, MSPs can help protect their bottom line from the impact of ISP outages by using redundant internet connections, implementing SDWAN technologies, or having a plan in place to get employees back to work remotely.
The use of redundant ISPs isn't anything new, but it is an essential step in keeping your clients up and running as much as possible. The cost of adding a secondary ISP at your client's office is (usually) negligible, and implementation is very simple. Many business-grade firewalls already have a port defined as WAN2 (or similar), so all you need to do is plug in the new connection and configure the firewall to use it. Some more advanced firewalls, like the Sophos UTMs we use, offer extra options such as setting the priority of each WAN connection and even load-balancing traffic across both uplinks.
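The difference between priority-based failover and load balancing can be sketched in a few lines. This is a generic illustration, not how a Sophos UTM is actually configured: the `Uplink` fields, the health flag (standing in for something like a gateway ping check) and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    name: str
    priority: int    # lower number = preferred, e.g. WAN1 = 1, WAN2 = 2
    healthy: bool    # hypothetical result of a health check
    weight: int = 1  # relative share of traffic when load-balancing


def active_uplinks(uplinks, load_balance=False):
    healthy = [u for u in uplinks if u.healthy]
    if not healthy or load_balance:
        # Load-balancing mode spreads traffic over every healthy link.
        return healthy
    # Failover mode uses only the best-priority healthy link.
    return [min(healthy, key=lambda u: u.priority)]


def traffic_shares(uplinks, load_balance=False):
    # Fraction of traffic each active uplink carries, by weight.
    active = active_uplinks(uplinks, load_balance)
    total = sum(u.weight for u in active)
    return {u.name: u.weight / total for u in active}


wan1 = Uplink("WAN1", priority=1, healthy=True, weight=3)
wan2 = Uplink("WAN2", priority=2, healthy=True, weight=1)
print(traffic_shares([wan1, wan2]))                     # failover: WAN1 only
print(traffic_shares([wan1, wan2], load_balance=True))  # 75/25 split
wan1.healthy = False
print(traffic_shares([wan1, wan2]))                     # WAN2 takes over
```

Note that in both modes, existing sessions on a failed link still have to be re-established on the surviving one, which is why users can notice a failover even when it is fast.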
SDWAN is a term that has been thrown around quite a bit over the last couple of years, and for many vendors in the MSP space it is just a rebranding of the uplink-balancing technology that has been around for years. But a few players offer a true “Software Defined WAN” solution, and BigLeaf is our chosen provider. While it may look like the same old tech at first glance, there is much more under the hood than simply balancing network traffic across uplinks. BigLeaf provides a physical appliance that sits between your ISP connections and your firewall, and that appliance does the heavy lifting: Dynamic QoS, Intelligent Load Balancing and Autonomous Classification and Prioritization of traffic have, in our experience, delivered excellent performance and uptime.
While failover between uplinks on our UTMs is very quick, users still notice when an ISP connection drops and traffic moves to the failover ISP, especially if they are on a VoIP call or using SaaS software at the time. With the BigLeaf SDWAN solution, we have had multiple instances of an office losing an ISP connection without users noticing, even when they were on a VoIP call at the moment the connection dropped.
There are dozens of backup tools marketed toward MSPs, and you will find both positive and negative reviews online for every one of them. After using a few of the top-performing products over the years, our MSP has standardized on Veeam. Understand, though, that there are tradeoffs to account for when selecting or changing your MSP's backup solution, and what works for one shop won't necessarily work for another.
For example, a small MSP with 1-3 employees may want something that's very easy to manage and doesn't require it to spec and build its own hardware. In this case, Datto may be a good fit. But a more mature MSP with experienced engineers on staff may choose to deploy its own BDR servers to client sites and may even host its own cloud repository for offsite backup replication. A solution powered by StorageCraft or Veeam, where the engineers have a higher level of control over the data, may be a better fit for this type of MSP.
Once you have selected the backup software that fits your MSP, you will need to plan for deploying BDR appliances at your client sites. These appliances are either purchased from your backup vendor (as with Datto) or built by the MSP to its own spec before the backup software is installed on them (as with Veeam). Since our MSP is a Veeam shop, we build our 2U Dell PowerEdge servers with enough resources not only to store backup files but also to restore backups on the BDR itself if needed. We then pair the BDR with a local NAS and configure storage tiering to keep the most recent backup files on the hot tier (the BDR server), older files on the warm tier (NAS), and archived backups on the cold tier (AWS/Wasabi/B2). Retention settings vary from client to client and take into account the client's RPO/RTO as well as any regulatory requirements on data retention.
When creating a backup plan for your clients, don't forget the basics. Follow the principle of least privilege when assigning permissions on your BDR, set up proper service accounts with complex passwords and interactive login disabled, implement MFA wherever possible, and follow the 3-2-1 rule for backups: keep at least 3 copies of the data, store 2 of them on separate devices, and keep the third copy offsite.
The final and most important part of a backup plan is to TEST THE BACKUPS on a regular basis. Even if your backup report shows that a backup succeeded, you may run into problems when you go to restore it. Some backup solutions, including StorageCraft and Veeam, have built-in components that automatically test backups by mounting and booting them, then either taking a screenshot of the booted VM at the login screen or listing the result in a report. While automatic testing of VMs is helpful, don't rely on it alone. I have had backups pass the automated boot test, only to BSOD when someone attempted to sign in. No matter which backup solution you choose, always set aside time to manually test a full restore, and be sure to test the functionality of the OS as well; don't assume that reaching the login screen means the VM is healthy.
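The hot/warm/cold tiering above boils down to a rule based on backup age. The sketch below shows that rule under assumed thresholds (14 days on the BDR, 90 days on the NAS, 7-year retention); these numbers are hypothetical placeholders, since real values depend on each client's RPO/RTO and regulatory requirements, and a backup product applies such rules through its own job settings rather than code like this.

```python
from datetime import date

# Hypothetical thresholds in days; set per client in practice.
HOT_DAYS = 14             # newest backups stay on the BDR server
WARM_DAYS = 90            # then move to the local NAS
RETENTION_DAYS = 365 * 7  # archived in cloud object storage until here


def storage_tier(backup_date: date, today: date) -> str:
    age = (today - backup_date).days
    if age < HOT_DAYS:
        return "hot"      # BDR server: fastest local restores
    if age < WARM_DAYS:
        return "warm"     # local NAS
    if age < RETENTION_DAYS:
        return "cold"     # offsite archive (e.g. AWS, Wasabi, B2)
    return "expired"      # past retention, eligible for deletion


today = date(2024, 6, 30)
for d in [date(2024, 6, 25), date(2024, 5, 1), date(2023, 1, 1), date(2010, 1, 1)]:
    print(d, storage_tier(d, today))
```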
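A quick way to audit a client against the 3-2-1 rule is to list every copy of the data and check the three conditions. The sketch below is a hypothetical checklist helper, not a feature of any backup tool; the device names are made up. Interpretations differ on whether the production data counts as one of the three copies, so this simply counts whatever copies you list.

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    label: str     # e.g. "production", "BDR backup", "cloud archive"
    device: str    # physical device or service holding this copy
    offsite: bool  # True if stored away from the client's office


def meets_3_2_1(copies) -> bool:
    enough_copies = len(copies) >= 3                          # 3 copies
    separate_devices = len({c.device for c in copies}) >= 2   # 2 devices
    has_offsite = any(c.offsite for c in copies)              # 1 offsite
    return enough_copies and separate_devices and has_offsite


copies = [
    DataCopy("production", "file-server-01", offsite=False),
    DataCopy("local backup", "bdr-01", offsite=False),
    DataCopy("offsite backup", "cloud-bucket", offsite=True),
]
print(meets_3_2_1(copies))      # all three conditions hold
print(meets_3_2_1(copies[:2]))  # no offsite copy, only 2 copies
```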
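Getting past the login screen is only the first check. One extra step after a manual restore is confirming that the services the client relies on actually answer on the network. The sketch below checks whether TCP ports on a restored VM accept connections. The host address is hypothetical, and the ports (3389 for RDP, 445 for SMB, 1433 for SQL Server) are examples you would replace with the client's own services. An open port still doesn't prove the application works, so signing in and exercising the LOB software remains part of the test.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def smoke_test(host, ports):
    # Map each expected service port to whether it is reachable.
    return {port: port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Hypothetical address of a VM restored into an isolated test network.
    print(smoke_test("10.0.0.50", [3389, 445, 1433]))
```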
If you have a client that requires near-zero downtime for its systems, replication is going to be your best friend. Replicas are standby copies of the client's production servers, ready to be used and updated frequently so that they closely match the production servers they copy. If the client loses a server, they can switch over to the replica within a couple of minutes and pick up close to where they left off, limiting the impact on their productivity.
Veeam has an entire component dedicated to replication, including the ability to create failover plans that switch clients from their on-site systems to cloud replicas with the push of a button. Replicas are also an incredibly helpful tool for MSPs when testing updates to client systems and applications without impacting end users. And just like with backups, make sure to test your replica VMs and failover plans on a regular basis.
With the gauntlet of threats facing MSPs and their clients today, Business Continuity is more important than ever. Hurricanes, wildfires and earthquakes are in the news almost daily; MSPs and their clients are being targeted by cybercriminals; and Karen from accounting will still open that cute cat-meme email from a Russian sender without question. Do yourself and your clients a favor and take the time to create a proper Business Continuity plan now, before you need one.
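Conceptually, a failover plan is an ordered list of replicas to start, with pauses so that dependencies (such as a domain controller) are up before the servers that need them. The sketch below is not Veeam's API; `FailoverStep`, `run_failover_plan`, the VM names and the delays are all hypothetical. Passing the start and sleep functions in as parameters lets you rehearse the plan with stubs, which fits the advice to test failover plans regularly.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FailoverStep:
    vm_name: str
    delay_after_seconds: float  # wait before starting the next VM


def run_failover_plan(steps: List[FailoverStep],
                      start_vm: Callable[[str], None],
                      sleep: Callable[[float], None] = time.sleep) -> List[str]:
    # Start each replica in order, pausing so dependencies can come up.
    started = []
    for step in steps:
        start_vm(step.vm_name)
        started.append(step.vm_name)
        sleep(step.delay_after_seconds)
    return started


# Hypothetical plan: domain controller first, then database, then app server.
plan = [
    FailoverStep("DC01-replica", 60),
    FailoverStep("SQL01-replica", 120),
    FailoverStep("APP01-replica", 0),
]

# Dry run: record actions instead of starting VMs or actually waiting.
waits = []
order = run_failover_plan(plan, start_vm=lambda name: print("start", name),
                          sleep=waits.append)
print(order, waits)
```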
About The Author
John Watkins is the CIO of inRsite IT Solutions, a Central Florida-based MSP that helps clients embrace new technologies to drive meaningful change in their organizations. Before joining inRsite, John managed low-voltage projects for companies including Target, Walmart, RadioShack and Lexus. He brought that experience with him to inRsite, where he helped grow the company from a small PC break-fix shop into an award-winning MSP and Cloud Services Provider. For more information, visit www.inRsite.com.