Backups vs. Archives vs. Mirrors: How to Protect Your Data

According to Schofield’s Laws of Computing, your data doesn’t really exist unless you have at least two copies of it. The principle is simple: data is easily corrupted, lost, stolen, or destroyed. While storage devices such as hard drives have seen vast improvements in stability and performance, they are still subject to damage and failure. So how do we protect critical data and ensure access to it in the future?

The simplest solution is to create another copy of it. Easy, right? But not enough.

You must consider questions such as how sensitive the data is, how critical it is, how often you need to access it, and how long it should be retained.

Backups

‘Backup’ is often used as a catch-all term for a copy of critical data, but in the IT world, it refers to a copy that needs to be available at a moment’s notice. For example, suppose an artist accidentally deletes a job folder while the delivery deadline is coming up fast. The backup needs to be readily available for immediate recovery.

A backup should NOT be stored on the same storage as the original data - if the storage fails, both the original and backup would be lost. In an ideal situation, there will be at least two backup copies of your data: one stored onsite for easy access and recovery, and a second in a separate physical location from the original (this can include the cloud). This offsite copy is for disaster recovery purposes: it is there to ensure your data is protected against any number of events that can occur in any office or studio - smoke, fire, power surges, flood damage, and so on. Keeping a backup in a separate location reduces the risk that both the original and the backups will be destroyed. Bear in mind that recovering data from an offsite backup will take more time than recovering it from a readily available onsite drive.

A major factor to consider is keeping backups up to date. As files are added, deleted, and modified, you need backups to keep up. Fortunately, there are several products available that allow for scheduled ‘incremental backups’. These solutions run a backup script according to a set schedule and copy only the files that are new or have changed since the previous run. Many can also retain older versions of files, so earlier data remains recoverable after the backup has been updated.
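To make the idea of an incremental backup concrete, here is a minimal sketch in Python. It is not how any particular backup product works: the function name `incremental_backup` and the change test (file size plus modification time) are assumptions chosen for illustration, and real tools typically add versioning, verification, and error handling.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, backup: Path) -> list[Path]:
    """Copy only files that are new or changed since the last run.

    Returns the relative paths that were copied. Files deleted from
    `source` are deliberately left in `backup` so they stay recoverable.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            # Assumption: same size and a backup copy that is not older
            # than the source means the file has not changed.
            unchanged = (
                dst_stat.st_size == src_stat.st_size
                and int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            )
            if unchanged:
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 preserves the modification time
        copied.append(rel)
    return copied
```

Running the function a second time with no changes copies nothing, which is what keeps scheduled runs short. A scheduler of your choice would call it at the set interval.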

Another consideration when using offsite backups (whether at a physical storage location or in the cloud) is the security of that data. While most online services use current encryption and authentication methods to secure your data, breaches still happen. Compared with keeping data closer to home, this adds a potential vulnerability for sensitive data such as payroll and credit card information.

Server Mirroring, Server Clusters & High Availability

Saving backup copies to hard drives or to the cloud is fine for cases when a file or folder gets deleted; recovery in that situation will generally take minutes or hours, depending on how much data needs to be recovered. But what about the worst-case scenario where the file server fails entirely? Recovering data from a set of drives or the Internet could take days, causing significant downtime and destroying your production deadlines.

To protect against this kind of data disaster, Nodal recommends mirroring your file server. This requires a second server with at least as much storage capacity as the primary. The goal is to use the secondary server as a big, workable backup volume. Using a tool such as rsync, the primary server can be scheduled to regularly copy data to the secondary (Nodal clients often run a nightly rsync). As with other incremental backup tools, rsync transfers only files that have been added or modified since the last run, and it can be configured to remove files from the secondary that were deleted on the primary, which keeps each sync fast.
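As a rough illustration of what a scheduled mirror job might look like, the Python sketch below assembles an rsync command and hands it to the system. The host name, the paths, and the helper names `build_rsync_command` and `run_mirror` are hypothetical. The choice of `--delete` is also an assumption: it makes the secondary match the primary exactly, including deletions.

```python
import subprocess


def build_rsync_command(source_dir: str, mirror_host: str, mirror_dir: str,
                        dry_run: bool = False) -> list[str]:
    """Build an rsync invocation that mirrors source_dir to another server."""
    # --archive preserves permissions, timestamps, and symlinks;
    # --delete removes files on the mirror that no longer exist on the primary.
    command = ["rsync", "--archive", "--delete"]
    if dry_run:
        command.append("--dry-run")  # report what would change, copy nothing
    # The trailing slash tells rsync to copy the directory's contents
    # rather than the directory itself.
    command.append(source_dir.rstrip("/") + "/")
    command.append(f"{mirror_host}:{mirror_dir}")
    return command


def run_mirror(command: list[str]) -> int:
    """Run the command and return rsync's exit code (0 means success)."""
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical host and paths, shown as a dry run.
    cmd = build_rsync_command("/srv/projects", "mirror01", "/srv/projects",
                              dry_run=True)
    print(" ".join(cmd))
```

A scheduler such as cron could run a script like this nightly. Trying it with `dry_run=True` first is a sensible precaution, since `--delete` removes data from the secondary.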

If the primary server fails, users can switch to working from the secondary server with minimal loss of data and productivity. The secondary server may not even need to have the same power or capability as the primary, as it will only serve as a stop-gap to keep your facility up and running until the primary can be repaired or replaced. For a more seamless transition, the secondary can have the same specs as the primary server, allowing work to continue at full capacity. However, this doubles the investment in server hardware and will depend on client budget and server strategy.

A related option, instead of a nightly rsync, is real-time mirroring of two file servers. In this scenario, files are copied to both servers immediately upon modification (much like RAID 1 mirroring of hard drives). There is minimal, if any, data loss when switching to the secondary server.

A final option for server backups that not only improves data stability but also increases performance is to create a ‘high-availability cluster’. Normally used in data centers and other major facilities, this involves taking the data mirroring mentioned above even further. Data is spread across multiple servers, much as it is spread across drives in a RAID 5 or RAID 6 configuration. This makes the cluster extremely resilient against the loss of any one server, minimizing or eliminating downtime. It also expands capacity as servers are added, and it allows data to be retrieved faster because it is pulled from multiple systems at once.

The cluster can even be expanded on the fly should additional capacity be needed. This requires significant investment in hardware but remains an option for facilities interested in maximizing server uptime while increasing performance and available storage.

The major pitfall with both scheduled rsync and real-time mirroring is that there is a limited window to recover data in case of accidental deletion. With a scheduled rsync that propagates deletions, you have only until the next sync to recover files from the secondary. In the case of real-time mirroring, that data is unrecoverable because it’s immediately deleted on both servers. Nodal stresses that some form of incremental backup is still necessary to be able to recover deleted or corrupted files; server mirroring is primarily to minimize loss of production time due to hardware failure.
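The difference between a mirror and a backup can be shown with a toy model in Python. Here each server is just a dictionary of file names to contents, and the function names are invented for illustration. It is a sketch of the reasoning above, not of any real mirroring product.

```python
from typing import Optional


def sync_mirror(primary: dict, mirror: dict) -> None:
    """Make the mirror identical to the primary, deletions included."""
    mirror.clear()
    mirror.update(primary)


def take_snapshot(primary: dict, snapshots: list) -> None:
    """Keep a point-in-time copy that later syncs never touch."""
    snapshots.append(dict(primary))


def recover(name: str, snapshots: list) -> Optional[str]:
    """Return the most recent snapshotted content of a file, if any."""
    for snapshot in reversed(snapshots):
        if name in snapshot:
            return snapshot[name]
    return None


primary = {"shot_010.exr": "rendered frames"}
mirror, snapshots = {}, []

sync_mirror(primary, mirror)       # the nightly sync
take_snapshot(primary, snapshots)  # the incremental backup

del primary["shot_010.exr"]        # accidental deletion
sync_mirror(primary, mirror)       # the next sync carries the deletion over
```

After the second sync the file is gone from both the primary and the mirror, and only the snapshot list can still return it through `recover`.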

Archiving

Creating a data archive, by contrast, is all about long-term storage. You may want to maintain your studio’s jobs, reels, and other data for as long as you’re in business. This may be for posterity or auditing purposes, or to have available for use in a future pitch. In this scenario, the data needn’t be immediately accessible - the goal is to keep the data around as long as possible, to be retrieved at your convenience.

Dedicated archiving solutions are required due to the nature of storage media. Hard drives do not last forever on a shelf - Nodal suggests a 3-year lifetime for a standard spinning-disk hard drive before device failure becomes increasingly likely. This is much too short a timeframe for a dedicated archive.

However, other storage media such as LTO tape can remain readable for closer to 30 years given ideal storage conditions (assuming a compatible device is still available to read the data). This makes tape a much more reliable medium for long-term storage.

Archive management is also simpler than dealing with backups. In an archive, you will not necessarily need the incremental day-to-day recovery options that a backup solution can provide. Instead, you will likely only create an archive when a job is complete and shipped to the client, and the archive will contain only finished assets and project files.
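As an illustration of the "archive once, when the job ships" workflow, the Python sketch below bundles a finished job folder into a single compressed file and computes a checksum for it. Recording a checksum is an assumption added here, not a step from the text above; it gives you a way to confirm years later that the archive still reads back intact. The function name `archive_job` and the paths are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def archive_job(job_dir: Path, archive_path: Path) -> str:
    """Pack job_dir into a .tar.gz file and return its SHA-256 checksum."""
    # archive_path should sit outside job_dir so the archive
    # does not try to include itself.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(job_dir, arcname=job_dir.name)

    digest = hashlib.sha256()
    with archive_path.open("rb") as fh:
        # Read in 1 MiB chunks so large archives do not fill memory.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting file can then be written to whatever archive medium you use, with the checksum stored alongside your job records.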

Summary

Overall, we recommend deploying all three of these options in some capacity at your facility:

  • A backup system will ensure availability of your working data while allowing recovery in case of accidental file or folder deletion or corruption. Depending on your backup schedule and configuration, incremental backups can allow for a wide recovery window, adding flexibility in case data loss is not detected for some time.

  • Some form of server mirroring will allow you to keep working in the event of hardware failure, reducing lost productivity.

  • An archiving system will provide secure long-term storage of past work.

There is a wide variety of cloud-based and local solutions for backup and archive. Which of these solutions your studio deploys will depend on your data needs, budget, and schedule.

As always, feel free to reach out to Nodal with questions regarding your data security!
