February 25th, 2009 by Phil Pélanne

Redundancy: Good Things Come In (at least) Twos

About this post:

Phil Pélanne addresses the often overlooked issue of backing up and protecting your data.

A large part of the work NewCity does deals with the conception, organization and construction of websites. In all aspects of this process, short of a repeal of Murphy's Law, redundancy is something that cannot be ignored. In truth, that's a great rule of thumb for anything computer-related.

Consider that virtual pile of digital photos that has accrued on your hard drive over the years. If you don't maintain some kind of backup of them, you are one errant disk platter away from losing all those memories. While our websites may be cutting edge, we try to work at a safe distance from the precipice. Given the reach of the web, poor planning can be quite costly, and therefore all aspects of redundancy need to be considered. This means establishing reliable plans for retaining site data, access to it, and even process knowledge.

HARDWARE

At the server level, the sheer complexity of the system means that there are myriad things that can go awry. If there is only one hard drive in the system, then just as with your digital photo collection at home, you are one hard drive failure away from losing your data.

A technique that many businesses choose to employ to get around this is called RAID, an acronym that stands for Redundant Array of Inexpensive Disks. In short, RAID means that there are actually multiple hard drives that work in unison, with the data being protected from any one disk failing by virtue of their numbers. They can be configured in different ways, sometimes as a pair that mirror each other, or as an array of three that have data striped across them, or with even more. The configuration chosen has implications for how quickly data can be written to or read from them, but they all strive to protect your data.
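To make the "protected by virtue of their numbers" idea concrete, here is a minimal Python sketch of one way a three-disk array can survive losing a disk: two disks hold data and the third holds their XOR parity. This is an assumption for illustration only. The post does not say which RAID level is in use, real controllers work on fixed-size blocks in hardware or in the kernel, and the disk names and contents below are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of two data disks (same length on purpose).
disk_a = b"holiday "
disk_b = b"photos!!"

# The third disk stores the parity of the other two.
parity = xor_blocks([disk_a, disk_b])

# Pretend disk_a has died: rebuild it from the two survivors.
rebuilt_a = xor_blocks([disk_b, parity])
assert rebuilt_a == disk_a
```

A mirrored pair is simpler still: the second disk is a plain copy of the first, so either one can stand in for the other.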

Unfortunately, RAID will not protect you completely. Its Achilles heel is a disk that goes bad insidiously: rather than simply dropping out of the array, it can corrupt data, and the RAID controller can faithfully copy that corruption across to the other disks in the array. What now?

SOFTWARE

Well, hopefully you employed another very common arrow in the quiver of redundancy – backups! Depending on your company's requirements, backups can range from a periodic synchronization of files to a different location to a regular, full imaging of the entire disk to tape or another hard drive. This process can be controlled by software on the server itself, or via remote backup systems that can back up multiple machines at once. Of course, you will need to make sure that you don't lose your backups, so reliable off-site storage should be employed for them.
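The simplest kind of backup described above, a periodic copy of files to a different location, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sha256_of` and `back_up` are hypothetical, and a real backup tool would also handle scheduling, deletions, incremental copies and off-site transfer.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, backup_dir):
    """Copy every file under source_dir into backup_dir, then verify each copy.

    Returns the relative paths whose copy does not match the original.
    backup_dir is assumed to live outside source_dir.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    mismatched = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copies contents and timestamps
        if sha256_of(source_file) != sha256_of(target):
            mismatched.append(relative)
    return mismatched
```

Note that the checksum only proves the copy matches the source at the moment of copying. If the source was already corrupted, the backup will be too, which is one reason to keep several older backups rather than just the latest.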

What if you did all these things but face the perfect storm – you lose a drive in your RAID array, the failure corrupts the data on the good drives, and now you need to restore information from your backups? Depending on your backup scheme and the size of the backups, this can be a time-consuming process.

SAFETY IN NUMBERS

Thanks to your exquisite planning, you're in good shape, because you employed redundant servers! This too can take several forms. Depending on the need for uptime, it could range from a warm backup server (a machine that is configured the same as the main server and has data periodically synchronized to it), to a cluster (many computers behind a load balancer that act as one), to what is known simply as the cloud (hundreds to thousands of computers that provide various computing services in a distributed manner). In the cases of clustering and cloud computing, you'd likely never know that a failure occurred, as the troubled computer is often detected and removed from the group automatically. Load balancers, which spread the load between individual servers in a cluster, are themselves clustered so that one load balancer failure won't bring the network down.
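As a rough picture of how a troubled machine gets "removed from the group automatically", here is a small Python sketch of a round-robin load balancer that skips any server failing a health check. Everything in it is hypothetical: the class name, the server names and the health check are stand-ins, and production load balancers probe servers over the network on a timer rather than asking a function.

```python
class LoadBalancer:
    """Hand out servers in round-robin order, skipping unhealthy ones."""

    def __init__(self, servers, is_healthy):
        self.servers = list(servers)
        self.is_healthy = is_healthy  # callable: server name -> bool
        self._next = 0

    def pick(self):
        """Return the next healthy server, or raise if none is left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if self.is_healthy(server):
                return server
        raise RuntimeError("no healthy servers available")


# Hypothetical cluster of three web servers, one of which is down.
down = {"web2"}
balancer = LoadBalancer(["web1", "web2", "web3"], lambda name: name not in down)
picks = [balancer.pick() for _ in range(4)]
assert picks == ["web1", "web3", "web1", "web3"]
```

Visitors are served by `web1` and `web3` without ever noticing that `web2` is missing, which is the behavior described above.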

At this point, you've wisely entrusted your data to a cluster – time to relax, right? Not necessarily, though unless you are a service provider, you're entering the realm of things you cannot control. Hosting services that employ clustering or cloud computing can also have problems, to which the answer is, you guessed it – more redundancy!

KEEPING THE LIGHTS ON

Beyond the servers themselves, you need to concern yourself with the services they depend on. Network connections can be (and in our experience, have been) cut by construction equipment, rendering your beautifully crafted network unavailable. Generally, to get around such issues, hosts will establish redundant connections to multiple network providers. At our business park, they do just that – there is one connection to one ISP at one end of the park, and an equally quick connection to a different ISP on the other. Most large hosting consortiums are located in large cities near the main arteries of the internet so that they can get the same effect.

Large hosting facilities need to make sure that the other services that computers require remain available. Electricity is made redundant through multiple connections to the electric grid. Should those be compromised, vast arrays of batteries called UPSs (Uninterruptible Power Supplies) wait quietly, ready to switch on automatically to provide a couple of extra minutes of power to the arrays of servers – just long enough for enormous diesel-powered generators to fire up. These generators are often stocked with enough fuel to keep networks going for days. Temperatures at these facilities can be an issue, with all those CPUs under one roof, so gargantuan ventilation systems and raised flooring are employed to keep air circulating around the servers.

The network services that all of these servers rely on also benefit from redundancy and caching. A great example of this is the worldwide DNS system, which takes care of translating domain names into the numeric IP addresses that computers use. It is a requirement that every registered domain have at least two DNS servers that can do this translation at any given time. More are encouraged, as is running these DNS servers on separate networks.

At NCM, resolution for many of our websites is done by servers in Blacksburg and in Salem. If one were to go down for whatever reason, the other would make sure that the website was still resolvable. The DNS service also makes use of caching to spread the load of resolving domains beyond just the authoritative DNS servers – any DNS server in the chain of resolution will remember the IP address for a certain time period, which ensures that the primary and secondary servers aren't hammered by requests.
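The caching behavior described above can be sketched as a small Python class that remembers each answer for a fixed time before asking the authoritative servers again. This is a simplified illustration, not how any particular DNS server is implemented: `CachingResolver` and its parameters are hypothetical, the address comes from a range reserved for documentation, and in real DNS the time-to-live is set on each record by the authoritative server rather than chosen by the cache.

```python
import time


class CachingResolver:
    """Remember answers from an upstream lookup until their TTL expires."""

    def __init__(self, authoritative_lookup, ttl_seconds=300, clock=time.monotonic):
        self.lookup = authoritative_lookup  # callable: domain name -> IP address
        self.ttl = ttl_seconds
        self.clock = clock                  # injectable so expiry can be tested
        self._cache = {}                    # name -> (address, expires_at)
        self.upstream_queries = 0

    def resolve(self, name):
        now = self.clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]                # still fresh: no upstream traffic
        self.upstream_queries += 1
        address = self.lookup(name)
        self._cache[name] = (address, now + self.ttl)
        return address


# A fake clock and a stand-in for the authoritative servers.
fake_now = [0.0]
resolver = CachingResolver(lambda name: "192.0.2.10", ttl_seconds=300,
                           clock=lambda: fake_now[0])
resolver.resolve("www.example.com")  # asks upstream
resolver.resolve("www.example.com")  # answered from the cache
fake_now[0] = 301.0
resolver.resolve("www.example.com")  # cache entry expired, asks upstream again
assert resolver.upstream_queries == 2
```

Three lookups cost the authoritative servers only two queries here; across millions of visitors, that difference is what keeps the primary and secondary servers from being hammered.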

REDUNDANCY IN PROCESS

In our sphere of web development, we even apply concepts of redundancy to our development processes. When we develop the various pages and scripts that comprise a website, we use a system called Subversion, which saves a copy of every version of every page in the site. This allows several developers to work on a website at the same time without worrying about overwriting each other's work. If the Subversion system detects that two people have worked on the exact same part of a page, it will notify the developers so that they can work out exactly how it should look or function before saving. If things really go awry, developers can roll a page or site section back to any previous version that has been saved in Subversion.
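To give a feel for what "two people have worked on the exact same part of a page" means, here is a rough Python sketch that compares two edited copies against their shared starting version and reports whether the edits touch the same lines. It uses Python's standard `difflib` module and is not Subversion's actual merge algorithm; the function names and the sample page are hypothetical.

```python
from difflib import SequenceMatcher


def changed_ranges(base, edited):
    """Return (start, end) line ranges of base that the edit touched.

    Pure insertions are widened to cover the line they were inserted before,
    so two insertions at the same spot count as touching the same place.
    """
    matcher = SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, max(i2, i1 + 1))
            for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]


def conflicts(base, mine, theirs):
    """True when both edits touch overlapping lines of the shared base."""
    return any(a_start < b_end and b_start < a_end
               for a_start, a_end in changed_ranges(base, mine)
               for b_start, b_end in changed_ranges(base, theirs))


# A hypothetical three-line page and three developers' edits to it.
base = ["<h1>Home</h1>", "<p>Welcome</p>", "<p>Contact us</p>"]
heading_edit = ["<h1>Homepage</h1>", "<p>Welcome</p>", "<p>Contact us</p>"]
footer_edit = ["<h1>Home</h1>", "<p>Welcome</p>", "<p>Call us</p>"]
other_heading_edit = ["<h1>Start</h1>", "<p>Welcome</p>", "<p>Contact us</p>"]

assert not conflicts(base, heading_edit, footer_edit)    # different lines
assert conflicts(base, heading_edit, other_heading_edit)  # same line
```

In the first case both changes can be combined automatically; in the second, a person has to decide which heading wins.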

Beyond Subversion, we try to achieve redundancy of knowledge through cross training and mentoring in-house. In the same way that a server can suffer ill effects if everything is saved on just one drive, a business can suffer if in-house knowledge is concentrated in too few places. We try to get around this through copious documentation, the use of collaborative systems such as wikis, and making sure that our important business and development knowledge is spread out amongst multiple people in the various groups. This ensures that if someone goes on vacation or is otherwise out of the office, other members of the group can fill in for them.

I hope this has given you a taste of some of the many layers at which redundancy figures into the development, hosting and operation of a website. Now, go get an external drive and back up those photos!
