As many of you are aware, we suffered a significant server outage this past weekend (March 7, 8 and 9).
We should NEVER EVER be down for more than an hour again. Being down for 52 hours was unacceptable and we need a solution to that problem.
We have made regular backups of the data ever since the current hardware was put in place, so all your accounts, post content, front page news and everything else can be restored.
However, as we learned from this weekend, there are a few procedural things that we didn't take into account when it comes to the question of "What happens when the hardware breaks?"
Sure -- the data will be there when it gets fixed, but if you guys, the users, can't connect to us at all during that time, that's a pretty huge pain in the ass. Since some of you have expressed interest in helping out, we decided the best thing to do was to try to implement such a solution.
Unlike managed or virtualized hosting like you find at GoDaddy or other such places, in which your site is part of a collective of virtual machines, The Allspark is unique in the regard that we have our own dedicated and powerful web server. Gone are the days of random database errors or the inability to withstand huge pulses in traffic due to events like Botcon or SDCC.
It also means that I have the freedom to install and customize whatever I want and whatever we need. We set our own policies and everything we do is ours.
The only downside to that method, as it is now, is that when hardware failure happens -- and let's face it, if you run a computer non-stop 24 hours a day, every day, for years on end, it's bound to happen -- we don't have a failover site to act as a buffer between the failure and the repair.
So the solution I'm proposing is simple: we install another physical machine at the webhost alongside our current machine. This machine will not only house all the hourly backups but can also, in an emergency, act as that failover device. It's as simple as pointing the domain to the other machine's IP address on the network. To you guys, the transition will be so seamless that it's doubtful you'll ever know there's a problem in the first place. And that's the way it should be in any data environment.
The way it will work is this:
Allspark Server A is the main server. It's always on and is set to back up and send the data to Allspark Server B at regular intervals.
If Allspark Server A fails like it did on Saturday, I can simply point the domain to Server B, which will be running a live backup of the server. It will not need as powerful a CPU or as much RAM, but it will still serve as a live failover server should A fail for some reason. Then when Server A is back up, we move the live backups from B back over to A, point the domain back to that machine, and things continue normally.
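For anyone curious, the switch-over decision described above can be sketched roughly like this in Python. This is only an illustration under stated assumptions: the names `Server` and `pick_active` are made up for the example, the IP addresses come from the range reserved for documentation, and the real switch is a manual change of where the domain points, not an automated script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str      # label only, e.g. "Allspark Server A"
    ip: str        # hypothetical address, not our real one
    healthy: bool  # result of whatever health check is used (assumed)


def pick_active(primary: Server, backup: Server) -> Server:
    """Return the server the domain should point to.

    Prefer the primary whenever it is up; fall back to the backup
    only when the primary is down.
    """
    if primary.healthy:
        return primary
    if backup.healthy:
        return backup
    raise RuntimeError("both servers are down; nothing to fail over to")


# Hypothetical addresses from the 192.0.2.0/24 documentation range.
server_a = Server("Allspark Server A", "192.0.2.10", healthy=False)
server_b = Server("Allspark Server B", "192.0.2.11", healthy=True)

target = pick_active(server_a, server_b)
print(f"Point the domain at {target.name} ({target.ip})")
```

Once Server A is repaired and the data has been copied back, the same check would pick A again, which matches the "point the domain back" step.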
That's what this fund will be for: a modest backup unit that can handle the load of an average day's worth of traffic. It won't have, and doesn't need, all the bells and whistles of the main server (which, for those who are interested, is a 6-core CPU, 16GB RAM and a 2-terabyte RAID 5 disk array).
We'll also need a nice gigabit router to let the machines talk to each other on the network since they'll be transferring a lot of data back and forth.
I've been doing research on this and have determined that the cost for the machine itself is going to be around 500 to 700 dollars, due to the specs I'm looking for as well as the physical size (it has to fit in the same space as the current server in the host's server room). The router/switch is going to be between 100 and 150 dollars, and it probably wouldn't hurt to add a spare hard drive or two to the current RAID array.
Therefore, I've set the funding goal to $950 to give us a buffer. Should we exceed that, we can put some of that aside for the monthly hosting fees or do something more interesting like add additional features to the site and forums.
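To show where the buffer comes from, here is the arithmetic as a small Python sketch. The price ranges and the goal are the ones quoted above; the function name `buffer_range` is just for illustration, and the spare hard drives are not priced in because I haven't given a figure for them.

```python
def buffer_range(goal, *cost_ranges):
    """Return (smallest, largest) leftover after paying for every item.

    Each cost range is a (low, high) pair in dollars.
    """
    total_low = sum(low for low, _ in cost_ranges)
    total_high = sum(high for _, high in cost_ranges)
    return goal - total_high, goal - total_low


machine = (500, 700)  # backup server estimate
router = (100, 150)   # gigabit router/switch estimate
goal = 950            # funding goal

smallest, largest = buffer_range(goal, machine, router)
print(f"Buffer: ${smallest} to ${largest}")  # Buffer: $100 to $350
```

Any spare drives for the RAID array would have to come out of that leftover amount.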
This will be a pretty simple campaign with the offer of a supporter rank on the forums and the benefit of knowing that your contributions will allow the community to continue to thrive.
Thanks for reading!