Old 09-24-2008, 10:36 PM
cavedude
The PEQ Dude
 
Bad news

Alright, so here's where we are at, and it isn't good.

When the machine came back up today, we noticed our /home directory had reverted to the state it was in when this machine was first built. /home is where our website is, and where we run the game server from, along with all of our personal work directories. PEQ is set up using a mirrored array, for those familiar with drive redundancy/backup solutions. At first glance, it appeared a drive in the array had failed, and the mirror failed to rebuild the data properly. Further investigation shows we have a controller that isn't responding to the kernel at all. So, without physically inspecting the machine, we can conclude that either:

1. We have a bad/loose drive cable.
2. We have a drive that is so dead it is preventing the controller from working properly.
3. We have a failed controller and as such the motherboard will need to be replaced.

I'm not sure which is worse. On one hand, we would need to replace a drive, and have a good chance of losing very important data. On the other, we would need to replace the motherboard, though in that case our data may be safe, unless the controller took out the drive. Either way, replacing hardware has to wait for now.

Fathernitwit is the only team member that has physical access to the box. Unfortunately, he is going away for a month very soon. He is going to try to get to the datacenter this weekend and see about assessing the situation in person. We then have a few options for getting the game server back up in the meantime:

1. A minor problem occurred (a cable came loose), FNW fixes it, and we are back up to 100%. (Wishful thinking, probably will not happen)
2. We have a dead drive, and FNW is able to get data off. (Not likely considering FNW's very limited time)
3. FNW disconnects the drives on the problematic controller and I manually rebuild the game server and get us back up. (We can't risk rebuilding the game server now, only to have the "bad" controller come back up for some reason and destroy the old good data when the mirror updates)
4. FNW isn't able to get to the DC, or it turns out to be an issue we didn't expect. In that case, we have been offered the use of a server for the time being. If the offer still stands, we may have to go that route.

My goal is to get the game server up this weekend, when FNW gets to the DC. Once he returns from his trip, he and I will discuss permanent solutions, assuming it is a hardware issue.

Now, I'm sure you're all wondering what data we might lose. Well, fear not: the drive our databases are stored on is healthy and current. So, your characters, forum data (posts, etc), quest status data, player points data, etc are all safe and sound. Both FNW and I have backed all of that data up to our personal machines as well. What we might lose is the website code: the forum code, all of our tools, editors, scripts, etc. It can be reconstructed, but it will be painful :( We also might have lost the game server directory, but restoring that is simply a matter of uploading my server directory from my test machine and changing a few settings. Misc internal scripts, PEQBot, and other such things that made my life easier may also be on the chopping block.

It could be much, much worse of course. But, it still sucks. I'll know more this weekend, so hold tight.