spark@onestopknifeshop.com
Wise Guy
Talk about a frigging comedy of errors...
The last few days have definitely been a stress-fest, to say the least.
Here's the lowdown on what happened, and what's going on...
TFL and BladeForums.com currently reside on a server co-located with a Jacksonville, Fl (my area) ISP. This ISP was bought out a few months ago, and they've been building a new Class 5 facility locally for their operations (Class 5 means it can take some seriously bad ju-ju and still operate).
This means that when the building was completed, all the servers were scheduled to be moved over to the new facility. Unfortunately, prior to this move, the ISP failed to provide us with all of the information that we needed in order to make a decision on whether to stay or go. We said that pending more information, we'd stay, but we've been hounding them for it ever since. Naturally, they managed to overlook us completely, despite our calling twice a week for the last 3 months.
So, Monday, bright and early, we come into work and find that we have no access to the server. Why? Because our ISP chose that morning to move everything to the new facility. Joy.
And they didn't contact us. Great.
And when we called them to find out why, they gave us attitude, saying it was our fault for not filling out the paperwork that we should have gotten, and for not contacting them (?!?!?). Mistake #1: never assume the bases will be covered, even when you do everything you can to make sure they are. AKA, never underestimate the incompetence of others.
After some nice screaming and threats of bodily harm, the person in question was informed by his subordinates that he was in error, and plans were made to get us back on the network.
Cut to scene two: my boss and I go to visit our server, sitting all by itself in this now mostly vacant building. Boy, is it dark and quiet without all the other servers! Well, since the server is off the network anyway, this sure seems like a good time to upgrade the server software and kernel! (Mistake #2: no upgrade ever goes smoothly, especially on mission-critical equipment.)
Naturally, the upgrade crashes and burns, leaving us with a server that doesn't serve webpages. Around this time, though, we're hooked back onto the network... thank heavens for that, one less problem to track down...
This sounds like a call for Tech Support! Unfortunately, unlike Microsloth, RedHat's tech support is $225 per incident, and it took one of those just to get the server to the point where it would connect to the network. It turns out that the person who set up our server did so in a "non-standard configuration" (Mistake #3: never assume that the previous guy did things the right way), which wound up screwing us when we upgraded. It took the best part of 3 hours to get it to the point where it would look out at the network, though it still wouldn't serve pages (those were the "connection refused" errors you guys were seeing).
At that point, the tech said, "Sorry, follow the installation instructions, and this is where I say 'Time': you are at the end of your call." (?!?!?)
Fine, time to call it a night, regroup, and come back refreshed and ready for battle. Since we're now at the point where I can remote into the server without a problem, we're a lot better off anyhow because I can work from the office.
I come in yesterday bright and early, and call RH tech support again for another troubleshooting call. This was at 0900. It doesn't end until 1630, at which time BladeForums.com and TFL are back online. Then we have the other problem with the forums not quite working correctly after the server upgrade, which lasts until around midnight, when I finally get that tracked down. I finally got the server and all of the hosted sites up 100% today at 1700.
But, in the end, the server software is upgraded, things are all patched up, and everything seems to be working just fine right now, though there are still some background issues I'm looking at.
Just a heads up, we'll be physically moving the server some time tomorrow (it's still by itself in the old building, remember?), so expect an outage for about an hour or so while we move it.
Rich, you can turn the search back on. When you are trying to update threads, do it at 30 per; that should stop the crashing problem you were having.
Thanks for your patience, guys, I appreciate it. I'm not a Linux guru, but I busted my ass to get us back, and I'm hoping that we won't have to go through that any time soon.
Spark
------------------
Kevin Jon Schlossberg
SysOp and Administrator for BladeForums.com
www.bladeforums.com