People have been wondering why my server has been going up and down since Christmas morning. It hasn't been hacked or anything along those lines. I do want to thank everyone who was worried about that and offered temporary space or hosting until I got things resolved. This is not really how I wanted to spend my Christmas, but I figured it was the best time for maintenance. My server will probably go up and down a few more times this week until I finally finish everything (hopefully by the end of the week).
Updated 1/17/2007: Holy S*%$#, I just got a call that they are still working on it!
Updated 1/10/2007: This is taking them forever; it's still in progress, and my server is now a bit of a mess from it :/. All I need is the server installed with an OS. Can it really take them this long????
I have been running Red Hat 9 for quite some time now and decided it was finally time to upgrade to a more recent Fedora release. The biggest reason was my attempt to build PHP 5.2 on it: the XSL extension now requires libxslt 1.1.x (the 1.1.x versions have been out for a couple of years), yet my installed libxslt was from the 1.0.x line. I had, on the other hand, kept the libxml2 libraries up to date. I figured it was time to bring both up to the latest versions, so I set about building some RPMs to keep things in sync. It turned out the newer libxml2 needs a newer version of Python than what I had installed on my server. I really didn't want to build libxml2 and libxslt without Python support, and updating Python along with all of its RPM dependencies was going to be a real nightmare, so I figured an OS upgrade would bring everything up to date more easily (and this is where things got fun).
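The version requirement that started all of this is easy to get wrong if versions are compared as plain strings. Below is a minimal Python sketch of the kind of numeric "at least 1.1.0" check a build performs; it is not PHP's actual configure test, and the function names and sample version numbers are illustrative assumptions.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '1.0.33' into integers: (1, 0, 33)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, compared field by field."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Example 1.0.x and 1.1.x versions; not necessarily what was on the server.
    print(meets_minimum("1.0.33", "1.1.0"))  # False: a 1.0.x release is too old
    print(meets_minimum("1.1.17", "1.1.0"))  # True
    # Plain string comparison gets multi-digit fields wrong:
    print("1.10.0" >= "1.9.0")  # False, even though 1.10.0 is newer
```

Comparing tuples of integers is what makes 1.10.0 sort after 1.9.0; a string comparison stops at the first differing character.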
This is a remote dedicated server, so all I had was my handy terminal window and the yum command-line tool. I found some great reference material on upgrading from older RH releases to recent Fedora releases. One worth mentioning can be found here (be sure to read everything, including the notes about a specific upgrade, before attempting it). The process was pretty straightforward and worked, other than one issue that took me a few days to track down.
Trying to speed things up a bit, I upgraded from RH 9 directly to Fedora Core 2. The instructions for the Core 3 upgrade seemed a bit longer, so I planned to do that later as a small incremental step from FC2 to FC3. The upgrade went smoothly and quickly, and a successful reboot showed me the nice Fedora Core 2 indicator. The one thing I hadn't done yet was boot into the new kernel: the server was still running the 2.4.25 kernel, not the 2.6.10 one from the FC2 upgrade. I have done a number of upgrades from RH to FC, including a test run on a different machine going from RH9 to FC2 using yum, and encountered no problems. I never imagined I would run into a problem, so I blindly set lilo to boot the 2.6.10 kernel by default and rebooted. BIG MISTAKE: I will never blindly set a default kernel again without at least a test boot into it.
After a few minutes, I knew something was wrong: I could no longer reach the server. This was Christmas morning at about 10:30am EST, and I had to get in touch with my hosting provider. I knew they were going to be thrilled about this one. It turns out my provider, AIT, was understanding about the issue, though it did take a while to get the server back online, mostly because of the nature of the problem.
To AIT and their sysadmins, the server looked like it was up and running, but from outside the network it seemed to be dead. Once they manually rebooted it and selected my old working kernel, the server was back online and I started to investigate a bit. From the boot logs, everything looked fine: the 2.6.10 kernel had come up, all devices started fine, and there were no errors. At this point I was just happy that it was back up and running, so I started backing everything up again. The next day I looked at the firewall logs and noticed that they were empty for the whole time the 2.6.10 kernel was running. This was very odd, because the boot log showed the NIC starting successfully, with no signs of crashing.
It turned out my server has two identical NICs installed, but only one of them is actually configured and used. I finally came across similar stories from people who had hit the exact same issue after upgrading from a 2.4 kernel to a 2.6 kernel. The order in which the NICs come up is not guaranteed to stay the same, and in my case it changed, so the two cards got their names (eth0/eth1) swapped and each picked up the other's configuration. My configuration for the outside NIC ended up applied to the unused NIC.
I found two solutions to my problem. The first was to use ifrename or nameif directly. These tools explicitly name NICs based on their MAC addresses. Because the names need to be known while the modules are loading, using these tools requires some changes to my network script. The second method I found was to use udev rules that call out to nameif. This is much more straightforward and just requires creating a rule file that looks like the following:
nameif eth1 00:1f:33:3b:45:1a
nameif eth0 00:1f:33:3b:45:1b
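To build mappings like the two lines above, you first need to know which MAC address the kernel has attached to each name. The following is a minimal Python sketch, assuming a 2.6 kernel with sysfs mounted at /sys, where each interface exposes its MAC in /sys/class/net/<name>/address. The helpers read_macs and plan_renames, and the tmpN temporary names, are hypothetical. plan_renames only computes a plan and renames nothing; it illustrates why a straight eth0/eth1 swap has to go through an intermediate name, since an interface cannot be renamed to a name another interface still holds.

```python
from __future__ import annotations

import os


def read_macs(sysfs_net: str = "/sys/class/net") -> dict[str, str]:
    """Map each current interface name to its MAC address (lowercased).

    Assumes a 2.6-style sysfs layout: <sysfs_net>/<name>/address.
    The loopback interface is skipped.
    """
    macs: dict[str, str] = {}
    for name in sorted(os.listdir(sysfs_net)):
        path = os.path.join(sysfs_net, name, "address")
        if name == "lo" or not os.path.isfile(path):
            continue
        with open(path) as f:
            macs[name] = f.read().strip().lower()
    return macs


def plan_renames(current: dict[str, str], wanted: dict[str, str]) -> list[tuple[str, str]]:
    """Return (old_name, new_name) steps so every listed MAC gets its wanted name.

    current: kernel-assigned name -> MAC address
    wanted:  MAC address -> desired name (same pairs as the nameif lines)

    Every interface that needs a new name is first parked on a hypothetical
    temporary name (tmp0, tmp1, ...), then moved to its final name, so no step
    targets a name that another interface still holds. Assumes the final names
    are not held by interfaces outside the plan.
    """
    wanted = {mac.lower(): name for mac, name in wanted.items()}
    moves: dict[str, str] = {}
    for name, mac in current.items():
        target = wanted.get(mac.lower())
        if target is not None and target != name:
            moves[name] = target

    steps: list[tuple[str, str]] = []
    parked: dict[str, str] = {}
    for i, old in enumerate(sorted(moves)):
        parked[old] = f"tmp{i}"
        steps.append((old, parked[old]))
    for old in sorted(moves):
        steps.append((parked[old], moves[old]))
    return steps


if __name__ == "__main__":
    # Mapping from the nameif lines: ...:1a should be eth1, ...:1b should be eth0.
    wanted = {"00:1f:33:3b:45:1a": "eth1", "00:1f:33:3b:45:1b": "eth0"}
    if os.path.isdir("/sys/class/net"):
        for old, new in plan_renames(read_macs(), wanted):
            print(f"{old} -> {new}")
```

If the kernel has swapped the two cards (eth0 holding the ...:1a card and eth1 holding ...:1b), plan_renames returns [('eth0', 'tmp0'), ('eth1', 'tmp1'), ('tmp0', 'eth1'), ('tmp1', 'eth0')]; if the names already match the mapping, it returns an empty list.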
This method was also more in line with my plans, because Fedora uses udev and I wanted to upgrade to at least FC4, so it was the more logical choice. It also let me keep the stock network script rather than modifying it. I became a bit more cautious with my upgrade plan this time.
Because of the radical upgrade I was making, I decided to leave it up to the staff at AIT to get me a working FC4 server and go from there. This is the first time I have ever had to pay for support (the server is a self-managed system), due to the extent of work this problem caused and what I wanted done, but their pricing is very reasonable and it's worth every dime. This is going to be a clean FC4 install, which means I need to reinstall everything, but at least there will be no extraneous packages left on the server and the hardware will be properly set up by the installer. I expect everything to be finished and back to normal by the end of the week. This will definitely be my last entry until the upgrades are complete, so that I don't have to keep backing things up until then.