More detail about the forum downtime

aj
Consistently Inconsistent
Posts: 1725
Joined: Wed Jul 30, 2008 10:13 am

More detail about the forum downtime

#1 Post by aj »

The only thing we know about what caused the random shutdown is this extract from messages.log:
Jul 26 11:33:15 s15353558 shutdown[16194]: shutting down for system halt
Jul 26 11:33:15 s15353558 init: Switching to runlevel: 0
Jul 26 11:33:15 s15353558 saslauthd[26087]: server_exit : master exited: 26087
Jul 26 11:33:23 s15353558 xinetd[25725]: Exiting...
Jul 26 11:33:24 s15353558 kernel: Kernel logging (proc) stopped.
Jul 26 11:33:24 s15353558 kernel: Kernel log daemon terminating.
Jul 26 11:33:25 s15353558 rsyslogd: [origin software="rsyslogd" swVersion="2.0.0" x-pid="25678"] exiting on signal 15.
Jul 26 14:41:09 s15353558 rsyslogd: [origin software="rsyslogd" swVersion="2.0.0" x-pid="22478"][x-configInfo udpReception="No" udpPort="514" tcpReception="No" tcpPort="0"] restart
So, essentially, we don't know much. At this point, we're guessing that the server was shut down for a kernel update, since the kernel version has changed.
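
For anyone wondering how long logging was actually down, the gap between the last rsyslogd line before the halt and the first one after the restart can be worked out from the extract. This is only a rough sketch: the two log lines are abbreviated here, and it measures the gap in the log, which is not necessarily the exact forum downtime.

```python
from datetime import datetime, timedelta

# Syslog timestamps carry no year, so strptime falls back to 1900.
# That is fine for a same-day difference, but would be wrong across a year boundary.
SYSLOG_FORMAT = "%b %d %H:%M:%S"


def parse_syslog_time(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp (first 15 characters) of a syslog line."""
    return datetime.strptime(line[:15], SYSLOG_FORMAT)


def logging_gap(last_before: str, first_after: str) -> timedelta:
    """Time between the last line before the halt and the first line after the restart."""
    return parse_syslog_time(first_after) - parse_syslog_time(last_before)


# Abbreviated copies of the two rsyslogd lines from the extract above.
last_before = "Jul 26 11:33:25 s15353558 rsyslogd: exiting on signal 15."
first_after = "Jul 26 14:41:09 s15353558 rsyslogd: restart"

print(logging_gap(last_before, first_after))  # 3:07:44
```

So the log is silent for a little over three hours.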

We don't know why the database wasn't shut down cleanly. It could be a timeout on the host's side that kills seemingly non-responsive processes after x seconds.

What we do know is this:
1. The MySQL file that stored the topic list is simply gone. The most likely explanation is that ReiserFS did not get to write everything to disk, and then either didn't replay the journal properly or lost the file reference. If we could have mounted the VPS disk and run a scan, chances are we would have found the original file. However, without access to a KVM or any other partition, this route is impossible.

2. The automatic backup worked once and then stopped working. The exact cause of the failure is unknown, but I'm hoping the daily backup feature built into phpBB will work. If not, we'll use mysqldump in a cron script.

3. Not all the data in the topic list could be recovered. We can guess which forum each topic was in, what its title was, who posted it, and when it was posted, because most of that can be derived from the first post in each topic. Behind the scenes, there's a lot of extra data being stored, and thankfully, it saved us in this case.
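
The mysqldump fallback mentioned in point 2 could look roughly like the sketch below. Everything specific in it is an assumption for illustration: the database name, the backup directory, and the idea that credentials live in ~/.my.cnf are not from our actual setup. The only mysqldump option used is --result-file, which writes the dump to the named file.

```python
import datetime
import subprocess
from pathlib import Path

# Hypothetical values; the real database name and backup location are not stated here.
DATABASE = "phpbb"
BACKUP_DIR = Path("/var/backups/forum")


def backup_path(database: str, when: datetime.date, backup_dir: Path = BACKUP_DIR) -> Path:
    """Return a dated dump path of the form <backup_dir>/<database>-YYYY-MM-DD.sql."""
    return backup_dir / f"{database}-{when.isoformat()}.sql"


def build_dump_command(database: str, target: Path) -> list[str]:
    """Build the mysqldump call; credentials are assumed to come from ~/.my.cnf."""
    return ["mysqldump", f"--result-file={target}", database]


def run_backup(database: str = DATABASE) -> Path:
    """Dump the database to today's file and return its path."""
    target = backup_path(database, datetime.date.today())
    target.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if mysqldump exits non-zero,
    # so a failed backup does not pass silently.
    subprocess.run(build_dump_command(database, target), check=True)
    return target
```

A daily cron entry would then just call run_backup() from a small script. Raising on a non-zero exit matters here, since the whole problem with the earlier backup was that it stopped working without anyone noticing.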

Any questions, or is that satisfactory? |D
avwolf wrote:"No dating dog-girls, young man, your father is terribly allergic!"
y̸̶o͏͏ų̕ sh̡o̸̵u̶̕l̴d̵̡n̵͠'̵́͠t͜͢ ̀͜͝h̶̡àv̸e͡ ̛d̷̨͡o͏̀ne ̶͠͡t҉́h̕a̧͞t̨҉́.̵̧͞.͠͞.͟

RobbieThe1st
Templar GrandMaster
Posts: 706
Joined: Fri Dec 08, 2006 7:06 am
Location: Behind my computer.

Re: More detail about the forum downtime

#2 Post by RobbieThe1st »

Go contact your host as soon as possible. On another website I host, we use eukhost.com. We ran into a similar problem (though I actually caused it; I deleted the wrong files). They kept weekly backups automatically, and at our request simply dropped the copy into a temporary folder on our VPS. You might ask your host to do the same, then recover any files you can.

No Clemency
Templar GrandMaster
Posts: 610
Joined: Tue Mar 10, 2009 5:01 pm
Location: Fort Worth, Texas

Re: More detail about the forum downtime

#3 Post by No Clemency »

I was curious what was going on with the forum today since it wouldn't let me on. I figured something must have gone wrong, but it looks like for the most part we are back up and running again, so that is good to see. :)

RobbieThe1st
Templar GrandMaster
Posts: 706
Joined: Fri Dec 08, 2006 7:06 am
Location: Behind my computer.

Re: More detail about the forum downtime

#4 Post by RobbieThe1st »

No Clemency wrote:I was curious what was going on with the forum today since it wouldn't let me on. I figured something must have gone wrong, but it looks like for the most part we are back up and running again, so that is good to see. :)
Different problem this time (check the dates): the server ran out of space, and well... Tom didn't have SSH access, and the web-interface file manager won't run with the disk full.
I assume aj either fixed the problem over SSH or called on Tech Support, though this happened after I left for the night, so I don't know.
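
One way to avoid filling the disk like this is to check free space before writing anything large. A minimal sketch, assuming a threshold I made up (the thread doesn't say how big the disk or the dumps are):

```python
import shutil

# Hypothetical threshold: refuse to write a backup with less than 500 MiB free.
MIN_FREE_BYTES = 500 * 1024 * 1024


def has_room_for_backup(path: str, min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

A backup script could call has_room_for_backup("/") first and skip the dump (and complain loudly) when it returns False.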

aj
Consistently Inconsistent
Posts: 1725
Joined: Wed Jul 30, 2008 10:13 am

Re: More detail about the forum downtime

#5 Post by aj »

Irony at its finest: The solution to the problem of losing data was to automatically back the server up. But making the backups caused the server to run out of space.

Anyway, Tom managed to find someone at 1and1 who was helpful for a change. And so, all solved.

And backups are now only kept for 5 days, instead of indefinitely.
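
The 5-day rule boils down to deleting any dump older than a cutoff. A rough sketch of that idea, not the actual script on the server; the *.sql naming and the use of file modification time are assumptions:

```python
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 5  # backups are kept for 5 days


def prune_old_backups(backup_dir: Path, keep_days: int = RETENTION_DAYS,
                      now: Optional[float] = None) -> List[Path]:
    """Delete *.sql files last modified more than keep_days ago; return the deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(backup_dir.glob("*.sql")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running this right after each successful backup keeps the number of dumps bounded, so the backups can't eat the disk again.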

No Clemency
Templar GrandMaster
Posts: 610
Joined: Tue Mar 10, 2009 5:01 pm
Location: Fort Worth, Texas

Re: More detail about the forum downtime

#6 Post by No Clemency »

RobbieThe1st wrote:
No Clemency wrote:I was curious what was going on with the forum today since it wouldn't let me on. I figured something must have gone wrong, but it looks like for the most part we are back up and running again, so that is good to see. :)
Different problem this time (check the dates): the server ran out of space, and well... Tom didn't have SSH access, and the web-interface file manager won't run with the disk full.
I assume aj either fixed the problem with SSH, or called on Tech Support, though this happened after I left for the night so I don't know.
Oh, I see, that's ironic much. Oh well, hopefully this time everything gets straightened out and we don't hit another problem that forces the site down again.