Issue With cPanel / WHM or Server Crash

BrianP

Registered
Jan 10, 2007
1
0
151
Hello,

This may be a bit of a stupid question, but it's been happening for a while and I'm not sure of the issue. I'm more used to running Windows environments, so I'm still a little new at the Linux stuff (although I do have 3 Linux servers, so I'm trying to learn).

The server was running great when I got it about a month and a half ago. Suddenly it started crashing out of nowhere. It didn't really faze me, though: I had only just started uploading my website to it and doing a little development (MySQL database only so far), with 2 domains on the server. All I needed to do was request a reboot. It crashed about once a week, and I still thought it was something I had done to it (not config-wise, but crash-wise)...

About a week or so ago the crashes became more frequent, about once per day. This is when I started speaking with my Data Centre and asking about possible hardware issues.

Today it started crashing about every few minutes. I'll request a reboot, the DC will respond (within 3 - 5 min or so), and about 8 minutes later it will be down again.

I'm curious what could be causing this... and more importantly (especially for future reference) where are my log files :p (newbie question?).

Any ideas would be great.

Thanks,
--Brian


EDIT: My Data Centre just replaced the RAM, we'll see how this goes. Does this seem like a reasonable resolution to the issue, and what could have caused it?
 

chirpy

Well-Known Member
Verified Vendor
Jun 15, 2002
13,453
31
473
Go on, have a guess
Physical RAM problems aren't uncommon and are probably the reason for most regular server crashes, so the actions of your datacenter are good.

If it is a hardware issue, then the logs won't be of much use, but the main one to consider after a crash is /var/log/messages.
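
For anyone who wants to skim that file quickly after a reboot, here is a minimal Python sketch. The list of patterns is only my assumption about what is worth flagging (it is nowhere near exhaustive), and the function names are made up for illustration. Reading /var/log/messages normally requires root.

```python
import re

# Assumed, non-exhaustive patterns that often accompany crashes.
# Adjust this list to whatever your own kernel actually logs.
SUSPECT_PATTERNS = [
    r"I/O error",
    r"Disabling IRQ",
    r"Oops",
    r"kernel panic",
    r"Out of memory",
    r"Machine check",
]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPECT_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a syslog-style file and return the suspect lines."""
    # errors="replace" keeps the scan going if a crash left junk bytes behind.
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)


# Example use (run as root on the affected server):
#     for number, text in scan_log():
#         print(f"{number}: {text}")
```

If the box died hard, the last lines before the crash may never have reached the disk, so an empty result does not rule out a hardware fault.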

It might also help to know which OS and kernel version you are running.
 

Krownet

Active Member
Sep 16, 2005
38
0
156
It crashed again :(

I'm running Fedora Core 5.

Huh so I did have another account (was using a different comp for my first post).

Anyways, same person, my FC5 server is going all messy still
 

Krownet

Active Member
Sep 16, 2005
38
0
156
It looks like corrupted files on the HD: either a bad install, a bad config (although I don't know of what), or a bad HD. A nuke & pave seems to have cleared up the issue... although if it was an HD issue, I may see the same problem again, at which point a new HD will be tried.

Cheers,
 

Interdit

Well-Known Member
May 27, 2003
70
0
156
They already checked the RAM and the HDD: no errors.

What is the next step?
Would a kernel upgrade fix this?

The server crashed as I was transferring a large cPanel account.

The SSH session showed me the messages below, then stopped responding; typing reboot gave me an unknown command error...

Message from [email protected] at Fri Feb 9 01:22:52 2007 ...
alpha925 kernel: Disabling IRQ #5

Message from [email protected] at Fri Feb 9 01:22:52 2007 ...
alpha925 kernel: journal commit I/O error
We have been up and down for two days; the only temporary fix is a manual reboot with a filesystem check.
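
One thing that can be checked while the box is still answering: a journal error like the one above often leaves the filesystem remounted read-only, which would also explain commands suddenly failing. That is an assumption on my part, not something confirmed on this server. A rough Python sketch that lists read-only mounts from /proc/mounts (the function names are hypothetical):

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point, fstype) for entries mounted read-only.

    Expects text in the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Match the exact "ro" flag, not options such as errors=remount-ro.
        if "ro" in options.split(","):
            found.append((device, mount_point, fstype))
    return found


def check_system(path="/proc/mounts"):
    """Read the live mount table and return the read-only entries."""
    with open(path) as handle:
        return read_only_mounts(handle.read())


# Example use on the server itself:
#     for device, mount_point, fstype in check_system():
#         print(f"{mount_point} ({device}, {fstype}) is read-only")
```

Some mounts are read-only on purpose, so only a data partition such as / or /home showing up here would be a warning sign.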

Any genius out there?

Thanks,
Francois
 

xufeng

Member
May 13, 2004
14
0
151
Having same issue right now..

Hi,

One of my new cPanel servers has been getting the same error recently; it occurs twice or more every day. After a fresh reboot it returns to normal.

I have searched all the log files for any technical issues, but still cannot find the actual cause. As for hardware, an fsck scan found no issues. The server never went down in the first week, when it hosted only a few test websites; the problem only started giving me headaches after I migrated all the user accounts from another, older cPanel server.

How did you manage to fix your server? Could you give me some advice?

my Email: [email protected]
 

Krownet

Active Member
Sep 16, 2005
38
0
156
I'm not sure of the actual cause of the issue. I got my Data Centre to wipe the HD and do a reinstall of cPanel and the OS (a nuke and pave). It seemed to clear the issue up, and I don't have that issue anymore. A few months later I moved my clients to a faster server, and it has been doing great ever since.
 

xufeng

Member
May 13, 2004
14
0
151
I see, so in the worst case I will have to prepare another server to replace the current one.

You mean the reinstallation made everything fine? ... Does that mean this error could be due to a kernel compatibility issue?

My case is a bit strange: the server was OK in the 1st week after I set it up with a few test domains. Everything was smooth until I finished migrating all the accounts from another cPanel server. Therefore, I still suspect that either something went wrong in the system configuration during the migration, or there are security issues in some clients' scripts that caused problems after the migration.
 

Krownet

Active Member
Sep 16, 2005
38
0
156
My server was running fine for about a month then started acting up. I did do a few account migrations (some of my domains from an older server to this one). It started acting up and after working with my data centre to resolve the issue, it was mutually decided that a reinstall would do the trick.

I only moved to a different server later because of changing needs. After the reinstall, everything seemed fine. We (the DC and I) thought it was one of three things: (1) the Hard Drive was failing (b) It was a corrupt cPanel install [only now rearing its ugly head] (iii) I screwed something up :p
 

xufeng

Member
May 13, 2004
14
0
151
:) As for the hard disks, both are brand new Seagate SATA II 320GB drives, and they scanned through without any issues. As for the 2GB of RAM, it worked well in the 1st week, so I do not think it is bad.

I only remember that I installed a firewall with a WHM interface, following configserver's guide, just before the account migration. Initially I wondered whether busy firewall activity was accessing the hard disk often enough to cause the journal I/O error, but cPanel support said they "never see any software firewall cause disk IO issues", and that should be true, since I have never seen such a case either.

By the way, a RedhatEnterprise5 RAID driver was installed on my system during the 1st installation and is still there. Should I remove it as a test?
 

CoNfOuNd

Member
Feb 20, 2004
18
0
151
Ireland
Did anyone ever solve this?

This is happening with one of my servers, which is only a month old. I had technicians, the datacenter, and cPanel support look at it. No one could solve it.

As a last resort I ordered another server, got it set up with CentOS 5 too and bought a cPanel license. Here's what happened next:

[[email protected] ~]#
Broadcast message from root (Wed Oct 3 15:37:07 2007):

cPanel Layer 2 Install Commencing

Message from [email protected] at Wed Oct 3 15:43:31 2007 ...
localhost kernel: journal commit I/O error

[[email protected] ~]# uptime
15:59:06 up 4:07, 2 users, load average: 0.03, 0.23, 0.90
[[email protected] ~]# pico /var/log/messages
Bus error
 

xufeng

Member
May 13, 2004
14
0
151
Does your error appear before or after the cPanel installation completes? If it is before, then it could be a hardware issue...
 

CoNfOuNd

Member
Feb 20, 2004
18
0
151
Ireland
It is before or during. Not after as the install never completes.

It's an identical error to the previous server, so I am assuming the problem is not hardware related (what's the chance?) but something to do with the distro, kernel or system files.
 

xufeng

Member
May 13, 2004
14
0
151
My servers' issues all happened after the installation was complete. Usually there should not be issues before or during the cPanel installation, since no active domain is using any server resources. Are you using CentOS 5 or another OS? Journal I/O issues were reported in some early Linux kernel versions. I think your case is quite different from mine; you should try installing the latest kernel or rebuilding it.

Another possible cause is that your RAID card driver was never installed properly, causing read-write failures and regularly putting the OS into protected mode.

If you can confirm that you have already changed the hard disk and RAM and are still getting the same error, then consider the suggestions above when checking your servers.

Anyway, the latest update on both of my servers: the issues were resolved after I went back to APF/BFD. I have collected feedback from clients, and they tell me the server is now even faster than in the early days with CSF. Therefore, I think I must have misconfigured something in the CSF/lfd settings, causing over-use of server resources and downtime. I may still try CSF on other new servers with a different OS in future, to see whether there is a compatibility issue between the CentOS 5 kernel and CSF. I still like its nice integration with WHM.
 

CoNfOuNd

Member
Feb 20, 2004
18
0
151
Ireland
It was CentOS 5 but I've decided to install Fedora. I think it must be related to the distribution, because the chances of two servers suffering exactly the same hardware problems are slim, surely?

The replacement I ordered was identical to the first faulty server, but I still think the chances are very small. Once I install Fedora, I'll just leave the server for 42-72 hours and see what happens!