Hello,
(I have had the box for 2 years now and never had issues before everything you
will read here...)
I have a box at the IP xxx.xxx.xxx.xxx, which on 15th Nov. was an AMD 4200+
with a 300 GB HDD for the OS, and CentOS 4.5 on it.
Because the system was facing a few load problems from wrongly installed
applications, we decided to do a fresh OS reload to the latest OS version and
cPanel.
On 16th Nov. the box was up and running with CentOS 5 and cPanel 11.
The kernel is: 2.6.18-8.1.15.el5
A few hours after the box was done and I had restored all backups to it, with
everything working fine, I realized that I couldn't access cPanel: the page was
blank for everything.
Also, most pages (except simple HTML) were showing a 500 Internal Server Error.
I did a reboot and the problem was fixed.
After a few hours, the problem came up again...
I left an SSH session logged in to see what was wrong.
I found out that every few hours this was happening:
Message from syslogd@server at Sun Nov 16 11:47:21 2007 ...
server kernel: journal commit I/O error
Then we had the techs check the HDD for errors. They ran fsck and said the disk
had to be replaced. (At this point, without wanting to offend anyone, I have to
state that 90% of techs in datacenters, or at least at mine, are just low-paid
students who don't know SIMPLE things. I won't name the DC, but 99% of you can
guess which one it is...) So we told them to replace the HDD and do an OS
reload on the new drive...
So, on 17th Nov. we had a new 400 GB HDD online, with CentOS 5 and cPanel 11
loaded on it...
After a few hours, with everything working and the backups restored, the issue
came up again!
This time, with exactly the same error, the techs told us it might be the RAM,
so they replaced the RAM and we waited...
After 2.5 hours, bang, it happened again, same error.
They said it might be the SATA cable...
We replaced it...
AGAIN, THE ERROR.
We asked them to look at it seriously, and after a lot of pressure, to rule out
a faulty motherboard or controller, they upgraded us to a BRAND NEW colocated
box, which this time was an Intel Core 2 Duo 6300 with a brand new 500 GB HDD
and new RAM.
We restored the backups onto the new box.
The box is online today, 18th Nov...
and the issue comes up again!!!
Message from syslogd@server at Sun Nov 18 11:47:21 2007 ...
server kernel: journal commit I/O error
I asked them what the **** is going on now, and why, after having all the
hardware replaced with new parts and the OS reinstalled clean on new drives and
new devices 3 times, we have the same error again...
And their response: the power cable was loose, we replaced the power cable...
Guys, sorry, this is really DUMB... a loose cable cannot give that error...
and the error... continues!
Also, at this point, I have to let you know that when this error comes up
(Message from syslogd@server at Sun Nov 18 11:47:21 2007 ...
server kernel: journal commit I/O error), the filesystem becomes READ-ONLY and
nothing else gets affected. If we do a cold reboot with the reset button, the
box comes up again and everything works PERFECTLY, until the issue comes back
again.....
I am desperate about this, please let me know what I should do!!!
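
In case it helps anyone reproduce what I am seeing, here is a rough sketch in
Python of the two checks I do by hand: finding which filesystems have gone
read-only, and pulling the "journal commit I/O error" lines out of a log. It is
only an illustration under assumptions: the function names are made up for this
post, the first function expects text in the usual /proc/mounts layout (device,
mount point, fstype, options, ...), and the log file to feed the second one
depends on the syslog setup of the box.

```python
# Kernel message quoted in this post.
JOURNAL_ERROR = "journal commit I/O error"


def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'.

    mounts_text is assumed to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' exactly, not as a substring.
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result


def journal_error_lines(log_text):
    """Return the log lines that contain the journal commit error message."""
    return [line for line in log_text.splitlines() if JOURNAL_ERROR in line]
```

On the box itself, the first function could be fed the contents of /proc/mounts
right after the error shows up, and the second one whatever file the kernel
messages are logged to, to see how far apart the errors are.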










