Community Forums
Results 1 to 15 of 20
  1. #1
    Registered User
    Join Date: Nov 2007
    Posts: 4

    CentOS 5, cPanel 11, kernel buggy ???

    Hello,

    (The box has been running for 2 years now and never had any issues before everything you are about to read...)

    I have a box at IP xxx.xxx.xxx.xxx which, as of 15th Nov., was an AMD 4200+
    with a 300 GB HDD for the OS, running CentOS 4.5...

    Because the system was having some load problems caused by badly installed
    applications, we decided to do a fresh OS reload with the latest OS version and
    cPanel.

    On 16th Nov. the box was up and running with CentOS 5 and cPanel 11.

    The kernel is 2.6.18-8.1.15.el5.

    A few hours later the box was ready, I restored all the backups to it, and everything
    worked fine. Then I realized that I couldn't access cPanel: every page was blank.
    Also, most pages (except plain HTML) were showing Internal Server Error 500.
    I rebooted, and the problem was fixed.
    A few hours later, the problem came back...
    I left an SSH session logged in to see what was wrong,
    and found that every few hours this was happening:

    Message from syslogd@server at Sun Nov 16 11:47:21 2007 ...
    server kernel: journal commit I/O error
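
To confirm how regularly this happens, a minimal Python sketch (an illustration, not something posted in this thread) can pull the timestamps of that message out of /var/log/messages and print the gaps between them. It assumes the kernel also writes the message to the log in the usual syslog form (`Nov 18 04:12:58 host kernel: ...`); syslog lines carry no year, so the year is an assumed parameter, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed syslog line layout, e.g.
#   "Nov 18 04:12:58 venus kernel: journal commit I/O error"
SYSLOG_KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (?P<msg>.*)$"
)


def error_times(lines, needle="journal commit I/O error", year=2007):
    """Timestamps of kernel messages containing `needle`.

    Syslog lines do not record the year, so `year` is supplied by the caller.
    """
    times = []
    for line in lines:
        m = SYSLOG_KERNEL_LINE.match(line)
        if m and needle in m.group("msg"):
            stamp = f"{year} {m.group('stamp')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps(times) -> list[timedelta]:
    """Time elapsed between consecutive occurrences."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Usage on a live box (not run here):
#   with open("/var/log/messages") as log:
#       for gap in gaps(error_times(log)):
#           print(gap)
```

Only lines logged by the kernel are considered, so unrelated daemon messages in the same file are ignored.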

    Then we had the techs check the HDD for errors. They ran fsck and said the disk had to
    be replaced (at this point, without wanting to offend anyone, I have to say
    that 90% of datacenter techs, or at least the ones at mine, are just low-paid
    students who don't know SIMPLE things; I won't name the DC, but you can probably guess which one it is...), so we told them to replace the HDD
    and do an OS reload on the new drive...

    So, on 17th Nov. we had a new 400 GB HDD online, with CentOS 5 and cPanel 11
    loaded on it...

    A few hours later, with everything working and the backups restored, the issue came up again!

    This time, with exactly the same error, the techs told us it might be the RAM, so they
    replaced the RAM and we waited...

    Within 2.5 hours, bang, it happened again: same error.

    They said it might be the SATA cable...

    We replaced it...

    Error AGAIN.

    We asked them to look at it seriously, and after a lot of pressure, to rule out a
    faulty motherboard or controller, they moved us to a BRAND NEW colocated
    box: this time an Intel Core 2 Duo 6300 with a brand-new 500 GB HDD
    and new RAM.

    We restored the backups on the new box.

    The box went online today, 18th Nov...

    and the issue came up again!!!

    Message from syslogd@server at Sun Nov 18 11:47:21 2007 ...
    server kernel: journal commit I/O error



    I asked them what the **** is going on, and why, after having all the hardware
    replaced with new parts and the OS reinstalled clean on new drives and new devices
    3 times, we still get the same error...

    And their response: "The power cable was loose; we replaced the power cable..."

    Guys, sorry, but this is really DUMB... a loose cable cannot cause that error...

    And the error... continues!

    Also, at this point I should let you know that when this error comes up
    (Message from syslogd@server at Sun Nov 18 11:47:21 2007 ...
    server kernel: journal commit I/O error) the filesystem becomes READ-ONLY and
    nothing else is affected. If we do a cold reboot with the reset button, the box comes back up
    and everything works PERFECTLY, until the issue comes back again.....
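
One way to catch the moment the filesystem flips to READ-ONLY, sketched here in Python under the assumption of a Linux box with /proc mounted, is to look for the `ro` option in /proc/mounts. The function name `read_only_mounts` is hypothetical; it only parses text, so it can be tried out away from the server.

```python
def read_only_mounts(mounts_text):
    """(device, mount point, fs type) for every read-only entry in /proc/mounts.

    Each /proc/mounts line looks like: "device mountpoint fstype options 0 0".
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if "ro" in options.split(","):
            result.append((device, mountpoint, fstype))
    return result


# Usage on a live box (not run here):
#   with open("/proc/mounts") as f:
#       print(read_only_mounts(f.read()))
```

Splitting the options on commas matters: a plain substring test for "ro" would also match options such as "errors=remount-ro".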




    I am desperate about this; please let me know what I should do!!!

  2. #2
    Member (rpmws)
    Join Date: Aug 2001
    Location: back woods of NC, USA
    Posts: 1,858

    Tell that jackleg TX-based hosting provider to put in a new drive. If they say it's a new drive, check the SMART data and see if you can get the drive details and, if possible, a serial number. It may be that you got one of my old drives that they said they threw in the dumpster by mistake.

    I'm not saying it's not your kernel or OS, but I have seen DCs just reinstall on the same drive over and over. Make sure these idiots aren't doing that to you.
    Just keeping my "eye" on things....
    R. Paul Mathews
    RPMWS - diehard cPanel Nutcase
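
The SMART/serial-number check can be scripted. `smartctl -i /dev/sda` (from smartmontools) prints the drive's model and serial number, and `smartctl -a` adds the SMART attributes and error log. The Python sketch below assumes you saved the `-i` output before and after each "replacement"; the helper names are hypothetical. If both dumps report the same serial number, it is the same drive.

```python
def parse_smart_info(text):
    """Turn 'Key:   value' lines from `smartctl -i` output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def is_same_drive(before_text, after_text):
    """True if two saved `smartctl -i` dumps report the same serial number."""
    before = parse_smart_info(before_text).get("Serial Number")
    after = parse_smart_info(after_text).get("Serial Number")
    return before is not None and before == after
```

A dump with no "Serial Number" line never counts as a match, so a failed smartctl run cannot be mistaken for "same drive".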

  3. #3
    Member
    Join Date: May 2004
    Posts: 14

    Could be a "hardware compatibility issue"...

    Hi,

    I had the same issues a few months ago on two of my Linux servers, one running cPanel, the other running DirectAdmin.

    Initially I thought it was a software-only issue, since I had replaced hardware on both servers several times, but it still happened occasionally, until one of my servers crashed because of it.

    I did a lot of testing on these servers and found that some software RAID setups do not work well with the CentOS 5 kernel and the FC5/6/7 kernels, or, more generally, with most recent Linux kernels. Ext3 uses a journal to control reads and writes, so in theory it should be better than ext2; however, in many cases ext3 drives the server into I/O errors more often, due to false alarms about hardware performance, especially when software RAID is also running to synchronize data between the hard disks.

    Please check whether you are using any software RAID or ext3 file systems.
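
A quick way to do that check on a running box, sketched in Python with hypothetical helper names: /proc/mdstat lists any md (Linux software RAID) arrays, and the third field of each /proc/mounts line is the filesystem type. BIOS "fake RAID" driven through dmraid may not appear in /proc/mdstat, so this sketch only covers md arrays.

```python
def md_arrays(mdstat_text):
    """Names of md (software RAID) arrays listed in /proc/mdstat."""
    arrays = []
    for line in mdstat_text.splitlines():
        # Array lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        name, sep, _ = line.partition(" : ")
        if sep and name.startswith("md"):
            arrays.append(name.strip())
    return arrays


def ext3_mounts(mounts_text):
    """Mount points of ext3 filesystems listed in /proc/mounts."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ext3":
            points.append(fields[1])
    return points


# Usage on a live box (not run here):
#   print(md_arrays(open("/proc/mdstat").read()))
#   print(ext3_mounts(open("/proc/mounts").read()))
```

An empty list from `md_arrays` means no md arrays are active.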

    Software-side fixes:
    1. Disable syslog to reduce the read/write frequency on the disks, and test for a few days to see whether it improves your system.
    2. Switch off software that reads/writes to disk frequently, such as CSF's scheduled periodic checks and logging modules.
    3. I'm not sure whether the latest kernel patches have already resolved similar issues.
    You may try "yum update kernel" and reboot to test the new kernel on your server.

    Hardware-side fixes:
    1. You can try downgrading ext3 to ext2 (a good choice for a live server with clients).
    2. Switch off software RAID and use a single hard disk.
    (Disabling software RAID in the BIOS is still not enough, since hardware autodetection will still load the software RAID drivers for your two hard disks if you previously configured them as RAID1.)
    The downside: you may need to reinstall the OS after this step, since it may affect the MBR.
    3. Last resort: remove the 2nd drive and reinstall CentOS 5 with cPanel using ext2. One of my servers is currently set up this way, and it has been fine for 1 month so far.

  4. #4
    Registered User
    Join Date: Nov 2007
    Posts: 4

    I don't have software RAID.

    It is two SATA II disks.

    Disk sda is used for the OS and /home.

    Disk sdb is used for /backup.

    That's all...



    -----------------



    All my partitions are EXT3, always...

    The kernel is the latest version...

    So, what do I do now?

    Would downgrading the kernel help? And how can I fetch old kernels from yum?

  5. #5
    Member
    Join Date: May 2004
    Posts: 14

    So you are using 2 SATA II hard disks in the server? During installation, did you configure and format the 2nd drive? If you have not configured the 2nd drive yet, could you try booting from the 1st drive only, removing the 2nd drive for the time being, and monitor whether any kernel panic occurs? If there is no such kernel issue, then your problem is probably not related to software RAID and motherboard compatibility.

    Which server board are you using, or are you using ordinary ASUS/MSI boards? What chipset does the motherboard have?

    First, you can try disabling syslog from "Service Manager" in WHM. If the problem is due to the ext3 read/write issue, this usually makes the server much more stable (but it never resolves the issue completely; when the server is busy, the same problem will still occur periodically).

    Which firewall are you using, or are you not using one at the moment? Did you turn off the default firewall and SELinux before installing cPanel?

  6. #6
    Registered User
    Join Date: Nov 2007
    Posts: 4

    I don't use software RAID or any BIOS RAID. The two ext3 disks are separate: the first drive holds different data from the second, and the two have no relation to each other... When we did the OS reload, it was done on drive sda; the second drive was not mounted, as it holds my backups.

    I manually mounted the second drive after the format completed, and have it as a secondary drive mounted under /backup.

    I don't use any firewall, and syslog is disabled.

    I think I have an ASUS mobo, but I'm not sure. Can I check that from SSH???
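
The board model can be read over SSH with `dmidecode`, run as root: its "Base Board Information" section (DMI type 2) names the motherboard maker and model, and newer dmidecode versions can print only that section with `dmidecode -t baseboard`. A minimal Python parser for that section, with a hypothetical function name and assuming the usual indented "Key: value" layout:

```python
def baseboard_info(dmidecode_text):
    """Key/value fields from the 'Base Board Information' section of dmidecode output."""
    info = {}
    in_section = False
    for line in dmidecode_text.splitlines():
        stripped = line.strip()
        if stripped == "Base Board Information":
            in_section = True
            continue
        if in_section:
            # A blank line or the next "Handle ..." header ends the section.
            if not stripped or stripped.startswith("Handle "):
                break
            key, sep, value = stripped.partition(":")
            if sep:
                info[key.strip()] = value.strip()
    return info
```

Matching on stripped lines keeps the parser independent of how deeply a given dmidecode version indents its sections.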


    Also, would I be able to downgrade to an old kernel that does not have these bugs???
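
On CentOS 5, yum installs a new kernel package alongside the existing ones rather than replacing it, so an older kernel may still be installed and bootable. Each installed kernel normally has its own `title` entry in /boot/grub/grub.conf (GRUB legacy), and `default=` selects the entry to boot, counting from 0. A minimal Python sketch that lists those entries; the function name is hypothetical:

```python
import re

# GRUB legacy accepts both "default=N" and "default N".
DEFAULT_LINE = re.compile(r"^default\s*=?\s*(\d+)$")


def boot_entries(grub_conf_text):
    """Return (default entry index, menu titles) from a GRUB legacy grub.conf."""
    default = 0  # GRUB legacy boots entry 0 when no default is set
    titles = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        m = DEFAULT_LINE.match(stripped)
        if m:
            default = int(m.group(1))
        elif stripped.startswith("title "):
            titles.append(stripped[len("title "):].strip())
    return default, titles


# Usage on a live box (not run here):
#   default, titles = boot_entries(open("/boot/grub/grub.conf").read())
#   for i, title in enumerate(titles):
#       print("*" if i == default else " ", i, title)
```

A non-numeric setting such as `default=saved` is left at 0 by this sketch.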

  7. #7
    Member
    Join Date: May 2004
    Posts: 14

    If you are using the CentOS 5 kernel from the DVD, it should be the initial kernel compiled for CentOS 5. I remember there were some newer kernel releases for CentOS 5 available when you run "yum update kernel".

    If you mean downgrading to earlier kernels, you may need to reinstall the OS from an older version's installation CDs. For each OS version, such as CentOS 5 or CentOS 4, only a few compatible prebuilt kernels can be used. It is usually not advisable to downgrade the kernel without proper version support.

    After disabling syslog, how is the server doing? How often does the I/O issue still occur?

    Your case might be different from mine, since you are not using SATA II RAID. If your server still has frequent I/O issues after you disabled syslog, my suggestion is to check whether your DC support changed the motherboard model for the new box. If the same motherboard model was used, the best thing to try is switching to a different motherboard model as well, before you do any downgrading.

  8. #8
    cPanel Partner NOC
    Join Date: Sep 2003
    Posts: 397

    I think I have the same issue as you!!

    My server goes offline every morning at 4:25 AM; checking /var/log/messages, I always find this error:

    Nov 18 04:12:57 venus snmpd[2978]: refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, password , descr Systems Management SNMP MIB Plug-in Manager
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): htree_dirblock_to_tree: bad entry in directory #5210649: rec_len % 4 != 0 - offset=0, inode=1970562386, rec_len=28274, name_len=45
    Nov 18 04:12:58 venus kernel: Aborting journal on device sda7.
    Nov 18 04:12:58 venus kernel: ext3_abort called.
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
    Nov 18 04:12:58 venus kernel: Remounting filesystem read-only
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7) in start_transaction: Journal has aborted

    sda7 is my /home partition. I have already run fsck on it without finding any errors, and I have also tried upgrading to the latest RHEL kernel, but the problem persists. How can I fix it?
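
To see which partitions are affected and whether the journal was actually aborted, kernel lines like the ones above can be summarized per device. A Python sketch, assuming the message wording shown in this post and using hypothetical names:

```python
import re
from collections import Counter

EXT3_ERROR = re.compile(r"EXT3-fs error \(device ([\w-]+)\)")
JOURNAL_ABORT = re.compile(r"Aborting journal on device ([\w-]+)")


def ext3_trouble(lines):
    """Count 'EXT3-fs error' lines per device and list devices whose journal was aborted."""
    errors = Counter()
    aborted = []
    for line in lines:
        m = EXT3_ERROR.search(line)
        if m:
            errors[m.group(1)] += 1
        m = JOURNAL_ABORT.search(line)
        if m and m.group(1) not in aborted:
            aborted.append(m.group(1))
    return errors, aborted


# Usage on a live box (not run here):
#   with open("/var/log/messages") as log:
#       print(ext3_trouble(log))
```

For the excerpt in this post it would report three EXT3-fs errors on sda7 and an aborted journal on sda7.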

  9. #9
    Member
    Join Date: Apr 2005
    Posts: 318

    Yup, definitely something incompatible.

    IMHO it might be the SATA drivers... Anyway, the guys at the datacenter should know, and should have a hardware and OS combination that they have tested and that works fine.
    http://www.crohoster.com/
    quality hosting services and managed dedicated servers

  10. #10
    Member
    Join Date: May 2004
    Posts: 14

    Quote (originally posted by MMarko):
    > Yup, definitely something incompatible.
    >
    > IMHO it might be the SATA drivers... Anyway, the guys at the datacenter should know, and should have a hardware and OS combination that they have tested and that works fine.

    I believe data centre staff will now get more headaches from hardware compatibility issues, since there are so many choices of CPU/MB/RAM/hard disk.

    It is very difficult to say which combination is the most stable, since evaluation takes a long time. By the time the evaluation is done, new users may insist on the latest hardware with newer technologies, which needs another round of testing.

  11. #11
    Member
    Join Date: Jul 2005
    Posts: 24

    You can try booting with acpi=off.

    Add this to the end of the bootloader command line:

    acpi=off noacpi

    I have had the same issue. I replaced 5 hard disks, the server board, and the RAM within one month.
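
With GRUB legacy (the bootloader on CentOS 5), boot options go at the end of the `kernel` line of each entry in /boot/grub/grub.conf. A Python sketch that appends the parameters quoted above to every `kernel` line, skipping any already present; the function name is hypothetical, and it is worth keeping a backup of grub.conf before writing the result back, since a broken config can leave the box unbootable.

```python
def add_kernel_params(grub_conf_text, params=("acpi=off", "noacpi")):
    """Append boot parameters to every 'kernel' line of a GRUB legacy config.

    Parameters already present on a line are not added twice.
    """
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("kernel "):
            existing = stripped.split()
            missing = [p for p in params if p not in existing]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"
```

Because already-present parameters are skipped, running it twice on the same file gives the same result.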

  12. #12
    Member
    Join Date: May 2004
    Posts: 14

    Quote (originally posted by sampride):
    > You can try booting with acpi=off.
    >
    > Add this to the end of the bootloader command line:
    >
    > acpi=off noacpi
    >
    > I have had the same issue. I replaced 5 hard disks, the server board, and the RAM within one month.

    Do you mean you resolved your previous issues by disabling ACPI?
    Is "Advanced Configuration and Power Interface" related to disk read/write operations? I have never touched that setting before. If you find this resolves the I/O and journal issue, it is well worth doing some testing on those problematic motherboards to see whether that caused the journal failures.

  13. #13
    Member
    Join Date: Jul 2005
    Posts: 24

    Yes. I took down the server and moved the backup to a desktop board; so far, no problems. Then I took the problematic server for testing. After a long test run with acpi=off, the error never came back.

  14. #14
    Member
    Join Date: Oct 2006
    Posts: 5

    Uuuuggghhh, I replaced just about every piece of hardware in the server trying to find a fix for this issue! I'm glad someone found it!

  15. #15
    Member
    Join Date: Sep 2007
    Posts: 6

    *sigh*

    Well, after 2 HDD failures, we have put in this change as well.... Will let you all know how it goes :/

    Can I just confirm that everyone has been using CentOS 5?
