The Community Forums

Interact with an entire community of cPanel & WHM users!

CentOS 5, cPanel 11, kernel buggy ???

Discussion in 'General Discussion' started by Cybersoft Plus, Nov 18, 2007.

  1. Cybersoft Plus

    Cybersoft Plus Registered

    Joined:
    Nov 18, 2007
    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    1
    Hello,

    (I've had the box for 2 years now, and it never had issues before everything you will read below...)

    I have a box at the IP xxx.xxx.xxx.xxx, which on 15th Nov. was: AMD 4200+
    with a 300 GB HDD for the OS, and CentOS 4.5 on it...

    Because the system was facing a few load problems from wrongly installed
    applications, we decided to do a fresh OS reload to the latest OS version and
    cPanel.

    On 16th Nov, the box was up and running with CentOS 5 and cPanel 11.

    the kernel is: 2.6.18-8.1.15.el5

    A few hrs after the box was done and I had restored all backups onto it and
    everything worked fine, I realized that I couldn't access cPanel; the page was
    blank for everything.
    Also, most pages (except simple HTML) were showing internal error 500.
    I did a reboot. The problem was fixed.
    After a few hrs, the problem came up again...
    I left ssh logged in to see what's wrong.
    I found out that every few hrs, this was happening:

    Message from syslogd@server at Sun Nov 16 11:47:21 2007 ...
    server kernel: journal commit I/O error

    Then we had the techs check the HDD for errors. They did an fsck and said the
    disk had to be replaced (at this point, without wanting to offend anyone, I
    have to state that 90% of techs in datacenters, or at least at mine --won't
    name them, 99% you can guess which DC it is...-- are just low-paid students
    that don't know SIMPLE things...), so we told them to replace the HDD and do
    the OS reload on the new drive...

    So, on 17th Nov. we had a new HDD online, 400 GB with CentOS 5 loaded on it,
    and cPanel 11...

    After a few hrs, with everything working and backups recovered, the issue
    came up again!

    This time, with exactly same error, techs told us it may be RAM, so they
    replaced the RAM and we waited...

    In 2.5 hrs, bang, it happens again, same error.

    They say, it might be the sata cable...

    we replace it...

    AGAIN ERROR.

    We ask them to look at it seriously, and after a lot of pressure, to rule out
    the case that the mobo or controller is faulty, they upgrade us to a BRAND NEW
    colocated box, which this time was: Intel Core 2 Duo 6300, with a brand new
    500 GB HDD and new RAM.

    We recover the backups onto the new box.

    The box is online today, 18th Nov...

    and the issue comes up again!!!

    Message from syslogd@server at Sun Nov 18 11:47:21 2007 ...
    server kernel: journal commit I/O error



    I ask them what the **** is going on now and why, after having all hardware
    replaced with new and the OS reinstalled clean on new drives and new devices
    3 times, we again have the same error...

    And their response: the power cable was loose, we replaced the power cable...

    Guys, sorry, this is really DUMB... a loose cable cannot give that error...


    and the error... continues!




    Also, at this point, I have to let you know that when this error comes up
    (Message from syslogd@server at Sun Nov 18 11:47:21 2007 ...
    server kernel: journal commit I/O error), the filesystem becomes READ-ONLY and
    nothing gets affected. If we do a cold reboot with the reset button, it comes
    up again and everything works PERFECTLY, until the issue comes back again...




    I am desperate with that, let me know what I have to do!!!
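
A minimal sketch (not from the original post) of how the read-only state described above could be detected from a script instead of waiting for pages to fail. It assumes Python 3 and mount data in /proc/mounts format; the function name and the sample text are made up for illustration.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'.

    `mounts_text` is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match "ro" exactly,
        # so that something like "errors=remount-ro" does not count.
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found


# Hypothetical sample, not taken from the poster's server.
SAMPLE = (
    "/dev/sda3 / ext3 ro,data=ordered 0 0\n"
    "/dev/sdb1 /backup ext3 rw,data=ordered 0 0\n"
)
print(read_only_mounts(SAMPLE))
```

On a live Linux box the same function could be fed the contents of /proc/mounts. Some pseudo-filesystems are mounted read-only on purpose, so only ext3 data partitions showing up here would match the symptom in this thread.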
     
  2. rpmws

    rpmws Well-Known Member

    Joined:
    Aug 14, 2001
    Messages:
    1,824
    Likes Received:
    5
    Trophy Points:
    38
    Location:
    back woods of NC, USA
    Tell that jack-leg TX-based hosting provider to put in a new drive. If they say it's a new drive, then check the SMART data and see if you can get drive data on it, and a serial number if possible. It may be you got one of my old drives that they said they threw in the dumpster by mistake.

    Not saying it's not your kernel or OS... but I have seen DCs just reinstall the same drive over and over. Make sure these idiots aren't doing that to you.
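
A rough sketch of the serial-number check suggested above, assuming Python 3 and that the identity text comes from something like `smartctl -i /dev/sda` (smartmontools), saved before and after each "replacement". The function names, model string, and serial numbers are hypothetical.

```python
def drive_identity(smartctl_info_text):
    """Extract model and serial number from `smartctl -i` style output."""
    identity = {}
    for line in smartctl_info_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip()
        if key in ("Device Model", "Serial Number"):
            identity[key] = value.strip()
    return identity


def same_drive(before_text, after_text):
    """True if both reports carry the same, non-empty serial number."""
    before = drive_identity(before_text).get("Serial Number")
    after = drive_identity(after_text).get("Serial Number")
    return before is not None and before == after


# Hypothetical reports, for illustration only.
BEFORE = "Device Model:     EXAMPLE-400GB\nSerial Number:    ABC123\n"
AFTER = "Device Model:     EXAMPLE-400GB\nSerial Number:    ABC123\n"
print(same_drive(BEFORE, AFTER))
```

If `same_drive` reports True after the DC claims a swap, the drive in the box is the one that was there before.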
     
  3. xufeng

    xufeng Member

    Joined:
    May 13, 2004
    Messages:
    14
    Likes Received:
    0
    Trophy Points:
    1
    could be "Hardware compatibility issue"..

    Hi,

    I had the same issue months ago on two of my Linux servers, one running cPanel, the other running DirectAdmin.

    Initially I thought it was a software-only issue, since I had tried replacing hardware on both servers a few times, but the errors still happened occasionally until one of my servers crashed because of this.

    I did lots of testing on these servers and realized that some software RAID setups do not work well with the CentOS 5 kernel and the FC5/6/7 kernels, or I could say they do not work well with most of the latest Linux kernels. Ext3 uses a journal to control reads and writes, so in theory it should be better than Ext2. However, in many cases Ext3 pushes the server into I/O issues more frequently due to false alarms on hardware performance, especially when soft RAID is running at the same time to synchronize data between hard disks.

    Please check if you are using any soft RAID or Ext3 file systems.

    Resolve using software refinement:
    1. Disable syslog to reduce the read/write frequency to the disks. Test for a few days to see if it improves your system.
    2. Switch off software that needs to read/write to disk frequently, such as CSF scheduled periodic checks and logging modules.
    3. I am not sure whether the latest kernel patches have already resolved similar issues or not.
    You may try "yum update kernel" and reboot to test the new kernel on your server.

    Resolve using hardware refinement:
    1. You can try to downgrade ext3 to ext2. (good choice for a live server with clients)
    2. Switch off soft RAID and use a single hard disk.
    (Disabling soft RAID in the BIOS is still not enough, since hardware autodetection will still load the soft RAID drivers for your two hard disks because you configured them as RAID1 previously.)
    Bad point: you may need to reinstall the OS after this step since it may affect the MBR.
    3. Last choice: remove the 2nd drive and reinstall CentOS 5 with cPanel using Ext2. One of my servers is currently set up this way; so far so good for 1 month.
     
  4. Cybersoft Plus

    Cybersoft Plus Registered

    Joined:
    Nov 18, 2007
    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    1
    I don't have soft RAID.

    It is two SATA2 disks.

    Disk sda is used for the OS and /home.

    Disk sdb is used for /backup.

    That's all...

    -----------------

    All my partitions are EXT3, always...

    The kernel is the latest version...

    So, what do I do now?

    Would downgrading the kernel help? And how is it possible to fetch old kernels from yum?
     
  5. xufeng

    xufeng Member

    Joined:
    May 13, 2004
    Messages:
    14
    Likes Received:
    0
    Trophy Points:
    1
    So you are using 2 SATA II hard disks in the server? During installation, did you configure and format the 2nd drive? If you have not yet configured the 2nd drive, could you try booting from the 1st drive only, removing the 2nd drive for the time being? Please monitor whether there is any kernel panic issue. If there is no such kernel issue, then your issue might not be related to soft RAID and motherboard compatibility.

    Which server board are you using? Or are you using the usual ASUS/MSI boards? Which chipset does the motherboard have?

    At first, you can try to disable syslog from "Service Manager" in WHM. If it is due to the Ext3 read/write issue, this will usually make the server much more stable (but will never resolve the issue completely; when the server is busy, the same problem will still occur periodically).

    Which firewall are you using, or do you not use any firewall currently? Did you turn off the default firewall and SELinux before the cPanel installation?
     
  6. Cybersoft Plus

    Cybersoft Plus Registered

    Joined:
    Nov 18, 2007
    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    1
    I don't use soft RAID or any BIOS RAID. The two ext3 disks are separate; the first drive holds different data than the second, and the two have no relation to each other... When we did the OS reload, it was done on drive sda; the second drive was not mounted, as it has my backups.

    I manually mounted the second drive after the format completed and have it as a secondary drive mounted under /backup.

    I don't use any firewall, and syslog is disabled.

    I think I have an ASUS mobo, but I'm not sure. Can I see that from ssh???

    Also, would I be able to downgrade to an old kernel that does not have the bugs???
     
  7. xufeng

    xufeng Member

    Joined:
    May 13, 2004
    Messages:
    14
    Likes Received:
    0
    Trophy Points:
    1
    If you are using the CentOS 5 kernel from the DVD, it should be the initial kernel compiled for CentOS 5. I remember there was a newer kernel release for CentOS 5 that you get when you run "yum update kernel".

    If you are referring to downgrading to earlier kernels, you may need to reinstall the OS using installation CDs of an older version. For each OS version, such as CentOS 5 or CentOS 4, there are only a few compatible prebuilt kernels that can be used. Usually it is not advisable to downgrade the kernel without correct version support.

    After disabling syslog, how is the server? How frequently does the I/O issue still occur?

    Your case might be different from mine since you are not using SATA II RAID. If you find your server still has the same frequent I/O issue after you disabled syslog, my suggestion is to check whether your DC support changed the motherboard model for the new box. If the same motherboard model was used, then the best thing to try is to change your motherboard to another model before you do any downgrading.
     
  8. adapter

    adapter Well-Known Member
    PartnerNOC

    Joined:
    Sep 17, 2003
    Messages:
    391
    Likes Received:
    0
    Trophy Points:
    16
    I think that I have the same issue as you!!

    My server goes offline every morning at 4.25AM. Checking /var/log/messages, I always have this error:

    Nov 18 04:12:57 venus snmpd[2978]: refused smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, password , descr Systems Management SNMP MIB Plug-in Manager
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): htree_dirblock_to_tree: bad entry in directory #5210649: rec_len % 4 != 0 - offset=0, inode=1970562386, rec_len=28274, name_len=45
    Nov 18 04:12:58 venus kernel: Aborting journal on device sda7.
    Nov 18 04:12:58 venus kernel: ext3_abort called.
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): ext3_journal_start_sb: Detected aborted journal
    Nov 18 04:12:58 venus kernel: Remounting filesystem read-only
    Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7) in start_transaction: Journal has aborted

    On sda7 I have the /home partition, and I have already run an fsck without finding errors on it. I have also tried upgrading to the latest RHEL kernel, but the problem persists. How can I fix it?
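
A small sketch, assuming Python 3, of how log lines like the ones quoted above could be tallied per device, to see whether the journal aborts always hit the same partition. The regular expression only covers the two message shapes quoted in this post; the function name is made up.

```python
import re
from collections import Counter

# Matches "EXT3-fs error (device sda7)" and "Aborting journal on device sda7."
DEVICE_RE = re.compile(
    r"(?:EXT3-fs error \(device|Aborting journal on device) (?P<dev>\w+)"
)


def count_ext3_errors(log_lines):
    """Count ext3 error / journal abort messages per block device."""
    counts = Counter()
    for line in log_lines:
        match = DEVICE_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Lines copied from the post above.
LINES = [
    "Nov 18 04:12:58 venus kernel: Aborting journal on device sda7.",
    "Nov 18 04:12:58 venus kernel: ext3_abort called.",
    "Nov 18 04:12:58 venus kernel: EXT3-fs error (device sda7): "
    "ext3_journal_start_sb: Detected aborted journal",
    "Nov 18 04:12:58 venus kernel: Remounting filesystem read-only",
]
print(count_ext3_errors(LINES))
```

Run over a full /var/log/messages, a tally that names a single device every time points at that partition or disk, while errors spread over several devices would fit a controller, board, or kernel problem better.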
     
  9. MMarko

    MMarko Well-Known Member

    Joined:
    Apr 18, 2005
    Messages:
    316
    Likes Received:
    0
    Trophy Points:
    16
    Yup, definitely something incompatible.

    IMHO it might be SATA drivers... Anyway, the guys at the datacenter should know this and should have hardware and OS combinations which they have tested and which work fine.
     
  10. xufeng

    xufeng Member

    Joined:
    May 13, 2004
    Messages:
    14
    Likes Received:
    0
    Trophy Points:
    1
    I believe data centre staff will now get more headaches from various hardware compatibility issues, since there are too many choices for CPU/MB/RAM/hard disk.

    It is very difficult to say which combination is the most stable one, since evaluation takes a long time. By the time an evaluation is done, new users might insist on using the latest hardware with newer technologies, so that needs another round of testing on new hardware. ;)
     
  11. sampride

    sampride Member

    Joined:
    Jul 8, 2005
    Messages:
    24
    Likes Received:
    0
    Trophy Points:
    1
    You can try to boot with acpi=off

    Add this to the end of the bootloader command:

    acpi=off noacpi

    I have the same issue. I've gone through 5 new hard disks, a new server board, and new RAM in one month.
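
After editing the bootloader entry and rebooting, it is worth confirming that the kernel really received the parameter. A minimal sketch, assuming Python 3 and a command line in the format of /proc/cmdline; the helper name and the sample command lines are hypothetical.

```python
def has_boot_param(cmdline_text, param):
    """Return True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline_text.split()


# Hypothetical command lines, for illustration only.
WITH_ACPI_OFF = "ro root=/dev/VolGroup00/LogVol00 acpi=off\n"
WITHOUT = "ro root=/dev/VolGroup00/LogVol00 quiet\n"

print(has_boot_param(WITH_ACPI_OFF, "acpi=off"))
print(has_boot_param(WITHOUT, "acpi=off"))
```

On a running Linux system the text to check would be the contents of /proc/cmdline. Matching whole tokens avoids a false positive when the parameter is only a substring of another option.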
     
  12. xufeng

    xufeng Member

    Joined:
    May 13, 2004
    Messages:
    14
    Likes Received:
    0
    Trophy Points:
    1
    Do you mean you resolved your previous issues by disabling ACPI?
    Is "Advanced Configuration and Power Interface" related to disk read/write operations? I have never touched those settings before. If you find this can resolve the I/O and journal issue, it is well worth running some tests on those problematic motherboards to see if that caused the journal failure.
     
  13. sampride

    sampride Member

    Joined:
    Jul 8, 2005
    Messages:
    24
    Likes Received:
    0
    Trophy Points:
    1
    Yes, I took down the server and moved the backup to a desktop board; so far no problem. And I took the problematic server for testing. After long testing with acpi=off, the error never came back.
     
  14. dotmn

    dotmn Member

    Joined:
    Oct 15, 2006
    Messages:
    5
    Likes Received:
    0
    Trophy Points:
    1
    uuuuggghhh I replaced about every hardware in the server to try to find a fix to this issue! I am glad someone found it!
     
  15. cameronp2

    cameronp2 Member

    Joined:
    Sep 22, 2007
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    *sigh*

    Well after 2 hdd failures, we have put in this change as well .... will let you all know how it goes :/

    Can I just confirm that everyone has been using centos 5?
     
  16. Cybersoft Plus

    Cybersoft Plus Registered

    Joined:
    Nov 18, 2007
    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    1
    No, not even ACPI off worked for me...
    This is a kernel bug for sure...

    I had to replace the motherboard and CPU to be fine...

    I switched to Intel, guys...

    I don't want to turn the post into an AMD vs Intel flame war,
    I have been a fan of AMD since 1997...

    but guys, after a lot of years in hosting, I finally realize that
    the most compatible, fast and cool is Intel for hosting
    and cPanel boxes... Dunno why, it just runs smoother
    on the same load and sites...
     
  17. sampride

    sampride Member

    Joined:
    Jul 8, 2005
    Messages:
    24
    Likes Received:
    0
    Trophy Points:
    1
    Yes, Centos5
     
  18. MaraBlue

    MaraBlue Well-Known Member

    Joined:
    May 3, 2005
    Messages:
    335
    Likes Received:
    2
    Trophy Points:
    18
    Location:
    Carmichael, CA
    cPanel Access Level:
    Root Administrator
    Buhahahaha! Thanks, I needed that laugh :)

    So it's *not* just me this happens to, eh?
     
  19. MaraBlue

    MaraBlue Well-Known Member

    Joined:
    May 3, 2005
    Messages:
    335
    Likes Received:
    2
    Trophy Points:
    18
    Location:
    Carmichael, CA
    cPanel Access Level:
    Root Administrator
    Oh yes. Sad, isn't it? I've had twice as much formal training and real-world experience as 75% of those I'm supposed to be going to "for support." The part that bothers me is when they lie to cover their lack of knowledge, rather than just admitting they don't know, or that what I'm asking is beyond their experience.

    I hope you got the issue sorted out.
     
  20. cameronp2

    cameronp2 Member

    Joined:
    Sep 22, 2007
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    Well, mine has been running for 7 days with no errors whatsoever so far... so I am very relieved at this stage; I was at the point of possibly changing providers.

    A note: I was always using a quad-core Intel, so it doesn't really seem to have anything to do with AMD / Intel chipsets.

    Thanks,
    cameron
     