The Community Forums


Possible Hard Drive Failure Soon S.M.A.R.T Errors on /dev/hdb

Discussion in 'General Discussion' started by claven177, May 30, 2006.

  1. claven177

    claven177 Well-Known Member

    Joined:
    Sep 3, 2003
    Messages:
    61
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Taipei
    My box has two 80 GB hard drives.
    Yesterday I formatted the second drive as /home2 using
    WHM > Disk Drives > Format/Mount a new Hard Drive,
    and today I got this e-mail:

    =====================================================
    [cPanel smartcheck] Possible Hard Drive Failure Soon
    S.M.A.R.T Errors on /dev/hdb
    From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hdb
    ATA Error Count: 29 (device log contains only the most recent five errors)
    Error 29 occurred at disk power-on lifetime: 17741 hours (739 days + 5 hours)
    Error 28 occurred at disk power-on lifetime: 17741 hours (739 days + 5 hours)
    Error 27 occurred at disk power-on lifetime: 17719 hours (738 days + 7 hours)
    Error 26 occurred at disk power-on lifetime: 17670 hours (736 days + 6 hours)
    Error 25 occurred at disk power-on lifetime: 17670 hours (736 days + 6 hours)
    ----END /dev/hdb--
    =====================================================



    Current Disk Usage

    Filesystem   Size  Used  Avail  Use%  Mounted on
    /dev/hda5     71G   34G    34G   51%  /
    /dev/hda1     99M  8.4M    86M    9%  /boot
    /dev/hda3   1012M   34M   927M    4%  /tmp
    none         502M     0   502M    0%  /dev/shm
    /tmp        1012M   34M   927M    4%  /var/tmp
    /dev/hdb1     99M  5.6M    89M    6%  /home2



    # /sbin/fdisk -l

    Disk /dev/hda: 80.0 GB, 80026361856 bytes
    255 heads, 63 sectors/track, 9729 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device     Boot  Start   End    Blocks   Id  System
    /dev/hda1  *         1    13    104391   83  Linux
    /dev/hda2           14   268   2048287+  82  Linux swap
    /dev/hda3          269   399   1052257+  83  Linux
    /dev/hda4          400  9729  74943225    5  Extended
    /dev/hda5          400  9729  74943193+  83  Linux

    Disk /dev/hdb: 80.0 GB, 80026361856 bytes
    255 heads, 63 sectors/track, 9729 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device     Boot  Start   End    Blocks   Id  System
    /dev/hdb1  *         1    13    104391   83  Linux
    /dev/hdb2           14   270   2064352+  82  Linux swap
    /dev/hdb3          271  9729  75979417+  83  Linux


    Why is /dev/hdb1 only 99M?

    So, with all that said, what do you make of these errors?
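On the 99M question: the fdisk listing shows /dev/hdb1 spanning only cylinders 1 to 13 (104391 blocks), the same small layout as /dev/hda1, which is mounted on /boot, while nearly all of /dev/hdb sits in /dev/hdb3 (75979417+ blocks). A minimal sketch for turning fdisk's Blocks column into MiB, assuming that column is in 1 KiB units as in this fdisk version; blocks_to_mib is a hypothetical helper name:

```sh
#!/bin/sh
# blocks_to_mib BLOCKS
# Hypothetical helper: converts a value from fdisk's "Blocks" column
# (assumed to be 1 KiB units) into whole MiB, rounding down.
# A trailing "+" (e.g. 75979417+) marks a partition whose size is not a
# whole number of blocks; it is stripped before the arithmetic.
blocks_to_mib() {
  blocks=${1%+}
  echo $((blocks / 1024))
}

blocks_to_mib 104391      # /dev/hdb1: prints 101
blocks_to_mib 75979417+   # /dev/hdb3: prints 74198
```

That works out to roughly 101 MiB for /dev/hdb1 (df shows a little less, 99M, because the filesystem itself takes some space) versus roughly 72 GiB for /dev/hdb3.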
     
  2. jackie46

    jackie46 BANNED

    Joined:
    Jul 25, 2005
    Messages:
    537
    Likes Received:
    0
    Trophy Points:
    0
    I'm still getting these messages. It's been 6 months now, and I'm still waiting for some kind of resolution. :rolleyes:
     
  3. Rafaelfpviana

    Rafaelfpviana Well-Known Member

    Joined:
    Mar 12, 2004
    Messages:
    142
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Brazil
  4. AndyReed

    AndyReed Well-Known Member
    PartnerNOC

    Joined:
    May 29, 2004
    Messages:
    2,222
    Likes Received:
    3
    Trophy Points:
    38
    Location:
    Minneapolis, MN
    As of January 2006, cPanel has made some changes to its smartctl implementation, which is what it uses to periodically scan your drive(s) for SMART errors. Previous versions were somewhat buggy and often did not correctly report minor SMART inconsistencies.

    Because of these recent cPanel upgrades, the smartcheck script may now report minor ATA errors that the drive logged quite some time ago but that were never reported before.

    You have nothing to worry about from these particular errors. *All* drives will, from time to time, experience ATA errors; you only need to worry if you see the count incrementing quickly over a short period of time.
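One way to act on the advice about errors that keep incrementing is to remember the ATA Error Count from the previous check and only raise an alarm when it goes up. A minimal sketch, assuming the "ATA Error Count: N" line that smartctl printed in the report above; the helper name error_count_grew and the state-file path in the usage comment are hypothetical:

```sh
#!/bin/sh
# error_count_grew STATEFILE
# Hypothetical helper: reads `smartctl -l error DEVICE` output on stdin,
# extracts the number from the "ATA Error Count:" line, compares it with the
# value saved in STATEFILE by the previous run, then saves the new value.
# Returns 0 (true) only if the count went up since the last run.
error_count_grew() {
  state_file=$1
  # "ATA Error Count: 29 (device log ...)" -> 29; no such line -> empty
  current=$(awk -F': *' '/ATA Error Count:/ { split($2, a, " "); print a[1]; exit }')
  current=${current:-0}
  # A missing or empty state file counts as a previous value of 0.
  previous=$(cat "$state_file" 2>/dev/null || true)
  previous=${previous:-0}
  echo "$current" > "$state_file"
  [ "$current" -gt "$previous" ]
}

# Usage (as root, e.g. from cron; path is hypothetical):
#   smartctl -l error /dev/hdb | error_count_grew /root/hdb-ata-errors.count \
#     && echo "ATA error count on /dev/hdb went up"
```

The design choice here is to alert on change rather than on presence: old entries in the drive's error log stay silent, while any new error still gets through.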

    As you can see from the SMART report, these errors were logged at around 17741 hours of the drive's life. Find out the drive's current power-on hours and subtract the logged hours from that figure. If the errors turn out to be recent and their count keeps climbing, your HD might fail soon.
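That subtraction can be scripted. A minimal sketch, assuming smartctl's attribute table (-A) lists Power_On_Hours with the raw hour count in its last (tenth) column, and that the error log (-l error) uses the "occurred at disk power-on lifetime: N hours" wording seen in the report above; some drives encode the raw value differently, so check it by eye first. hours_since_last_error is a hypothetical helper name:

```sh
#!/bin/sh
# hours_since_last_error
# Hypothetical helper: reads `smartctl -A -l error DEVICE` output on stdin
# and prints the drive's current Power_On_Hours minus the power-on hour of
# the most recent logged ATA error. Exits with status 1 if either value is
# missing from the input.
hours_since_last_error() {
  awk '
    # Attribute row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
    # WHEN_FAILED RAW_VALUE, so the raw hour count is assumed to be field 10.
    $2 == "Power_On_Hours" { now = $10 + 0; have_now = 1 }
    # Error line: "... occurred at disk power-on lifetime: N hours ..."
    /occurred at disk power-on lifetime:/ {
      for (i = 1; i < NF; i++)
        if ($i == "lifetime:") { h = $(i + 1) + 0; break }
      if (!have_err || h > last) { last = h; have_err = 1 }
    }
    END {
      if (!have_now || !have_err) exit 1
      print now - last
    }'
}

# Usage (as root):
#   smartctl -A -l error /dev/hdb | hours_since_last_error
```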

    In the event of serious (CRITICAL) SMART errors, such as a possibly failing drive, the smartcheck script will notify you with different information. SMART cannot always accurately predict a failing drive, but if it does predict a drive failure, it would be best to back up all your data and schedule a time with your DC (data center) to test the drive.
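smartctl itself separates these cases in its exit status, which the smartctl(8) man page documents as a bit mask: bit 3 means the SMART health check returned DISK FAILING, while bit 6 only means the device error log holds records, which may be old ones like those above. A minimal sketch that decodes the bits, assuming the bit list of the smartmontools man page (check it against your installed version); describe_smartctl_status is a hypothetical helper:

```sh
#!/bin/sh
# describe_smartctl_status STATUS
# Hypothetical helper: prints one line per bit set in smartctl's exit
# status, following the bit list assumed from the smartctl(8) man page.
describe_smartctl_status() {
  status=$1
  bit=0
  while [ "$bit" -le 7 ]; do
    if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
      case $bit in
        0) echo "bit 0: smartctl could not parse its command line" ;;
        1) echo "bit 1: device could not be opened" ;;
        2) echo "bit 2: a SMART command failed or SMART data had a bad checksum" ;;
        3) echo "bit 3: CRITICAL: SMART health check returned DISK FAILING" ;;
        4) echo "bit 4: a prefail attribute is at or below its threshold" ;;
        5) echo "bit 5: health OK, but an attribute was at or below threshold in the past" ;;
        6) echo "bit 6: the error log contains records (possibly old ones)" ;;
        7) echo "bit 7: the self-test log contains errors" ;;
      esac
    fi
    bit=$((bit + 1))
  done
}

# Usage (as root):
#   smartctl -q errorsonly -H -l error /dev/hdb
#   describe_smartctl_status $?
```

With this, a report like the one in the first post would be expected to show only the bit 6 line, while a drive that is actually failing its health check would also show the bit 3 line.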

    However, the type of error you are seeing is nothing to worry about; again, it is simply a result of cPanel's new smartctl implementation. Hope this helps!
     
  5. jackie46

    jackie46 BANNED

    Joined:
    Jul 25, 2005
    Messages:
    537
    Likes Received:
    0
    Trophy Points:
    0
    Yes, I read all this before. It's nothing to worry about, but how annoying do you think it is to get these messages twice a day for months at a time?

    Until they made that change, we never got these messages. So it's time to undo what they did, so people don't keep getting blasted with these insane messages.

    ...and what happens if my drive really is starting to fail? How will I know the difference between a botched change and a drive that is really failing?
     
  6. mctDarren

    mctDarren Well-Known Member

    Joined:
    Jan 6, 2004
    Messages:
    664
    Likes Received:
    2
    Trophy Points:
    18
    Location:
    New Jersey
    cPanel Access Level:
    Root Administrator
    Does this still work?
    Code:
    touch /var/cpanel/disablesmartcheck

    Effectively, this will turn off SMART checking of the drive. It won't help you gauge when the drive is failing, but it will end the annoying emails...

    Best advice: Do as AndyReed says. Schedule a backup time and dump that drive.
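Dumping the drive before anything else happens can be as simple as archiving its mount point onto a different disk. A minimal sketch, assuming tar with gzip support is installed; the helper name backup_dir and the /backup destination are hypothetical:

```sh
#!/bin/sh
# backup_dir SOURCE_DIR DEST_DIR
# Hypothetical helper: writes SOURCE_DIR into a dated, gzip-compressed tar
# archive under DEST_DIR, then lists the archive back to make sure it can
# be read. Prints the archive path on success, returns 1 on failure.
backup_dir() {
  src=$1
  dest=$2
  archive="$dest/$(basename "$src")-$(date +%Y%m%d).tar.gz"
  # -C keeps paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Reading the archive back catches truncated or unreadable output.
  tar -tzf "$archive" > /dev/null || return 1
  echo "$archive"
}

# Usage (hypothetical paths; DEST_DIR should live on a different disk):
#   backup_dir /home2 /backup
```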
     
  7. jackie46

    jackie46 BANNED

    Joined:
    Jul 25, 2005
    Messages:
    537
    Likes Received:
    0
    Trophy Points:
    0
    Read my message again:

    ...and what happens if my drive really is starting to fail? How will I know the difference between a botched change and a drive that is really failing?
     
  8. mctDarren

    mctDarren Well-Known Member

    Joined:
    Jan 6, 2004
    Messages:
    664
    Likes Received:
    2
    Trophy Points:
    18
    Location:
    New Jersey
    cPanel Access Level:
    Root Administrator
    Ok, there's no need to get testy when someone replies to a question you have. Only trying to help.

    If you simply want the emails to stop, you can turn off smartctl. This will not help you determine whether the drive is failing. Other signs can point to a failure, such as freezing after a reboot, file corruption or loss, or the box slowing down on disk reads/writes. Sadly, the only way to really know is after the failure occurs; there really isn't a way to accurately predict a hard drive failure. Keep good backups and have a failure contingency plan in place to minimize downtime.
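The kernel log is another place to look for those signs. A minimal sketch, assuming the drive appears under its IDE name (hdb) followed by a colon in the kernel messages and that problem reports contain the word "error"; count_disk_errors is a hypothetical helper and the matching is deliberately loose:

```sh
#!/bin/sh
# count_disk_errors DEVICE_NAME
# Hypothetical helper: reads kernel log text (e.g. from dmesg or
# /var/log/messages) on stdin and prints how many lines mention
# "DEVICE_NAME:" followed later by the word "error" (case-insensitive).
count_disk_errors() {
  name=$1
  # grep -c prints 0 but exits 1 when nothing matches; keep status 0.
  grep -i -c "$name:.*error" || true
}

# Usage:
#   dmesg | count_disk_errors hdb
```

A count that keeps growing between checks, together with the symptoms above, is more telling than any single SMART e-mail.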

    Also, SMART errors can come from a variety of problems: bad firmware, motherboard incompatibilities, even bad cables or an insufficient power supply in the box. Half the time, a gradual degradation with sporadic but consistent errors will mean a hard drive failure. A flood of errors right when a new HDD is introduced tells me that something doesn't like the way it's configured, which I believe is the OP's case.

    Bottom line? As AndyReed said, check how old the errors you got are and act accordingly. If this is a brand new drive and you consistently get these warnings, I would look for compatibility problems or other causes. If it's an old drive and the errors are increasing in frequency over a short time, it may well be time to put the drive out to pasture and end its misery.
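To put numbers on "how old" and "how frequent", the error lines in a report can be reduced to a count and a time span. Since the drive keeps only the five most recent entries, as the report itself notes, the span covers those five. A minimal sketch, assuming the same "power-on lifetime: N hours" wording; error_window is a hypothetical helper. Fed the five lines from the first post, it should print "5 errors within 71 power-on hours" (17741 - 17670 = 71):

```sh
#!/bin/sh
# error_window
# Hypothetical helper: reads smartctl error-log text on stdin and prints how
# many errors were logged and how many power-on hours separate the oldest
# and newest of them.
error_window() {
  awk '
    /occurred at disk power-on lifetime:/ {
      for (i = 1; i < NF; i++)
        if ($i == "lifetime:") { h = $(i + 1) + 0; break }
      if (n == 0 || h < lo) lo = h
      if (n == 0 || h > hi) hi = h
      n++
    }
    END {
      if (n == 0) { print "no errors logged"; exit }
      print n " errors within " (hi - lo) " power-on hours"
    }'
}

# Usage (as root):
#   smartctl -l error /dev/hdb | error_window
```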
     
  9. jackie46

    jackie46 BANNED

    Joined:
    Jul 25, 2005
    Messages:
    537
    Likes Received:
    0
    Trophy Points:
    0
    I'm not getting testy. There is nothing wrong with the drive. The box is less than a few months old. For the first month, the box ran without reporting any errors. The following month, we saw in the changelogs that cPanel had made a change to chkservd. The day after that update was applied, we started receiving these messages, and they haven't stopped since. It's not the drive; it's code they changed, and unfortunately it's now incompatible with the controller in this box. We can't ask our NOC to swap out the controller now, since we have moved 500 websites to that box.

    I'm getting a bit tired of these issues, and I'm not the only one. Many others are reporting the problem. Why ignore all those people? They should fix the code or change it back to the way it used to work.
     
  10. InternetPEI

    InternetPEI Well-Known Member

    Joined:
    May 26, 2003
    Messages:
    102
    Likes Received:
    0
    Trophy Points:
    16
    Still an issue?

    I just finished having another server set up at EV1, but now when I check SMART, it shows "Errors logged 2 errors detected" on the main drive.

    I also rebooted, hoping that would clear it, and ran another 1-minute test.

    Is this something I should be concerned about?

    Thanks, Jason
     
  11. domtaj

    domtaj Active Member

    Joined:
    Aug 29, 2005
    Messages:
    42
    Likes Received:
    0
    Trophy Points:
    6
    It may or may not be something to be concerned about. If you just got a new server, I would recommend running smartctl -t long /dev/hda just to be certain that it is "healthy". I did hit an actual irrecoverable error when I ran a long smartctl test, and once EV1 checked it out, they offered to replace the hard disk at no charge.
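After smartctl -t long finishes (smartctl prints an estimate of when the test will complete), the result shows up as entry "# 1" of the self-test log. A minimal sketch that checks that entry, assuming the usual "# 1  Extended offline  Completed without error ..." layout of smartctl -l selftest; latest_selftest_ok is a hypothetical helper:

```sh
#!/bin/sh
# latest_selftest_ok
# Hypothetical helper: reads `smartctl -l selftest DEVICE` output on stdin
# and succeeds only if the most recent entry ("# 1") reports
# "Completed without error".
latest_selftest_ok() {
  awk '
    /^# *1 / { found = 1; ok = ($0 ~ /Completed without error/); exit }
    END { exit !(found && ok) }'
}

# Usage (as root):
#   smartctl -t long /dev/hda      # starts the test on the drive
#   # ...wait at least the time smartctl reports, then:
#   smartctl -l selftest /dev/hda | latest_selftest_ok && echo "long test passed"
```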
     