Possible Hard Drive Failure Soon S.M.A.R.T Errors on /dev/hdb

claven177

My box has 2 x 80 GB hard drives.
Yesterday I formatted the 2nd HD as /home2 using
WHM > Disk Drives > Format/Mount a new Hard Drive.
Today I got an e-mail:

=====================================================
[cPanel smartcheck] Possible Hard Drive Failure Soon
S.M.A.R.T Errors on /dev/hdb
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hdb
ATA Error Count: 29 (device log contains only the most recent five errors)
Error 29 occurred at disk power-on lifetime: 17741 hours (739 days + 5 hours)
Error 28 occurred at disk power-on lifetime: 17741 hours (739 days + 5 hours)
Error 27 occurred at disk power-on lifetime: 17719 hours (738 days + 7 hours)
Error 26 occurred at disk power-on lifetime: 17670 hours (736 days + 6 hours)
Error 25 occurred at disk power-on lifetime: 17670 hours (736 days + 6 hours)
----END /dev/hdb--
=====================================================



Current Disk Usage
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda5    71G   34G    34G   51%  /
/dev/hda1    99M  8.4M    86M    9%  /boot
/dev/hda3  1012M   34M   927M    4%  /tmp
none        502M     0   502M    0%  /dev/shm
/tmp       1012M   34M   927M    4%  /var/tmp
/dev/hdb1    99M  5.6M    89M    6%  /home2



# /sbin/fdisk -l

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14         268     2048287+  82  Linux swap
/dev/hda3             269         399     1052257+  83  Linux
/dev/hda4             400        9729    74943225    5  Extended
/dev/hda5             400        9729    74943193+  83  Linux

Disk /dev/hdb: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          13      104391   83  Linux
/dev/hdb2              14         270     2064352+  82  Linux swap
/dev/hdb3             271        9729    75979417+  83  Linux


Why does /dev/hdb1 show a size of only 99M?

Now, with that said, what do you think about these errors?
 

jackie46

I'm still getting these messages. It's been 6 months now and I'm still waiting for some type of resolution. :rolleyes:
 

AndyReed

claven177 said:
My box has 2 x 80 GB hard drives.
Yesterday I formatted the 2nd HD as /home2 using
WHM > Disk Drives > Format/Mount a new Hard Drive.
Today I got an e-mail:

=====================================================
[cPanel smartcheck] Possible Hard Drive Failure Soon
S.M.A.R.T Errors on /dev/hdb
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hdb
ATA Error Count: 29 (device log contains only the most recent five errors)
Error 29 occurred at disk power-on lifetime: 17741 hours (739 days + 5 hours)
Error 28 occurred at disk power-on lifetime: 17741 hours (739 days + 5 hours)
Error 27 occurred at disk power-on lifetime: 17719 hours (738 days + 7 hours)
Error 26 occurred at disk power-on lifetime: 17670 hours (736 days + 6 hours)
Error 25 occurred at disk power-on lifetime: 17670 hours (736 days + 6 hours)
----END /dev/hdb--
=====================================================
As of January 2006, cPanel has made some changes to its smartctl implementation, which is what it uses to periodically scan for SMART errors on your drive(s). Previous versions were somewhat buggy and often failed to report minor SMART inconsistencies.

Because of these recent upgrades, when the smartcheck script runs it may show minor ATA errors that were logged quite some time ago but never reported.

You have nothing to worry about from these particular errors. *All* drives will, from time to time, experience ATA errors. It is only if you see the count incrementing quickly over a short period of time that you need to worry.

As you can see from the SMART report, the most recent errors were logged at 17741 hours of drive life. Find the drive's current power-on hours and subtract the logged hours from that figure. If the difference is small, meaning the errors are recent and the count is still climbing, then your HD might be failing soon.
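
The subtraction described above can be sketched in shell. This is only a rough sketch, not what cPanel's smartcheck does: the function names are made up for illustration, and it assumes the drive exposes a Power_On_Hours attribute in `smartctl -A` output with the raw value in the 10th column, which varies between drive vendors.

```shell
#!/bin/sh
# Hypothetical helpers, not part of cPanel or smartmontools.

# Pull the Power_On_Hours raw value out of `smartctl -A` output on stdin.
# Assumption: the raw value is the 10th column; "+ 0" keeps only the leading
# number if the drive prints something like "17800h+05m".
parse_power_on_hours() {
    awk '$2 == "Power_On_Hours" { print $10 + 0 }'
}

# $1 = current power-on hours, $2 = power-on hours stamped on the logged error.
hours_since_error() {
    echo $(( $1 - $2 ))
}

# Example with a made-up current value of 17800 hours and the newest error
# from the report (17741 hours):
#   hours_since_error 17800 17741     # prints 59
```

On a real box (as root) the two pieces would be combined along the lines of `hours_since_error "$(/usr/sbin/smartctl -A /dev/hdb | parse_power_on_hours)" 17741`.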

In the event of serious (CRITICAL) SMART errors, such as a possibly failing drive, the smartcheck script will notify you with different information. While SMART cannot always accurately predict a failing drive, in the case of a predicted drive failure it would be best to make backups of all your data and schedule a time with your DC to test the drive.

However, the type of error you are seeing is nothing to worry about and, again, is simply a result of cPanel's new implementation of smartctl. Hope this helps!
 

jackie46

Yes, I read all this before. It's nothing to worry about, but how annoying do you think it is to get these messages twice a day for months at a time?

Until they made that change we never got these messages. So it's time to undo what they did so people don't keep getting blasted with these insane messages.

...and what happens if my drive is really starting to fail? How will I know the difference between a screwup change and a real failing drive?
 

mctDarren

Does
Code:
touch /var/cpanel/disablesmartcheck
still work? Effectively this will turn off SMART checking of the drive. It won't help you gauge when the drive is failing, but it will end the annoying emails...

Best advice: Do as AndyReed says. Schedule a backup time and dump that drive.
 

jackie46

webtiva said:
Does
Code:
touch /var/cpanel/disablesmartcheck
still work? Effectively this will turn off SMART checking of the drive. It won't help you gauge when the drive is failing, but it will end the annoying emails...

Best advice: Do as AndyReed says. Schedule a backup time and dump that drive.

Read my message again:

...and what happens if my drive is really starting to fail? How will I know the difference between a screwup change and a real failing drive?
 

mctDarren

Ok, there's no need to get testy when someone replies to a question you have. Only trying to help.

If you simply want the emails to stop, you can turn off the smartcheck. This will not help you determine whether the drive is failing. Other signs can point to a failure, such as freezing after reboot, file corruption or loss, the box slowing down on HDD reads/writes, etc. Sadly, the only way to really know is after the failure occurs. There really isn't a way to accurately predict a hard drive failure. Keep good backups and have a failure contingency plan in place to minimize downtime.

Also, SMART errors can come from a variety of possible problems: bad firmware, motherboard incompatibilities, even bad cables or an underpowered box. Half the time, a gradual degradation with sporadic but consistent errors will mean a hard drive failure. A burst of errors right at the point that a new HDD is introduced throws a flag for me that something doesn't like the way it's configured, which I believe is the case for the OP.

Bottom line? As AndyReed said, you should check how old the errors are and act accordingly. If this is a brand new drive and you consistently get these warnings, I would look for compatibility problems or other causes. If it's an old drive and the errors are increasing in frequency over a short time, it may well be time to put the drive out to pasture and end its misery.
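
To get a feel for how tightly the logged errors are clustered, something like the following could be run against the report text. It is a rough sketch only: the function names are hypothetical, and it assumes the log lines look like the "Error N occurred at disk power-on lifetime: H hours" lines in the email quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for reading a SMART error report on stdin.

# Print the power-on hour stamp of each logged error, one per line.
error_hours() {
    awk '/occurred at disk power-on lifetime:/ {
        for (i = 2; i <= NF; i++)
            if ($i == "hours") { print $(i - 1); next }
    }'
}

# Print the number of power-on hours between the oldest and the newest
# error still present in the report.
error_span() {
    error_hours | sort -n |
        awk 'NR == 1 { min = $1 } { max = $1 } END { if (NR) print max - min }'
}
```

Fed the five lines from the OP's email, `error_span` prints 71, i.e. the five most recent errors fall within 71 power-on hours of each other. On a live box the input could come from `/usr/sbin/smartctl -l error /dev/hdb | error_span`.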
 

jackie46

I'm not getting testy. There is nothing wrong with the drive. The box is less than a few months old. The first month, the box ran without reporting any errors. The following month, the change logs show that cPanel made a change to chkservd. The day after that update was applied we started receiving those messages, and it hasn't stopped since. It's not the drive, it's code they changed, and it's unfortunate that it's now incompatible with the controller in this box. We can't ask our NOC to change out the controller now. We have since moved 500 websites to that box.

I'm getting a bit tired of these issues and I'm not the only one. There are many who are reporting the problem. Why ignore all those people who are having this problem? They should be fixing the code or changing it back to the way it used to work.
 

InternetPEI

Still an issue?

I just finished having another server set up at EV1, but now when I check SMART, it's showing "Errors logged 2 errors detected" on the main drive.

I just did a reboot as well, hoping it would clear it, and ran another 1 minute test.

Is this something I should be concerned about?

Thanks, Jason
 

domtaj

InternetPEI said:
I just finished having another server set up at EV1, but now when I check SMART, it's showing "Errors logged 2 errors detected" on the main drive.

I just did a reboot as well, hoping it would clear it, and ran another 1 minute test.

Is this something I should be concerned about?

Thanks, Jason
It may or may not be something to worry about. If you just got a new server, I would recommend running smartctl -t long /dev/hda just to be certain that the drive is "healthy". I did hit an actual unrecoverable error when I ran a long smartctl test, and once EV1 checked it out, they offered to replace the hard disk at no charge.
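
A rough sketch of that check, assuming smartmontools lives at /usr/sbin/smartctl as in the report earlier in the thread. The two function names and the SMARTCTL variable are only for illustration; `-t long` and `-l selftest` are the real smartctl options for starting the extended self-test and printing the self-test log.

```shell
#!/bin/sh
# Hypothetical wrappers around two real smartctl invocations.
# SMARTCTL can be overridden if the binary lives elsewhere.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# Start the extended (long) self-test on the device given as $1.
# The drive runs the test itself in the background; smartctl prints
# an estimate of how long it will take.
start_long_test() {
    "$SMARTCTL" -t long "$1"
}

# Print the self-test log for the device given as $1. Run this after
# the estimated time has passed to see whether the test completed
# without error.
show_selftest_log() {
    "$SMARTCTL" -l selftest "$1"
}
```

Usage, as root: `start_long_test /dev/hda`, wait for the estimated duration, then `show_selftest_log /dev/hda`.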