[cPanel smartcheck] Possible Hard Drive Failure Soon

ncconquer

Well-Known Member
Jun 20, 2004
80
0
156
Hi,
We've been getting this message from a lot of our servers:

S.M.A.R.T Errors on /dev/hda
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hda
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0007 067 001 011 Pre-fail Always In_the_past 5000
----END /dev/hda--
Is anybody else having this problem?
 

katmai

Well-Known Member
Mar 13, 2006
564
3
168
Brno, Czech Republic
I strongly recommend taking an offsite backup and reloading the OS on a new hard drive. This indicates the hard drive may fail soon, and it's no joke. I experienced this once and ignored it; the server eventually crashed, and restoring the data was a pain in the back.

My advice: replace the hard drive ASAP.
 

chirpy

Well-Known Member
Verified Vendor
Jun 15, 2002
13,465
30
473
Go on, have a guess
Not always, though. While it can indicate the impending failure of the drive, it might not. If the error count doesn't increase, it could simply be an incompatibility between smartmontools and the drive. Search the forums; many threads have dealt with the same issue in the past.
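One rough way to check whether the error count is actually increasing is to save the output of `smartctl -l error` every so often and compare the counts. The sketch below assumes the saved log contains a line of the form "ATA Error Count: N" when errors have been logged (and no such line otherwise); the function names and the log file names in the comments are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the error log has a line like
#   "ATA Error Count: 5 (device log contains only the most recent five errors)"
# when errors exist. Prints 0 if no such line is found.
ata_error_count() {
    awk '/^ATA Error Count:/ { print $4 + 0; found = 1; exit }
         END { if (!found) print 0 }'
}

# Succeeds (exit status 0) if the count in the newer snapshot ($2)
# is higher than the count in the older snapshot ($1).
error_count_grew() {
    old=$(ata_error_count < "$1")
    new=$(ata_error_count < "$2")
    [ "$new" -gt "$old" ]
}

# Hypothetical usage, run as root a few days apart:
#   smartctl -l error /dev/hda > smart-old.log
#   smartctl -l error /dev/hda > smart-new.log
#   error_count_grew smart-old.log smart-new.log && echo "error count is rising"

# Quick demonstration on a sample line:
printf 'ATA Error Count: 5 (device log contains only the most recent five errors)\n' | ata_error_count
# prints 5
```

If the count stays flat over a few weeks, that is at least consistent with the reporting issue described above rather than a drive that is getting worse.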
 

AndyReed

Well-Known Member
PartnerNOC
May 29, 2004
2,221
4
193
Minneapolis, MN
ncconquer said:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0007 067 001 011 Pre-fail Always In_the_past 5000
----END /dev/hda--
Is anybody else having this problem?
Although it depends on your HD make and model, the RAW_VALUE of 5000 is what concerns me the most. Ask your data center to check the condition of your HD. Overall, I suggest you take katmai's advice.
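To read the quoted attribute row more mechanically, it may help to compare the normalized VALUE and WORST columns against THRESH. As far as I understand smartctl's output, WHEN_FAILED shows "In_the_past" when WORST has dropped to or below THRESH while the current VALUE is back above it; RAW_VALUE is vendor-specific, so it is hard to judge on its own. The sketch below is only an approximation of that logic, and `classify_attribute` and its state labels are names invented here.

```shell
#!/bin/sh
# Sketch: classify attribute rows read on stdin.
# Assumed column order (as in the quoted table):
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# Treating a THRESH of 0 as "never fails" is an assumption of this sketch.
classify_attribute() {
    awk 'NF >= 10 && $1 ~ /^[0-9]+$/ {
        value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
        if (thresh > 0 && value <= thresh)      state = "failing_now"
        else if (thresh > 0 && worst <= thresh) state = "failed_in_the_past"
        else                                    state = "ok"
        print $2, state
    }'
}

# The row from the original post: VALUE 067, WORST 001, THRESH 011.
echo "3 Spin_Up_Time 0x0007 067 001 011 Pre-fail Always In_the_past 5000" | classify_attribute
# prints: Spin_Up_Time failed_in_the_past
```

For this row the current VALUE (67) is above THRESH (11) but WORST (1) is not, which matches the "In_the_past" flag in the report.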
 

essentials

Member
Mar 25, 2002
11
0
301
At the command prompt:

dmesg

See if any errors show up.

If you see any errors, I'd have the DC mirror the failing drive and replace it - much easier than restoring. ;)
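On a busy server the dmesg output can be long, so a small filter can make the relevant lines easier to spot. This is only a sketch: `disk_errors` is a made-up helper, and the keyword list is a guess, since the exact wording of kernel messages varies by driver and kernel version.

```shell
#!/bin/sh
# Sketch: keep only the lines on stdin that mention the given device
# (name without /dev/, e.g. hda) and contain an error-like keyword.
# The keyword list is an assumption, not an exhaustive set.
disk_errors() {
    grep -i "$1" | grep -i -E 'error|fail|timeout|reset'
}

# Typical use at the command prompt:
#   dmesg | disk_errors hda
```

An empty result does not prove the drive is healthy; it only means nothing matching those keywords is currently in the kernel log.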
 

ujr

Well-Known Member
Mar 19, 2004
290
0
166
You may want to add that dmesg shows the kernel ring buffer messages (including boot messages). It's also useful for seeing whether a HD is failing, but it's not a surefire way.
 

katmai

Well-Known Member
Mar 13, 2006
564
3
168
Brno, Czech Republic
I suggested replacing the drive as a preventive measure. To keep a stable server and business, you don't want to take chances, such as the drive failing while you're on vacation. Not to mention that you may lose important data, and the restore can sometimes be a pain in the back. Better planned than unplanned.