procam

Well-Known Member
Nov 24, 2003
I got this fancy schmancy email again this morning about one of the drives in one of my units,
blah blah. I panicked, nearly spilled my coffee on my keyboard, and choked on my granola breakfast bar while reading it, then took a deep breath and two Xanax and sent it to a tech:
S.M.A.R.T Errors on /dev/hda
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hda
ATA Error Count: 1
Error 1 occurred at disk power-on lifetime: 921 hours
----END /dev/hda--

S.M.A.R.T Errors on /dev/hdb
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hdb
ATA Error Count: 4
Error 4 occurred at disk power-on lifetime: 6760 hours
Error 3 occurred at disk power-on lifetime: 6760 hours
Error 2 occurred at disk power-on lifetime: 1394 hours
Error 1 occurred at disk power-on lifetime: 3 hours
----END /dev/hdb--
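
For anyone who wants to keep these reports around and compare them over time, here is a rough Python sketch that pulls the per-drive numbers out of a report like the one above. It assumes the email always uses exactly the line formats shown in this post; `parse_smart_report` is a made-up helper name, not part of cPanel or smartmontools.

```python
import re

# Sample copied from the report above (the /dev/hdb section only).
SAMPLE = """S.M.A.R.T Errors on /dev/hdb
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/hdb
ATA Error Count: 4
Error 4 occurred at disk power-on lifetime: 6760 hours
Error 3 occurred at disk power-on lifetime: 6760 hours
Error 2 occurred at disk power-on lifetime: 1394 hours
Error 1 occurred at disk power-on lifetime: 3 hours
----END /dev/hdb--
"""


def parse_smart_report(text):
    """Return {device: {"count": int, "hours": [int, ...]}} for each drive section.

    Hypothetical helper: it only understands the line formats seen in this thread.
    """
    report = {}
    device = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"S\.M\.A\.R\.T Errors on (\S+)", line)
        if m:
            # Start of a new drive section.
            device = m.group(1)
            report[device] = {"count": 0, "hours": []}
            continue
        if device is None:
            continue  # ignore anything before the first drive header
        m = re.match(r"ATA Error Count: (\d+)", line)
        if m:
            report[device]["count"] = int(m.group(1))
            continue
        m = re.match(r"Error \d+ occurred at disk power-on lifetime: (\d+) hours", line)
        if m:
            report[device]["hours"].append(int(m.group(1)))
    return report


if __name__ == "__main__":
    print(parse_smart_report(SAMPLE))
```

The power-on hours are kept because they show whether the errors are clustered recently or spread over the life of the drive.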

So I sent this over to technical and told them I need to schedule a drive replacement, as I always have in the past. Every time I get one of these errors I promptly change the drive that day, or have one overnighted if it's not a common stock drive so it's there for replacement the next morning.

Much to my surprise, I got an email back telling me that this is not a really reliable method of detecting a drive problem. Is this true? Does anyone have opinions on this? I'd really like to hear them, because if this is the case I've been wasting a LOT of money replacing drives over the years.

Email I got back from the tech: "I have performed extended tests with smartctl and checked for error logging in /var/log/messages to verify the integrity of your drive, and I am satisfied all is well.

Despite the wording of the message, the drive is most likely fine. Every night, cPanel runs the script /scripts/smartcheck to read the SMART diagnostic information from the drive (using smartctl) and warn the system administrator if the drive is having problems, but smartcheck defines "problems" as including the case where the ATA Error Count is 100 or more.

The ATA Error Count is the number of errors recorded by the SMART circuitry on the drive, and this count is cumulative over the life of the drive. Since all drives experience some errors in the course of normal operation, the ATA Error Count will always become greater than 100 at some point, regardless of whether the drive is failing or not. Thus, cPanel's smartcheck script will start to produce incorrect warnings when the error count reaches 100.

To prevent these false warnings from being sent, you must disable smartcheck with this command: touch /var/cpanel/disablesmartcheck

Once that is done, you can manually run /scripts/smartcheck once a week or so to keep an eye on the drive. The ATA Error Count should only be a concern if it increases by a large amount, or if it increases consistently. Smaller increases and sporadic jumps can normally be ignored.

When you do suspect a problem with a drive, you should always perform other tests to confirm the problem. The ATA Error Count by itself is simply not conclusive evidence of a pending failure. Such tests include running badblocks and monitoring /var/log/messages for I/O errors. If your concerns about this drive continue to grow, please open a trouble ticket requesting a drive replacement."
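
To make the tech's rule of thumb concrete (worry about a large increase or a consistent one, ignore small sporadic jumps), here is a minimal Python sketch that checks a series of weekly ATA Error Count readings. The thresholds are my own assumptions, not anything from cPanel or the tech: a "large" jump is taken as 20 or more between two readings, and "consistent" as three increases in a row. `error_count_concern` is a hypothetical name.

```python
def error_count_concern(history, big_jump=20, streak=3):
    """Classify a list of ATA Error Count readings, oldest first.

    Returns "large increase", "consistent increase", or None.
    big_jump and streak are assumed thresholds; tune them to taste.
    """
    # Change between each pair of consecutive readings.
    deltas = [later - earlier for earlier, later in zip(history, history[1:])]
    if not deltas:
        return None  # fewer than two readings, nothing to compare
    if deltas[-1] >= big_jump:
        return "large increase"
    recent = deltas[-streak:]
    if len(recent) == streak and all(d > 0 for d in recent):
        return "consistent increase"
    return None


if __name__ == "__main__":
    print(error_count_concern([4, 4, 5, 5]))       # small sporadic jump
    print(error_count_concern([100, 101, 140]))    # big jump at the last reading
    print(error_count_concern([10, 11, 13, 14]))   # creeping up every week
```

Either result would only be a cue to run the other tests the tech mentions, not proof that the drive is failing.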
 

procam

Well-Known Member
Nov 24, 2003
Izzee said:
Seems to be a s.m.a.r.t. and well informed tech you got there.

:)
Well thank you, but I would like someone to confirm that as gospel :cool:
I'm one of those people who likes to gather lots of opinions and experiences so I can make goooood decisions instead of spillin' my morning coffee :D
 

Izzee

Well-Known Member
Feb 6, 2004
Try not spilling any coffee over the search facility either, as this has been discussed many times of late and there are plenty of opinions for you to select from.

:)
 