S.M.A.R.T. Should I be worried?

haze

Well-Known Member
Dec 21, 2001
1,540
3
318
I just got the following email, should I be worried?

IMPORTANT: Do not ignore this email.
You should backup all the data on the hard drives listed below and replace them as soon as possible.
S.M.A.R.T has detected that they are not performing within normal operating parameters.

Excessive ATA Errors on disk /dev/hda. Please consider replacing this drive.

SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 05
ATA Error Count: 16
Non-Fatal Count: 0

Error Log Structure 1:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 08 4c a2 23 e0 ca 3147640
08 00 08 24 d3 8a e0 c8 3147640
08 00 60 2c d3 8a e0 c8 3147640
08 00 18 8c d3 8a e0 c8 3147640
08 da 00 00 4f c2 e0 b0 3147640
00 04 00 0b 4f c2 e0 51 32124

Error Log Structure 2:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 08 7c a5 22 e0 ca 3582248
08 00 08 8c 90 22 e0 ca 3582248
08 00 02 41 00 00 e0 c8 3582248
08 00 08 0c 30 03 e0 c8 3582248
08 00 01 01 00 00 a0 08 3582260
00 04 01 01 00 00 a0 51 32175

Error Log Structure 3:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 08 fc c0 76 e4 c8 3616438
08 00 08 24 d3 8a e0 c8 3616438
08 00 60 2c d3 8a e0 c8 3616438
08 00 18 8c d3 8a e0 c8 3616439
08 da 00 00 4f c2 e0 b0 3616439
00 04 00 0b 4f c2 e0 51 32144

Error Log Structure 4:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 08 7c a5 22 e0 ca 519082
08 00 08 8c 90 22 e0 ca 519082
08 00 02 41 00 00 e0 c8 519082
08 00 08 0c 30 03 e0 c8 519082
08 00 01 01 00 00 a0 08 519095
00 04 01 01 00 00 a0 51 32162

Error Log Structure 5:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 08 b4 a6 76 e4 c8 579232
08 00 08 24 d3 8a e0 c8 579232
08 00 60 2c d3 8a e0 c8 579232
08 00 18 8c d3 8a e0 c8 579233
08 da 00 00 4f c2 e0 b0 579233
00 04 00 0b 4f c2 e0 51 32138
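
For anyone trying to read the table above, here is a minimal Python sketch that parses it. It assumes each structure lists five commands followed by one row of registers captured at the time of the error, so that on the last row the FR column holds the error register; that layout is my assumption about this tool's output, not something the email states. The opcodes and the ABRT bit are standard ATA values. `parse_error_log`, `describe` and `SAMPLE` are names made up for this example.

```python
import re

# Standard ATA opcodes and bit masks. The row layout (five commands, then one
# row of registers captured at the error) is an assumption about this tool.
COMMANDS = {0x08: "DEVICE RESET", 0xB0: "SMART", 0xC8: "READ DMA", 0xCA: "WRITE DMA"}
SMART_RETURN_STATUS = 0xDA  # feature-register value used with the SMART opcode
ABRT = 0x04                 # error-register bit: command aborted
COLUMNS = ["DCR", "FR", "SC", "SN", "CL", "SH", "D/H", "CR"]

# Structure 1, copied from the email above.
SAMPLE = """Error Log Structure 1:
DCR FR SC SN CL SH D/H CR Timestamp
08 00 08 4c a2 23 e0 ca 3147640
08 00 08 24 d3 8a e0 c8 3147640
08 00 60 2c d3 8a e0 c8 3147640
08 00 18 8c d3 8a e0 c8 3147640
08 da 00 00 4f c2 e0 b0 3147640
00 04 00 0b 4f c2 e0 51 32124
"""


def parse_error_log(text):
    """Return {structure number: [row dict, ...]} from the emailed log."""
    structures, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Error Log Structure (\d+):", line)
        if header:
            current = structures.setdefault(int(header.group(1)), [])
            continue
        fields = line.split()
        # A data row is eight two-digit hex bytes plus a decimal timestamp.
        is_row = len(fields) == 9 and all(
            re.fullmatch(r"[0-9a-f]{2}", f) for f in fields[:8]
        )
        if current is not None and is_row:
            row = {name: int(value, 16) for name, value in zip(COLUMNS, fields[:8])}
            row["Timestamp"] = int(fields[8])
            current.append(row)
    return structures


def describe(rows):
    """Name the last command issued and report whether the error row has ABRT set.

    Expects at least two rows: one or more commands, then the error row.
    """
    *commands, error = rows
    last = commands[-1]
    name = COMMANDS.get(last["CR"], "unknown 0x%02x" % last["CR"])
    if last["CR"] == 0xB0 and last["FR"] == SMART_RETURN_STATUS:
        name = "SMART RETURN STATUS"
    return name, bool(error["FR"] & ABRT)
```

Under that assumption, `describe(parse_error_log(SAMPLE)[1])` returns `("SMART RETURN STATUS", True)`, meaning the last logged command was a SMART status request and it was aborted. That says nothing either way about whether the drive is healthy.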
 

Daniel

Well-Known Member
PartnerNOC
Aug 13, 2001
164
0
316
I received an email like this also. The problem is it doesn't tell what server so I have no idea what server to check. :p
 

haze

Well-Known Member
Dec 21, 2001
1,540
3
318
Well, I got the DC to confirm there is a problem, so I need to back everything up. This is a personal server, so I haven't had any backups in place (other than a copy of my sites on my own hard drive). The question is, how do I back up everything? I assume I will need to install cPanel again.
 

DefHosting

Member
May 23, 2002
10
0
301
Got this error also the other day. Ran smartcheck and everything appears to be ok. Keeping a close eye on it though.
 

Brownie

Well-Known Member
Aug 10, 2001
143
0
316
Can I just ask what kernel you're all running? I remember seeing a thread a long time ago about SMART not liking a certain kernel.

I'm using 2.4.9-31
 

TRAIN YARD SOFTWARE

Well-Known Member
Dec 20, 2001
222
0
316
Kernel Version 2.4.17 (SMP)
Kernel Version 2.4.18 (SMP)
Kernel Version 2.4.18 (SMP)
Kernel Version 2.4.18 (SMP)
Kernel Version 2.4.18 (SMP)
Kernel Version 2.4.18 (SMP)
Kernel Version 2.4.18 (SMP)
 

bdraco

Guest
Since SMART errors are logged by the device itself, they are almost never wrong (unless there is something wrong with the drive, in which case it would be a good idea to replace it anyway). Check your dmesg as well; you will probably find disk errors.
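
The dmesg check can be sketched in Python as below. This is a rough filter, not a diagnostic. It assumes the kernel prefixes IDE messages with the device name (`hda: ...`) and that interesting lines contain "error" or a `status=` dump. The sample lines only illustrate that general message shape and are not taken from any server in this thread. `disk_error_lines`, `read_dmesg` and `EXAMPLE_LINES` are hypothetical names.

```python
import re
import subprocess

# Illustrative lines only (assumed message shape, not real output from this thread).
EXAMPLE_LINES = """hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }
hdb: dma_intr: error=0x40 { UncorrectableError }
eth0: link up
"""


def disk_error_lines(dmesg_text, device="hda"):
    """Return kernel log lines for `device` that mention an error or a status dump."""
    # Allow an optional "[timestamp]" prefix, which newer kernels add.
    pattern = re.compile(
        r"^(?:\[[^\]]*\]\s*)?%s: .*(?:error|status=)" % re.escape(device),
        re.IGNORECASE,
    )
    return [line for line in dmesg_text.splitlines() if pattern.search(line)]


def read_dmesg():
    """Capture the kernel ring buffer by running dmesg; may need root on some systems."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a live box you would call `disk_error_lines(read_dmesg(), "hda")`. An empty result only means nothing matched this crude pattern, not that the drive is fine.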
 

jumpdomain

Well-Known Member
Aug 12, 2001
109
0
316
SMART check has been correct 100% of the time on our servers... Every time, there were also drive errors in the dmesg such as seek errors.
 

TRAIN YARD SOFTWARE

Well-Known Member
Dec 20, 2001
222
0
316
[quote][i]Originally posted by bdraco[/i]

Since SMART errors are logged by the device itself, they are almost never wrong (unless there is something wrong with the drive, in which case it would be a good idea to replace it anyway). Check your dmesg as well; you will probably find disk errors.
[/quote]

What is an acceptable time frame for fixing the problem? Right when the email comes in, even if that is 3 a.m.?

-Ed
TYS
 

Brownie

Well-Known Member
Aug 10, 2001
143
0
316
It's been confirmed by a tech: my server's drive is forked :\ Waiting on a replacement now.
 

bdraco

Guest
[quote][i]Originally posted by TRAIN YARD SOFTWARE[/i]

[quote][i]Originally posted by bdraco[/i]

Since SMART errors are logged by the device itself, they are almost never wrong (unless there is something wrong with the drive, in which case it would be a good idea to replace it anyway). Check your dmesg as well; you will probably find disk errors.
[/quote]

What is an acceptable time frame for fixing the problem? Right when the email comes in, even if that is 3 a.m.?

-Ed
TYS[/quote]

There are two levels of warnings. If you get an error that says "Please consider replacing this drive", you probably have a while until failure. If you get "Disk Failure soon on ????", then you had better back up everything before rebooting again.
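
If you are filtering these emails automatically, the two levels could be told apart with something like the sketch below. It matches only the two phrases quoted in this post. The labels `"urgent"`, `"advisory"` and `"unknown"` and the function name `warning_level` are invented for this example, and the script may well send other wordings that this does not cover.

```python
def warning_level(message):
    """Classify a SMART warning email by the two phrases quoted in this thread."""
    if "Disk Failure soon on" in message:
        return "urgent"      # back up before the next reboot
    if "Please consider replacing this drive" in message:
        return "advisory"    # probably some time left, plan a replacement
    return "unknown"         # wording not covered by this sketch
```

For example, the line from the first post, "Excessive ATA Errors on disk /dev/hda. Please consider replacing this drive.", would be classed as `"advisory"`.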
 

kwimberl

Well-Known Member
Aug 13, 2001
123
0
316
After having 16 drives all show that they are failing SMART with this new script, I began to really look into this.

They are all of my Samsung drives.

I spent nearly an hour on the phone with three Samsung reps this afternoon. I had already run all of their own diagnostic utils on the drives, and they all say the drives are fine.

After much ado, it appears that this smartcheck is NOT compatible with Samsung drives. According to Samsung, there is NOT a problem with my drives.

Anyway, is there a way to disable this check? I edited the script myself to disable it, but the rsync overwrites my change at the next update.

Nick?
 

Brownie

Well-Known Member
Aug 10, 2001
143
0
316
[quote][i]Originally posted by kwimberl[/i]

After having 16 drives all show that they are failing SMART with this new script, I began to really look into this.

They are all of my Samsung drives.

I spent nearly an hour on the phone with three Samsung reps this afternoon. I had already run all of their own diagnostic utils on the drives, and they all say the drives are fine.

After much ado, it appears that this smartcheck is NOT compatible with Samsung drives. According to Samsung, there is NOT a problem with my drives.

Anyway, is there a way to disable this check? I edited the script myself to disable it, but the rsync overwrites my change at the next update.

Nick?[/quote]

chattr +i /scripts/nameofscript should stop the script being overwritten :)
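
If you want to script that, a small Python wrapper around `chattr` and `lsattr` might look like the sketch below. The path stays a placeholder, as above. `immutable_commands` and `run_step` are made-up names, the commands need root, and the immutable attribute only works on filesystems that support it (ext2/ext3, for example). Whether the updater copes gracefully with a file it cannot replace is not something this thread confirms.

```python
import subprocess


def immutable_commands(path):
    """Build the argv lists for setting, checking and clearing the immutable flag."""
    return {
        "lock": ["chattr", "+i", path],    # nothing, not even root, can modify the file
        "check": ["lsattr", path],         # an 'i' in the flags column means immutable
        "unlock": ["chattr", "-i", path],  # clear the flag before editing the file again
    }


def run_step(step, path):
    """Run one of the steps above and return its standard output (requires root)."""
    argv = immutable_commands(path)[step]
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

Remember to run the "unlock" step before editing the script yourself, since the flag blocks your own changes too.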
 

andyf

Well-Known Member
Jan 7, 2002
249
0
316
UK
I've got this error on a dev box that's running a drive which worked fine for 3 years under a Windows OS, so I suspect the errors can also be caused by misconfiguration or incorrect settings for the IDE controller, not just a failing disk.

Andy