Hard drive failure, or just SMART acting up?

AlaskanWolf

Well-Known Member
Aug 11, 2001
Fremont CA
Here's the issue: one server reports about 800 hours, the other about 7,900 hours of "life" left.

Here's the kicker: the drive seek errors on my Grey Fox server were at first only showing up on hda. Now, as you can see below, hdb is being affected too, yet I can't see anything major. I ran fsck on the 2nd drive and every partition is clean. On the 1st drive, however, the guys at the DC say it skips over about 5 blocks as "read only".

Here are the errors from dmesg after a reboot yesterday, followed by the nightly email I'm getting from cPanel:

EXT2-fs error (device ide0(3,71)): ext2_check_page: bad entry in directory #3802085: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ide0(3,71)): ext2_check_page: bad entry in directory #3802085: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
EXT2-fs error (device ide0(3,71)): ext2_check_page: bad entry in directory #3802085: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
hdb: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdb: drive_cmd: error=0x04 { DriveStatusError }
[email protected] [~]# df
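
For reference, the status= and error= values in those dmesg lines are bit fields, and the words in braces are just the names of the bits that are set. Here is a minimal Python sketch of that decoding. The names for the bits that appear in the logs in this post are copied from the log lines themselves; the labels for the remaining bits are my own assumption based on the usual ATA register layout, and decode_bits is a made-up helper name.

```python
# ATA status register bits. Names seen in this post's logs are copied
# from dmesg; the other labels are assumed.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA error register bits. 0x04 is the "command aborted" bit, which the
# driver prints as DriveStatusError.
ERROR_BITS = {
    0x80: "BadSector",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


print(decode_bits(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode_bits(0x04, ERROR_BITS))   # ['DriveStatusError']
```

So status=0x51 with error=0x04 only says that the drive aborted a command; by itself it does not name a bad sector.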

You should backup all the data on the hard drives listed below and replace them as soon as possible.
S.M.A.R.T has detected that they are not peforming within normal operating paramaters.

Excessive ATA Errors on disk /dev/hda. Please consider replacing this drive. Some Errors may be normal due to not 100% compatible IDE controllers and may be ignored.

SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 03
ATA Error Count: 338
Non-Fatal Count: 0

Error Log Structure 1:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 ad e2 5f e1 ca 270
00 00 08 75 5b 60 e1 ca 270
00 00 08 c5 be 66 e1 ca 270
00 00 08 3d f9 87 e1 ca 270
00 00 08 3d f9 87 e1 ca 275
00 40 08 15 e6 9c e1 51 27059
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 821 (life of the drive in hours)

Error Log Structure 2:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 75 5b 60 e1 ca 270
00 00 08 c5 be 66 e1 ca 270
00 00 08 3d f9 87 e1 ca 270
00 00 08 15 e6 9c e1 c8 275
00 00 06 17 e6 9c e1 c8 279
00 40 06 17 e6 9c e1 51 27059
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 821 (life of the drive in hours)

Error Log Structure 3:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 fe ac 03 ad e1 c8 8084
00 00 02 aa 04 ad e1 c8 8084
00 00 fe ac 04 ad e1 c8 8084
00 00 02 aa 05 ad e1 c8 8084
00 00 fe ac 05 ad e1 c8 8084
00 40 fe 48 06 ad e1 51 28116
Error condition: 0 Error State: 20
Number of Hours in Drive Life: 823 (life of the drive in hours)

Error Log Structure 4:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 fd e5 9c e1 c8 252
00 00 08 05 e6 9c e1 c8 252
00 00 08 0d e6 9c e1 c8 252
00 00 06 0f e6 9c e1 c8 257
00 00 04 11 e6 9c e1 c8 261
00 40 04 13 e6 9c e1 51 27059
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 821 (life of the drive in hours)

Error Log Structure 5:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 05 e6 9c e1 c8 252
00 00 08 0d e6 9c e1 c8 252
00 00 06 0f e6 9c e1 c8 257
00 00 04 11 e6 9c e1 c8 261
00 00 04 11 e6 9c e1 c8 270
00 40 02 13 e6 9c e1 51 27059
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 821 (life of the drive in hours)

----------------------------------------------
WOLF SERVER - 2nd server

This goes on in dmesg for about 3 pages:

hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=62345409, sector=3708096
end_request: I/O error, dev 03:05 (hda), sector 3708096
hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=62345409, sector=3708096
end_request: I/O error, dev 03:05 (hda), sector 3708096
hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=62345409, sector=3708096
end_request: I/O error, dev 03:05 (hda), sector 3708096
hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=62345409, sector=3708096
end_request: I/O error, dev 03:05 (hda), sector 3708096
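
Since this repeats for pages, it helps to collapse it and see how many distinct sectors are actually involved. Below is a rough Python sketch that does this for the excerpt above. The regular expression only targets the line format shown here, and summarize is a made-up helper name. The subtraction at the end assumes that "sector" is counted from the start of the partition (dev 03:05) while LBAsect is counted from the start of the disk; that is my reading of the log, not something I have confirmed.

```python
import re
from collections import Counter

# Two of the repeated entries from the dmesg excerpt above.
DMESG = """\
hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=62345409, sector=3708096
end_request: I/O error, dev 03:05 (hda), sector 3708096
hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=62345409, sector=3708096
end_request: I/O error, dev 03:05 (hda), sector 3708096
"""

# Matches only the "error=... LBAsect=..., sector=..." lines.
PATTERN = re.compile(
    r"^(\w+): \w+: error=0x[0-9a-f]+ .*LBAsect=(\d+), sector=(\d+)", re.M
)


def summarize(text):
    """Count failures per (drive, absolute LBA, request sector)."""
    return Counter(
        (drive, int(lba), int(sector))
        for drive, lba, sector in PATTERN.findall(text)
    )


for (drive, lba, sector), hits in summarize(DMESG).items():
    # Assumed: lba - sector is the partition's starting offset on the disk.
    print(f"{drive}: LBA {lba} failed {hits}x, "
          f"request sector {sector}, offset {lba - sector}")
# hda: LBA 62345409 failed 2x, request sector 3708096, offset 58637313
```

In the part pasted here, every entry points at the same single sector, LBA 62345409.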

And the email from cPanel:

IMPORTANT: Do not ignore this email.
You should backup all the data on the hard drives listed below and replace them as soon as possible.
S.M.A.R.T has detected that they are not peforming within normal operating paramaters.

Excessive ATA Errors on disk /dev/hda. Please consider replacing this drive. Some Errors may be normal due to not 100% compatible IDE controllers and may be ignored.

SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 01
ATA Error Count: 4286
Non-Fatal Count: 0

Error Log Structure 1:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 f1 97 7f e3 c5 390776
00 00 08 a1 1e 82 e3 c5 390776
00 00 68 81 d2 84 e3 c5 390776
00 00 08 e9 d2 84 e3 c5 390776
00 00 08 c1 50 b7 e3 c4 390776
00 40 08 c1 50 b7 e3 59 0
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 7941 (life of the drive in hours)

Error Log Structure 2:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 a1 1e 82 e3 c5 360581
00 00 68 d1 cd 84 e3 c5 360581
00 00 08 39 ce 84 e3 c5 360581
00 00 08 69 f4 81 e3 c4 360581
00 00 08 c1 50 b7 e3 c4 360581
00 40 08 c1 50 b7 e3 59 0
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 7933 (life of the drive in hours)

Error Log Structure 3:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 71 89 82 e3 c5 368388
00 00 20 99 89 82 e3 c5 368388
00 00 08 b9 89 82 e3 c5 368388
00 00 08 29 cc c2 e3 c4 368388
00 00 08 c1 50 b7 e3 c4 368388
00 40 08 c1 50 b7 e3 59 0
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 7935 (life of the drive in hours)

Error Log Structure 4:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 21 b9 8e e3 c5 377850
00 00 08 99 52 a3 e3 c5 377850
00 00 a0 b1 bd 84 e3 c5 377850
00 00 08 51 be 84 e3 c5 377850
00 00 08 c1 50 b7 e3 c4 377850
00 40 08 c1 50 b7 e3 59 0
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 7937 (life of the drive in hours)

Error Log Structure 5:
DCR FR SC SN CL SH D/H CR Timestamp
00 00 08 79 d7 84 e3 c5 377885
00 00 08 11 e1 c2 e3 c5 377885
00 00 28 81 d7 84 e3 c5 377886
00 00 08 a9 d7 84 e3 c5 377886
00 00 08 c1 50 b7 e3 c4 377886
00 40 08 c1 50 b7 e3 59 0
Error condition: 0 Error State: 3
Number of Hours in Drive Life: 7937 (life of the drive in hours)
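
One way to cross-check the SMART log against dmesg is to rebuild the sector address from the SN, CL, SH and D/H columns. The Python sketch below does that under two assumptions: that the drive is addressed in 28-bit LBA mode, and that the last row of each structure is the error record (its CR value 59 and FR value 40 line up with the status=0x59 and error=0x40 in dmesg). lba_from_row is a made-up helper name.

```python
def lba_from_row(row):
    """Rebuild a 28-bit LBA from one 'DCR FR SC SN CL SH D/H CR Timestamp' row.

    Assumes LBA addressing: SN holds bits 0-7, CL bits 8-15, SH bits 16-23,
    and the low nibble of D/H holds bits 24-27.
    """
    fields = row.split()
    sn, cl, sh, dh = (int(field, 16) for field in fields[3:7])
    return ((dh & 0x0F) << 24) | (sh << 16) | (cl << 8) | sn


# Last row of the error log structures on the Wolf server.
print(lba_from_row("00 40 08 c1 50 b7 e3 59 0"))  # 62345409
```

That is the same number as LBAsect=62345409 in the Wolf server's dmesg output, so the SMART log and the kernel appear to be complaining about the same sector.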


Now, with that said, what do you think about these?