acenetryan

Well-Known Member
PartnerNOC
Aug 21, 2005
197
1
168
cPanel appears to have recently corrected the auto-detection for SATA drives. When this correction went out, we got a flood of support requests asking about SMART emails being received. Almost all of the requests were regarding errors that occurred in the past.

AndyReed posted details about old errors in a post here:

http://forums.cpanel.net/showthread.php?t=53613#4

But we're also noticing that reports are being generated because attribute 190 has an "In_the_past" value for WHEN_FAILED. This attribute is Airflow_Temperature_Cel, and the flag indicates that the drive's temperature exceeded a threshold at some point. This is certainly good information to have if your drive is running hot, but after you install a fan and get it running cool again, the WHEN_FAILED value will continue to trip SMART email alerts.

I've already contacted Seagate and have confirmed that it is impossible to reset the attributes of the drive.

The smartd daemon uses /etc/smartd.conf to control which drives are monitored and with which options. Within this file, there is a -I option which allows you to ignore certain attributes in error reports. cPanel's smartcheck script does not use this file, so we can't omit attributes from the error alerts. Is there a sanctioned method of ignoring the "In_the_past" value for a given attribute in /scripts/smartcheck?

If not, we'll probably be disabling /scripts/smartcheck across all of our servers and continuing to use the smartd service to monitor drives instead. It might be a prudent move to have cPanel simply set up /etc/smartd.conf and allow the SMART daemon to do the actual monitoring. You could even add smartd to tailwatchd for good measure.
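To make the request concrete, here is a minimal Python sketch of the kind of filtering being asked for. It is not part of /scripts/smartcheck or smartmontools; the function name alert_rows and the default set of ignored IDs are hypothetical. It assumes the attribute table layout shown in the smartctl output posted later in this thread, where WHEN_FAILED is the ninth whitespace-separated column.

```python
# Sample report text, copied from the smartctl output posted in this thread.
SAMPLE = """Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
"""


def alert_rows(report, ignored_ids=frozenset({190, 194})):
    """Return the attribute rows that should still trigger an alert.

    Rows whose ID is in ignored_ids (a hypothetical setting) are dropped
    only when WHEN_FAILED is "In_the_past"; a current failure is kept.
    """
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the banner line and the column header.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, when_failed = int(fields[0]), fields[8]
        if attr_id in ignored_ids and when_failed == "In_the_past":
            continue
        rows.append(line)
    return rows


if __name__ == "__main__":
    print(alert_rows(SAMPLE))                       # historical failure is ignored
    print(alert_rows(SAMPLE, ignored_ids=set()))    # nothing ignored, row is reported
```

The check is deliberately narrow: an ignored attribute that is failing right now would still be reported, which is the behaviour described above as desirable.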
 

cPanelKenneth

cPanel Development
Staff member
Apr 7, 2006
4,608
77
458
cPanel Access Level
Root Administrator
/scripts/smartcheck uses the smartctl binary to perform the checks, which, at least on CentOS 4, does not use a configuration file.

We provide a custom configuration file via /var/cpanel/smartcheck_custom_dash_d.yam but it is only used for the -d parameter.

I'll file a report with the developers to investigate extending this to either a full-fledged configuration file or some other method of providing custom runtime options to smartctl.
 

acenetryan

Thanks, cpanelkenneth. We already went digging through the smartctl flags to see if there was a way to suppress certain errors and just add them to /scripts/smartcheck's binary call directly. The smartctl binary itself doesn't use a config file, but when you set up the smartd service, you can use /etc/smartd.conf to control which drives are automatically checked, and the config file has a set of its own flags which you can use to prune the alerts.

Relevant Documentation Snippet from /etc/smartd.conf:

Code:
# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE
#   -d TYPE Set the device type to one of: ata, scsi
#   -T TYPE set the tolerance to one of: normal, permissive
#   -o VAL  Enable/disable automatic offline tests (on/off)
#   -S VAL  Enable/disable attribute autosave (on/off)
#   -H      Monitor SMART Health Status, report if failed
#   -l TYPE Monitor SMART log.  Type is one of: error, selftest
#   -f      Monitor for failure of any 'Usage' Attributes
#   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
#   -M TYPE Modify email warning behavior (see man page)
#   -p      Report changes in 'Prefailure' Normalized Attributes
#   -u      Report changes in 'Usage' Normalized Attributes
#   -t      Equivalent to -p and -u Directives
#   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
#   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
#   -i ID   Ignore Attribute ID for -f Directive
#   -I ID   Ignore Attribute ID for -p, -u or -t Directive
#   -v N,ST Modifies labeling of Attribute N (see man page)
#   -a      Default: equivalent to -H -f -t -l error -l selftest
#   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
#   -P TYPE Drive-specific presets: use, ignore, show, showall
#    #      Comment: text after a hash sign is ignored
#    \      Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# All but -d, -m and -M Directives are only implemented for ATA devices
Would be nice to have these options available directly within cPanel. Thanks for checking with the devs. Look forward to hearing details.
 

cPDan

cPanel Staff
Staff member
Mar 9, 2004
721
13
243
Hello,

This is essentially what is getting those errors:

smartctl -q errorsonly -H -l selftest -l error

Can you confirm 2 things for me?

1) that command shows the errors you are concerned with

2) adding these args to that command:

-i 190 -I 190

results in hiding the 'attribute 190' error

Thanks!
 

acenetryan

Hello cpdan,

1) Yup:

Code:
[[email protected] ~]# smartctl -q errorsonly -H -l selftest -l error -d ata /dev/sda
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
2) Nope:

Code:
=======> UNRECOGNIZED OPTION: I

Use smartctl -h to get a usage summary
The options I posted in my last post are smartd.conf directives. The -i flag is recognized by smartctl, but it doesn't have the same meaning: within smartd.conf, -i ID means to ignore failures of that attribute for the -f directive, whereas passing -i to the binary means to print info about the device.

Thanks for the response. Let me know if there's anything else I can do to help!
 

cPDan

The script only executes smartctl, so smartd items will not do any good.

Is the error present in both of these commands?

smartctl -q errorsonly -H -l error -d ata /dev/sda

smartctl -q errorsonly -H -l selftest -d ata /dev/sda

I'm looking to add support for custom flags (in addition to the -d support it already has).

However, if there is no way to get smartctl to silence unwanted errors, you should consider using smartd to monitor your drives instead of/along with /scripts/smartcheck.

You can disable the /scripts/smartcheck check by doing:
`touch /var/cpanel/disablesmartcheck`
and re-enable it at any time by removing that file.
 

cPDan

acenetryan said:
Yup, the error's present in both. Yeah, we'll probably go back to using smartd until cPanel's script is able to include some configurable options. Thanks for the help.
If you could configure flags to get passed to smartctl, what flags would silence the error?
 

acenetryan

Within the smartd.conf file, you can set them as such (SATA drive):

Code:
/dev/sda -d sat -S on -o on -a -I 194 -I 190 -s (S/../.././(01|05|09|13|17|21)|L/../../5/08) -m [email protected]
This sets up the drive to run a short self-test every 4 hours and a long self-test once each Friday at 08:00. The -I directives say to ignore attributes 194 and 190 when tracking changes, and -m is the contact email address to send warnings to.
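As an illustration of how the -s schedule in that line is evaluated, here is a minimal Python sketch. It assumes, going by my reading of the smartd.conf manpage, that smartd builds a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) and matches the -s regular expression against it. The function name due_tests is hypothetical, and Python's re module stands in for the regex engine smartd actually uses.

```python
import re
from datetime import datetime

# The -s argument from the smartd.conf line above.
SCHEDULE = r"(S/../.././(01|05|09|13|17|21)|L/../../5/08)"


def due_tests(when, schedule=SCHEDULE, test_types="LSCO"):
    """Return the test type letters whose T/MM/DD/d/HH string matches."""
    due = []
    for test_type in test_types:
        # isoweekday() gives 1 = Monday ... 7 = Sunday, the same numbering
        # assumed here for the "d" field.
        candidate = f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"
        if re.fullmatch(schedule, candidate):
            due.append(test_type)
    return due


if __name__ == "__main__":
    friday = datetime(2008, 8, 22)               # a Friday
    print(due_tests(friday.replace(hour=8)))     # ['L']  long test, Friday 08:00
    print(due_tests(friday.replace(hour=9)))     # ['S']  short test, 09:00 slot
    print(due_tests(friday.replace(hour=10)))    # []     nothing scheduled
```

Stepping a datetime through a full week this way is a quick sanity check that a schedule fires when intended before putting it into /etc/smartd.conf.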

Full Manpage:

http://smartmontools.sourceforge.net/man/smartd.conf.5.html
 

acenetryan

Sorry, just reread your question. smartctl wouldn't receive the flags directly. I believe the smartd service does some post-test parsing of the results using the flags you set in smartd.conf to decide what is/isn't worth notifying the contact about.
 

cPDan

acenetryan said:
Sorry, just reread your question. smartctl wouldn't receive the flags directly. I believe the smartd service does some post-test parsing of the results using the flags you set in smartd.conf to decide what is/isn't worth notifying the contact about.
Correct, all /scripts/smartcheck uses is smartctl; it does not do anything with smartd.

It's just a simple check-and-report, for anything more complicated it recommends that "you should consider using smartd to monitor your drives instead of/along with /scripts/smartcheck."

There are instructions on how to [dis/reen]able the check if you wish.

There is a link to smartmontools in that email also.
 

cPDan

Hello again,

Is the output you posted previously from:

smartctl -q errorsonly -H -l selftest -l error -d ata /dev/sda

all of the output? (i.e. the next line was the prompt again?)

Also just FYI

acenetryan said:
cPanel appears to have recently corrected the auto-detection for SATA drives
Probably someone configured the -d flag for the drive in question so that smartctl now reports on it; /scripts/smartcheck itself hasn't really changed any time recently.
 

cPDan

Also, does the attribute error go away without -H:

smartctl -q errorsonly -l selftest -l error -d ata /dev/sda

thanks!
 

acenetryan

Yup, that's all the output:

Code:
[[email protected]##### ~]# smartctl -q errorsonly -H -l selftest -l error -d ata /dev/sda
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
The error is not present when I omit the -H flag. Now that I'm thoroughly testing smartd, I don't think the -I flag is what actually suppresses error reporting on the WHEN_FAILED value. According to the smartd manpage, the -H option for smartd reports only when the prefailure attributes are less than or equal to their THRESH values:

smartd manpage said:
If any Prefailure Attributes are less than or equal to their threshold values, the disk failure is predicted in less than 24 hours, and a message at loglevel 'LOG_CRITICAL' will be logged to syslog
So I think using smartd, period, is enough to suppress reports generated by the "WHEN_FAILED" value being In_the_past. I interpret the smartd manpage as saying smartd doesn't report on the WHEN_FAILED value at all; it checks the current Attribute values for failures. The -I 190 -I 194 flags further reduce the frivolous emails by ignoring small changes in ambient temperature which would normally be reported using the -t flag. Reports that would be generated by these values exceeding their THRESH would still be sent, since -I does not ignore that attribute for -f failures.
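The distinction can be sketched in a few lines of Python. This is only my reading, not something confirmed in this thread: it assumes smartctl fills in WHEN_FAILED by comparing VALUE and WORST against THRESH, and that a "current failure" check looks at VALUE alone. Both function names are hypothetical.

```python
def when_failed(value, worst, thresh):
    """Assumed derivation of the WHEN_FAILED column from the normalized values."""
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


def is_failing_now(value, thresh):
    """A check on the current value only, ignoring the historical worst."""
    return value <= thresh


if __name__ == "__main__":
    # Attribute 190 from the output above: VALUE 059, WORST 044, THRESH 045.
    print(when_failed(59, 44, 45))   # In_the_past
    print(is_failing_now(59, 45))    # False
```

Under these assumptions the drive above is flagged only because WORST once dipped to 44, which is at or below the threshold of 45, while its current VALUE of 59 would not count as a failure.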
 

cPDan

Yes, smartd is much more configurable than smartctl.

smartctl is an enable/disable, run test, dump test log util.

smartd is much more customizable, and, well "smart" no pun intended :)

so I'd say configure and use smartd how you want and disable /scripts/smartcheck like the email outlines
 

blargman

Well-Known Member
Verified Vendor
Sep 11, 2007
99
0
56
If you're running CentOS 5.x, the latest smartmontools release works on many more SATA drives without the need for the -d option. I think this is the cause of the new messages. The flag wasn't being added by smartcheck before like it should have been.
 

isputra

Well-Known Member
May 3, 2003
575
0
166
Mbelitar
Hi,

Is it normal for my hard drives to run at a temperature above 40?

Code:
                S.M.A.R.T Errors on /dev/sda
                From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sda
                Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   056   044   045    Old_age   Always   In_the_past 44 (0 2 46 39)
                ----END /dev/sda--
                
                S.M.A.R.T Errors on /dev/sdb
                From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sdb
                Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   060   043   045    Old_age   Always   In_the_past 40
                ----END /dev/sdb--