SMART Check and Attribute 190

Discussion in 'General Discussion' started by acenetryan, Apr 10, 2009.

  1. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Joined:
    Aug 21, 2005
    Messages:
    197
    Likes Received:
    1
    Trophy Points:
    18
    cPanel appears to have recently corrected the auto-detection for SATA drives. When this correction went out, we got a flood of support requests asking about SMART emails being received. Almost all of the requests were regarding errors that occurred in the past.

    AndyReed posted details about old errors in a post here:

    http://forums.cpanel.net/showthread.php?t=53613#4

    But we're also noticing that reports are being generated because attribute 190 has an "In_the_past" value for WHEN_FAILED. This attribute is Airflow_Temperature_Cel, and the value indicates that the drive's temperature exceeded a threshold at some point. This is certainly good information to have if your drive is running hot, but after you install a fan and get it running cool again, the WHEN_FAILED value will continue to trip SMART email alerts.

    I've already contacted Seagate and have confirmed that it is impossible to reset the attributes of the drive.

    The smartd SMART daemon uses /etc/smartd.conf to control which options and drives should be monitored with SMART checks. Within this file, there is a -I option which allows you to ignore certain attributes in error reports. cPanel's smartcheck script does not use this file, so we can't omit attributes from the error alerts. Is there a sanctioned method of ignoring the "In_the_past" value for a given attribute in /scripts/smartcheck?

    If not, we'll probably be disabling /scripts/smartcheck across all of our servers and continuing to use the smartd service to monitor drives instead. It might be a prudent move to have cPanel simply set up /etc/smartd.conf and allow the SMART daemon to do the actual monitoring. You could even add smartd to tailwatchd for good measure.
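
    For illustration only, here is a minimal sketch of what ignoring "In_the_past" for a chosen attribute could look like if the smartctl output were post-filtered before any alert is sent. This is not how /scripts/smartcheck works and not a sanctioned cPanel option; the function name drop_past_failures and the ignore_ids parameter are hypothetical, and the row layout is assumed from the smartctl output quoted in this thread.

```python
import re

# Matches one row of smartctl's attribute table, e.g.
# "190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)"
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.*)$"
)


def drop_past_failures(report, ignore_ids=(190,)):
    """Return the attribute rows of `report` that should still raise an alert.

    Hypothetical helper: rows whose WHEN_FAILED is In_the_past are dropped,
    but only for the attribute IDs listed in `ignore_ids`.
    """
    kept = []
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # banner or column-header line, not an attribute row
        past_only = m.group("when_failed") == "In_the_past"
        if past_only and int(m.group("id")) in ignore_ids:
            continue  # historical failure on an attribute we chose to ignore
        kept.append(line.strip())
    return kept


SAMPLE = """\
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
"""

if __name__ == "__main__":
    print(drop_past_failures(SAMPLE))
```

    With the sample row above, nothing is left to report. A row marked FAILING_NOW, or an In_the_past row for an attribute not listed in ignore_ids, would still be kept.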
     
  2. cPanelKenneth

    cPanelKenneth cPanel Development
    Staff Member

    Joined:
    Apr 7, 2006
    Messages:
    4,458
    Likes Received:
    22
    Trophy Points:
    38
    cPanel Access Level:
    Root Administrator
    /scripts/smartcheck uses the smartctl binary to perform the checks, which, at least on CentOS 4, does not use a configuration file.

    We provide a custom configuration file via /var/cpanel/smartcheck_custom_dash_d.yam but it is only used for the -d parameter.

    I'll file a report with the developers to investigate extending this to either a full-fledged configuration file or some other method of providing custom runtime options to smartctl.
     
  3. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Thanks, cpanelkenneth. We already went digging through the smartctl flags to see if there was a way to suppress certain errors and just add them to /scripts/smartcheck's binary call directly. The smartctl binary itself doesn't use a config file, but when you set up the smartd service, you can use /etc/smartd.conf to control which drives are automatically checked, and the config file has its own set of flags which you can use to prune the alerts.

    Relevant Documentation Snippet from /etc/smartd.conf:

    Code:
    # HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE
    #   -d TYPE Set the device type to one of: ata, scsi
    #   -T TYPE set the tolerance to one of: normal, permissive
    #   -o VAL  Enable/disable automatic offline tests (on/off)
    #   -S VAL  Enable/disable attribute autosave (on/off)
    #   -H      Monitor SMART Health Status, report if failed
    #   -l TYPE Monitor SMART log.  Type is one of: error, selftest
    #   -f      Monitor for failure of any 'Usage' Attributes
    #   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
    #   -M TYPE Modify email warning behavior (see man page)
    #   -p      Report changes in 'Prefailure' Normalized Attributes
    #   -u      Report changes in 'Usage' Normalized Attributes
    #   -t      Equivalent to -p and -u Directives
    #   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
    #   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
    #   -i ID   Ignore Attribute ID for -f Directive
    #   -I ID   Ignore Attribute ID for -p, -u or -t Directive
    #   -v N,ST Modifies labeling of Attribute N (see man page)
    #   -a      Default: equivalent to -H -f -t -l error -l selftest
    #   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
    #   -P TYPE Drive-specific presets: use, ignore, show, showall
    #    #      Comment: text after a hash sign is ignored
    #    \      Line continuation character
    # Attribute ID is a decimal integer 1 <= ID <= 255
    # All but -d, -m and -M Directives are only implemented for ATA devices
    
    Would be nice to have these options available directly within cPanel. Thanks for checking with the devs. Look forward to hearing details.
     
  4. cPDan

    cPDan cPanel Staff
    Staff Member

    Joined:
    Mar 9, 2004
    Messages:
    711
    Likes Received:
    3
    Trophy Points:
    18
    Hello,

    This is essentially what is getting those errors:

    smartctl -q errorsonly -H -l selftest -l error

    Can you confirm 2 things for me?

    1) that command shows the errors you are concerned with

    2) adding these args to that command:

    -i 190 -I 190

    results in hiding the 'attribute 190' error

    Thanks!
     
  5. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Hello cpdan,

    1) Yup:

    Code:
    [root@colorado ~]# smartctl -q errorsonly -H -l selftest -l error -d ata /dev/sda
    Please note the following marginal Attributes:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
    
    2) Nope:

    Code:
    =======> UNRECOGNIZED OPTION: I
    
    Use smartctl -h to get a usage summary
    
    The options I posted in my last post are smartd.conf directives, not smartctl flags. smartctl does recognize -i, but with a different meaning: in smartd.conf, -i ID ignores that attribute for the -f directive, whereas passing -i to the binary prints information about the device.

    Thanks for the response. Let me know if there's anything else I can do to help!
     
  6. cPDan

    cPDan cPanel Staff
    Staff Member

    The script only executes smartctl, so smartd items won't do any good.

    Is the error present in both of these commands?

    smartctl -q errorsonly -H -l error -d ata /dev/sda

    smartctl -q errorsonly -H -l selftest -d ata /dev/sda

    I'm looking to add support for custom flags (in addition to the -d support it already has)

    However, if there is no way to get smartctl to silence unwanted errors, you should consider using smartd to monitor your drives instead of/along with /scripts/smartcheck.

    You can disable the /scripts/smartcheck check by doing:
    `touch /var/cpanel/disablesmartcheck`
    and re-enable it at any time by removing that file.
     
    #6 cPDan, Apr 10, 2009
    Last edited: Apr 10, 2009
  7. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Yup, the error's present in both. Yeah, we'll probably go back to using smartd until cPanel's script is able to include some configurable options. Thanks for the help.
     
  8. cPDan

    cPDan cPanel Staff
    Staff Member

    If you could configure flags to be passed to smartctl, which flags would silence the error?
     
  9. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Within the smartd.conf file, you can set them as such (SATA drive):

    Code:
    /dev/sda -d sat -S on -o on -a -I 194 -I 190 -s (S/../.././(01|05|09|13|17|21)|L/../../5/08) -m mycontact@mydomain.com
    
    This sets up the drive to run a short self-test every 4 hours and a long self-test each Friday at 08:00. The -I directives say to ignore attributes 194 and 190. -m is the contact email to send to.

    Full Manpage:

    http://smartmontools.sourceforge.net/man/smartd.conf.5.html
     
  10. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Sorry, just reread your question. smartctl wouldn't receive the flags directly. I believe the smartd service does some post-test parsing of the results using the flags you set in smartd.conf to decide what is/isn't worth notifying the contact about.
     
  11. cPDan

    cPDan cPanel Staff
    Staff Member

    Correct, all /scripts/smartcheck uses is smartctl; it does not do anything with smartd.

    It's just a simple check-and-report script. For anything more complicated, the alert email recommends that "you should consider using smartd to monitor your drives instead of/along with /scripts/smartcheck."

    There are instructions on how to [dis/reen]able the check if you wish.

    There is a link to smartmontools in that email also.
     
  12. cPDan

    cPDan cPanel Staff
    Staff Member

    Hello again,

    Is the output you posted previously from:

    smartctl -q errorsonly -H -l selftest -l error -d ata /dev/sda

    all of the output? (i.e. the next line was the prompt again?)

    Also just FYI

    Someone probably configured the -d flag for the drive in question so that smartctl now reports on it, since /scripts/smartcheck hasn't really changed recently.
     
  13. cPDan

    cPDan cPanel Staff
    Staff Member

    Also, does the attribute error go away without -H:

    smartctl -q errorsonly -l selftest -l error -d ata /dev/sda

    thanks!
     
  14. acenetryan

    acenetryan Well-Known Member
    PartnerNOC

    Yup, that's all the output:

    Code:
    [root@##### ~]# smartctl -q errorsonly -H -l selftest -l error -d ata /dev/sda
    Please note the following marginal Attributes:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
    
    The error is not present when I omit the -H flag. Now that I'm thoroughly testing smartd, I don't think the -I flag is actually what suppresses reports on the WHEN_FAILED value. According to the smartd manpage, the -H option for smartd reports only when the prefailure attributes are less than or equal to their THRESH values.

    So I think using smartd, period, is enough to suppress reports generated by the "WHEN_FAILED" value being In_the_past. I interpret the smartd manpage as saying smartd doesn't report on the WHEN_FAILED value at all; it checks the current Attribute values for failures. The -I 190 -I 194 flags further reduce the frivolous emails by ignoring small changes in ambient temperature, which would normally be reported using the -t flag. Reports that would be generated by these values exceeding their THRESH would still be sent, since -I does not ignore that attribute for -f failures.
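
    To make the distinction concrete, here is a small sketch of how the WHEN_FAILED column appears to relate to the normalized VALUE, WORST and THRESH columns. It is an assumption based on the output in this thread and on the threshold rule quoted from the manpage, not a copy of smartctl's code, and the function name when_failed is hypothetical.

```python
def when_failed(value, worst, thresh):
    """Classify a normalized attribute the way the WHEN_FAILED column reads."""
    if value <= thresh:
        return "FAILING_NOW"   # current value is at or below the threshold
    if worst <= thresh:
        return "In_the_past"   # fine now, but the worst recorded value crossed it
    return "-"                 # has never been at or below the threshold


if __name__ == "__main__":
    # VALUE, WORST, THRESH for attribute 190 in the output above
    print(when_failed(59, 44, 45))
```

    For the drive above, VALUE 059 is above THRESH 045 while WORST 044 is not, so the attribute reads In_the_past even though the current value is healthy. A check that only compares the current value against the threshold would stay quiet here.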
     
    #14 acenetryan, Apr 15, 2009
    Last edited: Apr 15, 2009
  15. cPDan

    cPDan cPanel Staff
    Staff Member

    yes, smartd is much more configurable than smartctl

    smartctl is an enable/disable, run test, dump test log util.

    smartd is much more customizable, and, well "smart" no pun intended :)

    so I'd say configure and use smartd how you want and disable /scripts/smartcheck like the email outlines
     
  16. blargman

    blargman Well-Known Member

    Joined:
    Sep 11, 2007
    Messages:
    99
    Likes Received:
    0
    Trophy Points:
    6
    If you're running CentOS 5.x, the latest smartmontools release works with many more SATA drives without needing the -d option. I think this is the cause of the new messages. The flag wasn't being added by smartcheck before like it should have been.
     
  17. isputra

    isputra Well-Known Member

    Joined:
    May 3, 2003
    Messages:
    576
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Mbelitar
    Hi,

    Is it normal for my hard drive's temperature to be above 40?

    Code:
                    S.M.A.R.T Errors on /dev/sda
                    From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sda
                    Please note the following marginal Attributes:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    190 Airflow_Temperature_Cel 0x0022   056   044   045    Old_age   Always   In_the_past 44 (0 2 46 39)
                    ----END /dev/sda--
                    
                    S.M.A.R.T Errors on /dev/sdb
                    From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sdb
                    Please note the following marginal Attributes:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    190 Airflow_Temperature_Cel 0x0022   060   043   045    Old_age   Always   In_the_past 40
                    ----END /dev/sdb--
     
  18. rsong

    rsong Member

    Joined:
    Sep 5, 2007
    Messages:
    12
    Likes Received:
    0
    Trophy Points:
    1
    If I understand correctly, that alert indicates an old warning from the past.
     