
cPanelResources

Tutorial Troubleshooting high server loads on Linux servers

Information on how to diagnose high load averages.

    Technical support analysts often receive tickets about high server loads. High server loads are very rarely caused by defects in the cPanel software or the applications it installs, so they should initially be investigated by the server owner, their system administrator, or their server provider.

    What causes high server loads?

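    On Linux, the load average counts processes that are running, waiting for a CPU, or blocked in uninterruptible sleep (usually waiting on disk I/O). Excessive usage of any of the following resources can therefore drive it up: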

    • CPU
    • memory (including swap)
    • disk I/O

    How can I check these items?

    That depends on whether you want to review their current or historical resource usage. This tutorial covers both.

    A brief lesson on "sar"

    Historical resource usage can be viewed with the "sar" utility, which is part of the sysstat package and should be installed by default on all cPanel servers. The statistics are collected when sysstat runs from cron (/etc/cron.d/sysstat), so if crond is not running, sysstat cannot collect historical statistics.

    To view resource usage history from sar for a previous day, you must provide the path to the data file for that date.

    For example, if you wanted to view the load averages for your server from the 23rd of the month, you would run this command:

    Code:
    [user@host ~]$ sar -q -f /var/log/sa/sa23
    The command above uses '-q' to report the run queue and load averages, and '-f' to specify the sar data file to read. Note that sar may not keep historical data for more than a week or so.
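    The data files are named after the two-digit day of the month (sa01 through sa31), so single-digit days need a leading zero. A minimal helper for building the path, assuming the RHEL/CentOS location /var/log/sa (Debian and Ubuntu use /var/log/sysstat instead); the function name sa_file is hypothetical:

```shell
# Hypothetical helper: print the sar data file for a given day of the month.
# Assumes the RHEL/CentOS layout /var/log/sa/saDD.
sa_file() {
    # 10# forces base 10, so zero-padded days such as "08" (as printed by
    # `date +%d`) are not rejected as invalid octal numbers.
    printf '/var/log/sa/sa%02d\n' "$((10#$1))"
}

# Example (GNU date): sar -q -f "$(sa_file "$(date -d yesterday +%d)")"
```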

    You do not need to specify a file when viewing the statistics for the current day, so this command shows today's load averages:

    Code:
    [user@host ~]$ sar -q
    You are strongly encouraged to read the documentation for sar:

    Code:
    [user@host ~]$ man sar
    sar can report statistics on many other parts of the system that are helpful to know about.


    Current CPU usage

    Run "top", and on the line that contains "Cpu(s)", check the idle value ("%id" in older versions of top, "id" in newer ones), which shows the percentage of time your CPUs are idle. The higher the number, the better: a CPU that is 99% idle is doing very little, while a CPU that is 1% idle is heavily tasked.

    Code:
    [user@host ~]$ top c
    Tip: hit "P" to sort processes by their current CPU usage.
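    The idle percentage that top reports is derived from the CPU time counters in /proc/stat. As a rough sketch (assuming a Linux /proc filesystem; the function name cpu_idle_percent is hypothetical), you can sample those counters twice and compare the idle time with the total time that elapsed between the samples:

```shell
# Hypothetical helper: approximate top's CPU idle percentage from /proc/stat.
# The first line of /proc/stat holds cumulative time counters for all CPUs:
#   cpu  user nice system idle iowait irq softirq steal guest guest_nice
cpu_idle_percent() {
    local interval=${1:-1}
    local label u1 n1 s1 i1 w1 q1 sq1 st1 rest
    local u2 n2 s2 i2 w2 q2 sq2 st2
    read -r label u1 n1 s1 i1 w1 q1 sq1 st1 rest < /proc/stat
    sleep "$interval"
    read -r label u2 n2 s2 i2 w2 q2 sq2 st2 rest < /proc/stat

    # guest time is already included in user time, so it stays in "rest".
    # Fields missing on older kernels are empty and count as 0 in arithmetic.
    # iowait (w) counts as non-idle here, as top reports it separately ("wa").
    local total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2 + st2) - (u1 + n1 + s1 + i1 + w1 + q1 + sq1 + st1) ))
    local idle=$(( i2 - i1 ))
    [ "$total" -gt 0 ] || return 1    # no ticks elapsed; cannot compute
    echo $(( 100 * idle / total ))
}

# Example: cpu_idle_percent 5    # sample over five seconds
```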

    Historical CPU usage

    Check the "%idle" column:

    Code:
    [user@host ~]$ sar -u
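    sar's output can be long. If you only want the intervals when the CPUs were busy, a small awk filter can help. This sketch assumes %idle is the last column (as it is in the default 'sar -u' report) and uses 20% idle as the cutoff; both the function name low_idle_rows and the cutoff are illustrative:

```shell
# Hypothetical helper: print sar -u rows whose %idle (last column) is below a threshold.
# Header, blank, and restart lines are skipped because their last field is not
# a number; "Average:" rows are skipped explicitly.
low_idle_rows() {
    local threshold=${1:-20}
    awk -v t="$threshold" '
        $1 == "Average:" { next }
        $NF ~ /^[0-9]+([.,][0-9]+)?$/ && $NF + 0 < t + 0'
}

# Example: sar -u -f /var/log/sa/sa23 | low_idle_rows 20
```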
    Current memory usage

    Code:
    [user@host ~]$ free -m
    Tip: run "top c" and hit "M" to see which processes are consuming the most memory.

    Historical memory usage

    Which option to use depends on your version of sar. Older versions show both %memused and %swpused (swap memory used) with '-r'; newer versions moved %swpused to '-S'.

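    On older versions, check "%memused" and "%swpused" in:

    Code:
    [user@host ~]$ sar -r
    On newer versions, check "%memused" in the first command and "%swpused" in the second:

    Code:
    [user@host ~]$ sar -r
    Code:
    [user@host ~]$ sar -S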

    A note about memory usage: it is normal to see much of the server's memory being used. Why? Because the OS loves to cache things in memory. Why? Because accessing data from memory is extremely fast and far more efficient than using the server's disk(s).

    As such, %memused isn't generally going to be much of an issue (unless perhaps you don't have a swap partition, but that's an issue in and of itself). You should focus on %swpused, which rises when your server's physical memory is full. The lower the number, the better. A %swpused of 0% means that your server currently has sufficient physical memory to perform its tasks.

    How much %swpused is too much? That depends on your opinion of "too much". Generally speaking, a consistent low percentage of swap usage may not be an issue on your server. If you observe the %swpused increasing over time (e.g., from 1%, to 7%, to 32%), something on your server is consuming too much memory, and it would be wise to determine what that is (rather than just installing more memory). If your server ends up using all of its physical memory and swap memory, it may become unresponsive, requiring a reboot.
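    Because the page cache inflates %memused, it can help to look at memory used excluding buffers and cache alongside swap usage. A minimal sketch that reads /proc/meminfo; the function name mem_report is hypothetical, and subtracting Buffers and Cached is the same approximation older versions of free show on their "-/+ buffers/cache" line:

```shell
# Hypothetical helper: memory used excluding buffers/cache, plus swap usage.
# Takes an optional path so it can be pointed at a saved copy of meminfo.
mem_report() {
    local meminfo=${1:-/proc/meminfo}
    # Patterns are anchored, so /^Cached:/ does not match the SwapCached: line.
    awk '
        /^MemTotal:/  { total = $2 }
        /^MemFree:/   { free = $2 }
        /^Buffers:/   { buffers = $2 }
        /^Cached:/    { cached = $2 }
        /^SwapTotal:/ { swap_total = $2 }
        /^SwapFree:/  { swap_free = $2 }
        END {
            used = total - free - buffers - cached
            swap_used = swap_total - swap_free
            pct = (swap_total > 0) ? int(100 * swap_used / swap_total) : 0
            printf "mem_used_kB=%d swap_used_kB=%d swap_used_pct=%d\n", used, swap_used, pct
        }' "$meminfo"
}

# Example: mem_report
```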

    Current disk I/O usage

    Note: this does not work on OpenVZ/Virtuozzo containers.

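    This prints extended disk statistics 10 times, once per second. Check the %util column: values near 100% mean the device was busy for almost the entire interval. The first report shows averages since boot, so focus on the reports that follow it.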

    Code:
    [user@host ~]$ iostat -x 1 10
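    When a server has many disks, it can be easier to print only the busy ones. This sketch finds the %util column by its header name, since the column layout differs between sysstat versions; the function name busy_devices and the 80% cutoff are illustrative assumptions:

```shell
# Hypothetical helper: print "device %util" for rows of `iostat -x` output
# whose %util is above a threshold.
busy_devices() {
    local threshold=${1:-80}
    awk -v t="$threshold" '
        /^avg-cpu/ { col = 0; next }
        /^Device/  { col = 0; for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        col && NF >= col && $col + 0 > t + 0 { print $1, $col }'
}

# Example: iostat -x 1 10 | busy_devices 80
```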
    Historical disk I/O usage

    Check the %util column ('-p' shows device names such as sda instead of dev8-0):

    Code:
    [user@host ~]$ sar -d -p

    Good system administration involves knowing when your server's load is higher than acceptable. The main reason for this (other than preventing your server from becoming unresponsive and requiring a reboot) is to see what's taking place on the server while the load is high. Acting quickly lets you troubleshoot the issue while it is occurring.

    If your server's load was high from 2 AM to 4 AM while you were sleeping, you would have missed what took place. While sar can show you which resources were heavily used during that time, it won't tell you the cause of the high usage. There can be many causes, including DoS attacks, spam attacks, poorly designed PHP scripts that consume large amounts of memory, web spiders that crawl sites too aggressively, hardware issues, heavy disk writes to a user's MySQL database, and much, much more.

    The good news is that you can have much of this information collected and sent to you automatically while the load is high, which you can review later as needed. How? From your process list:

    Code:
    [user@host ~]$ ps auxwwwf
    I have created a shell script for this, based on a Perl script that I used to run on servers I managed. It was very useful to me in conjunction with other server monitoring (such as Nagios). It checks 6 different things (more on this below), and emails you the current process list if any of them exceeds the threshold you set.

    This script is not developed, maintained, or supported by cPanel, Inc. Please do not open tickets about this script. If you experience any issues using it and require assistance, you can post a reply here, or consult an experienced system administrator. cPanel cannot provide support for this script.


    The resources that are checked are as follows:

    • 1 minute load average
    • kilobytes of swap used
    • kilobytes of memory usage
    • packets per second inbound
    • packets per second outbound
    • number of processes


    How to use the script

    To run the script automatically, set up a cron job that executes it as often as you'd like. I found every 5 minutes to be a good fit. The script does not need to be run as root, so do not run it as root.
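    For example, a crontab entry for a regular (non-root) user that runs the script every 5 minutes might look like this. The path /home/user/bin/check_load.sh is a hypothetical location; use wherever you saved the script, make it executable with chmod +x first, and add the entry with 'crontab -e' as that user. Discarding the output keeps cron from mailing you the script's status lines on every run; the alert itself is still sent by the script's own mail command.

```
# m    h  dom mon dow  command
*/5    *  *   *   *    /home/user/bin/check_load.sh > /dev/null 2>&1
```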

    If one of the resources has exceeded its user-defined threshold, the script will send you an email that contains the current process list (ps auxwwwf).

    The subject line of the email will look something like this:

    Code:
    server.example.com [L: 35] [P: 237] [Swap Use: 1% ] [pps in: 54  pps out: 289]
    Each of those items is explained as follows:

    • L - the 1 minute load average
    • P - the number of processes in the process list
    • Swap Use - the percentage of swap memory being used
    • pps in - packets per second inbound
    • pps out - packets per second outbound

    Before you use the script

    IMPORTANT: You will need to adjust the values to your liking. There are no perfect default values. Why? Because different server environments are, well, different. For example, you would probably set a higher 1 minute load average threshold on a server with 16 CPU cores than on a server with just 1.
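    As a rough starting point, a load average equal to the number of CPU cores roughly means every core is busy, so the core count is a reasonable first guess for MAX_LOAD that you can then tune. This rule of thumb is an assumption, not a recommendation from the script's author; the function name suggest_max_load is hypothetical:

```shell
# Hypothetical helper: count the logical CPUs listed in /proc/cpuinfo.
# Takes an optional path so it can be pointed at a saved copy of cpuinfo.
suggest_max_load() {
    local cpuinfo=${1:-/proc/cpuinfo}
    grep -c '^processor' "$cpuinfo"
}

# Example: MAX_LOAD=`suggest_max_load`
```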

    NOTE: You will need to add your email address to the "EMAIL" variable. For example:

    Code:
    EMAIL="you@example.com"
    You would also likely want to adjust the following 6 items:

    • MAX_LOAD
    • MAX_SWAP_USED
    • MAX_MEM_USED
    • MAX_PPS_IN
    • MAX_PPS_OUT
    • MAX_PROCS


    Code:
    #!/bin/bash
    # bash (rather than plain sh) is required: the script uses bash features such as [[ ]] and let.
    
    export PATH=/bin:/usr/bin
    
    ##########################################################################
    #                                                                        #
    #  Copyright Jeff Petersen, 2009 - 2013                                  #
    #                                                                        #
    #  This program is free software: you can redistribute it and/or modify  #
    #  it under the terms of the GNU General Public License as published by  #
    #  the Free Software Foundation, either version 3 of the License, or     #
    #  (at your option) any later version.                                   #
    #                                                                        #
    #  This program is distributed in the hope that it will be useful,       #
    #  but WITHOUT ANY WARRANTY; without even the implied warranty of        #
    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         #
    #  GNU General Public License for more details.                          #
    #                                                                        #
    #  You should have received a copy of the GNU General Public License     #
    #  along with this program.  If not, see <http://www.gnu.org/licenses/>. #
    #                                                                        #
    ##########################################################################
    
    
    ###############################################################################
    # START USER CONFIGURABLE VARIABLES
    ###############################################################################
    
    EMAIL="you@example.com"
    
    # 1 minute load avg
    MAX_LOAD=3
    
    # kB
    MAX_SWAP_USED=1000
    
    # kB
    MAX_MEM_USED=500000
    
    # packets per second inbound
    MAX_PPS_IN=2000
    
    # packets per second outbound
    MAX_PPS_OUT=2000
    
    # max processes in the process list
    MAX_PROCS=400
    
    ###############################################################################
    # END USER CONFIGURABLE VARIABLES
    ###############################################################################
    
    
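    # cPanel stores the main network interface in the ETHDEV line of /etc/wwwacct.conf.
    IFACE=`grep ETHDEV /etc/wwwacct.conf | awk '{print $2}'`
    # On OpenVZ/Virtuozzo containers, read the counters of the venet0 interface.
    if [[ "$IFACE" =~ "venet" ]] ; then
        IFACE=venet0
    fi
    
    # Append the colon so grep only matches this interface's line in /proc/net/dev.
    IFACE=${IFACE}: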
    
    ###############################################################################
    # 1 min load avg
    ###############################################################################
    # Integer part only, so with MAX_LOAD=3 an alert is sent once the load reaches 4.00.
    ONE_MIN_LOADAVG=`cut -d . -f 1 /proc/loadavg`
    echo "1 minute load avg: $ONE_MIN_LOADAVG"
    
    
    ###############################################################################
    # swap used
    ###############################################################################
    SWAP_TOTAL=`grep ^SwapTotal: /proc/meminfo | awk '{print $2}'`
    SWAP_FREE=`grep ^SwapFree: /proc/meminfo | awk '{print $2}'`
    
    let "SWAP_USED = (SWAP_TOTAL - SWAP_FREE)"
    echo "Swap used: $SWAP_USED kB"
    
    
    ###############################################################################
    # mem used
    ###############################################################################
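    MEM_TOTAL=`grep ^MemTotal: /proc/meminfo | awk '{print $2}'`
    MEM_FREE=`grep ^MemFree: /proc/meminfo | awk '{print $2}'`
    # Buffers and page cache are mostly reclaimable, so they are not counted as used.
    MEM_BUFFERS=`grep ^Buffers: /proc/meminfo | awk '{print $2}'`
    MEM_CACHED=`grep ^Cached: /proc/meminfo | awk '{print $2}'`
    
    let "MEM_USED = (MEM_TOTAL - MEM_FREE - MEM_BUFFERS - MEM_CACHED)"
    echo "Mem used (excluding buffers/cache): $MEM_USED kB"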
    
    
    ###############################################################################
    # packets received
    ###############################################################################
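    # /proc/net/dev fields after the interface name: rx bytes, rx packets, ..., tx bytes, tx packets, ...
    # tr turns the colon into a space, because some older kernels print the byte
    # counter directly after it (e.g. "eth0:123456789"). Field 3 is then received
    # packets; field 2 would be received bytes.
    PACKETS_RX_1=`grep $IFACE /proc/net/dev | tr ':' ' ' | awk '{print $3}'`
    sleep 2;
    PACKETS_RX_2=`grep $IFACE /proc/net/dev | tr ':' ' ' | awk '{print $3}'`
    
    let "PACKETS_RX = (PACKETS_RX_2 - PACKETS_RX_1) / 2"
    echo "packets received per second (2 sec sample): $PACKETS_RX"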
    
    
    ###############################################################################
    # packets sent
    ###############################################################################
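    # With the colon turned into a space, field 11 of /proc/net/dev is transmitted
    # packets; field 10 would be transmitted bytes.
    PACKETS_TX_1=`grep $IFACE /proc/net/dev | tr ':' ' ' | awk '{print $11}'`
    sleep 2;
    PACKETS_TX_2=`grep $IFACE /proc/net/dev | tr ':' ' ' | awk '{print $11}'`
    
    let "PACKETS_TX = (PACKETS_TX_2 - PACKETS_TX_1) / 2"
    echo "packets sent per second (2 sec sample): $PACKETS_TX"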
    
    # Integer percentage of swap in use (SWAP_USED is computed above); avoids bc
    # and also reports 100% correctly.
    if [ "$SWAP_TOTAL" -gt 0 ] ; then
        let "SWAP_PERCENT = SWAP_USED * 100 / SWAP_TOTAL"
        TOTAL_PERCENTAGE="${SWAP_PERCENT}%"
    else
        TOTAL_PERCENTAGE='0%'
    fi
    
    
    ###############################################################################
    # number of processes
    ###############################################################################
    # --no-headers keeps the ps header line out of the count.
    MAX_PROCS_CHECK=`ps ax --no-headers | wc -l`
    
    send_alert()
    {
        SUBJECTLINE="`hostname` [L: $ONE_MIN_LOADAVG] [P: $MAX_PROCS_CHECK] [Swap Use: $TOTAL_PERCENTAGE ] [pps in: $PACKETS_RX  pps out: $PACKETS_TX]"
        ps auxwwwf | mail -s "$SUBJECTLINE" $EMAIL
        exit
    }
    
    
    if   [ $ONE_MIN_LOADAVG -gt $MAX_LOAD      ] ; then send_alert
    elif [ $SWAP_USED       -gt $MAX_SWAP_USED ] ; then send_alert
    elif [ $MEM_USED        -gt $MAX_MEM_USED  ] ; then send_alert
    elif [ $PACKETS_RX      -gt $MAX_PPS_IN    ] ; then send_alert
    elif [ $PACKETS_TX      -gt $MAX_PPS_OUT   ] ; then send_alert
    elif [ $MAX_PROCS_CHECK -gt $MAX_PROCS ] ; then send_alert
    fi
    

    Note that the process list output contains several useful columns that pertain to CPU and memory usage for each process:

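    • %CPU
    • %MEM
    • VSZ (virtual memory size, in KiB)
    • RSS (resident set size, the physical memory in use, in KiB)
    • TIME (the total CPU time the process has used; START shows when it was started)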
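    Rather than scanning the whole list by eye, you can have ps sort it by one of these columns. A sketch using the procps --sort option (available in the ps shipped with most Linux distributions); the function name top_procs is hypothetical:

```shell
# Hypothetical helper: show the processes using the most CPU or memory.
# $1 is a ps sort key such as %cpu, %mem, or rss; $2 is how many rows to show.
top_procs() {
    local key=${1:-%cpu}
    local count=${2:-10}
    # The leading "-" sorts in descending order; +1 keeps the header line.
    ps aux --sort=-"$key" | head -n "$(( count + 1 ))"
}

# Example: top_procs %mem 5
```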

    There are various actions you can take to find the cause of your high server loads. Here is a partial list:

    • Check the MySQL process list using "mysqladmin processlist" (or just "mysqladmin pr" for short)
    • Check the MySQL process list using mytop
    • tail your logs! Listening to what your server says is very important. Is your server being brute forced?
    • Run dmesg and check for possible hardware issues
    • Use netstat to view the connections to your server
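    When checking connections with netstat, it often helps to count them per remote address so that a single aggressive client stands out. A sketch that reads 'netstat -ntu' output on standard input, assuming the usual Linux net-tools layout where the foreign address is the fifth column; the function name count_conns_by_ip is hypothetical:

```shell
# Hypothetical helper: count connections per remote address from `netstat -ntu`.
count_conns_by_ip() {
    awk '$1 ~ /^(tcp|udp)/ {
            addr = $5
            sub(/:[^:]*$/, "", addr)     # drop the trailing :port (IPv4 and IPv6)
            n[addr]++
         }
         END { for (a in n) print n[a], a }' | sort -rn
}

# Example: netstat -ntu | count_conns_by_ip | head
```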

    Here are some logs to check:

    • syslogs: /var/log/messages, /var/log/secure
    • SMTP logs: /var/log/exim_mainlog, /var/log/exim_rejectlog, /var/log/exim_paniclog
    • POP3/IMAP logs: /var/log/maillog
    • Apache logs: /usr/local/apache/logs/access_log, /usr/local/apache/logs/error_log, /usr/local/apache/logs/suexec_log, /usr/local/apache/logs/suphp_log
    • Website logs: /usr/local/apache/domlogs/ (to find sites with traffic in the last 60 seconds, run this from within that directory: find . -maxdepth 1 -type f -mmin -1 | grep -Ev 'offset|_log$')
    • cron logs: /var/log/cron
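    To check whether the server is being brute forced over SSH, you can count failed password attempts per source address in /var/log/secure. A sketch assuming OpenSSH's usual "Failed password for USER from IP port N" message format; the function name failed_ssh_logins is hypothetical:

```shell
# Hypothetical helper: count failed SSH password attempts per source IP.
# Reads /var/log/secure by default (reading it usually requires root).
failed_ssh_logins() {
    local log=${1:-/var/log/secure}
    awk '/sshd\[[0-9]+\]: Failed password/ {
            # The address follows the word "from" in OpenSSH messages.
            for (i = 1; i < NF; i++) if ($i == "from") { n[$(i + 1)]++; break }
         }
         END { for (ip in n) print n[ip], ip }' "$log" | sort -rn
}

# Example: failed_ssh_logins | head
```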

    Please feel free to post questions, comments, and anything else about troubleshooting server loads by clicking on the Discussion tab. This resource will inevitably be missing some other useful troubleshooting items, and your comments are encouraged.

    I hope you find this useful.

    Thanks!