The Community Forums

High Server Load Troubleshooting

Discussion in 'General Discussion' started by Major Headache, Mar 22, 2007.

  1. Major Headache

    Joined:
    Dec 7, 2005
    Messages:
    11
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    California
    I have a cPanel server through The Planet. Here are the specs:

    1 Intel \ 2.0 GHz 1333FSB - Woodcrest \ Xeon 5130 (Dual Core)
    2 Generic \ 1024 MB \ DDR2 667 FB DIMM
    1 Dell \ 9G Drive Controller - SAS/SATA \ SAS 5/i
    1 Western Digital \ 250GB:SATA2:7200RPM \ WD2500JS

    It's running the following software: Red Hat Enterprise 3/Apache 1.3.37/MySQL 4.1.21/PHP 4.4.5.

    I got it on the first of the year as an upgrade replacement for a lesser Celeron server, also with 2 gigs of RAM.

    Almost immediately I got the cPanel service from ConfigServer. 95% of the time, it ticks like a Swiss watch -- load on "top" is below 2, sites are responsive, life is good. But too often the load spikes up, usually still under 10, but sometimes, like tonight, I'll see a load over 100!

    I'm running the same sites as I was on the old box, which was basically handling things much better. On the old server I wasn't able to run any logging software or run the backups, so I thought with this upgrade I'd be able to do that. I have all that off and still have the problems.

    Of my sites, only one gets serious traffic, and it's a CMS with a fat database. I thought I improved things quite a bit when I made a robots.txt file that kept all bots out, but it's still happening periodically.

    I asked The Planet to look into it, and they told me that they installed SAR, I believe in order to monitor it and gather info, but then they dropped it. I have no idea what SAR was supposed to do, if anything.

    My main question is: how can I troubleshoot this problem and nail down the cause(s)? I'm reluctant to simply blame it on traffic because of the way the load suddenly spikes, then suddenly dissipates. Looking at top during an "event" never shows me any one thing that is causing major load. Sometimes MySQL will be high during that time, but it will usually be high when the server load is normal too. The wait state (iowait) is always high when the load is high, but I need to know what caused that to happen in the first place.
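
One way to catch a spike while it is happening, rather than after the fact, is to sample the load average on a timer and note the moments it crosses a threshold. The following is only a minimal Python sketch. It assumes the usual Linux `/proc/loadavg` layout (three load averages, then runnable/total tasks, then the last PID); the names `parse_loadavg` and `is_spike` and the threshold of 10 are made up for illustration.

```python
from typing import NamedTuple


class LoadSample(NamedTuple):
    one_min: float
    five_min: float
    fifteen_min: float
    runnable: int   # scheduling entities currently runnable
    total: int      # scheduling entities that exist


def parse_loadavg(text: str) -> LoadSample:
    """Parse one line in /proc/loadavg format, e.g. '0.52 0.60 0.71 2/345 12345'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return LoadSample(float(fields[0]), float(fields[1]), float(fields[2]),
                      int(runnable), int(total))


def is_spike(sample: LoadSample, threshold: float = 10.0) -> bool:
    """True when the 1-minute load average exceeds the (hypothetical) threshold."""
    return sample.one_min > threshold


# Hypothetical sample line, shaped like the "load over 100" event described above.
sample = parse_loadavg("104.52 60.10 20.71 3/345 12345")
print(sample.one_min, is_spike(sample))
```

On the server itself, the text would come from `open("/proc/loadavg").read()`, run every minute or so, with a timestamp and a process listing saved whenever `is_spike` returns True.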

    I sure hope someone can give me some insight into this.

    TIA
     
  2. neo_surya

    neo_surya Registered

    Joined:
    Feb 15, 2007
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    1
    Hi

    I am also having the same kind of situation. It would be great if someone could suggest a solution.
     
  3. xerophyte

    xerophyte Well-Known Member

    Joined:
    Mar 16, 2003
    Messages:
    216
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Canada
  4. ramprage

    ramprage Well-Known Member

    Joined:
    Jul 21, 2002
    Messages:
    667
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Canada
    Did you review your configuration files, such as my.cnf and httpd.conf?
    When the load jumps, what does the top command show?

    Are you using mod_security at all?
     
  5. Major Headache

    Please forgive the lengthy post, I'm just trying to be as complete as I can be. None of this makes any sense to me, I'm just hoping it does to someone else reading this.

    I have taken a look at the folder where the sar reports are going. Last night I had an event I first noticed at 12:53 AM. I grabbed the various reports from right before and after, although the really big report with "sar" in the title isn't written until 8 tonight.

    First top screenshot: [image]
    Second top screenshot: [image]

    This is top during a normal/medium load, but the server is still responsive: [image]

    iostat at 12:50

    Code:
    Linux 2.6.9-42.0.3.ELsmp (server.dfsites.net) 	03/22/07
    
    avg-cpu:  %user   %nice    %sys %iowait   %idle
              13.30    1.91    4.27    4.23   76.30
    
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda              32.44       252.73       133.94 2924599525 1549885058
    sda1              0.00         0.00         0.00       5178      12642
    sda2              6.53        19.20        48.47  222123123  560933920
    sda3             30.87       144.96       162.59 1677479708 1881501944
    sda4              0.00         0.00         0.00          2          0
    sda5             21.92        78.79       147.81  911799155 1710472704
    sda6             15.08         5.10       119.58   59041846 1383706592
    sda7              2.80         1.30        21.56   15063594  249491488
    sda8              1.06         3.38         5.07   39061873   58701032
    iostat at 12:55

    Code:
    Linux 2.6.9-42.0.3.ELsmp (server.dfsites.net) 	03/22/07
    
    avg-cpu:  %user   %nice    %sys %iowait   %idle
              13.30    1.91    4.27    4.23   76.30
    
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sda              32.44       252.73       133.95 2924629853 1550075130
    sda1              0.00         0.00         0.00       5178      12642
    sda2              6.53        19.19        48.47  222123275  560951400
    sda3             30.87       144.96       162.60 1677499716 1881580216
    sda4              0.00         0.00         0.00          2          0
    sda5             21.92        78.79       147.81  911807635 1710515768
    sda6             15.08         5.10       119.58   59041846 1383747808
    sda7              2.80         1.30        21.56   15063650  249501528
    sda8              1.06         3.38         5.07   39063505   58701032
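
The two reports above show identical rates because iostat, when run without an interval, reports averages since boot. The cumulative Blk_read and Blk_wrtn columns do differ, so the activity in the window between the two samples can be recovered by subtracting them. The Python sketch below does that for the sda rows. It assumes the reports were taken exactly 300 seconds apart, and the function names are illustrative only.

```python
def parse_iostat_totals(report: str) -> dict:
    """Return {device: (blocks_read, blocks_written)} from iostat's device table."""
    totals = {}
    in_table = False
    for line in report.splitlines():
        fields = line.split()
        if fields and fields[0] == "Device:":
            in_table = True
            continue
        if in_table and len(fields) == 6:
            # Columns: device, tps, Blk_read/s, Blk_wrtn/s, Blk_read, Blk_wrtn
            totals[fields[0]] = (int(fields[4]), int(fields[5]))
    return totals


def interval_rates(before: dict, after: dict, seconds: float) -> dict:
    """Blocks read and written per second between two samples, per device."""
    return {
        dev: ((after[dev][0] - before[dev][0]) / seconds,
              (after[dev][1] - before[dev][1]) / seconds)
        for dev in before if dev in after
    }


# The sda rows copied from the 12:50 and 12:55 reports above.
AT_1250 = """Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              32.44       252.73       133.94 2924599525 1549885058"""
AT_1255 = """Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              32.44       252.73       133.95 2924629853 1550075130"""

rates = interval_rates(parse_iostat_totals(AT_1250),
                       parse_iostat_totals(AT_1255), seconds=300)
read_rate, write_rate = rates["sda"]
print(f"sda: {read_rate:.1f} blocks read/s, {write_rate:.1f} blocks written/s")
```

Under that 300-second assumption, sda comes out at roughly 101 blocks read and 634 blocks written per second during the window, against since-boot averages of 252.73 and 133.94. Running `iostat` with an interval argument gives per-interval figures directly.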
    
    free at 12:50

    Code:
                 total       used       free     shared    buffers     cached
    Mem:          2026       1921        104          0         63        965
    -/+ buffers/cache:        892       1133
    Swap:         1992        542       1449
    free at 12:55

    Code:
                 total       used       free     shared    buffers     cached
    Mem:          2026       1876        149          0         68        928
    -/+ buffers/cache:        879       1146
    Swap:         1992        542       1449
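
To read these numbers: most of the "used" memory is page cache and buffers, which the kernel can reclaim, so what applications actually hold is closer to the "-/+ buffers/cache" figure. The Python sketch below recomputes that from the Mem: row and pulls out the swap usage. It assumes the older `free` layout shown above with values in MB; `parse_free` is a made-up helper name.

```python
def parse_free(report: str) -> dict:
    """Parse the old-style `free` layout (the one with a -/+ buffers/cache line)."""
    result = {}
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            result["mem_total"] = total
            result["mem_used"] = used
            # Buffers and page cache are reclaimable, so subtract them to
            # estimate what applications themselves are holding.
            result["app_used"] = used - buffers - cached
        elif fields[0] == "Swap:":
            result["swap_total"] = int(fields[1])
            result["swap_used"] = int(fields[2])
    return result


# The 12:50 report from above.
FREE_1250 = """             total       used       free     shared    buffers     cached
Mem:          2026       1921        104          0         63        965
-/+ buffers/cache:        892       1133
Swap:         1992        542       1449"""

info = parse_free(FREE_1250)
print(info["app_used"], info["swap_used"])
```

For the 12:50 sample this gives 893 MB held by applications (free itself prints 892 because it rounds from kilobytes) and 542 MB of swap in use. The swap figure is identical in both reports.
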
    ps at 12:50

    ps-200703220350.txt

    ps at 12:55


    ps-200703220355.txt

    For what it's worth, here's a link to the sar output from the previous day. I'm quite sure I had outages that day too, I just can't pinpoint the time. I will post the one relevant to this outage after it appears on my machine.

    sar output

    If there's anything else I can dig up and post, just let me know.

    And yes, I do have mod_security installed (by ConfigServer). I don't know what it does, except to note that some of my forum members end up getting their IP banned on the firewall and mod_security is given as the reason.

    Here is my my.cnf file (without comments). This one was configured by The Planet.

    Code:
    [mysqld]
    safe-show-database
    port
    socket = /var/lib/mysql/mysql.sock
    skip-locking
    key_buffer = 256M
    max_allowed_packet = 1M
    table_cache = 1024
    sort_buffer_size = 3M
    read_buffer_size = 3M
    read_rnd_buffer_size = 4M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 8
    query_cache_size= 256M
    thread_concurrency = 2
    max_connections = 800
    skip-innodb
    log-slow-queries = /var/log/mysql-slow.log
    long_query_time = 5
    
    server-id
    
    [mysqldump]
    quick
    max_allowed_packet = 16M
    
    [mysql]
    no-auto-rehash
    
    [isamchk]
    key_buffer = 128M
    sort_buffer_size = 128M
    read_buffer = 2M
    write_buffer = 2M
    
    [myisamchk]
    key_buffer = 128M
    sort_buffer_size = 128M
    read_buffer = 2M
    write_buffer = 2M
    
    [mysqlhotcopy]
    interactive-timeout
    
    [safe_mysqld]
    err-log=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    open_files_limit=8192
    
    This was my previous my.cnf file, which seems more generous with the memory resources (I do have 2 GB RAM).

    Code:
    [mysqld]
    port
    socket
    skip-locking
    key_buffer = 256M
    max_allowed_packet = 1M
    table_cache = 256
    sort_buffer_size = 1M
    read_buffer_size = 1M
    read_rnd_buffer_size = 4M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 8
    query_cache_size= 16M
    thread_concurrency = 2
    max_connections = 500
    skip-innodb
    # I just looked, this log file is now 65MB!
    log-slow-queries = /var/log/mysql-slow.log
    long_query_time = 5
    
    server-id = 1
    
    [mysqldump]
    quick
    max_allowed_packet = 16M
    
    [mysql]
    no-auto-rehash
    
    [isamchk]
    key_buffer = 128M
    sort_buffer_size = 128M
    read_buffer = 2M
    write_buffer = 2M
    
    [myisamchk]
    key_buffer = 128M
    sort_buffer_size = 128M
    read_buffer = 2M
    write_buffer = 2M
    
    [mysqlhotcopy]
    interactive-timeout
    and my httpd.conf file is here: httpd.conf
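
A common rule of thumb for comparing two my.cnf files like the ones above is to add the global buffers to max_connections times the per-connection buffers. The Python sketch below applies it to both files. It is only a rough ceiling: per-connection buffers are allocated when a query needs them, and thread stacks and other overhead are ignored. The function name is invented for the example and all values are in MB.

```python
def worst_case_mysql_mb(key_buffer, query_cache, sort_buffer, read_buffer,
                        read_rnd_buffer, max_connections):
    """Rule-of-thumb ceiling in MB: global buffers + per-connection buffers."""
    global_mb = key_buffer + query_cache
    per_connection_mb = sort_buffer + read_buffer + read_rnd_buffer
    return global_mb + max_connections * per_connection_mb


# Values taken from the two my.cnf files above.
current = worst_case_mysql_mb(key_buffer=256, query_cache=256, sort_buffer=3,
                              read_buffer=3, read_rnd_buffer=4,
                              max_connections=800)
previous = worst_case_mysql_mb(key_buffer=256, query_cache=16, sort_buffer=1,
                               read_buffer=1, read_rnd_buffer=4,
                               max_connections=500)
print(current, previous)  # 8512 3272
```

By this estimate the current file could in theory ask for about 8.5 GB and the previous one about 3.3 GB, both above the 2 GB installed, so the current file is the more generous of the two. Real usage depends on how many connections are busy at once.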
     
    #5 Major Headache, Mar 22, 2007
    Last edited: Mar 22, 2007
  6. fikse

    fikse Well-Known Member

    Joined:
    May 10, 2003
    Messages:
    112
    Likes Received:
    0
    Trophy Points:
    16
    When the load gets out of control like that, have you tried restarting MySQL and seeing if the load drops for a bit?
     
  7. Major Headache

    Yes, I have, and it usually does. I will usually restart Apache as well. Typically, once the load drops, it stays down for a long time, until the next occurrence.
     
  8. fikse

    fikse Well-Known Member

    Could be runaway MySQL/PHP scripts. I've used PRM in the past to restart services when they start loading up the server, before they get out of control.

    http://rfxnetworks.com/prm.php
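
If runaway queries are the suspect, the slow query log is one place to look, since the my.cnf in post #5 already enables it with long_query_time = 5. Below is a small Python sketch that ranks entries by query time. It assumes the header line format `# Query_time: N  Lock_time: N  Rows_sent: N  Rows_examined: N` used by MySQL 4.x; the sample entries and table names are invented, not taken from the poster's log.

```python
import re

QUERY_TIME = re.compile(
    r"^# Query_time:\s*([\d.]+)\s+Lock_time:\s*([\d.]+)"
    r"\s+Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)")


def slowest_queries(log_text: str, top: int = 5) -> list:
    """Return up to `top` (seconds, rows_examined, statement) tuples, slowest first."""
    entries = []
    current = None
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            current = [float(match.group(1)), int(match.group(4)), ""]
            entries.append(current)
        elif current is not None and not line.startswith("#"):
            # Statement text follows the header; join multi-line statements.
            current[2] = (current[2] + " " + line.strip()).strip()
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [tuple(entry) for entry in entries[:top]]


# Invented sample in the assumed format.
SAMPLE = """# User@Host: cms[cms] @ localhost []
# Query_time: 7  Lock_time: 2  Rows_sent: 20  Rows_examined: 9000
SELECT * FROM sessions ORDER BY last_seen;
# User@Host: cms[cms] @ localhost []
# Query_time: 12  Lock_time: 0  Rows_sent: 1  Rows_examined: 523442
SELECT COUNT(*) FROM posts WHERE body LIKE '%foo%';
"""

for seconds, rows, statement in slowest_queries(SAMPLE):
    print(seconds, rows, statement)
```

Queries that examine far more rows than they return are usually the first candidates for an index or a rewrite.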
     
  9. rkgroups

    rkgroups Member

    Joined:
    Nov 12, 2006
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    I don't know much about MySQL, but you can increase your Apache MaxClients from 150 to 255. Hope that helps a little.
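
Before raising MaxClients, it may help to check how many Apache children actually fit in memory, because each child takes RAM and a box that is already swapping can get worse with more of them. The Python sketch below is a back-of-the-envelope calculation only. The 2026 MB total comes from the free output in post #5, while the 900 MB reserved for everything else and the 15 MB per child are hypothetical placeholders that would need to be measured on the server.

```python
def max_clients_that_fit(total_ram_mb, reserved_mb, per_child_mb):
    """Largest MaxClients whose children fit in the RAM left after `reserved_mb`."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    return max(0, int((total_ram_mb - reserved_mb) // per_child_mb))


# 2026 MB is from the thread; 900 and 15 are assumed values for illustration.
fits = max_clients_that_fit(total_ram_mb=2026, reserved_mb=900, per_child_mb=15)
needed_for_255 = 255 * 15
print(fits, needed_for_255)  # 75 3825
```

With those assumed figures only about 75 children fit without swapping, and 255 children would want roughly 3.8 GB.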
     
  10. freedog96150

    freedog96150 Well-Known Member

    Joined:
    Mar 25, 2005
    Messages:
    68
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Nevada, USA
    Do not know if this will help, but here goes... I was recently experiencing a very high server load that would occasionally crash the server. I was scratching my head, reading logs and trying to find a solution. It turns out that one of my users was running a PHP script that parsed an 80MB file MULTIPLE TIMES each time the script executed. He was running all this through a cron job. He had that cron job set to run */2 * * * * or in other words, every two minutes. Problem was that the script was taking about 180 seconds to run, so the previous job was never quite complete when the SAME file was being loaded back into memory and executed again. Caused quite a problem.

    So check all the cron tasks running for all users and compare against server load. I used the following to make a nice organized list of cron jobs by time:
    Code:
    root@server [~] cat /var/spool/cron/* | sort -ni >> filename.txt
    Then I watched the output of top at times when I saw lots of concurrent activity. For me it was a BINGO and from there it was just a matter of resolving the issue to everyone's satisfaction.
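
The cat-and-sort listing above loses track of which user owns each line. A variation that keeps the owner is sketched below in Python. It takes a mapping of user name to crontab text, which on a cPanel box could be built by reading each file under /var/spool/cron; the function name and sample crontabs are invented, and entries using shortcuts such as @daily are skipped by this simple parser.

```python
def list_cron_entries(crontabs: dict) -> list:
    """Flatten {user: crontab text} into sorted (schedule, user, command) tuples."""
    entries = []
    for user, text in crontabs.items():
        for line in text.splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            fields = line.split(None, 5)
            # Skip environment lines (MAILTO=...) and anything without
            # five schedule fields followed by a command.
            if len(fields) < 6 or "=" in fields[0]:
                continue
            entries.append((" ".join(fields[:5]), user, fields[5]))
    return sorted(entries)


# Invented crontabs for two hypothetical users.
sample = {
    "alice": "MAILTO=alice@example.com\n*/2 * * * * php /home/alice/parse.php\n",
    "bob": "# nightly backup\n0 3 * * * /home/bob/backup.sh\n",
}

for schedule, user, command in list_cron_entries(sample):
    print(schedule, user, command)
```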

    I informed the user that I was modifying the cron task and pushed it out to every 5 minutes. That brought the load down to *almost* normal. Next, I offered to look at the PHP script and found one little teensy-tiny error that was causing the whole script to work harder than it needed to and with a single line of code changed, the script now runs in 4 seconds versus the previous 180 seconds.
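
Another guard against the overlap described above, where a job scheduled every two minutes takes three minutes to finish, is to have the job take an exclusive lock and exit if the previous run still holds it. The Python sketch below shows the idea with `fcntl.flock` on Linux. The lock file name is hypothetical, and the second `try_lock` call in the same process only simulates the next cron run starting early.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open, exclusively locked file, or None if the lock is taken."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


lock_dir = tempfile.mkdtemp()
lock_path = os.path.join(lock_dir, "parse_job.lock")  # hypothetical file name

first = try_lock(lock_path)
second = try_lock(lock_path)   # simulates the next run starting too early
print("first run got lock:", first is not None)
print("overlapping run got lock:", second is not None)
if first:
    first.close()              # closing the file releases the lock
```

A real job would call `try_lock` once at startup, exit immediately if it returns None, and otherwise keep the file open until the work is done.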

    This may not fit your situation, but if you are scratching your head, you need to look at EVERY possibility.
     
  11. jayh38

    jayh38 Well-Known Member

    Joined:
    Mar 3, 2006
    Messages:
    1,215
    Likes Received:
    0
    Trophy Points:
    36
    Well.. I think the immediate fix is a reboot. Your server has been up 134 days
    and is putting a lot of power into the massive 1.8 gig swap file you have going.

    Reboot the system and you will most likely be fine. I would reboot before
    trying to troubleshoot anything as a massive swap like that would send loads
    through the roof even with very light tasks.

    good luck.
     
  12. Major Headache

    I have made some tweaks and things have dramatically improved, but the server is still problematic.

    I found a script on one of my sites that was consuming a lot of memory - fixed that.

    That alone was a huge improvement. Now the problem is primarily with wait state.

    A few times a day, without warning, the load will just ramp up from below 1 to around 20. Then it'll snap and bounce back down.

    I sent The Planet a trouble ticket asking if they could look at it and they sent me these two links:

    http://forums.theplanet.com/lofiversion/index.php/t74480.html

    http://www.daniweb.com/techtalkforums/thread7828.html

    The first, a 3-year-old thread in their forums about a different version of Linux, basically says that if the problem persists, I should submit a trouble ticket. I'm not sure I follow the circular logic of this suggestion. The second link suggests making some changes to my rc.local file, which I dutifully made.

    I queried The Planet again, including a screenshot of top, and they said it was because I was using too much swap.

    Here's the screenshot I sent:

    [image: top screenshot]

    It looks to me like it's using 561,284k of swap, with a load of 20+.

    But normally, this is what top looks like:

    [image: top screenshot]

    and there I see 633,632k being used. That's very typical on my server. Am I missing something? I did reboot the server as was suggested in the last post.

    And what steps can I take to determine the cause of the escalation?
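
One concrete step for the wait state question: on Linux the load average counts processes in uninterruptible sleep (state D, usually waiting on disk) as well as runnable ones, so listing the D-state processes during a spike shows what is stuck on I/O. The Python sketch below filters a process listing for them. It assumes output in the shape produced by `ps -eo stat,pid,args`; the sample rows are invented.

```python
def blocked_processes(ps_output: str) -> list:
    """Return (pid, command) for rows whose state column starts with 'D'."""
    blocked = []
    for line in ps_output.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0].startswith("D"):
            blocked.append((int(fields[1]), fields[2]))
    return blocked


# Invented listing in the assumed `ps -eo stat,pid,args` shape.
SAMPLE = """STAT   PID COMMAND
Ss       1 init [3]
D     2231 /usr/sbin/mysqld --basedir=/
S     2302 /usr/local/apache/bin/httpd -DSSL
D+    2410 tar czf /backup/site.tar.gz /home
"""

for pid, command in blocked_processes(SAMPLE):
    print(pid, command)
```

Saving such a listing every minute, together with the load average, makes it possible to look back at a spike and see which processes were waiting on the disk at the time.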
     
    #12 Major Headache, Apr 3, 2007
    Last edited: Apr 3, 2007
  13. mctDarren

    mctDarren Well-Known Member

    Joined:
    Jan 6, 2004
    Messages:
    664
    Likes Received:
    2
    Trophy Points:
    18
    Location:
    New Jersey
    cPanel Access Level:
    Root Administrator
    Check your MailWatch database. Is it being written to successfully? Does it need to be purged or maybe just optimized? It sure looks to me like the combination of MailScanner/MailWatch with your database-heavy site might be overtaxing MySQL. P.S. Is anyone running X-Cart on the box?
     
    #13 mctDarren, Apr 3, 2007
    Last edited: Apr 3, 2007
  14. Major Headache

    I am very sorry to appear dense, but would you mind elaborating on these topics? They seem reasonable enough, I just don't know how to go about checking them. Hopefully I'm not a total noob and will be able to check these things with a minimum of guidance.

    Thanks!
     