The Community Forums

Interact with an entire community of cPanel & WHM users!

Sudden Serious Load

Discussion in 'General Discussion' started by Secret Agent, Mar 10, 2006.

  1. Secret Agent

    Secret Agent Guest

    Specs are:

    Dual Xeon 2.8GHz
    4GB Memory
    100MBit Port
    300GB SCSI drives (21% full)
    5GB /tmp partition (15% full)

    PHP 4.4.2 with Zend Optimizer and eAccelerator
    MySQL 4.1
    Apache 1.3.x

    About 128 IPs on this server (resellers)

    Problem:
    Over the past few days, the server load has suddenly been floating around 10-12; it usually averages only 2-3. The top output shows nothing abnormal (see screenshots), and /tmp shows nothing unusual.

    Ran rkhunter and chkrootkit and did a graceful reboot: nothing suspicious.

    CPU, memory, and MySQL stats are typical; nothing out of the ordinary.

    Installed:
    APF
    BFD
    LSM
    LES
    SIM
    MOD DOSINFLATE
    MOD SECURITY
    EACCELERATOR
    MOD THROTTLE
    ZEND OPTIMIZER
    SECURED TMP AND VAR PARTITIONS

    RUBY RAILS
    FAST CGI
    EXIM ACL DICTIONARY ATTACK


    /etc/my.cnf

    #DO NOT MODIFY THE FOLLOWING COMMENTED LINES!
    #Created with ELS from www.nsonetworks.com
    #els-build=4.1
    [mysqld]
    datadir=/var/lib/mysql
    skip-locking
    #skip-networking
    safe-show-database
    query_cache_limit=1M
    query_cache_size=128M ## 32MB for every 1GB of RAM
    query_cache_type=1
    max_user_connections=200
    max_connections=500
    interactive_timeout=10
    wait_timeout=20
    connect_timeout=20
    thread_cache_size=128
    key_buffer=256M ## 64MB for every 1GB of RAM
    join_buffer=1M
    max_connect_errors=20
    max_allowed_packet=16M
    table_cache=1024
    record_buffer=1M
    sort_buffer_size=4M ## 1MB for every 1GB of RAM
    read_buffer_size=4M ## 1MB for every 1GB of RAM
    read_rnd_buffer_size=4M ## 1MB for every 1GB of RAM
    thread_concurrency=8 ## Number of CPUs x 2
    myisam_sort_buffer_size=64M
    server-id=1
    log_slow_queries=/var/log/mysql-slow-queries.log
    long_query_time=2
    collation-server=latin1_general_ci
    old-passwords

    [mysql.server]
    user=mysql
    basedir=/var/lib

    [safe_mysqld]
    err-log=/var/log/mysqld.log
    pid-file=/var/lib/mysql/mysql.pid
    open_files_limit=8192

    [mysqldump]
    quick
    max_allowed_packet=16M

    [mysql]
    no-auto-rehash
    #safe-updates

    [isamchk]
    key_buffer=256M
    sort_buffer=64M
    read_buffer=16M
    write_buffer=16M

    [myisamchk]
    key_buffer=256M
    sort_buffer=64M
    read_buffer=16M
    write_buffer=16M

    [mysqlhotcopy]
    interactive-timeout
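The `##` comments in this my.cnf scale several buffers with installed RAM. This minimal sketch restates those ratios as arithmetic so they can be recomputed for another box. The function name is hypothetical. Treating this server as having 4 logical CPUs (two Hyper-Threaded Xeons) is an assumption, but it is the count that reproduces `thread_concurrency=8` under the file's "CPUs x 2" rule.

```python
"""Recompute the per-GB rules of thumb written in the ELS my.cnf comments."""


def els_rule_of_thumb(ram_gb: int, logical_cpus: int) -> dict:
    """Suggested my.cnf values from the file's own ratios (hypothetical helper).

    Buffer sizes are returned in megabytes:
      query_cache_size            32MB per 1GB of RAM
      key_buffer                  64MB per 1GB of RAM
      sort/read/read_rnd buffers   1MB per 1GB of RAM
      thread_concurrency          number of CPUs x 2
    """
    return {
        "query_cache_size_mb": 32 * ram_gb,
        "key_buffer_mb": 64 * ram_gb,
        "sort_buffer_size_mb": 1 * ram_gb,
        "read_buffer_size_mb": 1 * ram_gb,
        "read_rnd_buffer_size_mb": 1 * ram_gb,
        "thread_concurrency": 2 * logical_cpus,
    }


if __name__ == "__main__":
    # Assumption: dual Hyper-Threaded Xeons show up as 4 logical CPUs.
    for name, value in els_rule_of_thumb(ram_gb=4, logical_cpus=4).items():
        print(f"{name} = {value}")
```

With 4 GB and 4 logical CPUs, the sketch reproduces the 128M / 256M / 4M / 8 values in the file. These ratios are the config generator's rules of thumb, not MySQL defaults.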

    httpd.conf file

    Timeout 300
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15
    MinSpareServers 5
    MaxSpareServers 10
    StartServers 5
    MaxClients 300
    MaxRequestsPerChild 0

    php.ini file (resource limits)

    max_execution_time = 30
    memory_limit = 8M
    post_max_size = 55M
     


  2. Secret Agent

    Secret Agent Guest

    I'd like to add that after the reboot, I shut down httpd and the server load dropped to a normal, minimal level. Once I restarted it, the load shot back up.

    MaxClients was set to 150 the entire time; I changed it to 300 this morning, and nothing changed.
     
  3. dalem

    dalem Well-Known Member
    PartnerNOC

    Joined:
    Oct 24, 2003
    Messages:
    2,577
    Likes Received:
    40
    Trophy Points:
    48
    Location:
    SLC
    cPanel Access Level:
    DataCenter Provider

    Well, you answered your own question: you need to find the Apache process that's loading the server.
     
  4. Secret Agent

    Secret Agent Guest

    If I had answered my own question and knew where to find the process, I would not have posted this, correct?
     
  5. dalem

    dalem Well-Known Member

    We can't find it from the forums.

    strace -p <pid>
    lsof -p <pid>
    ps auxf


    I would start with the obvious, the Apache process that's been running for over 30 hours as seen in your top output, and then move on to all of your zombie processes.
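Here is a minimal Python sketch of that first step: list processes by elapsed time and flag any httpd that has been up longer than a threshold. It parses `ps -eo pid,etime,comm`, whose etime column uses the `[[DD-]HH:]MM:SS` format. The function names and the 30-hour threshold are illustrative assumptions.

```python
"""Flag long-running processes from `ps -eo pid,etime,comm` output."""

import subprocess


def etime_to_seconds(etime: str) -> int:
    """Convert ps's elapsed-time format [[DD-]HH:]MM:SS to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad a missing hours field
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def long_running(ps_text: str, name: str, min_hours: float):
    """Return [(pid, hours_up)] for processes named `name` older than min_hours."""
    hits = []
    for line in ps_text.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, etime, comm = fields
        if comm.strip() != name:
            continue
        try:
            pid_num = int(pid)
            hours = etime_to_seconds(etime) / 3600
        except ValueError:                     # unexpected ps format
            continue
        if hours >= min_hours:
            hits.append((pid_num, round(hours, 1)))
    return hits


if __name__ == "__main__":
    try:
        out = subprocess.run(["ps", "-eo", "pid,etime,comm"],
                             capture_output=True, text=True).stdout
    except OSError:                            # ps not available
        out = ""
    for pid, hours in long_running(out, "httpd", 30):
        print(f"httpd pid {pid} has been up for {hours} hours")
```

Any PID this reports can then be handed to `strace -p` and `lsof -p` as suggested above.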
     
    #5 dalem, Mar 10, 2006
    Last edited: Mar 10, 2006
  6. Secret Agent

    Secret Agent Guest

    lsof -p 5009

    gave me a list of the domlogs for all domains.

    Example:

    Code:
    httpd   5009 nobody  703w   REG     8,3         0  3119609 /usr/local/apache/domlogs/clientdomains.com-bytes_log
    httpd   5009 nobody  704w   REG     8,3         0  3118763 /usr/local/apache/domlogs/clientdomains.com-bytes_log
    httpd   5009 nobody  705w   REG     8,3      1965  3119384 /usr/local/apache/domlogs/clientdomains.clientdomains.net-bytes_log
    httpd   5009 nobody  706w   REG     8,3       179  3119557 /usr/local/apache/domlogs/clientdomains.com-bytes_log
    httpd   5009 nobody  707w   REG     8,3         0  3119610 /usr/local/apache/domlogs/beta.clientdomains.co.uk-bytes_log
    httpd   5009 nobody  708w   REG     8,3       206  3117986 /usr/local/apache/domlogs/clientdomains.com-bytes_log
    httpd   5009 nobody  709w   REG     8,3    470926  3119743 /usr/local/apache/domlogs/clientdomains.net-bytes_log
    httpd   5009 nobody  710w   REG     8,3       287  3119621 /usr/local/apache/domlogs/clientdomains.com-bytes_log
    
    
    I attached the ps auxf results.

    The strace output was nearly endless and mentioned numerous domains, not a single one. If you need me to attach the entire file, I will.
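On Linux, `lsof -p` gets this information from /proc/<pid>/fd, where each entry is a symlink to an open file. A long list of domlogs is probably expected here. As far as I know, Apache opens every virtual host's log in the parent process before forking, so each child inherits all of them. This sketch reads the fd directory directly and summarises open files by directory. The helper names are hypothetical, and inspecting another user's process (such as nobody's httpd) requires root.

```python
"""List a process's open files from /proc/<pid>/fd (Linux only)."""

import os
from collections import Counter


def open_files(pid: int) -> dict:
    """Map each open file descriptor number to the target of its symlink."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for fd in os.listdir(fd_dir):
        try:
            result[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except OSError:          # fd was closed between listdir() and readlink()
            continue
    return result


def count_by_directory(paths) -> Counter:
    """Count regular paths per parent directory; skips pipe:/socket: entries."""
    return Counter(os.path.dirname(p) for p in paths if p.startswith("/"))


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    files = open_files(pid)
    for directory, count in count_by_directory(files.values()).most_common(5):
        print(f"{count:5d}  {directory}")
```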
     


  7. dalem

    dalem Well-Known Member
    PartnerNOC

    Joined:
    Oct 24, 2003
    Messages:
    2,577
    Likes Received:
    40
    Trophy Points:
    48
    Location:
    SLC
    cPanel Access Level:
    DataCenter Provider

    I'm not inside your box, so it wouldn't do me any good.

    Use top and ps aux to find the resource-hogging PID.

    Use strace to see what script or file it is hitting.

    Start with the domain that's using the most resources and work backwards: shut each site down, kill off the PID, and see if the same resource hog starts up again.

    It's a process of elimination.
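For the top/ps aux step, this sketch ranks `ps aux` rows by %CPU. Note that ps computes %CPU as CPU time divided by the process's lifetime. A long-lived child that was busy hours ago can therefore still rank high, so top's live view is the better check for what is hogging the CPU right now. The `Proc` type and the function names are hypothetical.

```python
"""Rank processes by %CPU from `ps aux` output."""

import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    user: str
    pid: int
    cpu: float
    mem: float
    command: str


def parse_ps_aux(text: str) -> List[Proc]:
    """Parse USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND rows."""
    procs = []
    for line in text.splitlines()[1:]:          # first line is the header
        fields = line.split(None, 10)           # COMMAND may contain spaces
        if len(fields) < 11:
            continue
        try:
            procs.append(Proc(fields[0], int(fields[1]), float(fields[2]),
                              float(fields[3]), fields[10]))
        except ValueError:
            continue
    return procs


def top_cpu(text: str, n: int = 10) -> List[Proc]:
    """Return the n rows with the highest %CPU, highest first."""
    return sorted(parse_ps_aux(text), key=lambda p: p.cpu, reverse=True)[:n]


if __name__ == "__main__":
    try:
        out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    except OSError:
        out = ""
    for p in top_cpu(out):
        print(f"{p.pid:>7} {p.user:<10} {p.cpu:5.1f}% {p.command}")
```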
     
  8. xidica

    xidica Well-Known Member

    Joined:
    Apr 21, 2005
    Messages:
    63
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Texas

    First off, you're using a very standard, untuned Apache configuration, per the following settings:

    Timeout 300
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15
    MinSpareServers 5
    MaxSpareServers 10
    StartServers 5
    MaxClients 300
    MaxRequestsPerChild 0

    I'd recommend changing the following:

    Timeout 50-100
    KeepAliveTimeout 5-10
    MaxRequestsPerChild 1000

    The reduced timeouts will shorten how long child processes stay tied up waiting on slow or idle connections. Setting a finite limit on the maximum requests per child will help contain intentional and unintentional memory leaks by periodically recycling children. This is just a suggestion, of course, and may not be perfectly suited to your deployment.
     
  9. Secret Agent

    Secret Agent Guest

    I made those changes, but it did not help.
     
  10. Secret Agent

    Secret Agent Guest

    I was told that the nslookup processes were suspicious:

    Code:
    nobody   13554  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13559  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13563  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13567  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13571  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13576  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13580  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13585  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   13589  0.0  0.0     0    0 ?        Z    14:48   0:00  |   \_ [nslookup] <defunct>
    nobody   15835  0.0  0.0     0    0 ?        Z    14:53   0:00  |   \_ [nslookup] <defunct>
    nobody   15839  0.0  0.0     0    0 ?        Z    14:53   0:00  |   \_ [nslookup] <defunct>
    nobody   15849  0.0  0.0     0    0 ?        Z    14:53   0:00  |   \_ [nslookup] <defunct>
    nobody   15853  0.0  0.0     0    0 ?        Z    14:53   0:00  |   \_ [nslookup] <defunct>
    nobody   15861  0.0  0.0     0    0 ?        Z    14:53   0:00  |   \_ [nslookup] <defunct>
    nobody   25512  0.0  0.0     0    0 ?        Z    15:21   0:00  |   \_ [nslookup] <defunct>
    nobody   25518  0.0  0.0     0    0 ?        Z    15:21   0:00  |   \_ [nslookup] <defunct>
    nobody   25522  0.0  0.0     0    0 ?        Z    15:21   0:00  |   \_ [nslookup] <defunct>
    nobody   25534  0.0  0.0     0    0 ?        Z    15:21   0:00  |   \_ [nslookup] <defunct>
    nobody   25542  0.0  0.0     0    0 ?        Z    15:21   0:00  |   \_ [nslookup] <defunct>
    nobody   25549  0.0  0.0     0    0 ?        Z    15:21   0:00  |   \_ [nslookup] <defunct>
    How can I trace these and stop them?
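Zombie (`<defunct>`) entries have already exited. They use no CPU and cannot be killed, only reaped by their parent, so the useful question is which parent keeps spawning nslookup without reaping it. This sketch groups zombies by PPID from `ps -eo pid,ppid,stat,comm` output. The function name is hypothetical.

```python
"""Group zombie processes by parent PID from `ps -eo pid,ppid,stat,comm`."""

import subprocess
from collections import defaultdict


def zombies_by_parent(ps_text: str) -> dict:
    """Return {parent_pid: [zombie_pid, ...]} for every process in state Z."""
    groups = defaultdict(list)
    for line in ps_text.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, _comm = fields
        if stat.startswith("Z"):               # STAT may carry flags, e.g. "Z+"
            try:
                groups[int(ppid)].append(int(pid))
            except ValueError:
                continue
    return dict(groups)


if __name__ == "__main__":
    try:
        out = subprocess.run(["ps", "-eo", "pid,ppid,stat,comm"],
                             capture_output=True, text=True).stdout
    except OSError:
        out = ""
    for parent, kids in sorted(zombies_by_parent(out).items()):
        print(f"parent {parent}: {len(kids)} zombie(s) {kids}")
```

Once you have a parent PID, `lsof -p`, `strace -p`, or /proc/<parent>/ on that parent will show what it is running. If the parent exits, init adopts its zombies and reaps them.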
     
  11. richy

    richy Well-Known Member

    Joined:
    Jun 30, 2003
    Messages:
    276
    Likes Received:
    1
    Trophy Points:
    16

    If you enable phpSuExec (and suExec), PHP and Perl scripts will no longer run as "nobody" but as the account's username, which should make them easier to track.

    Looking in /proc/PID should help show you where nslookup is being run from.
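Here is a small Python sketch of that idea. It reads /proc/<pid>/cmdline (the arguments, NUL-separated) plus the cwd and exe symlinks. `describe` is a hypothetical helper. Reading another user's links generally needs root, and a zombie will show an empty cmdline and unreadable links.

```python
"""Peek at /proc/<pid> to see what a process is and where it runs from."""

import os


def describe(pid: int) -> dict:
    base = f"/proc/{pid}"
    info = {"pid": pid}
    # cmdline holds the argv strings separated by NUL bytes
    try:
        with open(f"{base}/cmdline", "rb") as fh:
            raw = fh.read()
        info["cmdline"] = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
    except OSError:
        info["cmdline"] = []
    # cwd and exe are symlinks; unreadable for zombies, and for other
    # users' processes unless you are root
    for link in ("cwd", "exe"):
        try:
            info[link] = os.readlink(f"{base}/{link}")
        except OSError:
            info[link] = None
    return info


if __name__ == "__main__":
    import sys
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for key, value in describe(target).items():
        print(f"{key}: {value}")
```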
     
  12. Secret Agent

    Secret Agent Guest

    The problem with enabling phpSuExec is that it conflicts with customers' scripts. What would be required (what would I tell them to do) to work around this if I enabled it?

    Also, suExec is already enabled.

    I never saw nslookup in top, so I can't find a place to trace it.
     
  13. Secret Agent

    Secret Agent Guest

    See attached.

    More details:

    Code:
    root@server4 [~]# cd /proc/22560
    root@server4 [/proc/22560]# ls
    /bin/ls: cannot read symbolic link cwd: No such file or directory
    /bin/ls: cannot read symbolic link root: No such file or directory
    /bin/ls: cannot read symbolic link exe: No such file or directory
    ./  ../  attr/  auxv  cmdline  cwd@  environ  exe@  fd/  maps  mem  mounts  root@  stat  statm  status  task/  wchan
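These `cannot read symbolic link` errors are likely what you would see if 22560 is one of the defunct nslookup processes: once a process has exited, its cwd, root, and exe can no longer be resolved. Its status file should still name the parent in the PPid field, so a sketch like this can walk up the chain instead. The helper names are hypothetical.

```python
"""Follow PPid links in /proc/<pid>/status from a process up to the top."""

import os


def status_fields(pid: int) -> dict:
    """Parse /proc/<pid>/status ("Key:<TAB>value" lines) into a dict."""
    fields = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def ancestry(pid: int) -> list:
    """Return [(pid, name, state), ...] from pid up to the process whose PPid is 0."""
    chain = []
    while pid > 0:
        try:
            st = status_fields(pid)
        except OSError:            # process exited while we were walking
            break
        chain.append((pid, st.get("Name", "?"), st.get("State", "?")))
        pid = int(st.get("PPid", "0"))
    return chain


if __name__ == "__main__":
    import sys
    start = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for p, name, state in ancestry(start):
        print(f"{p:>7}  {state:<14} {name}")
```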
    
     


  14. Secret Agent

    Secret Agent Guest

    OK, I found this in /tmp:

    -rw------- 1 nobody nobody 128K Mar 16 21:05 bot.tar

    How do I trace where this came from?
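Since bot.tar is owned by nobody, it was most likely written by something running under Apache as nobody (for example, a PHP script). One approach is to correlate its modification time with the access logs. This sketch prints log lines whose `[dd/Mon/yyyy:HH:MM:SS +zzzz]` timestamp falls in a window just before the file's mtime. The 10-minute window, the script name `near.py`, and the function names are assumptions. Requests with wget, curl, or a remote URL in the query string would be the ones to look at first.

```python
"""Find access-log requests logged shortly before a file's modification time."""

import os
import re
from datetime import datetime, timedelta, timezone

# Apache common/combined log timestamp, e.g. [16/Mar/2006:21:05:12 -0500]
TS = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def requests_near(log_lines, when: datetime, window_minutes: int = 10):
    """Yield lines whose timestamp is within window_minutes before `when`."""
    start = when - timedelta(minutes=window_minutes)
    for line in log_lines:
        m = TS.search(line)
        if not m:
            continue                      # e.g. bytes_log lines without a timestamp
        stamp = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if start <= stamp <= when:        # aware datetimes compare across zones
            yield line


if __name__ == "__main__":
    import sys
    if len(sys.argv) >= 3:
        mtime = datetime.fromtimestamp(os.path.getmtime(sys.argv[1]), tz=timezone.utc)
        for log in sys.argv[2:]:
            with open(log, errors="replace") as fh:
                for line in requests_near(fh, mtime):
                    print(f"{log}: {line.rstrip()}")
    else:
        print("usage: near.py FILE LOG [LOG ...]")
```

For example, `python3 near.py /tmp/bot.tar /usr/local/apache/domlogs/*` would scan every domain's log for requests just before the file appeared.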
     