High server loads - no idea about the reason

hmm

Well-Known Member
Jan 11, 2006
56
0
156
India
Hi,
I am using a dual Xeon 2.4 GHz machine with 1 GB of RAM and two 80 GB IDE hard disks.
I have around 400 sites running on the server.

The problem is that for the last few days the server load has been reaching 80+.

When I checked the processes, I could not find anything wrong.

Following is the output of top:

Code:
 03:09:12  up 13 days,  2:02,  1 user,  load average: 86.69, 75.87, 49.11
635 processes: 630 sleeping, 1 running, 3 zombie, 1 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.6%    1.1%    0.9%   0.4%     0.2%   96.5%    0.0%
           cpu00    2.7%    0.9%    3.7%   0.0%     0.0%   92.5%    0.0%
           cpu01    0.0%    2.7%    0.0%   1.8%     0.0%   95.3%    0.0%
           cpu02    0.0%    0.9%    0.0%   0.0%     0.9%   98.1%    0.0%
           cpu03    0.0%    0.0%    0.0%   0.0%     0.0%  100.0%    0.0%
Mem:  1025320k av, 1007264k used,   18056k free,       0k shrd,   18092k buff
                    733688k actv,  137564k in_d,   14792k in_c
Swap: 2040212k av,  761140k used, 1279072k free                   98572k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
30401 root      18   0  1600 1600   884 R     1.6  0.1   0:00   0 top
17882 nobody    19   4 20240  14M  2896 D N   0.4  1.4   0:00   1 httpd
17235 nobody    20   4 17160  11M  2852 D N   0.2  1.1   0:00   1 httpd
17278 nobody    19   4 22024  15M  2940 D N   0.2  1.5   0:01   0 httpd
17389 nobody    19   4 20188  14M  2916 D N   0.2  1.4   0:00   2 httpd
For some reason vBulletin is not allowing me to post the whole output. Following is the link to download it:

http://rapidshare.de/files/11710180/top.txt.html

In the stats, iowait is very high; it touches 100% on the last CPU.

Can anyone help me out with this issue?
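
In the top listing above, the httpd workers are in state `D` (uninterruptible sleep, usually waiting on disk), which fits the high iowait figures. A minimal sketch for listing only such processes, assuming a procps-style `ps` that accepts `-eo`; the function name `d_state_procs` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose first column (the process
# state) starts with "D", i.e. uninterruptible sleep, typically blocked
# on I/O. Expects "STAT PID USER COMMAND" rows on stdin.
d_state_procs() {
    awk '$1 ~ /^D/'
}

# Typical use on a live system (not executed here):
#   ps -eo stat,pid,user,comm | d_state_procs
```

If the same few commands keep showing up in that list during a load spike, they are reasonable first suspects, though this alone does not say which files they are waiting on.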
 

randomuser

Well-Known Member
Jun 25, 2005
146
0
166
Is that the entire ps aux output you put on rapidshare? If so, something is obviously horribly wrong:

635 processes: 630 sleeping, 1 running, 3 zombie, 1 stopped

That's quite a lot of processes. With 1.2G of swap left, I wouldn't say this is a memory issue, although you've only got 18M of physical memory free in that output, which could be a little better. With those iowait states, something is writing to your drive like crazy. I'd be curious to see a few lines of output from "vmstat 1". Try stopping a service, like MySQL, and see if that helps. If that doesn't help, try stopping Apache, and so on. It could be that someone's error_log is getting written to constantly. Have you noticed a significant, sudden decrease in drive space recently? If so, which partition is losing space (/home? /var? etc.)? What I'd really like to see is all 630+ processes in the "ps auxwwwf" format.
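
A rough sketch of how a short "vmstat 1" capture could be boiled down to a few averages. It assumes the usual vmstat column names (`b`, `si`, `so`, `wa`) appear in the header row; the function name `vmstat_summary` is made up for illustration, and the column positions are looked up by name because they differ between vmstat versions:

```shell
#!/bin/sh
# Hypothetical helper: average the blocked-process, swap-in, swap-out and
# iowait columns of vmstat output read from stdin.
vmstat_summary() {
    awk '
        # Until the column-name row is found, try each line as a header.
        !have_cols {
            split("", col)                      # clear any earlier attempt
            for (i = 1; i <= NF; i++) col[$i] = i
            if (("b" in col) && ("si" in col) && ("so" in col) && ("wa" in col))
                have_cols = 1
            next
        }
        # Sample rows start with a number; repeated header rows do not.
        $1 ~ /^[0-9]+$/ {
            n++
            blocked += $(col["b"])
            si += $(col["si"])
            so += $(col["so"])
            wa += $(col["wa"])
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_blocked=%.1f avg_si=%.1f avg_so=%.1f avg_wa=%.1f\n",
                   n, blocked / n, si / n, so / n, wa / n
        }
    '
}

# Typical use on a live system (not executed here):
#   vmstat 1 10 | vmstat_summary
```

Note that the first sample vmstat prints is an average since boot, so it skews a short capture slightly. Sustained non-zero `si`/`so` values would point at swapping rather than plain file writes as the source of the I/O.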
 

chirpy

Well-Known Member
Verified Vendor
Jun 15, 2002
13,437
31
473
Go on, have a guess
Actually I'd be pretty worried about the amount of swap used (761140k used) as well as the ridiculously high process count. Although the latter is probably the cause of the former. Looks like you've got something spawning processes ad infinitum.
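
To see what is behind a process count like that, one option is to tally processes per user and command. This is only a sketch, assuming a procps-style `ps`; the function name `proc_counts` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads "USER COMMAND" rows on stdin and prints the
# most common (user, command) pairs, highest count first.
# Only the first word of the command is kept.
# $1 = how many pairs to show (default 10).
proc_counts() {
    awk '{ print $1, $2 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Typical use on a live system (not executed here):
#   ps -eo user=,comm= | proc_counts 10
```

A single user/command pair with hundreds of entries would support the "something is spawning processes" theory.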
 

hmm

randomuser said:
Is that the entire ps aux output you put on rapidshare? If so, something is obviously horribly wrong:

635 processes: 630 sleeping, 1 running, 3 zombie, 1 stopped

That's quite a lot of processes. With 1.2G of swap left, I wouldn't say this is a memory issue, although you've only got 18M of physical memory free in that output, which could be a little better. With those iowait states, something is writing to your drive like crazy. I'd be curious to see a few lines of output from "vmstat 1". Try stopping a service, like MySQL, and see if that helps. If that doesn't help, try stopping Apache, and so on. It could be that someone's error_log is getting written to constantly. Have you noticed a significant, sudden decrease in drive space recently? If so, which partition is losing space (/home? /var? etc.)? What I'd really like to see is all 630+ processes in the "ps auxwwwf" format.
Actually, like chirpy said, someone might be trying to hack some forum or something, but the problem was that I did not have enough resources to trace it. If I had known the correct command, I would definitely have traced it. Anyway, back to the point.

Currently the server is running fine. I will run the vmstat command once I see the high load again.

About a sudden decrease in drive space: no, everything seems fine, without much change.

If it happens again, I will definitely run both the commands you provided and post the output.

Thanks to both of you for helping me. I will update this thread if the problem occurs again.

By the way, is there anything else I should do when it happens again?

Thanks
Deep
 

hmm

Here we go, the server is loaded again. This time I did not miss the stuff mentioned by randomuser:

top output
vmstat 1 output
ps auxwwwf output (there were too many lines, so it didn't show many lines from the top)

The load went down when I stopped httpd. Does this mean someone is attacking the server or a site hosted on it?

Any solution to this?

Thanks
Deep

Edit: It happens daily around 2:30 PM IST, but I do not see any cron jobs running at that time.
 

mctDarren

Well-Known Member
Jan 6, 2004
662
6
168
New Jersey
cPanel Access Level
Root Administrator
Check your domlogs to see who was requesting what pages during the high load period. Wondering if you have a spammer running a perl or php script and hogging httpd...?
 

hmm

webtiva said:
Check your domlogs to see who was requesting what pages during the high load period. Wondering if you have a spammer running a perl or php script and hogging httpd...?


Edit: Sorry, I misread it.
Actually, checking the domlogs won't be practical here because I have around 400 sites, and checking the logs of each and every site would be next to impossible.

Is there any way to trace it, maybe at the time when the load is high?

Thanks
Deep
 

MMarko

Well-Known Member
Apr 18, 2005
316
0
166
hmm said:
The load went down when I stopped httpd, this means someone is attacking the server / site hosted on server?
Something is wrong within Apache. I had problems with .htaccess. Have you optimized Apache?
 

mctDarren

To find your domain logs, run 'updatedb; locate domlogs'. Your domain logs are often located in /usr/local/apache/domlogs/, but not always. Next, run 'man grep' or google the grep command and learn how it can help you in this situation. Then grep the logs for the times/dates your load was heavy. You should be able to at least see what pages, if any, were being accessed at the time. Hope that helps.

[edit] Aha, you edited your post while I was replying. :) You can still grep the logs to see what was going on, but you need to know how to use grep first. Then you can scan all the logs at once for a certain hour to see who was looking at what.[edit]
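
With hundreds of sites, printing matching lines is too noisy, so one approach is to print only a hit count per log file for the busy hour and look at the top few. A minimal sketch, assuming the logs carry Apache-style timestamps such as `27/Jan/2006:14:` (that example date and the function name `hits_per_log` are placeholders, not taken from the actual logs):

```shell
#!/bin/sh
# Hypothetical helper: for every regular file in a log directory, count
# the lines containing a timestamp prefix, then list the busiest files.
# $1 = log directory, $2 = timestamp prefix, $3 = files to show (default 10).
# Note: plain sh has no local variables, so these names are global.
hits_per_log() {
    dir=$1
    stamp=$2
    top=${3:-10}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # grep -c prints only the number of matching lines; -F treats the
        # prefix as a fixed string rather than a regular expression.
        printf '%s %s\n' "$(grep -c -F -- "$stamp" "$f")" "$f"
    done | sort -rn | head -n "$top"
}

# Typical use on a live system (not executed here):
#   hits_per_log /usr/local/apache/domlogs '27/Jan/2006:14:' 10
```

This still reads every log from start to finish, so on a server that is already I/O bound it is probably better run after the spike has passed, or under `nice`.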
 

hmm

Hi,
I think it will display too many lines if I grep the folder for the time string, because there are around 400 sites.

Is there any other solution for this?

Thanks
Deep
 

hmm

The load went up again.

I restarted httpd and the load went down.

I tried to save the log entries from that time using grep, but it was taking too long and increasing the load, so I had to cancel it partway through.

One thing I noticed: in my mod_security logs I found many attacks on one site during that time (Mambo attacks).

I have taken the site offline for now and asked its owner to fix it.

But now the load is high again, and I can see a pkgacct process run by root. When I go to the process ID folder in /proc, it shows the following:

Code:
dr-xr-xr-x    3 root     root            0 Jan 27 03:49 ./
dr-xr-xr-x  279 root     root            0 Jan 10 20:07 ../
-r--r--r--    1 root     root            0 Jan 27 03:49 cmdline
-r--r--r--    1 root     root            0 Jan 27 03:49 cpu
lrwxrwxrwx    1 root     root            0 Jan 27 03:49 cwd -> /root/
-r--------    1 root     root            0 Jan 27 03:49 environ
lrwxrwxrwx    1 root     root            0 Jan 27 03:49 exe -> /usr/bin/perl*
dr-x------    2 root     root            0 Jan 27 03:49 fd/
-r--------    1 root     root            0 Jan 27 03:49 maps
-rw-------    1 root     root            0 Jan 27 03:49 mem
-r--r--r--    1 root     root            0 Jan 27 03:49 mounts
lrwxrwxrwx    1 root     root            0 Jan 27 03:49 root -> //
-r--r--r--    1 root     root            0 Jan 27 03:49 stat
-r--r--r--    1 root     root            0 Jan 27 03:49 statm
-r--r--r--    1 root     root            0 Jan 27 03:49 status
Do you think this could be anything suspicious? I know pkgacct is used to tar up files, but what puzzles me is that the working directory is /root/ and that a Perl script is doing it.

My Perl version is the latest, i.e. 5.8.7.

Any ideas?

Deep
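
Every entry under /proc reports a size of 0 in a listing like the one above, so the sizes say nothing by themselves. The `exe -> /usr/bin/perl` link only shows that the process is a Perl script; which script it is can be read from the `cmdline` file, whose arguments are separated by NUL bytes. A minimal sketch (the function name `show_cmdline` and the PID in the usage line are placeholders):

```shell
#!/bin/sh
# Hypothetical helper: print a NUL-separated cmdline file on one line,
# with the NUL bytes turned into spaces.
# $1 = path to the cmdline file, e.g. /proc/<pid>/cmdline
show_cmdline() {
    tr '\000' ' ' < "$1"
    echo
}

# Typical use on a live system (12345 is a placeholder PID):
#   show_cmdline /proc/12345/cmdline
```

As far as I know, pkgacct is cPanel's own account-packaging script and is written in Perl, so a perl executable with /root as its working directory is not suspicious in itself; the full command line should show which account it is packaging.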
 

simplestar

Well-Known Member
Nov 15, 2005
97
0
156
Looking at the resource usage is fine, but you really need to check your logs to find out what is happening around the times your server is having problems in order to fix it. I personally have been getting hit quite hard by various no-good exploits, attacks, etc., and that might be the case for you too. Also, are you running any type of rootkit checks?


Code:
/var/log/messages
/usr/local/apache/logs/error_log
 

hmm

Hi,
The server is protected by APF, BFD, rkhunter, and mod_security, and all software is fully up to date.

I will check the logs, because this problem is not stopping at all. I have contacted chirpy's site for paid support but am still waiting for his reply.

Deep
 

hmm

simplestar said:
Looking at the resource usage is fine, but you really need to check your logs to find out what is happening around the times your server is having problems in order to fix it. I personally have been getting hit quite hard by various no-good exploits, attacks, etc., and that might be the case for you too. Also, are you running any type of rootkit checks?


Code:
/var/log/messages
/usr/local/apache/logs/error_log
YOU ROCK...
I had overlooked the error_log file.
I just checked it and found one script generating PHP errors and increasing the load.

I have suspended that account. :)

Thanks a bunch, the problem seems to be fixed now. :)

Deep
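
For anyone hitting the same thing, a quick way to see which script is flooding error_log is to count PHP messages per source file. This is a sketch only: it assumes the messages end in PHP's usual "in /path/to/file.php on line N" form and that paths contain no spaces, and the function name `php_error_sources` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read an Apache error_log on stdin, pull out the
# script path from PHP messages ("... in /path/file.php on line N"),
# and list the paths that occur most often.
# $1 = how many paths to show (default 10).
php_error_sources() {
    sed -n 's|.* in \(/[^ ]*\) on line [0-9][0-9]*.*|\1|p' |
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Typical use on a live system (not executed here):
#   php_error_sources 10 < /usr/local/apache/logs/error_log
```

Lines that are not PHP messages (for example "File does not exist") are skipped, so the counts cover PHP errors only.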
 

hmm

Hi,
It was a custom script causing problems with the feof and fgets functions.
I deleted it.

The second problem was a Mambo exploit: one site was being attacked every other minute by some bot, so I had to delete that site and ask the client to point their name servers somewhere else.

Regards,
Deep
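
For bot traffic like this, a per-site log can be summarised by client address to see whether a handful of addresses account for most of the requests. A minimal sketch, assuming the log is in Apache common/combined format with the client address as the first field; the function name `top_clients` and the path in the usage line are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: count requests per client address in one access
# log and list the busiest addresses.
# $1 = log file, $2 = how many addresses to show (default 10).
top_clients() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Typical use on a live system (the file name is a placeholder):
#   top_clients /usr/local/apache/domlogs/example.com 10
```

If a few addresses dominate, blocking them at the firewall may be an alternative to removing the site, although a distributed bot would not show up this clearly.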