Load average spike - how do I find out why, after the event?

spaceman

Well-Known Member
Mar 25, 2002
509
5
318
G'day.

The 5-minute load average on my server had been hovering around 1 all day, then rose sharply to 20 and subsided back down to 1, all in the space of less than 10 minutes. I know this a) from IP alerts from the data center (The Planet) and b) from Munin graphs.

My question - how do I work out what caused this after the event? What logs should I be looking at, and what should I be looking for? Normally 'top' gives it away if I'm watching while it's happening, but I'm not sure how to work this out from historical data.

Can anyone throw me any bones? :)

Thx.
 

spaceman

Ok, ok, answered my own question :eek:

I checked a few log files in these locations:

/var/log/
/usr/local/apache/logs

and found the smoking gun here:

/usr/local/apache/logs/error_log

Around the time of the spike, the log was stuffed full of a PHP function error generated by one of the hosting accounts. The owner of the offending account is a friend and ex-employee of mine, a bit of a PHP tinkerer and geek. A suitably ticked-off email has been sent to him :)
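
For anyone hitting the same thing: a rough sketch of how a burst like this could be spotted without scrolling through the log by hand. It counts error_log lines per minute and lists the busiest minutes, which can then be lined up against the spike on the Munin graph. It assumes the usual Apache error_log timestamp at the start of each line (e.g. "[Wed Oct 11 14:32:52 2000]") and English day/month names; the function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Leading timestamp of an Apache error_log line, e.g.
# "[Wed Oct 11 14:32:52 2000] [error] ...". Fractional seconds,
# which newer Apache versions write, are optional and ignored.
TIMESTAMP = re.compile(
    r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})(?:\.\d+)? (\d{4})\]"
)


def errors_per_minute(lines):
    """Count log lines per minute, keyed by a datetime truncated to the minute."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip continuation lines and unrecognised formats
        stamp = datetime.strptime(
            f"{match.group(1)} {match.group(2)}", "%a %b %d %H:%M:%S %Y"
        )
        counts[stamp.replace(second=0)] += 1
    return counts


def busiest_minutes(lines, top=5):
    """Return the `top` (minute, line_count) pairs with the most log lines."""
    return errors_per_minute(lines).most_common(top)


def report(path, top=5):
    """Print the busiest minutes found in the log file at `path`."""
    with open(path, errors="replace") as log:
        for minute, count in busiest_minutes(log, top):
            print(minute.strftime("%Y-%m-%d %H:%M"), count)
```

Calling report("/usr/local/apache/logs/error_log") prints the minutes with the most entries. If one of them matches the time of the load spike, the lines logged in that minute are the place to start reading.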