MySQL Crashed > Graceful/Forceful Reboot Failed > Memory shot up and SLM kicked in

tnedator

Member
Oct 20, 2007
I have a WHM-based VPS (1152MB SLM RAM) that I just use for a small vBulletin site.

WHM 11.15.0 cPanel 11.17.0-R19434
CENTOS Enterprise 4.5 i686 on virtuozzo - WHM X v3.1.0

Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.7a mod_auth_passthrough/2.1 mod_bwlimited/1.4 PHP/5.2.4

MySQL: 5.0.45

Ok, here is my series of events.

  1. MySQL crashed at 4:53; RAM usage was under 400MB dedicated to apps and under 500MB committed, according to Munin (1152MB SLM VPS).
  2. WHM could be accessed and navigated normally, including Apache status, but vBulletin was failing with the DB error "Can't connect to local MySQL server through socket" (the quick checks I'd use for that are sketched right after this list).
  3. I attempted a graceful, then after about five or more minutes a forceful reboot (via WHM) - 5:30.
  4. Processes and RAM quickly started rising ~5:30-5:35.
  5. RAM apparently exceeded the VPS limits, SLM kicked in, and Munin stopped recording data.
  6. I restarted the VPS via the Virtuozzo control panel.
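
For reference, these are the sort of checks I'd run to confirm whether mysqld is actually down and the socket is missing. The socket path is an assumption on my part (the usual /var/lib/mysql/mysql.sock on a CentOS/cPanel box), and /scripts/restartsrv_mysql is cPanel's own restart script:

Code:
# is mysqld running at all?
ps aux | grep [m]ysqld
# does the socket the error complains about actually exist? (assumed default path)
ls -l /var/lib/mysql/mysql.sock
# cPanel's restart script for MySQL
/scripts/restartsrv_mysql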

At the time of the Virtuozzo restart, I had PuTTY open, and it received a "Server unexpectedly closed network connection" error. On the screen at that time (and still) was:

Code:
Load Average: 0.39, 0.78, 0.76
Tasks: 111 total, 1 running, 109 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0% us, 0.2% sy, 0.0% ni, 92.9% id, 6.9% wa
Mem: 1179648k total, 365900k used, 813748k free
This seems to be the trigger point, and again, according to Munin, less than half of my RAM was committed at this time, and my server was at about the lowest usage it sees in a 24-hour period.
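
Just to sanity-check the numbers from that top snapshot (rough shell math, assuming the "k" figures are KiB):

Code:
# 1179648k total lines up with the 1152MB SLM allocation
echo $((1179648 / 1024))            # 1152
# memory used at that moment, as a rough percentage of total
echo $((365900 * 100 / 1179648))    # 31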

Code:
[root@server mysql]# grep ERROR server.mydomain.com.err
080211 4:50:02 [ERROR] Can't create thread to kill server
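
That one line is all grep turned up. If it's useful, this is what I was going to run next to pull a bit more context out of the same error log:

Code:
# a few lines of context around the error, plus the tail of the log
grep -B 5 -A 5 "Can't create thread" server.mydomain.com.err
tail -50 server.mydomain.com.err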
Apparently SLM started throwing errors like this at some point, but there are no timestamps:

Code:
kill_signal(9914.3): task 9fa418c0, thg 858ae980, sig 1
kill_signal(9914.3): task 9711b980, thg 858ae980, sig 1
kill_signal(9914.3): task 60dcb940, thg 858ae980, sig 1
kill_signal(9914.3): task b9364cc0, thg 858ae980, sig 1
kill_signal(9914.3): task 66a95940, thg 858ae980, sig 1
kill_signal(9914.3): task b3adcd00, thg 858ae980, sig 1
kill_signal(9914.3): task b17d7900, thg 858ae980, sig 1
kill_signal(9914.3): task b58a12a0, thg 858ae980, sig 1
kill_signal(9914.3): task 7c5dd2e0, thg 858ae980, sig 1
kill_signal(9914.3): task 347c2100, thg 858ae980, sig 1
kill_signal(9914.3): task 33d94c80, thg 858ae980, sig 1
kill_signal(9914.3): task adf66100, thg 858ae980, sig 1
kill_signal(9914.3): task 5522d2a0, thg 858ae980, sig 1
kill_signal(9914.3): task bcd01320, thg 858ae980, sig 1
kill_signal(9914.3): task 8abe20c0, thg 858ae980, sig 1
kill_signal(9914.3): task ba6c6c80, thg 858ae980, sig 1
kill_signal(9914.3): task 764200c0, thg 858ae980, sig 1
kill_signal(9914.3): task 739546e0, thg 858ae980, sig 1
kill_signal(9914.3): selected 1, signalled 1, queued 1, seq 8, exc 0 2 red 40652 1624
kill_signal(9914.3): selecting to kill, queued 0, seq 9, exc 0 562 goal 0 42290...
kill_signal(9914.3): selected 1, signalled 1, queued 1, seq 9, exc 0 42290 red 40652 1624
kill_signal(9914.3): task a99e4080, thg 858ae980, sig 2
kill_signal(9914.3): task 9ea92080, thg 858ae980, sig 2
kill_signal(9914.3): task 60dcad00, thg 858ae980, sig 2
kill_signal(9914.3): task 3a55f320, thg 858ae980, sig 2
kill_signal(9914.3): task 541586e0, thg 858ae980, sig 2
kill_signal(9914.3): task 23054d40, thg 858ae980, sig 2
kill_signal(9914.3): task 6041f980, thg 858ae980, sig 2
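For what it's worth, the next thing I plan to look at inside the container is the failcnt column of the beancounters - I'm assuming /proc/user_beancounters is still readable under this Virtuozzo/SLM setup:

Code:
# full beancounter table; failcnt is the last column
cat /proc/user_beancounters
# only the rows where that last column (failcnt) is non-zero
awk 'NR > 2 && $NF > 0' /proc/user_beancounters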
Here are two Munin graphs that show memory and processes seeming normal up until around 5:30 am (and one more graph showing Apache processes, which would indicate the server was at about its lowest usage for the day):


What I am trying to figure out are three things:

  1. Based on the Munin graphs (and the fact that Apache and WHM were working normally 30 minutes after the MySQL crash), is it safe to assume that MySQL did not crash at 4:50 am because SLM kicked in when my VPS exceeded its 1152MB of memory?
  2. Is it likely that the graceful/forceful reboots via WHM failed for some reason, the processes shot up (again, shown on the Munin graphs around 5:30, before Munin stopped tracking stats), and at that point the VPS exceeded its RAM limits and SLM kicked in? (The grep I plan to run against the syslog for this is just below.)
  3. What the heck is going on?
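
To try to answer #2 myself, I'm going to grep the syslog around the restart window - assuming the stock CentOS 4 location of /var/log/messages:

Code:
# anything the system logged around the 5:30 restart attempts
grep -i -E 'shutdown|restart|kill|out of memory' /var/log/messages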
 