Re kernel panics, there's a flag you can set to restart the server automatically after a kernel panic. Without this, the box will panic and just sit there hoping you or someone else will come along and look at the console messages and restart it. This could be a long wait overnight!
To set up the panic reboot timer:
Code:
echo kernel.panic = 120 >> /etc/sysctl.conf
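This will ensure that the system reboots 120 seconds after a panic rather than just sitting there halted. Note that /etc/sysctl.conf is only read at boot (or when you run "sysctl -p"), so the line above does not change the running kernel by itself.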
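One catch with the ">>" approach is that it appends a new line every time you run it, so you can end up with duplicate entries. Here is a rough sketch of a wrapper that replaces an existing kernel.panic line instead. The function name set_panic_timeout is my own invention, not a standard tool, and it only looks at the one file you give it.

```shell
# set_panic_timeout SECONDS [FILE]
# Hypothetical helper: writes "kernel.panic = SECONDS" to FILE
# (default /etc/sysctl.conf) without creating duplicate entries.
set_panic_timeout() {
    secs=$1
    file=${2:-/etc/sysctl.conf}
    if grep -q '^[[:space:]]*kernel\.panic[[:space:]]*=' "$file" 2>/dev/null; then
        # An entry already exists: rewrite it via a temp file.
        # (Avoids "sed -i", which is not available everywhere.)
        tmp=$(mktemp) || return 1
        sed "s/^[[:space:]]*kernel\.panic[[:space:]]*=.*/kernel.panic = $secs/" "$file" > "$tmp" \
            && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # No entry yet: append one, same as the echo above.
        echo "kernel.panic = $secs" >> "$file"
    fi
}
```

Run as root, "set_panic_timeout 120" should leave exactly one kernel.panic line in /etc/sysctl.conf no matter how many times you call it.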
You can read or change the live kernel setting on the fly through /proc/sys/kernel/panic. For instance, "cat /proc/sys/kernel/panic" will usually display 0 (meaning never reboot), and "echo 120 > /proc/sys/kernel/panic" will modify the current value.
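Since the config file and the live value are set separately, it is easy for them to drift apart. Below is a minimal sketch for comparing the two. The function names (configured_panic, live_panic, check_panic) are made up for illustration, and it assumes the setting lives in a single sysctl.conf-style file as a plain non-negative number.

```shell
# Print the kernel.panic value found in a sysctl.conf-style file.
# If there are several entries, the last one is printed, since later
# lines override earlier ones when the file is applied.
configured_panic() {
    sed -n 's/^[[:space:]]*kernel\.panic[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' \
        "${1:-/etc/sysctl.conf}" | tail -n 1
}

# Print the value the running kernel is using right now.
live_panic() {
    cat "${1:-/proc/sys/kernel/panic}"
}

# check_panic [CONF_FILE] [PROC_FILE]
# Report whether the configured and live values agree.
check_panic() {
    conf=$(configured_panic "$1")
    live=$(live_panic "$2")
    if [ -z "$conf" ]; then
        echo "not configured (live: $live)"
    elif [ "$conf" = "$live" ]; then
        echo "ok: $live"
    else
        echo "mismatch: configured=$conf live=$live"
    fi
}
```

Calling "check_panic" with no arguments uses the real paths. The optional arguments are only there so you can point it at copies of the files for testing.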
The above is for CentOS; I haven't tried it anywhere else. Unfortunately, since setting it up a month ago I haven't had a system crash, so I haven't seen the reboot fire for real.
Also, there's a thing called netdump that should be able to show why a server crashed by logging the crash to another server when it occurs. I don't know the details of setting it up yet; my initial half-arsed attempt failed and I haven't been able to try again. It should work well coupled with the panic timer above, as you would then be able to see why the crash occurred and have the box recover from it automatically.