Originally posted by Sash
Here is the solution that fixed the problem for us and rpmws:
Add the following to /etc/rc.d/rc.local
echo 64000 > /proc/sys/fs/file-max
ulimit -u unlimited
ulimit -S -H -n 4096
Reboot the server
We had a server that looked like it was starting into this cycle of going down for no apparent reason, so we tried this fix.
We added the code to the file, but before we could reboot the server, it crashed. It needed a hard reboot at the NOC and came back with this error:
Error failed to mount dir or dir not found.
The NOC said the hard drive would only come back up in read-only mode.
Twelve hours later, we finally had the server back online after replacing the hard drive and reinstalling cPanel and all of the accounts.
This is what our NOC had to say about the reason for the catastrophe:
"Yes, it was that code - I'm not 100% sure, but I think that is dependent on the size and blocks of the hard drive - if you have the same size drive they did, it would probably work fine - they might have had an 80g or something like that - or a smaller drive with a different block size"
Here is what a systems administrator had to say about the situation:
"We have access to several 80 GB and 40 GB servers, and all have the same file-max limits (it cannot be drive-dependent, or one would have crashed). Below is exactly what file-max is for; it is kernel- and RAM-related, not hard-drive-related.
The file-max file /proc/sys/fs/file-max sets the maximum number of file handles that the Linux kernel will allocate. We generally tune this file to raise the number of open files by increasing the value of /proc/sys/fs/file-max to something reasonable, like 256 for every 4 MB of RAM we have: i.e. for a machine with 128 MB of RAM, set it to 8192 (128 / 4 = 32; 32 * 256 = 8192).
The default setting for the file-max parameter under Red Hat Linux is 4096."
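The arithmetic in that quote can be written out as a small shell sketch. This is only an illustration of the 256-per-4-MB rule of thumb quoted above; the function name recommended_file_max is made up for this example and is not a standard tool.

```shell
#!/bin/sh
# Rule of thumb from the quote above: 256 file handles per 4 MB of RAM.
# recommended_file_max is a hypothetical helper name used for illustration.
recommended_file_max() {
    ram_mb=$1
    # Integer arithmetic: (RAM in MB / 4) * 256
    echo $(( ram_mb / 4 * 256 ))
}

recommended_file_max 128     # prints 8192
recommended_file_max 1024    # prints 65536
```

By this rule, the 64000 used in the rc.local fix corresponds to roughly 1 GB of RAM.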
Here is a quote from one of the Unix Engineers that I asked about this issue:
"It's not hard-drive-related. The maximum number of file handles has to do with memory; the usual rule is to set 256 for every 4 MB of memory. In order to use 64000, you need 1 GB of memory. That applies to 2.2.x kernels only.
For 2.4.x kernels (Red Hat), this is the best way to set it:
edit /etc/sysctl.conf and add the following line:
e.g. fs.file-max = 64000"
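For anyone who would rather script the sysctl.conf edit than do it by hand, here is a rough sketch. The helper name set_file_max is hypothetical, and it takes the config path as an argument so it can be tried on a copy of the file first. It assumes the setting, if already present, sits on a line of its own.

```shell
#!/bin/sh
# Hypothetical helper: add or update the fs.file-max line in a
# sysctl-style config file. Usage: set_file_max <conf-file> <value>
set_file_max() {
    conf=$1
    value=$2
    if grep -q '^[[:space:]]*fs\.file-max[[:space:]]*=' "$conf" 2>/dev/null; then
        # Setting already present: rewrite it via a temp file, then move into place.
        tmp="$conf.tmp.$$"
        sed "s/^[[:space:]]*fs\.file-max[[:space:]]*=.*/fs.file-max = $value/" "$conf" > "$tmp" &&
            mv "$tmp" "$conf"
    else
        # Setting absent: append it.
        printf 'fs.file-max = %s\n' "$value" >> "$conf"
    fi
}

# Example on a scratch copy, not the live file:
# cp /etc/sysctl.conf /tmp/sysctl.conf.test
# set_file_max /tmp/sysctl.conf.test 64000
```

Once the real /etc/sysctl.conf has the line, running sysctl -p as root loads it without a reboot, and cat /proc/sys/fs/file-max shows the value in effect.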
I'm not convinced at all that it was this change in the code that took the server down, but it is very odd that it happened just after the file was modified.
Can anybody (Sash?) shed any light on what happened to us?