New server, Apache and mySQL failed overnight, chkservd hangs

ttremain

Well-Known Member
Feb 16, 2003
216
0
166
cPanel Access Level
Root Administrator
I have a new server in service (well, sort of "in service").

Apache and MySQL both seem to have failed overnight...

The chkservd HTML alerts say it is hanging...


/usr/local/cpanel/logs/tailwatchd_log
[17003] [2013-01-14 07:38:21 -0800] [Cpanel::TailWatch::Eximstats] [SQLERR] Could not prepare query, logging SQL to /var/cpanel/sql
DBI connect('eximstats:localhost','eximstats',...) failed: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) at /usr/local/cpanel/Cpanel/TailWatch/Eximstats.pm line 862.


/var/log/chkservd.log

[2013-01-14 07:06:03 -0800] Disk check .... /tmp (/var/tmp) [3%] ... /dev/sda3 (/) [6%] ... /usr/tmpDSK (/tmp) [3%] ... /dev/sda1 (/boot) [23%] ... /dev/sdb1 (/backup) [1%] ... {status:ok} ... Done
[2013-01-14 07:06:03 -0800] Service check ....tomcat [[check command:+][socket connect:N/A]]...sshd [[check command:+][socket connect:N/A]]...spamd [[check command:+][socket connect:N/A]]...queueprocd [[check command:+][socket connect:N/A]]...named [[check command:+][socket connect:N/A]]...mysql [Service Check Started
The previous service check is still running (302 second). It will be terminated if still hanging after 3 check intervals. (1/3)
Service Check Started
The previous service check is still running (603 second). It will be terminated if still hanging after 3 check intervals. (2/3)
Service Check Started
The previous service check was still running (904 second). It was terminated.
Loading services .....cpanellogd....cpdavd....cpsrvd....exim....exim-26....ftpd....httpd....imap....ipaliases....lfd....mailman....mysql....named....queueprocd....spamd....sshd....tomcat..Done


I am not sure what to check next. I did find that mySQL was not configured for enough open files, which I have adjusted.
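
For anyone wanting to spot these hangs without reading the log by eye, here is a minimal Python sketch that scans chkservd.log text for the "still running" and "was terminated" messages quoted above. The function name and the regex are assumptions of mine, based only on the lines in this excerpt and not on any documented chkservd log format.

```python
import re

# Matches the hang messages as they appear in the excerpt above, e.g.
#   "The previous service check is still running (302 second). ..."
#   "The previous service check was still running (904 second). It was terminated."
HANG_RE = re.compile(
    r"The previous service check (?:is|was) still running \((\d+) second\)\.\s*(.*)"
)

def find_hangs(log_text):
    """Return a list of (seconds_running, terminated) tuples, in log order."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            seconds = int(match.group(1))
            terminated = "It was terminated" in match.group(2)
            hangs.append((seconds, terminated))
    return hangs

# Sample copied from the chkservd.log excerpt in this post.
sample = """\
Service Check Started
The previous service check is still running (302 second). It will be terminated if still hanging after 3 check intervals. (1/3)
Service Check Started
The previous service check is still running (603 second). It will be terminated if still hanging after 3 check intervals. (2/3)
Service Check Started
The previous service check was still running (904 second). It was terminated.
"""

for seconds, terminated in find_hangs(sample):
    print(seconds, "terminated" if terminated else "still waiting")
```

On a live server the same function could be fed the contents of /var/log/chkservd.log instead of the sample string.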
 

SB-Nick

Well-Known Member
Aug 26, 2008
175
9
68
cPanel Access Level
Root Administrator
Hi,

There's no error for Apache in the output you provided. Have you tried searching /etc/httpd/logs/error_log for the time the daemon was restarted by chkservd?
 

ttremain

SB-Nick said:
There's no error for Apache in the output you provided. Have you tried searching /etc/httpd/logs/error_log for the time the daemon was restarted by chkservd?

Well, it just sat there hung for 6 hours...

[Mon Jan 14 01:26:59 2013] [error] server reached MaxClients setting, consider raising the MaxClients setting
[Mon Jan 14 07:37:48 2013] [notice] caught SIGTERM, shutting down

Obviously I should look into raising the MaxClients setting, but that does not explain why it wasn't restarted by chkservd...
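
To put a number on how long Apache sat there, here is a small Python sketch that measures the gap between the two error_log entries above. The helper name is hypothetical, and it assumes the log timestamps use English day and month names (the default C locale).

```python
from datetime import datetime

# Timestamp layout used by the Apache error_log lines quoted above,
# e.g. "[Mon Jan 14 01:26:59 2013]".
APACHE_TS_FORMAT = "%a %b %d %H:%M:%S %Y"

def parse_apache_ts(line):
    """Extract the leading bracketed timestamp from an error_log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, APACHE_TS_FORMAT)

max_clients = "[Mon Jan 14 01:26:59 2013] [error] server reached MaxClients setting, consider raising the MaxClients setting"
sigterm = "[Mon Jan 14 07:37:48 2013] [notice] caught SIGTERM, shutting down"

# Time between hitting MaxClients and the eventual shutdown.
gap = parse_apache_ts(sigterm) - parse_apache_ts(max_clients)
print(gap)  # 6:10:49
```

So the window between the MaxClients error and the SIGTERM was a little over six hours.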

I have a theory; maybe someone can confirm it or dispel it. Chkservd could not get MySQL running (which is what it appears to have been trying to do over and over and over), and so it never got around to trying to restart Apache.

mySQL was hung due to not enough open files allowed in the config.
 

SB-Nick

Hi Thomas,

The MaxClients error is probably why chkservd is showing httpd as failing.
Increasing the open file limit for MySQL should also solve the chkservd and mysqld restart issues.
 

ttremain

I have doubled the mysql open files, from 2048 to 4096... Yet still:

/var/lib/mysql/<hostname>.err:
130115 8:33:58 [ERROR] /usr/sbin/mysqld: Can't open file: './xxxxxxxx_xxxxxxx/xxxxxxx.frm' (errno: 24)
130115 8:38:29 [ERROR] Error in accept: Too many open files
130115 8:42:45 [ERROR] Error in accept: Too many open files
130115 8:47:01 [ERROR] Error in accept: Too many open files
130115 8:51:17 [ERROR] Error in accept: Too many open files
130115 8:55:33 [ERROR] Error in accept: Too many open files
130115 8:59:49 [ERROR] Error in accept: Too many open files
130115 9:04:05 [ERROR] Error in accept: Too many open files
130115 9:08:21 [ERROR] Error in accept: Too many open files
130115 9:12:37 [ERROR] Error in accept: Too many open files
130115 9:16:53 [ERROR] Error in accept: Too many open files
130115 9:21:09 [ERROR] Error in accept: Too many open files
130115 9:25:25 [ERROR] Error in accept: Too many open files
130115 9:29:41 [ERROR] Error in accept: Too many open files
130115 9:33:57 [ERROR] Error in accept: Too many open files
130115 9:36:20 [Note] /usr/sbin/mysqld: Normal shutdown

130115 9:36:20 [Note] Event Scheduler: Purging the queue. 0 events
130115 9:36:23 InnoDB: Starting shutdown...
130115 9:36:26 InnoDB: Shutdown completed; log sequence number 0 15991140
130115 9:36:26 [Note] /usr/sbin/mysqld: Shutdown complete

130115 09:36:26 mysqld_safe mysqld from pid file /var/lib/mysql/raptor.likeit.net.pid ended
130115 09:36:27 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130115 9:36:27 [Warning] '--log_slow_queries' is deprecated and will be removed in a future release. Please use ''--slow_query_log'/'--slow_query_log_file'' instead.
130115 9:36:27 [Note] Plugin 'FEDERATED' is disabled.
130115 9:36:27 InnoDB: Initializing buffer pool, size = 128.0M
130115 9:36:27 InnoDB: Completed initialization of buffer pool
130115 9:36:27 InnoDB: Started; log sequence number 0 15991140
130115 9:36:27 [Note] Event Scheduler: Loaded 0 events
130115 9:36:27 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.66-cll' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)

It ran the "Can't open file" error several times, then chkservd started trying to reset it...


At 9:36 I ran /scripts/restartsrv_mysql

So how come /scripts/restartsrv_mysql can reset it, but not chkservd ?

I have now doubled open files again.
I've seen conflicting info, so I'm trying both syntaxes:
open_files_limit=8192
open-files-limit=8192
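
As far as I know, MySQL option files accept dashes and underscores interchangeably in option names, so those two lines should be equivalent. What matters more is the limit the running mysqld actually ended up with, which on Linux can be read from the "Max open files" row of /proc/<pid>/limits. Below is a minimal Python sketch of that check. The function name is hypothetical, and the sample text is illustrative only, not output taken from this server.

```python
def max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    Values are ints where possible; the kernel may also report 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return tuple(int(v) if v.isdigit() else v for v in (soft, hard))
    return None

# Illustrative sample in the /proc/<pid>/limits layout (NOT from this server).
ILLUSTRATIVE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            8192                 8192                 files
"""

print(max_open_files(ILLUSTRATIVE_LIMITS))
```

On a real box the text would come from reading /proc/<pid>/limits for the mysqld process ID. If the number there is lower than what my.cnf asks for, the setting is not taking effect.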
 

SB-Nick

Hi,

I can't see any obvious message from chkservd showing that MySQL could not be restarted, so you may need to provide further log output.
The "Can't open file" message is certainly caused by the ulimit values (errno 24 means "Too many open files"). Try a large value such as the one below, and make sure the setting is in the [mysqld] section:

open-files-limit=120000
 

ttremain

I just have more of this in the chkservd.log:


[2013-01-15 09:13:30 -0800] Service check ....tomcat [[check command:+][socket connect:N/A]]...sshd [[check command:+][socket connect:N/A]]...spamd [[check command:+][socket connect:N/A]]...queueprocd [[check command:+][socket connect:N/A]]...named [[check command:+][socket connect:N/A]]...mysql [Service Check Started
The previous service check is still running (301 second). It will be terminated if still hanging after 3 check intervals. (1/3)
Service Check Started
The previous service check is still running (602 second). It will be terminated if still hanging after 3 check intervals. (2/3)
Service Check Started
The previous service check was still running (903 second). It was terminated.
Loading services .....cpanellogd....cpdavd....cpsrvd....exim....exim-26....ftpd....httpd....imap....ipaliases....lfd....mailman....mysql....named....queueprocd....spamd....sshd....tomcat..Done
[2013-01-15 09:28:33 -0800] Disk check .... /tmp (/var/tmp) [3%] ... /dev/sda3 (/) [6%] ... /usr/tmpDSK (/tmp) [3%] ... /dev/sda1 (/boot) [23%] ... /dev/sdb1 (/backup) [4%] ... {status:ok} ... Done

Any other logs I should be looking at?

I have added the open-files-limit you suggested trying, and I am reinstituting some off-server monitoring (my bad)
 

ttremain

mysql> SHOW GLOBAL STATUS LIKE '%open%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Com_ha_open              | 0     |
| Com_show_open_tables     | 0     |
| Open_files               | 3178  |
| Open_streams             | 0     |
| Open_table_definitions   | 1795  |
| Open_tables              | 1800  |
| Opened_files             | 28295 |
| Opened_table_definitions | 9496  |
| Opened_tables            | 9540  |
| Slave_open_temp_tables   | 0     |
+--------------------------+-------+
10 rows in set (0.00 sec)
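
To compare that Open_files figure against the limits tried in this thread (2048, 4096, 8192 and the suggested 120000), here is a rough Python sketch that parses the mysql client table output and prints the usage for each limit. The function name is hypothetical. As I understand it, Open_files does not cover every descriptor mysqld holds (client connections use sockets, for example), so treat the percentages as a lower bound.

```python
def parse_status(table_text):
    """Parse mysql client table output into a {variable_name: int} dict."""
    status = {}
    for line in table_text.splitlines():
        # Data rows look like "| Open_files | 3178 |"; borders and the
        # header row are skipped because their second cell is not numeric.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[1].isdigit():
            status[cells[0]] = int(cells[1])
    return status

# A few rows copied from the SHOW GLOBAL STATUS output above.
sample = """\
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Open_files               | 3178  |
| Open_tables              | 1800  |
| Opened_tables            | 9540  |
+--------------------------+-------+
"""

status = parse_status(sample)
open_files = status["Open_files"]

# Limits mentioned in this thread, smallest to largest.
for limit in (2048, 4096, 8192, 120000):
    print(f"limit {limit}: {open_files / limit:.0%} of limit in use")
```

With 3178 files open, the original 2048 limit is already exceeded and 4096 leaves little room, which fits the errors continuing after the first doubling.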