Disk I/O Utilisation High and Errors in Logs - Site Unusable

heusdens

Active Member
Oct 1, 2013
25
0
1
cPanel Access Level
Root Administrator
Hi,

A few days ago my server started to show high load averages, but yesterday it went out of control and I have battled to keep it up since then.

I am seeing disk I/O utilisation of over 100%, with an enormous amount of writing going on. In addition, the load averages are in the double digits, so much so that I can't even get into WHM unless I reboot the virtual server via the hosting control panel.

In my messages log I find something like this:
Code:
Jan  6 04:43:46 server1 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan  6 04:43:46 server1 kernel: ata1.00: failed command: WRITE DMA
Jan  6 04:43:46 server1 kernel: ata1.00: cmd ca/00:60:50:9c:57/00:00:00:00:00/e3 tag 0 dma 49152 out
Jan  6 04:43:46 server1 kernel:         res 40/00:01:06:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jan  6 04:43:46 server1 kernel: ata1.00: status: { DRDY }
Jan  6 04:43:46 server1 kernel: ata1: soft resetting link
Jan  6 04:43:46 server1 kernel: ata1.00: configured for MWDMA2
Jan  6 04:43:46 server1 kernel: ata1: EH complete
Jan  6 04:47:09 server1 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan  6 04:47:09 server1 kernel: ata1.00: failed command: WRITE DMA
Jan  6 04:47:09 server1 kernel: ata1.00: cmd ca/00:58:c0:71:8a/00:00:00:00:00/e3 tag 0 dma 45056 out
Jan  6 04:47:09 server1 kernel:         res 40/00:01:06:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jan  6 04:47:09 server1 kernel: ata1.00: status: { DRDY }
Jan  6 04:47:09 server1 kernel: ata1: soft resetting link
Jan  6 04:47:09 server1 kernel: ata1.00: configured for MWDMA2
Jan  6 04:47:09 server1 kernel: ata1: EH complete
Jan  6 04:49:23 server1 kernel: Clocksource tsc unstable (delta = -8589720273 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
Jan  6 04:49:23 server1 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan  6 04:49:23 server1 kernel: ata1.00: failed command: WRITE DMA
Jan  6 04:49:23 server1 kernel: ata1.00: cmd ca/00:00:b0:75:8a/00:00:00:00:00/e3 tag 0 dma 131072 out
Jan  6 04:49:23 server1 kernel:         res 40/00:01:06:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jan  6 04:49:23 server1 kernel: ata1.00: status: { DRDY }
Jan  6 04:49:23 server1 kernel: ata1: soft resetting link
Jan  6 04:49:23 server1 kernel: ata1.00: configured for MWDMA2
Jan  6 04:49:23 server1 kernel: ata1: EH complete
The error log shows the following strange entries:
Code:
DBD::mysql::st execute failed: There is no such grant defined for user '' on host 'localhost' at /usr/local/cpanel/Cpanel/MysqlUtils/Connect.pm line 82.
DBD::mysql::st execute failed: There is no such grant defined for user '' on host 'server1.xxx.info' at /usr/local/cpanel/Cpanel/MysqlUtils/Connect.pm line 82.
DBD::mysql::st execute failed: There is no such grant defined for user '' on host 'localhost' at /usr/local/cpanel/Cpanel/MysqlUtils/Connect.pm line 82.
DBD::mysql::st execute failed: There is no such grant defined for user '' on host 'server1.xxx.info' at /usr/local/cpanel/Cpanel/MysqlUtils/Connect.pm line 82.
DBD::mysql::st execute failed: There is no such grant defined for user '' on host 'localhost' at /usr/local/cpanel/Cpanel/MysqlUtils/Connect.pm line 82.
DBD::mysql::st execute failed: There is no such grant defined for user '' on host 'server1.xxx.info' at /usr/local/cpanel/Cpanel/MysqlUtils/Connect.pm line 82.
I also suspect this is when it all started:

Code:
[2014-01-03 16:22:58 +0200] warn [cpses_tool] Timeout of authentication at /usr/local/cpanel/Cpanel/MysqlUtils/NetMySQL.pm line 63
 at /usr/local/cpanel/Cpanel/MysqlUtils/NetMySQL.pm line 72
	Cpanel::MysqlUtils::NetMySQL::connect('dbuser', 'root', 'dbserver', 'localhost', 'database', 'mysql', 'debug', 0, 'dbpass', 'xxx') called at /usr/local/cpanel/Cpanel/MysqlUtils/NetMySQL.pm line 29
	Cpanel::MysqlUtils::NetMySQL::new('Cpanel::MysqlUtils', 'database', 'mysql', 'dbuser', 'root', 'dbpass', 'xxx', 'dbserver', 'localhost', 'debug', 0) called at /usr/local/cpanel/Cpanel/Mysql.pm line 113
	eval {...} called at /usr/local/cpanel/Cpanel/Mysql.pm line 113
	Cpanel::Mysql::_get_dbh(Cpanel::Mysql=HASH(0x271e000), 'localhost', 'root', 'xxx') called at /usr/local/cpanel/Cpanel/Mysql.pm line 77
	Cpanel::Mysql::new('Cpanel::Mysql', HASH(0x2a5f650)) called at bin/cpses_tool line 182
	bin::cpses_tool::action_CLEANUPSESSIONS('bin::cpses_tool', HASH(0x27652e0)) called at bin/cpses_tool line 73
	bin::cpses_tool::process_request(HASH(0x27652e0)) called at bin/cpses_tool line 60
	bin::cpses_tool::script('bin::cpses_tool') called at bin/cpses_tool line 36
[2014-01-03 16:23:11 +0200] warn [cpses_tool] Error while connecting to MySQL: Timeout of authentication at /usr/local/cpanel/Cpanel/Mysql.pm line 1600
	Cpanel::Mysql::_log_error_and_output(Cpanel::Mysql=HASH(0x271e000), 'Error while connecting to MySQL: [_1]', 'Timeout of authentication') called at /usr/local/cpanel/Cpanel/Mysql.pm line 117
	Cpanel::Mysql::_get_dbh(Cpanel::Mysql=HASH(0x271e000), 'localhost', 'root', 'xxx') called at /usr/local/cpanel/Cpanel/Mysql.pm line 77
	Cpanel::Mysql::new('Cpanel::Mysql', HASH(0x2a5f650)) called at bin/cpses_tool line 182
	bin::cpses_tool::action_CLEANUPSESSIONS('bin::cpses_tool', HASH(0x27652e0)) called at bin/cpses_tool line 73
	bin::cpses_tool::process_request(HASH(0x27652e0)) called at bin/cpses_tool line 60
	bin::cpses_tool::script('bin::cpses_tool') called at bin/cpses_tool line 36
[2014-01-03 16:27:13 +0200] warn [cpses_tool] Lock on /var/cpanel/cpanel.config.lock lost! at /usr/local/cpanel/Cpanel/SafeFile.pm line 159
	Cpanel::SafeFile::safeunlock(Cpanel::SafeFileLock=ARRAY(0x2945640)) called at /usr/local/cpanel/Cpanel/SafeFile.pm line 77
	Cpanel::SafeFile::safeclose(IO::Handle=GLOB(0x281d7c0), Cpanel::SafeFileLock=ARRAY(0x2945640)) called at /usr/local/cpanel/Cpanel/Config/LoadCpConf.pm line 179
	Cpanel::Config::LoadCpConf::loadcpconf() called at /usr/local/cpanel/Cpanel/Locale.pm line 278
	Cpanel::Locale::get_server_locale() called at /usr/local/cpanel/Cpanel/Locale/Utils/User.pm line 192
	Cpanel::Locale::Utils::User::get_user_locale('root') called at /usr/local/cpanel/Cpanel/Locale/Utils/User.pm line 35
	Cpanel::Locale::Utils::User::init_cpdata_keys() called at (eval 3) line 1
	eval ' Cpanel::Locale::Utils::User::init_cpdata_keys(); \x0A;' called at /usr/local/cpanel/Cpanel/Locale.pm line 38
	Cpanel::Locale::preinit() called at /usr/local/cpanel/Cpanel/Locale.pm line 99
	Cpanel::Locale::get_handle('Cpanel::Locale') called at /usr/local/cpanel/Cpanel/Mysql.pm line 70
	Cpanel::Mysql::new('Cpanel::Mysql', HASH(0x2b3f530)) called at bin/cpses_tool line 182
	bin::cpses_tool::action_CLEANUPSESSIONS('bin::cpses_tool', HASH(0x28463b0)) called at bin/cpses_tool line 73
	bin::cpses_tool::process_request(HASH(0x28463b0)) called at bin/cpses_tool line 60
	bin::cpses_tool::script('bin::cpses_tool') called at bin/cpses_tool line 36
[2014-01-03 16:28:15 +0200] warn [cpses_tool] Couldn't connect to localhost:3306/tcp: IO::Socket::INET: connect: Connection refused at /usr/local/cpanel/Cpanel/MysqlUtils/NetMySQL.pm line 63
 at /usr/local/cpanel/Cpanel/MysqlUtils/NetMySQL.pm line 72
	Cpanel::MysqlUtils::NetMySQL::connect('debug', 0, 'dbserver', 'localhost', 'dbpass', 'xxx', 'database', 'mysql', 'dbuser', 'root') called at /usr/local/cpanel/Cpanel/MysqlUtils/NetMySQL.pm line 29
	Cpanel::MysqlUtils::NetMySQL::new('Cpanel::MysqlUtils', 'database', 'mysql', 'dbuser', 'root', 'dbpass', 'xxx', 'dbserver', 'localhost', 'debug', 0) called at /usr/local/cpanel/Cpanel/Mysql.pm line 113
	eval {...} called at /usr/local/cpanel/Cpanel/Mysql.pm line 113
	Cpanel::Mysql::_get_dbh(Cpanel::Mysql=HASH(0x2801250), 'localhost', 'root', 'xxx') called at /usr/local/cpanel/Cpanel/Mysql.pm line 77
	Cpanel::Mysql::new('Cpanel::Mysql', HASH(0x2b3f530)) called at bin/cpses_tool line 182
	bin::cpses_tool::action_CLEANUPSESSIONS('bin::cpses_tool', HASH(0x28463b0)) called at bin/cpses_tool line 73
	bin::cpses_tool::process_request(HASH(0x28463b0)) called at bin/cpses_tool line 60
	bin::cpses_tool::script('bin::cpses_tool') called at bin/cpses_tool line 36
[2014-01-03 16:28:15 +0200] warn [cpses_tool] Error while connecting to MySQL: Couldn't connect to localhost:3306/tcp: IO::Socket::INET: connect: Connection refused at /usr/local/cpanel/Cpanel/Mysql.pm line 1600
	Cpanel::Mysql::_log_error_and_output(Cpanel::Mysql=HASH(0x2801250), 'Error while connecting to MySQL: [_1]', 'Couldn\'t connect to localhost:3306/tcp: IO::Socket::INET: connect: Connection refused') called at /usr/local/cpanel/Cpanel/Mysql.pm line 117
	Cpanel::Mysql::_get_dbh(Cpanel::Mysql=HASH(0x2801250), 'localhost', 'root', 'xxx') called at /usr/local/cpanel/Cpanel/Mysql.pm line 77
	Cpanel::Mysql::new('Cpanel::Mysql', HASH(0x2b3f530)) called at bin/cpses_tool line 182
	bin::cpses_tool::action_CLEANUPSESSIONS('bin::cpses_tool', HASH(0x28463b0)) called at bin/cpses_tool line 73
	bin::cpses_tool::process_request(HASH(0x28463b0)) called at bin/cpses_tool line 60
	bin::cpses_tool::script('bin::cpses_tool') called at bin/cpses_tool line 36
Building global cache for cpanel...Done
Any idea where to start?
 

cPanelPeter

Senior Technical Analyst
Staff member
Sep 23, 2013
585
25
153
cPanel Access Level
Root Administrator
Hello,

You most likely have a hard drive problem. The first log you provided clearly indicates DMA write errors on ata1. The hard drive appears to be slowly failing and needs to be replaced. Once that is done, see if the other problems go away.
 

heusdens

Thank you Peter

My host says I'm on a shared SAN that is subject to abuse now and then, but that there is no abuse at the moment, so they will be moving me to another hypervisor to see if that resolves the issue.

I'm not entirely convinced of this. Even restarting my virtual server does not solve the issue.

What do you make of the other logs that I posted?
 

JaredR.

Well-Known Member
Feb 25, 2010
1,834
27
143
Houston, TX
cPanel Access Level
Root Administrator
"My host says I'm on a shared SAN that is subject to abuse now and then"

This:

Code:
Jan  6 04:47:09 server1 kernel: ata1.00: failed command: WRITE DMA
Jan  6 04:47:09 server1 kernel: ata1.00: cmd ca/00:58:c0:71:8a/00:00:00:00:00/e3 tag 0 dma 45056 out
Jan  6 04:47:09 server1 kernel:         res 40/00:01:06:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jan  6 04:47:09 server1 kernel: ata1.00: status: { DRDY }
is a sign of a hardware problem, not "abuse". It is happening a lot, based on the logs you provided. You need to ask your host to run diagnostics on each actual hard drive, because when you start to see that kind of error message, hardware failure may be imminent.

This is not something that cPanel would have any control over. It is happening at a deeper level, in the hardware, and the hardware needs to be carefully investigated before a hard drive is lost (and your data with it).
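
To put a number on "happening a lot", the kernel lines can be tallied. The following is only a minimal sketch: it assumes the log lines look exactly like the excerpts posted above, and the function name `summarize_ata_errors` and the regular expressions are illustrative, not part of any cPanel or kernel tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   "Jan  6 04:43:46 server1 kernel: ata1.00: failed command: WRITE DMA"
FAILED_CMD = re.compile(r"kernel: (ata\d+(?:\.\d+)?): failed command: (.+)$")
# Matches the result line that reports a timeout, e.g. "... Emask 0x4 (timeout)"
TIMEOUT = re.compile(r"kernel:\s+res .* Emask 0x[0-9a-f]+ \(timeout\)")


def summarize_ata_errors(lines):
    """Count failed ATA commands per (device, command) and timeout results."""
    failed = Counter()
    timeouts = 0
    for line in lines:
        line = line.rstrip("\n")
        match = FAILED_CMD.search(line)
        if match:
            failed[(match.group(1), match.group(2).strip())] += 1
        elif TIMEOUT.search(line):
            timeouts += 1
    return failed, timeouts


# A few lines copied from the log in this thread.
sample = [
    "Jan  6 04:43:46 server1 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jan  6 04:43:46 server1 kernel: ata1.00: failed command: WRITE DMA",
    "Jan  6 04:43:46 server1 kernel:         res 40/00:01:06:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)",
    "Jan  6 04:47:09 server1 kernel: ata1.00: failed command: WRITE DMA",
    "Jan  6 04:47:09 server1 kernel:         res 40/00:01:06:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)",
]

failed, timeouts = summarize_ata_errors(sample)
for (device, command), count in failed.items():
    print(f"{device}: {count} failed {command} command(s)")
print(f"timeouts: {timeouts}")
```

On a real server the same function could be fed the lines of the messages log instead of `sample`. A steadily growing count on one device is the kind of evidence worth passing on to the host.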
 

heusdens

Thanks

I agree with you. I've been moved over to another machine and everything is running smoothly now.

One other thing they picked up was that I was running a debug version of the kernel. I actually upgraded the kernel yesterday, which was after these issues started, so it was not the primary cause. Security Advisor in cPanel recommended I upgrade to 2.6.32-431.3.1.el6, which I did. Hosting support changed it to the "normal" kernel in the bootloader.

How do I ensure that the debug version is not selected on future reboots? How does one tell the debug and normal versions apart and make sure the bootloader is configured properly?
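
One possible way to check this, offered as a sketch under stated assumptions rather than a definitive answer: on CentOS 6 the debug kernel build normally carries a `.debug` suffix in its release string, and the legacy GRUB configuration lists entries under `title` lines with a `default=N` index. The helper names and the sample configuration below (including `root=/dev/sda1`) are hypothetical and only illustrate the idea.

```python
import platform
import re


def is_debug_kernel(release):
    """Heuristic (assumption): debug builds end in '.debug' or '+debug'."""
    return release.endswith(".debug") or release.endswith("+debug")


def default_grub_kernel(grub_conf_text):
    """Return the kernel path of the default entry in a legacy GRUB config.

    Only numeric 'default=N' values are handled; anything else falls back to
    entry 0. Returns None if the default entry has no kernel line.
    """
    default_index = 0
    kernels = []  # one slot per "title" entry, in file order
    for raw in grub_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"default\s*=?\s*(\d+)$", line)
        if match:
            default_index = int(match.group(1))
        elif line.startswith("title"):
            kernels.append(None)
        elif line.startswith("kernel") and kernels:
            parts = line.split()
            if len(parts) > 1:
                kernels[-1] = parts[1]
    if default_index >= len(kernels):
        return None
    return kernels[default_index]


# Hypothetical grub.conf contents, for illustration only.
SAMPLE = """\
default=0
timeout=5
title CentOS (2.6.32-431.3.1.el6.x86_64.debug)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-431.3.1.el6.x86_64.debug ro root=/dev/sda1
    initrd /initramfs-2.6.32-431.3.1.el6.x86_64.debug.img
title CentOS (2.6.32-431.3.1.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-431.3.1.el6.x86_64 ro root=/dev/sda1
    initrd /initramfs-2.6.32-431.3.1.el6.x86_64.img
"""

default_kernel = default_grub_kernel(SAMPLE)
print("default entry boots:", default_kernel)
print("default entry is a debug kernel:", is_debug_kernel(default_kernel))
print("running kernel is a debug kernel:", is_debug_kernel(platform.release()))
```

On the server itself the text would come from the real bootloader configuration (typically `/boot/grub/grub.conf` on CentOS 6) instead of `SAMPLE`. If the default entry turns out to be the debug build, changing the `default=` index to the normal entry, or removing the debug kernel package if it is not needed, should keep it from being selected on reboot.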