Hi all,
We have been running an AWS server (t2.2xlarge) with CentOS 7 since October 2019. We use Cloudflare (and while trying to resolve this issue we enabled Cloudflare DDoS protection, just to rule that out).
Most of the time the site seems to run fine, with load averages of: 0.51 0.51 0.50
The site gets 700-900k page views a day, and each day it peaks at 2500-4000 active users on the site (from analytics). As mentioned, the server seems to handle these numbers fine, so we do not think the problem is the server failing to handle the load.
Our server has crashed twice, and we haven't been able to work out the reason. The first time was in Sept 2020, and I asked about it here
Yesterday we had the same problem. It looks like this was causing the issue:
Code:
/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Our load averages went to: 1120.90 911.14 570.19
We tried restarting the server, which did nothing: the site came back, then it started hanging/crashing again. We ran yum update, which also did not help.
We then tried /scripts/upcp --force, which we think started to resolve the issue after we restarted Apache, MySQL, etc. in WHM; we then restarted the server and ran yum clean all and rpm --rebuilddb. We then just kept killing all new instances of mysqld.pid as they occurred.
Once the server was back, we started looking for what went wrong. So far we have found nothing.
Nothing stands out in the error_log, and all the other logs we've checked just show normal, typical entries.
The only thing we can find is in our mysqld.log. On both dates when the site crashed, we found this:
Code:
2020-11-20T18:55:00.531148Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 2808360152
2020-11-20T18:55:00.531156Z 0 [Note] InnoDB: Database was not shutdown normally!
2020-11-20T18:55:00.531162Z 0 [Note] InnoDB: Starting crash recovery.
But we cannot find why it crashed.
The site crashed at about 16:06 UTC. That morning we had an email from the server saying:
Code:
Cron <[email protected]> /usr/bin/kcarectl -q --auto-update
sysctl: cannot stat /proc/sys/fs/enforce_symlinksifowner: No such file or directory
sysctl: cannot stat /proc/sys/fs/symlinkown_gid: No such file or directory
But we cannot see whether the site tried to update again around the time it went down. We are thinking that something trying to update or run automatically on the server could be crashing the site?
If anyone can give us any help, we would be really grateful.
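P.S. To make the "something ran automatically" idea easier to check, here is a minimal sketch of how the cron log could be filtered to a window around the crash. It is only an illustration under assumptions: cron_window is a hypothetical helper name, the log is assumed to use the stock syslog layout "Mon DD HH:MM:SS host CROND[pid]: ..." (as in /var/log/cron on CentOS 7), and the log timestamps are assumed to be in the server's local time zone, which may not be UTC.

```shell
#!/bin/bash
# Hypothetical helper: print the log entries for one day whose time
# falls between two HH:MM values (inclusive).
# Assumes lines start with "Mon DD HH:MM:SS", e.g. "Nov 20 16:01:01 ...".
cron_window() {
    local log="$1" month="$2" day="$3" from="$4" to="$5"
    awk -v m="$month" -v d="$day" -v from="$from" -v to="$to" '
        # Match the month name exactly and the day numerically,
        # so " 5" and "05" are treated the same.
        $1 == m && $2 + 0 == d + 0 {
            hhmm = substr($3, 1, 5)      # keep only HH:MM
            if (hhmm >= from && hhmm <= to) print
        }' "$log"
}
```

For example, `cron_window /var/log/cron Nov 20 15:50 16:10` would list whatever cron started in the 20 minutes around 16:06, taking the Nov 20 date from the mysqld.log excerpt purely as an example. If the server clock is not on UTC, the window would need shifting accordingly.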