JohnnyBgood

Member
Feb 6, 2015
cPanel Access Level
Root Administrator
Hi all,

We are running a popular site on an AWS t2.2xlarge instance with 32 GB of memory.

The server loads all look fine, but once active users pass about 2,300 the site becomes very slow. It is fine just under 2,300 (it usually sits at around 2,000 most of the day), but when it climbs to 2,500+ the site nearly stops.

We can't see anything in WHM that shows where the problem is, such as a particular page causing it.

Looking in WHM > Server Status > Daily Process Log, I see this:

User     Domain   % CPU   % MEM   MySQL Processes
nobody   -        5.38    57.56   0.0


Any advice on how to track down where the memory is being used would be great! (AWS doesn't let us add more RAM to this instance; the only option would be upgrading to a larger instance, if this is simply the typical memory usage for 2,500 concurrent users.)
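A minimal Python sketch for seeing which processes hold the memory, assuming a Linux host where `ps -eo comm=,rss=` prints one command name and resident set size (in KiB) per line. The function name `rss_by_command` is hypothetical. On cPanel servers Apache usually runs as `nobody`, so the 57.56% row above most likely belongs to the Apache workers; grouping by command name is one way to confirm that. Bear in mind RSS counts shared pages once per process, so totals for forked servers overstate real usage.

```python
from collections import defaultdict


def rss_by_command(ps_output: str) -> list[tuple[str, int, int]]:
    """Sum resident memory per command from `ps -eo comm=,rss=` output.

    Returns (command, process_count, total_rss_kib) rows, largest total first.
    RSS double-counts shared pages, so treat totals as an upper bound.
    """
    totals: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # The command name may contain spaces; RSS is always the last column.
        command, rss = " ".join(parts[:-1]), parts[-1]
        if not rss.isdigit():
            continue
        totals[command] += int(rss)
        counts[command] += 1
    return sorted(
        ((cmd, counts[cmd], totals[cmd]) for cmd in totals),
        key=lambda row: row[2],
        reverse=True,
    )
```

For example, save the output with `ps -eo comm=,rss= > ps.txt` during peak time and pass the file contents to `rss_by_command`; dividing an entry's total by its process count gives a rough per-process figure.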

Thanks again,

rackaid

Well-Known Member
Jan 18, 2003
Jacksonville, FL
cPanel Access Level
DataCenter Provider
You will likely not be able to track down the problem with WHM stats alone. Use CloudWatch metrics and sysstat (sar) on your server.

Here are a couple of areas to check.

CPU Steal:
If you run "sar" on your server, check whether the %steal column is high. t2 instances use burstable CPU credits. If you exhaust your credits, the instance is throttled to its baseline CPU performance. If your application is CPU-bound, this could cause the behavior you are seeing.
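If `sar` isn't available, roughly the same %steal figure can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. This is a minimal Python sketch assuming a Linux guest; `cpu_steal_percent` and `read_cpu_line` are hypothetical helper names. The guest and guest_nice fields are left out of the total because the kernel already counts them inside user and nice.

```python
def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (the first line of /proc/stat on Linux)."""
    with open(path) as f:
        return f.readline()


def cpu_steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples.

    Field order after the 'cpu' label:
    user nice system idle iowait irq softirq steal guest guest_nice
    """
    def parse(line: str) -> list[int]:
        fields = line.split()
        if not fields or fields[0] != "cpu" or len(fields) < 9:
            raise ValueError("expected the aggregate 'cpu' line with a steal field")
        return [int(x) for x in fields[1:]]

    a, b = parse(before), parse(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas[:8])  # user through steal; guest time is already in user/nice
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total
```

For example, call `read_cpu_line()`, wait with `time.sleep(5)`, call it again, and pass both lines to `cpu_steal_percent` while the site is under peak load.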

We had a discussion forum exhibit this problem. The site would run fine, but once traffic spikes happened, the system would grind to a halt. We noticed very high %steal during these periods.

If you find you are CPU-bound, consider switching to the C5 instance types. These do not use burstable CPU credits, so CPU-bound problems are easier to debug. We found that we could actually reduce our RAM requirements, because the C5 instances had higher throughput (req/sec). Each request finished sooner, so concurrency dropped, and the reduced concurrency meant we needed less RAM.
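The throughput argument is Little's Law: the average number of requests in flight is L = λ × W, where λ is the arrival rate (req/sec) and W is the average time each request takes. A back-of-envelope Python sketch, assuming a steady state and a hypothetical per-worker memory figure; it estimates the average, not the peak, so real sizing needs headroom on top.

```python
import math


def expected_concurrency(requests_per_sec: float, avg_response_sec: float) -> float:
    """Little's Law: average number of requests in flight, L = lambda * W."""
    return requests_per_sec * avg_response_sec


def workers_and_ram(requests_per_sec: float, avg_response_sec: float,
                    mib_per_worker: float) -> tuple[int, float]:
    """Average busy workers (rounded up) and the RAM they would hold.

    mib_per_worker is an assumed average RSS per worker; measure your own.
    """
    workers = math.ceil(expected_concurrency(requests_per_sec, avg_response_sec))
    return workers, workers * mib_per_worker
```

With the same 100 req/sec (a made-up figure), cutting the average response time from 0.5 s to 0.25 s halves the busy workers from 50 to 25; at a hypothetical 60 MiB per worker that is 3,000 MiB versus 1,500 MiB.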

Concurrency
You may be hitting maximum connection limits in Apache or PHP-FPM (or whatever you are using). I recommend checking the Apache status page for requests/sec, and looking for connections stuck in the W (Sending Reply) state. Those can fill up the available worker slots and stop the site responding even when the load is low. Your Apache error log will tell you if you are hitting these limits.
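A minimal Python sketch for reading the Apache status page in bulk, assuming mod_status is enabled and its machine-readable form is reachable at `/server-status?auto` (for example with `curl -s http://localhost/server-status?auto`). The function names are hypothetical; the legend follows the mod_status scoreboard key. A large share of slots in the W state relative to the others is the pattern to watch for.

```python
from collections import Counter

# Scoreboard legend from Apache mod_status.
SCOREBOARD_LEGEND = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def parse_status_auto(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of /server-status?auto output."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scoreboard_counts(text: str) -> Counter:
    """Count worker slots per scoreboard state (e.g. how many are in 'W')."""
    return Counter(parse_status_auto(text).get("Scoreboard", ""))


def busy_slots(counts: Counter) -> int:
    """Slots that are neither waiting ('_') nor unused ('.')."""
    return sum(counts.values()) - counts["_"] - counts["."]


def describe(counts: Counter) -> list[str]:
    """Human-readable state counts, most common first."""
    return [f"{SCOREBOARD_LEGEND.get(s, s)}: {n}" for s, n in counts.most_common()]
```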

App Stack?
What is the application stack you are running?
 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
Houston

JohnnyBgood

Member
Feb 6, 2015
cPanel Access Level
Root Administrator
Thanks so much for the detailed reply @rackaid - you gave some great things to try there.

It is our peak time right now. I changed our Apache settings to:

Server Limit (Maximum: 20,000): 4000
Max Request Workers: 2000
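Raising MaxRequestWorkers only helps while every busy worker still fits in RAM. A rough Python check, assuming the t2.2xlarge's 32 GiB and a hypothetical amount reserved for MySQL and the OS: with 2000 workers and nothing reserved, each worker could average only 32768 / 2000 ≈ 16.4 MiB if all slots filled up. Whether that is enough depends on the per-worker RSS you actually measure (e.g. with `ps -eo comm=,rss=`); the function names are hypothetical.

```python
def per_worker_budget_mib(total_ram_gib: float, reserved_gib: float,
                          max_request_workers: int) -> float:
    """Average RSS (MiB) each worker may use if every MaxRequestWorkers slot is busy."""
    return (total_ram_gib - reserved_gib) * 1024 / max_request_workers


def max_workers_that_fit(total_ram_gib: float, reserved_gib: float,
                         avg_worker_mib: float) -> int:
    """Largest MaxRequestWorkers value whose workers fit in the remaining RAM."""
    return int((total_ram_gib - reserved_gib) * 1024 // avg_worker_mib)
```

For instance, if MySQL and the OS need 8 GiB (an assumed figure) and workers average 40 MiB, `max_workers_that_fit(32, 8, 40)` returns 614.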

This has allowed the server to take the current 2500 concurrent users, but our apache status shows:

Code:
Total accesses: 8504 - Total Traffic: 181.6 MB - Total Duration: 848694
CPU Usage: u133.41 s26.83 cu129.23 cs98.9 - 191% CPU load
41.9 requests/sec - 0.9 MB/second - 21.9 kB/request - 99.7994 ms/request
173 requests currently being processed, 15 idle workers
Our sar %steal is still only 0.17%, and the load averages are 1.92, 1.82, 1.61.


In anyone's opinion, is this a PHP code problem, a server configuration problem, or do we just need to move to a larger AWS instance?

Thanks so much.

rackaid

Well-Known Member
Jan 18, 2003
Jacksonville, FL
cPanel Access Level
DataCenter Provider
You need to check all systems involved in your application. This likely means MySQL, PHP-FPM (FastCGI), and Apache.

If you are hitting Apache's connection limit (MaxRequestWorkers), you should see an error saying so in your Apache error log. In MySQL, compare the peak connection usage (Max_used_connections) against max_connections. If you are using PHP-FPM/FastCGI, it has similar limits (such as pm.max_children) and logs similar warnings.
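A minimal Python sketch that counts the warnings Apache and PHP-FPM log when they run out of workers, plus the MySQL peak-usage ratio. The two message fragments are the ones Apache 2.4 (`server reached MaxRequestWorkers setting`) and PHP-FPM (`server reached pm.max_children setting`) write; the function names are hypothetical, and log file locations vary by setup. For MySQL, the inputs come from `SHOW GLOBAL STATUS LIKE 'Max_used_connections'` and `SHOW VARIABLES LIKE 'max_connections'`.

```python
import re
from typing import Iterable

# Message fragments logged when each limit is reached.
LIMIT_PATTERNS = {
    "apache_max_request_workers": re.compile(r"server reached MaxRequestWorkers setting"),
    "php_fpm_max_children": re.compile(r"server reached pm\.max_children setting"),
}


def count_limit_hits(lines: Iterable[str]) -> dict[str, int]:
    """Count log lines showing that a worker/connection limit was hit."""
    hits = {name: 0 for name in LIMIT_PATTERNS}
    for line in lines:
        for name, pattern in LIMIT_PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


def mysql_peak_connection_ratio(max_used_connections: int, max_connections: int) -> float:
    """Peak connections since MySQL started, as a fraction of max_connections."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return max_used_connections / max_connections
```

An open log file can be passed straight to `count_limit_hits`, since iterating a file yields its lines. A MySQL ratio at or near 1.0 means the connection ceiling has been reached at some point since the last restart.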

If you suspect a resource problem, a quick test is simply to add more resources (try a c5.2xlarge instance).

If you continue to see poor performance, then you really have to dig into the system to find the problem.