Internal Server Error on CloudLinux

bejbi

Well-Known Member
PartnerNOC
Jan 20, 2006
153
27
178
Poland
cPanel Access Level
DataCenter Provider
I'm testing CloudLinux with cPanel.

There is only one site on my server. Today that site started returning an Internal Server Error.

My Apache status looks like this:

Current Time: Thursday, 29-Mar-2012 16:58:45 CEST
Restart Time: Wednesday, 28-Mar-2012 09:53:56 CEST
Parent Server Generation: 18
Server uptime: 1 day 7 hours 4 minutes 49 seconds
Total accesses: 21481 - Total Traffic: 170.0 MB
CPU Usage: u1.21 s.51 cu0 cs0 - .00154% CPU load
.192 requests/sec - 1593 B/second - 8.1 kB/request
1 requests currently being processed, 124 idle workers

_________________________.......................................
_________________________.......................................
_________________________.......................................
_________________________.......................................
________W________________.......................................
................................................................
................................................................
................................................................

So all the processes are in the "Waiting for Connection" state.

There is only one connection and nothing more. Which limit could be responsible?

There are many PHP processes:

220311 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
226647 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
228701 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
237433 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
239140 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
245895 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
251575 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
255216 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
255369 (Trace) (Kill) smart 0 0.0 0.4 /usr/bin/php
187780 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
188442 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
191328 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
193113 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
193448 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
195743 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
201294 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
207729 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
212233 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
212580 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
221320 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
223370 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
234052 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
236968 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
252737 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
260328 (Trace) (Kill) smart 0 0.0 0.3 /usr/bin/php
236501 (Trace) (Kill) smart 0 0.0 0.2 /usr/bin/php

When I kill one of the PHP processes owned by this user, the web page works correctly.
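
To see at a glance how many PHP processes each account is holding, the `ps` output can be grouped by user. This is only a minimal sketch, not a CloudLinux tool: `count_php_by_user` is a hypothetical helper name, and it assumes the PHP binary appears under the command name `php`, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: reads "USER COMMAND" pairs on stdin and prints
# "<user> <count>" for every user that owns at least one php process.
count_php_by_user() {
    awk '$2 == "php" { n[$1]++ } END { for (u in n) print u, n[u] }' | sort
}

# Usage on a live server (ps prints one "user command-name" pair per process):
#   ps -eo user=,comm= | count_php_by_user
```

A count that keeps growing while the site is idle would point to processes that are never reaped rather than to real traffic.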

WHM 11.30.6 (build 6) [TRIAL]
CLOUDLINUX 6.2 x86_64 standard on michael

I have:
Server Version: Apache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/1.0.0-fips mod_bwlimited/1.4 mod_fcgid/2.3.6
Server Built: Mar 21 2012 14:33:15

Why does this happen? It seems very unstable for production. When does Apache get stuck like that?

WB
 

bejbi

root@michael [/etc/httpd/domlogs/smart]# uname -a
Linux michael.smarthost.pl 2.6.32-231.21.1.lve0.9.18.1.x86_64 #1 SMP Thu Jan 5 06:59:41 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

===========

root@michael [/etc/httpd/domlogs/smart]# lveinfo --by-fault=mem
ID aCPU mCPU lCPU aEP mEP lEP aMem mMem lMem MemF MepF

(but right now nothing is listed)

==========

root@michael [/etc/httpd/domlogs/smart]# lveps -p
ID EP PNO PID TNO TID CPU MEM I/O
bofh 0 4 --- 4 --- 80 131240 0
--- --- 253180 --- 253180 0 6472 N/A
--- --- 232341 --- 232341 0 6780 N/A
--- --- 222889 --- 222889 0 6464 N/A
--- --- 199842 --- 199842 0 6416 N/A
daniel 0 1 --- 1 --- 12 53276 0
--- --- 252321 --- 252321 0 19344 N/A
smart 0 27 --- 27 --- 3391002348 0
--- --- 260920 --- 260920 0 14424 N/A
--- --- 260328 --- 260328 0 13112 N/A
--- --- 255369 --- 255369 2 12988 N/A
--- --- 255216 --- 255216 1 16848 N/A
--- --- 252737 --- 252737 0 14252 N/A
--- --- 251575 --- 251575 3 13064 N/A
--- --- 245895 --- 245895 5 14932 N/A
--- --- 239140 --- 239140 1 14704 N/A
--- --- 237433 --- 237433 0 17080 N/A
--- --- 236968 --- 236968 0 14188 N/A
--- --- 236501 --- 236501 0 10072 N/A
--- --- 234052 --- 234052 0 13820 N/A
--- --- 228701 --- 228701 5 14144 N/A
--- --- 226647 --- 226647 2 12808 N/A
--- --- 223370 --- 223370 0 13732 N/A
--- --- 221320 --- 221320 0 13996 N/A
--- --- 220311 --- 220311 0 16796 N/A
--- --- 212580 --- 212580 0 17520 N/A
--- --- 212233 --- 212233 0 16016 N/A
--- --- 207729 --- 207729 1 17788 N/A
--- --- 201294 --- 201294 0 16948 N/A
--- --- 195743 --- 195743 2 17512 N/A
--- --- 193448 --- 193448 0 16868 N/A
--- --- 193113 --- 193113 0 17840 N/A
--- --- 191328 --- 191328 0 17384 N/A
--- --- 188442 --- 188442 1 18592 N/A
--- --- 187780 --- 187780 0 14544 N/A
 

SoR

Registered
Apr 17, 2008
1
0
51
Hello,

Yes, we are using mod_fcgid. Here is the output of your command:
lveinfo --by-fault=mem --period=2d

ID   aCPU  mCPU  lCPU  aEP  mEP  lEP  aMem  mMem  lMem  MemF  MepF
502  0     1     12    0    2    20   466K  1.0G  1.0G  19    0
 

bejbi

Now I have the error again!

The problem is with the user "smart". The website is www.smarthost.pl.

Here are your tests, run while the error is occurring:

root@michael [/home/wojtek]# lveinfo --by-fault=mem --period=2d
ID   aCPU  mCPU  lCPU  aEP  mEP  lEP  aMem  mMem  lMem  MemF  MepF
502  0     1     12    0    2    20   658K  1.0G  1.0G  21    0

root@michael [/home/wojtek]# lveps -p
ID EP PNO PID TNO TID CPU MEM I/O
bofh 0 4 --- 4 --- 80 131240 0
--- --- 253180 --- 253180 0 6472 N/A
--- --- 232341 --- 232341 0 6780 N/A
--- --- 222889 --- 222889 0 6464 N/A
--- --- 199842 --- 199842 0 6416 N/A
daniel 0 1 --- 1 --- 12 53276 0
--- --- 252321 --- 252321 0 19344 N/A
smart 0 27 --- 27 --- 3501001692 0
--- --- 267769 --- 267769 0 13580 N/A
--- --- 260328 --- 260328 0 13112 N/A
--- --- 255369 --- 255369 2 12988 N/A
--- --- 255216 --- 255216 1 16848 N/A
--- --- 252737 --- 252737 0 14252 N/A
--- --- 251575 --- 251575 3 13064 N/A
--- --- 245895 --- 245895 5 14932 N/A
--- --- 239140 --- 239140 1 14704 N/A
--- --- 237433 --- 237433 0 17080 N/A
--- --- 236968 --- 236968 0 14188 N/A
--- --- 236501 --- 236501 0 10072 N/A
--- --- 234052 --- 234052 0 13820 N/A
--- --- 228701 --- 228701 5 14144 N/A
--- --- 226647 --- 226647 2 12808 N/A
--- --- 223370 --- 223370 0 13732 N/A
--- --- 221320 --- 221320 0 13996 N/A
--- --- 220311 --- 220311 0 16796 N/A
--- --- 212580 --- 212580 0 17520 N/A
--- --- 212233 --- 212233 0 16016 N/A
--- --- 207729 --- 207729 1 17788 N/A
--- --- 201294 --- 201294 0 16948 N/A
--- --- 195743 --- 195743 2 17512 N/A
--- --- 193448 --- 193448 0 16868 N/A
--- --- 193113 --- 193113 0 17840 N/A
--- --- 191328 --- 191328 0 17384 N/A
--- --- 188442 --- 188442 1 18592 N/A
--- --- 187780 --- 187780 0 14544 N/A
 

bejbi

LVE Manager shows something like this:

LVE Id      User    Domain                   Concurrent Connections  Processes  Threads  CPU %      Memory
509 (?)(c)  bofh    bofh.smarthost.pl        0                       5          5        0%         168.0M
505 (?)(c)  daniel  testdaniel.smarthost.pl  0                       1          1        0%         52.0M
502 (?)(c)  smart   smarthost.pl             0                       27         27       0%1001692  0K

The CPU % column shows "0%1001692", which looks suspicious.

Could Apache be getting stuck because the default LVE setting "persist" is checked "on"?
 

bejbi

I have many servers with many accounts (more than 300-500 on each). I'm testing CloudLinux now, so this test server hosts only one domain for the moment. I can't create customer accounts on an unstable or untested system. What would I tell a customer who sees an Internal Server Error on a small website? For now I'm only running tests; nothing on this server is in production.
 

bejbi

Yes, I'm using FastCGI compiled into Apache by EasyApache. It is probably not caused by memory limits: when I got the Internal Server Error, nobody was viewing my site.

But I may have an answer. It is probably a bug in the CloudLinux memory limit. I'm using the APC module for PHP, and I have now found this in the logs:
PHP Fatal error: PHP Startup: apc_mmap: mmap failed: in Unknown on line 0
So I will try removing APC and see what happens. Could you also check whether the APC module breaks the LVE limit? Or maybe the problem is in APC itself?
 

iseletsk

Well-Known Member
Verified Vendor
It is the memory limit, as shown by this:
lveinfo --by-fault=mem --period=2d

ID   aCPU  mCPU  lCPU  aEP  mEP  lEP  aMem  mMem  lMem  MemF  MepF
502  0     1     12    0    2    20   466K  1.0G  1.0G  19    0

And it is not a bug.

mod_fcgid by default allows 100 processes per site, and they just keep accumulating. This is also visible in lveps -p: you have a bunch of them there, alive and doing nothing.

Add the following to your Apache config:
FcgidMaxProcessesPerClass 8
FcgidMinProcessesPerClass 0
FcgidIdleScanInterval 60
FcgidIdleTimeout 120
FcgidProcessLifeTime 240

That will optimize your mod_fcgid settings and should prevent the user from hitting the memory limit all the time.
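
After editing the configuration, it can help to confirm which value a given file actually sets. The sketch below only scans one file for a directive name and does not follow Include directives. `fcgid_directive_value` is a hypothetical helper, and the path in the usage comment is an assumption about a cPanel layout, so adjust it to your server.

```shell
#!/bin/sh
# Hypothetical helper: print the last value set for a directive in one config file.
# Usage: fcgid_directive_value <directive> <file>
fcgid_directive_value() {
    # Skip comment lines; match names case-insensitively, as Apache does.
    awk -v d="$1" '
        /^[ \t]*#/ { next }
        tolower($1) == tolower(d) { v = $2 }
        END { if (v != "") print v }
    ' "$2"
}

# Example (the path is an assumption, not taken from this thread):
#   fcgid_directive_value FcgidMaxProcessesPerClass /usr/local/apache/conf/httpd.conf
```

Running `apachectl configtest` before restarting Apache is a reasonable extra check that the new directives parse cleanly.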