Please whitelist cPanel in your adblocker so that you’re able to see our version release promotions, thanks!

The Community Forums

Interact with an entire community of cPanel & WHM users!

Apache restart loop & stuck PHP-FPM processes

Discussion in 'EasyApache' started by APatchworkBoy, Aug 24, 2018.

  1. APatchworkBoy

    APatchworkBoy Active Member

    Joined:
    Feb 26, 2018
    Messages:
    25
    Likes Received:
    7
    Trophy Points:
    3
    Location:
    West Yorkshire, UK
    cPanel Access Level:
    Root Administrator
    Morning folks... last night at 23:03 httpd decided to restart itself, and ended up in a loop.

    Code:
    [Fri Aug 24 07:12:57.922269 2018] [mpm_worker:notice] [pid 25315:tid 139892578699232] AH00297: SIGUSR1 received.  Doing graceful restart
    
    [Fri Aug 24 07:12:58.520698 2018] [mpm_worker:warn] [pid 25315:tid 139892578699232] AH00317: MaxRequestWorkers of 80 is not an integer multiple of ThreadsPerChild of 25, decreasing to nearest multiple 75
    
    [Fri Aug 24 07:12:58.554426 2018] [http2:info] [pid 25315:tid 139892578699232] AH03090: mod_http2 (v1.10.20, feats=CHPRIO+SHA256+INVHD+DWINS, nghttp2 1.32.0), initializing...
    
    [Fri Aug 24 07:12:58.555398 2018] [mpm_worker:notice] [pid 25315:tid 139892578699232] AH00292: Apache/2.4.34 (cPanel) OpenSSL/1.0.2o mod_bwlimited/1.4 configured -- resuming normal operations
    
    [Fri Aug 24 07:12:58.555434 2018] [core:notice] [pid 25315:tid 139892578699232] AH00094: Command line: '/usr/sbin/httpd'
    
    [Fri Aug 24 07:17:58.963914 2018] [mpm_worker:notice] [pid 25315:tid 139892578699232] AH00297: SIGUSR1 received.  Doing graceful restart
    
    [Fri Aug 24 07:17:59.469715 2018] [mpm_worker:warn] [pid 25315:tid 139892578699232] AH00317: MaxRequestWorkers of 80 is not an integer multiple of ThreadsPerChild of 25, decreasing to nearest multiple 75
    
    [Fri Aug 24 07:17:59.498680 2018] [http2:info] [pid 25315:tid 139892578699232] AH03090: mod_http2 (v1.10.20, feats=CHPRIO+SHA256+INVHD+DWINS, nghttp2 1.32.0), initializing...
    
    [Fri Aug 24 07:17:59.499476 2018] [mpm_worker:notice] [pid 25315:tid 139892578699232] AH00292: Apache/2.4.34 (cPanel) OpenSSL/1.0.2o mod_bwlimited/1.4 configured -- resuming normal operations
    
    [Fri Aug 24 07:17:59.499506 2018] [core:notice] [pid 25315:tid 139892578699232] AH00094: Command line: '/usr/sbin/httpd'
    
    [Fri Aug 24 07:23:00.332397 2018] [mpm_worker:notice] [pid 25315:tid 139892578699232] AH00297: SIGUSR1 received.  Doing graceful restart
    
    A quick ps -aux revealed a ton of php_fpm processes all for the same user. Running /script/restartsrv_apache_php_fpm suddenly allowed httpd to come up and stay up, and as expected the php_fpm processes all vanished down to the usual 4 and all has been fine since.

    A check of /opt/cpanel/ea-php71/root/usr/var/log/php-fpm/error.log shows php-fpm reaching max_children (40) at 20:41 last night, then no further log entries until I restart php-fpm this morning at 08:04 to allow httpd to recover.

    Code:
    [23-Aug-2018 19:45:48] NOTICE: [pool doma_in] child 6530 started
    [23-Aug-2018 19:49:51] NOTICE: [pool doma_in] child 6444 exited with code 0 after 369.456144 seconds from start
    [23-Aug-2018 19:49:51] NOTICE: [pool doma_in] child 6663 started
    [23-Aug-2018 19:50:48] NOTICE: [pool doma_in] child 6468 exited with code 0 after 380.832191 seconds from start
    [23-Aug-2018 19:50:48] NOTICE: [pool doma_in] child 6711 started
    [23-Aug-2018 19:51:48] NOTICE: [pool doma_in] child 6526 exited with code 0 after 386.958746 seconds from start
    [23-Aug-2018 19:51:48] NOTICE: [pool doma_in] child 6737 started
    [23-Aug-2018 19:52:18] NOTICE: [pool doma_in] child 6530 exited with code 0 after 389.997204 seconds from start
    [23-Aug-2018 19:52:18] NOTICE: [pool doma_in] child 6761 started
    [23-Aug-2018 19:56:20] NOTICE: [pool doma_in] child 6663 exited with code 0 after 388.938428 seconds from start
    [23-Aug-2018 19:56:20] NOTICE: [pool doma_in] child 6891 started
    [23-Aug-2018 19:57:11] NOTICE: [pool doma_in] child 6711 exited with code 0 after 383.051672 seconds from start
    [23-Aug-2018 19:57:11] NOTICE: [pool doma_in] child 6916 started
    [23-Aug-2018 19:58:12] NOTICE: [pool doma_in] child 6737 exited with code 0 after 384.241633 seconds from start
    [23-Aug-2018 19:58:12] NOTICE: [pool doma_in] child 6944 started
    [23-Aug-2018 19:58:48] NOTICE: [pool doma_in] child 6761 exited with code 0 after 389.961665 seconds from start
    [23-Aug-2018 19:58:48] NOTICE: [pool doma_in] child 6954 started
    [23-Aug-2018 20:02:44] NOTICE: [pool doma_in] child 6891 exited with code 0 after 383.337425 seconds from start
    [23-Aug-2018 20:02:44] NOTICE: [pool doma_in] child 7135 started
    [23-Aug-2018 20:03:20] NOTICE: [pool doma_in] child 6916 exited with code 0 after 368.832361 seconds from start
    [23-Aug-2018 20:03:20] NOTICE: [pool doma_in] child 7168 started
    [23-Aug-2018 20:04:47] NOTICE: [pool doma_in] child 6944 exited with code 0 after 395.402790 seconds from start
    [23-Aug-2018 20:04:47] NOTICE: [pool doma_in] child 7220 started
    [23-Aug-2018 20:05:11] NOTICE: [pool doma_in] child 6954 exited with code 0 after 383.537759 seconds from start
    [23-Aug-2018 20:05:11] NOTICE: [pool doma_in] child 7255 started
    [23-Aug-2018 20:08:58] NOTICE: [pool doma_in] child 7135 exited with code 0 after 374.449430 seconds from start
    [23-Aug-2018 20:08:58] NOTICE: [pool doma_in] child 7338 started
    [23-Aug-2018 20:09:48] NOTICE: [pool doma_in] child 7168 exited with code 0 after 387.326123 seconds from start
    [23-Aug-2018 20:09:48] NOTICE: [pool doma_in] child 7392 started
    [23-Aug-2018 20:11:12] NOTICE: [pool doma_in] child 7220 exited with code 0 after 384.725260 seconds from start
    [23-Aug-2018 20:11:12] NOTICE: [pool doma_in] child 7470 started
    [23-Aug-2018 20:11:40] NOTICE: [pool doma_in] child 7255 exited with code 0 after 389.052525 seconds from start
    [23-Aug-2018 20:11:40] NOTICE: [pool doma_in] child 7477 started
    [23-Aug-2018 20:15:42] NOTICE: [pool doma_in] child 7338 exited with code 0 after 404.041951 seconds from start
    [23-Aug-2018 20:15:42] NOTICE: [pool doma_in] child 7809 started
    [23-Aug-2018 20:15:50] NOTICE: [pool doma_in] child 7392 exited with code 0 after 362.688153 seconds from start
    [23-Aug-2018 20:15:50] NOTICE: [pool doma_in] child 7812 started
    [23-Aug-2018 20:17:43] NOTICE: [pool doma_in] child 7470 exited with code 0 after 390.406430 seconds from start
    [23-Aug-2018 20:17:43] NOTICE: [pool doma_in] child 7861 started
    [23-Aug-2018 20:17:58] NOTICE: [pool doma_in] child 7477 exited with code 0 after 377.616078 seconds from start
    [23-Aug-2018 20:17:58] NOTICE: [pool doma_in] child 7865 started
    [23-Aug-2018 20:21:50] NOTICE: [pool doma_in] child 7809 exited with code 0 after 368.196082 seconds from start
    [23-Aug-2018 20:21:50] NOTICE: [pool doma_in] child 7996 started
    [23-Aug-2018 20:22:21] NOTICE: [pool doma_in] child 7812 exited with code 0 after 390.710030 seconds from start
    [23-Aug-2018 20:22:21] NOTICE: [pool doma_in] child 8024 started
    [23-Aug-2018 20:24:18] NOTICE: [pool doma_in] child 7861 exited with code 0 after 395.430634 seconds from start
    [23-Aug-2018 20:24:18] NOTICE: [pool doma_in] child 8081 started
    [23-Aug-2018 20:24:21] NOTICE: [pool doma_in] child 7865 exited with code 0 after 382.774186 seconds from start
    [23-Aug-2018 20:24:21] NOTICE: [pool doma_in] child 8083 started
    [23-Aug-2018 20:28:12] NOTICE: [pool doma_in] child 7996 exited with code 0 after 381.299786 seconds from start
    [23-Aug-2018 20:28:12] NOTICE: [pool doma_in] child 8212 started
    [23-Aug-2018 20:28:53] NOTICE: [pool doma_in] child 8024 exited with code 0 after 392.200477 seconds from start
    [23-Aug-2018 20:28:53] NOTICE: [pool doma_in] child 8219 started
    [23-Aug-2018 20:30:42] NOTICE: [pool doma_in] child 8083 exited with code 0 after 380.700935 seconds from start
    [23-Aug-2018 20:30:42] NOTICE: [pool doma_in] child 8309 started
    [23-Aug-2018 20:30:48] NOTICE: [pool doma_in] child 8081 exited with code 0 after 390.045318 seconds from start
    [23-Aug-2018 20:30:48] NOTICE: [pool doma_in] child 8313 started
    [23-Aug-2018 20:41:11] WARNING: [pool doma_in] server reached max_children setting (40), consider raising it
    [24-Aug-2018 08:04:29] NOTICE: Terminating ...
    [24-Aug-2018 08:04:29] NOTICE: exiting, bye-bye!
    [24-Aug-2018 08:04:29] NOTICE: fpm is running, pid 32041
    [24-Aug-2018 08:04:29] NOTICE: ready to handle connections
    [24-Aug-2018 08:04:42] NOTICE: Terminating ...
    [24-Aug-2018 08:04:42] NOTICE: exiting, bye-bye!
    [24-Aug-2018 08:04:42] NOTICE: fpm is running, pid 32302
    [24-Aug-2018 08:04:42] NOTICE: ready to handle connections
    
    The domain_in in question's own php-fpm error log in /home/user/log/doma_in.php.error.log reveals nothing over that duration of time.

    I _should_ have straced one of the processes, but didn't. There were 3x users who were sat on that website all night (which checks for notifications and other things periodically via ajax) and it would appear something got stuck somewhere, causing more and more child processes to be created til the server hit max_children. Eventually this took httpd down and chksrvd did it's job and attempted to restart itself, but it seems at no point did it attempt to restart php_fpm alongside so nothing could come back up without manual intervention.

    The actual question: Is there anyway to modify whatever routine chksrvd uses to restart apache to also restart apache_php_fpm at the same time, whilst I examine the code on the website in question to find out the actual cause...? This would be enough to prevent the problem taking down all sites on the server. ('PHP-FPM service for Apache' is enabled and monitored in Service Manager but it would appear that if service manager finds a need to restart apache, it _only_ restarts apache, not apache_php_fpm)

    I've noticed this issue of stuck php-fpm processes a few times with this particular server and site, mainly because it's also accessed from an internal network and if our IT guys reboot our proxy server (for instance) it causes the same problem for users with this particular web application sat open, until either they close and reopen their browser to restart their _session or I manually restart apache_php_fpm.
     
  2. cPanelLauren

    cPanelLauren Forums Analyst II Staff Member

    Joined:
    Nov 14, 2017
    Messages:
    5,815
    Likes Received:
    444
    Trophy Points:
    233
    Location:
    Houston
    cPanel Access Level:
    DataCenter Provider
    Hi @APatchworkBoy
    While in this instance it looks like your issue was related to the PHP-FPM user pool being maxed out I do think there is a further issue here as it shouldn't enter into a restart loop like this. I believe that this issue warrants more in-depth investigation and because of that can you please open a ticket using the link in my signature? Once open please reply with the Ticket ID here so that we can update this thread with the resolution once the ticket is resolved.


    Thanks!
     
    Stop hovering to collapse... Click to collapse... Hover to expand... Click to expand...
  3. APatchworkBoy

    APatchworkBoy Active Member

    Joined:
    Feb 26, 2018
    Messages:
    25
    Likes Received:
    7
    Trophy Points:
    3
    Location:
    West Yorkshire, UK
    cPanel Access Level:
    Root Administrator
    Evening @cPanelLauren (UK time)... ta for that - basic investigation = I suspect the root cause was an XMLRPC attack resulting in the segfault. Mod_security _was_ blocking this at the time so not sure why it resulted in segfaults (poss linked to opcache & php-fpm in use?). The IP has since been added as a deny rule in /etc/hosts.allow - details in the ticket. Ticket ID: 10200131

    Normal day-to-day use, the server rarely gets anywhere _near_ hitting maxchildren.
     
    #3 APatchworkBoy, Aug 28, 2018
    Last edited by a moderator: Aug 28, 2018
  4. cPanelLauren

    cPanelLauren Forums Analyst II Staff Member

    Joined:
    Nov 14, 2017
    Messages:
    5,815
    Likes Received:
    444
    Trophy Points:
    233
    Location:
    Houston
    cPanel Access Level:
    DataCenter Provider
    Hi @APatchworkBoy

    I'm glad you opened a ticket I just checked in on it and it seems that you may have identified a coding issue as the cause for the problem as well?

    Thanks!
     
    Stop hovering to collapse... Click to collapse... Hover to expand... Click to expand...
  5. APatchworkBoy

    APatchworkBoy Active Member

    Joined:
    Feb 26, 2018
    Messages:
    25
    Likes Received:
    7
    Trophy Points:
    3
    Location:
    West Yorkshire, UK
    cPanel Access Level:
    Root Administrator
    Sorry @cPanelLauren - been on leave for a week...!

    Combination of issues - maxrequestworkers too low combined with a possible coding error resulting in a PHP Session race condition causing maxrequestworkers to be hit. Am waiting for recurrence to identify for certain, although I think the session race condition’s now been mitigated so may never know (nothing concrete in the logs to indicate it _was_ a race, but as I’ve said: ordinarily this box gets nowhere near hitting it’s limits). My segfault references in earlier post weren’t segfaults - misdiagnosis at my end!

    Have tweaked config according to suggestions by (the ever-wonderful) Roman and will be monitoring from tomorrow when the server will be back under it’s 'usual' workload (educational establishment - all staff & students have been off for 6 weeks so server traffic was technically at it’s lowest when all this happened).
     
  6. cPanelLauren

    cPanelLauren Forums Analyst II Staff Member

    Joined:
    Nov 14, 2017
    Messages:
    5,815
    Likes Received:
    444
    Trophy Points:
    233
    Location:
    Houston
    cPanel Access Level:
    DataCenter Provider
    Hi @APatchworkBoy

    I'm glad to hear that they were able to get you situated, it's easiest to identify an issue when you're able to access the server and see what's going in real time.

    Thanks!
     
    Stop hovering to collapse... Click to collapse... Hover to expand... Click to expand...
  7. masterross

    masterross Well-Known Member

    Joined:
    Apr 7, 2004
    Messages:
    46
    Likes Received:
    3
    Trophy Points:
    158
    Hi @APatchworkBoy and @cPanelLauren,

    Last night I had similar problem.
    Httpd fell down and restarted few times but not the php_fpm and all site using it were down.
    Can you please share your tweaks?

    Thank you!
     
  8. cPanelLauren

    cPanelLauren Forums Analyst II Staff Member

    Joined:
    Nov 14, 2017
    Messages:
    5,815
    Likes Received:
    444
    Trophy Points:
    233
    Location:
    Houston
    cPanel Access Level:
    DataCenter Provider
    Hello @masterross

    Without knowing why it failed it's hard to provide advice. If you check the error logs as listed previously in this thread and post relevant errors we might be able to assist better
     
    Stop hovering to collapse... Click to collapse... Hover to expand... Click to expand...
Loading...

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice