SOS: Apache fails every few seconds

s_2_s

Well-Known Member
Aug 9, 2004
215
0
166
Hello,
I have a few servers where I deleted the domlogs folders because they had grown above 10-20 GB.
A few hours later I discovered that Apache fails every few seconds.

I then recreated domlogs and made sure it has the same permissions and owner:

# ll
total 76
drwxr-xr-x 15 root root 4096 Feb 5 22:00 ./
drwxr-xr-x 23 root root 4096 Jan 10 20:00 ../
drwxr-xr-x 2 root root 4096 Jan 21 00:33 bin/
drwxr-xr-x 2 root root 4096 Jan 21 01:04 cgi-bin/
drwxr-xr-x 11 root root 4096 Feb 5 22:35 conf/
drwxr-xr-x 9 root root 4096 Nov 23 14:02 conf_pre_ea3/
drwx--x--x 2 root wheel 12288 Feb 5 22:42 domlogs/
drwxr-xr-x 70 root wheel 12288 Feb 5 21:45 domlogsx/
drwxr-xr-x 4 root root 4096 Feb 4 17:06 htdocs/
drwxr-xr-x 3 root root 4096 Jan 21 00:33 icons/
drwxr-xr-x 3 root root 4096 Jan 21 00:33 include/
drwxr-xr-x 2 root root 4096 Jan 21 00:48 libexec/
drwxr-xr-x 2 root root 4096 Feb 5 22:48 logs/
drwxr-xr-x 4 root root 4096 Jan 21 00:33 man/
drwxr-xr-x 2 nobody nobody 4096 Jan 21 00:33 proxy/


Then I restarted Apache, and the problem repeated.

Then I commented out the
#CustomLog
#BytesLog
directives and restarted, but the problem is the same.

I rebuilt Apache and PHP several times with different versions, and that didn't help either.
I have the same problem on all the servers from which I deleted domlogs.

There is nothing in /var/log/messages.
In the Apache error log there is:

[Tue Feb 4 12:16:04 2008] [notice] caught SIGTERM, shutting down

Apache usually dies right after this line appears.
Please help me.
Waiting...
 

s_2_s

Thank you for your reply.
1- Yes, and their owner is root:root or root:user.
2- It dies within a few seconds.
3- No, I didn't, and I don't know how to either.

Waiting for your update.
 

s_2_s

I also saw this on one of the affected servers just before Apache died:

[Wed Feb 6 00:05:03 2008] [error] Bad pid (7747) in scoreboard slot 23
[Wed Feb 6 00:05:03 2008] [error] Bad pid (7809) in scoreboard slot 24
[Wed Feb 6 00:05:03 2008] [error] Bad pid (9910) in scoreboard slot 25
[Wed Feb 6 00:05:03 2008] [error] Bad pid (10105) in scoreboard slot 26
[Wed Feb 6 00:05:03 2008] [error] Bad pid (7461) in scoreboard slot 22
[Wed Feb 6 00:05:03 2008] [error] Bad pid (7747) in scoreboard slot 23
[Wed Feb 6 00:05:03 2008] [error] Bad pid (7809) in scoreboard slot 24
[Wed Feb 6 00:05:03 2008] [error] Bad pid (9910) in scoreboard slot 25
[Wed Feb 6 00:05:03 2008] [error] Bad pid (10105) in scoreboard slot 26
[Wed Feb 6 00:05:03 2008] [notice] caught SIGTERM, shutting down



now running your command
 

nyjimbo

Well-Known Member
Jan 25, 2003
1,134
1
168
New York
Are there any lines above the "caught SIGTERM, shutting down" notice in the error log that give you any more ideas? Can you manually turn up the debug level in apache.conf to LogLevel debug and see if more info comes out?
 

troxalias

Well-Known Member
Nov 21, 2001
96
0
306
Athens - Greece
Try to run the following command:

Code:
strace -o /tmp/lala -f /usr/local/apache/bin/httpd -DSSL
It will probably take some time until Apache dies. Then check the file /tmp/lala (especially the last 50-100 lines) for any files that could not be accessed, with either the error ENOENT or EACCES. I know it is not an easy way to debug this, but it's the best I can suggest. If you can compress /tmp/lala and post it as an attachment, that would be perfect.
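The check described above can be sketched as a small shell helper. This is only an illustration: it assumes the trace was written with strace -o to a file such as /tmp/lala, and the function name failed_file_access and its default of 100 lines are made up for this example, not part of strace.

```shell
# failed_file_access TRACE [LINES]
# Print the syscalls among the last LINES lines (default 100) of an strace
# output file that failed with ENOENT (no such file) or EACCES (permission
# denied). The function name and the default are illustrative only.
failed_file_access() {
    trace_file=$1
    lines=${2:-100}
    tail -n "$lines" "$trace_file" | grep -E '= -1 (ENOENT|EACCES) '
}
```

Usage would be something like failed_file_access /tmp/lala 100. It prints nothing (and returns non-zero) when no such failures are in that part of the trace.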
 

s_2_s

I could also catch this:
25821 bind(16, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)

It appears every once in a while, but Apache has not died yet.
 

troxalias

Is Apache already running? The EADDRINUSE error on bind() means that a process is already bound to TCP port 443. Try
Code:
lsof -i tcp |grep https
to check which process is bound to port 443.
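Filtering with grep https only works when lsof prints the port by service name. A hedged alternative is to ask lsof for the port number directly and summarise the listeners. The helper name listeners is made up for this sketch, and it assumes the usual lsof layout in which the first three columns are COMMAND, PID and USER.

```shell
# listeners: read `lsof -i` output on stdin and print "COMMAND PID USER"
# once for every process that holds a socket in the LISTEN state.
# The function name is illustrative only.
listeners() {
    awk '/\(LISTEN\)$/ { print $1, $2, $3 }' | sort -u
}

# Example (run as root to see sockets owned by other users):
#   lsof -n -P -i tcp:443 | listeners
```

If anything other than httpd shows up in that list, it is the process holding port 443.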
 

s_2_s

Now Apache has died again.

Attached is the lala file.

At the time of the Apache failure this reappeared in the error log:

[Wed Feb 6 00:10:07 2008] [error] Bad pid (28416) in scoreboard slot 16
[Wed Feb 6 00:10:07 2008] [error] Bad pid (22008) in scoreboard slot 18
[Wed Feb 6 00:10:07 2008] [error] Bad pid (28574) in scoreboard slot 19
[Wed Feb 6 00:10:07 2008] [error] Bad pid (22136) in scoreboard slot 21
[Wed Feb 6 00:10:07 2008] [error] Bad pid (22518) in scoreboard slot 22
[Wed Feb 6 00:10:07 2008] [error] Bad pid (23590) in scoreboard slot 23
[Wed Feb 6 00:10:07 2008] [error] Bad pid (23634) in scoreboard slot 25
[Wed Feb 6 00:10:07 2008] [error] Bad pid (23664) in scoreboard slot 26
[Wed Feb 6 00:10:07 2008] [error] Bad pid (28416) in scoreboard slot 16
[Wed Feb 6 00:10:07 2008] [error] Bad pid (22008) in scoreboard slot 18
[Wed Feb 6 00:10:07 2008] [error] Bad pid (28574) in scoreboard slot 19
[Wed Feb 6 00:10:07 2008] [error] Bad pid (22136) in scoreboard slot 21
[Wed Feb 6 00:10:07 2008] [error] Bad pid (22518) in scoreboard slot 22
[Wed Feb 6 00:10:07 2008] [error] Bad pid (23590) in scoreboard slot 23
[Wed Feb 6 00:10:07 2008] [error] Bad pid (23634) in scoreboard slot 25
[Wed Feb 6 00:10:07 2008] [error] Bad pid (23664) in scoreboard slot 26
[Wed Feb 6 00:10:07 2008] [notice] caught SIGTERM, shutting down
 


s_2_s

Thanks for your interest in helping me.

While httpd was down, nothing was listening on the https port.

After I restarted it, only httpd shows up on https:

httpd 27962 root 16u IPv4 29004321 TCP *:https (LISTEN)
httpd 28034 nobody 16u IPv4 29004321 TCP *:https (LISTEN)
httpd 28035 nobody 16u IPv4 29004321 TCP *:https (LISTEN)
httpd 28036 nobody 16u IPv4 29004321 TCP *:https (LISTEN)
httpd 28037 nobody 16u IPv4 29004321 TCP *:https (LISTEN)
httpd 28038 nobody 16u IPv4 29004321 TCP *:https (LISTEN)


Also, Apache dies only after [Wed Feb 6 00:10:07 2008] [notice] caught SIGTERM, shutting down
The error about listening on the SSL port does happen, but it doesn't kill Apache.
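Since Apache only exits after a SIGTERM notice, it may help to list when those notices were logged. The sketch below assumes the error log uses the bracketed timestamp format shown in this thread. The function name sigterm_times is made up for the example, and the log path is left as an argument because it is not stated here.

```shell
# sigterm_times LOG: print the timestamp of every "caught SIGTERM" notice
# in an Apache error log, one per line. The function name is illustrative.
sigterm_times() {
    grep -F 'caught SIGTERM' "$1" | sed -e 's/^\[\([^]]*\)].*/\1/'
}
```

Calling it as sigterm_times followed by the path to the error log prints one line per shutdown, which makes it easy to see how far apart the shutdowns are.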
 

troxalias

Can you please post the file again, but this time run Apache with the command below? Please make sure first that no Apache processes are still running, and also post the results of the lsof command I asked you about before:

Code:
strace -s 512 -o /tmp/lala -f /usr/local/apache/bin/httpd -DSSL
I would also suggest disabling any eAccelerator entry in the /usr/local/lib/php.ini file before you try to start Apache.
 

s_2_s

It died again.

This time I see only:


[Wed Feb 6 00:35:03 2008] [notice] caught SIGTERM, shutting down

I commented out all PHP extensions.
The strace log is 200+ MB this time, so here are the last 100 lines of it:
11380 waitpid(13539, 0xbf85a0e0, WNOHANG) = 0
11380 select(0, NULL, NULL, NULL, {0, 65536} <unfinished ...>
13539 exit_group(0) = ?
11738 fcntl64(6, F_SETFL, O_RDWR <unfinished ...>
11785 close(20 <unfinished ...>
11380 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
11380 --- SIGCHLD (Child exited) @ 0 (0) ---
11380 select(0, NULL, NULL, NULL, {0, 61000} <unfinished ...>
11738 <... fcntl64 resumed> ) = 0
11738 setsockopt(6, SOL_SOCKET, SO_SNDTIMEO, "\2003\341\1\0\0\0\0", 8) = 0
11738 write(6, "\1\0\0\0\1", 5) = 5
11738 shutdown(6, 2 /* send and receive */) = 0
11738 close(6) = 0
11738 brk(0x82e9000) = 0x82e9000
11738 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_IGN}, 8) = 0
12231 close(20 <unfinished ...>
11785 <... close resumed> ) = 0
11738 close(20) = 0
11785 exit_group(0) = ?
11738 exit_group(0) = ?
11380 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
12231 <... close resumed> ) = 0
11380 --- SIGCHLD (Child exited) @ 0 (0) ---
12231 exit_group(0) = ?
11380 select(0, NULL, NULL, NULL, {0, 51000}) = ? ERESTARTNOHAND (To be restarted)
11380 --- SIGCHLD (Child exited) @ 0 (0) ---
11380 select(0, NULL, NULL, NULL, {0, 51000}) = 0 (Timeout)
11380 waitpid(11736, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 11736
11380 waitpid(11738, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 11738
11380 waitpid(11739, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 11739
11380 waitpid(11784, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 11784
11380 waitpid(11785, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 11785
11380 waitpid(11790, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 11790
11380 waitpid(12031, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12031
11380 waitpid(12068, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12068
11380 waitpid(12123, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12123
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (12228) in scoreboard slot 29\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (12229) in scoreboard slot 30\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (12230) in scoreboard slot 31\n", 73) = 73
11380 waitpid(12231, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 12231
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (13396) in scoreboard slot 33\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (13425) in scoreboard slot 35\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (13477) in scoreboard slot 36\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (13478) in scoreboard slot 37\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (13479) in scoreboard slot 38\n", 73) = 73
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [error] Bad pid (13480) in scoreboard slot 39\n", 73) = 73
11380 waitpid(13539, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 13539
11380 unlink("/usr/local/apache/logs/httpd.pid") = 0
11380 time(NULL) = 1202251204
11380 write(15, "[Wed Feb 6 00:40:04 2008] [notice] caught SIGTERM, shutting down\n", 66) = 66
11380 close(15) = 0
11380 munmap(0xb7f4e000, 4096) = 0
11380 semctl(1474574, 0, IPC_64|IPC_RMID, 0xbf859ff4) = 0
11380 close(7) = 0
11380 munmap(0xb7671000, 4096) = 0
11380 close(4) = 0
11380 munmap(0xb7f48000, 4096) = 0
11380 close(5) = 0
11380 munmap(0xb7f4d000, 4096) = 0
11380 close(19) = 0
11380 close(18) = 0
11380 unlink("/usr/local/apache/logs/ssl_scache.dir") = 0
11380 unlink("/usr/local/apache/logs/ssl_scache.pag") = 0
11380 unlink("/usr/local/apache/logs/ssl_scache.dir") = -1 ENOENT (No such file or directory)
11380 unlink("/usr/local/apache/logs/ssl_scache.pag") = -1 ENOENT (No such file or directory)
11380 unlink("/usr/local/apache/logs/ssl_scache.db") = -1 ENOENT (No such file or directory)
11380 unlink("/usr/local/apache/logs/ssl_scache") = -1 ENOENT (No such file or directory)
11380 unlink("/usr/local/apache/logs/ssl_mutex.11367") = 0
11380 close(17) = 0
11380 close(16) = 0
11380 munmap(0xb78e2000, 2998976) = 0
11380 munmap(0xb781a000, 818920) = 0
11380 munmap(0xb76e3000, 1273472) = 0
11380 munmap(0xb76a9000, 235120) = 0
11380 munmap(0xb7679000, 196352) = 0
11380 munmap(0xb7672000, 26200) = 0
11380 munmap(0xb7662000, 33620) = 0
11380 munmap(0xb75f7000, 437856) = 0
11380 munmap(0xb7509000, 58244) = 0
11380 munmap(0xb7470000, 55444) = 0
11380 munmap(0xb7518000, 911768) = 0
11380 munmap(0xb74e5000, 145732) = 0
11380 munmap(0xb74c7000, 120820) = 0
11380 munmap(0xb74b0000, 92328) = 0
11380 munmap(0xb74a6000, 37340) = 0
11380 munmap(0xb7496000, 61720) = 0
11380 munmap(0xb7bbf000, 8092) = 0
11380 munmap(0xb7f49000, 7932) = 0
11380 munmap(0xb7f4b000, 7964) = 0
11380 waitpid(11692, NULL, WNOHANG) = 11692
11380 exit_group(0) = ?
 

troxalias

Not very helpful. If you zip the file, how big is it? Please run it again under strace, but use 64 instead of 512 (in -s 512) to see if the trace file gets smaller. Also compress it with gzip -9 to get maximum compression.
 

troxalias

That's not good. One last try to see if the file gets smaller:
Code:
strace -s 64 -o /tmp/lala -f /usr/local/apache/bin/httpd -F -DSSL

If the file is huge again, I will have to guess. In that case, try the following:
1. Disable any php modules in your configuration file.
2. Disable any other loaded modules (one at a time).
 

s_2_s

I believe it may all be because of the domlogs folder that I deleted. Please help.
 

s_2_s

Problem fixed by forcing Apache to listen on the server's IP only, and not on 0.0.0.0.
For an unknown reason Apache was competing with its own processes, or something like that.
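For anyone hitting the same thing, here is a rough sketch of how to find the Listen directives that bind every address. It assumes the directives live in a single configuration file (for example httpd.conf under /usr/local/apache/conf/, which is a guess based on the paths in this thread). The function name wildcard_listen is made up, and 192.0.2.10 is only a placeholder address.

```shell
# wildcard_listen CONF: print (with line numbers) the Listen directives that
# bind all addresses: a bare port ("Listen 443"), "Listen 0.0.0.0:443" or
# "Listen *:443". Commented-out lines are ignored.
# The function name is illustrative only.
wildcard_listen() {
    grep -niE '^[[:space:]]*Listen[[:space:]]+((0\.0\.0\.0|\*):)?[0-9]+([[:space:]]|$)' "$1"
}

# Each line it reports would then be changed by hand to name one address,
# for example (192.0.2.10 is a placeholder, not a real server IP):
#   Listen 192.0.2.10:443
```

Apache needs a restart after the change, and a server with several IPs needs one Listen line per address and port.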