monitoring service with chkservd

hgbas

Registered
Dec 23, 2011
3
0
51
cPanel Access Level
Root Administrator
Hello,

I am trying to get chkservd to monitor a service called openfire, but it continually shows it as down and restarts it.

Here's my running process:
Code:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
daemon    5787  1.2  8.5 370152 67132 ?        Sl   08:31   0:12 /opt/openfire/jre/bin/java -server -DopenfireHome=/opt/openfire -Dopenfire.lib.dir=/opt/openfire/lib -classpath /opt/openfire/lib/startup.jar -jar /opt/openfire/lib/startup.jar
cPanel version:
Code:
11.30.5.3
And chkservd setting (service based):
Code:
/etc/chkserv.d/openfire
service[openfire]=x,x,x,/etc/init.d/openfire restart,java,daemon
Running PID:
Code:
cat /var/run/openfire.pid 
5787
chkservd status:
Code:
cat /var/run/chkservd/openfire 
+
What am I doing wrong?

Thanks
 

cPanelDavidG

Technical Product Specialist
Nov 29, 2006
11,216
11
313
Houston, TX
cPanel Access Level
Root Administrator
If it's restarting it, then the restart command is correct.

Is the process name in ps: java (case sensitive)?

Is it running as user: daemon (case sensitive)?
 

cPanelDavidG

Technical Product Specialist
Nov 29, 2006
11,216
11
313
Houston, TX
cPanel Access Level
Root Administrator
Okay, the user is correct, but that's a long process name. I'm not sure what to put here myself. Typically chkservd monitors executables rather than interpreted scripts or bytecode.
 

hgbas

Registered
Dec 23, 2011
3
0
51
cPanel Access Level
Root Administrator
Hello David,

Thanks for the reply.

Yes, it does restart, but that's the problem :) It keeps restarting over and over at each 8-minute interval (chkservd monitoring), and I get the - (fail) result:

Code:
..openfire [[check command:-][tcp connect:N/A][fail count:1]Restarting openfire....
So, I have to disable it until I can figure it out.

I checked the "sub _servicecmdcheck" call in /usr/local/cpanel/Cpanel/TailWatch/ChkServd.pm, and I'm not entirely certain what exactly ends up in @RUN from the `ps` + grep.

I tried enabling --debug in tailwatchd, but it did not shed any more light on why it is failing.

Here's what I get from the abbreviated ps:
Code:
ps -U daemon
  PID TTY          TIME CMD
 5787 ?        00:00:24 java
ps xuww -U daemon returns the full CMD as given previously + a bunch of normal processes owned by root, so the list is large.

Any help or suggestions would be appreciated.
 

cPanelDavidG

Technical Product Specialist
Nov 29, 2006
11,216
11
313
Houston, TX
cPanel Access Level
Root Administrator
Thinking on this with a fresh mind (yay holidays), I would say just try "java" as the command to look for. Listing all the parameters sent to the command isn't something we do. You can look at our own files in /etc/chkserv.d/ for comparison; the best (and perhaps most complex) example is ftpd.

Unfortunately, by the nature of running Java apps this way, this also means essentially any Java app running will trigger a false positive.
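
To illustrate the false-positive concern, here is a minimal sketch of a ps + grep style check. It is not chkservd's actual code: match_proc is a hypothetical helper, the second Java process in the sample is made up, and the openfire command line is shortened from the one posted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of chkservd): read `ps aux`-style output on
# stdin and print the lines owned by USER whose command contains PATTERN.
match_proc() {
    awk -v u="$1" -v p="$2" '
        NR > 1 && $1 == u {
            # Rebuild the command from column 11 onward.
            cmd = ""
            for (i = 11; i <= NF; i++) cmd = cmd (i > 11 ? " " : "") $i
            if (index(cmd, p) > 0) print
        }'
}

# Sample data: the openfire process from this thread (shortened) plus an
# invented second Java application and an unrelated root process.
sample='USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
daemon    5787  1.2  8.5 370152 67132 ?        Sl   08:31   0:12 /opt/openfire/jre/bin/java -server -DopenfireHome=/opt/openfire -jar /opt/openfire/lib/startup.jar
daemon    6001  0.3  2.1 250000 30000 ?        Sl   08:40   0:02 /usr/bin/java -jar /opt/otherapp/app.jar
root      1234  0.0  0.1  10000  1000 ?        Ss   08:00   0:00 /usr/sbin/sshd'

# Matching on the bare name picks up both Java processes.
printf '%s\n' "$sample" | match_proc daemon java
# Matching on the full path to the binary picks up only openfire.
printf '%s\n' "$sample" | match_proc daemon /opt/openfire/jre/bin/java
```

With a bare "java" pattern, openfire would still look "up" as long as any other Java process owned by the same user is running; a more specific pattern narrows that.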
 

cPanelTristan

Quality Assurance Analyst
Staff member
Oct 2, 2010
7,607
38
248
somewhere over the rainbow
cPanel Access Level
Root Administrator
I just installed openfire onto my machine and set it up for monitoring. It isn't failing on service checks. Note that my process runs as the root user, with /usr/local/jdk/bin/java showing as the process:

root 9250 1.2 3.4 867076 72064 pts/0 Sl 06:54 0:05 /usr/local/jdk/bin/java -server -Dinstall4j.jvmDir=/usr/local/jdk -Dexe4j.moduleName=/opt/openfire/bin/openfire -classpath /opt/openfire/.install4j/i4jruntime.jar:/opt/openfire/lib/activation.jar:/opt/openfire/lib/bouncycastle.jar:/opt/openfire/lib/jdic.jar:/opt/openfire/lib/mail.jar:/opt/openfire/lib/startup.jar com.install4j.runtime.Launcher start org.jivesoftware.openfire.starter.ServerStarter false false /opt/openfire/bin/../logs/stderror.log /opt/openfire/bin/../logs/stdoutt.log true true false true true 0 0 20 20 Arial 0,0,0 8 500 version 3.6.4 20 40 Arial 0,0,0 8 500 -1 -DopenfireHome=/opt/openfire -Dopenfire.lib.dir=/opt/openfire/lib
Here are the steps I used to add monitoring for it:

1. Created /etc/chkserv.d/openfire file:

Code:
echo "service[openfire]=x,x,x,/etc/init.d/openfire restart,/usr/local/jdk/bin/java,root" > /etc/chkserv.d/openfire
2. Added monitoring for openfire in chkservd:

Code:
vi /etc/chkserv.d/chkservd.conf
Put this line alphabetically into the file:

Code:
openfire:1
Saved the file (:wq)

3. Added a file to have monitoring up for openfire:

Code:
echo "+" > /var/run/chkservd/openfire
4. Restarted chkservd

Code:
/scripts/restartsrv_chkservd
At that point, I tailed /var/log/chkservd.log to see the results for 10 minutes. I have had no failures:

Code:
tail -fn0 /var/log/chkservd.log
Here are my success results:

Loading services .....clamd....cpanellogd....cpsrvd....exim....httpd....imap....mysql....named....openfire....queueprocd....spamd....sshd....syslogd..Done
[2011-12-27 06:59:17 -0800] Service check ....syslogd [[check command:+][tcp connect:N/A]]...sshd [[check command:+][tcp connect:N/A]]...spamd [[check command:+][tcp connect:N/A]]...queueprocd [[check command:+][tcp connect:N/A]]...openfire [[check command:+][tcp connect:N/A]]...named [[check command:+][tcp connect:N/A]]...mysql [[check command:+][tcp connect:N/A]]...lfd [[check command:N/A][tcp connect:N/A]]...imap [[socket_service_auth:1][check command:+][tcp connect:+]]...httpd [[check command:N/A][tcp connect:+]]...exim [[check command:+][tcp connect:+]]...entropychat [[check command:N/A][tcp connect:N/A]]...dnsadmin [[check command:N/A][tcp connect:N/A]]...cpsrvd [[http_service_auth:1][check command:N/A][tcp connect:+]]...cpanellogd [[check command:+][tcp connect:N/A]]...clamd [[check command:+][tcp connect:N/A]]...Done
Service Check Finished
Service Check Started
Loading services .....clamd....cpanellogd....cpsrvd....exim....httpd....imap....mysql....named....openfire....queueprocd....spamd....sshd....syslogd..Done
[2011-12-27 07:04:18 -0800] Service check ....syslogd [[check command:+][tcp connect:N/A]]...sshd [[check command:+][tcp connect:N/A]]...spamd [[check command:+][tcp connect:N/A]]...queueprocd [[check command:+][tcp connect:N/A]]...openfire [[check command:+][tcp connect:N/A]]...named [[check command:+][tcp connect:N/A]]...mysql [[check command:+][tcp connect:N/A]]...lfd [[check command:N/A][tcp connect:N/A]]...imap [[socket_service_auth:1][check command:+][tcp connect:+]]...httpd [[check command:N/A][tcp connect:+]]...exim [[check command:+][tcp connect:+]]...entropychat [[check command:N/A][tcp connect:N/A]]...dnsadmin [[check command:N/A][tcp connect:N/A]]...cpsrvd [[http_service_auth:1][check command:N/A][tcp connect:+]]...cpanellogd [[check command:+][tcp connect:N/A]]...clamd [[check command:+][tcp connect:N/A]]...Done
Service Check Finished
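
Since each log line covers every service, it can help to pull out a single service's result. A rough sketch, assuming the log keeps the layout shown in this thread; service_result is a hypothetical helper, not a cPanel tool, and the sample lines are shortened copies of the ones posted here.

```shell
#!/bin/sh
# Hypothetical helper: from chkservd.log-style lines on stdin, print the
# timestamp (if the line has one) and the "check command" result for one service.
service_result() {
    awk -v s="$1" '
        {
            # A service entry looks like ".<name> [[...]]" in the log.
            pos = index($0, "." s " [[")
            if (pos == 0) next
            ts = ""
            if (match($0, /^\[[^]]*\]/)) ts = substr($0, RSTART, RLENGTH) " "
            rest = substr($0, pos + 1)
            # The first "check command" after the name belongs to that service.
            if (match(rest, /check command:[^]]*/))
                print ts s " " substr(rest, RSTART, RLENGTH)
        }'
}

ok_line='[2011-12-27 06:59:17 -0800] Service check ....syslogd [[check command:+][tcp connect:N/A]]...openfire [[check command:+][tcp connect:N/A]]...named [[check command:+][tcp connect:N/A]]...Done'
fail_line='..openfire [[check command:-][tcp connect:N/A][fail count:1]Restarting openfire....'

printf '%s\n' "$ok_line" | service_result openfire
printf '%s\n' "$fail_line" | service_result openfire
```

On a real server this could be run as service_result openfire < /var/log/chkservd.log.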
 

hgbas

Registered
Dec 23, 2011
3
0
51
cPanel Access Level
Root Administrator
Hello,

Thanks so much for the great help.
I had already tried many iterations of CMD & USER to no avail, but not like your last reply with the full path to the binary.
I will give it a shot and let you know.
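
Adapted to the process from the first post, the line would presumably use /opt/openfire/jre/bin/java and the daemon user. This is an untested sketch; chkservd_line is only a hypothetical helper that prints the line in the same layout as the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a chkservd service line in the layout used in
# this thread: name, three x placeholders, restart command, process, user.
chkservd_line() {
    printf 'service[%s]=x,x,x,%s,%s,%s\n' "$1" "$2" "$3" "$4"
}

# Assumption: the binary path and user are taken from the ps output in the first post.
chkservd_line openfire "/etc/init.d/openfire restart" /opt/openfire/jre/bin/java daemon
```

The output could then be redirected into /etc/chkserv.d/openfire, followed by a chkservd restart as in the steps above.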
 

PbG

Well-Known Member
Mar 11, 2003
247
0
166
I successfully added nagios to chkservd without the step below, which I do not understand. Could you clarify where you added that file and what it is named?

3. Added a file to have monitoring up for openfire:

Code:
echo "+" > /var/run/chkservd/openfire
 

cPanelTristan

Quality Assurance Analyst
Staff member
Oct 2, 2010
7,607
38
248
somewhere over the rainbow
cPanel Access Level
Root Administrator
Hello,

The /var/run/chkservd files are named after the services being monitored, and in this instance openfire was the name of the service being created. Each file contains either + or -, depending on whether the service was seen to be up (+) or down (-) on the last service check. If a service is up, its file will contain + and it will appear as up in the WHM > Server Status > Service Status area. You would want to name your file nagios if that's what you've named the service in the /etc/chkserv.d/chkservd.conf file.
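
As a quick way to read those files, here is a small sketch. chkservd_status is a hypothetical helper, and it assumes only what is described above: one file per service, containing + or -.

```shell
#!/bin/sh
# Hypothetical helper: list every status file in a directory as "name state".
# Defaults to /var/run/chkservd; pass another directory to try it out safely.
chkservd_status() {
    dir=${1:-/var/run/chkservd}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        case $(cat "$f") in
            +*) state=up ;;
            -*) state=down ;;
            *)  state=unknown ;;
        esac
        printf '%s %s\n' "${f##*/}" "$state"
    done
}

# Demo against a throwaway directory instead of the real one.
demo=$(mktemp -d)
echo "+" > "$demo/openfire"
echo "-" > "$demo/nagios"
chkservd_status "$demo"
```

Run without an argument on the server, it reads the real /var/run/chkservd directory.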

You are very welcome!
 

jimpic

Member
Aug 23, 2011
7
0
51
cPanel Access Level
Root Administrator
This topic is now one year old, but it's precisely what I am trying to do: monitor openfire with chkservd.
Openfire is running well on the server, but when I follow the steps cPanelTristan posted above, openfire stops and restarts every 5 minutes (at every check, in fact). I must have done something wrong.
So I disabled openfire monitoring.
Any clue on this matter?