Why is Solaris SMF giving "Method or service exit timed out" even when the method exits with status 0?

I am really new to Solaris SMF and was writing an SMF manifest for the WebLogic Node Manager. I followed the steps from: http://www.camelrichard.org/controlling-weblogic-node-manager-solaris-smf-non-root

To test whether SMF restarts the service when it gets killed, I am sending it a kill signal from another terminal, but it does not restart. This is what the log file says:

[ Nov 19 10:17:39 Stopping because process received fatal signal from outside the service. ]
Killed
+ set +x
[ Nov 19 10:17:39 Executing stop method ("/usr/local/Oracle/Middleware/wlserver_10.3/server/bin/killNodeManager.sh") ]
Trying to find the PID of the nodeManager process
Cannot find the PID, NodeManager is not running - cannot kill
[ Nov 19 10:17:39 Method "stop" exited with status 0 ]
[ Nov 19 10:18:40 Method or service exit timed out.  Killing contract 100 ]

What I don't understand is the last two lines: the first says the stop method exited, while the second says the method timed out. That seems contradictory. Does anyone know what's going on here? The relevant parts of the SMF manifest are below:

<service_bundle type='manifest' name='nodemanager'>
<service name='application/management/nodemanager' type='service' version='1'>

   <single_instance />

   <dependency
      name='multi-user-server'
      grouping='require_all'
      restart_on='error'
      type='service'>
      <service_fmri value='svc:/milestone/multi-user-server' />
   </dependency>

   <exec_method
      type='method'
      name='start'
      exec='/usr/local/Oracle/Middleware/wlserver_10.3/server/bin/startNodeManager2.sh'
      timeout_seconds='120' >
<!-- Trying as root for now :
      <method_context>
         <method_credential user='weblogic' group='weblogic' />
      </method_context>
-->
   </exec_method>

   <exec_method
      type='method'
      name='stop'
      exec='/usr/local/Oracle/Middleware/wlserver_10.3/server/bin/killNodeManager.sh'
      timeout_seconds='60' />

   <!-- remaining elements of the manifest omitted -->

</service>
</service_bundle>


Solution 1:[1]

The reason for the first message:

Method "stop" exited with status 0

This comes from what your stop method script executes, which is /usr/local/Oracle/Middleware/wlserver_10.3/server/bin/killNodeManager.sh.

Normally, methods that succeed return SMF_EXIT_OK, which is 0 (it is defined in /lib/svc/share/smf_include.sh). I am not sure why the second message shows up; it must be something related to killNodeManager.sh.

Solution 2:[2]

It looks like killNodeManager.sh ran into a problem: it could not find the PID of the process it was supposed to stop, so it exited almost immediately, within a second of starting. SMF, however, does not treat a service as stopped just because the stop method returned; it waits until every process in the service's contract has exited. From SMF's perspective, contract #100 associated with this service was still active. Once the 60 seconds allocated to the "stop" method (timeout_seconds='60') had passed, SMF saw that the contract was still up and had no option but to kill the whole contract, reasonably assuming that the "stop" method had not done its job. That is why you see the last message in the log, and why the service goes into maintenance mode after the contract is killed.
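If the PID lookup in killNodeManager.sh is the weak point, one alternative (not part of the original answer) is to let SMF signal the processes itself. The smf_method(5) man page documents a special `:kill` token for the `exec` attribute, which makes svc.startd send SIGTERM to every process in the instance's contract. A minimal sketch of the stop method, assuming the rest of the manifest stays as posted in the question:

```xml
<!-- Stop method: instead of running a script that must locate the
     NodeManager PID, ':kill' asks svc.startd to signal all processes
     in this service's contract (SIGTERM by default). -->
<exec_method
   type='method'
   name='stop'
   exec=':kill'
   timeout_seconds='60' />
```

Because the signal is delivered to the whole contract rather than to a single looked-up PID, whatever processes are still in the contract receive it, which is the condition the timeout in the log points at. Whether NodeManager shuts down cleanly on SIGTERM is something to verify on your own setup.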

Hope this helps!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tomkaith13
Solution 2 evolvah