Symptom
A customer has a cluster set up for an SAP application. A failover occurred from the ASCS node to the ERS node.
During the failover, the stop operation of the file system resource could not unmount the file system at first and logged the error "Couldn't unmount <file system name>".
The file system resides on an NFS share.
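One common reason an unmount is refused is that a process still uses a path under the mount point. In the OS log in this article, the resource agent sends TERM to a sapstartsrv process whose executable lies under /usr/sap/abc/ASCS01. The following is a minimal sketch, assuming a Linux host with /proc mounted, that lists processes whose executable or working directory is under a given mount point. The function names are hypothetical, and the scan is deliberately partial: it does not inspect open file descriptors, which tools such as fuser or lsof report more completely.

```python
import os


def is_under(path, mount_point):
    """Return True if path is the mount point itself or lies below it."""
    mount = os.path.normpath(mount_point)
    path = os.path.normpath(path)
    return path == mount or path.startswith(mount.rstrip(os.sep) + os.sep)


def processes_using(mount_point):
    """List (pid, link, target) for processes whose exe or cwd is under mount_point.

    Only /proc/<pid>/exe and /proc/<pid>/cwd are checked (an assumption made
    to keep the sketch short); open file descriptors are not examined.
    """
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        for link in ("exe", "cwd"):
            try:
                target = os.readlink(f"/proc/{entry}/{link}")
            except OSError:
                continue  # process exited or permission denied
            if is_under(target, mount_point):
                hits.append((int(entry), link, target))
    return hits


if __name__ == "__main__":
    # Mount point taken from the log excerpt in this article.
    for pid, link, target in processes_using("/usr/sap/abc/ASCS01"):
        print(pid, link, target)
```

Run as root to see processes owned by other users; without sufficient privileges, unreadable entries are skipped silently.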
- OS log (/var/log/messages)
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: error: Result of monitor operation for abc_ascs01 on use1abcscs1: Timed Out after 60s (Resource agent did not complete in time)
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Transition 556 action 18 (abc_ascs01_monitor_120000 on use1abcscs1): expected 'ok' but got 'error'
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 2 19:50:35 use1abcscs1 pacemaker-attrd[2537]: notice: Setting fail-count-abc_ascs01#monitor_120000[use1abcscs1]: (unset) -> 1
Apr 2 19:50:35 use1abcscs1 pacemaker-attrd[2537]: notice: Setting last-failure-abc_ascs01#monitor_120000[use1abcscs1]: (unset) -> 1712101835
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: Unexpected result (error: Resource agent did not complete in time) was recorded for monitor of abc_ascs01 on use1abcscs1 at Apr 2 19:50:35 2024
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Recover abc_ascs01 ( use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Calculated transition 888, saving inputs in /var/lib/pacemaker/pengine/pe-input-456.bz2
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: Unexpected result (error: Resource agent did not complete in time) was recorded for monitor of abc_ascs01 on use1abcscs1 at Apr 2 19:50:35 2024
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: abc_ascs01 cannot run on use1abcscs1 due to reaching migration threshold (clean up resource to allow again)
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_fs_ascs01 ( use1abcscs1 -> use1abcers1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_vip_ascs01 ( use1abcscs1 -> use1abcers1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Recover abc_ascs01 ( use1abcscs1 -> use1abcers1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_fs_ers11 ( use1abcers1 -> use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_vip_ers11 ( use1abcers1 -> use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_ers11 ( use1abcers1 -> use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Calculated transition 889, saving inputs in /var/lib/pacemaker/pengine/pe-input-457.bz2
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Initiating stop operation abc_ascs01_stop_0 locally on use1abcscs1
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_ascs01 on use1abcscs1
Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Result of stop operation for abc_vip_ascs01 on use1abcscs1: ok
Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Initiating stop operation abc_fs_ascs01_stop_0 locally on use1abcscs1
Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_fs_ascs01 on use1abcscs1
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: Running stop for sapnas.net.bms.com:/abcSCS01 on /usr/sap/abc/ASCS01
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: Trying to unmount /usr/sap/abc/ASCS01
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: ERROR: Couldn't unmount /usr/sap/abc/ASCS01; trying cleanup with TERM
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: sending signal TERM to: abcadm 1697899 1 0 Mar30 ? Ssl 0:57 /usr/sap/abc/ASCS01/exe/sapstartsrv pf=/sapmnt/abc/profile/abc_ASCS01_abcscs1 -D -u abcadm
Apr 2 19:51:22 use1abcscs1 SAPabc_01[1697899]: sapstartsrv stopped
Apr 2 19:51:23 use1abcscs1 systemd[1]: session-c58.scope: Succeeded.
Apr 2 19:51:23 use1abcscs1 systemd[1]: usr-sap-abc-ASCS01.mount: Succeeded.
Apr 2 19:51:23 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: unmounted /usr/sap/abc/ASCS01 successfully
Apr 2 19:51:23 use1abcscs1 pacemaker-controld[2539]: notice: Result of stop operation for abc_fs_ascs01 on use1abcscs1: ok
Apr 2 19:51:23 use1abcscs1 pacemaker-controld[2539]: notice: Initiating start operation abc_fs_ascs01_start_0 on use1abcers1
Apr 2 19:51:24 use1abcscs1 pacemaker-controld[2539]: notice: Initiating monitor operation abc_fs_ascs01_monitor_20000 on use1abcers1
Apr 2 19:51:24 use1abcscs1 pacemaker-controld[2539]: notice: Initiating start operation abc_vip_ascs01_start_0 on use1abcers1
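To pick the relevant entries out of a long /var/log/messages file, the Filesystem resource agent lines can be filtered and summarized. The sketch below is illustrative only: the function name `summarize` and the regular expressions are assumptions derived from the excerpt above, and the process details are assumed to follow the column order seen in that excerpt (user, PID, and six further columns before the command line).

```python
import re

# Syslog line written by the Filesystem resource agent, as in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"Filesystem\((?P<resource>[^)]+)\)\[\d+\]:\s+\w+:\s+(?P<msg>.*)$"
)
SIGNAL_RE = re.compile(r"sending signal (?P<sig>\w+) to: (?P<ps>.+)$")

SAMPLE = [
    "Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_fs_ascs01 on use1abcscs1",
    "Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: ERROR: Couldn't unmount /usr/sap/abc/ASCS01; trying cleanup with TERM",
    "Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: sending signal TERM to: abcadm 1697899 1 0 Mar30 ? Ssl 0:57 /usr/sap/abc/ASCS01/exe/sapstartsrv pf=/sapmnt/abc/profile/abc_ASCS01_abcscs1 -D -u abcadm",
    "Apr 2 19:51:23 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: unmounted /usr/sap/abc/ASCS01 successfully",
]


def summarize(lines):
    """Return unmount-related events logged by the Filesystem resource agent."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a Filesystem agent line
        msg = m["msg"]
        event = {"time": m["ts"], "resource": m["resource"]}
        sig = SIGNAL_RE.search(msg)
        if msg.startswith("Couldn't unmount"):
            event["kind"] = "unmount_failed"
        elif sig:
            # Assumed layout: user, PID, six more columns, then the command line.
            fields = sig["ps"].split(None, 8)
            if len(fields) < 9 or not fields[1].isdigit():
                continue
            event.update(kind="signal_sent", signal=sig["sig"],
                         user=fields[0], pid=int(fields[1]), command=fields[8])
        elif msg.startswith("unmounted"):
            event["kind"] = "unmounted"
        else:
            continue
        events.append(event)
    return events


if __name__ == "__main__":
    for event in summarize(SAMPLE):
        print(event)
```

To process a real log, pass an open file object instead of `SAMPLE`, for example `summarize(open("/var/log/messages"))`.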
The traces of the message server and the enqueue server show only that signal 2 (SIGINT) was received, which was sent from the operating system.
- dev_ms.old
[Thr 139997982332736] Tue Apr 2 19:51:19:931 2024
[Thr 139997982332736] MsSExit: received SIGINT (2)
[Thr 139997982332736] Server state SHUTDOWN
[Thr 139997982332736] set HTTP state to CLOSED
[Thr 139997982332736] *** HTTP port 8101 state CLOSED ***
[Thr 139997982332736] set HTTPS state to CLOSED
[Thr 139997982332736] *** HTTPS port 8401 state CLOSED ***
[Thr 139997982332736] ***LOG Q02=> MsSHalt, MSStop (Msg Server 1698347) [msxxserv.c 8450]
[Thr 139997982332736] Good Bye .....
- dev_enqsrv.old
[Thr 139867499947840] Sat Mar 30 09:37:24 2024
[Thr 139867499947840] ***LOG GEZ=> Server start [encllog.cpp 550]
[Thr 139867499947840] Enqueue server start with instance number 01
[Thr 139867499947840] Tue Apr 2 19:51:18 2024
[Thr 139867499947840] calling doAsyncSignal ( 2 ) (SigThrDefaultHandler, 55eb82af1510)
[Thr 139867499947840] caught SIGINT or SIGQUIT (2)
[Thr 139867499947840] Process User Time: 99470 msec; Process System Time: 74450 msec
[Thr 139867499947840] stopAllThreads: stop Thread worker thread ...
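The traces above report the signal by number. As a small illustration, the number can be mapped to its symbolic name with Python's standard `signal` module. The helper name `signal_name` and the regular expression are assumptions made for this sketch; the two sample lines are copied from dev_ms.old and dev_enqsrv.old above.

```python
import re
import signal

# Trace lines copied from the excerpts above.
TRACE_LINES = [
    "[Thr 139997982332736] MsSExit: received SIGINT (2)",
    "[Thr 139867499947840] caught SIGINT or SIGQUIT (2)",
]

# Matches a signal number in parentheses at the end of a trace line.
NUMBER_RE = re.compile(r"\((\d+)\)\s*$")


def signal_name(number):
    """Return the symbolic name for a signal number on the current platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


if __name__ == "__main__":
    for line in TRACE_LINES:
        match = NUMBER_RE.search(line)
        if match:
            print(line, "->", signal_name(int(match.group(1))))
```

Signal numbers are platform specific, so the lookup should run on the same operating system that produced the trace.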
Environment
- SAP application based on ABAP Platform (On-premise)
- Linux pacemaker cluster
- NFS share
Keywords
couldn't unmount, cleanup, TERM, kill, KBA, BC-OP-LNX, Linux, BC-OP-LNX-RH, Red Hat Linux, BC-OP-LNX-SUSE, SUSE Linux, Problem
About this page
This is a preview of an SAP Knowledge Base Article. The full version is available on SAP for Me (login required).