Symptom
The customer runs a cluster setup for an SAP application. A failover occurred from the ASCS node to the ERS node.
During the failover, the first unmount attempt in the stop operation of the file system resource failed with the error "Couldn't unmount <file system name>", and the resource agent then tried a cleanup with TERM.
The file system resides on an NFS share.
- OS log (/var/log/messages)
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: error: Result of monitor operation for abc_ascs01 on use1abcscs1: Timed Out after 60s (Resource agent did not complete in time)
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Transition 556 action 18 (abc_ascs01_monitor_120000 on use1abcscs1): expected 'ok' but got 'error'
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 2 19:50:35 use1abcscs1 pacemaker-attrd[2537]: notice: Setting fail-count-abc_ascs01#monitor_120000[use1abcscs1]: (unset) -> 1
Apr 2 19:50:35 use1abcscs1 pacemaker-attrd[2537]: notice: Setting last-failure-abc_ascs01#monitor_120000[use1abcscs1]: (unset) -> 1712101835
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: Unexpected result (error: Resource agent did not complete in time) was recorded for monitor of abc_ascs01 on use1abcscs1 at Apr 2 19:50:35 2024
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Recover abc_ascs01 ( use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Calculated transition 888, saving inputs in /var/lib/pacemaker/pengine/pe-input-456.bz2
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: Unexpected result (error: Resource agent did not complete in time) was recorded for monitor of abc_ascs01 on use1abcscs1 at Apr 2 19:50:35 2024
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: abc_ascs01 cannot run on use1abcscs1 due to reaching migration threshold (clean up resource to allow again)
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_fs_ascs01 ( use1abcscs1 -> use1abcers1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_vip_ascs01 ( use1abcscs1 -> use1abcers1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Recover abc_ascs01 ( use1abcscs1 -> use1abcers1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_fs_ers11 ( use1abcers1 -> use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_vip_ers11 ( use1abcers1 -> use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move abc_ers11 ( use1abcers1 -> use1abcscs1 )
Apr 2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Calculated transition 889, saving inputs in /var/lib/pacemaker/pengine/pe-input-457.bz2
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Initiating stop operation abc_ascs01_stop_0 locally on use1abcscs1
Apr 2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_ascs01 on use1abcscs1
Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Result of stop operation for abc_vip_ascs01 on use1abcscs1: ok
Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Initiating stop operation abc_fs_ascs01_stop_0 locally on use1abcscs1
Apr 2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_fs_ascs01 on use1abcscs1
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: Running stop for sapnas.net.bms.com:/abcSCS01 on /usr/sap/abc/ASCS01
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: Trying to unmount /usr/sap/abc/ASCS01
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: ERROR: Couldn't unmount /usr/sap/abc/ASCS01; trying cleanup with TERM
Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: sending signal TERM to: abcadm 1697899 1 0 Mar30 ? Ssl 0:57 /usr/sap/abc/ASCS01/exe/sapstartsrv pf=/sapmnt/abc/profile/abc_ASCS01_abcscs1 -D -u abcadm
Apr 2 19:51:22 use1abcscs1 SAPabc_01[1697899]: sapstartsrv stopped
Apr 2 19:51:23 use1abcscs1 systemd[1]: session-c58.scope: Succeeded.
Apr 2 19:51:23 use1abcscs1 systemd[1]: usr-sap-abc-ASCS01.mount: Succeeded.
Apr 2 19:51:23 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: unmounted /usr/sap/abc/ASCS01 successfully
Apr 2 19:51:23 use1abcscs1 pacemaker-controld[2539]: notice: Result of stop operation for abc_fs_ascs01 on use1abcscs1: ok
Apr 2 19:51:23 use1abcscs1 pacemaker-controld[2539]: notice: Initiating start operation abc_fs_ascs01_start_0 on use1abcers1
Apr 2 19:51:24 use1abcscs1 pacemaker-controld[2539]: notice: Initiating monitor operation abc_fs_ascs01_monitor_20000 on use1abcers1
Apr 2 19:51:24 use1abcscs1 pacemaker-controld[2539]: notice: Initiating start operation abc_vip_ascs01_start_0 on use1abcers1
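The two Filesystem resource agent lines in the log above ("Couldn't unmount" and "sending signal TERM to:") identify which mount point was busy and which process was holding it. A minimal sketch of extracting that pair from a log in Python follows. The function name find_unmount_blockers and the record layout are hypothetical, and the patterns assume the message format shown in this excerpt, with the user and PID as the first two columns of the process entry.

```python
import re

# Message formats copied from the Filesystem resource agent lines in the log above.
UNMOUNT_FAILED = re.compile(
    r"Filesystem\((?P<resource>[^)]+)\)\[\d+\]: ERROR: Couldn't unmount (?P<mount>[^;\s]+)"
)
SIGNAL_SENT = re.compile(
    r"Filesystem\((?P<resource>[^)]+)\)\[\d+\]: INFO: sending signal (?P<signal>\w+) to: "
    r"(?P<user>\S+)\s+(?P<pid>\d+)\s+(?P<ps>.*)"
)


def find_unmount_blockers(lines):
    """Map each resource with a failed unmount to its mount point and signalled processes."""
    events = {}
    for line in lines:
        failed = UNMOUNT_FAILED.search(line)
        if failed:
            events.setdefault(
                failed["resource"], {"mount": failed["mount"], "signalled": []}
            )
            continue
        sent = SIGNAL_SENT.search(line)
        # Only record signals that follow a failed unmount of the same resource.
        if sent and sent["resource"] in events:
            events[sent["resource"]]["signalled"].append(
                {
                    "signal": sent["signal"],
                    "user": sent["user"],
                    "pid": int(sent["pid"]),
                    "ps": sent["ps"],  # remaining columns of the process entry, unparsed
                }
            )
    return events


# Two lines taken verbatim from the OS log above.
SAMPLE = [
    "Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: ERROR: "
    "Couldn't unmount /usr/sap/abc/ASCS01; trying cleanup with TERM",
    "Apr 2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: "
    "sending signal TERM to: abcadm 1697899 1 0 Mar30 ? Ssl 0:57 "
    "/usr/sap/abc/ASCS01/exe/sapstartsrv pf=/sapmnt/abc/profile/abc_ASCS01_abcscs1 -D -u abcadm",
]

if __name__ == "__main__":
    for resource, event in find_unmount_blockers(SAMPLE).items():
        for proc in event["signalled"]:
            print(resource, event["mount"], proc["signal"], proc["user"], proc["pid"])
```

For the excerpt above, the only signalled process is PID 1697899 (sapstartsrv, user abcadm), which matches the "sapstartsrv stopped" message logged directly afterwards.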
The developer traces of the message server and the enqueue server show only that signal 2 (SIGINT) was received from the operating system.
- dev_ms.old
[Thr 139997982332736] Tue Apr 2 19:51:19:931 2024
[Thr 139997982332736] MsSExit: received SIGINT (2)
[Thr 139997982332736] Server state SHUTDOWN
[Thr 139997982332736] set HTTP state to CLOSED
[Thr 139997982332736] *** HTTP port 8101 state CLOSED ***
[Thr 139997982332736] set HTTPS state to CLOSED
[Thr 139997982332736] *** HTTPS port 8401 state CLOSED ***
[Thr 139997982332736] ***LOG Q02=> MsSHalt, MSStop (Msg Server 1698347) [msxxserv.c 8450]
[Thr 139997982332736] Good Bye .....
- dev_enqsrv.old
[Thr 139867499947840] Sat Mar 30 09:37:24 2024
[Thr 139867499947840] ***LOG GEZ=> Server start [encllog.cpp 550]
[Thr 139867499947840] Enqueue server start with instance number 01
[Thr 139867499947840] Tue Apr 2 19:51:18 2024
[Thr 139867499947840] calling doAsyncSignal ( 2 ) (SigThrDefaultHandler, 55eb82af1510)
[Thr 139867499947840] caught SIGINT or SIGQUIT (2)
[Thr 139867499947840] Process User Time: 99470 msec; Process System Time: 74450 msec
[Thr 139867499947840] stopAllThreads: stop Thread worker thread ...
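To compare the trace entries with the cluster log, it can help to list each received signal with the timestamp that precedes it in the trace. The following Python sketch does that for lines in the format shown above. The function name signals_in_trace is hypothetical, and the patterns assume that a timestamp line comes before the signal line it belongs to, as in both excerpts.

```python
import re
import signal
from datetime import datetime

# Timestamp lines look like "[Thr ...] Tue Apr 2 19:51:19:931 2024";
# the millisecond part is optional and is ignored here.
TIMESTAMP = re.compile(
    r"\]\s+(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})(?::\d+)? (\d{4})\s*$"
)
# Signal lines look like "received SIGINT (2)" or "caught SIGINT or SIGQUIT (2)".
SIGNAL_LINE = re.compile(r"(?:received|caught) SIG\w+.*\((\d+)\)")


def signals_in_trace(lines):
    """Return (timestamp, signal name) for each signal line in a trace excerpt."""
    events = []
    current = None  # most recent timestamp seen so far
    for line in lines:
        stamp = TIMESTAMP.search(line)
        if stamp:
            current = datetime.strptime(
                f"{stamp[1]} {stamp[2]}", "%a %b %d %H:%M:%S %Y"
            )
            continue
        sig = SIGNAL_LINE.search(line)
        if sig:
            # Translate the number into the platform's signal name.
            events.append((current, signal.Signals(int(sig[1])).name))
    return events


# Lines taken verbatim from dev_ms.old and dev_enqsrv.old above.
DEV_MS = [
    "[Thr 139997982332736] Tue Apr 2 19:51:19:931 2024",
    "[Thr 139997982332736] MsSExit: received SIGINT (2)",
    "[Thr 139997982332736] Server state SHUTDOWN",
]
DEV_ENQSRV = [
    "[Thr 139867499947840] Tue Apr 2 19:51:18 2024",
    "[Thr 139867499947840] calling doAsyncSignal ( 2 ) (SigThrDefaultHandler, 55eb82af1510)",
    "[Thr 139867499947840] caught SIGINT or SIGQUIT (2)",
]

if __name__ == "__main__":
    for name, trace in (("dev_ms.old", DEV_MS), ("dev_enqsrv.old", DEV_ENQSRV)):
        for when, sig_name in signals_in_trace(trace):
            print(name, when.isoformat(), sig_name)
```

For these excerpts, both signals fall between the stop operation for abc_ascs01 that the cluster initiated at 19:50:35 and the file system stop at 19:51:22.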
Environment
- SAP application based on ABAP Platform (On-premise)
- Linux pacemaker cluster
- NFS share
Keywords
couldn't unmount, cleanup, TERM, kill, KBA, BC-OP-LNX, Linux, BC-OP-LNX-SUSE, SUSE Linux, BC-OP-LNX-RH, Red Hat Linux, Problem
SAP Knowledge Base Article - Preview