SAP Knowledge Base Article - Preview

3466529 - LNX: Fail to stop file system resource in a pacemaker cluster

Symptom

The customer runs a cluster setup for an SAP application. A failover occurred from the ASCS node to the ERS node.

During the failover, the stop operation of the file system resource failed with the error "Couldn't unmount <file system name>".
The file system resides on an NFS share.

  • OS log (/var/log/messages)


Apr  2 19:50:35 use1abcscs1 pacemaker-controld[2539]: error: Result of monitor operation for abc_ascs01 on use1abcscs1: Timed Out after 60s (Resource agent did not complete in time)
Apr  2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Transition 556 action 18 (abc_ascs01_monitor_120000 on use1abcscs1): expected 'ok' but got 'error'
Apr  2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr  2 19:50:35 use1abcscs1 pacemaker-attrd[2537]: notice: Setting fail-count-abc_ascs01#monitor_120000[use1abcscs1]: (unset) -> 1
Apr  2 19:50:35 use1abcscs1 pacemaker-attrd[2537]: notice: Setting last-failure-abc_ascs01#monitor_120000[use1abcscs1]: (unset) -> 1712101835
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: Unexpected result (error: Resource agent did not complete in time) was recorded for monitor of abc_ascs01 on use1abcscs1 at Apr  2 19:50:35 2024
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Recover    abc_ascs01       (                use1abcscs1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Calculated transition 888, saving inputs in /var/lib/pacemaker/pengine/pe-input-456.bz2
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: Unexpected result (error: Resource agent did not complete in time) was recorded for monitor of abc_ascs01 on use1abcscs1 at Apr  2 19:50:35 2024
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: warning: abc_ascs01 cannot run on use1abcscs1 due to reaching migration threshold (clean up resource to allow again)
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move       abc_fs_ascs01    ( use1abcscs1 -> use1abcers1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move       abc_vip_ascs01   ( use1abcscs1 -> use1abcers1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Recover    abc_ascs01       ( use1abcscs1 -> use1abcers1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move       abc_fs_ers11     ( use1abcers1 -> use1abcscs1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move       abc_vip_ers11    ( use1abcers1 -> use1abcscs1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Actions: Move       abc_ers11        ( use1abcers1 -> use1abcscs1 )
Apr  2 19:50:35 use1abcscs1 pacemaker-schedulerd[2538]: notice: Calculated transition 889, saving inputs in /var/lib/pacemaker/pengine/pe-input-457.bz2
Apr  2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Initiating stop operation abc_ascs01_stop_0 locally on use1abcscs1
Apr  2 19:50:35 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_ascs01 on use1abcscs1

 

Apr  2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Result of stop operation for abc_vip_ascs01 on use1abcscs1: ok
Apr  2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Initiating stop operation abc_fs_ascs01_stop_0 locally on use1abcscs1
Apr  2 19:51:22 use1abcscs1 pacemaker-controld[2539]: notice: Requesting local execution of stop operation for abc_fs_ascs01 on use1abcscs1
Apr  2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: Running stop for sapnas.net.bms.com:/abcSCS01 on /usr/sap/abc/ASCS01
Apr  2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: Trying to unmount /usr/sap/abc/ASCS01
Apr  2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: ERROR: Couldn't unmount /usr/sap/abc/ASCS01; trying cleanup with TERM
Apr  2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: sending signal TERM to: abcadm   1697899       1  0 Mar30 ?        Ssl    0:57 /usr/sap/abc/ASCS01/exe/sapstartsrv pf=/sapmnt/abc/profile/abc_ASCS01_abcscs1 -D -u abcadm
Apr  2 19:51:22 use1abcscs1 SAPabc_01[1697899]: sapstartsrv stopped
Apr  2 19:51:23 use1abcscs1 systemd[1]: session-c58.scope: Succeeded.
Apr  2 19:51:23 use1abcscs1 systemd[1]: usr-sap-abc-ASCS01.mount: Succeeded.
Apr  2 19:51:23 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: unmounted /usr/sap/abc/ASCS01 successfully
Apr  2 19:51:23 use1abcscs1 pacemaker-controld[2539]: notice: Result of stop operation for abc_fs_ascs01 on use1abcscs1: ok
Apr  2 19:51:23 use1abcscs1 pacemaker-controld[2539]: notice: Initiating start operation abc_fs_ascs01_start_0 on use1abcers1
Apr  2 19:51:24 use1abcscs1 pacemaker-controld[2539]: notice: Initiating monitor operation abc_fs_ascs01_monitor_20000 on use1abcers1
Apr  2 19:51:24 use1abcscs1 pacemaker-controld[2539]: notice: Initiating start operation abc_vip_ascs01_start_0 on use1abcers1
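The Filesystem agent lines in the log above name both the mount point that could not be unmounted and the process that received the TERM signal. As a rough illustration, the following Python sketch pulls those two facts out of a log. The function name `find_unmount_blockers` and both regular expressions are assumptions made for this example; they are modelled only on the message format shown above, which may differ in other resource agent versions.

```python
import re

# Matches the Filesystem agent's unmount error as it appears in the log above, e.g.
# "Filesystem(abc_fs_ascs01)[1280819]: ERROR: Couldn't unmount /usr/sap/abc/ASCS01; ..."
UNMOUNT_FAILED = re.compile(
    r"Filesystem\((?P<resource>[^)]+)\)\[\d+\]: ERROR: "
    r"Couldn't unmount (?P<mount>[^;]+);"
)

# Matches the follow-up line naming the signalled process. The fields after
# "to:" resemble ps output: user, PID, further columns, then the command path.
SIGNAL_SENT = re.compile(
    r"sending signal (?P<signal>\w+) to:\s+(?P<user>\S+)\s+(?P<pid>\d+)\s"
    r".*?\s(?P<command>/\S+)"
)


def find_unmount_blockers(log_lines):
    """Group signalled processes under the failed unmount that preceded them."""
    findings = []
    for line in log_lines:
        failed = UNMOUNT_FAILED.search(line)
        if failed:
            findings.append({
                "resource": failed["resource"],
                "mount_point": failed["mount"],
                "signalled": [],
            })
            continue
        sent = SIGNAL_SENT.search(line)
        if sent and findings:
            findings[-1]["signalled"].append(
                (sent["signal"], sent["user"], int(sent["pid"]), sent["command"])
            )
    return findings


# Two lines copied from the OS log above.
SAMPLE = [
    "Apr  2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: ERROR: "
    "Couldn't unmount /usr/sap/abc/ASCS01; trying cleanup with TERM",
    "Apr  2 19:51:22 use1abcscs1 Filesystem(abc_fs_ascs01)[1280819]: INFO: "
    "sending signal TERM to: abcadm   1697899       1  0 Mar30 ?        Ssl    "
    "0:57 /usr/sap/abc/ASCS01/exe/sapstartsrv "
    "pf=/sapmnt/abc/profile/abc_ASCS01_abcscs1 -D -u abcadm",
]

for finding in find_unmount_blockers(SAMPLE):
    print(finding)
```

In practice the lines would be read from /var/log/messages instead of the inline sample. For the excerpt above, the sketch reports sapstartsrv (PID 1697899, user abcadm) as the process signalled for /usr/sap/abc/ASCS01.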


The trace files of the message server and the enqueue server show only that signal 2 (SIGINT) was received, which was sent from the operating system.

  • dev_ms.old

[Thr 139997982332736] Tue Apr  2 19:51:19:931 2024
[Thr 139997982332736] MsSExit: received SIGINT (2)
[Thr 139997982332736] Server state SHUTDOWN
[Thr 139997982332736] set HTTP state to CLOSED
[Thr 139997982332736] *** HTTP port 8101 state CLOSED ***
[Thr 139997982332736] set HTTPS state to CLOSED
[Thr 139997982332736] *** HTTPS port 8401 state CLOSED ***
[Thr 139997982332736] ***LOG Q02=> MsSHalt, MSStop (Msg Server 1698347) [msxxserv.c   8450]
[Thr 139997982332736] Good Bye .....

  • dev_enqsrv.old

[Thr 139867499947840] Sat Mar 30 09:37:24 2024
[Thr 139867499947840] ***LOG GEZ=> Server start [encllog.cpp  550]
[Thr 139867499947840] Enqueue server start with instance number 01

[Thr 139867499947840] Tue Apr  2 19:51:18 2024
[Thr 139867499947840] calling doAsyncSignal ( 2 ) (SigThrDefaultHandler, 55eb82af1510)
[Thr 139867499947840] caught SIGINT or SIGQUIT (2)
[Thr 139867499947840] Process User Time: 99470 msec; Process System Time: 74450 msec
[Thr 139867499947840] stopAllThreads: stop Thread worker thread ...
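To line the trace entries up with the OS log, it can help to extract when each signal was recorded. The Python sketch below is one possible way to do that. The function name `signal_events` and both regular expressions are assumptions for illustration, based only on the trace lines shown above; signal numbers are translated with the standard `signal` module on the machine running the script.

```python
import re
import signal

# Timestamp header lines such as "[Thr 1399...] Tue Apr  2 19:51:19:931 2024".
# The optional ":931" fraction is ignored.
STAMP = re.compile(
    r"^\[Thr \d+\] \w{3} (\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2})(?::\d+)? (\d{4})$"
)

# Lines reporting a signal, e.g. "MsSExit: received SIGINT (2)" or
# "caught SIGINT or SIGQUIT (2)". The number in parentheses is captured.
SIGNAL_LINE = re.compile(r"(?:received|caught)\b.*\((\d+)\)\s*$")


def signal_events(trace_lines):
    """Return (timestamp, signal number, signal name) for each signal line."""
    events = []
    stamp = None  # most recent timestamp header seen so far
    for raw in trace_lines:
        line = raw.strip()
        header = STAMP.match(line)
        if header:
            month, day, clock, year = header.groups()
            stamp = f"{month} {day} {clock} {year}"
            continue
        hit = SIGNAL_LINE.search(line)
        if hit:
            number = int(hit.group(1))
            events.append((stamp, number, signal.Signals(number).name))
    return events


# Lines copied from dev_ms.old and dev_enqsrv.old above.
DEV_MS = [
    "[Thr 139997982332736] Tue Apr  2 19:51:19:931 2024",
    "[Thr 139997982332736] MsSExit: received SIGINT (2)",
    "[Thr 139997982332736] Server state SHUTDOWN",
]
DEV_ENQSRV = [
    "[Thr 139867499947840] Tue Apr  2 19:51:18 2024",
    "[Thr 139867499947840] calling doAsyncSignal ( 2 ) (SigThrDefaultHandler, 55eb82af1510)",
    "[Thr 139867499947840] caught SIGINT or SIGQUIT (2)",
]

print(signal_events(DEV_MS))
print(signal_events(DEV_ENQSRV))
```

The resulting timestamps can then be compared by hand with the pacemaker and Filesystem entries in /var/log/messages.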



Environment

  • SAP application based on ABAP Platform (On-premise)
  • Linux pacemaker cluster
  • NFS share

Product

SAP NetWeaver Application Server for ABAP 7.52 for SAP S/4HANA

Keywords

couldn't unmount, cleanup, TERM, kill, KBA, BC-OP-LNX, Linux, BC-OP-LNX-RH, Red Hat Linux, BC-OP-LNX-SUSE, SUSE Linux, Problem
