SAP Knowledge Base Article - Preview

3472573 - DB6: Pacemaker cannot initiate a takeover in HADR

Symptom

You have a Pacemaker cluster configured to manage a Db2 database with HADR, set up according to SAP Note 3100330.

Pacemaker fails to initiate a takeover from the primary database to the standby database.

OS log on the primary database server (abc123):

2024-05-02T18:42:21.724380-04:00 abc123 pengine[5592]:  warning: Processing failed stop of rsc_Db2_db2abc_DE1:1 on abc123: unknown error
2024-05-02T18:42:21.725265-04:00 abc123 pengine[5592]:  warning: Cluster node abc123 will be fenced: rsc_Db2_db2abc_DE1:1 failed there
2024-05-02T18:42:21.725439-04:00 abc123 pengine[5592]:  warning: Processing failed monitor of stonith-sbd on abc123: unknown error
2024-05-02T18:42:21.726132-04:00 abc123 pengine[5592]:  warning: Forcing msl_Db2_db2abc_DE1 away from abc123 after 1000000 failures (max=5000)
2024-05-02T18:42:21.726543-04:00 abc123 pengine[5592]:  warning: Forcing msl_Db2_db2abc_DE1 away from abc123 after 1000000 failures (max=5000)
2024-05-02T18:42:21.727113-04:00 abc123 pengine[5592]:  warning: Scheduling Node abc123 for STONITH
2024-05-02T18:42:21.727535-04:00 abc123 pengine[5592]:   notice: Stop of failed resource rsc_Db2_db2abc_DE1:1 is implicit after abc123 is fenced
2024-05-02T18:42:21.727827-04:00 abc123 pengine[5592]:   notice:  * Fence (reboot) abc123 'rsc_Db2_db2abc_DE1:1 failed there'
2024-05-02T18:42:21.727988-04:00 abc123 pengine[5592]:   notice:  * Move       stonith-sbd              ( abc123 -> abc456 )
2024-05-02T18:42:21.728150-04:00 abc123 pengine[5592]:   notice:  * Stop       rsc_azure-events:1       (                 abc123 )   due to node availability
2024-05-02T18:42:21.728313-04:00 abc123 pengine[5592]:   notice:  * Start      rsc_ip_db2abc_DE1        (                 abc456 )
2024-05-02T18:42:21.728473-04:00 abc123 pengine[5592]:   notice:  * Start      rsc_nc_db2abc_DE1        (                 abc456 )
2024-05-02T18:42:21.728634-04:00 abc123 pengine[5592]:   notice:  * Promote    rsc_Db2_db2abc_DE1:0     ( Slave -> Master abc456 )
2024-05-02T18:42:21.728801-04:00 abc123 pengine[5592]:   notice:  * Stop       rsc_Db2_db2abc_DE1:1     (          Master abc123 )   due to node availability

2024-05-02T18:43:08.034084-04:00 abc123 kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.12.14-122.194-default root=UUID=9d66c1a0-22a5-4e1c-b073-fa2ae28aa46d USE_BY_UUID_DEVICE_NAMES=1 earlyprintk=ttyS0 console=ttyS0 rootdelay=300 multipath=off net.ifnames=0 dis_ucode_ldr scsi_mod.use_blk_mq=1 root=UUID=9d66c1a0-22a5-4e1c-b073-fa2ae28aa46d rw audit=1

2024-05-02T18:45:21.933984-04:00 abc123 kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.12.14-122.194-default root=UUID=9d66c1a0-22a5-4e1c-b073-fa2ae28aa46d USE_BY_UUID_DEVICE_NAMES=1 earlyprintk=ttyS0 console=ttyS0 rootdelay=300 multipath=off net.ifnames=0 dis_ucode_ldr scsi_mod.use_blk_mq=1 root=UUID=9d66c1a0-22a5-4e1c-b073-fa2ae28aa46d rw audit=1


db2diag.log on the standby database server (abc456):

2024-05-02-18.42.09.192123-240 I66102483E507         LEVEL: Error
PID     : 42123                TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1     
HOSTNAME: abc456
EDUID   : 193                  EDUNAME: db2hadrs.0.0 (DE1) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
MESSAGE : ZRC=0x810F0019=-2129723367=SQLO_CONN_REFUSED "Connection refused"
DATA #1 : <preformatted>
Failed to connect to primary.

2024-05-02-18.43.39.150054-240 E66102991E652         LEVEL: Error
PID     : 42123                TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1     
HOSTNAME: abc456
EDUID   : 193                  EDUNAME: db2hadrs.0.0 (DE1) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20200
MESSAGE : Did not receive anything through HADR connection for the duration of 
          HADR_TIMEOUT. Closing connection.
DATA #1 : String, 30 bytes
hdrCurrentTime/hdrLastRecvTime
DATA #2 : unsigned integer, 4 bytes
1714689819
DATA #3 : unsigned integer, 4 bytes
1714689758

2024-05-02-18.46.58.956798-240 E66104297E501         LEVEL: Event
PID     : 42123                TID : 139352109737728 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1
APPHDL  : 0-36414              APPID: *LOCAL.db2DE1.240502224658
AUTHID  : DB2DE1               HOSTNAME: abc456
EDUID   : 248                  EDUNAME: db2agent (DE1) 0
FUNCTION: DB2 UDB, base sys utilities, sqeDBMgr::StartUsingLocalDatabase, probe:13
START   : Received TAKEOVER HADR command.

2024-05-02-18.46.59.030866-240 I66104799E427         LEVEL: Warning
PID     : 42123                TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1     
HOSTNAME: abc456
EDUID   : 193                  EDUNAME: db2hadrs.0.0 (DE1) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20202
MESSAGE : Peer window ends. Peer window expired.

2024-05-02-18.46.59.119853-240 E66105227E462         LEVEL: Event
PID     : 42123                TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1     
HOSTNAME: abc456
EDUID   : 193                  EDUNAME: db2hadrs.0.0 (DE1) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE  : HADR state set to HDR_S_REM_CATCHUP_PENDING (was HDR_S_DISCONN_PEER), connId=6

2024-05-02-18.46.59.292989-240 I66105690E857         LEVEL: Warning
PID     : 42123                TID : 139352109737728 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1
APPHDL  : 0-36414              APPID: *LOCAL.db2DE1.240502224658
AUTHID  : DB2DE1               HOSTNAME: abc456
EDUID   : 248                  EDUNAME: db2agent (DE1) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrValidateTakeoverRequest, probe:52050
MESSAGE : ZRC=0x8280001D=-2105540579=HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED
          "Forced takeover rejected as standby is in the wrong state or peer window has expired"
DATA #1 : <preformatted>
HADR standby not ready for takeover.
   Current HADR state: HDR_S_REM_CATCHUP_PENDING
   Light scan status : Inactive
   Peer Window End   : 1714690018
   Current Time      : 1714690019

2024-05-02-18.46.59.342053-240 I66106548E700         LEVEL: Error
PID     : 42123                TID : 139352109737728 PROC : db2sysc 0
INSTANCE: db2DE1               NODE : 000            DB   : DE1
APPHDL  : 0-36414              APPID: *LOCAL.db2DE1.240502224658
AUTHID  : DB2DE1               HOSTNAME: abc456
EDUID   : 248                  EDUNAME: db2agent (DE1) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrRequestTakeover, probe:39999
MESSAGE : ZRC=0x8280001D=-2105540579=HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED
          "Forced takeover rejected as standby is in the wrong state or peer window has expired"
DATA #1 : String, 36 bytes
HADR takeover pre-validation failed.
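
The epoch values in the db2diag.log entries above can be compared directly to see why the forced takeover was rejected. The following is a minimal sketch, not part of the original article: the variable names and the helper `to_log_time` are illustrative, and the fixed UTC offset of -240 minutes is taken from the db2diag.log timestamps. It only reads the numbers already shown in the log and makes no assumption about the configured HADR_TIMEOUT or HADR_PEER_WINDOW values.

```python
from datetime import datetime, timedelta, timezone

# db2diag.log timestamps above carry a "-240" suffix (minutes from UTC).
LOG_TZ = timezone(timedelta(minutes=-240))


def to_log_time(epoch: int) -> str:
    """Convert an epoch value from db2diag.log to local wall-clock time."""
    return datetime.fromtimestamp(epoch, tz=LOG_TZ).strftime("%H:%M:%S")


# Values copied from the db2diag.log entries (names are illustrative).
hdr_current_time = 1714689819    # probe:20200, DATA #2
hdr_last_recv_time = 1714689758  # probe:20200, DATA #3
peer_window_end = 1714690018     # probe:52050, "Peer Window End"
takeover_check_time = 1714690019 # probe:52050, "Current Time"

# How long the standby had heard nothing when it closed the connection.
silence_seconds = hdr_current_time - hdr_last_recv_time

# How far past the end of the peer window the takeover was validated.
late_by_seconds = takeover_check_time - peer_window_end

print("last data received :", to_log_time(hdr_last_recv_time))
print("connection closed  :", to_log_time(hdr_current_time))
print("peer window end    :", to_log_time(peer_window_end))
print("takeover validated :", to_log_time(takeover_check_time))
print("silence before close (s):", silence_seconds)
print("takeover late by (s)    :", late_by_seconds)
```

With the values from this log, the standby had received nothing for 61 seconds when it closed the HADR connection, and the takeover request was validated 1 second after the peer window end (18:46:59 versus 18:46:58), which matches the HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED rejection.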



Environment

  • Linux server with Pacemaker cluster configured.
  • IBM Db2 11.5.6 or higher

Product

SAP ERP 6.0

Keywords

pacemaker, cluster, HADR, takeover, peer window, expired, KBA, BC-DB-DB6, DB2 Universal Database for Unix / NT, BC-OP-LNX, Linux, BC-OP-LNX-RH, Red Hat Linux, BC-OP-LNX-SUSE, SUSE Linux, How To
