Symptom
You have a Pacemaker cluster configured to manage a Db2 database with HADR, set up according to SAP Note 3100330.
Pacemaker failed to perform a takeover from the primary database to the standby database.
OS log on the primary database server (abc123):
2024-05-02T18:42:21.724380-04:00 abc123 pengine[5592]: warning: Processing failed stop of rsc_Db2_db2abc_ABC:1 on abc123: unknown error
2024-05-02T18:42:21.725265-04:00 abc123 pengine[5592]: warning: Cluster node abc123 will be fenced: rsc_Db2_db2abc_ABC:1 failed there
2024-05-02T18:42:21.725439-04:00 abc123 pengine[5592]: warning: Processing failed monitor of stonith-sbd on abc123: unknown error
2024-05-02T18:42:21.726132-04:00 abc123 pengine[5592]: warning: Forcing msl_Db2_db2abc_ABC away from abc123 after 1000000 failures (max=5000)
2024-05-02T18:42:21.726543-04:00 abc123 pengine[5592]: warning: Forcing msl_Db2_db2abc_ABC away from abc123 after 1000000 failures (max=5000)
2024-05-02T18:42:21.727113-04:00 abc123 pengine[5592]: warning: Scheduling Node abc123 for STONITH
2024-05-02T18:42:21.727535-04:00 abc123 pengine[5592]: notice: Stop of failed resource rsc_Db2_db2abc_ABC:1 is implicit after abc123 is fenced
2024-05-02T18:42:21.727827-04:00 abc123 pengine[5592]: notice: * Fence (reboot) abc123 'rsc_Db2_db2abc_ABC:1 failed there'
2024-05-02T18:42:21.727988-04:00 abc123 pengine[5592]: notice: * Move stonith-sbd ( abc123 -> abc456 )
2024-05-02T18:42:21.728150-04:00 abc123 pengine[5592]: notice: * Stop rsc_azure-events:1 ( abc123 ) due to node availability
2024-05-02T18:42:21.728313-04:00 abc123 pengine[5592]: notice: * Start rsc_ip_db2abc_ABC ( abc456 )
2024-05-02T18:42:21.728473-04:00 abc123 pengine[5592]: notice: * Start rsc_nc_db2abc_ABC ( abc456 )
2024-05-02T18:42:21.728634-04:00 abc123 pengine[5592]: notice: * Promote rsc_Db2_db2abc_ABC:0 ( Slave -> Master abc456 )
2024-05-02T18:42:21.728801-04:00 abc123 pengine[5592]: notice: * Stop rsc_Db2_db2abc_ABC:1 ( Master abc123 ) due to node availability
2024-05-02T18:43:08.034084-04:00 abc123 kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.12.14-122.194-default ...<snipped>
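The pengine entries above describe a single transition: it fences abc123 and, in the same plan, promotes the standby instance on abc456. A minimal Python sketch like the one below can pull the "* <Action>" summary lines out of such a log so that the planned actions are easier to review. The regular expression is a simplified pattern written for this excerpt only, not a documented pengine log format, and the helper name `planned_actions` is hypothetical.

```python
import re
from typing import List, Tuple

# The pengine transition summary from the OS log on abc123,
# with the timestamp and "abc123 pengine[5592]:" prefix trimmed.
TRANSITION = """\
notice: * Fence (reboot) abc123 'rsc_Db2_db2abc_ABC:1 failed there'
notice: * Move stonith-sbd ( abc123 -> abc456 )
notice: * Stop rsc_azure-events:1 ( abc123 ) due to node availability
notice: * Start rsc_ip_db2abc_ABC ( abc456 )
notice: * Start rsc_nc_db2abc_ABC ( abc456 )
notice: * Promote rsc_Db2_db2abc_ABC:0 ( Slave -> Master abc456 )
notice: * Stop rsc_Db2_db2abc_ABC:1 ( Master abc123 ) due to node availability
"""

# Simplified pattern written for this excerpt only (an assumption, not a
# documented pengine format): "* <Action> [(<qualifier>)] <target> ...".
ACTION_RE = re.compile(r"\*\s+(?P<action>\w+)\s+(?:\(\w+\)\s+)?(?P<target>\S+)")


def planned_actions(log_text: str) -> List[Tuple[str, str]]:
    """Return (action, target) pairs from pengine "* <Action>" summary lines."""
    actions = []
    for line in log_text.splitlines():
        match = ACTION_RE.search(line)
        if match:
            actions.append((match["action"], match["target"]))
    return actions


actions = planned_actions(TRANSITION)
for action, target in actions:
    print(f"{action:<8} {target}")
```

Matching only the "*" summary lines keeps the sketch short. Log wording can differ between Pacemaker versions, so treat the pattern as a starting point rather than a general parser.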
db2diag.log on the standby database server (abc456):
2024-05-02-18.42.09.192123-240 I66102483E507 LEVEL: Error
PID : 42123 TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
HOSTNAME: abc456
EDUID : 193 EDUNAME: db2hadrs.0.0 (ABC) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
MESSAGE : ZRC=0x810F0019=-2129723367=SQLO_CONN_REFUSED "Connection refused"
DATA #1 : <preformatted>
Failed to connect to primary.
2024-05-02-18.43.39.150054-240 E66102991E652 LEVEL: Error
PID : 42123 TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
HOSTNAME: abc456
EDUID : 193 EDUNAME: db2hadrs.0.0 (ABC) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20200
MESSAGE : Did not receive anything through HADR connection for the duration of
HADR_TIMEOUT. Closing connection.
DATA #1 : String, 30 bytes
hdrCurrentTime/hdrLastRecvTime
DATA #2 : unsigned integer, 4 bytes
1714689819
DATA #3 : unsigned integer, 4 bytes
1714689758
2024-05-02-18.46.58.956798-240 E66104297E501 LEVEL: Event
PID : 42123 TID : 139352109737728 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
APPHDL : 0-36414 APPID: *LOCAL.db2ABC.240502224658
AUTHID : DB2ABC HOSTNAME: abc456
EDUID : 248 EDUNAME: db2agent (ABC) 0
FUNCTION: DB2 UDB, base sys utilities, sqeDBMgr::StartUsingLocalDatabase, probe:13
START : Received TAKEOVER HADR command.
2024-05-02-18.46.59.030866-240 I66104799E427 LEVEL: Warning
PID : 42123 TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
HOSTNAME: abc456
EDUID : 193 EDUNAME: db2hadrs.0.0 (ABC) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20202
MESSAGE : Peer window ends. Peer window expired.
2024-05-02-18.46.59.119853-240 E66105227E462 LEVEL: Event
PID : 42123 TID : 139837151635200 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
HOSTNAME: abc456
EDUID : 193 EDUNAME: db2hadrs.0.0 (ABC) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to HDR_S_REM_CATCHUP_PENDING (was HDR_S_DISCONN_PEER), connId=6
2024-05-02-18.46.59.292989-240 I66105690E857 LEVEL: Warning
PID : 42123 TID : 139352109737728 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
APPHDL : 0-36414 APPID: *LOCAL.db2ABC.240502224658
AUTHID : DB2ABC HOSTNAME: abc456
EDUID : 248 EDUNAME: db2agent (ABC) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrValidateTakeoverRequest, probe:52050
MESSAGE : ZRC=0x8280001D=-2105540579=HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED
"Forced takeover rejected as standby is in the wrong state or peer window has expired"
DATA #1 : <preformatted>
HADR standby not ready for takeover.
Current HADR state: HDR_S_REM_CATCHUP_PENDING
Light scan status : Inactive
Peer Window End : 1714690018
Current Time : 1714690019
2024-05-02-18.46.59.342053-240 I66106548E700 LEVEL: Error
PID : 42123 TID : 139352109737728 PROC : db2sysc 0
INSTANCE: db2ABC NODE : 000 DB : ABC
APPHDL : 0-36414 APPID: *LOCAL.db2ABC.240502224658
AUTHID : DB2ABC HOSTNAME: abc456
EDUID : 248 EDUNAME: db2agent (ABC) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrRequestTakeover, probe:39999
MESSAGE : ZRC=0x8280001D=-2105540579=HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED
"Forced takeover rejected as standby is in the wrong state or peer window has expired"
DATA #1 : String, 36 bytes
HADR takeover pre-validation failed.
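The db2diag.log entries carry raw epoch values (seconds since 1970-01-01 UTC). Converting them to the servers' local time (UTC-04:00, taken from the "-04:00" and "-240" suffixes) lines them up with the log headers and with the Pacemaker decision on abc123. The Python sketch below does this arithmetic. Two points are illustrative assumptions, not Db2 or Pacemaker internals: the strict "now > window end" rule in the hypothetical `peer_window_expired` helper (chosen because it matches the values logged here), and the premise that the clocks on abc123 and abc456 are synchronized.

```python
from datetime import datetime, timedelta, timezone

# Epoch values copied from the db2diag.log excerpts on abc456.
HDR_LAST_RECV_TIME = 1714689758   # probe:20200, DATA #3 (hdrLastRecvTime)
HDR_CURRENT_TIME = 1714689819     # probe:20200, DATA #2 (hdrCurrentTime)
PEER_WINDOW_END = 1714690018      # probe:52050, "Peer Window End"
TAKEOVER_CHECK_TIME = 1714690019  # probe:52050, "Current Time"

# Both servers log in UTC-04:00.
LOCAL_TZ = timezone(timedelta(hours=-4))

# pengine "* Promote ... ( Slave -> Master abc456 )" entry on abc123, truncated
# to whole seconds. Assumes the clocks on abc123 and abc456 are in sync.
PROMOTE_SCHEDULED = int(datetime(2024, 5, 2, 18, 42, 21, tzinfo=LOCAL_TZ).timestamp())


def to_local(epoch: int) -> str:
    """Render an epoch value in the same time zone as the log headers."""
    return datetime.fromtimestamp(epoch, tz=LOCAL_TZ).strftime("%Y-%m-%d %H:%M:%S")


def peer_window_expired(now: int, window_end: int) -> bool:
    # Assumption for illustration: the window counts as expired once "now"
    # is strictly past its end, which matches the values in this log.
    return now > window_end


silence = HDR_CURRENT_TIME - HDR_LAST_RECV_TIME
late_by = TAKEOVER_CHECK_TIME - PEER_WINDOW_END
promote_delay = TAKEOVER_CHECK_TIME - PROMOTE_SCHEDULED

print(f"Promotion scheduled   : {to_local(PROMOTE_SCHEDULED)} (pengine on abc123)")
print(f"Last data from primary: {to_local(HDR_LAST_RECV_TIME)}")
print(f"HADR connection closed: {to_local(HDR_CURRENT_TIME)} ({silence} s without data)")
print(f"Peer window end       : {to_local(PEER_WINDOW_END)}")
print(f"Takeover validated    : {to_local(TAKEOVER_CHECK_TIME)} "
      f"({late_by} s after window end, {promote_delay} s after scheduling)")
print(f"Peer window expired   : {peer_window_expired(TAKEOVER_CHECK_TIME, PEER_WINDOW_END)}")
```

With these values, the standby received nothing from the primary for 61 seconds before it closed the HADR connection. The takeover was validated 1 second after the logged peer window end, roughly 278 seconds after pengine scheduled the promotion, so the standby was already in HDR_S_REM_CATCHUP_PENDING and rejected the forced takeover.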
Environment
- Linux server with a Pacemaker cluster configured
- IBM Db2 11.5.6 or higher
Keywords
pacemaker, cluster, HADR, takeover, peer window, expired, KBA, BC-DB-DB6, DB2 Universal Database for Unix / NT, BC-OP-LNX, Linux, BC-OP-LNX-SUSE, SUSE Linux, BC-OP-LNX-RH, Red Hat Linux, Bug Filed
About this page
This is a preview of an SAP Knowledge Base Article. The full version is available on SAP for Me (login required).