Symptom
- A blocked savepoint is observed on the active primary site in a HANA System with an active Replication setup.
- Runtime dump shows the following thread callstack responsible for the hanging commit under "CommitTrans" thread method :
indexserver_hostname.port.rtedump.counter.savepoint_blocked.trc
. . . . .
[SAVEPOINT_SHAREDLOCK_OWNERS] Owners of shared SavepointLocks: (2025-02-17 21:24:14 486 Local)
13511271701[thr=9506]: SqlExecutor, TID: -1, UTID: -1, CID: 309423, LCID: 0, SEID: 844439956190165, parent: ---, SQLUserName: "SAPHANADB", AppUserName: "BATCH_USER", AppName: "ABAP:SID", ConnCtx: ---, StmtCtx: ---,
WorkloadCtx: ---, NUMA node: 0, type: "SqlExecutor", method: "CommitTrans", detail: "", command: "" at 0x00007ff376f90d89 in syscall+0x19 (libc.so.6)
[OK] in 00:00:00 002
--
13511271701[thr=9506]: SqlExecutor at
0: 0x00007ff376f90d89 in syscall+0x15 (libc.so.6)
1: 0x00007ff3789a9c62 in Synchronization::Mutex::lockInternal(Execution::Context&, unsigned long, unsigned long, bool)+0x320 at Basis/Synchronization/impl/LinuxFutexOps.hpp:46 (libhdbbasis.so)
2: 0x00007ff37ab05638 in DataAccess::ReplicationProtocolPrimaryHandler::trySendLogBufferDataIfNeeded(DataAccess::LogShippingRequest*)+0x224 at Basis/Synchronization/LockHandle.hpp:340 (libhdbdataaccess.so)
3: 0x00007ff37ab08ee3 in DataAccess::ReplicationProtocolPrimaryHandler::handleLogBufferSent()+0x20 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7164 (libhdbdataaccess.so)
4: 0x00007ff37aad7d9f in DataAccess::ReplicationProtocolHandler::streamDataSent(Stream::Channel&, void const*, unsigned long, bool)+0x6b at DataAccess/impl/DisasterRecoveryProtocol.cpp:1700 (libhdbdataaccess.so)
5: 0x00007ff37877e047 in Stream::NetworkChannel::triggerSend()+0x143 at Basis/IO/Stream/impl/NetworkChannel.cpp:3131 (libhdbbasis.so)
6: 0x00007ff378788482 in Stream::NetworkChannel::doSend(void const*, unsigned long)+0x5a0 at Basis/IO/Stream/impl/NetworkChannel.cpp:1446 (libhdbbasis.so)
7: 0x00007ff37876d799 in Stream::NetworkChannel::send(void const*, unsigned long)+0x5 at Basis/IO/Stream/impl/NetworkChannel.hpp:545 (libhdbbasis.so)
8: 0x00007ff37ab0179a in DataAccess::ReplicationProtocolPrimaryHandler::sendLogBufferToChannel(Stream::Channel&, DataAccess::LogShippingRequest*)+0x3a6 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7936 (libhdbdataaccess.so)
9: 0x00007ff37ab06617 in DataAccess::ReplicationProtocolPrimaryHandler::doSendNextLog(Synchronization::LockHandle<Synchronization::Mutex, false>)+0x643 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7837 (libhdbdataaccess.so)
10: 0x00007ff37ab07252 in DataAccess::ReplicationProtocolPrimaryHandler::processNextLogBuffer(Synchronization::LockHandle<Synchronization::Mutex, false>)+0x70 at DataAccess/impl/DisasterRecoveryProtocol.cpp:8254 (libhdbdataaccess.so)
11: 0x00007ff37ab07658 in DataAccess::ReplicationProtocolPrimaryHandler::sendLogQueueContent(Synchronization::LockHandle<Synchronization::Mutex, false>)+0x2f4 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7640 (libhdbdataaccess.so)
12: 0x00007ff37aaad8d6 in DataAccess::DisasterRecoveryPrimaryHandlerImpl::sendLog(unsigned long, unsigned long, DataRecovery::PersLogBufferHeader&, DataRecovery::LogBuffer*, DataAccess::LogShippingCallback*, void*)+0x7a2 at DataAccess/impl/DisasterRecoveryPrimaryImpl.cpp:1800 (libhdbdataaccess.so)
13: 0x00007ff37ae8aae5 in DataRecovery::LogPartition::startIO(DataRecovery::LogBuffer&)+0xa01 at DataAccess/DataRecovery/impl/LogSegment.cpp:4502 (libhdbdataaccess.so)
14: 0x00007ff37ae5c5c4 in DataRecovery::LogBuffer::closeEntry(unsigned long, bool, DataAccess::RedoLogCallback*, void*)+0x3b0 at DataAccess/DataRecovery/impl/LogBuffer.cpp:682 (libhdbdataaccess.so)
15: 0x00007ff37ae6748b in DataRecovery::LoggerImpl::finishLog(DataAccess::RedoLogSpaceHandle&)+0xb7 at DataAccess/DataRecovery/impl/LoggerImpl.cpp:1264 (libhdbdataaccess.so)
16: 0x00007ff37ac3cf2a in DataAccess::RedoLogSpaceHandle::~RedoLogSpaceHandle()+0x6 at DataAccess/impl/RedoLogHandler.cpp:43 (libhdbdataaccess.so)
17: 0x00007ff37af95669 in TransactionManager::CommitLogRecord::logMasterCommit(DataAccess::PersistenceSession&, long, ltt::array_base<DataAccess::SlaveLogPositionInfo> const&, long, DataAccess::RedoLogCallback*, void*)+0x1e5 at DataAccess/TransactionManager/impl/CommitLogRecord.cpp:492 (libhdbdataaccess.so)
18: 0x00007ff3828cd975 in TransactionManager::PersistenceSessionContainer::commitWriteLog(TransactionManager::TransactionBase*)+0xe1 at TransactionManager/Persistence/PersistenceSessionContainer.cpp:810 (libhdbtmpersistence.so)
19: 0x00007ff37a6a5f08 in TransactionManager::Transaction::precommit()+0xb24 at TransactionManager/TransactionBase.hpp:366 (libhdbtransactionmanager.so)
20: 0x00007ff3992d583b in ptime::Transaction::precommit2(ptime::PostcommitHandler*, session::SessionStateCallBack*, bool)+0x307 at ptime/storage/tm/transmgmt.cc:2270 (libhdbrskernel.so)
21: 0x00007ff398f2ce58 in ptime::EAPIConnTrans::handleCommitForSession(Execution::Context&, unsigned long, int, bool, bool, bool, ptime::PostcommitHandler*, session::SessionStateCallBack*, ptime::Connection*)+0x464 at ptime/session/eapi/jdbc/EAPITransaction.cc:463 (libhdbrskernel.so)
22: 0x00007ff398f2d22b in ptime::Connection::handleCommitForSession(Execution::Context&, unsigned long, int, bool, bool, bool, ptime::PostcommitHandler*, session::SessionStateCallBack*)+0x27 at ptime/session/eapi/jdbc/Connection.cc:1361 (libhdbrskernel.so)
23: 0x00007ff395f74a83 in ptime::ActionExecutor::txCommit(Execution::Context&, ptime::SessionEvent*, ptime::Action&)+0x350 at ptime/session/sm_action_executor.cc:914 (libhdbsqlsession.so)
24: 0x00007ff395f73dc0 in ptime::ActionExecutor::executeAction(Execution::Context&, ptime::SessionEvent*, ptime::Action&)+0x160 at ptime/session/sm_action_executor.cc:742 (libhdbsqlsession.so)
25: 0x00007ff395ff70b5 in ptime::SessionHandler::executeAction(Execution::Context&, SessionLayer::SessionEvent*, ptime::Action&)+0x2d1 at ptime/session/sm_handler.cc:2573 (libhdbsqlsession.so)
26: 0x00007ff395ffd75c in .LTHUNK392.lto_priv.1+0x188 at ptime/session/sm_handler.cc:2301 (libhdbsqlsession.so)
27: 0x00007ff395ff89bd in ptime::SessionHandler::receiveMessage_(Execution::Context&, ptime::ThreadStatusCallbackBase&, SessionLayer::CommEvent*, bool)+0x8f9 at ptime/session/sm_handler.cc:2015 (libhdbsqlsession.so)
28: 0x00007ff395ff91f0 in ptime::SessionHandler::receiveMessage(Execution::Context&, SessionLayer::CommEvent*, bool)+0x30 at ptime/session/sm_handler.cc:1854 (libhdbsqlsession.so)
29: 0x00007ff39602ccdf in ptime::TcpReceiver::doWork(Execution::Context&, SessionLayer::CommMgr*)+0xdeb at ptime/session/tcp_receiver.cc:392 (libhdbsqlsession.so)
30: 0x00007ff396027e3a in ptime::TcpReceiver::runInner(void*)+0xd6 at ptime/session/tcp_receiver.cc:503 (libhdbsqlsession.so)
31: 0x00007ff3960280a4 in non-virtual thunk to ptime::TcpReceiver::run(void*)+0x30 at ptime/session/tcp_receiver.cc:480 (libhdbsqlsession.so)
32: 0x00007ff37ca2b656 in TrexThreads::PoolThread::run()+0xe42 at TrexThreads/PoolThread.cpp:437 (libhdbbasement.so)
33: 0x00007ff37ca2d491 in TrexThreads::PoolThread::run(Execution::ThreadRC&)+0x10 at TrexThreads/PoolThread.cpp:121 (libhdbbasement.so)
34: 0x00007ff37869d754 in Execution::Thread::staticMainImp(Execution::Thread*)+0x550 at Basis/Execution/impl/Thread.cpp:603 (libhdbbasis.so)
35: 0x00007ff3786a40df in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1eb at Basis/Execution/impl/ThreadInterposition.cpp:683 (libhdbbasis.so)
36: 0x00007ff377e4e6ea in start_thread+0xd8 (libpthread.so.0)
37: 0x00007ff376f9794f in __GI___clone+0x3b (libc.so.6)
- Broken replication channel is observed between the primary and secondary sites in the minutes prior to savepoint block :
Primary site
indexserver_hostname.port.xxx.trc
#
[122514]{-1}[-1/-1] 2025-02-17 21:21:31.982741 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(00671) : Closing connection to siteID 2. LogShipping was waiting for 31 seconds (logshipping_timeout = 30)! TimeoutInfo[count=1/1, BufferHeader[0x17e2ecb140-0x17e2ecb1c0,cs=0x3e6771e3(CRC32),V2,BV3,partition=0,pos=2725,segmentID=102591236096,filler=3392B,ts=2025-02-17 21:21:00.031951,PersGUID=27290ca9-0063-20230821-192918-3080000015,GUID=5a09497f-0063-20230926-062013-26808d3dc7,PrevGUID=5a09497f-0063-20230926-062013-26808d3ce2], sendSz=0, dur=31950748 us]
[122514]{-1}[-1/-1] 2025-02-17 21:21:31.982923 i sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(01119) : Replication session closed (site=2, remoteHost=xx.xx.xx.xx, invalidateSession=0, eventReported=0) <<, ptr=0x00007ff0e519e980
[122514]{-1}[-1/-1] 2025-02-17 21:21:31.983019 i EventHandler EventManagerImpl.cpp(00680) : New event reported: 'SystemReplicationEvent: site=2, Site 2: Log shipping timeout occurred'
Secondary siteindexserver_hostname.port.xxx.trc
#
[54079]{-1}[-1/-1] 2025-02-17 21:18:10.487955 i Logger RecoveryHandlerImpl.cpp(00760) : Redo done up to position: 0x17e2e9e400 and time: 2025-02-17 21:18:07.592785+09:00
[93419]{-1}[-1/-1] 2025-02-17 21:19:02.338139 i Logger LoggerImpl.cpp(03043) : primaryBackupFinished(0x17e2e5e3c0)
[102355]{-1}[-1/-1] 2025-02-17 21:21:58.256329 e sr_dataaccess DisasterRecoverySecondaryImpl.cpp(01145) : Trying to reconnect data handler to primary (1)
Read more...
Product
SAP HANA, platform edition 2.0
Keywords
KBA , HAN-DB , SAP HANA Database , HAN-DB-HA , SAP HANA High Availability (System Replication, DR, etc.) , Problem
About this page
This is a preview of a SAP Knowledge Base Article. Click more to access the full version on SAP for Me (Login required).Search for additional results
Visit SAP Support Portal's SAP Notes and KBA Search.
SAP Knowledge Base Article - Preview