SAP Knowledge Base Article - Preview

3627354 - Hanging Commit waiting for the HANA log buffer blocks the savepoint

Symptom

  • A blocked savepoint is observed on the active primary site in a HANA System with an active Replication setup.
  • Runtime dump shows the following thread callstack responsible for the hanging commit under "CommitTrans" thread method :

    indexserver_hostname.port.rtedump.counter.savepoint_blocked.trc

     . . . . . 
    [SAVEPOINT_SHAREDLOCK_OWNERS]  Owners of shared SavepointLocks: (2025-02-17 21:24:14 486 Local)
    13511271701[thr=9506]: SqlExecutor, TID: -1, UTID: -1, CID: 309423, LCID: 0, SEID: 844439956190165, parent: ---, SQLUserName: "SAPHANADB", AppUserName: "BATCH_USER", AppName: "ABAP:SID", ConnCtx: ---, StmtCtx: ---,
    WorkloadCtx: ---, NUMA node: 0, type: "SqlExecutor", method: "CommitTrans", detail: "", command: "" at 0x00007ff376f90d89 in syscall+0x19 (libc.so.6)
    [OK] in 00:00:00 002
    --


    13511271701[thr=9506]: SqlExecutor at
       0: 0x00007ff376f90d89 in syscall+0x15 (libc.so.6)
       1: 0x00007ff3789a9c62 in Synchronization::Mutex::lockInternal(Execution::Context&, unsigned long, unsigned long, bool)+0x320 at Basis/Synchronization/impl/LinuxFutexOps.hpp:46 (libhdbbasis.so)
       2: 0x00007ff37ab05638 in DataAccess::ReplicationProtocolPrimaryHandler::trySendLogBufferDataIfNeeded(DataAccess::LogShippingRequest*)+0x224 at Basis/Synchronization/LockHandle.hpp:340 (libhdbdataaccess.so)
       3: 0x00007ff37ab08ee3 in DataAccess::ReplicationProtocolPrimaryHandler::handleLogBufferSent()+0x20 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7164 (libhdbdataaccess.so)
       4: 0x00007ff37aad7d9f in DataAccess::ReplicationProtocolHandler::streamDataSent(Stream::Channel&, void const*, unsigned long, bool)+0x6b at DataAccess/impl/DisasterRecoveryProtocol.cpp:1700 (libhdbdataaccess.so)
       5: 0x00007ff37877e047 in Stream::NetworkChannel::triggerSend()+0x143 at Basis/IO/Stream/impl/NetworkChannel.cpp:3131 (libhdbbasis.so)
       6: 0x00007ff378788482 in Stream::NetworkChannel::doSend(void const*, unsigned long)+0x5a0 at Basis/IO/Stream/impl/NetworkChannel.cpp:1446 (libhdbbasis.so)
       7: 0x00007ff37876d799 in Stream::NetworkChannel::send(void const*, unsigned long)+0x5 at Basis/IO/Stream/impl/NetworkChannel.hpp:545 (libhdbbasis.so)
       8: 0x00007ff37ab0179a in DataAccess::ReplicationProtocolPrimaryHandler::sendLogBufferToChannel(Stream::Channel&, DataAccess::LogShippingRequest*)+0x3a6 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7936 (libhdbdataaccess.so)
       9: 0x00007ff37ab06617 in DataAccess::ReplicationProtocolPrimaryHandler::doSendNextLog(Synchronization::LockHandle<Synchronization::Mutex, false>)+0x643 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7837 (libhdbdataaccess.so)
      10: 0x00007ff37ab07252 in DataAccess::ReplicationProtocolPrimaryHandler::processNextLogBuffer(Synchronization::LockHandle<Synchronization::Mutex, false>)+0x70 at DataAccess/impl/DisasterRecoveryProtocol.cpp:8254 (libhdbdataaccess.so)
      11: 0x00007ff37ab07658 in DataAccess::ReplicationProtocolPrimaryHandler::sendLogQueueContent(Synchronization::LockHandle<Synchronization::Mutex, false>)+0x2f4 at DataAccess/impl/DisasterRecoveryProtocol.cpp:7640 (libhdbdataaccess.so)
      12: 0x00007ff37aaad8d6 in DataAccess::DisasterRecoveryPrimaryHandlerImpl::sendLog(unsigned long, unsigned long, DataRecovery::PersLogBufferHeader&, DataRecovery::LogBuffer*, DataAccess::LogShippingCallback*, void*)+0x7a2 at DataAccess/impl/DisasterRecoveryPrimaryImpl.cpp:1800 (libhdbdataaccess.so)
      13: 0x00007ff37ae8aae5 in DataRecovery::LogPartition::startIO(DataRecovery::LogBuffer&)+0xa01 at DataAccess/DataRecovery/impl/LogSegment.cpp:4502 (libhdbdataaccess.so)
      14: 0x00007ff37ae5c5c4 in DataRecovery::LogBuffer::closeEntry(unsigned long, bool, DataAccess::RedoLogCallback*, void*)+0x3b0 at DataAccess/DataRecovery/impl/LogBuffer.cpp:682 (libhdbdataaccess.so)
      15: 0x00007ff37ae6748b in DataRecovery::LoggerImpl::finishLog(DataAccess::RedoLogSpaceHandle&)+0xb7 at DataAccess/DataRecovery/impl/LoggerImpl.cpp:1264 (libhdbdataaccess.so)
      16: 0x00007ff37ac3cf2a in DataAccess::RedoLogSpaceHandle::~RedoLogSpaceHandle()+0x6 at DataAccess/impl/RedoLogHandler.cpp:43 (libhdbdataaccess.so)
      17: 0x00007ff37af95669 in TransactionManager::CommitLogRecord::logMasterCommit(DataAccess::PersistenceSession&, long, ltt::array_base<DataAccess::SlaveLogPositionInfo> const&, long, DataAccess::RedoLogCallback*, void*)+0x1e5 at DataAccess/TransactionManager/impl/CommitLogRecord.cpp:492 (libhdbdataaccess.so)
      18: 0x00007ff3828cd975 in TransactionManager::PersistenceSessionContainer::commitWriteLog(TransactionManager::TransactionBase*)+0xe1 at TransactionManager/Persistence/PersistenceSessionContainer.cpp:810 (libhdbtmpersistence.so)
      19: 0x00007ff37a6a5f08 in TransactionManager::Transaction::precommit()+0xb24 at TransactionManager/TransactionBase.hpp:366 (libhdbtransactionmanager.so)
      20: 0x00007ff3992d583b in ptime::Transaction::precommit2(ptime::PostcommitHandler*, session::SessionStateCallBack*, bool)+0x307 at ptime/storage/tm/transmgmt.cc:2270 (libhdbrskernel.so)
      21: 0x00007ff398f2ce58 in ptime::EAPIConnTrans::handleCommitForSession(Execution::Context&, unsigned long, int, bool, bool, bool, ptime::PostcommitHandler*, session::SessionStateCallBack*, ptime::Connection*)+0x464 at ptime/session/eapi/jdbc/EAPITransaction.cc:463 (libhdbrskernel.so)
      22: 0x00007ff398f2d22b in ptime::Connection::handleCommitForSession(Execution::Context&, unsigned long, int, bool, bool, bool, ptime::PostcommitHandler*, session::SessionStateCallBack*)+0x27 at ptime/session/eapi/jdbc/Connection.cc:1361 (libhdbrskernel.so)
      23: 0x00007ff395f74a83 in ptime::ActionExecutor::txCommit(Execution::Context&, ptime::SessionEvent*, ptime::Action&)+0x350 at ptime/session/sm_action_executor.cc:914 (libhdbsqlsession.so)
      24: 0x00007ff395f73dc0 in ptime::ActionExecutor::executeAction(Execution::Context&, ptime::SessionEvent*, ptime::Action&)+0x160 at ptime/session/sm_action_executor.cc:742 (libhdbsqlsession.so)
      25: 0x00007ff395ff70b5 in ptime::SessionHandler::executeAction(Execution::Context&, SessionLayer::SessionEvent*, ptime::Action&)+0x2d1 at ptime/session/sm_handler.cc:2573 (libhdbsqlsession.so)
      26: 0x00007ff395ffd75c in .LTHUNK392.lto_priv.1+0x188 at ptime/session/sm_handler.cc:2301 (libhdbsqlsession.so)
      27: 0x00007ff395ff89bd in ptime::SessionHandler::receiveMessage_(Execution::Context&, ptime::ThreadStatusCallbackBase&, SessionLayer::CommEvent*, bool)+0x8f9 at ptime/session/sm_handler.cc:2015 (libhdbsqlsession.so)
      28: 0x00007ff395ff91f0 in ptime::SessionHandler::receiveMessage(Execution::Context&, SessionLayer::CommEvent*, bool)+0x30 at ptime/session/sm_handler.cc:1854 (libhdbsqlsession.so)
      29: 0x00007ff39602ccdf in ptime::TcpReceiver::doWork(Execution::Context&, SessionLayer::CommMgr*)+0xdeb at ptime/session/tcp_receiver.cc:392 (libhdbsqlsession.so)
      30: 0x00007ff396027e3a in ptime::TcpReceiver::runInner(void*)+0xd6 at ptime/session/tcp_receiver.cc:503 (libhdbsqlsession.so)
      31: 0x00007ff3960280a4 in non-virtual thunk to ptime::TcpReceiver::run(void*)+0x30 at ptime/session/tcp_receiver.cc:480 (libhdbsqlsession.so)
      32: 0x00007ff37ca2b656 in TrexThreads::PoolThread::run()+0xe42 at TrexThreads/PoolThread.cpp:437 (libhdbbasement.so)
      33: 0x00007ff37ca2d491 in TrexThreads::PoolThread::run(Execution::ThreadRC&)+0x10 at TrexThreads/PoolThread.cpp:121 (libhdbbasement.so)
      34: 0x00007ff37869d754 in Execution::Thread::staticMainImp(Execution::Thread*)+0x550 at Basis/Execution/impl/Thread.cpp:603 (libhdbbasis.so)
      35: 0x00007ff3786a40df in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1eb at Basis/Execution/impl/ThreadInterposition.cpp:683 (libhdbbasis.so)
      36: 0x00007ff377e4e6ea in start_thread+0xd8 (libpthread.so.0)
      37: 0x00007ff376f9794f in __GI___clone+0x3b (libc.so.6)

 

  • Broken replication channel is observed between the primary and secondary sites in the minutes prior to savepoint block :

    Primary site 

    indexserver_hostname.port.xxx.trc
    #


    [122514]{-1}[-1/-1] 2025-02-17 21:21:31.982741 e sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(00671) : Closing connection to siteID 2. LogShipping was waiting for 31 seconds (logshipping_timeout = 30)! TimeoutInfo[count=1/1, BufferHeader[0x17e2ecb140-0x17e2ecb1c0,cs=0x3e6771e3(CRC32),V2,BV3,partition=0,pos=2725,segmentID=102591236096,filler=3392B,ts=2025-02-17 21:21:00.031951,PersGUID=27290ca9-0063-20230821-192918-3080000015,GUID=5a09497f-0063-20230926-062013-26808d3dc7,PrevGUID=5a09497f-0063-20230926-062013-26808d3ce2], sendSz=0, dur=31950748 us]
    [122514]{-1}[-1/-1] 2025-02-17 21:21:31.982923 i sr_dataaccess    DisasterRecoveryPrimaryImpl.cpp(01119) : Replication session closed (site=2, remoteHost=xx.xx.xx.xx, invalidateSession=0, eventReported=0) <<, ptr=0x00007ff0e519e980
    [122514]{-1}[-1/-1] 2025-02-17 21:21:31.983019 i EventHandler     EventManagerImpl.cpp(00680) : New event reported: 'SystemReplicationEvent: site=2, Site 2: Log shipping timeout occurred'


    Secondary site

    indexserver_hostname.port.xxx.trc
    #


    [54079]{-1}[-1/-1] 2025-02-17 21:18:10.487955 i Logger           RecoveryHandlerImpl.cpp(00760) : Redo done up to position: 0x17e2e9e400 and time: 2025-02-17 21:18:07.592785+09:00
    [93419]{-1}[-1/-1] 2025-02-17 21:19:02.338139 i Logger           LoggerImpl.cpp(03043) : primaryBackupFinished(0x17e2e5e3c0)
    [102355]{-1}[-1/-1] 2025-02-17 21:21:58.256329 e sr_dataaccess    DisasterRecoverySecondaryImpl.cpp(01145) : Trying to reconnect data handler to primary (1)


Read more...

Product

SAP HANA, platform edition 2.0

Keywords

KBA , HAN-DB , SAP HANA Database , HAN-DB-HA , SAP HANA High Availability (System Replication, DR, etc.) , Problem

About this page

This is a preview of a SAP Knowledge Base Article. Click more to access the full version on SAP for Me (Login required).

Search for additional results

Visit SAP Support Portal's SAP Notes and KBA Search.