SAP Knowledge Base Article - Preview

3464122 - HANA Service crashes due to memory corruption on operating system layer

Symptom

  • Indexserver crashes with SIGNAL 7 (SIGBUS) and siginfo code 4 (MCEERR_AR), as recorded in the crash dump:
[CRASH_SHORTINFO]  Exception short info: (2024-04-22 01:15:27 871 Local)
SIGNAL 7 (SIGBUS) caught, thread: 44636415192[thr=144261]: JobWrk6463019, addr: 0x000076add6544000, range: 4096, time: 2024-04-22 01:15:27 871 Local
[OK]
--

[CRASH_EXTINFO]  Extended exception info: (2024-04-22 01:15:27 873 Local)
----> Dump of siginfo contents <----
  signal:      7(SIGBUS)
  code:        4(MCEERR_AR: accessing physical memory marked as broken by the hardware/OS)
  addr:        0x000076add6544000
  range:       4096
----> Dump of system info <----
  SID:         SID
  instance:    02
  OS:          Linux
  node:        Hostname
  OS release:  5.3.18-150200.24.129-default
  OS version:  #1 SMP Tue Sep 6 13:05:38 UTC 2022 (c4cee83)
  OS machid:   x86_64
  Max core:    0 KB
----> Dump of current processor info <----
  Current NUMA node index: 0
  Current NUMA node id: 0
  Current logical Processor index: 2
  Current physical Processor index: 2
[OK]
--

[CRASH_CONTEXT]  Context info: (2024-04-22 01:15:27 873 Local)
----> Crashing context information <----
  ContextStack at (0x00007fb5609a91a0)
  stack:  7fb5607ad000-7fb5609a9fff, size 2084864
  guard:  7fb56078d000-7fb5607acfff, size 131072
  alt  :  7fb5609aa000-7fb5609c9fff, size 131072
  ctx addr: 0x00007c60b5314000, ctx link: 0x00007c60b5314000, ctx owner: 0x00007c60b5314000
  ctx name: JobWrk6463019, ctx type: thread, ctx id: 44636415192
  job creation:     0: 0x00007fbfa5f07a31 in TRexAPI::CsTablePartMerge::mergeDeltaIndex2(TrexBase::IndexName const&, TrexBase::IndexName const&, TRexConfig::IndexHandle&, TRexCommonObjects::TRexApiError&, TrexStore::UdivMgrHandle<TrexStore::UdivList> const&, int&, unsigned long&, TRexUtils::BitVector const&, ltt::vector<int> const&, TRexUtils::BitVector&, TRexUtils::BitVector&, TRexUtils::BitVector&, long&, unsigned int, ltt_adp::basic_string<char, ltt::char_traits<char>, ltt::integral_constant<bool, true> >*, TRexAPI::TableMergeProgress&, unsigned long&, ltt::smart_ptr<UnifiedTable::MergeState, ltt::integral_constant<bool, false>, ltt::integral_constant<bool, false> >&, TRexAPI::AnnounceFinishHistoryAccessGuard&)+0x4b60 at ims_search_api/DeltaMerge/CsTableMerge.cpp:4322 (libhdbcsapi.so)
  job execution:    0: 0x00007fbfa5f07a9f in TRexAPI::CsTablePartMerge::mergeDeltaIndex2(TrexBase::IndexName const&, TrexBase::IndexName const&, TRexConfig::IndexHandle&, TRexCommonObjects::TRexApiError&, TrexStore::UdivMgrHandle<TrexStore::UdivList> const&, int&, unsigned long&, TRexUtils::BitVector const&, ltt::vector<int> const&, TRexUtils::BitVector&, TRexUtils::BitVector&, TRexUtils::BitVector&, long&, unsigned int, ltt_adp::basic_string<char, ltt::char_traits<char>, ltt::integral_constant<bool, true> >*, TRexAPI::TableMergeProgress&, unsigned long&, ltt::smart_ptr<UnifiedTable::MergeState, ltt::integral_constant<bool, false>, ltt::integral_constant<bool, false> >&, TRexAPI::AnnounceFinishHistoryAccessGuard&)+0x4bcb at ims_search_api/DeltaMerge/CsTableMerge.cpp:4325 (libhdbcsapi.so)
  ctx command text: 
  ctx update transactionID: 9563233304
  ctx transactionID       : 1738
  ctx connectionID        : -1
  ctx logical connectionID: 0
  ctx statementID         : 0
  ctx statement hash      : 
  ctx statementExecutionID: 844434401550471
  ctx sqlusername         : 
  ctx appusername         : 
[OK]
--

[CRASH_STACK]  Stacktrace of crash: (2024-04-22 01:15:27 913 Local)
----> Symbolic stack backtrace <----
   0: memcpy_impl + 0x79b
         SFrame: IP: 0x00005606bb6a1d7f (0x00005606bb6a15e4+0x79b) FP: 0x00007fb5609a6010 SP: 0x00007fb5609a6000 RP: 0x00007fbf95ec2489
         Params: 0x7ca8a12930f0, 0x76add6544610, 0x18, 0xffffffffffffff60, 0x5606bb6a1d7a, 0x7ca8a1292f0b
         Regs: rax=0xe, rdx=0x18, rcx=0xffffffffffffff60, rbx=0x3, rsi=0x76add6544610, rdi=0x7ca8a12930f0, rbp=0x7fb5609a6000, r8=0x5606bb6a1d7a, r9=0x7ca8a1292f0b, r10=0x1d6888f6, r11=0x5606bb6a15e0, r12=0x7fb5609a6370, r13=0x1fd, r14=0x3, r15=0x7b9d93094200
         Module: /hana/shared/H3P/exe/linuxx86_64/HDB_2.00.065.00.1665753120_6c34d45b0567c95dfd8fa5f0310fa7b91be152f1/hdbindexserver
         NOTE: Missing frame information, following frames may be invalid (fallback unwinder)
     -----------------------------------------
 
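 ....

  • The indexserver trace shows the same SIGBUS for thread JobWrk6463019, followed by the MCE message from the fault handler: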
[44530]{-1}[-1/-1] 2024-04-22 01:14:46.483656 i Savepoint        SavepointImpl.cpp(03108) : Savepoint current savepoint version: 581091, restart redo log position: 0x2c168b1a849, next savepoint version: 581092, last snapshot SP version: 581088
[144261]{-1}[1738/9563233304] 2024-04-22 01:15:27.560505 e Basis            FaultProtectionImpl.cpp(01615) : SIGNAL 7 (SIGBUS) caught, thread: 44636415192[thr=144261]: JobWrk6463019, addr: 0x000076add6544000, range: 4096, time: 2024-04-22 01:15:27 560 Local
Instance SID/02, OS Linux Hostname 5.3.18-150200.24.129-default #1 SMP Tue Sep 6 13:05:38 UTC 2022 (c4cee83) x86_64

...

[144261]{-1}[1738/9563233304] 2024-04-22 01:15:27.560663 i Basis            Helper.cpp(00101) : Using 'x64_64 ABI unwind' for stack tracing
NOTE: full crash dump will be written to /usr/sap/SID/HDB02/Hostname/trace/DB_XXX/indexserver_Hostname.30240.crashdump.20240422-011527.0017834.trc
Call stack of crashing context:
   0: 0x00005606bb6a1d7f in memcpy_impl+0x79b (hdbindexserver)
   1: 0x00007fbf95ec2489 in AttributeEngine::RoDictDefaultPages<TRexUtils::JustSensitive>::getFirst(AttributeEngine::RoDictIterator&, unsigned int) const+0x7b5 at AttributeEngine/Main/Dictionary/RoDictDefaultPages.h:250 (libhdbcs.so)
   2: 0x00007fbf95763449 in AttributeEngine::DeltaMerge::MergeValueIdsStep<TrexTypes::RawAttributeValue, AttributeEngine::ValueDict<TrexTypes::RawAttributeValue>, AttributeEngine::BTreeAttribute<TrexTypes::RawAttributeValue>, AttributeEngine::DeltaMerge::DictStatUpdater<AttributeEngine::ValueDict<TrexTypes::RawAttributeValue>, TrexTypes::RawAttributeValue, AttributeEngine::BTreeAttribute<TrexTypes::RawAttributeValue>, AttributeEngine::DictInfo, void> >::doStepImpl()+0xf45 at AttributeEngine/Main/Dictionary/RoDictUnified.h:622 (libhdbcs.so)
   3: 0x00007fbf95771cc5 in AttributeEngine::DeltaMerge::SingleMergePipeline<TrexTypes::RawAttributeValue, AttributeEngine::ValueDict<TrexTypes::RawAttributeValue>, AttributeEngine::BTreeAttribute<TrexTypes::RawAttributeValue> >::merge(AttributeEngine::BTreeAttribute<TrexTypes::RawAttributeValue> const&, AttributeEngine::DeltaMerge::AttributeMergeData&, AttributeEngine::ValueDict<TrexTypes::RawAttributeValue>&, AttributeEngine::DeltaMerge::PrepareNewDictCallback<AttributeEngine::ValueDict<TrexTypes::RawAttributeValue> >&, AttributeEngine::DeltaMerge::SingleMergeIndexVector&, int&)+0x2a1 at AttributeEngine/DeltaMerge/SingleMergePipeline.cpp:182 (libhdbcs.so)
   4: 0x00007fbf96ed4ce3 in AttributeEngine::SingleAttribute<TrexTypes::RawAttributeValue, AttributeEngine::ValueDict<TrexTypes::RawAttributeValue> >::mergeOldIntoNew(AttributeEngine::AttributeValueContainer*, AttributeEngine::DeltaMerge::AttributeMergeData&, AttributeEngine::CAN_MERGE)+0x270 at AttributeEngine/DeltaMerge/SingleMergePipeline.cpp:132 (libhdbcs.so)
   5: 0x00007fbf9b13b144 in AttributeEngine::MemoryAvc2::prepareDeltaMerge(AttributeEngine::AttributeValueContainer*, AttributeEngine::DeltaMerge::AttributeMergeData&, bool)+0xe0 at AttributeEngine/AttributeValueContainer.cpp:3154 (libhdbcs.so)
   6: 0x00007fbf9b0f47d6 in AttributeEngine::AttributeApi::prepareDeltaMerge(TrexBase::IndexName const&, AttributeEngine::DeltaMerge::AttributeMergeData&, bool)+0x642 at AttributeEngine/AttributeApi.cpp:1606 (libhdbcs.so)
   7: 0x00007fbfa5eec047 in TRexAPI::MergeAttributeJob::doMerge(TRexAPI::MergeAttributeInfo&, TRexAPI::DeltaMergeState&, bool)+0x53 at ims_search_api/DeltaMerge/MergeAttributeJob.cpp:163 (libhdbcsapi.so)
   8: 0x00007fbfa5ef222c in TRexAPI::MergeAttributeJob::run(Execution::Context&, Execution::JobObject&)+0x208 at ims_search_api/DeltaMerge/MergeAttributeJob.cpp:270 (libhdbcsapi.so)
   9: 0x00007fbf860a1080 in Execution::JobObjectImpl::run(Execution::JobWorker*)+0x15b0 at Basis/Execution/impl/JobExecutionLog.hpp:155 (libhdbbasis.so)
  10: 0x00007fbf860ae811 in Execution::JobWorker::runJob(ltt::smartptr_handle<Execution::JobObjectForHandle>&)+0x710 at Basis/Execution/impl/JobExecutorThreads.cpp:366 (libhdbbasis.so)
  11: 0x00007fbf860b04ab in Execution::JobWorker::run(Execution::ThreadRC&)+0x877 at Basis/Execution/impl/JobExecutorThreads.cpp:1354 (libhdbbasis.so)
  12: 0x00007fbf861009de in Execution::Thread::staticMainImp(Execution::Thread*)+0x53a at Basis/Execution/impl/Thread.cpp:574 (libhdbbasis.so)
  13: 0x00007fbf86108c15 in Execution::pthreadFunctionWrapper(Execution::Thread*)+0x1c1 at Basis/Execution/impl/ThreadInterposition.cpp:703 (libhdbbasis.so)
  14: 0x0000000000000000 in <no symbol>+0x0 (<unknown>)
[144261]{-1}[1738/9563233304] 2024-04-22 01:16:04.734247 e Basis            FaultProtectionImpl.cpp(01061) : MCE: not for us to handle, let's hope the user installed a handler with DIAG_SEH_START. Terminating with ContinueSearch.
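
The trace entry above reports the faulting range as a start address plus a length. As a minimal sketch (not an SAP tool), the following Python snippet parses that entry and checks whether the source pointer of the crashing memcpy_impl frame lies inside the reported range. The helper name parse_sigbus is hypothetical, and reading rsi as the copy source assumes the x86-64 System V calling convention.

```python
import re

# Relevant part of the FaultProtectionImpl.cpp entry from the indexserver trace.
trace_line = (
    "SIGNAL 7 (SIGBUS) caught, thread: 44636415192[thr=144261]: JobWrk6463019, "
    "addr: 0x000076add6544000, range: 4096, time: 2024-04-22 01:15:27 560 Local"
)

# rsi value from frame 0 (memcpy_impl) of the crash stack.
# Assumption: x86-64 System V ABI, where rsi carries memcpy's source pointer.
memcpy_source = 0x76add6544610


def parse_sigbus(line):
    """Return (signal_number, thread_name, start_address, length) or None."""
    match = re.search(
        r"SIGNAL (\d+) \(\w+\) caught, thread: \d+\[thr=\d+\]: (\w+), "
        r"addr: (0x[0-9a-fA-F]+), range: (\d+)",
        line,
    )
    if match is None:
        return None
    signal_number, thread_name, addr, length = match.groups()
    return int(signal_number), thread_name, int(addr, 16), int(length)


signal_number, thread_name, start, length = parse_sigbus(trace_line)
end = start + length  # exclusive upper bound of the reported range
inside = start <= memcpy_source < end

print(f"signal {signal_number} in {thread_name}: range {start:#x}-{end - 1:#x}")
print(f"memcpy source {memcpy_source:#x} inside reported range: {inside}")
```

With the values from this dump the source pointer falls inside the 4096-byte range, which is consistent with the copy reading from the page that was flagged as broken.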

 

  • /var/log/messages shows the underlying hardware memory error and references the same JobWrk6463019 thread that crashed in HANA. The kernel timestamps (2024-04-22T04:15:27+00:00) are exactly 3 hours ahead of the HANA local timestamps (2024-04-22 01:15:27), which most likely reflects a time zone offset between the OS log and HANA rather than a real delay:
2024-04-22T04:15:27.524646+00:00 Hostname kernel: [11607325.863014] mce: Uncorrected hardware memory error in user-access at 26819f09000
2024-04-22T04:15:27.524673+00:00 Hostname kernel: [11607325.863934] mce: [Hardware Error]: Machine check events logged
2024-04-22T04:15:27.527929+00:00 Hostname kernel: [11607325.867046] Memory failure: 0x26819f09: Sending SIGBUS to hdbindexserver:17834 due to hardware memory corruption
2024-04-22T04:15:27.527935+00:00 Hostname kernel: [11607325.867052] Memory failure: 0x26819f09: recovery action for dirty LRU page: Recovered

2024-04-22T04:16:04.737972+00:00 Hostname kernel: [11607363.074869] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544610
2024-04-22T04:16:04.737985+00:00 Hostname kernel: [11607363.075368] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544590
2024-04-22T04:16:04.737993+00:00 Hostname kernel: [11607363.075386] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544591
2024-04-22T04:16:04.737994+00:00 Hostname kernel: [11607363.075402] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544592


2024-04-22T04:16:04.857910+00:00 Hostname kernel: [11607363.194882] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544ffc
2024-04-22T04:16:04.857913+00:00 Hostname kernel: [11607363.194899] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544ffd
2024-04-22T04:16:04.857915+00:00 Hostname kernel: [11607363.194916] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544ffe
2024-04-22T04:16:04.857915+00:00 Hostname kernel: [11607363.194934] MCE: Killing JobWrk6463019:144261 due to hardware memory corruption fault at 76add6544fff




Environment

SAP HANA Platform Edition 2.0

Product

SAP HANA, platform edition 2.0

Keywords

KBA , HAN-DB , SAP HANA Database , BC-OP-LNX , Linux , Problem
